We consider the ultimate limits of program-specific garbage collector (GC) performance for real programs. We first characterize the GC schedule optimization problem. Based on this characterization, we develop a linear-time dynamic programming solution that, given a program run and heap size, computes an optimal schedule of collections for a non-generational collector. Using an analysis of a heap object graph of the program, we compute a property of heap objects that we call their pre-birth time. This information enables us to extend the non-generational GC schedule problem to the generational GC case in a way that also admits a dynamic programming solution with cost cubic in the length of the trace (number of objects allocated). This improves our previously reported approximately optimal result. Our experimental results on traces from Java programs of the DaCapo benchmark suite show that there is considerable promise to reduce garbage collection costs for some programs by developing program-specific collection policies. For a given space budget, optimal schedules often obtain modest but useful time savings, and for a given time budget, optimal schedules can obtain considerable space savings.
We present a new machine-learning algorithm with disjunctive model for data-driven program analysis. One major challenge in static program analysis is a substantial amount of manual effort required for tuning the analysis performance. Recently, data-driven program analysis has emerged to address this challenge by automatically adjusting the analysis based on data through a learning algorithm. Although this new approach has proven promising for various program analysis tasks, its effectiveness has been limited due to simple-minded learning models and algorithms that are unable to capture sophisticated, in particular disjunctive, program properties. To overcome this shortcoming, this article presents a new disjunctive model for data-driven program analysis as well as a learning algorithm to find the model parameters. Our model uses boolean formulas over atomic features and therefore is able to express nonlinear combinations of program properties. Key technical challenge is efficiently determine a set of good boolean formulas as brute-force search would simply be impractical. We present a stepwise and greedy algorithm that efficiently learns boolean formulas. We show the effectiveness and generality of our algorithm with two static analyzers: context-sensitive points-to analysis for Java and flow-sensitive interval analysis for C. Experimental results show that our automated technique significantly improves the performance of the state-of-the-art techniques including ones hand-crafted by human experts.
Consistent subtyping is employed in some gradual type systems to validate type conversions. The original definition by Siek and Taha serves as a guideline for designing gradual type systems with subtyping. Polymorphic types à la System F also induce a subtyping relation that relates polymorphic types to their instantiations. However Siek and Tahas definition is not adequate for polymorphic subtyping. The first goal of this paper is to propose a generalization of consistent subtyping that is adequate for polymorphic subtyping, and subsumes the original definition by Siek and Taha. The new definition of consistent subtyping provides novel insights with respect to previous polymorphic gradual type systems, which did not employ consistent subtyping. The second goal of this paper is to present a gradually typed calculus for implicit (higher-rank) polymorphism that uses our new notion of consistent subtyping. We develop both declarative and (bidirectional) algorithmic versions for the type system. The algorithmic version employs techniques developed by Dunfield and Krishnaswami to deal with instantiation. We prove that the new calculus satisfies all static aspects of the refined criteria for gradual typing. We also study an extension of the type system with static and gradual type parameters, in an attempt to support a variant of the dynamic criterion for gradual typing. Armed with a coherence conjecture for the extended calculus, we show that the dynamic gradual guarantee of our source language can be reduced to that of »B, which, at the time of writing, is still an open question.
Pointer analysis is at the heart of most interprocedural program analyses. However, scaling pointer analysis to large programs is extremely challenging. In this article, we study differential pointer analysis and present a new algorithm for computing the points-to information incrementally (i.e., upon code insertion, deletion and modification). Underpinned by new observations of differential pointer analysis, our algorithm significantly advances the state-of-the-art in that it avoids redundant computations and the expensive graph reachability analysis without losing the pointer analysis precision. Moreover, it is parallel within each iteration of the fixed- point computation. We have implemented our algorithm, DPA, for Java based on the WALA framework and evaluated its performance extensively on real-world large complex applications. Experimental results show that DPA achieves more than 200X speedups over existing incremental algorithms, two to five orders of magnitude faster than the whole program pointer analysis, and also improves the performance of an incremental data race detector by orders of magnitude. Our DPA implementation is open source and has been adopted by WALA.
Developing accurate and efficient program analyses for languages with higher-order functions is known to be difficult. Here we define a new higher-order program analysis, DDPA, which extends well-known demand-driven lookup techniques found in first-order CFL-reachability program analyses to higher-order programs. This task presents several unique challenges to obtain good accuracy, including the need for a new method for demand-driven lookup of non-local variable values. DDPA is flow- and context-sensitive and provably polynomial-time. To efficiently implement DDPA we develop a novel push-down automaton metaprogramming framework, the Push-Down Reachability automaton (PDR). The analysis is formalized and proved sound, and an implementation is described.
Minification is a widely accepted technique which aims at reducing the size of the code transmitted over the web. This paper concerns the problem of semantic-preserving minification of Cascading Style Sheets (CSS) ---the de facto language for styling web documents --- based on refactoring similar rules. The cascading nature of CSS makes the semantics of CSS files sensitive to the ordering of rules in the file. To automatically identify rule refactoring opportunities that best minimise file size, we reduce the refactoring problem to a problem concerning CSS-graphs, i.e., node-weighted bipartite graphs with a dependency ordering on the edges, where weights capture the number of characters. Constraint solving plays a key role in our approach. Transforming a CSS file to a CSS-graph problem requires us to extract the dependency ordering on the edges (an NP-hard problem), which requires us to solve the selector intersection problem. To this end, we provide the first full formalisation of CSS3 selectors (the most stable version of CSS) and reduce their selector intersection problem to satisfiability of quantifier-free integer linear arithmetic, for which highly-optimised SMT-solvers are available. To solve the above NP-hard graph optimisation problem, we show how Max-SAT solvers can be effectively employed. We have implemented our rule refactoring algorithm, and tested our tool against ~70 real-world examples (including top 20 most popular websites). Compared against six well-known minifiers (which implemented other optimisations), our tool produced larger savings. An even better minification rate was shown when our tool is used together with these minifiers.
We present a rich type system with subtyping for an extension of System F. Our type constructors include sum and product types, universal and existential quantifiers, inductive and coinductive types. The latter two may carry annotations allowing the encoding of size invariants that are used to ensure the termination of recursive programs. For example, the termination of quicksort can be derived by showing that partitioning a list does not increase its size. The system deals with complex programs involving mixed induction and coinduction, or even mixed polymorphism and (co-)induction (as for Scott-encoded data types). One of the key ideas is to completely separate the notion of size from recursion. We do not check the termination of programs directly, but rather show that their (circular) typing proofs are well-founded. We then obtain termination using a standard semantic proof of normalisation. To demonstrate the practicality of our system, we provide an implementation which accepts all the examples discussed in the paper.
We introduce a system of monadic affine sized types, which substantially generalise usual sized types, and allows this way to capture probabilistic higher-order programs which terminate almost surely. Going beyond plain, strong normalisation without losing soundness turns out to be a hard task, which cannot be accomplished without a richer, quantitative notion of types, but also without imposing some affinity constraints. The proposed type system is powerful enough to type classic examples of probabilistically terminating programs such as random walks. The way typable programs are proved to be almost surely terminating is based on reducibility, but requires a substantial adaptation of the technique.
We prove that the observational equivalence problem for a finitary fragment of ML is recursively equivalent to the reachability problem for extended branching vector addition systems with states (EBVASS). Our proof uses the fully abstract game semantics of the language. We introduce a new class of automata, VPCMA, as a representation of the game semantics. VPCMA are a version of class memory automata equipped with a visibly pushdown stack; they serve as a bridge enabling interreducibility of decision problems between the game semantics and EBVASS. The results of this paper complete our programme to give an automata classification of the ML types with respect to the observational equivalence problem for closed terms.