The Deterministic Parallel Java Language
Reference Manual
Version 1.0
University of Illinois at Urbana-Champaign
Revised June 2010 |
Contents
1 Introduction
This document is a reference guide to the Deterministic Parallel Java
programming language, Version 1.0 (DPJ v1.0). It explains in detail
the new language features that DPJ adds to Java, and how they work.
It is intended to be accessible to readers with no previous knowledge
of DPJ, though familiarity with Java is assumed.
Many cross-references appear throughout the document. To refer to
sections, we use the section symbol §: for example,
§ 5.1 refers to the section numbered
5.1.
Readers of this document may also wish to consult the following:
-
The Deterministic Parallel Java Tutorial provides a tutorial introduction to DPJ, and
explains how to write some common parallel patterns in DPJ. The DPJ
programmer should probably read that document first. It
cross-references this document, so you can look up particular
features that you want to learn more about.
- The Deterministic Parallel Java Installation Manual explains how to get started installing the
dpjc compiler, using the compiler to compile DPJ programs, and
running DPJ programs.
- The Deterministic Parallel Java Language Specification gives the specification of DPJ more precisely. The
reference manual you are reading is sufficiently detailed that most
if not all of the specification can be gleaned from it; but the
specification is both more precise and more concise than this
document. It will be most useful to someone who is already familiar
with and understands this manual.
The rest of the document is organized as follows:
-
§ 2, Classes and Interfaces, describes
DPJ's extensions to Java classes and interfaces. These include
features for declaring region names, assigning region names to class
fields, summarizing method effects, and writing classes and methods
with region variables.
- § 3, Region Path Lists, describes region
path lists, or RPLs. These are the general form of a region name in
DPJ. RPLs are hierarchically structured, and the structure allows
many fine distinctions between sets of regions to be expressed.
- § 4, Types, describes DPJ's extensions to
Java's class and array types. The main difference is that both
class and array types can have region arguments. There are also new
rules for type comparisons, casts, etc. to support the extensions.
- § 5, Effects, describes DPJ's effect
system, which is closely integrated with the regions in the type
system. Effects describe operations on the heap. The programmer
declares the effects of methods, and the compiler infers the rest of
the effects. The compiler checks to make sure that the effects of
parallel tasks are mutually noninterfering.
- § 6, Parallel Control Flow, describes
DPJ's constructs for creating parallel tasks.
- § 7, The DPJ Runtime, explains the classes in
the DPJ runtime that are useful for manipulating arrays.
- § 8, Exception Behavior, explains
what happens when an exception is thrown in DPJ.
2 Classes and Interfaces
DPJ classes and interfaces are identical to classes and interfaces in
Java, with the following additional features:
-
New members called class region name declarations
(§ 2.1) declare region names at class or
interface scope that can be associated with class fields
(§ 2.2).
- Methods have several new features to support regions and effects
(§ 2.3).
- Classes, interfaces, and methods support region
parameters, so that different objects of the same class or
interface, and different invocations of the same method, can use
different regions (§ 2.4).
Throughout this section, we use the term “class” with the
understanding that the concepts apply identically to interfaces,
unless otherwise noted.
2.1 Class Region Name Declarations
A class region name declaration may appear as a class or
interface member. A class region name declaration consists of the
keyword region followed by one or more identifiers separated by
commas. For example, the following declaration declares a region name
Data within the class Element:
class Element {
region Data;
...
}
Class region name declarations function like static class members in
Java (though the keyword static need not be used with a class
region declaration — if it is, it has no effect). In particular,
subject to visibilty restrictions, other packages and classes can
refer to the declared names by prepending the proper package and class
qualifiers. For example, outside of class Element, region
Data could be referred to as Element.Data.
As with Java fields and methods, the programmer may control the access
to the declared names with the qualifiers public, private,
or protected. For example, the following code declares two
regions that can be referred to from anywhere in the program:
class Node {
public region Left, Right;
...
}
Class region name declarations are available for use in RPLs
(§ 3) in the scope where they are visible. A class
region name standing alone is a particular case of an RPL (see
§ 3.1.2).
2.2 Field RPL Specifiers
Every class field in DPJ resides in a region, named with a
region path list (§ 3). This is a fundamental aspect of
DPJ versus plain Java; it allows the specification and checking of
effects (§ 5). There are two ways to specify the RPL
of a field: (1) with an explicit field RPL specifier; and (2)
by using the default field RPL.
Explicit field RPL specifiers: A field RPL specifier has the
following form:
- in rpl
rpl is an RPL, as defined in § 3. If the field
RPL specifier is present, then it must appear immediately after the
field name (and before the field initializer expression, if there is
one). For example:
class FieldRPLSpecifiers {
region X, Y; // X and Y name regions
int x in X; // field x is in region X
boolean y in Y = false; // field y is in region Y
}
Notice that because a class region name declaration functions like a
static class member (§ 2.1), it creates a
single region name per class, not per object. Therefore, a
field RPL specifier such as int x in X assigns the same
region FieldRPLSpecifiers.X to the field of every object
instance of class FieldRPLSpecifiers created at runtime.
§ 2.4 explains how to use region
parameters to assign different regions to the fields of different
objects of the same class.
Final fields: Effects on final fields are ignored
(§ 5.4), because their values never change
after initialization. Therefore, field RPL specifiers are not
meaningful for final fields. If a field is declared
final, then an RPL specifier may be given for the field; but if
so, it is ignored.
Default field RPL: If a field has no explicit RPL specifier,
then its RPL is Root. The name Root is always in scope
(§ 3.1.1). For example, in the following code,
field x is in region Root:
class DefaultRegionSpecifier {
// Equivalent to int x in Root = 5
int x = 5;
}
2.3 Methods
DPJ adds the following features to Java methods:
-
Every method (§ 2.3.1) and
constructor (§ 2.3.2)
summarizes its effects, either explicitly or through a default
effect summary.
- The programmer may declare and use local region names
inside a method body (§ 2.3.3),
to express effects that do not escape the method scope.
- Methods may be marked commutative to indicate that their
effects commute, even though they have conflicting reads and writes
(§ 2.3.4).
2.3.1 Effect Summaries (Non-Constructor Methods)
Every DPJ method must summarize its effects. The compiler uses the
summaries to check noninterference of effect
(§ 5.7). This section discusses effect
summaries for non-constructor methods.
§ 2.3.2 describes effects
summaries for constructors. There are two ways to summarize a
method's effects: (1) with an explicit method effect summary; or (2)
with the default method effect summary.
Explicit method effect summaries: An explicit method effect
summary appears immediately after the method's value parameters and
before the throws clause, if any. It has the form given in
§ 5.2. For example:
class Summaries {
region X, Y;
int x in X;
int y in Y;
// pureMethod has no effect on the heap
int pureMethod(int y) pure { return y+1; }
// throwsMethod reads X and throws an exception
void throwsMethod() reads X throws Exception {
if (x != 0) throw new Exception();
}
// readWriteMethod reads Y and writes X
void readWriteMethod() reads Y writes X {
x = y;
}
}
A method effect summary must represent all the effects of the method
body. More precisely, the actual method effects must be a
subeffect (§ 5.6) of the summarized
effects. The actual effects are computed as described in
§ 5.4, except that any local effect
(§ 5.3) can be omitted from the summary. The
representation can be conservative, i.e., it is permissible to include
effects in the summary that can never occur in executing the method.
But if any effect of any possible execution of the method is omitted
from the summary, it is a compile-time error.
For example:
class MoreSummaries {
region X;
int x in X;
// OK, summary is conservative
int readsMethod() writes X {
// Write effects cover reads
return x;
}
// Error! Read effect must be reported
int pureMethod() pure {
return x;
}
}
A method effect summary must also represent all the effects of any
method that overrides it. If it does not, then there is a
compile-time error. To see why, consider the following classes:
class SuperClass {
region X;
int x in X;
void method(int x) writes X {
this.x = x;
}
}
class SubClass extends SuperClass {
// Compile-time error: pure does not cover writes X
void method(int x) pure {
// Do nothing
}
}
The method method defined in SubClass overrides the method
method defined in SuperClass. There is an error in the
code, because the effect pure of the subclass method does not
cover the effect writes X of the superclass method. While this
may look innocuous (after all, the SubClass version of method
really has no effect!), suppose we allowed the code given above and
then wrote the following method:
void callMethod(SubClass subClass) pure {
subClass.method();
}
Based on the previous code, this looks fine: we are calling
subClass.method, which has effect pure. But does it?
Because of polymorphic dispatch, the runtime type of the object
bound to subClass could be SuperClass. And in that case,
the call to method would invoke the superclass version of
method, which has the effect writes X. The write to
region X would be “hidden” by the polymorphic dispatch. To
prevent this kind of hiding, DPJ requires that superclass effects
cover subclass effects. Notice that if we change the effect summary
in the SubClass version of method to writes X, as
DPJ requires, then this problem goes away.
Default effect summary: Any DPJ method may be written with no
explicit method effect summary. In particular, an ordinary Java
method is always a valid DPJ method.
If a method lacks an explicit effect summary, then the compiler
assigns it the default summary writes Root:*. This is the most
conservative possible effect summary; it says that the method may read
or write to any globally-visible memory location. The default effect
summary is always valid, because it covers all possible effects of any
method. For example, the following code is valid:
class DefaultEffectSummary {
// Equivalent to 'void method() writes Root:*'
// OK, because effects may be overreported
void defaultSummary() {
// No actual effect!
}
void parallel() {
// Reports interference, even though defaultSummary has no
// actual effect!
cobegin {
defaultSummary();
defaultSummary();
}
}
}
In general, the default effect summary is too coarse-grained for
methods that are called either directly or indirectly inside a
parallel task: as shown in the example, it causes DPJ to detect and
warn about interfering writes, even if the method's actual effects are
not interfering. However, for methods that are only ever used in
sequential parts of the program, the default effect summary saves
effort, because the effects of those methods are not important.
2.3.2 Effect Summaries (Constructors)
Constructors are special methods; therefore they must also summarize
their effects. All the rules given in
§ 2.3.1 apply, with one important
exception: the effect summary of a constructor does not have to
report any initialization effects on fields of the object being
constructed. For example, the following code is legal:
class ConstructorExample {
region X, Y;
int x in X, y in Y;
// Effect 'pure' is valid, because initialization effects
// on x and y don't have to be reported
ConstructorExample(int x, int y) pure {
this.x = x;
this.y = y;
}
}
This works because the DPJ type and effect system guarantees that if
one parallel task calls a constructor, then the fields of a
constructed object are never read by any other parallel task until the
first task is finished. In particular, that means that no other
parallel task can ever see object fields in an uninitialized state.
So there can be no interference due to initialization effects of
constructors.
However, if we wrote the same initializer as a non-constructor method,
then the rules in § 2.3.1 would apply,
and we would have to write the effects:
class NonConstructorExample {
region X, Y;
int x in X, y in Y;
// Effect 'pure' would cause a compile error here
static void initialize(int x, int y) writes X, Y {
this.x = x;
this.y = y;
}
}
2.3.3 Local Region Name Declarations
A region name declaration may appear as a statement in the body of a
method. This kind of declaration is called a local region name
declaration. Like a class region name declaration
(§ 2.1), a local region name declaration
consists of the keyword region followed by a comma-separated
list of identifiers. The declared names are available for use in RPLs
(§ 3) in the scope of the enclosing statement block. In
particular, a local region name standing alone is a valid RPL
(§ 3.1.4).
For example, the following code declares
region names A and B that are available for use as region
names in the scope of method localRegionNames:
void localRegionNames() {
region A, B;
...
}
Because of their limited scope, local names (and RPLs constructed from
them) cannot appear in field RPL specifiers
(§ 2.2). They can be used
only in arguments to class or method region parameters
(§ 2.4). Their purpose is to indicate that the
effects of operating on the class object, or invoking the method, are
local to the enclosing method, and need not to be reported in the
effect summary of the enclosing method or constructor
(§§ 2.3.1, 2.3.2).
This technique is called effect masking. For example:
class EffectMasking<region R> {
int x in R;
// method has no globally-visible effect
void method() pure {
// Declare region r local to method
region r;
// Use local region to create new EffectMasking object
// 'masking' cannot escape method
EffectMasking<r> masking = new EffectMasking<r>();
// Effect 'writes r' is masked from callees
masking.x = 0;
}
}
In this example the masked effect is somewhat useless. But there are
plenty of realistic cases where objects are created and assigned to
temporarily to support some computation, then thrown away because only
the final result is needed by the callee. By creating the temporary
objects with local regions, the callee can reduce its effect
signature, minimizing potential interference and making it more useful
inside parallel tasks. See The Deterministic Parallel Java Tutorial for more details.
Like Java local variables, a local region name declared in a statement
block is in scope only in that block; its scope ends when the block
ends. For example, the following code would cause a compile error,
because the name OutOfScopeRegion is not in scope where it is
used:
class ScopeExample<region R> {
void method() {
{
region OutOfScopeRegion;
// Scope ends here
}
// Error: OutOfScopeRegion is no longer in scope
ScopeExample<OutOfScopeRegion> x = null;
}
}
2.3.4 Commutative Methods
The keyword commutative may appear as a method qualifier, before
the return type and before the type and/or region parameters of the
method, if any:
commutative int m(...) { ... }
The commutative qualifier is a programmer-specified guarantee
that any invocation of the method commutes with itself. It is
typically used for commutative operations on concurrent data
structures such as counter updates, set inserts, histograms, and
reductions. These operations write to shared data in a way that
“looks interfering” to the DPJ effect system, but due to the
semantics of the data structure (which the effect system does not know
about), still preserves determinism. For example, here is a use of
the commutative qualifier to write a simple counter class:
class Counter<region R> {
private int count in R;
void clear() writes R { count = 0; }
commutative synchronized void increment() writes R {
++count;
}
int getCount() reads R { return count; }
}
When a method m is labeled commutative, the DPJ compiler
treats the effects of multiple invocations of m as noninterfering,
even if the read and write effects by themselves, without the
commutative, would be interfering. For example, in the case of
the counter class, concurrent invocations of increment have
interfering writes to region R. However, the commutative
annotation tells the DPJ compiler to “ignore” that interference.
So, for example, the following code compiles with no errors or
warnings:
Counter<Root> counter = new Counter<Root>();
foreach(int i in 0, 10)
counter.increment();
Note that the commutative qualifier only tells the DPJ
compiler that it is safe to ignore interference. It does not
introduce any special synchronization or other concurrency control.
In particular, it is the programmer's responsibility when using
commutative to ensure two things:
-
The method is properly synchronized so that concurrent accesses
to the method behave as though they have occurred in sequence (i.e.,
the accesses have serializable semantics). In the example
above, this property is enforced by making method increment
synchronized. Without the synchronized keyword,
concurrent invocations of increment would have a
read-modify-write race, and some of the updates could be lost.
- The method behaves so that either order of a pair of invocations
produces the same result. In the example above, this property holds
because incrementing a counter twice, in either order, has the same
result (i.e., the final counter value is 2 more than the starting
value).
Typically, in user code, the commutative annotation is used for
simple commutative read-modify-write operations like the one
illustrated above. More complicated operations, e.g. set or tree
inserts, where the commutativity property is more subtle to verify,
would typically be provided by library or framework code. Only very
skilled programmers should attempt to “roll their own”
implementation of such commutative operations, as the potential for
subtle bugs is high, and the DPJ effect system provides no assistance
in checking for such bugs.
2.4 Region Parameters
DPJ extends Java's generic type parameters by allowing region
parameters in class and method definitions. Class region
parameters become bound to actual regions when the class is
instantiated to a type (§ 4.1.1). Method
region parameters become bound to actual regions when the method is
invoked (§ 2.4.2).
2.4.1 Class Region Parameters
Declaring class region parameters: As in ordinary generic
Java, the class parameters are given as a comma-separated list of
identifiers enclosed in angle brackets (<>) immediately after
the class name. The new features are as follows:
-
Both type and region parameters may appear. If both appear, all
the region parameters must follow the type parameters.
- The first region parameter must be preceded by the
keyword region. Any of the other region parameters may
be preceded by the keyword region.
- Any of the type parameters may be preceded by the keyword
type.
The keyword type is provided for convenience in distinguishing
type from region parameters; but to preserve compatibility with
ordinary Java syntax, it is never required.
For example, the following are valid class declarations:
// Type parameter T
class A<T> {}
class B<type T> {}
// Region parameter R
class C<region R> {}
// Type parameter T, region parameter R
class D<T, region R> {}
class E<type T, region R> {}
// Type parameters T1, T2, region parameters R1, R2
class F<type T1, T2, region R1, R2> {}
Using class region parameters: A class region parameter is
available for use in RPLs (§ 3.3) within the class
definition. In particular, a class region parameter standing alone is
a valid RPL. For example, the following code declares a class
ParamInField with a region parameter R and uses R in
a field RPL specifier (§ 2.2):
class ParamInField<region R> {
public int x in R;
}
This code says that the region of field x of an object
instantiated from the ParamInField class is given by the RPL
provided as an argument to R in the type of the object. For
example, this code creates a fresh ParamInField object called
p, such that the field p.x is in region Root:
ParamInField<Root> p = new ParamInField<Root>();
As another example of using parameters, the following code declares a
class region parameter R and uses it to instantiate a type
(§ 4.1.1):
class ParamInType<region R> {
ParamInType<R> x in R;
}
And this code creates a fresh ParamInType object whose field
x has type ParamInType<Root>:
ParamInType<Root> p = new ParamInType<Root>();
This pattern is very useful for creating chains or graphs of objects,
all in the same region. For example, without the region parameter
R, there would be no way to get the object pointed to by x
in the same region as the parent object. The RPL specifier x in
R is insufficient, because that says only that the field x is
in R, not the object that it points to. To get both the field
and the object in the same region, you have to write
ParamInType<R> x in R
as shown. Moreover, DPJ provides the flexibility to assign different
regions to the field pointing to the object and the object itself, for
example:
ParamInType<r1> x in r2
This allows for finer-grain partitioning of memory than if the field
and the object always had the same region.
As in generic Java, a new declaration of a parameter named R
hides any parameter named R that is already in scope. For
example, the following two definitions of the class Hiding are
equivalent:
// First version
class Hiding<region R1> {
class Inner<region R2> {
Hiding<R2> x;
}
}
// Second version
class Hiding<region R> {
class Inner<region R> {
Hiding<R> x;
}
}
In both versions we are declaring a parameter for the outer class,
declaring a parameter for the inner class, and using the inner class
parameter to instantiate a type inside the inner class. In the first
version we have given distinct names to the two parameters; while in
the second version we have used the same name.
Also as with Java generic type parameters, a class region parameter
may not be used in a static context. This is because DPJ region
information (like Java generic information) is erased during
the compilation process. In particular, only one set of code is
generated for all types instantiating a given class. If region
parameters could be used in a static context, then different code
would have to be generated for each different type (as with C++
templates).
For example, the following code is invalid, because it attemps to use
region R in a static context:
class Outer<region R> {
public static class Inner {
// Not allowed!
public int x in R;
}
}
Fortunately, this limitation is easy to work around. The standard way
(again, as with Java generics) is to introduce a new parameter for the
inner class, and write Inner<R1> to place x in region
R1:
class Outer<region R1> {
public static class Inner<region R2> {
public int x in R2;
}
}
2.4.2 Method Region Parameters
Declaring method region parameters: As in ordinary generic
Java, the method region parameters are given as a comma-separated list
of identifiers enclosed in angle brackets (<>) immediately
before the method return type (for non-constructor methods) or class
name (for constructors). A method parameter declaration has the same
form as a class parameter declaration
(§ 2.4.1).
For example, the following code defines a method with type parameter
T and region parameter R:
interface MethodParams {
public <type T, region R> void method();
}
As for class parameters, the type keyword is always optional,
and the region keyword is optional except as to the first region
parameter.
Using method region parameters: A method region parameter is
available for use in RPLs (§ 3.3) with the method
definition. In particular, a method region parameter standing alone
is a valid RPL. For example, the following code declares a method
method with a region parameter R and uses R in the
formal (value) parameter and return type of the method:
interface MethodParamUses<region R> {
<region R1>void method(MethodParamUses<R1> arg);
}
As in generic Java, a new declaration of a parameter named R
hides any parameter named R that is already in scope. For
example, the following code is equivalent to the code above:
interface MethodParamUses<region R> {
<region R>void method(MethodParamUses<R> arg);
}
Invoking methods defined with region parameters: To invoke a
method defined with region parameters, the programmer may always use
explicit RPL arguments. Depending on the compiler's ability to
infer the arguments, in some cases the programmer may alternatively
use inferred RPL arguments. In either case, the arguments must
obey any disjointness constraints on the method's region parameters
(§ 2.4.3).
Explicit RPL arguments: The programmer can always supply the
RPL arguments to the method region parameters explicitly, using an
extension of the Java syntax for generic method arguments. For this
form, as for Java generic methods, there must be an explicit selector
and a dot preceding the method name; and the arguments must appear in
angle brackets after the dot, and before the method name. For
example:
- this.<args>meth-name()
Here args are type and/or RPL arguments (defined below), and
meth-name is the name of the method being invoked. Notice that
to invoke a method defined in the enclosing class with explicit RPL
arguments, this (or the class name, for a static method)
must be used as a selector.
The new DPJ features are as follows:
-
Both type and region parameters may appear. The number of type
and RPL arguments must exactly match the number of type and region
parameters (so all type arguments come first, because that is true
for the parameters).
- The first region parameter must be preceded by the
keyword region. Any of the other region parameters may
be preceded by the keyword region.
- Any of the type parameters may be preceded by the keyword
type.
The following code example illustrates the use of explicit RPL
arguments to a method invocation:
abstract class ExplicitArgs<region R> {
abstract <type T, region R>T invokedMethod(ExplicitArgs<R> param);
ExplicitArgs<Root> invokingMethod() {
ExplicitArgs<Root> arg = new ExplicitArgs<Root>();
return this.<ExplicitArgs<Root>,Root>invokedMethod(arg);
}
}
The abstract method invokedMethod is generic in type T and
region R. It takes an ExplicitArgs<R> object as a value
argument and returns type T. In invokingMethod, we create
a new ExplicitArgs<Root> and pass it to invokedMethod. We
pass ExplicitArgs<Root> as the type argument to T and
Root as the RPL argument to R, using explicit RPL
arguments. Notice that the type of arg matches the type of
param, after substituting Root for R in param.
If the types did not match, there would be a compile error. See
§ 4.3.3 for more information about how typing
works for method invocations in the presence of region parameters.
Note that the keyword region must be present to identify the
first RPL argument in the list, unlike the case of RPL arguments to
classes (§ 4.1.1), where the region
keyword is optional. The reason for this rule is Java's method
overloading: because multiple methods can be declared with the same
name but different parameters, information about which arguments are
types and which are regions is not always available from the method
name, as it is from the class name.
Inferred RPL arguments: As in generic Java, a method with
type and/or region parameters may be written without any explicit
generic arguments. In this case the compiler will attempt to infer
the type and/or region arguments from the types of the value arguments
supplied to the method. For example, the dpjc compiler accepts
the following code, because it is able to infer from the type
InferredArguments<Root> of arg that the argument to
R is Root:
abstract class InferredArguments<region R> {
abstract <region R>void callee(InferredArguments<R> param);
void caller() {
InferredArguments<Root> arg = new InferredArguments<Root>();
callee(arg);
}
}
This code is equivalent to the following:
abstract class InferredArguments<region R> {
abstract <region R>void callee(InferredArguments<R> param);
void caller() {
InferredArguments<Root> arg = new InferredArguments<Root>();
this.<Root>callee(arg);
}
}
While the compiler can infer region arguments to methods in many
common cases, in some cases it cannot, either because the inference
algorithm is insufficiently powerful, or because the information is
simply not available from the types of the value parameters. The
compiler uses Root:* as the argument to any region parameters in
the method that it cannot infer. For example, the compiler would
infer Root:* as the argument to R in the following code:
abstract class InferredArguments<region R> {
abstract <region R>void callee() writes R;
void caller() {
// There are no value arguments, so there is no way to infer
// the argument to R! This is equivalent to this.<Root:*>callee();
callee();
}
}
2.4.3 Disjointness Constraints
It is sometimes necessary to require that two or more region
parameters (or a region parameter and some other region) be disjoint.
For example, the following code can have a data race unless regions
R1 and R2 are bound to disjoint regions:
class Unsafe<region R> {
int x in R;
<region R1, R2>void method(Unsafe<R1> o1, Unsafe<R2> o2) {
cobegin {
++o1.x; // writes R1
++o2.x; // writes R2
}
}
}
To support this reasoning, DPJ provides optional disjointness
constraints for region parameters. If used, the constraints must
appear after the region parameters, and be separated from them by a
vertical bar:
- <region R1, R2, ... | constraints >
constraints is a comma-separated list of constraints, where a
constraint has the form rpl1 # rpl2, and
rpl1 and rpl2 are valid RPLs (§ 3). The
constraint states that rpl1 and rpl2 are disjoint
regions (§ 3.6.4). For class region
parameters, the requirement is enforced when the class is instantiated
to a type by substituting RPLs for parameters
(§ 4.1.1). For method region parameters, the
requirement is enforced when RPL arguments are provided to an
invocation of the method (§ 2.4.2).
For example, the code above could be rewritten as follows to make it
safe:
class Safe<region R> {
int x in R;
<region R1, R2 | R1 # R2> void(Safe<R1> o1, Safe<R2> o2) {
cobegin {
++o1.x; // writes R1
++o2.x; // writes R2
}
}
}
There is no interference between the statements of the cobegin,
because R1 and R2 are guaranteed to be disjoint regions.
Therefore, the effects of ++o1.x and ++o2.x are to
disjoint memory locations.
We can also use constraints to make parameters disjoint from region
names, not just other parameters. For example, we can write the
following:
class Safe2<region R> {
region r;
int x in R;
<region R | R # r>void(Safe2<R> o1, Safe2<r> o2) {
cobegin {
++o1.x; // writes R
++o2.x; // writes r
}
}
}
This technique is useful when methods need to operate disjointly in
parallel on global data and data passed in as an an argument. Of
course, the region of the global data could also be passed in as an
RPL argument, but using the global region directly in the constraint
saves writing parameters and arguments.
It is an error to write a constraint that cannot be satisfied, because
the RPLs given cannot be disjoint. For instance, this class
declaration generates a compile error:
class BadConstraint<region R | R # R> {}
It is also an error to provide RPL arguments to a class or method that
do not satisfy the constraints. For example:
abstract class BadArgs<region R1, R2 | R1 # R2> {
// Error: Instantiating BadArgs with R1=Root, R2=Root
BadArgs<Root, Root> x;
abstract <region R3, R4 | R3 # R4>void callee();
// Error: Invoking callee with R3=Root, R4=Root
void caller() {
this.<Root,Root>callee();
}
}
BadArgs<Root, Root> is an invalid type, because the RPLs
supplied as arguments to R1 and R2 are not disjoint, as
required by the definition of class BadArgs. Similarly,
this.<Root,Root>callee() is an invalid invocation of
callee, because the RPLs supplied as arguments to R3 and
R4 are not disjoint, as required by the definition of method
callee.
3 Region Path Lists (RPLs)
This section describes region path lists, or RPLs. RPLs are
the fundamental tool for describing sets of memory locations in DPJ.
RPLs can be fully specified or partially specified. A
fully-specified RPL names a region, which represents a set of
memory locations on the heap. The correspondence between memory
locations and regions is given by the way that the programmer has
assigned RPLs to class fields (§ 2.2)
and bound RPLs to region parameters in class types
(§ 4.1.1), array types
(§ 4.2.1), and method invocations
(§ 2.4.2). A partially-specified RPL names
a set of regions and is useful for expressing hierarchical
effects (e.g., “all regions under this one”).
RPLs are naturally nested. In fact the motivation for name “region
path list” is that an RPL is a sequence of names that describes a
path from the root in a tree of regions. Together with partially
specified RPLs, this nesting structure provides a great deal of power
in comparing sets of memory locations to determine whether they are
overlapping or disjoint. These properties are the key to specifying
and checking effects. In particular, checking subeffects
(e.g., that a method's effect summary covers its effects,
§ 2.3.1) requires reasoning about
inclusion of RPLs, while checking noninterfering effects (e.g.,
that the branches of a cobegin statement do not interfere,
§ 6.1) requires reasoning about disjointness
of RPLs.
The rest of this section proceeds as follows. § 3.1
describes the simplest form of RPL, which is just a name, such as a
region name declared at class scope
(§ 2.1). § 3.2
describes how to construct RPLs using sequences of names, which
put a nesting structure on RPLs. § 3.3 describes
the use of region parameters (§ 2.4) with RPLs.
These parameters allow RPLs to vary with the class type or method
invocation. § 3.4 explains how to write
partially specified RPLs, which describe sets of fully
specified RPLs. § 3.6 explains the rules for
comparing RPLs for properties such as inclusion and disjointness.
Finally, a word about the uses of RPLs. This section is about the
definition of RPLs, and the relations among them (inclusion,
disjointness, etc.). The various ways in which RPLs can be
used (i.e., how they actually appear in a DPJ program) are
covered in other sections. To provide context for this section, we
list all the possible uses here, together with the sections discussing
those uses:
-
RPLs can appear in field RPL specifiers, to assign class fields
to regions (§ 2.2).
- RPLs can appear in effect summaries
(§ 5.2). These RPLs describe the effects
of a code statement.
- RPLs can appear as arguments to parametric class types
(§ 4.1.1) and to array types
(§ 4.2.1). These RPLs provide actual
regions for the parameters in the instantiation of a class or array.
- RPLs can appear in method invocations, as arguments to method
region parameters (§ 2.4.2). These RPLs
provide actual regions for the parameters in an invocation of a
method.
Throughout this section, we provide example uses to illustrate the
definitions in context.
3.1 Basic Region Names
This section describes the simplest form of RPL, which is a single
name. There are five kinds of region names:
-
Root, a special name that is always in scope.
- A region name declared at class scope
(§ 2.1).
- An array index region [exp], where exp
is an integer expression.
- A region name declared in a method-local scope
(§ 2.3.3).
- A final local variable or this.
The first three kinds of names are available everywhere (except that
class region names may have access qualifiers, see
§ 2.1). The last two kinds of names are
available only in the scope where the region names or variables
are active.
3.1.1 The Name Root
The region name Root is always in scope, and can be used
anywhere as an RPL. For example, the following code uses Root
in a field region specifier (§ 2.2)
and to instantiate a class type (§ 4.1.1):
class RootExample<region R> {
RootExample<Root> x in Root;
}
The name Root has a special meaning: it is the root of the
region tree. Every RPL is nested under Root. For more
information about the nesting relation on RPLs, see
§ 3.6.2.
3.1.2 Class Region Names
A region name declared as a class member
(§ 2.1) is available for use as an RPL in
any scope where it is visible. For example, the following code
declares a region name ClassRegion at class scope, then uses it
in a field region specifier (§ 2.2)
and to instantiate a class type (§ 4.1.1):
class ClassRegionExample<region R> {
region ClassRegion;
int ClassRegionExample<ClassRegion> x in ClassRegion;
}
Because class region names can be package- and class-qualified, like
Java static fields, the following code is also legal:
class Class1 {
region ClassRegion;
}
class Class2 {
int x in Class1.ClassRegion;
}
When a class region name R is used as an RPL, it is shorthand for
the name sequence Root:R, as described in
§ 3.2. For example, the following definition of
Class2 is equivalent to the one given above:
class Class2 {
int x in Root:Class1.ClassRegion;
}
3.1.3 Array Index RPLs
The name [e], where e is an integer expression,
functions as an RPL, called an array index RPL. The array
index RPL represents a region indexed by the number that e
evaluates to at runtime (so there is a different region for each
natural number). It is useful in conjunction with
index-parameterized arrays (§ 4.2.2) and
with DPJ's built-in features for array partitioning
(§ 7.2).
As with class region names used as RPLs
(§ 3.1.2), an array index RPL [e]
is short for Root:[e].
3.1.4 Local Region Names
A region name declared inside a method body
(§ 2.3.3) is available for use as
an RPL in the scope where it is visible. For example, the following
code declares a region name LocalRegion in a local scope, and
uses it to instantiate a type:
class LocalRegionExample<region R> {
void method() {
region LocalRegion;
LocalRegionExample<LocalRegion> lre =
new LocalRegionExample<LocalRegion>();
}
}
As with class region names and array index regions, an RPL consisting
of a single local region name implicitly starts with Root. For
example, the class definition above is equivalent to the following:
class LocalRegionExample<region R> {
void method() {
region LocalRegion;
LocalRegionExample<Root:LocalRegion> lre =
new LocalRegionExample<Root:LocalRegion>();
}
}
3.1.5 final Local Variables and this as Region Names
The following variables may be used as an RPL:
-
A final local variable or method parameter with a class
type.
- The variable this, in a non-static context.
This usage is called a variable RPL. For example, the
following code uses this as an RPL in a field region specifier
(§ 2.2), as an argument to a class
type (§ 4.1.1), and in an effect summary
(§ 5.2):
class VarRegionExample<region R> {
VarRegionExample<this> x in this;
void setX(VarRegionExample<this> x) writes this {
this.x = x;
}
}
Variable RPLs allow runtime objects to function as regions, as
in type systems based on ownership. The variable stands in for
the object stored to the variable at runtime. Because the variable
must be final or this (which cannot be assigned to), the
variable always refers to the same object. This feature is
particularly useful in conjunction with the DPJParititon class
(§ 7.2), because it provides a way to
instantiate a partition with the region corresponding to the array
being partitioned.
As in ownership systems, a variable RPL is nested under the first RPL
argument of its type, which is called the owner RPL of the type
(§ 4.1.4). For example, in the code above,
the variable this has type VarRegionExample<R>, so it is
nested under the RPL R. For more information about the nesting
relationship on RPLs, see § 3.6.2.
3.2 Sequences of Basic Names
RPLs are built up from colon-separated sequences of names; this gives
them a natural nesting structure. For example, if A and B
are names, then A, B, A:B, B:A, A:A:B,
and so forth are all RPLs. The nesting structure is given by the
syntax: for example, A:B and A:A:B are both nested under
A. Note that nesting does not imply inclusion; A and
A:B are distinct regions. In particular the effect writes
A does not cover (or imply) the effect writes A:B.
However, the nesting structure creates a hierarchy of regions that is
useful in conjunction with partially specified RPLs
(§ 3.4), because then we can use it to say things
like “all RPLs under this one” or “all RPLs under this one that end
in B.” Separating nesting from inclusion like this makes RPLs
more complicated, but it also allows greater precision.
An RPL is valid if it is conforms to the following rules:
-
It is composed of basic names (§ 3.1),
separated by colons.
- If Root (§ 3.1.1) or a variable name
(§ 3.1.5) appears in the sequence, it must
appear first (so, in particular, both cannot appear).
The names in the sequence are called the elements of the RPL.
If an RPL does not start with Root or a variable element, then
it is treated as implicitly starting with Root. For example,
A is the same as Root:A, and [0] is the same as
Root:[0].
The following are examples of valid RPLs, assuming that A and
B are declared region names in scope:
A // Declared name
Root:A // Same as A
A:B // Sequence of names
Root:A:B // Same as A:B
[0] // Array index region
Root:[0] // Same as [0]
this // Variable region
this:A // Variable region with A appended
The following are examples of RPLs that are not valid, because
they do not conform to the rules above:
A:Root // Root must appear first
this:Root // Root must appear first
A:this // this must appear first
Root:this // this must appear first
3.3 Parameterized RPLs
Class and method region parameters (§ 2.4) work
with RPLs in a natural way. First, a region parameter standing alone
is always a valid RPL, as shown in the examples in
§§ 2.4.1
and 2.4.2. Second, an RPL may be
constructed from a region parameter followed by a colon-separated
sequence of elements, similar to a sequence of basic names
(§ 3.2). If a parameter appears in an RPL, then
it must appear first, and no Root or final local variable
may appear in the RPL (because those, when they appear, must also be
first).
The following are examples of valid parameterized RPLs, assuming that
R is a region parameter in scope and A and B are
declared region names in scope:
R // Region parameter alone
R:A // Region parameter with name appended
R:A:B // Region parameter with names appended
R:[0] // Region parameter with index appended
The following are examples of RPLs that are not valid, because
they do not conform to the rules above:
A:R // R must appear first
Root:R // R must appear first
R:Root // Root must appear first
R:this // this must appear first
The meaning of a parameterized RPL is that at runtime, the parameter
will be substituted away, generating a region that has Root or
an object as its first element. For example, consider the following
class definition:
class ParameterExample<region R> {
region r;
int x in R:r
}
The class ParameterExample has one region parameter R, and
one field x placed in RPL R using a field RPL specifier
(§ 2.1). Suppose we now instantiate that
class to a type (§ 4.1.1), by binding
Root to R:
ParameterExample<Root> pa = new ParameterExample<Root>();
This type says that the object bound to pa at runtime has its
field x in RPL Root:r (substituting Root for R
in R:r). For more information about how this subsitution works,
see § 4.3.
3.4 Partially Specified RPLs
In order for the nested structure of RPLs to be useful, the programmer
must be able to name a set of RPLs such as “all RPLs nested
under this one.” To support this kind of naming, DPJ has a special
element * that stands in for any sequence of RPL elements.
There is also a wildcard array index element [?], where the
? stands in for any natural number. This is useful for naming
sets of array regions.
3.4.1 The * RPL Element
An RPL may be written with * as one if its elements. The RPL
must otherwise follow the rules in §§ 3.2
and 3.3: for example, if a parameter appears in the
RPL, it must be first. It is permissible for an RPL to start with
*; in this case, there is an implicit Root before the
*. In particular, the RPL * is equivalent to Root:*
and refers to all regions.
The meaning of an RPL containing * is that it stands in for all
legal RPLs that could be constructed by substituting sequences of zero
or more elements for the *. For example, A:*:B stands in
for A:B, A:A:B, A:B:B, A:A:A:B, etc. However,
A:*:B does not stand in for A:Root:B, because that
is not a valid RPL (Root must appear first). See
§§ 3.6.2
and 3.6.3 for more information about the
nesting and inclusion rules for RPLs using *.
Here are some examples of valid RPLs constructed with the *
element, assuming that A and B are declared region names
in scope:
* // All RPLs
Root:* // Same as *
*:A // All RPLs under Root ending in A
A:* // All RPLs under Root:A
Root:A:* // Same as A:*
this:* // All RPLs under this
this:*:A // All RPLs under this ending in A
Although more complex RPLs are possible, typically in user code at
most one * appears in an RPL, as shown in the examples above.
Here are some examples of RPLs that are not valid, because they
do not conform to the rules stated in §§ 3.2
and 3.3 (R is a region parameter):
*:R // R must appear first
*:this // this must appear first
A:*:Root // Root must appear first
3.4.2 The [?] RPL Element
The wildcard array index element [?] may be used in an
RPL anywhere that an array index element [e] may appear
(§ 3.1.3). It stands in for any array index
element. For example:
[?] // Matches [e] for any [e]
Root:[?] // Same as [?]
[?]:A // Matches [e]:A for any e
this:[?] // Matches this:[e] for any e
3.5 Local RPLs
A local RPL is an RPL that contains a local region name
(§ 3.1.4) as an element. It may be
parameterized and/or partially specified. For example:
class LocalRPLs<region R> {
region ClassRegion;
void method() {
region LocalRegion;
// LocalRegion is a local RPL; it is equivalent
// to Root:LocalRegion
LocalRPLs<LocalRegion> x = null;
// R:LocalRegion:* is a local RPL
LocalRpls<ClassRegion:LocalRegion:*> y = null;
// ClassRegion is not a local RPL
LocalRpls<ClassRegion> z = null;
}
}
Local RPLs are important because they define local effects
(§ 5.3). A local effect is visible only in a
method and its callees, so doesn't have to be reported to the caller.
For example, in the code above, any effects on LocalRegion or
R:LocalRegion could be omitted from the effect summary of
method.
3.6 Comparing RPLs
To reason about effects, both the programmer and the compiler have to
make judgments like “this RPL is included in that one” or “this RPL
is disjoint from that one” (where “disjoint” means that the
corresponding sets of memory locations are nonintersecting). The
programmer has to do this in order to reason about which code sections
may be run in parallel safely, i.e., without interference. And the
compiler has to do it to check that the programmer got it right.
To support this kind of reasoning, there are four relations on RPLs:
-
Equivalence, meaning that two RPLs refer to the same set
of regions.
- Nesting, meaning that one RPL is a descendant of another
in the region tree. For example, A:B is nested under A.
- Inclusion, meaning that the set of memory locations
represented by one RPL includes the set of locations represented by
another. For example, A:B is included in A:*. Note
that A:B is not included in A. The * is
necessary to get the inclusion. This is to increase the precision
of effect specifications with RPLs.
- Disjointness, meaning that the set of memory locations
represented by one RPL has no common elements with the set of memory
locations represented by another RPL. For example, if A and
B are distinct names, then they represent disjoint regions.
Or R1 and R2 are disjoint if they are parameters
declared to be disjoint (§ 2.4.3).
3.6.1 Equivalence
Equivalent RPLs: The meaning of the equivalence relation on
RPLs is that the sets of regions represented by the RPLs are the same.
Two RPLs are equivalent if they have the same number of elements, and
the corresponding pairs of elements are equivalent, as defined below.
This test is done after adding the implicit Root to the front of
RPLs that begin with a declared name element, array index element,
*, or [?].
For example, the following pairs of RPLs are equivalent:
-
Root:A and A
- this:[i] and this:[i], if i is a final
local variable or method parameter
- R:[i+1] and R:[i+1], if i is a
final local variable or method parameter
The following pairs of RPLs are not equivalent:
-
Root:A and Root:A:B
- this:[i] and this:[i], if i is not a
final local variable or method parameter
- this:[i] and this:[j]
- R:[i+1] and R:[i+1], if i is not a final
local variable or method parameter
- R:[i+1] and R:[i+2]
Equivalent RPL elements: The following rules govern
equivalence of RPL elements.
Basic names other than array index elements: Two RPL elements
are equivalent if they refer to the same basic name
(§ 3.1), except for array index elements.
The elements * and [?]: Two RPL elements are
equivalent if they are both * or [?].
Array index elements: Two array index elements [e1]
and [e2] equivalent if e1 and e2 always evaluate to the
same value (i.e., are “always equal”).
Since the compiler can't always know when two expressions evaluate to
the same value, it applies the following conservative rules to
identify pairs of always-equal expressions:
-
e1 and e2 are always-equal if they refer to the same
compile-time integer constant.
- e1 and e2 are always-equal if they are the same local
variable or method parameter, and the variable or parameter is
declared final.
- e1 and e2 are always-equal if they represent the same
binary operation, and the corresponding subexpressions are
always-equal. For example, i+1 and i+1 are
always-equal, if i is a final local variable.
Otherwise, the compiler conservatively assumes that the expressions
are not always-equal.
The meaning of the nesting relation on RPLs is that one RPL is a
descendant of another in the region tree. Nesting is important
because, in conjunction with partially specified RPLs, it defines
inclusion of RPLs (§ 3.6.3). Four rules
govern nesting of RPLs.
Nesting under Root: Any RPL is nested under
Root.
Nesting by syntax: One RPL is nested under another if the
second is a prefix of the first. For example, A:B is nested
under A, R:A is nested under R, and this:[0]
is nested under this.
Nesting by inclusion: If one RPL is included in another, then
the first RPL is nested under the second. For example, A:B is
included in A:* (see § 3.6.3), so
A:B is nested under A:*.
Nesting by ownership: An RPL that starts with a variable
element (§ 3.1.5) is nested under the owner RPL of
the variable (§ 4.1.4). For example, in
the following code, the method parameter param has type
NestByOwnership<R>, so it is nested under R when used as
an RPL:
class NestByOwnership<region R> {
<region R>method(final NestByOwnership<R> param) {
// param is used as an RPL here; it is nested under R
NestByOwnership<param> localVar = new NestByOwnership<param>();
}
}
The meaning of the inclusion relation on RPLs is that the set of
regions represented by one RPL is included (in the sense of set
inclusion) in the set of regions represented by another RPL.
Inclusion of RPLs is important in checking type comparisons
(§§ 4.1.2, 4.2.4)
and subeffects (§ 5.6). The following rules
govern inclusion of RPLs.
Inclusion by equivalence: One RPL is included in another if
the RPLs are equivalent (§ 3.6.1).
Inclusion by trailing *: One RPL is included in another
if the second ends with *, and the first is nested under the
second after the trailing * is removed. For example,
A:B:C is included in A:*, because A:B:C is nested
under A (i.e., A:* with the * removed).
Inclusion by trailing elements other than *: One RPL is
included in another if the last element of the first is included in
the last element of the second, and inclusion holds for the RPLs after
stripping the last elements of both. One RPL element is included in
another if the elements are equivalent
(§ 3.6.1) or the first element is any array
index element or [?], and the second is [?]. For example:
-
A:B is included in A:*:B, because B is
included in B, and A is included in A:*
- A:[0] is included in A:*:[?], because [0] is
included in [?], and A is included in A:*.
- A:*:B is not included in A:B, because A:* is
not included in A.
- A:[?] is not included in A:[0], because [?]
is not included in [0].
Inclusion by constrant: Two RPLs may be constrained to be
included, one in the other. In the current language specification,
the only way this can happen is when capturing a type
(§ 4.3). There is no way for the programmer to
directly write an inclusion constraint on RPLs.
3.6.4 Disjointness
The meaning of the disjointness relation on RPLs is that the sets of
regions represented by the RPLs have no common elements. This
relation is important in checking noninterference
(§ 5.7).
Disjoint RPLs: The following rules govern disjointness of
RPLs.
Distinctions from the left: Two RPLs are disjoint if they
start with a sequence of one or more equivalent elements, and then
their elements are disjoint, where disjoint elements are defined
below. An application of this rule is called a “distinction from the
left,” because it uses a difference in the RPLs starting on the left
to infer disjointness. This rule works because RPLs form a tree, so
two RPLs that start the same then diverge must be on different
branches of the tree.
The following are examples of distinctions from the left, assuming
A and B are class region names
(§ 3.1.2):
-
A and B are disjoint, because they are short for
Root:A and Root:B, so they start with the same element
Root and then diverge. A:* and B:* are also
disjoint, for the same reason.
- If R is a region parameter, then R:A and R:B
are disjoint, as are R:A:* and R:B:*.
- this:A and this:B are disjoint, as are
this:A:* and this:B:*.
Note that there must be at least one equivalent element for a
distinction to from the left. For example, region parameters R1
and R2 are not disjoint unless they are constrained to be
disjoint (see below).
Distinctions from the right: Two RPLs are disjoint if they
end with disjoint elements, where disjoint elements are defined below.
An application of this rule is called a “distinction from the
right,” because it uses a difference in the RPLs starting on the
right to infer disjointness. This rule works because two RPLs that
end in disjoint elements can never refer to the same regions, even
after substituting for parameters, *, and [?].
The following are examples of distinctions from the right, assuming
A and B are class region names
(§ 3.1.2):
-
Root:*:A and Root:*:B are disjoint RPLs.
- R:*:A and R:*:B are disjoint RPLs.
- this:*:A and this:*:B are disjoint RPLs.
Disjointness by constraint: Two RPLs are disjoint if they are
each included in another RPL (3.6.3) and
the including RPLs are constrained to be disjoint
(2.4.3). For example, in the following
code, R1:r and R2:r are disjoint RPLs, because R1:r
is included in R1:*, R2:r is included in R2:*, and
R1:* and R2:* are constrained to be disjoint.
class DisjointConstraints<region R1, R2 | R1:* # R2:*> {
region r;
void method() {
DisjointConstraints<R1:r> dc1 =
new DisjointConstraints<R1:r>();
DisjointConstraints<R2:r> dc2 =
new DisjointConstraints<R2:r>();
}
}
Note that R1:r and R2:r would not be disjoint if
the constraint were the weaker R1 # R2; the * is
required. This is because nesting does not imply inclusion. For
example, setting R1=A and R2=A:B satisfies
R1 # R2. But then R1:B=A:B is not disjoint from
R2, after substituting A for R1 and A:B for
R2. However, R1=A and R2=A:B does
not satisfy R1:* # R2:*, because A:B is included in
A:*.
Disjoint RPL elements: Two RPL elements are disjoint if one
is a name and the other is an array index (including [?]); or
they are different names; or they are the array index elements
[e1] and [e2], where e1 and e2 always evaluate
to different values at runtime (i.e., are “never-equal
expressions”).
Since the compiler can't always know when two expressions evaluate to
the same value, it applies the following conservative rules to
identify pairs of never-equal expressions:
-
e1 and e2 are never-equal if they refer to different
compile-time constants. For example, 0 and 1 are never-equal.
- e1 and e2 are never-equal if they represent the same
binary operation, one of the corresponding subexpressions is
always-equal (§ 3.6.1), and the other is
never-equal. For example, if i is a final local
variable, then i+1 and i+2 are never-equal.
Otherwise, the compiler conservatively assumes that the expressions
are not never-equal.
4 Types
DPJ extends the Java type system by adding RPLs (§ 3) to
class and array types. The RPLs in the types support the effect
system: different class and array objects can be created with
different RPLs, and the compiler can use the RPLs to infer the effect
of an operation on the object (i.e., accessing a class field or array
cell, or invoking a method).
This section discusses DPJ's extensions to the Java type system:
§ 4.1 discusses class and interface types,
§ 4.2 discusses array types, and
§ 4.3 discusses the rules for determining the type of
an expression (e.g., what type is returned by calling a method).
4.1 Class and Interface Types
This section discusses DPJ's class and interface types. We will use
the term “class” throughout this section, with the understanding
that the concepts apply identically to interfaces, unless otherwise
noted. § 4.1.1 discusses the instantiation
of a class to a type by supplying RPLs for its parameters.
§ 4.1.2 explains how the DPJ compiler
compares two class types for compatibility, for example in checking an
assignment statement. § 4.1.3 discusses casting
one type to another in DPJ. § 4.1.4
explains owner RPLs, which determine the nesting relation for
variable RPLs (§ 3.1.5).
4.1.1 Writing Class Types
In DPJ, as in generic Java, a class with parameters defines a family
of types, one for each set of arguments to the parameters. However,
in addition to type parameters, DPJ classes can have region parameters
(§ 2.4.1). When writing a DPJ class type,
parameter arguments must be supplied. They can be supplied implicitly
or explicitly.
Implicit parameter arguments: As in generic Java, a DPJ class
with parameters may always be used as a type by just naming the class,
without providing any explicit parameter arguments. For example, the
following code is valid:
class ImplicitArgs<type T, region R> {
// ImplicitArgs is a valid type here
ImplicitArgs field = null;
}
When the type arguments are omitted for a class with parameters, the
following occurs:
-
The type parameters, if any, have no argument. That is, the
type functions as a raw type in generic Java.
- The RPL parameters, if any, become bound to Root.
In the example written above, field is of class type
ImplicitArgs, with Root bound to R and no
argument for T.
Implicit parameter arguments are useful in two ways. First they help
with porting legacy Java code to DPJ. For example, adding a region
parameter to a preexisting class will not break any code that uses the
class. Second, because region arguments are used to compute the
effects on fields of a class (§ 5.4), they are
usually not important for code that is never invoked in a parallel
context (for example, in a sequential initialization phase). For this
code, the programmer can avoid writing the arguments.
Explicit parameter arguments: As in Java, if explicit
arguments to class parameters are given, then they appear in angle
brackets after the class name. The arguments must appear in the same
order in which the parameters appear in the class definition; so, in
particular, any type arguments must precede any RPL arguments. Any of
the type arguments may be preceded by the keyword type, and any
of the RPL arguments may be preceded by the keyword region; but
the keywords are optional, as the compiler can infer which are the
types and which are the RPLs from the class definition.
For example:
class ExplicitArgs<type T, region R> {
// Instantiate an ExplicitArgs type with T=Object, R=Root
ExplicitArgs<Object,Root> field;
}
This code defines a class ExplicitArgs with one type parameter
T and one RPL parameter R. The field field has
class type ExplicitArgs with T=Object and
R=Root.
The number of type arguments must exactly match the number of type
parameters. However, fewer RPL arguments can be given than the number
of RPL parameters; any missing RPL arguments (matching the arguments
that are there from the left) are implicitly bound to Root. For
example, in the code above, we could have written the type of field
field as ExplicitArgs<Object>. Here is another example:
class ImplicitRPLArg<region R1, R2> {
region r;
ImplicitRPLArg<r> field;
}
The class ImplicitRPLArg has two parameters R1 and
R2, but the type ImplicitRPLArg<r> of field has only
one explicit RPL argument, r. r is bound to the first
parameter R1, and the second parameter has no argument, so it is
bound to Root. This type is the same as
ImplicitRPLArg<r,Root>.
4.1.2 Class Type Comparisons
The notion of type comparison is important in Java. For
example, if class B extends class A, and variable x
has type A, then you can assign an object of type B to
x; but it is not permissible to assign an object of some type
C that does not extend A (or extend another class that
extends A, etc.).
DPJ extends Java's type comparison rules to account for the RPL
information in the class types. The rules ensure that the actual type
of an object always corresponds to the type that appears in a program
variable storing a reference to the object. This property in turn
allows the compiler to reason soundly about effects by looking at the
types of variables. If the “wrong” type of object could be assigned
to a variable, then this kind of reasoning would not work.
To state the comparison rules for DPJ, we introduce the concept of the
DPJ erasure of a type. This is the type obtained by ignoring
all RPL parameters and arguments. For example, suppose class C
has a type parameter T and a region parameter R. Then the
DPJ erasure of C<Object,Root> is C<Object>.
The following rules determine whether a variable of class type T1
can be assigned to a variable of class type T2 in DPJ.
Types that instantiate the same class: T1 may be assigned
to T2 if the two types instantiate the same class, the DPJ erasure
of T1 may be assigned to the DPJ erasure of T2 in ordinary Java,
and the RPL arguments of T1 are included
(§ 3.6.3) in the corresponding arguments
of T2.
Here are some examples of compatible types that instantiate the same
class:
-
C<Root> may be assigned to C<Root:*>, because the
DPJ erasures are identical, and Root:* includes Root.
- C<Object,Root> may be assigned to C<Object,Root:*>
for the same reason.
- C<C<Object,Root>,Root> may be assigned to C<? extends
C<Object,Root:*>,Root:*>. The reasoning here is a bit more
complicated. First, look at the DPJ erasures of the types, that is
C<C<Object,Root>> and C<? extends C<Object,Root:*>>.
According to the rules for Java generic wildcards, the first type
may be assigned to the second if C<Object,Root> may be
assigned to C<Object,Root:*>. To figure that out, we have to
apply the rule recursively. First look at the DPJ erasures:
assigning C<Object> to C<Object> is OK. Now look at the
RPLs: Root:* includes Root. So that test checks out.
Now look at the RPL arguments in the original types: Root:*
includes Root. So that test checks out as well.
Here are some examples of incompatible types that instantiate the same
class, assuming r1 and r2 are region names:
-
C<r1> may not be assigned to C<r2>, because r1
is not included in r2.
- C<Object,r1> may not be assigned to C<Object,r2> for
the same reason.
- C<C<Object,r1>,Root> may not be assigned to C<?
extends C<Object,r2>,Root because C<Object,r1> may not be
assigned to C<Object,r2>.
Types that instantiate different classes: T1 may be
assigned to T2 if T2 is a direct supertype or
indirect supertype of T1.
Direct supertypes: T2 is a direct supertype of T1 if a
type with T2's class or interface appears in an extends or
implements clause of T1's class or interface, and that type
is assignable to T2 after substituting arguments for parameters
according to T1.
For example:
-
Assume class declarations class B<region R> extends A<R>
and class A<region R>. Then A<Root> is a direct
supertype of B<Root>, because A<R> appears in the
extends clause of B, and the type obtained by
substituting Root for R from the type B<Root> is
A<Root>.
- With the same assumptions as in (1), A<Root:*> is a direct
supertype of B<Root>, because A<Root> is assignable to
A<Root:*>.
- Assume class declaration class B<type T, region R> extends
A<B<Root>,R>. Then A<B<Root>,Root:*> is a direct supertype
of B<Object,Root>.
Indirect supertypes: T2 is an indirect supertype of T1
if there is a chain of direct supertypes connecting T1 to T2.
For example, if we have class declarations class C<region R>
extends B<R>, class B<region R> extends A<R> and class
A<region R>, then A<Root> is an indirect supertype of
C<Root>, because A<Root> is a direct supertype of
B<Root>, and B<Root> is a direct supertype of
C<Root>.
4.1.3 Class Type Casts
One DPJ type may be cast to another if the cast would be allowed for
the DPJ erasures of the types (§ 4.1.2) in
ordinary Java. For example, assuming a class declared as class
C<region R> the following cast is legal:
C<r1> x = (C<r1>) new C<r2>();
This code creates an object of type C<r2> on the right-hand
side, then casts it to type C<r1> before assigning it to a
variable x of type C<r1>. The DPJ erasure of both the
new expression and the target type is C, so the cast is
allowed. Without the cast, C<r2> is not assignable to
C<r1>, because r2 does not include r1.
However, the following cast is not legal, assuming a class declared as
class C<type T, region R>:
// Compile error!
C<C<Object,r1>,r1 x =
(C<C<Object,r1>,r1>) new C<C<Object,r2>,r2>();
That is because the DPJ erasure of the type of the new
expression is C<Object,r2>; the DPJ erasure of the target type
is C<Object,r1>; and these two types are not compatible.
As in ordinary Java, casts allow assignments that the compiler cannot
check for correctness. That is, in general for a statement of the
form
T x = (T) y
there is no way to ensure at compile time that the type of the object
y is really consistent with the type T of variable
x. Further, for efficiency reasons, DPJ has no checks for
catching bad assignments at runtime. Therefore, a DPJ program with
casts in it is a potentially nondeterministic program! Casts should
be used very carefully and only as a last resort, when there is no
other way to express the program.
4.1.4 Owner RPLs
Owner RPLs define the nesting relationship between final local
variables used as RPLs (§ 3.1.5) and other RPLs.
A variable RPL is nested under the owner RPL of its class type
(§ 3.6.2).
If a class has RPL parameters, then the owner RPL of a type
instantiating the class is the argument to the first parameter. For
example, assuming a class declaration class C<region R1,R2>, and
region names r1 and r2, the owner RPL of the type
C<r1,r2> is r1.
If a class does not have RPL parameters, then the owner RPL of a type
instantiating the class is Root. For example, assuming a class
declaration class C<type T>, the owner RPL of the type
C<Object> is Root.
4.2 Array Types
This section discusses DPJ's array types.
§ 4.2.1 discusses the instantiation of an
array to a type by supplying an RPL for its parameter. This assigns
an RPL to each cell of the array. § 4.2.2
discusses DPJ's index-parameterized array type, which provides
a way to assign a different region to each cell of an array. This is
useful for updating different array cells (or different objects stored
in different array cells) in parallel. § 4.2.3
explains the creation of new array objects.
§ 4.2.4 explains how the DPJ compiler
compares two array types for compatibility, for example in checking an
assignment statement. § 4.2.5 discusses casting
one array type to another in DPJ.
Because Java arrays do not support slices (e.g., extracting a
contiguous subsection of an array), DPJ provides a generic class
called DPJArray with an interface similar to Java's
ArrayList that supports slices. There are also versions of
DPJArray specialized to the various primitive types.
§ 7.1 discusses these classes.
4.2.1 Writing Array Types
Every DPJ array type has exactly one RPL parameter. It works
similarly to a class RPL parameter. When you write an array type, you
must either specify an explicit RPL as an argument to the parameter,
or use the default RPL. The explicit argument is given in angle
brackets, after the square brackets. If you leave out the explicit
argument, the default is Root. In either case, the RPL argument
specifies the RPL associated with the cells of the array.
For example:
// Array of int with cells in Root
int[]<Root>
// Same as int[]<Root>
int[]
// Array of int with cells in R
int[]<R>
As in ordinary Java, you can construct arrays whose elements are
themselves arrays. In this case the RPL argument to the array type is
the argument (either implicit or explicit) associated with the
leftmost pair of brackets. The type of an element is given by
deleting the leftmost set of brackets and its RPL argument, if any.
For example:
// Array with cells in R1; each element is an int[]<R2>
int[]<R1>[]<R2>
// Array with cells in Root; each element is an int[]<R>
// Same as int[]<Root>[]<R>
int[][]<R>
// Array with cells in R; each element is an int[]<Root>
// Same as int[]<R>[]<Root>
int[]<R>[]
This process may be continued to arbitrary depth. For example,
int[]<R1>[]<R2>[]<R3> is an array with cells in R1 whose
element type is int[]<R2>[]<R3>. The element type in turn is an
array with cells in R2 whose element type is int[]<R3>.
Class and method RPL parameters are frequently given as RPL arguments
to array types. For example, the following class declares a field
arr whose type is an array. Both the field arr and the
cells of the array are in the RPL given by the class region parameter
R:
class ArrayClassParam<region R> {
int[]<R> arr in R;
}
Notice that without the capability to bind R as the RPL argument
to the array type, there would be no way to get the cells of arr
in region R. The specifier in R is insufficient, because
it only specifies the region of the arr reference itself, not
the region of the cells of the array that the reference points to.
Method region parameters can also be used to write a method that takes
an array with any region:
abstract class ArrayMethodParam {
// This method will accept an int[] with any RPL argument
<region R>void method(int[]<R> arr);
}
Non-parametric RPLs are generally bound to array parameters when the
arrays are global data. For example, one might declare an array of
data that is initialized once, in a sequential phase, and then never
changes during the computation. One could do that by declaring a
region ReadOnlyData and then putting the array in that region:
region ReadOnlyData;
final int[]<ReadOnlyData> data = new int[N]<ReadOnlyData>;
On the right-hand side of the assignment, we have created a new array
of type int[]<ReadOnlyData> with N elements (see
§ 4.2.3).
Inside the parallel computation, with the proper use of disjointness
constraints on parameters and disjoint RPLs, read effects on
ReadOnlyData are noninterfering, and write effects on other
regions are noninterfering because the regions are disjoint. See
The Deterministic Parallel Java Tutorial for more details.
4.2.2 Index-Parameterized Array Types
For parallel computations on arrays, it is often necessary to update
different parts of an array in parallel in different tasks. A DPJ
feature called an index-parameterized array allows this to be
done safely. We discuss index-parameterized arrays first for the
simpler case of arrays of primitive and class types, and then for the
case of arrays of arrays.
Arrays of primitive and class types: There are two cases to
consider:
-
Updating different array cells in different tasks. For example,
there could be an array of 10,000 cells, with each of 100 tasks
updating 100 cells. Because the array cells are distinct, there is
no inteference.
- Updating disjoint objects pointed to through different cells in
different tasks. For example, there could be an array of 10,000
cells, each of which stores a reference to a unique object. Each of
100 tasks in parallel could read 100 of the cells, follow the
references, and update the objects. Because the objects are
distinct, there is no interference.
DPJ's index-parameterized arrays can express both cases.
Updating different array cells: To update different array
cells, it is necessary to put each cell in a different region. DPJ
does this with an index-parameterized array type. An
index-parameterized array type uses the array index element [i]
(§ 3.1.3) in the RPL associated with cell
i. Because the index value of i is different for every
array cell, every cell has its own region.
To write an index-parameterized array type, you can use the special
array index element [_] in the RPL argument of the array type.
This special element is allowed only inside an array type, where it
stands in for the array index element [i] associated with each
cell. For example:
// Array of int such that cell i is in region [i]
int[]<[_]>
// Array of int such that cell i is in region R:[i]
int[]<R:[_]>
Notice that in the second example, the RPL R:[i] gives us two
ways to distinguish array components: we can distinguish whole arrays
from each other by using different bindings to R; and we can
distinguish different parts of arrays from each other by using the
different values of i.
Updating different objects through different array cells:
Index-parameterized arrays can also handle the case of updates through
references to different objects, by using [_] in the RPL
argument of the element type of the array. Every object stored in a
distinct array cell gets its own region, parameterized by the index of
the cell. For example, suppose we have the following simple class
definition:
class Data<region R> {
int field in R;
}
Then we can create the following index-parameterized array:
Data<[_]>[]<[_]> arr = new Data<[_]>[N]<[_]>;
The type of arr is an array such that (1) cell i of the
array is in region [i]; and (2) cell i of the array has
type Data<[i]>. Now we can do disjoint initialization of the
array, as before:
foreach (int i in 0, N)
arr[i] = new Data<[i]>();
Notice that the type Data<[i]> on the left-hand side of the
assignment matches the type of arr[i], as it must; see
§ 4.2.4.
We can also update the objects disjointly through the references in
the array:
foreach (int i in 0, N)
// Effect is 'writes [i]'
++arr[i].field;
This is a very useful pattern in shared-memory parallel programming.
See The Deterministic Parallel Java Tutorial for more realistic examples.
Arrays of arrays: An array of arrays is an array of objects
(the arrays are objects), so it works similarly to an array of class
objects, discussed above. The difference is that with an array of
arrays, there are multiple index dimensions, so it may not be
sufficient to use the single name [_] for the index region of
an array. Instead, DPJ allows you to use an explicitly named variable
for each index dimension.
To do this, you write #var after the RPL argument of an
array type, where var is an identifier declaring a fresh
variable in scope over the type. That variable can then be used in an
array index element to indicate the index associated with each array
cell. For example, rewriting the types above using the explicit index
variable name i:
// Same as int[]<[_]>
int[]<[i]>#i
// Same as int[]<R:[_]>
int[]<R:[i]>#i
The variable name doesn't matter; we could have written j or
k or anything else. The only point of the name is to associate
the variable declaration with an array dimension, so the compiler
knows what dimension is referred to from the name. Notice that if we
use #_ for the variable declaration, then the index is
[_], as in the default case. In fact this is just how the
compiler implements the default: it declares #_ for you as the
array index variable if you don't specify one yourself.
In the array-of-array case, each #var declaration is in
scope for the array where it is declared, and all arrays to the right
of that one. For example, the type
int[]<[i]>#i[]<[i]:[j]>#j
defines an array of arrays. The outermost array has cell i in
region i, and the type of cell i is an array of int
with cell j in region [i]:[j]. If we reversed the i
and j declarations and wrote int[]<[i]>#j[]<[i]:[j]>#i,
then we would get a compile error, because i is used but not in
scope in the leftmost RPL argument.
This kind of decomposition is useful for dividing a rectangular grid
into rows and columns for disjoint parallel updates. For example, an
initialization of the array might look like this:
int[]<[i]>#i[]<[i]:[j]>#j A = new int[N]<[i]>#i[M]<[i]:[j]>#j;
foreach (int i in 0, N) {
// Effect on [i]:[?] is disjoint on different i iterations
foreach (int j in 0, M) {
// Effect on [i]:[j] is disjoint on different j iterations
A[i][j] = INITIAL_VALUE;
}
}
For the details of how the compiler would check this example, see
§ 5.4 (computing effects of statements) and
§ 5.7 (noninterfering effects).
4.2.3 New Arrays
Creation of new arrays in DPJ works just as in Java, with the extra
annotations in the array types
(§§ 4.2.1, 4.2.2).
Arrays of primitive and class types: To create an array of
n primitive types or class objects, write a DPJ array type with n
in the brackets, instead of empty brackets. For example:
// Create an array of 10 int with cells in Root
int[]<Root> A1 = new int[10]<Root>;
// Same as new int[10]<Root>
int[] A2 = new int[10];
// Create an array of 10 int with cells in R
int[]<R> A3 = new int[]<R>;
// Create an array of 10 int such that cell i is in region [i]
int[]<[_]> A4 = new int[10]<[_]>;
// Create an array of 10 int such that cell i is in region R:[i]
int[]<R:[_]> A5 = new int[10]<[_]>;
// Create an array of 10 Data objects such that (1) cell i
// is in region [i]; and (2) cell i has type Data<[i]>
Data<[_]>[]<[_]> A6 = new Data<[_]>[10]<[_]>;
Unfortunately, Java does not let you create arrays of class types that
have generic parameters without an explicit cast. For example, the
following code is illegal:
class Data<type T, region R> {
T field in R;
}
Data<Integer,[_]>[]<[_]> A;
// The following line causes a 'generic array creation' compile error
A = new Data<Integer,[_]>[10]<[_]>;
To get around this limitation, you need to use a cast:
// Rewrite of the last line above so it compiles
A = (Data<Integer,[_]>[]<[_]>) new Data[10];
This code is ugly, but it works. Also, the ugliness is localized to
the point of array creation. Once the array is created and assigned,
everything else works as it should.
An alternative to this ugly code would be to use a DPJArray
(§ 7.1) instead of a Java array. As of the
current version of DPJ, the DPJArray class cannot be used with
index-parameterized regions. It is anticipated that future versions
of DPJ will support index-parameterized DPJArrays.
Arrays of arrays: To create an array of arrays, as in
ordinary Java, at least one of the brackets must have a length
expression, and any empty brackets must appear to the right:
// Array of 10 cells in R1; each element has type int[]<R2> and
// is initialized to null
int[]<R1>[]<R2> A1 = new int[10]<R1>[]<R2>;
// Same as before, but each element of the array is initialized
// to an array of int[]<R2> with 10 elements
int[]<R1>[]<R2> A2 = new int[10]<R1>[10]<R2>;
// ILLEGAL: Empty brackets must appear to the right
int[]<R1>[]<R2> A2 = new int[]<R1>[10]<R2>;
// Array of 10 cells such that cell i is in region i; cell i has
// type int[]<[i]:[j]>#j and is initialized to null
int[]<[i]>#i[]<[i]:[j]>#j = new int[10]<[i]>#i[]<[i]:[j]>#j;
// Same as before, but each cell is initialized to an array
// with 10 elements
int[]<[i]>#i[]<[i]:[j]>#j = new int[10]<[i]>#i[10]<[i]:[j]>#j;
4.2.4 Array Type Comparisons
The following rules determine when T1 may be assigned to T2,
where T1 and T2 are both array types.
Comparing arrays of primitive types: T1 may be assigned to
T2 if the element types of T1 and T2 are both the same
primitive type, and the RPL argument of T1 is included in the RPL
argument of T2 (§ 3.6.3). For example:
-
int[]<Root> may be a assigned to int[]<Root>.
- int[] may be assigned to int[]; this is the same as
case 1.
- int[]<R> may be assigned to int[]<R:*>, because
R is included in R:*.
- int[]<r1> may not be assigned to int[]<r2>, because
r1 and r2 are incompatible region names.
Comparing arrays of class and array types: T1 may be
assigned to T2 if the element types of T1 and T2 are
array-compatible class or array types, and the RPL argument of
T1 is included in the RPL argument of T2. Two class types are
array-compatible if the element type of T1 can be assigned to the
element type of T2 using the rules in
§§ 4.1.2, 4.2.4, but
requiring equivalence (§ 3.6.1) instead of
inclusion of RPLs. For example:
-
C<R>[]<R> may be a assigned to C<R>[]<R:*>, because
C<R> is array compatible with itself, and R is included
in R:*.
- int[]<R>[]<R> may be assigned to int[]<R>[]<R:*>,
because int[]<R> is array compatible with itself, and R
is included in R.
- C<R>[]<R> may not be assigned to C<R:*>[]<R:*>,
because C<R> and C<R:*> are not array-compatible.
- int[]<R>[]<R> may not be assigned to
int[]<R:*>[]<R:*>, because int[]<R> and int[]<R:*>
are not array compatible.
The extra requirement of array-compatibility is introduced to avoid
problems like this:
class C<region R> { ... }
region r1, r2;
// Create an array of 10 C<r1>
C<r1>[] arr = new C<r1>[10];
// Not really allowed, but would be without array compatibility
C<*>[] = badArr;
// Inconsistent types! Assigning C<r2> to a cell of arr
badArr[0] = new C<r2>();
4.2.5 Array Type Casts
Casts of array types have the same rules as casts of class types
(§ 4.1.3). The cast is allowed if it would be
allowed for the DPJ erasures of the types in ordinary Java. The DPJ
erasure of an array type is computed by erasing all the DPJ region
information from the entire type. For example:
-
The DPJ erasure of int[]<R> is int[].
- The DPJ erasure of int[]<r1>[]<r2> is int[][].
- The DPJ erasure of int[]<[i]>#i[]<[i][j]>#j is
int[][].
- The DPJ erasure of C<[_]>[]<[_]> is C[].
4.3 Typing Expressions
In Java, every expression has a type. For example, if a method
returns int, then an expression invoking that method has type
int. (Some statements have types too, but those statements
always enclose expressions.) The types allow the compiler to enforce
consistency of assignments; for example, to make sure that an
int is never assigned to a variable of type String.
In DPJ, every expression has a type and an effect.
§ 5.4 discusses the effects of statements and
expressions. Here we discuss DPJ's extensions to Java's rules for
determinining the type of an expression.
4.3.1 Field Access
A field access expression in Java has the general form
- selector-exp.field-name
where selector-exp is a selector expression of class type C,
and field-name is the name of a field of class C (or a
superclass of C). If a bare field name appears, it is equivalent to
this.field-name. Here we give the rules for the general
form.
The compiler carries out the following steps to determine the type of
an expression selector-exp.field-name:
-
Determine the type selector-type of selector-exp, using the rules in
this section together with the ordinary rules for Java types.
- Look up the type field-type of field based on the
class C named in selector-type.
- Compute the capture (§ 4.3.4) of
selector-type to generate the type
captured-selector-type.
- Make the following subsitutions in field-type to generate
the answer:
-
Substitute the type and RPL arguments of
captured-selector-type for the corresponding type and RPL
parameters of class C.
- If selector-exp is a final local variable or
this, then substitute the variable for this. Otherwise,
substitute a capture parameter (§ 4.3.4)
included in owner-rpl:*, where owner-rpl is the
owner RPL of the variable (§ 4.1.4).
Here is some example code for which the capture in step 3 is a no-op,
and the capture in step 4(b) isn't needed.
§ 4.3.4 gives examples involving capture.
1 class FieldTypingExample<region R> {
2 region r;
3 FieldTypingExample<R> field1 in R;
4 void FieldTypingExample<r> method1() {
5 return (new FieldTypingExample<r>).field1;
6 }
7 FieldTypingExample<this> field2 in this;
8 void FieldTypingExample<arg>
9 method2(final FieldTypingExample<R> arg) {
10 return arg.field;
11 }
12 }
Let's see how to compute the type of the expression in the
return statement in line 5. Step 1: The selector is a new
expression of type FieldTypingExample<r>. Step 2: The field
name is field1, the class C is FieldTypingExample, and
the declared type of the field is FieldTypingExample<R> (line
3). Step 3: Nothing to do (see § 4.3.4). Step
4: Substituting r given in the selector type expression for
R in the field type, we have that the answer is
FieldTypingExample<r>.
Now let's see how to compute the type of the expression in the
return statement in line 10. Step 1: The selector is arg,
which has type FieldTypingExample<R>. Step 2: Now the field is
field2, and its type is FieldTypingExample<this> (line 7).
Step 3: Again, nothing to do. Step 4: arg is a final
local variable (line 9), so we can substitute it for this in the
type of field2. Therefore the answer is
FieldTypingExample<arg>.
4.3.2 Array Access
An array access expression has the form
array-exp[index-exp], where array-exp
is an expression of array type, and index-exp is an expression
of integer type. To compute the type of this kind of expression, the
compiler does the following:
-
Compute the element type of the array type by deleting the
leftmost pair of brackets, RPL argument if any, and index variable
declaration, if any. For example, the element type of
Data<[_]>[]<[_]> is Data<[_]>; the element type of
int[]<R1>[]<R2> is int[]<R2>; and the element type of
int[]<[i]>#i[]<[i]:[j]>#j is int[]<[i]:[j]>#j.
- Substitute index-exp for the leftmost index variable in
the element type to generate the answer. For example, if A
has type Data<[_]>[]<[_]>, then expression A[0] has
type Data<[0]>. If A has type int[]<R1>[]<R2>,
then expression A[i] has type int[]<R2>. (Notice there
is no substitution, because the index variable doesn't appear in the
element type.) If A has type
int[]<[i]>#i[]<[i]:[j]>#j, then expression A[0] has
type int[]<[0]:j>#j.
4.3.3 Method Invocation
Explicit type and RPL arguments: To compute the type of a
method invocation expression
- selector-exp.<type-args,rpl-args>method-name(args)
The compiler carries out the following steps:
-
Determine the type selector-type of selector-type
and the types arg-types of args, using the rules in
this section together with the ordinary rules for Java types.
- Look up the method in the ordinary Java way, using the method
name, selector-type, and arg-types. Use the method to
find formal value parameter types and return type.
- Compute the capture (§ 4.3.4) of
selector-type to generate the type
captured-selector-type.
- Make the following substitutions in each of the formal argument
types then check that the actual argument types arg-types are
assignable (§§ 4.1.2
and 4.2.4) to the formal argument types:
-
Substitute the type and RPL arguments of
captured-selector-type for the type and RPL arguments of its
class.
- Substitute type-args and rpl-args for the type and
RPL parameters of the method.
- For every argument in args of a type assignable to
int, substitute the actual argument expression for the
corresponding formal parameter of the method.
- If selector-exp is a final local variable or
this, then substitute the variable for this. Otherwise
substitute a capture parameter (§ 4.3.4)
included in owner-rpl:*, where owner-rpl is the
owner RPL of the variable (§ 4.1.4).
- Make the same substitutions as in step 4 in the return type to
compute the answer.
Here is some example code for which the capture in step 3 is a no-op,
and the capture in step 4(d) isn't needed.
§ 4.3.4 gives examples involving capture.
1 abstract class MethodInvocationExample<region R1, R2> {
2 abstract <region R3>
3 MethodInvocationExample<R3, [i]>callee(int i);
4 MethodInvocationExample<Root, [0]>caller() {
5 return this.<Root>callee(0);
6 }
7 }
Lines 1–2 define a method callee with one RPL parameter
R3 and one formal parameter int i. It returns type
- MethodInvocationExample<R3,[i]>.
Notice that the return type is written in terms of the method
parameter (both the RPL and value parameter); these parameters have to
be substituted away to generate a meaningful type at the call site.
At the call site (line 5), the compiler does the following. Step 1:
The selector type is
- MethodRegionInvocationExample<R1,R2>,
and the argument type is int. Step 2: The invoked method is
callee, defined in lines 1–2. Step 3: Nothing to do (see
§ 4.3.4). Step 4: Binding 0 to int is
OK. Step 5: The return type is
- <MethodInvocationExample<R3,[i]>.
The method arguments are R3=Root and i=0.
Substituting arguments for parameters gives a return type of
- MethodInvocationExample<Root,[0]>.
Inferred type and RPL arguments: If there are no explicit
type or RPL arguments, the compiler infers them as discussed in
§ 2.4.2 and proceeds as stated above, using
the inferred arguments in step 4(b). As in ordinary Java, if there is
no selector-exp, then the implied selector is this.
4.3.4 Captured Types
The capture of a generic type is important in generic Java; it
prevents bad assignments through wildcard types. DPJ uses the same
concept in connection with partially-specified RPLs, which are a kind
of wildcard, as they can stand in for several different regions. To
motivate the problem, consider this example:
1 class CaptureExample<region R> {
2 CaptureExample<R> field;
3 void method() {
4 region r1, r2;
5 // Create a new CaptureExample<R:r1>
6 CaptureExample<R:r1> ce = new CaptureExample<R:r1>();
7 // Assign it to CaptureExample<R:*>; this is allowed
8 CaptureExample<R:*> ceStar = ce;
9 // This should not be allowed!
10 ceStar.field = new CaptureExample<R:r2>();
11 }
12 }
The assignment in line 10 is a problem: the actual type of the
object stored in ceStar is CaptureExample<R:r1>. That's
because the assignment in line 8 didn't change the type of the object;
it just assigned the same object to a different reference with a
partially specified type. So the actual type of the field field
of that object is CaptureExample<R:r1>. That means it can't
hold a CaptureExample<R:r2>; that would violate the consistency
of typing at runtime. However, the type of the selector ceStar
is CaptureExample<R:*>. If we computed the type of
ceStar.field by just substituting the actual for formal
arguments in the selector type, we would get CaptureExample<R:*>
for the type of ceStar.field, and the bad assignment would be
allowed.
So we don't do that. Instead, we introduce a new parameter, called a
capture parameter, in the type of the selector. It stands in
for the unknown actual region of an object stored in a variable with
partially specified type. The introduction of the new parameter is
called capturing a type. It occurs in steps 3 and 4(b) of
typing field access (§ 4.3.1) and steps 3 and
4(d) of typing method invocation (§ 4.3.3).
In this example, the captured type of the selector is
CaptureExample<P>, where P is the capture parameter.
P is constrained to be included in R:*, because that is
all we know about the actual region from the type
CaptureExample<R:*>. This is the only way parameters can be
inclusion-constrained in DPJ (§ 3.6.3).
In the example above, now the bad assignment is disallowed, because
the type CaptureExample<R:r2> is not assignable to the type
CaptureExample<P>, where P is the capture parameter. Nor
is the type CaptureExample<R:*> assignable to
CaptureExample<P>. However, CaptureExample<P> is
assignable to CaptureExample<R:*>, because of the inclusion
constraint.
More generally, this is how the compiler computes the capture of a type
in DPJ:
-
Take the normal Java capture of the type, substituting for any
wildcard generic type arguments but keeping the same RPL
arguments, if any.
- For each RPL that is partially specified (i.e., contains *
or [?]), replace that RPL with a capture parameter constrained
to be included in the RPL.
Notice that if there are no generic wildcards and no partially
specified RPL arguments, then the capture operation does nothing to
the type.
Capturing types is mostly an internal compiler mechanism. Programmers
never have to do it, and they shouldn't even have to worry about it,
except to deal with capture errors when they occur. For example, the
above code, if compiled, would generate a type error like “expected
type CaptureExample<capture of R:*>, found type
CaptureExample<R:r2>.” That error will probably be mysterious
unless you understand how type capture works.
Here is another common way that capture errors can occur, involving
method invocations:
1 abstract class CaptureMethodExample<region R> {
2 abstract void callee(CaptureMethodExample<R> x);
3 void caller(CaptureMethodExample<*> y) {
4 // Compile error!
5 y.callee(y);
6 }
7 }
Line 4 causes an error, because it is attempting to assign y,
which has type
- CaptureMethodExample<*>
to type
- CaptureMethodExample<capture of *>
which is the type of the formal parameter x of callee,
after capturing the selector type.
Usually you can work around these errors by adding a parameter. For
example, the code above could be rewritten as follows:
1 abstract class CaptureMethodExample<region R> {
2 abstract void callee(CaptureMethodExample<R> x);
3 <region R>void caller(CaptureMethodExample<R> y) {
4 // OK
5 y.callee(y);
6 }
7 }
By adding a method region parameter R, we “capture the type
ourselves.” Now the code explicitly says that the region in the type
of y is the same as the region of the type of x: whatever
it is, it is bound to the same parameter R in both cases.
5 Effects
Effects are the way that DPJ tracks accesses to the heap to enforce
determinism. An effect is an action that reads or writes memory.
Every statement and expression in the program is assigned an effect.
If the effects of two statements do not interfere, then the statements
may be safely run in parallel. Effects don't interfere if neither one
writes to memory, or they operate on different parts of memory, or
they are both invocations of a method declared commutative
(§ 2.3.4).
This section describes how effects work in DPJ.
§ 5.1 describes basic effects, which are
individual actions involving memory (such as reading or writing a
variable). § 5.2 describes effect
summaries, which are the program representation of sets of basic
effects, and are used to summarize the effects of methods.
§ 5.3 describes local effects, which are
effects inside a method that are not visible to the caller and so may
be omitted from the method's effect summary.
§ 5.4 explains how DPJ statements and
expressions generate basic effects. § 5.5
describes effect coarsening, which is necessary for summarizing
effects on RPLs and local variables that can go out of scope.
§ 5.6 explains the subeffect relation
on effects, which is important for checking method effect summaries
against the actual method effects. § 5.7
explains the noninterference relation on effects, which is
important for checking that pairs of parallel tasks have no
conflicting operations.
5.1 Basic Effects
A basic effect is an action involving memory. DPJ's effect system
represents the following kinds of effects:
-
Read effects: A read effect indicates a read operation on
an RPL (§ 3) or local variable. Such an effect
summarizes one or more reads to one or more memory locations
associated with the region or regions named by the RPL.
- Write effects: A write effect indicates a write operation
on an RPL (§ 3) or local variable. Such an effect
summarizes one or more writes or reads to one or more memory
locations associated with the region or regions named by the RPL.
- Invocation effects: An invocation effects indicates an
invocation of some method, causing some set of basic effects (the
effects of invoking the method). The basic effects associated with
the invocation are called the underlying effects of the
invocation. For example, if method m has effect summary
writes r (§ 5.2), then invoking
m generates an invocation effect with m as its method
and writes r as its underlying effect.
Read and write effects on an RPL R are generated by directly
accessing a class field declared to be in R
(§ 2.2) or an array cell in R
(§ 4.2), or by invoking a method with the effect in
its effect summary (§ 2.3.1).
Invocation effects are always generated by invoking methods.
§ 5.4 gives more details on how DPJ statements
and expressions generate these effects.
Invocation effects are necessary because some method invocations can
commute with others (§ 2.3.4). To
keep track of these pairs of commuting methods, the compiler needs to
record the information about which method was invoked, in addition to
what effects the method invocation caused.
Notice that effects on local variables (including method parameters)
are recorded separately from effects on RPLs. Because local variables
cannot have their references taken, and never alias, the compiler
automatically keeps track of interfering effects on local variables.
The programmer doesn't have to put them in RPLs or summarize their
effects.
Finally, notice that a write effect can represent both reads and
writes. Writes are “stronger” than reads (for interference, at
least one of two operations to the same location must be a write), so
it is sound but conservative to represent reads as writes. In some
cases it may reduce the size of the effect set. For example, a read
and a write to the same location may be represented with just the
write effect. Finally, allowing write effects to represent reads does
not sacrifice any precision, since (1) a read to a location alone can
always be represented as a read effect; and (2) a read and write to a
location can always be represented as a single write to that location
without any loss of precision (the presence of the write already
causes any parallel access to the location to interfere, so the
presence or absence of the read makes no difference).
5.2 Effect Summaries
An effect summary is a bit of program text that summarizes a
set of basic effects (§ 5.1). In the current DPJ
language, effect summaries can appear in the program text only in
method definitions, where they summarize the effects of invoking the
method (§ 2.3.1). It is anticipated
that future versions of DPJ will also allow effect summaries to appear
as arguments to effect variables in class types and method
invocations, for greater flexibility in specifying and checking
effects.
An effect summary consists of one of the following:
-
pure, indicating no effect on the heap.
- reads rpl-list, indicating reads to the RPLs given
in rpl-list.
- writes rpl-list, indicating writes or reads
to the RPLs given in rpl-list.
- reads rpl-list-1 writes rpl-list-2,
indicating both reads to the RPLs in rpl-list-1 and writes to
the RPLs in rpl-list-2.
Currently there is no way to represent an invocation effect
(§ 5.1) directly in an effect summary;
invocations in the method body must be summarized by giving their
underlying effects. This may change in future versions of DPJ.
Here is a simple example of an effect summary:
1 class Point<region R> {
2 int x in R, y in R;
3 <region Rp>void add(Point<region Rp> p)
4 reads Rp writes R {
5 this.x += p.x;
6 this.y += p.y;
7 }
8 }
This class defines a simple Point object with integer
coordinates x and y. There is one class region parameter
R (§ 2.4.1), and the coordinate fields
are placed in the region of R. The add method takes a
Point object with some other (possibly different) region
Rp and adds the coordinates of that point to the coordinates of
this one. The summarized effects are reads Rp writes R, shown
in line 4, because the method reads the coordinates of p in
region Rp and writes the coordinates of this in region
R.
5.3 Local Effects
A basic effect (§ 5.1) is local if it is a
read or write effect on a local variable or local RPL
(§ 3.5), or it is an invocation effect whose
underlying effects are all local. Local effects are contained within
a method scope and never seen by the calling context, so they may be
ignored when summarizing method effects. For example:
class LocalEffects<region R> {
int x in R;
// method has no effects visible to the caller
void method() pure {
region r;
// Write effect on var is local
LocalEffects<r> var = new LocalEffects<r>();
// Write effect on region r is local
var.x = 5;
}
}
5.4 Effects of Statements and Expressions
At the heart of DPJ's determinism checking is an analysis of the
effect of every statement and expression in the program. To compute
the effects, the compiler uses the following information:
-
The form of the expression or statement. For example,
an assignment statements generates a write on the left-hand side,
and a read on the right-hand side.
- The type of a field access selector, method invocation selector,
or array, together with the RPL of the field (for field access) or
declared method efects (for method invocations). For example, if a
class C<region R> has a field x in R, then access
through a variable of type C<r> generates an effect on
r. Similarly, if C has a method void m() writes
R, then invoking m on a variable of type C<r>
generates the effect writes r.
The field region specifiers (§ 2.2)
are important because they effectively partition the class fields into
regions that can be used to describe the effects. The method effect
summaries (§ 2.3.1) are important
because they (1) document the effects of methods at API boundaries
(including, e.g., methods in classes for which the source code is not
available); and (2) allow the compiler to infer the effect of a method
invocation from the summary, rather than doing an interprocedural
analysis. (Because method calls can be recursive, this analysis would
need to iterate to a fixed point.)
In more detail, here is how the compiler computes the effects of a
DPJ statement or expression:
Field access: For field access expressions
selector-exp.field-name that directly access a
non-final field, the compiler first computes the RPL accessed by
the expression. It uses the same procedure as for typing field access
(§ 4.3.1), except that it uses the RPL specifier
of the field (§ 2.2) instead of the
type associated with the field, and there are no substitutions for
type parameters in step 4(a). The compiler records a write or read
effect (§ 5.1) to the RPL so computed, depending
on whether the expression appears on the left-hand side of an
assignment, or in a read access.
Reads of fields declared final generate no effect (writes, other
than initialization, are not allowed). Because the value of
a final field does not change after initialization, reading it
cannot cause a conflicting access.
Array access: For array access expressions e1[e2]
that access an array cell, the compiler first computes the RPL of the
cell. To do this it substitutes e2 for the leftmost index variable
in the leftmost RPL argument appearing in the type of e1. For
example, if array A has type int[]<[_]>, then the RPL
accessed by A[0] is [i]; and if array B has type
int[]<[i]>#i[]<[i]:[j]>#j, then the RPL accessed by B[0]
is [0]. The compiler records a write or read effect to the RPL
so computed, depending on whether the array access expression appears
on the left-hand side of an assignment, or in a read access.
Note that access through multiple dimensions of an array of arrays
causes a read effect for all but the last dimension. For example, for
array B defined above, the effect of expression B[0][1] is
a read of [0], plus a either a read or write on [0]:[1],
depending on the context. For example, x = B[0][1] generates a
read, while B[0][1] = x generates a write. In fact, this is
just a special case of the rules stated in the previous paragraph,
together with the rules for compound expressions (see below).
Method invocation: For method invocation expressions, the
compiler computes and accumulates the effects of evaluating the
selector and argument expressions as described in this section. Then
it uses the method's effect summary
(§ 2.3.1) to compute the effect of the
invocation itself. It uses the same procedure as for computng the
return type of a method (§ 4.3.3), except that
it uses the effect summary instead of the declared return type, and
there are no substitutions for type parameters in steps 4(a) or 4(b).
The compiler records an invocation effect (§ 5.1)
with the invoked method and the computed effect.
Local variable access: For statements and expressions that
access a non-final local variable (i.e., variable declared in a
method scope or method formal parameter), the compiler records a read
or write effect on the variable. Effects on final local
variables are ignored.
Compound statements and expressions: For any statement or
expression made up of other statements or expressions (including, for
example, an assignment that has a method invocation on its right-hand
side, or an array access to an expression which is itself an array
access), the compiler accumulates the effect of the components,
coarsening component effects as necessary
(§ 5.5) to remove elements that are no
longer in scope at the outer level.
5.5 Effect Coarsening
A statement or expression may generate an effect that is no longer in
scope where the effect must be reported. For example, consider the
following foreach loop (§ 6.2) on an
index-parameterized array (§ 4.2.2):
1 int[]<[_]> A = new int[N]<[_]>;
2 foreach (int i in 0, N) {
3 A[i] = i;
4 }
In line 3, the effect is writes [i]. But what is the effect in
the scope outside the loop? This is important, for example, if this
code appears inside a method body and its effect must be summarized.
We can solve this problem with partially-specified RPLs
(§ 3.4). For example, the effect writes [?]
covers all the effects writes [i] for all i inside the
loop, so that is what we use. This is called effect
coarsening.
Coarsening of local effects: An effect on a local RPL
(§ 3.5) or local variable is simply deleted from the
effect set when the RPL or variable goes out of scope. For example,
in the following code fragment, neither block contained in the
cobegin has any effect:
class LocalEffectCoarsening<region R> {
int x in R;
void method() pure {
cobegin {
// No effect here because var1 and r1 are out of scope
{
region r1;
LocalEffectCoarsening<r1> var1 =
new LocalEffectCoarsening<r1>();
var1.x = 10;
}
// No effect here either because var2 and r2 are out of
// scope
{
region r2;
LocalEffectCoarsening<r2> var2 =
new LocalEffectCoarsening<r2>();
var2.x = 25;
}
}
}
}
This pattern can also be used effectively with foreach, by
having each iteration create its own objects whose effects are
invisible to the other iterations:
foreach (int i in 0, N) {
// r is local to a foreach iteration
region r;
// Do some effects on r
...
}
This technique is useful for creating private objects in each
iteration that manipulate data local to that iteration. See
The Deterministic Parallel Java Tutorial for more examples.
Coarsening of nonlocal effects: Coarsening of nonlocal
effects works as follows.
Variable RPLs: An effect on an RPL starting with a
final local variable
(§§ 3.1.5, 3.2) is coarsened
when the variable goes out of scope. The effect is replaced by a new
effect on owner-rpl:*, where owner-rpl is the
owner RPL of the variable's type (§ 4.1.4).
Because variables may be nested under variables, this operation is
performed recursively on the resulting RPL until an RPL is obtained
that is valid in the outer scope.
For example:
class VariableCoarsening<region R> {
int x in R;
void method() writes R:* {
// Coarsened effect is 'writes R:*'
{
final VariableCoarsening<R> vc1 =
new VariableCoarsening<R>();
VariableCoarsening<vc1> vc2 =
new VariableCoarsening<vc1>();
// Effect is 'writes vc1'
vc2.x = 5;
}
}
Note that method parameters are in scope in the method definition, so
effects on method parameter RPLs don't need to be coarsened in the
method's effects:
abstract class ParamRPLs<region R> {
abstract void method(ParamRPLs<R> param)
// OK; 'writes R:*' is also OK, but less precise
writes param;
}
Array index RPLs: An effect on an RPL containing the array
index [e] is coarsened to [?] if the expression e
includes an integer variable that goes out of scope. An example is
the foreach code given at the beginning of this section.
5.6 Subeffects
The subeffect relation determines whether one set of effects covers
another set of effects, i.e., all effects in the second set are
represented in the first set. The compiler uses the subeffect
relation to check that a method's declared effects
(§ 2.3.1) include the actual effects of
its body and the effects of any overriding methods.
Basic effects: The following rules determine when one basic
effect (§ 5.1) is a subeffect of another:
-
If one RPL is included in another
(§ 3.6.3), then a read of the first RPL
is a subeffect of a read or write to the second. For example,
reads Root is a subeffect of reads Root:* and
writes Root:*.
- If one RPL is included in another, then a write to the first RPL
is a subeffect of a write to the second. For example, writes
Root is a subeffect of writes Root:*. Note that
writes Root is not a subeffect of reads Root:*.
- An invocation of method m with underlying effect set E1 is
a subeffect of an invocation of method m with underlying effect
set E2 if E1 is a subeffect of E2. For example, an
invocation of m with underlying effect writes R is a
subeffect of an invocation of m with underlying effect writes
R:*.
Effect sets: The following rules determine when one set of
basic effects is a subeffect of another:
-
If each basic effect in one set is a subeffect of some basic
effect in the another set, then the first set is a subeffect of the
second. For example, reads A writes B is a subeffect
writes A:*, B:*, because reads A is a subeffect of
writes A:*, and writes B is a subeffect of writes
B:*.
- An invocation effect is a subeffect of its underlying effect.
For example, an invocation of method m with the effect writes
R is a subeffect of writes R. This means that invocations
may always be summarized by stating their underlying effects.
5.7 Noninterference of effect
The noninterference relation on effects determines whether two sets of
effects are safe to be run in parallel. The compiler uses the
noninterference relation to check that there are no conflicting
effects in mutually parallel tasks (§ 6).
Basic effects: The following rules determine when one basic
effect (§ 5.1) is a subeffect of another:
-
Two basic effects are noninterfering if they are both read
effects.
- Two basic effects are noninterfering if they are each read or
write effects, and they operate on disjoint RPLs
(§ 3.6.4).
- Two invoke effects are noninterfering if they invoke the same
method, and the method is declared commutative
(§ 2.3.4).
Effect sets: The following rules determine when one effect
set is a subeffect of another:
-
If every basic effect in one set is noninterfering with every
basic effect in another set, then the two sets are noninterfering.
For example, reads A writes B is noninterfering with
writes C,D. However, reads A writes B is interfering
with writes A,C, because reads A inteferes with
writes A.
- If two effect sets are each a subeffect of another effect set,
and the including sets are noninterfering, then the included sets
are noninterfering. In particular, two invoke effects are
noninterfering if their underlying effects are (because the
underlying effects include the invoke effects, see
§ 5.6).
6 Parallel Control Flow
DPJ employs a fork-join model of parallelism. That means that a task
may launch several parallel tasks (the fork) and all must complete
(the join) before the launching task can continue. Recursive forking
is supported, i.e., tasks can launch other tasks to arbitrary depth.
There are two ways to fork tasks: cobegin, which forks several
statements as parallel tasks, and foreach, which forks groups of
consecutive loop iterations as parallel tasks.
6.1 cobegin
The syntax of the cobegin statement is as follows, where S is
a DPJ statement:
- cobegin S
If S is any statement but a block enclosed in curly braces {
… }, or if S is a block consisting of a single
statement, then the cobegin just executes the statement S.
Otherwise, the component statements of S are run as parallel tasks.
There is an implicit barrier (join) at the end of the cobegin
statement, so that all the component tasks must finish execution
before the parent task executes the statement after the cobegin.
For example, the following code executes statements S1 and
S2 in parallel:
cobegin {
S1;
S2;
}
S3;
Because S3 appears after the cobegin, both S1 and
S2 are guaranteed to finish before S3 is executed.
In order to guarantee deterministic execution, the compiler checks the
component statements of a cobegin for noninterference
(Section 5.7). If interference is discovered,
then the compiler issues a warning. For example, the following code
would cause a warning, because of the interfering writes to variable
x in the parallel tasks:
class C {
void m() {
int x;
cobegin {
x = 0;
x = 1;
}
}
}
On the other hand, the following code would compile with no warning:
class C {
void m() {
int x, y;
cobegin {
x = 0;
y = 1;
}
}
}
The cobegin statement is most often used with recursion. The
following pattern is typical:
void recursiveMethod(...) {
if (...) {
// do base case sequentially
} else {
cobegin {
recursiveMethod(...);
recursiveMethod(...);
}
}
}
Note that the “recursion cutoff” (i.e., when the base case takes
over) has to be programmed manually. For example, in a parallel
recursive sort, the condition might say to do the sort sequentially
when the input array reaches a certain minimum size. The minimum
should be chosen so that (1) there are enough parallel tasks for the
scheduler to balance the computation, but (2) the task creation
overhead is not unduly large. See The Deterministic Parallel Java Tutorial for more
examples of cobegin.
6.2 foreach
foreach operates similarly to cobegin, except that the
parallel tasks are the iterations of a loop, instead of the component
statements of a block. The granularity of parallelism (i.e., how many
loop iterations to execute in a task) is controllable by the
programmer.
6.2.1 Writing the foreach loop
The syntax of the foreach loop is as follows:
- foreach (int
index-var in start, length,
stride) body;
index-var is an identifier, start, length, and
stride are integer expressions, and body is a statement.
The stride expression is optional, and the default is 1. If it
appears, the stride expression must evaluate to an integer
greater that 0. The foreach loop executes the loop body once
for each element of an iteration space given by all integers
stride ⋅ i such that i ranges between 0 and
length−1, inclusive. The variable index-var may
not be modified in the loop body.
For example, the following code sets the cells of array A with
even indices to 0:
foreach (int i in 0, A.length, 2) {
A[i] = 0;
}
The compiler performs the following noninterference check for indexed
foreach loops:
-
Infer the effect set of body
(Section 5.4).
- Create a copy of the effect set generated in (1), but replace
every occurrence of index-var with a fresh variable that is
known to be unequal to index-var. This simulates the effects
generated by two distinct iterations of the loop, for which
index-var will have distinct values.
- Check whether the effect sets generated in (1) and (2) are
noninterfering (Section 5.7).
If interference is detected, the compiler issues a warning.
The DPJ runtime divides the foreach iterations into parallel
work according to the programmer-specified granularity
(Section 6.2.2). As in the case of
cobegin, there is an implicit barrier at the end of the
foreach; i.e., all tasks created by the foreach must
complete before any code following the foreach is executed.
6.2.2 Controlling the granularity of parallelism
In most cases, it would be inefficient to issue every loop iteration
as a parallel task. For example, consider a loop iterating over an
array with 100,000 elements on a machine with 10 cores. If 100,000
tasks were issued to do this computation, the task creation overhead
would swamp the parallelism.
Instead, the DPJ compiler allows the user to control the granularity
of task creation. The user can specify a minimum task size, in terms
of the number of iterations. The compiler recursively splits the
iteration space, until the minimum size is reached. For example, if
the cutoff were specified to be 1000 in the example above, then the
compiler would divide the loop into 100 tasks of 1000 iterations
each.
If the foreach computation is perfecly balanced (i.e., each
iteration does exactly the same amount of work), then it makes sense
to make the cutoff I/T, where I is the number of loop iterations
and T is the number of threads available to the DPJ runtime (usually
equal to the number of cores on the host machine). Using this
strategy, in the example with 100,000 iterations and 10 cores, the
cutoff should be 10,000.
If the computation is not perfectly balanced, then a better strategy
is to over-decompose the computation, specifying more tasks
than the number of available threads. This allows the scheduler to
schedule multiple tasks of varying size per thread, in a such a way
that the aggregate amount of work per thread is roughly balanced. As
in the case of the cobegin cutoff
(Section 6.1), the programmer should choose a
minimum task size that creates enough parallel work without incurring
too much task creation overhead.
The precise mechanism for specifying the number of DPJ threads and the
foreach cutoff is implementation-dependent.
The Deterministic Parallel Java Installation Manual explains how this mechanism works for the dpjc
compiler.
7 The DPJ Runtime
This section gives an overview of the classes DPJArray and
DPJPartition in the package DPJRuntime. These classes are
part of the runtime supplied with DPJ and are useful for manipulating
arrays. For full documentation of these classes, and for coverage of
the other DPJ runtime classes, see the HTML documentation included in
the DPJ release at
- DPJ/Implementation/Runtime/dpjdoc.
To use the DPJ runtime, you have to do the following:
-
Put import DPJRuntime.* at the top of your code (or just
import the class or classes you want to use), as for regular Java.
- When compiling your code, you must compile the runtime classes
located at Implementation/Runtime/dpj in the DPJ release
together with any code that uses the runtime. For example, if class
file Foo.java depends on the DPJ runtime, then you need to do
something like this:
- dpj Foo.java ${DPJ_ROOT}/Implementation/Runtime/dpj
It is not sufficient to put the runtime classes in your class
path. If you do that, you will get a lot of errors. The reason is
that DPJ bytecode doesn't yet properly support separate compilation of
DPJ annotations; the DPJ compiler needs the DPJ source for all
annotated code to process the annotations.
- When running your code, put the runtime classes located at
Implementation/Runtime/classes in your class path, as for
regular Java.
7.1 DPJArray
In parallel algorithms that operate on arrays, especially
divide-and-conquer algorithms, it is often necessary to split an array
into parts and operate separately on the separate parts. Because this
operation is so useful, some languages (e.g., Fortran 90) include
first-class support for extracting such “slices” from arrays.
Unfortunately, Java does not. Instead of adding slices directly to
arrays, which would require redefining Java array types and array
operations (and would be hard to make consistent with legacy Java),
DPJ provides array slices as an operation of a library class, called
DPJArray. There are two kinds of DPJArray classes: the
generic DPJArray class, and a set of DPJArray classes
specialized to primitive types.
Generic DPJArray: The generic DPJArray class
represents an array of objects. It has one type parameter and one RPL
parameter:
class DPJArray<type T, region R>
The type parameter is the element type of the array, and the RPL
parameter is the RPL of the array storage.
Creating a DPJArray: There are three ways to create a
DPJArray object. The first is to call a constructor that takes
just a length argument. This operation creates a new DPJArray
with the specified length, type, and RPL. For example, the following
code creates a DPJArray storing 10 Integer objects, and
the storage is in RPL r:
region r;
DPJArray<Integer,r> A = new DPJArray<Integer,r>(10);
Creating a fresh DPJ array is useful, but sometimes you have a Java
array and you want to make a DPJArray out of it. So the second
way to create a DPJArray is to call a constructor that takes an
ordinary Java array as an argument. For example, the following code
creates a Java array of 10 Integer objects, and wraps it in a
DPJArray:
region r;
Integer[]<r> a = new Integer[10]<r>;
DPJArray<Integer,r> A = new DPJArray<Integer,r>(a);
Note that the type and RPL of the array being passed in (here,
Integer and r) must match the type and RPL argument of the
DPJArray type; if not, a compile error occurs.
The third way to create a DPJArray is to make a subarray
(i.e., a slice) of an existing DPJArray. This is explained
below.
Accessing elements: DPJArray has put and
get operations similar to the ones in java.util.ArrayList.
For example, if A1 and A2 are DPJArrays, then the
following code fragment gets element 0 out of A1 and stores it
into element 0 of A2:
A2.put(0, A1.get(0));
If the DPJArray was created by wrapping a Java array, then the
put operation modifies the wrapped array. The effect of a
get operation is a read on the region of the DPJArray, and
the effect of a put operation is a write to the region of the
DPJArray.
Accesses are bounds-checked. Any attempt to access a position less
than 0 or greater than the array length minus one throws an
ArrayIndexOutOfBoundsException.
Subarrays: The real usefulness of DPJArray is its
support for subarrays, which are contiguous subsections of an
array. To create a subarray of a DPJArray, you call the
subarray instance method with a start and length argument. For
example, the following code creates a DPJArray and then extracts
the subarray of length 2 starting at position 5 (here we use the
default RPL of Root):
DPJArray<Integer> A = new DPJArray<Integer>(10);
DPJArray<Integer> B = A.subarray(5,2);
There are two nice things about subarrays. First, creating a subarray
takes minimal time and space overhead. Nothing is copied, and no new
storage allocated to hold any array elements. A small object is
created that stores the start position, length, and a reference to the
underlying array.
Second, as far as its API is concerned, a subarray is
indistinguishable from a freshly created DPJArray. For example,
the subarray created above is zero-indexed, it has length 2, and
attempts to access indices other than 0 and 1 throw an exception.
However, the subarray also provides a view into the original array.
For example, the following code stores 1 into position 0 of B,
which is the same as position 5 of A:
B.put(0,1);
You can get the start position out of a DPJArray by reading the
field start, and you can get the length by reading the field
length.
These features allow methods that operate on array subranges to be
zero-indexed, without worrying about the index parameters of the
subranges. For example, here is some simple recursive code that
increments every position of a DPJArray by 1:
<type T, region R>void increment(DPJArray<T,R> A) {
if (A.length == 0) return;
if (A.length == 1) {
A.put(0, A.get(0)+1);
}
int mid = A.length / 2;
increment(A.subarray(0, mid));
increment(A.subarray(mid, length-mid));
}
Without the subarray feature, this code would have to be written by
passing the index ranges as parameters to the increment method,
which is ugly.
Specialized DPJArrays: Because Java does not support
binding primitive types to generic parameters, the DPJ runtime also
has versions of DPJArray specialized to the primitive types
(DPJInt for int, DPJBoolean for boolean,
etc.). These operate identically to the generic DPJArray,
except that the element type is given by the class, and there is no
generic parameter.
Another way to write such arrays is to create a generic DPJArray
using the class corresponding to the primitive type. For example,
instead of a DPJArrayInt, one could use a
DPJArray<Integer>, as described above. This works, but the code
is more verbose, as well as more memory-intensive and slower, as Java
has to box and unbox all those primitive types when putting them into
and getting them out of the array.
7.2 DPJPartition
For parallel divide-and-conquer algorithms on arrays, it is often
important to create disjoint collections of subarrays. For example, a
parallel sorting algorithm might repeatedly divide an array into
disjoint halves (or quarters, etc.) and operate recursively in
parallel on the pieces. To support effect checking for this kind of
algorithm, the DPJ runtime includes a class DPJPartition for
representing disjoint collections of subarrays of the same array.
Each of the subarrays in the collection is called a segment of
the partition. As with DPJArray, there is a generic version,
and there are specialized versions.
Generic DPJPartition: The generic DPJPartition
class represents an array of subarrays of a DPJArray. It has
one type parameter and one RPL parameter:
class DPJPartition<type T, region R>
The type parameter is the element type of the DPJArray being
partitioned, and the RPL parameter is the RPL of the array storage.
Creating a DPJPartition: There are several ways to
create a DPJPartition; for a full list, see the HTML
documentation. Here are two useful ways. First, DPJPartition
has a constructor that takes a DPJArray to partition, an index
at which to partition, and a boolean value that says whether to
exclude or include the element at the index position. For example, if
A is a DPJArray<Integer> of length 10, then
new DPJArray<Integer>(A, 5, true)
partitions A into the segments [0,4] and [6,9] (excluding
position 5), while
new DPJArray<Integer>(A, 5, false)
partitions A into the segments [0,4] and [5,9] (including
position 5). This constructor is useful for parallel
divide-and-conquer algorithms with a fanout of 2.
Second, DPJPartition has a static factory method
stridedPartition that takes a DPJArray to partition and a
stride at which to partition. For example, if A is a
DPJArray<Integer> of length 10, then
new DPJPartition<Integer>(A, 2)
creates a DPJPartition<Integer> with five segments, each of
length 2. This feature is useful for parallel divide-and-conquer
algorithms with a fanout of greater than two, as well as flat
partitions (such as tiling an array).
Accessing segments: The field length stores the number
of segments in the partition. It is final, so reading it has no
effect (§ 5.4). The method get takes an
integer index idx and returns the segment corresponding to that
index (and throws an exception if the index is out of range). The
type of the segment is DPJArray<T,this:[idx]:*>.
The index-parameterized type returned by get allows the
different segments to be operated on in parallel without interference,
similarly to an index-parameterized array
(§ 4.2.2). For example, the following code uses
DPJPartition to parallelize the simple recursive increment
shown in § 7.1:
<type T, region R>void parallelIncrement(DPJArray<T,R> A)
writes R:* {
if (A.length == 0) return;
if (A.length == 1) {
// Effect is 'writes R'
A.put(0, A.get(0)+1);
}
int mid = A.length / 2;
final DPJPartition<T,R> segs = new DPJPartition<T,R>(A,mid)
cobegin {
// Effect is 'writes segs:[0]:*'
parallelIncrement(segs.get(0));
// Effect is 'writes segs:[1]:*'
parallelIncrement(segs.get(1));
}
}
For more examples of how to use DPJPartition, see The Deterministic Parallel Java Tutorial.
The variable this in the type gets substituted by the selector
expression at the call site (§ 4.3.3), so it is
usually most useful to call get through a final local
variable. Including the variable in the RPL ensures that the compiler
doesn't erroneously infer disjointness for two different partitions of
the same array. For example, in the following code,
segs1.get(1) and segs2.get(0) are not disjoint (they
overlap at [2,7]):
DPJArray<Integer> A = new DPJArray<Integer>(10);
// Create segments [0,1] and [2,9]
DPJPartition<Integer> segs1 = new DPJPartition(A, 2);
// Create segments [0,7] and [8,9]
DPJPartition<Integer> segs2 = new DPJPartition(A, 8);
Specialized DPJPartitions: As with DPJArray,
there are versions of DPJPartition specialized to the various
primitive types. They are provided for convenience and efficiency, as
Java does not support binding primitive types to generic type
parameters.
8 Exception Behavior
In DPJ, an exception thrown outside any parallel construct
(cobegin or foreach) behaves exactly as in sequential
Java. An exception thrown inside a parallel construct and caught
inside that same parallel construct also behaves as in sequential
Java. An exception thrown inside a parallel construct and not caught
inside that parallel construct has the following behavior:
-
If an exception E is thrown in branch B of a cobegin
or foreach, then branch B behaves as if it were executed in
isolation, starting with the state that existed at the start of the
cobegin or foreach. For example, consider this code,
where loopBody(int) is a method:
foreach (int i in 0, 10) {
loopBody(i);
}
If iteration I throws exception E, then replacing the entire
foreach with loopBody(I) would also cause E to be
thrown, at the same point in the execution of method loopBody as
in the parallel case.
- If multiple branches of a cobegin or foreach, each
run in isolation, would throw an exception, then one of those
exceptions is guaranteed to be thrown by the entire cobegin or
foreach. Which one is thrown is scheduler dependent (i.e.,
different ones may be thrown on different executions).
- Methods annotated commutative
(Section 2.3.4) must respect this
exception behavior. For instance, method invocations I1 and
I2 are not considered to commute with each other if execution
I1; I2 throws an exception, but I2; I1 does not.
This document was translated from LATEX by
HEVEA.