The best-versioned artifact in the Java world today is the ClassFile structure. Two numbers, the major and minor version, evolve with the Java SE platform (as documented in the draft Java VM Specification, Third Edition) and appear in every .class file, governing its content. But what determines the version of a particular .class file, and how is the version really used? The answer turns out to be tricky because there are many interesting versionable artifacts in the Java platform.
The source language is the most obvious. A compiler doesn’t have to accept multiple versions of a source language, though javac does, via the -source flag. (-source works on a global basis; a compiler could conceivably work on a local basis instead, accepting different versions of the source language for different compilation units.) Less obvious versioned artifacts are hidden in plain sight: character sets and compilation strategies. And .class files themselves sometimes have their versions used in surprising ways. Let’s see how javac handles all these versions, and make some claims about how an “ideal” compiler might work.
In the remainder, X and Y are versions; “language X” means “version X of the language”.
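To make the ClassFile version concrete: the two numbers sit right after the magic number at the start of every .class file, as minor_version followed by major_version. Here is a minimal sketch of reading them. The class name ClassFileVersion and its read method are invented for illustration and are not part of any JDK API.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: holds the version pair found in a ClassFile header.
final class ClassFileVersion {
    final int major;
    final int minor;

    ClassFileVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Reads magic (u4), minor_version (u2) and major_version (u2), in that
    // order, from the start of a ClassFile. DataInputStream is big-endian,
    // which matches the ClassFile format.
    static ClassFileVersion read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a ClassFile: bad magic number");
        }
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        return new ClassFileVersion(major, minor);
    }

    // Formats the pair the way this entry writes versions, e.g. "48.0".
    @Override
    public String toString() {
        return major + "." + minor;
    }
}
```

Given the stream for some hypothetical SomeClass obtained via SomeClass.class.getResourceAsStream("SomeClass.class"), read returns the version that the compiler stamped on that class.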
Happily, the Java platform has used the Unicode character set from day one. Unhappily, when javac for source language X is configured to accept an earlier source language Y, it uses the Unicode version specified for source language X rather than Y. For example, javac 1.4 -source 1.3 uses Unicode 3.0, since that was the Unicode specified for Java 1.4. It should use Unicode 2.1 as specified for Java 1.3.
Claim: A compiler configured to accept source language X should use the Unicode version specified for source language X.
It is difficult for javac to use multiple Unicode versions since the standard library (notably java.lang.Character) effectively controls the version of Unicode available, and only one version of the standard library is usually available. We will return to the issue of multiple standard libraries later.
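To see why the library controls the Unicode version, consider how a tool might decide whether a string is a legal identifier. The sketch below defers entirely to java.lang.Character, so its answer for any given code point depends on the Unicode tables in whichever standard library is running. The class IdentifierCheck is hypothetical, it does not reject keywords, and it is not a claim about javac's actual scanner code.

```java
// Hypothetical helper: identifier check driven by java.lang.Character.
final class IdentifierCheck {
    // Returns true if the first code point may start a Java identifier and
    // every following code point may appear in one, according to the
    // Unicode data of the running standard library.
    static boolean isIdentifier(String name) {
        if (name.isEmpty()) {
            return false;
        }
        int first = name.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        int i = Character.charCount(first);
        while (i < name.length()) {
            int cp = name.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }
}
```

A character added to Unicode after the running library's Unicode version is simply unknown to Character, so the same source text could be accepted under one standard library and rejected under another.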
Sidebar: You may be surprised to discover that some other languages don’t use Unicode by default. A factoid from 2008’s JVM Language Summit was the existence of a performance bottleneck in converting 8-bit ASCII strings (used by dynamic language libraries) to and from Unicode strings (used by canonical JVM libraries). Who knows what the 2009 JVM Language Summit will reveal?
A compilation strategy is the translation of source language constructs to idiomatic bytecode, flags, and attributes in a ClassFile. As the Java SE platform evolves by changing the source language and ClassFile features, a compilation strategy can evolve too. For example, javac 1.4 may compile an inner class one way when accepting source language 1.3 and another way when accepting source language 1.4.
Claim: A compiler may use a different compilation strategy for each source language.
The javac flag ‘-target’ selects the compilation strategy associated with a particular source language. This mainly has the effect of setting the version of the emitted ClassFile: 46.0 for Java 1.2, 47.0 for Java 1.3, 48.0 for Java 1.4, 49.0 for Java 1.5, 50.0 for Java 1.6. For example, javac 1.4 compiles an inner class the same way when configured with targets 1.3 and 1.4, but emits 47.0 and 48.0 ClassFiles respectively:
javac 1.4 -source 1.3 -target 1.3 -> 47.0
javac 1.4 -source 1.3 -target 1.4 -> 48.0
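The target-to-version correspondence listed above can be written down as a simple lookup table. This is only a sketch of the mapping as stated in this entry; the class TargetVersions and its method are invented names, not javac internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical table: -target value to emitted ClassFile major version.
final class TargetVersions {
    private static final Map<String, Integer> MAJOR =
            new LinkedHashMap<String, Integer>();

    static {
        MAJOR.put("1.2", 46);
        MAJOR.put("1.3", 47);
        MAJOR.put("1.4", 48);
        MAJOR.put("1.5", 49);
        MAJOR.put("1.6", 50);
    }

    // Returns the major version for a target, or throws for targets that
    // this sketch does not know about.
    static int majorVersionFor(String target) {
        Integer major = MAJOR.get(target);
        if (major == null) {
            throw new IllegalArgumentException("Unknown target: " + target);
        }
        return major;
    }
}
```

Note that the table says nothing about the source language: with the same -source 1.3, the two invocations above differ only in the key they look up here.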
In fact, ClassFile version is orthogonal to compilation strategy. For example, javac 1.4 could conceivably compile an inner class to a 48.0 ClassFile in two ways, one when configured to accept source language 1.3 and another when configured to accept source language 1.4:
javac 1.4 -source 1.3 -target 1.4 -> 48.0
javac 1.4 -source 1.4 -target 1.4 -> 48.0
You would have to inspect the ClassFiles carefully to see the difference, since their versions don’t reveal the compilation strategy. Of course, the ClassFile version “dominates” a compilation strategy, since a strategy can only use artifacts legal in a given ClassFile version, even though the concepts are different.
The combination missing above is:
javac 1.4 -source 1.4 -target 1.3 -> 47.0
or, given that the target could refer strictly to compilation strategy and not ClassFile version:
javac 1.4 -source 1.4 -target 1.3 -> 48.0
javac does not accept a target (or compilation strategy) lower than the source language it is configured to accept. Each new version of the source language is generally accompanied by a new ClassFile version that allows the ClassFile to give meaning to new bytecode instructions, flags, and attributes. Encoding new source language constructs in older ClassFile versions is likely to be difficult. How would javac encode annotations from the Java 1.5 source language without the Runtime[In]Visible[Parameter]Annotations attributes that appeared in the 49.0 ClassFile?
Claim: A compiler configured to accept source language X should not support a compilation strategy less than that corresponding to X.
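The claim amounts to a one-line check on the -source and -target pair. A minimal sketch follows, assuming versions are written in the "1.N" style used throughout this entry; SourceTargetCheck and its methods are hypothetical names, and real javac option handling is more involved.

```java
// Hypothetical validator for a -source / -target combination.
final class SourceTargetCheck {
    // Extracts N from a version written as "1.N", e.g. "1.4" gives 4.
    static int release(String version) {
        if (!version.startsWith("1.")) {
            throw new IllegalArgumentException("Expected 1.N, got: " + version);
        }
        return Integer.parseInt(version.substring(2));
    }

    // A combination is accepted only when the target is not lower than
    // the source language.
    static boolean isAccepted(String source, String target) {
        return release(target) >= release(source);
    }
}
```

Under this rule, -source 1.3 -target 1.4 passes, while -source 1.4 -target 1.3 is rejected.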
This policy can be rather restrictive: there were no changes between the 1.5 and 1.6 source languages, and only minor changes between the 49.0 and 50.0 ClassFiles that accompany those languages (really, platforms). Nevertheless, javac 1.6 does not accept -source 1.6 -target 1.5.
The famous example of the restriction is that javac 1.5 does not accept -source 1.5 -target 1.4, so source code using generics cannot be compiled for pre-1.5 JVMs even though the generics are erased. This is partly because the compilation strategy for class literals changed between Java 1.4 and 1.5, to use the upgraded ldc instruction in the 49.0 ClassFile. If javac’s compilation strategy were more configurable, it would be conceivable to produce a 48.0 ClassFile from generic source code. There is, however, another reason why -source 1.5 -target 1.4 is disallowed … read on.
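For the curious, the older strategy for a class literal such as String.class looked roughly like the hand-written source below: a helper that calls Class.forName, plus a static field that caches the result. Treat this as an approximation rather than a decompilation of real javac output. The class ClassLiteralSketch and the methods stringClassOldStyle and stringClassNewStyle are invented for illustration.

```java
// Hypothetical source-level rendering of two class literal strategies.
final class ClassLiteralSketch {
    // Cache field, named in the style of a compiler-generated member.
    private static Class<?> class$java$lang$String;

    // Helper that resolves a class by name and converts the checked
    // exception into an error, since a class literal cannot throw it.
    private static Class<?> class$(String name) {
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new NoClassDefFoundError(e.getMessage());
        }
    }

    // Older strategy: look the class up by name once, then reuse the cache.
    static Class<?> stringClassOldStyle() {
        if (class$java$lang$String == null) {
            class$java$lang$String = class$("java.lang.String");
        }
        return class$java$lang$String;
    }

    // Newer strategy: with a 49.0 or later ClassFile, the literal can be
    // loaded directly by ldc, with no helper method or cache field.
    static Class<?> stringClassNewStyle() {
        return String.class;
    }
}
```

Both methods return the same Class object; the difference lies only in the bytecode a compiler has to emit, which is exactly what a compilation strategy decides.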
Prior to JDK7, if javac for source language X was configured to accept an earlier source language Y, it used the ClassFile definition associated with source language X. For example, if javac 1.5 -source 1.2 reads a 46.0 ClassFile, it treats the ClassFile as a 49.0 ClassFile. This is unfortunate because user-defined attributes in the 46.0 ClassFile could share the names of attributes defined in the 49.0 ClassFile spec, and interpreting them as authentic 49.0 attributes is unlikely to succeed.
Even if javac 1.5 -source 1.2 reads a 49.0 ClassFile, there is little point in reading 49.0-defined attributes since they had no semantics in the Java 1.2 platform. This holds for non-attribute artifacts such as bridge methods too; if physically present in a 49.0 ClassFile, they should be logically invisible from a Java 1.2 point of view. In summary:
javac 1.5 -source 1.2 reading a 1.5 ClassFile -> should interpret as 1.2
javac 1.5 -source 1.5 reading a 1.2 ClassFile -> should interpret as 1.2
Claim: A compiler configured to accept source language X should interpret a ClassFile read during compilation as if the ClassFile’s version is the smaller of a) the ClassFile version associated with source language X, and b) the actual ClassFile version.
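Stated as code, the claim is a minimum over two major versions, and a ClassFile feature is visible only if it was introduced at or below that minimum. The sketch below ignores minor versions for simplicity; EffectiveClassFileVersion and its method names are hypothetical.

```java
// Hypothetical helper expressing the claim about reading ClassFiles.
final class EffectiveClassFileVersion {
    // The version to interpret a ClassFile as: the smaller of the version
    // associated with the configured source language and the actual one.
    static int effectiveMajor(int majorForSource, int actualMajor) {
        return Math.min(majorForSource, actualMajor);
    }

    // A feature (attribute, flag, ...) introduced in a given major version
    // is visible only when the effective version is at least that version.
    static boolean isFeatureVisible(int introducedInMajor,
                                    int majorForSource,
                                    int actualMajor) {
        return introducedInMajor <= effectiveMajor(majorForSource, actualMajor);
    }
}
```

With -source 1.2 (46) and a 49.0 ClassFile, the effective version is 46, so an attribute introduced in 49.0 such as RuntimeVisibleAnnotations is ignored. With -source 1.5 (49) and a 46.0 ClassFile, the effective version is again 46.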
In JDK7, javac behaves as per the claim. First, it interprets a ClassFile according to the ClassFile’s actual version, regardless of the configured source language. For example, a 46.0 ClassFile is interpreted as it would have been in Java 1.2, ignoring attributes corresponding to a newer source language. Second, when the configured source language is older than a ClassFile, javac ignores ClassFile features newer than the source language it is configured to accept.
An important part of a compiler’s environment is the standard library it is configured to use. The standard library used by javac can be configured by setting the bootclasspath. In future, a module system shipped with the JDK will allow a dependency on a particular standard library to be expressed directly.
Things get tricky when compiling an older source language to a newer target ClassFile version (and hence a later JVM with a newer standard library). For example, should javac 1.6 -source 1.2 -target 1.5 compile against the Java 1.2 or 1.5 standard library? Both answers have merit, which suggests further concepts are needed to disambiguate.
Using the right libraries matters at runtime too. The introduction of a source language feature in Java 1.5 - enums - added constraints on the standard library against which ClassFiles produced from the 1.5 source language can run. The java.lang.Enum class must be present, and you can read the code of ObjectInputStream and ObjectOutputStream to see for yourself the mechanism for serializing enum constants. The simple way to guarantee that a suitable standard library is available for enum-using code at runtime is to ensure that only 49.0 ClassFiles are produced from the 1.5 source language. Such ClassFiles will not run on a 1.4 JVM since it only accepts <=48.0 ClassFiles.
In a nutshell, the compilation strategy for enums is erasure++: an enum type compiles to an ordinary ClassFile with ordinary static members for the enum constants and ordinary static methods to list and compare constants. With a few changes in that strategy (to not extend java.lang.Enum) and a serious amount of magic in the 1.5 JVM (to track reflection and serialization of objects of enum type), the ClassFiles emitted by a compiler for the 1.5 source language could run safely enough on a 1.4 JVM. But the drawbacks to such hackery are enormous, so erasure++ it was.
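To picture erasure++, here is a hand-written class that approximates what a two-constant enum boils down to. It is a sketch only: a real enum extends java.lang.Enum (which hand-written source is not allowed to do), inherits name, ordinal and compareTo from it, and gets the reflection and serialization support mentioned above. The name HandWrittenColor and its constants are invented for illustration.

```java
// Hypothetical stand-in for: enum Color { RED, GREEN }
final class HandWrittenColor implements Comparable<HandWrittenColor> {
    // Ordinary static members for the enum constants.
    static final HandWrittenColor RED = new HandWrittenColor("RED", 0);
    static final HandWrittenColor GREEN = new HandWrittenColor("GREEN", 1);

    // Declared after the constants so they are initialized first.
    private static final HandWrittenColor[] VALUES = { RED, GREEN };

    private final String name;
    private final int ordinal;

    // Private constructor: no instances beyond the constants above.
    private HandWrittenColor(String name, int ordinal) {
        this.name = name;
        this.ordinal = ordinal;
    }

    // Ordinary static method to list the constants; returns a copy so
    // callers cannot modify the internal array.
    static HandWrittenColor[] values() {
        return VALUES.clone();
    }

    // Ordinary static method to find a constant by name.
    static HandWrittenColor valueOf(String name) {
        for (HandWrittenColor c : VALUES) {
            if (c.name.equals(name)) {
                return c;
            }
        }
        throw new IllegalArgumentException("No constant " + name);
    }

    String name() {
        return name;
    }

    int ordinal() {
        return ordinal;
    }

    // Constants compare by declaration order.
    public int compareTo(HandWrittenColor other) {
        return ordinal - other.ordinal;
    }

    @Override
    public String toString() {
        return name;
    }
}
```

Everything here would load on a 1.4 JVM if compiled without generics; what it lacks is precisely the part that java.lang.Enum and the 1.5 standard library provide.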
Thus, the reason one new language feature implemented by erasure - generics - cannot run on earlier JVMs is that another new language feature - enums - is implemented by erasure. Such is life at the foundation of the Java platform.
Thanks to “Mr javac” Jon Gibbons for comments on this entry.
Source: http://blogs.sun.com/abuckley/entry/versioning_in_the_java_platform