Dissecting the Java Virtual Machine - Architecture - part 1

Architecture

   Before you can start experimenting with the Hotspot source code (or any other project in the OpenJDK ecosystem), do not forget to issue the three holy-grail commands that get the latest sources and build the JRE/JDK images (the example is for Linux - on Windows you need to pass additional arguments to some of the commands, as described in the article on building under Windows):

./get_source.sh - get sources from the root and the child repositories
bash configure - run configure script to configure your environment after update of sources
make images - perform an incremental build of the JRE/JDK images using the latest changes in the sources (or "make clean images" to perform a full build)

Note: You can use JDK8 build instructions for more options during build - see references.

   The following diagram describes the high level architecture of a typical Java Virtual Machine implementation (image provided by the JVM architecture article on artima.com - see references):

hotspot architecture     

   Before we can see how a Java application is executed and how it uses the memory structures of the JVM, the JVM itself must be started and initialized. If you don't have the source code and just want to understand the overall architecture, you can skip this section and continue with the next one - but you are not encouraged to do so.

   It all starts with the <OpenJDK_root>/jdk/src/share/bin/main.c file (the java/javaw launcher).

Note: don't be confused by the fact that this file is in the 'jdk' project - you can think of the 'jdk' as a superset of the 'jre', and the JRE image is created from source code in the 'jdk' project. If you are wondering what exactly goes into the JDK and JRE images, you can inspect the following build chain: <OpenJDK_root>/Makefile -> <OpenJDK_root>/make/Main.gmk -> <OpenJDK_root>/jdk/make/BuildJdk.gmk -> <OpenJDK_root>/jdk/make/Images.gmk (the Images.gmk file provides the logic for building the JDK/JRE images from the build output).

   The first thing you will notice is that some of the methods called in the various sections (such as JLI_CmdToArgs(), which is used to parse command line arguments on Windows and is defined in <OpenJDK_root>/jdk/src/windows/bin/cmdtoargs.c) are scattered throughout different files. The reason is that the logic for a particular platform might differ, so the platform-specific details are extracted into separate source files that are selected at build time. That way the compilation output varies from platform to platform, and output units for one platform do not pollute the build output for another platform.

   The main.c file calls the JLI_Launch() method defined in <OpenJDK_root>/jdk/src/share/bin/java.c with a number of parameters. In the implementation of JLI_Launch() you can see that some of the methods are not defined in the same file - the reason is again that they might be specific to the target platform and are selected at build time. To see a particular implementation of a method, look at the corresponding java_md*.c file (e.g. java_md_solinux.c for Solaris/Linux) - I guess 'md' stands for 'machine dependent'. The invocation of CreateExecutionEnvironment() prepares the execution environment (e.g. it checks whether the JVM library path is valid). Then the LoadJavaVM() method is called to load the JVM library (e.g. jvm.dll for Windows or libjvm.so for Linux) and to assign the addresses of the library methods to a handle (of type InvocationFunctions) that is used later to create a new JVM instance (by invoking the JNI_CreateJavaVM method). Next the JVMInit() method is called; it performs the actual creation of a separate native thread (other than the one used to invoke the Java program) by calling the ContinueInNewThread() method, which delegates the creation of the particular VM thread to the ContinueInNewThread0() method (again - specific to the target platform).
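   The hand-off to a new thread can be pictured with a small Java analogy. This is only a sketch under stated assumptions: the real ContinueInNewThread0() is platform-specific C code, and the class name, method name and thread name below are made up for illustration. The sketch starts the given body on a fresh thread with a requested stack size and makes the calling thread wait for the result:

```java
import java.util.function.IntSupplier;

/** Hypothetical analogy for ContinueInNewThread(); the real launcher code is C. */
final class LauncherThreadSketch {

    /** Runs javaMain on a fresh thread, waits for it, and returns its exit code. */
    static int continueInNewThread(IntSupplier javaMain, long stackSizeBytes) {
        int[] exitCode = new int[1];
        // The stack size is only a hint here; a value of 0 means "use the default".
        Thread worker = new Thread(null, () -> exitCode[0] = javaMain.getAsInt(),
                "JavaMain-sketch", stackSizeBytes);
        worker.start();
        try {
            // The calling thread does nothing but wait for the new thread.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return exitCode[0];
    }

    public static void main(String[] args) {
        int rc = continueInNewThread(() -> {
            System.out.println("running on " + Thread.currentThread().getName());
            return 0;
        }, 1L << 20);
        System.out.println("exit code " + rc);
    }
}
```

   The point of the sketch is only the shape of the control flow: the thread that started the launcher does not run the application itself, it waits for the thread that does.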

   When the JavaMain() method is called in a separate thread, it calls the JNI_CreateJavaVM() method from the JVM library via the InitializeJVM() method. The JNI_CreateJavaVM() method is defined in <OpenJDK_root>/hotspot/src/share/vm/prims/jni.cpp. The first thing it does is take an atomic lock (Atomic::xchg, implemented with inline assembly that uses a variant of the 'xchg' instruction on the particular platform) against concurrent attempts to create an instance of the VM. This ensures that only one JVM instance is created per process - multiple JVM instances per process are not allowed because the JVM uses global variables. To create the VM instance, JNI_CreateJavaVM() calls the Threads::create_vm() method defined in <OpenJDK_root>/hotspot/src/share/vm/runtime/thread.cpp, which is about 400 lines of code and performs a number of activities - some of the most important are the initializations of the memory structures shown in the architecture diagram above. Here is what happens:

   - the output stream module is initialized - it provides utilities for dumping formatted output to the tty (terminal) - this includes the standard output of the JVM, the GC log and others (depending on the particular options provided to the JVM);

   - the java launcher properties are processed (such as '-server' or '-client', which specify the type of the JVM - the client JVM provides faster start-up time and fewer runtime optimizations, while the server JVM has slower start-up time and provides more optimizations - it is better suited for server applications);

   - operating system specific settings are initialized;

   - default system properties are initialized;

   - command line arguments are parsed and additional OS-specific initializations are performed based on the arguments;

   - TLS (thread-local storage) is initialized. Each JVM thread has its own storage space that can be addressed through a so-called TLS index that points to the TLS. Each thread has its own TLS index that can be used by other threads if they need to access thread-local data of another thread;

   - agent libraries (provided by the '-agentlib', '-agentpath' and '-Xrun' options) are launched. One well-known example of such an agent library is the jdwp debugging library, which can be loaded into a JVM instance and used by remote debugging clients such as jdb. In short - agent libraries provide instrumentation capabilities for the applications. For more information on Java agents you can read the 'Introduction to Java Agents' article on JavaBeat - see references at the end or check out the JADE (Java Agent Development Framework) project;

   - global data structures are initialized by calling vm_init_globals() - basic type checking is performed (useful when porting the JVM to a different platform that has specifics regarding the sizes of basic types - they must be adjusted to match the Java type system), and heap object sizes, the event log, OS synchronization primitives, perfMemory (performance memory) and chunkPool (memory allocator) are initialized;

   - the Java version of the main thread (an instance of JavaThread) is created and attached to the OS thread. If you open the implementation of JavaThread you will see that it has an 'oop' field that points to a java.lang.Thread instance in the heap (in terms of Hotspot an 'oop' is just an object pointer that points to a Java object on the heap from C++ code). At this point we can create Java threads;

   - the Java-Level synchronization subsystem is initialized by calling ObjectMonitor::Initialize();

   - the other global subsystems and structures are initialized: various counters for the JMX management subsystem (an embedded JMX server with default MBeans for managing and monitoring the JVM) and for the runtime, thread and classloading systems are initialized; the bytecode template maps are initialized (the interpreter uses these template mappings to match against the currently executing bytecode); the libzip library is loaded so that it can be used to read JAR (essentially ZIP) files; the bootstrap classpath entries are loaded (such as the ones from rt.jar); the code cache is initialized - it stores the output of JIT (Just-In-Time) compilation (in short, JIT compilation is an optimization technique that compiles methods or loop blocks to native code at runtime to speed up the execution of the target method/loop - but more on that later); the Universe is initialized (basically memory for the heap, the method area and other metadata); the interpreter is initialized (along with the template table for bytecodes); the method counter is initialized (used to support JIT compilation - method invocation counts can be used to determine "hot spots", i.e. frequently called methods);

   - various system classes are loaded (such as java.lang.String, java.lang.System, java.lang.Thread, java.lang.ThreadGroup, java.lang.reflect.Method, java.lang.ref.Finalizer, java.lang.Class, and the rest of the System classes);

   - the signal dispatcher is initialized (used to propagate OS-level events to appropriate event handlers in the JVM);

   - the JIT compilers are initialized (client/server/shark - the shark JIT compiler uses the LLVM compiler infrastructure to JIT compile Java methods without introducing system-specific code; it is used along with the 'Zero' interpreter-only port of Hotspot);

   - the JMX server agent is created and started;

   - system classes that make use of the new 'invokedynamic' instruction (such as java.lang.invoke.MethodHandle) are initialized;

   - biased locking is initialized (this is an optimization technique for synchronization that allows a thread to become "biased" towards an object, thus eliminating the overhead of releasing/reacquiring the lock each time the same thread tries to lock/unlock the object - this is useful when there is no regular lock switching between threads).
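   The steps above run only once per process because of the atomic guard at the top of JNI_CreateJavaVM(). A minimal Java analogy of such a guard is sketched below. It is not the Hotspot code: the class and method names are hypothetical, AtomicInteger.getAndSet() stands in for Atomic::xchg, and only the two return codes (whose values match JNI_OK and JNI_EEXIST from jni.h) are modelled.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a "one VM per process" guard; not Hotspot code. */
final class SingleVmGuard {
    // Same numeric values as the corresponding JNI return codes in jni.h.
    static final int JNI_OK = 0;
    static final int JNI_EEXIST = -5;

    // 0 = no VM created yet, 1 = a VM exists or is being created.
    private final AtomicInteger vmCreated = new AtomicInteger(0);

    /** Returns JNI_OK for the first caller and JNI_EEXIST for every later one. */
    int createVm() {
        // getAndSet is an atomic exchange: it stores 1 and returns the old value,
        // so at most one caller can ever observe 0, even under contention.
        if (vmCreated.getAndSet(1) == 1) {
            return JNI_EEXIST;
        }
        // ... a real implementation would initialize the VM here ...
        return JNI_OK;
    }

    public static void main(String[] args) {
        SingleVmGuard guard = new SingleVmGuard();
        System.out.println("first attempt:  " + guard.createVm());
        System.out.println("second attempt: " + guard.createVm());
    }
}
```

   An exchange is enough here because the guard never needs to block: a caller that loses the race is simply told that a VM already exists.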

   Throughout the whole process of starting the JVM, various JVMTI (Java Virtual Machine Tool Interface) events are triggered to notify listener tools of events related to the state of the JVM.
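   Some of what the initialization steps set up can be observed from plain Java code once startup has finished. The sketch below uses only the standard java.lang.management API; the class name and the map keys are made up for illustration, and the values it prints depend on the JVM it runs on.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that reads a few facts about the running JVM via JMX. */
final class StartupSnapshot {

    static Map<String, String> collect() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        ClassLoadingMXBean classes = ManagementFactory.getClassLoadingMXBean();
        // May be null on a JVM without a JIT compiler (interpreter only).
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();

        Map<String, String> snapshot = new LinkedHashMap<>();
        snapshot.put("vm.name", runtime.getVmName());
        snapshot.put("vm.args", runtime.getInputArguments().toString());
        snapshot.put("loaded.classes", String.valueOf(classes.getLoadedClassCount()));
        snapshot.put("jit.compiler", jit == null ? "none" : jit.getName());
        // Classes loaded by the bootstrap class loader report a null loader.
        snapshot.put("java.lang.String.loader", String.valueOf(String.class.getClassLoader()));
        return snapshot;
    }

    public static void main(String[] args) {
        collect().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

   The loaded class count is already well above zero before any application code runs, which reflects the system classes loaded during startup.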

   Now that the JVM is initialized along with its memory structures, we can invoke the Main class of our application, which is where the lifecycle of the application starts. This happens when the invocation of JNI_CreateJavaVM() returns and execution continues in the JavaMain() method from <OpenJDK_root>/jdk/src/share/bin/java.c. The loading of the Main class (either provided directly or taken from a jar file) is performed by the LoadMainClass() method, which loads the sun.launcher.LauncherHelper class and calls its static checkAndLoadMain() method; that method loads the Main class of the application using the system classloader. The static main() method of the Main class is then called.
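   The load-then-invoke sequence can be approximated in Java with reflection. This is a sketch, not the code of LauncherHelper or of the native launcher: the class names are hypothetical, and the class loader is passed in as a parameter (the real launcher uses the system classloader) so that the example also works when the demo class was loaded by some other loader.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical sketch of "load the Main class, check it, call main()". */
final class MainLoaderSketch {

    /** Tiny stand-in application used only by this example. */
    public static final class DemoApp {
        static volatile String lastArg;

        public static void main(String[] args) {
            lastArg = args.length > 0 ? args[0] : "";
        }
    }

    static void loadAndRunMain(ClassLoader loader, String className, String... args) {
        try {
            Class<?> mainClass = loader.loadClass(className);
            Method main = mainClass.getMethod("main", String[].class);
            // A usable entry point must be static and return void.
            if (!Modifier.isStatic(main.getModifiers()) || main.getReturnType() != void.class) {
                throw new IllegalArgumentException(className + " has no static void main(String[])");
            }
            // The cast passes the whole array as the single String[] argument.
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not run " + className, e);
        }
    }

    public static void main(String[] args) {
        loadAndRunMain(MainLoaderSketch.class.getClassLoader(), DemoApp.class.getName(), "hello");
        System.out.println("DemoApp received: " + DemoApp.lastArg);
    }
}
```

   The native launcher performs the equivalent calls through JNI rather than through java.lang.reflect, but the order of the steps is the same.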
At the end the main application thread is detached so that it appears to the user that the program finishes execution when the main() method finishes. However, this is also the point where uncaught exceptions are handled by the launcher. Finally, the launcher passes control back to the JVM by calling jni_DestroyJavaVM() from <OpenJDK_root>/hotspot/src/share/vm/prims/jni.cpp. You may have noticed that the source structure of the JVM is very self-descriptive:

hotspot directory structure


Note: The share/vm/adlc directory provides an architectural language description compiler that compiles an ADL language used to describe the architecture of a processor. The compiler compiles an ADL file into code which is incorporated into the Optimizing Just In Time Compiler (OJIT) to generate efficient and correct code for the target architecture. The ADL describes three basic different types of architectural features: the instruction set (and associated operands) of the target architecture, the register set of the target architecture along with relevant information for the register allocator and the architecture's pipeline for scheduling purposes. The architecture description file along with some additional target specific oracles, written in C++, represent the principal effort in porting the OJIT to a new target architecture.
Note: Native methods (output from JIT compilation) are also called 'nmethods' in terms of the JVM.
