Dissecting the Java Virtual Machine

 

   The Hotspot JVM is a pretty complex piece of software and this detailed article aims to provide a good starting point and lower the barrier for developers willing to contribute to Hotspot (and related) projects or even build their own customized Java virtual machine from the Hotspot codebase. It is the result of digging through a number of resources on the internet, the Java Virtual Machine Specification (Java SE 7 Edition), a number of research documents written on the JVM and a decent amount of digging through and experimenting with the codebase. Content is simplified as much as possible. In order to be able to try out the tips in this guide you should have a development environment for the Hotspot JVM - see this article for details on how to set one up.

   The source code is all there! One of the great aspects of open source software - you just have to understand how to deal with it :- )) First we will provide an overview of the architecture and data structures used by the Hotspot JVM, then explain how the stack-based approach works in terms of Hotspot, give an overview of the Hotspot codebase and how it maps to the concrete components, and at the end give guidelines on how to debug the Hotspot JVM. 

   Throughout the article we will refer to <OpenJDK_repo> which is the local clone of the root OpenJDK repository (currently http://hg.openjdk.java.net/jdk8/tl) along with its child repositories.

   Let's get started :- ))

 

Virtual Machine Basics

There are two main types of virtual machines for interpreted programming languages - stack-based (like the Hotspot JVM) and register-based (like the Dalvik VM) - and both provide the same basic set of features:

   - compilation of source language into VM specific bytecode;
   - data structures to contain instructions and operands (the data the instructions process);
   - a call stack for function call operations;
   - an ‘Instruction Pointer’ (IP) pointing to the next instruction to execute;
   - a virtual ‘CPU’ – the instruction dispatcher that:
      - fetches the next instruction (addressed by the instruction pointer);
      - decodes the operands;
      - executes the instruction.

   The difference between the two approaches is in the mechanism used for storing and retrieving operands and their results.

Note: The information above is derived from a blog post by Mark Sinnathamby that provides a great comparison between stack-based and register-based virtual machines - see references at the end.

   Traditionally, most virtual machines intended for actual execution are stack-based, a trend that started with Pascal's P-machine and continues today with Java's JVM and Microsoft's .NET environment.
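   To make the stack-based model concrete, here is a minimal sketch of a toy stack machine in plain Java (the opcodes and program layout are made up for illustration - this is not HotSpot code or real JVM bytecode):

import java.util.ArrayDeque;
import java.util.Deque;

// A toy stack machine evaluating "1 + 2": operands are pushed onto a stack
// and instructions pop their inputs and push their results.
public class ToyStackMachine {
    static final int PUSH = 0, ADD = 1, PRINT = 2;

    public static void main(String[] args) {
        int[] program = {PUSH, 1, PUSH, 2, ADD, PRINT};
        Deque<Integer> operandStack = new ArrayDeque<>();
        int ip = 0;                                    // the 'instruction pointer'
        while (ip < program.length) {
            switch (program[ip++]) {                   // fetch and decode
                case PUSH:  operandStack.push(program[ip++]); break;
                case ADD:   operandStack.push(operandStack.pop() + operandStack.pop()); break;
                case PRINT: System.out.println(operandStack.pop()); break;   // prints 3
            }
        }
    }
}

   A register-based machine would instead encode source and destination registers directly into each instruction, avoiding the push/pop traffic at the cost of larger instructions.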


Architecture

   Before you can start experimenting with the Hotspot source code (or any other project in the OpenJDK ecosystem), do not forget to issue the three holy-grail commands to get the latest sources and build the JRE/JDK images (the example is for Linux - for Windows you need to provide additional arguments to some of the commands as specified in the article for building under Windows):

./get_source.sh - get sources from the root and the child repositories
bash configure - run configure script to configure your environment after update of sources
make images - perform an incremental build of the JRE/JDK images using the latest changes in the sources (or "make clean images" to perform a full build)

Note: You can use JDK8 build instructions for more options during build - see references.

   The following diagram describes the high level architecture of a typical Java Virtual Machine implementation (image provided by the JVM architecture article on artima.com - see references):

hotspot architecture     

   Before we see how a Java application is executed and how it makes use of the memory structures of the JVM, the JVM must be started and initialized. If you don't have the source code and you just want to understand the overall architecture without looking into the sources, you can skip this section (and continue from the next section in this article) - but you are not encouraged to do so.

   So it all starts with the <OpenJDK_root>/dev/jdk8_tl/jdk/src/share/bin/main.c file (the java/javaw launcher).

Note: don't get confused that this is in the 'jdk' project - you can think of the 'jdk' as a superset of the 'jre': the JRE image is created from source code in the 'jdk' project. In case you are wondering what exactly enters the JDK and JRE images, you can inspect the following build chain: <OpenJDK_root>/Makefile -> <OpenJDK_root>/make/Main.gmk -> <OpenJDK_root>/jdk/make/BuildJdk.gmk -> <OpenJDK_root>/jdk/make/Images.gmk - the Images.gmk file provides the logic for building the JDK/JRE images from the build output.

   The first thing you will notice is that some of the methods called in the various sections (such as JLI_CmdToArgs(), which is used to parse command line arguments on Windows and is defined in <OpenJDK_root>/jdk/src/windows/bin/cmdtoargs.c) are scattered throughout different files. The reason is that logic for building the JDK on a particular platform might differ, so platform-specific details are extracted into separate source files that are included at build time - this way the compilation output varies from platform to platform and output units for one platform do not pollute the build output for another platform.

   So the main.c file calls the JLI_Launch() method defined in <OpenJDK_root>/dev/jdk8_tl/jdk/src/share/bin/java.c with a number of parameters. In the implementation of JLI_Launch() you can see that some of the methods are not defined in the same file - again because they might be specific to the target platform and are determined at build time. To see a particular implementation of such a method, look at the corresponding java_md*.c file (e.g. java_md_solinux.c for Solaris/Linux) - 'md' presumably stands for 'machine dependent'. The invocation of CreateExecutionEnvironment() prepares the execution environment (e.g. checks whether the JVM library path is valid), and then the LoadJavaVM() method is called to load the JVM library (e.g. jvm.dll for Windows or libjvm.so for Linux) and assign the addresses of library methods to a handle (of type InvocationFunctions) that is used later to create a new JVM instance (by invoking the JNI_CreateJavaVM method). The JVMInit() method is then called to create a separate native thread (other than the one used to invoke the Java program) - it calls the ContinueInNewThread() method, which delegates the creation of the particular VM thread to the ContinueInNewThread0() method (again - specific to the target platform).

   When the JavaMain() method is called in the separate thread, it calls the JNI_CreateJavaVM() method from the JVM library via the InitializeJVM() method. The JNI_CreateJavaVM() method is defined in <OpenJDK_root>/hotspot/src/share/vm/prims/jni.cpp. The first thing JNI_CreateJavaVM() does is to take an atomic lock (provided by an inline assembly call for the specific platform with Atomic::xchg, which uses a variant of the 'xchg' instruction on the particular platform) on behalf of the process that tries to create an instance of the VM - this is done in order to ensure that only one JVM instance is created per process; multiple JVM instances per process are not allowed because the JVM uses global variables. To create the VM instance, JNI_CreateJavaVM() calls the create_vm() method defined in <OpenJDK_root>/hotspot/src/share/vm/runtime/thread.cpp, which is about 400 lines of code and performs a number of activities - some of the most important being the initialization of the memory structures shown in the architecture diagram above. Here is what happens:

   - the output stream module is initialized - it provides utilities for dumping formatted output to the tty (terminal) - this includes standard output of the JVM, the GC log and others (depending on the particular options provided to the JVM);

   - the java launcher properties are processed (such as '-server' or '-client' that specify the type of the JVM - the client JVM provides faster start-up time and fewer runtime optimizations beforehand, while the server JVM has slower start-up time and performs more optimization beforehand - it is better suited for server applications);

   - operating system specific settings are initialized;

   - default system properties are initialized;

   - command line arguments are parsed and additional OS-specific initializations are performed based on the arguments;

   - TLS (thread-local storage) is initialized. Each JVM thread has its own storage space that can be addressed via a so-called TLS index; the index can also be used by other threads if they need to access that thread's thread-local data;

   - agent libraries (provided by the '-agentlib', '-agentpath' and '-Xrun' options) are launched. One notable example of such an agent library is the debugging agent used by jdb, which can be attached to a JVM instance and used by remote debugging clients. In short, agent libraries provide instrumentation capabilities for applications. For more information on Java agents you can read the 'Introduction to Java Agents' article on JavaBeat - see references at the end - or check out the JADE (Java Agent Development Framework) project;

   - global data structures are initialized by calling vm_init_globals() - basic type checking is provided (useful when porting the JVM to a different platform with different sizes of basic types - they must be adjusted to match the Java type system), heap object sizes are initialized, as are the event log, OS synchronization primitives, perfMemory (performance memory) and chunkPool (memory allocator);

   - the Java version of the main thread (an instance of JavaThread) is created and attached to the OS thread. If you open the implementation of JavaThread you will see that it has an 'oop' field that points to a java.lang.Thread instance on the heap (in Hotspot terms an 'oop' is just an object pointer that points to a Java object on the heap from C++ code). At this point we can create Java threads;

   - the Java-Level synchronization subsystem is initialized by calling ObjectMonitor::Initialize();

   - the other global subsystems and structures are initialized - various counters for the JMX management subsystem (an embedded JMX server with default MBeans for managing and monitoring the JVM) and for the runtime, thread and classloading systems are initialized, the bytecode template maps are initialized (the interpreter uses these template mappings to match against the currently executing bytecode), the libzip library is loaded so that it can be used to load JAR (essentially ZIP) files, the bootstrap classpath entries are loaded (such as the ones from rt.jar), the code cache is initialized - it is used to store the output from JIT (Just-In-Time) compilation (in short, JIT compilation is an optimization technique that compiles methods or loop blocks to native code at runtime to speed up the execution of the target method/loop - but more on that later), the Universe is initialized (basically - memory for the heap, the method area and other metadata), the interpreter is initialized (along with the template table for bytecodes), and the method counters are initialized (used to support JIT compilation - method invocation counts can be used to determine "hot spots", i.e. regularly called methods);

   - various system classes are loaded (such as java.lang.String, java.lang.System, java.lang.Thread, java.lang.ThreadGroup, java.lang.reflect.Method, java.lang.ref.Finalizer, java.lang.Class, and the rest of the System classes);

   - the signal dispatcher is initialized (used to propagate OS-level events to appropriate event handlers in the JVM);

   - the JIT compilers are initialized (client/server/shark - the shark JIT compiler uses the LLVM compiler infrastructure to JIT compile Java methods without introducing system-specific code; it is used along with the 'Zero' interpreter-only port of Hotspot);

   - the JMX server agent is created and started;

   - system classes that make use of the new 'invokedynamic' instructions (such as java.lang.invoke.MethodHandle) are initialized;

   - biased locking is initialized (this is an optimization technique for synchronization that allows a thread to become "biased" towards an object, thus eliminating the overhead of releasing/reacquiring the lock each time the same thread locks/unlocks the object - this is useful in case there is no regular lock switching between threads; see the sketch after this list).
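   As a rough illustration (plain Java, not HotSpot code), the pattern biased locking targets is repeated, uncontended synchronization by the same thread, as in the hypothetical loop below; comparing a run with the default settings against one with -XX:-UseBiasedLocking typically shows the cost the optimization removes (exact numbers depend on platform and JDK build):

// Uncontended, repeated locking by a single thread - the case biased locking optimizes.
public class BiasedLockingDemo {
    private static final Object lock = new Object();
    private static long counter;

    public static void main(String[] args) {
        for (int i = 0; i < 100_000_000; i++) {
            synchronized (lock) {    // always the same thread, never contended
                counter++;
            }
        }
        System.out.println(counter);
    }
}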

   Throughout the whole process of starting the JVM, various JVMTI (Java Virtual Machine Tool Interface) events are triggered to notify listening tools of events related to the state of the JVM.

   Now that the JVM is initialized along with its memory structures, we can invoke the Main class of our application, where the lifecycle of our application starts. This happens when the invocation of JNI_CreateJavaVM() returns and execution continues in the JavaMain() method from <OpenJDK_root>/jdk/src/share/bin/java.c. The loading of the Main class (either provided directly or from a jar file) is performed by the LoadMainClass() method, which loads the sun.launcher.LauncherHelper class and calls its static checkAndLoadMain() method, which in turn loads the Main class of the application using the system classloader. The static main() method is then called on the Main class.
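   Conceptually, what checkAndLoadMain() and the subsequent invocation do can be approximated in a few lines of plain Java - a hedged sketch only (the real LauncherHelper also handles JAR manifests, localized error reporting and more):

import java.lang.reflect.Method;

// Sketch of the launcher's last step: load the main class with the system
// class loader and invoke its static main(String[]) method reflectively.
public class MiniLauncher {
    public static void main(String[] args) throws Exception {
        String mainClassName = args[0];                        // e.g. "com.example.Main"
        String[] appArgs = new String[args.length - 1];
        System.arraycopy(args, 1, appArgs, 0, appArgs.length);

        ClassLoader systemLoader = ClassLoader.getSystemClassLoader();
        Class<?> mainClass = Class.forName(mainClassName, false, systemLoader);
        Method mainMethod = mainClass.getMethod("main", String[].class);
        mainMethod.invoke(null, (Object) appArgs);             // static method, 'null' receiver
    }
}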
   At the end, the main application thread is detached so that it appears to the user that the program finishes execution when the main() method finishes. However, this is the point where uncaught exceptions are handled by the launcher. Finally the launcher passes control back to the JVM by calling jni_DestroyJavaVM() from <OpenJDK_root>/hotspot/src/share/vm/prims/jni.cpp. You may have noticed that the source structure of the JVM is very self-descriptive:

hotspot directory structure


Note: The share/vm/adlc directory provides an architecture description language compiler (ADLC) that compiles an ADL file used to describe the architecture of a processor. The compiler turns an ADL file into code which is incorporated into the Optimizing Just In Time Compiler (OJIT) to generate efficient and correct code for the target architecture. The ADL describes three basic types of architectural features: the instruction set (and associated operands) of the target architecture, the register set of the target architecture along with relevant information for the register allocator, and the architecture's pipeline for scheduling purposes. The architecture description file, along with some additional target-specific oracles written in C++, represents the principal effort in porting the OJIT to a new target architecture.
Note: the machine-code versions of Java methods produced by JIT compilation are also called 'nmethods' in JVM terminology.


   Let's get back to the basic architectural diagram and see what happens in a little more detail:

hotspot architecture diagram 1

   Class files are loaded from a particular resource - the file system, JAR archives, over the network etc. For that reason the class loader subsystem is used. As most of you already know, there are three standard classloaders used by the JVM:

   - the bootstrap classloader that loads the core Java libraries located in the <JAVA_HOME>/jre/lib. It is implemented in <OpenJDK_root>/jdk/src/share/native/java/lang/ClassLoader.c;

   - the extensions class loader that loads classes from the JVM extension directories (<JAVA_HOME>/jre/lib/ext or any other directory specified by the java.ext.dirs system property). It is implemented in <OpenJDK_root>/jdk/src/share/classes/sun/misc/Launcher.java
(sun.misc.Launcher$ExtClassLoader);

   - the system class loader that loads code found on java.class.path, which maps to the CLASSPATH environment variable. It is implemented in <OpenJDK_root>/jdk/src/share/classes/sun/misc/Launcher.java (sun.misc.Launcher$AppClassLoader) - see the snippet after this list.
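   A quick way to observe this delegation chain from a running program - the bootstrap loader shows up as null on the Java side, and the exact class names vary between JDK builds:

// Walks the class loader parent chain of this class up to the bootstrap loader.
public class ClassLoaderChain {
    public static void main(String[] args) {
        ClassLoader loader = ClassLoaderChain.class.getClassLoader();
        while (loader != null) {
            System.out.println(loader);    // typically the app (system) loader, then the ext loader
            loader = loader.getParent();
        }
        System.out.println("<bootstrap>"); // e.g. String.class.getClassLoader() returns null
    }
}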

   The structure of a class file is described by the following diagram (a small reader for the fixed header fields is sketched after the field list below):

classfile structure

   The fields are:

   - magic - The magic item supplies the magic number identifying the class file format; it has the value 0xCAFEBABE;

   - minor_version, major_version - the values of the minor_version and major_version items are the minor and major version numbers of this class file;

   - constant_pool_count - the value of the constant_pool_count item is equal to the number of entries in the constant_pool table plus one;

   - constant_pool[] - the constant_pool is a table of structures representing various string constants, class and interface names, field names, and other constants that are referred to within the ClassFile structure and its substructures;

   - access_flags - the value of the access_flags item is a mask of flags used to denote access permissions to and properties of this class or interface;

   - this_class - The value of the this_class item must be a valid index into the
constant_pool table;

   - super_class - for a class, the value of the super_class item either must be zero or must be a valid index into the constant_pool table;

   - interfaces_count - the value of the interfaces_count item gives the number of direct superinterfaces of this class or interface type;

   - interfaces[] - each value in the interfaces array must be a valid index into
the constant_pool table;

   - fields_count - the value of the fields_count item gives the number of field_info structures in the fields table. The field_info structures represent all
fields, both class variables and instance variables, declared by this class or
interface type;

   - fields[] - each value in the fields table must be a field_info structure giving
a complete description of a field in this class or interface;

   - methods_count - the value of the methods_count item gives the number of method_info structures in the methods table;

   - methods[] - each value in the methods table must be a method_info structure giving a complete description of a method in this class or interface;

   - attributes_count - the value of the attributes_count item gives the number of attributes in the attributes table of this class;

   - attributes[] - each value of the attributes table must be an attribute_info structure.
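   Since all of these items are laid out sequentially in big-endian order, the fixed-size items at the start of a class file are easy to read by hand - a minimal sketch (it stops before the constant pool, whose entries are variable-length and require per-tag parsing):

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Reads the fixed-size items at the beginning of a .class file.
public class ClassFileHeader {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            int magic   = in.readInt();              // 0xCAFEBABE
            int minor   = in.readUnsignedShort();    // minor_version
            int major   = in.readUnsignedShort();    // major_version (52 corresponds to Java 8)
            int cpCount = in.readUnsignedShort();    // constant_pool_count (number of entries + 1)
            System.out.printf("magic=%08X, version=%d.%d, constant_pool_count=%d%n",
                              magic, major, minor, cpCount);
        }
    }
}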

So during classloading we have three separate phases:
   - loading: finding and importing the binary data for a type;
   - linking: performing verification, preparation, and (optionally) resolution - the three sub-phases of linking are:
      - verification: ensuring the correctness of the imported type;
      - preparation: allocating memory for class variables and initializing it to default values;
      - resolution: transforming symbolic references from the type into direct references;
   - initialization: invoking Java code that initializes class variables to their proper starting values.

   During classloading you should differentiate between class format checking (which checks the validity of the class file structure during the loading phase) and the bytecode verification phase, which verifies that the bytecode does not contain serious violations (such as uninitialized variables, method calls that do not match the type of object references, violations of data access rules, local variable access violations or operand stack overflow). 
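   The split between loading/linking and initialization is easy to observe from Java code: the three-argument Class.forName() can load and link a class without running its static initializer, which only runs on initialization (first active use, or forName() with initialize=true):

// Demonstrates that loading a class and initializing it are separate steps.
public class InitPhases {
    static class Lazy {
        static { System.out.println("static initializer of Lazy runs"); }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader cl = InitPhases.class.getClassLoader();
        // Loaded and linked, but NOT initialized - nothing is printed yet.
        Class<?> c = Class.forName("InitPhases$Lazy", false, cl);
        System.out.println("class object obtained: " + c.getName());
        // forName with initialize=true (or any first active use) runs the static initializer.
        Class.forName("InitPhases$Lazy", true, cl);
    }
}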


   Of course classloading and program execution make use of the JVM data structures as noted in the following diagram:

hotspot architecture diagram 2

   The Java Virtual Machine defines various run-time data areas that are used during execution of a program. Some of these data areas are created on Java Virtual Machine start-up and are destroyed only when the Java Virtual Machine exits as we already saw in the short code walk-through at the beginning of the article.
   Other data areas are per thread. Per-thread data areas are created when a thread is created and destroyed when the thread exits. The following diagram provides an overview of these data structures:

 

thread diagram

   Each Java Virtual Machine thread has its own pc (program counter) register. At any point, each Java Virtual Machine thread is executing the code of a single method, namely the current method for that thread. If that method is not native, the pc register contains the address of the Java Virtual Machine instruction currently being executed. If the method currently being executed by the thread is native, the value of the Java Virtual Machine's pc register is undefined. The Java Virtual Machine's pc register is wide enough to hold a return address or a native pointer on the specific platform. Each thread also has its own stack used to store stack frames for the currently executing methods - when a new method is entered a new stack frame is pushed onto the stack, and when a method returns its stack frame is popped. The following diagram provides an overview of a stack frame:

 

stack frame

Each frame contains:

   - local variables array;
   - return value;
   - operand stack;
   - reference to runtime constant pool for class of the current method.

   As the JVM is stack-based, the operand stack is used to hold bytecode instruction operands. Here is a simple example:

bytecode sample
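   For instance, a trivial addition method compiles to a handful of instructions that work purely on the operand stack; the mnemonics that javap -c typically prints for it are shown as comments (exact output may vary slightly between compilers):

public class OperandStackDemo {
    // javap -c OperandStackDemo shows roughly:
    //   iload_1      // push local variable 1 (a) onto the operand stack
    //   iload_2      // push local variable 2 (b) onto the operand stack
    //   iadd         // pop two ints, push their sum
    //   ireturn      // pop the result and return it to the caller
    int add(int a, int b) {
        return a + b;
    }
}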

   The class data for the loaded Java class is stored in the following structure:

 

class data

   It contains a runtime constant pool that holds constants (some of which, when resolved later, become references) referring to various parts of the class, and the bytecode for the methods of the class (the method code).

   These class data items are stored in non-heap memory (also called PermGen or permanent generation) along with the code cache (which stores the machine code produced by JIT compilation) and the string pool (the pool of Java strings), as shown in the following diagram:

 

non heap memory

   The heap is used to allocate class instances and arrays at runtime. Arrays and objects can never be stored on the stack because a frame is not designed to change in size after it has been created. The frame only stores references that point to objects or arrays on the heap. Unlike primitive variables and references in the local variable array (in each frame) objects are always stored on the heap so they are not removed when a method ends. Instead objects are only removed by the garbage collector. To support garbage collection the heap is divided into generations:

   - young generation - often split between Eden and Survivor spaces - stores short-lived objects;

   - old generation (also called tenured generation) - stores longer-lived objects.

   The reason why generational garbage collection is very efficient is that most Java objects are typically short-lived (e.g. allocated and used only in a particular method), which allows garbage collection to clean them up quickly. Longer-lived objects are more difficult to clean up and for them safepointing is required - safepointing is a mechanism in the JVM that stops the executing threads until an operation completes (such as garbage collection in this case). Such operations (also called "Stop-The-World" or STW) are typically slow. Safepointing works by polling - the VM thread poisons/un-poisons a polling page and threads "poll" it at particular stages in order to check whether a safepoint has been triggered. At a safepoint threads cannot modify the Java heap or stack. Other reasons for using safepoints are deoptimization (returning JIT-compiled code to interpreted bytecode in case the JVM decides at some point that the JIT-compiled code does not provide an optimization after all, or when a new class is introduced in the class hierarchy of the class of the JIT-compiled bytecode), Java thread suspension and JVM Tool Interface operations (e.g. heap dumps).
   Each thread has its own thread-local allocation buffer (TLAB) in which it allocates objects. There are different strategies for garbage collection of objects from the heap spaces:

   - serial - performed by a single GC thread while all application threads are stopped;
   - concurrent - performed while application threads are executing (mostly without safepointing);
   - parallel - performed by multiple GC threads working in parallel while application threads are stopped.

   In this manner we can have combinations for a garbage collector (such as concurrent only, parallel only or both concurrent and parallel). Inside the heap objects have the following structure:

heap memory data

   The field "klass" (a term for the internal of a Java class in JVM) refers to a pointer to the metadata of the object’s class. The field "vtable" is a virtual dispatch table with the methods of the class instances. The "mark word" is the object's header that contains the following fields:

   - identity hash code;
   - age of the object;
   - lock record address (lock records track objects locked by currently executing methods);
   - monitor address (address of the object's wait queue);
   - state (unlocked, light-weight locked, heavy-weight locked, marked for GC);
   - biased / biasable (includes other fields such as thread ID).
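   The identity hash code kept in the mark word is what the default Object.hashCode() returns and what System.identityHashCode() always returns, regardless of hashCode() overrides - a small demonstration:

// The identity hash is computed lazily and then stored in the object header.
public class IdentityHashDemo {
    public static void main(String[] args) {
        Object plain = new Object();
        // For a class that does not override hashCode() the two values agree.
        System.out.println(plain.hashCode() == System.identityHashCode(plain)); // true

        String s = "hello";
        // String overrides hashCode(), so its content hash and identity hash normally differ.
        System.out.println(s.hashCode());
        System.out.println(System.identityHashCode(s));
    }
}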


   The last stage is the actual execution of the loaded Main class, starting from the static main() method. This is performed by the execution engine, as outlined in the following diagram:

hotspot architecture diagram 3

   Ignoring exceptions, the inner loop of a Java Virtual Machine interpreter is effectively:

do {
    atomically calculate pc and fetch opcode at pc;
    if (operands) fetch operands;
    execute the action for the opcode;
} while (there is more to do);

   However we can have different execution techniques:

   - interpreting - standard bytecode execution - each bytecode instruction is mapped to a pre-generated snippet of machine code (the template interpreter) that is executed;

   - just-in-time (JIT) compilation - compiles bytecode of methods/loops to native code - methods/loops that are to be JIT-compiled are determined either statically or dynamically during program execution;

   - adaptive optimization (determines "hot spots" by monitoring execution) - can trigger JIT compilation dynamically during program execution.

   Additionally, JIT-compiled methods can be "deoptimized" as described earlier. To support this, a mechanism called "On-Stack Replacement" (OSR) is used to transfer control back and forth between bytecode and native code execution. JIT compilation is triggered asynchronously by counter overflow for a method/loop (the interpreter counts method entries and loop back-branches). Apart from the generated code, the compiler also produces relocation info (transferred on the next method entry). In case JIT-compiled code calls not-yet-JIT-compiled code, control is transferred to the interpreter. The nmethods (remember? the structures that contain the machine code for a JIT-compiled method/loop) produced by the JIT compiler also contain per-safepoint oopmaps (called "GC maps" when considering GC-related safepoints) that describe the locations (in registers or on the stack) of the object pointers that are live at the safepoint.
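   You can watch this counter-driven compilation on a small program using the -XX:+PrintCompilation flag (listed again in the debugging section below); a hypothetical hot method is enough to trigger it - once the loop has run enough iterations, lines mentioning the compiled method (and typically an on-stack-replacement compilation of the loop in main) appear in the output:

// Run with: java -XX:+PrintCompilation HotMethod
public class HotMethod {
    static int compute(int x) {
        return x * 31 + 7;
    }

    public static void main(String[] args) {
        long sum = 0;
        for (int i = 0; i < 50_000_000; i++) {
            sum += compute(i);    // invocation/back-branch counters overflow -> JIT compilation
        }
        System.out.println(sum);
    }
}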

   Here is how JIT compilation works in general:

   1) the bytecode is turned into a graph;
   2) the graph is turned into a linear sequence of operations that manipulate an infinite set of virtual registers (each node places its result in a virtual register);
   3) physical registers are allocated for the virtual registers (the program stack may be used in case the virtual registers exceed the physical registers) - e.g. the C2 server JIT compiler uses a Chaitin-Briggs graph-coloring algorithm to map virtual registers to physical ones, while the C1 client compiler uses a linear scan allocator;
   4) code for each operation is generated using its allocated registers.

   An important point is that in many programming languages the programmer has the illusion of allocating arbitrarily many variables. However, during compilation the compiler must decide how to allocate these variables to a small, finite set of registers. In compiler optimization, register allocation is the process of assigning a large number of target program variables onto a small number of CPU registers. Register allocation can happen over a basic block (local register allocation), over a whole function/procedure (global register allocation), or across function boundaries traversed via the call graph (interprocedural register allocation). When done per function/procedure, the calling convention may require insertion of save/restore code around each call site.

   The compiler can construct a graph such that every vertex represents a unique variable in the program. Interference edges connect pairs of vertices which are live at the same time, and preference edges connect pairs of vertices which are involved in move instructions. Register allocation can then be reduced to the problem of K-coloring the resulting graph, where K is the number of registers available on the target architecture. No two vertices sharing an interference edge may be assigned the same color, and vertices sharing a preference edge should be assigned the same color if possible. Some of the vertices may be precolored to begin with, representing variables which must be kept in certain registers due to calling conventions or communication between modules. As graph coloring in general is NP-complete, so is register allocation.
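   A full Chaitin-Briggs allocator is well beyond the scope of this article, but the core idea of coloring an interference graph can be sketched in a few lines. The graph below is hypothetical and the colorer is naive greedy coloring - real allocators add spilling, coalescing and precolored nodes:

import java.util.*;

// Naive graph coloring: give each virtual register ("vreg") the lowest color
// (physical register number) not already used by a colored neighbour.
public class GreedyColoring {
    public static void main(String[] args) {
        // Hypothetical interference graph: an edge means two vregs are live at the same time.
        Map<String, List<String>> interferes = new LinkedHashMap<>();
        interferes.put("v1", Arrays.asList("v2", "v3"));
        interferes.put("v2", Arrays.asList("v1", "v3"));
        interferes.put("v3", Arrays.asList("v1", "v2", "v4"));
        interferes.put("v4", Arrays.asList("v3"));

        Map<String, Integer> color = new LinkedHashMap<>();
        for (String vreg : interferes.keySet()) {
            Set<Integer> used = new HashSet<>();
            for (String neighbour : interferes.get(vreg)) {
                Integer c = color.get(neighbour);
                if (c != null) used.add(c);
            }
            int c = 0;
            while (used.contains(c)) c++;       // smallest free "physical register"
            color.put(vreg, c);
        }
        System.out.println(color);              // {v1=0, v2=1, v3=2, v4=0}
    }
}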


Debugging Hotspot

This section only scratches the surface by providing insights on how to debug the Hotspot codebase. Here are several techniques:

   - using JVM flags to dump debugging information;
   - using various existing tools (or writing your own depending on what you want to debug) - the existing tools are in the <OpenJDK_root>/hotspot/src/share/tools directory;
   - debugging a sample Java application (with jdb) that exercises the JVM feature you want to debug - inspecting the application's behaviour will allow you to debug the targeted JVM feature itself;
   - using JVMTI and other serviceability utilities for inspecting JVM behaviour;
   - building a debug version of the JVM that will unlock additional JVM flags (see "HotSpot Internals: Explore and Debug the VM at the OS Level" in the references);
   - dumping debug information to the standard output - of course, the crudest but most reliable method :- ))).

Using JVM flags

   To build a debug version of the JVM you may use a separate Hotspot target (the latest JDK image must already be built unless you use an official build from Oracle - it is used for bootstrapping the debug build). Here is an example for 64-bit Windows (with 64-bit Cygwin):

ALT_BOOTDIR=/cygdrive/d/projects/OpenJDK/dev/jdk8_tl/build/windows-x86_64-normal-server-release/jdk LP64=1 STRIP_POLICY=no_strip make debug

   You should make sure that the proper Visual Studio variables are also provided on the PATH if you are building under Windows. For example (change paths accordingly on your Windows system):

export PATH="/cygdrive/d/software/Microsoft Visual Studio 10.0/VC/bin/amd64":$PATH

   Also make sure that the LIB and INCLUDE variables are set properly (look into vcvars.exe/vcvars64.exe in the Visual Studio directories to see how to set them). For example (change paths accordingly on your Windows system):

export LIB=D:\software\Microsoft Visual Studio 10.0\VC\LIB\amd64;D:\software\Microsoft Visual Studio 10.0\VC\ATLMFC\LIB\amd64;C:\Program Files (x86)\Microsoft SDKs\Windows \v7.0A\lib\x64;C:\Program Files\SQLXML 4.0\bin;
export INCLUDE=D:\software\Microsoft Visual Studio 10.0\VC\INCLUDE;D:\software\Microsoft Visual Studio 10.0\VC\ATLMFC\INCLUDE;C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\include;

   Finally, issue the following to copy the debug JVM (provided as a library) over the existing Hotspot JVM library (again, change paths accordingly - for Linux the JVM library is the libjvm.so file):

cd <OpenJDK_root>
cp hotspot/build/windows/windows_amd64_compiler2/debug/jvm.dll build/windows-x86_64-normal-server-release/jdk/bin/server/jvm.dll

   To verify that the debug build is working, check the version of the JVM using the Java launcher:

cd build/windows-x86_64-normal-server-release/jdk/bin
./java.exe -version

   You should see output similar to the following:

openjdk version "1.8.0-internal"
OpenJDK Runtime Environment (build 1.8.0-internal-martin_2013_12_23_16_13-b00)
OpenJDK 64-Bit Server VM (build 25.0-b63-internal-debug, mixed mode)

   You can now see all available debug options by issuing:

./java.exe -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -Xprintflags

   or

./java.exe -XX:+UnlockDiagnosticVMOptions -XX:+UnlockExperimentalVMOptions -XX:+PrintFlagsWithComments

   The -XX:+UnlockDiagnosticVMOptions option enables VM diagnostic options.
Note: you may need some time to understand some of the options based on your particular use case.

   Examples (some of the options work only when -XX:+UnlockDiagnosticVMOptions is specified):

./java.exe -XX:+CountBytecodes <program> - prints the number of bytecodes executed by the JVM
./java.exe -XX:+PrintBytecodeHistogram <program> - prints statistics on the number of bytecode instructions executed for each type of instruction
./java.exe -XX:+LogCompilation <program> - can emit a structured XML log of compilation-related activity during a run of the JVM
./java.exe -XX:LogFile=/<path_to_log>/ <program> - prints logging information to a log file
./java.exe -XX:+TraceClassLoading <program> - print identities of loaded classes
./java.exe -XX:+TraceClassUnloading <program> - print identities of unloaded classes
./java.exe -XX:+PrintGCDetails <program> - prints details for each GC (including generation sizes such as the perm gen)
./java.exe -XX:+UseSerialGC <program> - Serial GC (Serial-young Serial-old)
./java.exe -XX:+UseParallelGC <program> - Parallel GC (Parallel-young Serial-old)
./java.exe -XX:+UseParallelOldGC <program> - Parallel Compacting (Parallel-young Parallel-old)
./java.exe -XX:+UseConcMarkSweepGC <program> - Concurrent Mark Sweep GC (Parallel-old CMS-old)
./java.exe -XX:+PrintAssembly <program> - print assembly code for bytecoded and native methods
./java.exe -XX:+PrintOptoAssembly <program> - (C2 only)
./java.exe -XX:+PrintNMethods <program> - print nmethods as they are generated
./java.exe -XX:+PrintNativeNMethods <program> - print native method wrappers as they are generated
./java.exe -XX:+PrintSignatureHandlers <program> - print native method signature handlers
./java.exe -XX:+PrintAdapterHandlers <program> - print adapters (i2c, c2i) as they are generated
./java.exe -XX:+PrintStubCode <program> - print stubs: deopt, uncommon trap, exception, safepoint, runtime support
./java.exe -XX:+PrintCompilation <program> - lets you know if any methods are compiled by printing information about compiled methods
./java.exe -XX:+PrintInlining <program> - prints information about inlining decisions
./java.exe -XX:CompileCommand=... - controls compilation policy

   You can print the assembly code generated by the template interpreter (described earlier in the article) for each bytecode instruction by using the -XX:+PrintInterpreter option - however you will need to install a disassembler plug-in for the JVM (see the article on the PrintAssembly option from the "Hotspot Internals" wiki - see references).

   For more JVM flags and command line arguments you can look into the following files:

   - <OpenJDK_root>/hotspot/src/share/vm/runtime/globals.hpp - global options;
   - <OpenJDK_root>/hotspot/src/share/vm/gc_implementation/g1/g1_globals.hpp - global options specific to the G1 (garbage-first) server garbage collector;
   - <OpenJDK_root>/hotspot/src/share/vm/runtime/arguments.hpp - global arguments.

Using Existing Tools

IdealGraphVisualizer tool

   The "ideal graph" visualizer is a tool developed to help examine the IR (intermediate representation) from the C2 JIT compiler (refered as "ideal graph"). The tool is located under the <OpenJDK_root>/hotspot/src/share/tools/IdealGraphVisualizer directory.

   The JVM support is controlled by the flag -XX:PrintIdealGraphLevel=# where # is:

   0: no output, the default
   1: dumps the graph after parsing, before matching, and the final code; also dumps graphs for failed compiles, if available
   2: more detail, including after loop opts
   3: even more detail
   4: prints the graph after parsing every bytecode (very slow)

   By default the JVM expects that it will connect to a visualizer on the local host on port 4444. This can be configured using the options -XX:PrintIdealGraphAddress= and -XX:PrintIdealGraphPort=. PrintIdealGraphAddress can actually be a hostname.

   Alternatively the output can be sent to a file using -XX:PrintIdealGraphFile=<filename>. Each compiler thread will get its own file, with unique names generated by adding a number to the provided file name.

LogCompilation tool

   The LogCompilation tool can be used to parse the output of the -XX:+LogCompilation switch, which logs the output from JIT compilation (and is not very readable on its own). It is located under the <OpenJDK_root>/hotspot/src/share/tools/LogCompilation directory.

   Its main purpose is to recreate output similar to the -XX:+PrintCompilation -XX:+PrintInlining output from a debug JVM. It requires a 1.5 JDK to build and simply typing make should build it. It produces a jar file, logc.jar, that can be run on the hotspot.log produced by LogCompilation output like this:

java -jar logc.jar hotspot.log

   For more details see the article on the LogCompilation tool in the Hotspot internals wiki.

hsdis tool

   The hsdis tool is a disassembler used by Hotspot for debugging purposes. For more details see: <OpenJDK_root>/hotspot/src/share/tools/hsdis/README.txt

C1visualizer

   The C1 visualizer tool is used to visualize the work of the C1 client JIT compiler. For more details read the user guide from the c1visualizer project repository (see references).

jmap

   The jmap utility prints shared object memory maps or heap memory details of a given process, core file or remote debug server. See the Oracle documentation for more details on the jmap utility.

jconsole

   You can use the JConsole JMX client to connect to the default JMX agent running in the JVM to display various statistics on the running JVM instance.
You may need to specify -Dcom.sun.management.jmxremote when starting the application.

Using serviceability utilities

   Yet another option for debugging the Hotspot JVM is to use serviceability utilities that allow observing JVM operations by other Java processes. Here is a list of the various utility implementations throughout the JVM codebase that you can inspect depending on your use case:

The Serviceability Agent (SA):
   hotspot/agent/
   hotspot/src/share/vm/runtime/vmStructs.hpp
   hotspot/src/share/vm/runtime/vmStructs.cpp

jvmstat performance counters:
   hotspot/src/share/vm/prims/perf.cpp
   hotspot/src/share/vm/runtime/perfMemory.cpp
   hotspot/src/share/vm/runtime/perfData.cpp
   hotspot/src/share/vm/runtime/statSampler.cpp
   hotspot/src/share/vm/services/*Service.cpp
   hotspot/src/os/solaris/vm/perfMemory_solaris.cpp
   hotspot/src/os/linux/vm/perfMemory_linux.cpp
   hotspot/src/os/win32/vm/perfMemory_win32.cpp

The Java Virtual Machine Tool Interface (JVMTI):
   hotspot/src/share/vm/prims/jvmtiGen.java
   hotspot/src/share/vm/prims/jvmti.xml

The Monitoring and Management interface:
   hotspot/src/share/vm/services/

Dynamic Attach:
   hotspot/src/share/vm/services/attachListener.*
   hotspot/src/os/linux/vm/attachListener_linux.cpp
   hotspot/src/os/solaris/vm/attachListener_solaris.cpp
   hotspot/src/os/win32/vm/attachListener_win32.cpp

DTrace:
   hotspot/src/os/solaris/dtrace/
   hotspot/build/solaris/makefiles/dtrace.make 

pstack support:
   hotspot/src/os/solaris/dtrace/

   You can read more about the above utilities also from the Hotspot documentation - see references.


Benchmarking your JVM implementation

   There are a number of standard benchmarks that can be used to test the performance of your JVM implementation:

   - for client benchmarks you can use http://www.spec.org/jvm2008/
   - for server benchmarks you can use http://www.spec.org/jbb2013/
   - for numerical computations you can use http://math.nist.gov/scimark2/

You can also use Caliper (https://code.google.com/p/caliper/) to write your own microbenchmarks for testing the performance of small bits of Java code on your JVM implementation.
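   If you just want a rough number without a harness, a manual timing loop can serve as a starting point - but note that warm-up, JIT compilation and dead-code elimination make such figures unreliable compared to the frameworks above; this is only a hedged sketch:

// Very rough manual micro-measurement; a real harness handles warm-up,
// statistical variance and dead-code elimination much better.
public class RoughBenchmark {
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i * 31L;
        return sum;
    }

    public static void main(String[] args) {
        for (int w = 0; w < 5; w++) work(10_000_000);   // warm-up so the JIT can kick in

        long start = System.nanoTime();
        long result = work(10_000_000);
        long elapsedNanos = System.nanoTime() - start;
        System.out.println("result=" + result + ", elapsed ns=" + elapsedNanos);
    }
}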

Conclusion  

   I hope this article gives a decent introduction to the Hotspot codebase and can serve as a reference for understanding how the JVM works and even building your own JVM, tuning a JVM or debugging a JVM implementation. Any suggestions for improvement are more than welcome.

References

1) The Java Virtual Machine Specification (Java SE 7 Edition)
http://docs.oracle.com/javase/specs/jvms/se7/html/

2) The Architecture of the Java Virtual Machine
http://www.artima.com/insidejvm/ed2/jvm2.html

3) JVM Internals
http://blog.jamesdbloom.com/JVMInternals.html

4) Hotspot JVM tuning
http://www.slideshare.net/giladgaron/hotspot-jvm-tuning

5) Java Hotspot Virtual Machine, FOSDEM 2007
http://openjdk.java.net/groups/hotspot/docs/FOSDEM-2007-HotSpot.pdf

6) Learn about JVM internals - what does the JVM do?
http://www.youtube.com/watch?v=UwB0OSmkOtQ&list=PL1464F2747F1E66FA

7) Hotspot group docs
http://openjdk.java.net/groups/hotspot/

8) The Implementation of Lua 5.0
http://www.lua.org/doc/jucs05.pdf

9) Mani Sarkar's collection of Hotspot links
https://gist.github.com/neomatrix369/5743225

10) Synopsis of articles & videos on Performance tuning, JVM, GC in Java, Mechanical Sympathy, et al
http://www.javaadvent.com/2013/12/part-1-of-3-synopsis-of-articles-videos.html
http://www.javaadvent.com/2013/12/part-2-of-3-synopsis-of-articles-videos.html
http://www.javaadvent.com/2013/12/part-3-of-3-synopsis-of-articles-videos.html

11) Dissecting the Hotspot JVM at java2days 2013
http://nosoftskills.com/2013/12/dissecting-the-hotspot-vm-at-java2days/

12) Hacking Hotspot in Eclipse
http://neomatrix369.wordpress.com/2013/03/12/hotspot-is-in-focus-again-aka-hacking-hotspot-in-eclipse-juno-under-ubuntu-12-04/

13) JVM research (Sun/Oracle labs)
http://www.ssw.uni-linz.ac.at/Research/Projects/JVM/
https://digitalcollections.anu.edu.au/handle/1885/9053
https://www.cs.tcd.ie/publications/tech-reports/reports.07/TCD-CS-2007-49.pdf

14) Stack based vs Register based Virtual Machine Architecture, and the Dalvik VM
http://markfaction.wordpress.com/2012/07/15/stack-based-vs-register-based-virtual-machine-architecture-and-the-dalvik-vm/

15) Hotspot Overview
http://www.cs.princeton.edu/picasso/mats/HotspotOverview.pdf

16) Hotspot Internals
https://wikis.oracle.com/display/HotSpotInternals/Home

17) JDK8 build instructions (a complete rewrite of the JDK7 instructions)
http://hg.openjdk.java.net/jdk8/build/raw-file/tip/README-builds.html

18) Design of the Java HotSpot™ Client Compiler for Java 6
http://www.stanford.edu/class/cs343/resources/java-hotspot.pdf

19) Register allocation
http://en.wikipedia.org/wiki/Register_allocation

20) How to JIT – an introduction (awesome post that gives insights into how JIT-compiled code is executed at runtime)
http://eli.thegreenplace.net/2013/11/05/how-to-jit-an-introduction/

21) A brief history of Just-in-Time
http://web.csie.cgu.edu.tw/~jhchen/course/PL2/A%20brief%20history%20of%20just-in-time.pdf

22) Runtime code generation with JVM and CLR
http://www.cs.helsinki.fi/u/vihavain/k12/compiler_project/project/Runtime_Code_Generation_with_JVM_and_CLR.pdf

23) Optimizing ML with Run-Time Code Generation
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.8218&rep=rep1&type=pdf

24) GCC Inline Assembly HOWTO
http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html

25) Introduction to Java Agents
http://www.javabeat.net/introduction-to-java-agents

26) ZeroSharkFaq for IcedTea
http://icedtea.classpath.org/wiki/ZeroSharkFaq

27) CrossCompileFaq for IcedTea
http://icedtea.classpath.org/wiki/CrossCompileFaq

28) The Java HotSpot Server Compiler, Proceedings of the Java Virtual Machine Research and Technology Symposium (JVM '01)
https://www.usenix.org/legacy/events/jvm01/full_papers/paleczny/paleczny.pdf

29) HotSpot Internals: Explore and Debug the VM at the OS Level
http://openjdkpower.osuosl.org/OpenJDK/JavaOne2013_HS/javaone2013_hs.html#(1)

30) c1visualizer project
https://java.net/projects/c1visualizer/

 

 

 
