Dissecting the Java Virtual Machine - Architecture - part 3
- Published on Sunday, 29 December 2013 13:52
Class loading and program execution both rely on the JVM's run-time data structures, as shown in the following diagram:
The Java Virtual Machine defines various run-time data areas that are used during execution of a program. Some of these data areas are created on Java Virtual Machine start-up and are destroyed only when the Java Virtual Machine exits as we already saw in the short code walk-through at the beginning of the article.
Other data areas are per thread. Per-thread data areas are created when a thread is created and destroyed when the thread exits. The following diagram provides an overview of these data structures:
Each Java Virtual Machine thread has its own pc (program counter) register. At any point, each Java Virtual Machine thread is executing the code of a single method, namely the current method for that thread. If that method is not native, the pc register contains the address of the Java Virtual Machine instruction currently being executed. If the method currently being executed by the thread is native, the value of the pc register is undefined. The pc register is wide enough to hold a return address or a native pointer on the specific platform.
Each thread also has its own stack, which stores one stack frame per method invocation in progress: when a method is entered, a new frame is pushed onto the stack, and when the method returns, its frame is popped. The following diagram provides an overview of a stack frame:
Each frame contains:
- local variables array;
- return value;
- operand stack;
- reference to runtime constant pool for class of the current method.
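The pushing and popping of frames can be observed from plain Java code. The sketch below is only an illustration: the class and method names are made up for this example, and it counts the frames reported by `Thread.getStackTrace()` rather than inspecting the JVM stack directly.

```java
final class FrameDepthDemo {

    // Number of frames currently reported for the calling thread.
    static int currentDepth() {
        return Thread.currentThread().getStackTrace().length;
    }

    // Every recursive call pushes one more frame before the depth is measured.
    static int depthAfterRecursion(int levels) {
        if (levels == 0) {
            return currentDepth();
        }
        return depthAfterRecursion(levels - 1);
    }

    public static void main(String[] args) {
        int shallow = depthAfterRecursion(0);
        int deep = depthAfterRecursion(5);
        // Five extra invocations are in progress at the deeper measurement.
        System.out.println("extra frames: " + (deep - shallow));
    }
}
```

Both measurements include the same fixed overhead (the frames of `getStackTrace` and `currentDepth` themselves), so only the difference between them is meaningful.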
Because the JVM is a stack-based machine, bytecode instructions take their operands from the operand stack and push their results back onto it. Here is a simple example:
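As a rough sketch of how the operand stack is used, consider a static method that adds two ints. The comments show the bytecode that `javac` typically emits for it, and `simulateAdd` replays those steps by hand with an explicit stack. The class and method names are invented for this illustration, and the simulation is a toy model, not how a real JVM is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class OperandStackDemo {

    // For a static method such as
    //     static int add(int a, int b) { return a + b; }
    // javac emits bytecode along these lines:
    //     iload_0   // push local variable 0 (a)
    //     iload_1   // push local variable 1 (b)
    //     iadd      // pop two ints, push their sum
    //     ireturn   // pop the result and hand it to the caller
    static int simulateAdd(int a, int b) {
        int[] locals = {a, b};                        // local variables array
        Deque<Integer> operandStack = new ArrayDeque<>();

        operandStack.push(locals[0]);                 // iload_0
        operandStack.push(locals[1]);                 // iload_1

        int right = operandStack.pop();               // iadd: pop both operands...
        int left = operandStack.pop();
        operandStack.push(left + right);              // ...and push the sum

        return operandStack.pop();                    // ireturn
    }

    public static void main(String[] args) {
        System.out.println(simulateAdd(2, 3));        // prints 5
    }
}
```

The real bytecode of any compiled class can be inspected with `javap -c`.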
The class data for the loaded Java class is stored in the following structure:
It contains a runtime constant pool, which holds constants (some of which are later resolved into references to various parts of the class), and the bytecode for the methods of the class (the method code).
These class data items are stored in non-heap memory (called PermGen, or permanent generation, in HotSpot up to Java 7; Java 8 replaced it with Metaspace) along with the code cache (which holds the machine code produced by the JIT compiler) and the string pool (the pool of interned Java strings, which HotSpot moved to the heap in Java 7), as shown in the following diagram:
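The effect of the string pool is visible from Java code through reference comparison. The following minimal sketch (class and method names are illustrative only) relies on the language guarantee that string literals are interned:

```java
final class StringPoolDemo {

    // Two identical literals resolve to the same pooled String instance.
    static boolean literalsShareInstance() {
        String a = "jvm";
        String b = "jvm";
        return a == b;
    }

    // new String(...) always allocates a separate object.
    static boolean newStringIsDistinct() {
        String literal = "jvm";
        String copy = new String("jvm");
        return literal != copy;
    }

    // intern() returns the pooled instance with the same contents.
    static boolean internReturnsPooled() {
        String literal = "jvm";
        String copy = new String("jvm");
        return copy.intern() == literal;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());  // true
        System.out.println(newStringIsDistinct());    // true
        System.out.println(internReturnsPooled());    // true
    }
}
```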
The heap is used to allocate class instances and arrays at run time. Arrays and objects are never stored on the stack, because a frame is not designed to change in size after it has been created; the frame only stores references that point to objects or arrays on the heap. Unlike the primitive values and references in a frame's local variable array, objects always live on the heap, so they are not removed when a method returns. Instead, objects are removed only by the garbage collector. To support garbage collection, the heap is divided into generations:
- young generation - often split between Eden and Survivor spaces - stores short-lived objects;
- old generation (also called tenured generation) - stores longer-lived objects.
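The heap and non-heap areas of a running JVM can be listed through the standard `java.lang.management` API. This is a small sketch with an illustrative class name; the pool names it prints depend on the JVM version and on the garbage collector in use, so no particular output should be expected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

final class MemoryPoolsDemo {

    // Collects the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Heap pools usually correspond to the generations (Eden, Survivor, old).
        System.out.println("heap:     " + poolNames(MemoryType.HEAP));
        // Non-heap pools usually include class metadata and the code cache.
        System.out.println("non-heap: " + poolNames(MemoryType.NON_HEAP));
    }
}
```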
Generational garbage collection is efficient because most Java objects are typically short-lived (e.g. allocated and used only within a particular method), which allows the garbage collector to reclaim them quickly. Longer-living objects are more expensive to clean up, and collecting them relies on safepointing: a mechanism in the JVM that stops executing threads until an operation (such as garbage collection in this case) completes. Such operations (also called "Stop-The-World" or STW) are typically slow. Safepointing works by polling: the VM thread poisons and un-poisons a polling page, and application threads poll it at particular points in order to check whether a safepoint has been requested. At a safepoint, threads cannot modify the Java heap or stack. Other reasons for using safepoints are deoptimization (reverting JIT-compiled code to interpreted bytecode when the JVM decides that the compiled code provides no benefit, or when a new class is introduced into the class hierarchy that the compiled code assumed), Java thread suspension, and JVM Tool Interface operations (e.g. heap dumps).
Each thread has its own thread-local allocation buffer (TLAB), in which it allocates new objects. Garbage collection of the objects in the heap spaces can follow different strategies:
- serial - performed by a single garbage collection thread while application threads are stopped;
- concurrent - performed while application threads are executing (mostly without safepointing);
- parallel - performed by multiple garbage collection threads at the same time.
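Which collectors a particular JVM is actually running can be checked through the standard `GarbageCollectorMXBean` interface. The sketch below uses an illustrative class name; the names and counts it reports depend on the JVM and on start-up flags such as `-XX:+UseSerialGC` or `-XX:+UseParallelGC`.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class CollectorsDemo {

    // Names of the garbage collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() may return -1 if undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
    }
}
```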
These strategies can be combined within one garbage collector (concurrent only, parallel only, or both concurrent and parallel). Inside the heap, objects have the following structure:
The field "klass" (the term for the internal representation of a Java class in the JVM) is a pointer to the metadata of the object's class. The field "vtable" is a virtual dispatch table with the methods of the class instances. The "mark word" is the object's header, which contains the following fields:
- identity hash code;
- age of the object;
- lock record address (lock records track objects locked by currently executing methods);
- monitor address (address of the object's wait queue);
- state (unlocked, light-weight locked, heavy-weight locked, marked for GC);
- biased / biasable (includes other fields such as thread ID).
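The mark word itself is not accessible from Java code, but two of the properties listed above surface through standard APIs: the identity hash code via `System.identityHashCode` and the lock state via `synchronized` and `Thread.holdsLock`. The following is a minimal sketch with illustrative names; it shows only the observable behaviour, not the header layout.

```java
final class MarkWordDemo {

    // The identity hash code stays the same for the lifetime of the object,
    // even if the garbage collector moves the object in memory.
    static boolean identityHashIsStable(Object o) {
        int first = System.identityHashCode(o);
        System.gc();  // only a hint; the JVM may or may not collect here
        return first == System.identityHashCode(o);
    }

    // Inside a synchronized block the current thread owns the object's monitor.
    static boolean holdsMonitorInsideSynchronized(Object o) {
        synchronized (o) {
            return Thread.holdsLock(o);
        }
    }

    // Outside any synchronized block on o, the monitor is not held by this thread.
    static boolean holdsMonitorOutside(Object o) {
        return Thread.holdsLock(o);
    }

    public static void main(String[] args) {
        Object o = new Object();
        System.out.println(identityHashIsStable(o));            // true
        System.out.println(holdsMonitorInsideSynchronized(o));  // true
        System.out.println(holdsMonitorOutside(o));             // false
    }
}
```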