Dissecting the Java Virtual Machine - Architecture - part 4

   The last stage is the actual execution of the loaded Main class, starting from its static main() method. This is performed by the execution engine, as outlined in the following diagram:

HotSpot architecture diagram 3

   Ignoring exceptions, the inner loop of a Java Virtual Machine interpreter is effectively:

do {
    atomically calculate pc and fetch opcode at pc;
    if (operands) fetch operands;
    execute the action for the opcode;
} while (there is more to do);
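
   To make the fetch-decode-execute cycle concrete, here is a deliberately tiny interpreter for a made-up stack machine. The PUSH/ADD/HALT instruction set and its encoding are invented for this sketch and are not real JVM bytecodes.

// A toy fetch-decode-execute loop in the spirit of the pseudocode above.
public class ToyInterpreter {
    static final int PUSH = 0x01;   // PUSH <value>: push the next code value onto the stack
    static final int ADD  = 0x02;   // ADD: pop two values, push their sum
    static final int HALT = 0x03;   // HALT: print the top of the stack and stop

    public static void main(String[] args) {
        int[] code = {PUSH, 2, PUSH, 40, ADD, HALT};   // the "bytecode" to execute
        int[] stack = new int[16];
        int sp = 0;                                    // operand stack pointer
        int pc = 0;                                    // program counter
        boolean running = true;

        do {
            int opcode = code[pc++];                   // calculate pc and fetch opcode at pc
            switch (opcode) {                          // execute the action for the opcode
                case PUSH -> stack[sp++] = code[pc++]; // fetch operand, then push it
                case ADD  -> { int b = stack[--sp]; int a = stack[--sp]; stack[sp++] = a + b; }
                case HALT -> { System.out.println(stack[sp - 1]); running = false; }
                default   -> throw new IllegalStateException("bad opcode " + opcode);
            }
        } while (running);                             // while there is more to do
    }
}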

   However, the execution engine can use several different execution techniques:

   - interpretation - standard bytecode execution - each bytecode instruction is mapped to a piece of assembly code that is executed;

   - just-in-time (JIT) compilation - compiles the bytecode of methods/loops to native code - which methods/loops get JIT-compiled is determined either statically or dynamically during program execution;

   - adaptive optimization (determines "hot spots" by monitoring execution) - can trigger JIT compilation dynamically during program execution; the sketch after this list shows how to select between these techniques from the command line.
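
   HotSpot exposes these techniques through standard launcher flags, which makes the differences easy to observe. The workload below is only an illustrative sketch; the flags named in the comments (-Xint, -Xcomp and the default adaptive/tiered mode) are real HotSpot options, but the class and the loop counts are invented for the example.

// ExecutionModes.java - a small workload for comparing execution techniques.
// Run it three ways (standard HotSpot launcher flags):
//   java -Xint  ExecutionModes   -> pure interpretation, no JIT
//   java -Xcomp ExecutionModes   -> compile methods eagerly on first use
//   java        ExecutionModes   -> default adaptive (tiered) compilation
public class ExecutionModes {
    // A small "hot" method: under adaptive optimization it is called often
    // enough to be detected as a hot spot and JIT-compiled.
    static long mix(long x) {
        return (x * 31) ^ (x >>> 7);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long acc = 0;
        for (int i = 0; i < 50_000_000; i++) {
            acc = mix(acc + i);
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("acc=" + acc + ", took " + elapsedMs + " ms");
    }
}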

   Additionally, JIT-compiled methods can be "deoptimized" as described earlier. To support this, a mechanism called "On-Stack Replacement" (OSR) is used to transfer control back and forth between bytecode and native-code execution. JIT compilation is triggered asynchronously by counter overflow for a method/loop (the interpreter counts method entries and loop back-branches). Besides the generated code, the JIT compiler also produces relocation information, and control is transferred to the compiled version on the next method entry. If JIT-compiled code calls a method that has not yet been JIT-compiled, control is transferred back to the interpreter. The nmethods (remember? the structures that contain the machine code of JIT-compiled methods/loops) produced by the JIT compiler also contain per-safepoint oopmaps (called "GC maps" when they describe GC-related safepoints), which record the locations (in registers or on the stack) of the object pointers (native machine addresses) that are live at that safepoint.
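
   Counter overflow, On-Stack Replacement and deoptimization can be observed with the HotSpot flag -XX:+PrintCompilation; the class below is just an illustrative workload. In that flag's output, OSR compilations are marked with '%' and deoptimized methods show up as "made not entrant".

// OsrDemo.java - run with:  java -XX:+PrintCompilation OsrDemo
// The hot loop lives entirely inside main(), which is entered only once, so
// the only way to move it to native code while it runs is On-Stack
// Replacement: look for compilation lines marked '%' in the output.
public class OsrDemo {
    public static void main(String[] args) {
        long sum = 0;
        // The loop back-branch counter overflows long before main() returns,
        // triggering an OSR compilation of this loop.
        for (int i = 0; i < 200_000_000; i++) {
            sum += i % 7;
        }
        System.out.println(sum);
    }
}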

   Here is how JIT compilation works in general:

   1) bytecode is turned into a graph;
   2) the graph is turned into a linear sequence of operations that manipulate an infinite supply of virtual registers (each node places its result in a virtual register) - see the example after this list;
   3) physical registers are allocated for the virtual registers (the program stack may be used when the virtual registers outnumber the physical ones) - e.g. the HotSpot C2 server JIT compiler uses a Chaitin-Briggs graph-coloring algorithm to map virtual registers onto physical ones, while the C1 client compiler uses a simpler linear-scan allocator;
   4) code for each operation is generated using its allocated registers.
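
   To make steps 1-2 more tangible, here is a trivial method together with its real bytecode (as printed by javap -c) and one possible linear form over virtual registers; the IR mnemonics on the right are purely illustrative and are not HotSpot's actual intermediate representation.

public class Add {
    // javap -c output for this method:     illustrative linear IR (virtual registers):
    //   0: iload_0                         v1 = param a
    //   1: iload_1                         v2 = param b
    //   2: iadd                            v3 = add v1, v2
    //   3: ireturn                         ret v3
    static int add(int a, int b) {
        return a + b;
    }
}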

   An important point is that in many programming languages the programmer has the illusion of being able to allocate arbitrarily many variables, while the compiler must ultimately map those variables onto a small, finite set of registers. In compiler optimization, register allocation is the process of assigning a large number of target program variables to a small number of CPU registers. It can happen over a basic block (local register allocation), over a whole function/procedure (global register allocation), or across function boundaries traversed via the call graph (interprocedural register allocation). When done per function/procedure, the calling convention may require the insertion of save/restore code around each call site.

   The compiler can construct a graph in which every vertex represents a unique variable in the program. Interference edges connect pairs of vertices that are live at the same time, and preference edges connect pairs of vertices that are involved in move instructions. Register allocation can then be reduced to the problem of K-coloring the resulting graph, where K is the number of registers available on the target architecture. No two vertices sharing an interference edge may be assigned the same color, and vertices sharing a preference edge should be assigned the same color if possible. Some of the vertices may be precolored to begin with, representing variables that must be kept in certain registers due to calling conventions or communication between modules. As graph coloring in general is NP-complete, so is register allocation.
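
   As an illustration of the graph-coloring formulation, the sketch below greedily K-colors a tiny hand-written interference graph. A real Chaitin-Briggs allocator additionally handles coalescing (preference edges), spill-cost heuristics, precolored nodes and rebuilding the graph after spills, all of which are omitted here.

import java.util.*;

// Register allocation as graph K-coloring (simplified greedy version).
// Vertices are virtual registers; an interference edge means the two
// registers are live at the same time and must not share a color.
public class GraphColoringAllocator {

    static Map<String, Integer> allocate(Map<String, Set<String>> interference, int k) {
        Map<String, Integer> color = new LinkedHashMap<>();
        for (String v : interference.keySet()) {
            boolean[] used = new boolean[k];
            for (String neighbor : interference.get(v)) {
                Integer c = color.get(neighbor);
                if (c != null) used[c] = true;      // neighbor already owns this register
            }
            int chosen = -1;
            for (int c = 0; c < k; c++) {
                if (!used[c]) { chosen = c; break; }
            }
            if (chosen == -1) {
                // A real allocator would spill this virtual register to the stack.
                System.out.println(v + " -> spill");
            } else {
                color.put(v, chosen);
            }
        }
        return color;
    }

    public static void main(String[] args) {
        // v1 interferes with v2, and v2 interferes with v3; v1 and v3 never
        // overlap, so two physical registers (K = 2) suffice for all three.
        Map<String, Set<String>> g = new LinkedHashMap<>();
        g.put("v1", Set.of("v2"));
        g.put("v2", Set.of("v1", "v3"));
        g.put("v3", Set.of("v2"));
        System.out.println(allocate(g, 2));         // prints {v1=0, v2=1, v3=0}
    }
}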
