Dissecting the Java Virtual Machine

 

   The Hotspot JVM is a complex piece of software. This article aims to provide a good starting point and lower the barrier for developers who want to contribute to Hotspot (and related) projects, or even build their own customized Java virtual machine from the Hotspot codebase. It is the result of digging through a number of resources on the internet, the Java Virtual Machine Specification (Java SE 7 Edition), a number of research papers written on the JVM, and a decent amount of digging through and experimenting with the codebase. Content is simplified as much as possible. To try out the tips in this guide you need a development environment for the Hotspot JVM - see this article for details on how to set one up.

   The source code is all there - one of the great aspects of open source software. You just have to understand how to deal with it :-) First we will give an overview of the architecture and data structures used by the Hotspot JVM, then describe how the stack-based approach works in Hotspot. After that we will give an overview of the Hotspot codebase and how it maps to the concrete components, and finally we will give guidelines on how to debug the Hotspot JVM.

   Throughout the article we will refer to <OpenJDK_repo> which is the local clone of the root OpenJDK repository (currently http://hg.openjdk.java.net/jdk8/tl) along with its child repositories.

   Let's get started :- ))

 

Virtual Machine Basics

There are basically two types of virtual machines for interpreted programming languages - stack-based (like the Hotspot JVM) and register-based (like the Dalvik VM). Both provide essentially the same set of features:

   - compilation of source language into VM specific bytecode;
   - data structures to contain instructions and operands (the data the instructions process);
   - a call stack for function call operations;
   - an ‘Instruction Pointer’ (IP) pointing to the next instruction to execute;
   - a virtual ‘CPU’ – the instruction dispatcher that:
      - fetches the next instruction (addressed by the instruction pointer);
      - decodes the operands;
      - executes the instruction.
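
   The fetch/decode/execute cycle listed above can be sketched in a few lines of Java. This is only a toy illustration, not Hotspot code: the class name TinyStackVm and the opcodes PUSH, ADD, MUL and HALT are hypothetical and do not correspond to real JVM bytecodes, and the call stack is left out to keep the example short.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy stack-based VM. Opcodes are hypothetical, not real JVM bytecodes. */
public class TinyStackVm {
    public static final int PUSH = 0; // followed by one operand: the literal to push
    public static final int ADD  = 1; // pops two values, pushes their sum
    public static final int MUL  = 2; // pops two values, pushes their product
    public static final int HALT = 3; // pops the top of the stack and returns it

    public static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>(); // operand stack
        int ip = 0;                                // instruction pointer

        while (true) {
            int opcode = code[ip++];               // fetch the next instruction
            switch (opcode) {                      // dispatch on the opcode
                case PUSH:
                    stack.push(code[ip++]);        // decode the inline operand
                    break;
                case ADD: {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left + right);      // execute
                    break;
                }
                case MUL: {
                    int right = stack.pop();
                    int left = stack.pop();
                    stack.push(left * right);
                    break;
                }
                case HALT:
                    return stack.pop();
                default:
                    throw new IllegalStateException("Unknown opcode: " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program));
    }
}
```

   Note that ADD and MUL carry no operands of their own: they always work on whatever is on top of the operand stack, which is the defining trait of a stack-based design.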

   The difference between the two approaches is in the mechanism used for storing and retrieving operands and their results.
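
   To make the difference concrete, here is a rough sketch of how a register-based machine might evaluate (2 + 3) * 4. Again this is a toy and not Dalvik's real instruction set: the class name TinyRegisterVm and the opcodes LOADI, ADD, MUL and HALT are made up for illustration. A stack-based machine would express the same computation as a sequence such as "push 2, push 3, add, push 4, multiply", where the arithmetic instructions name no operands at all.

```java
/** Toy register-based VM. Opcodes are hypothetical, not real Dalvik bytecodes. */
public class TinyRegisterVm {
    public static final int LOADI = 0; // LOADI dst, value : regs[dst] = value
    public static final int ADD   = 1; // ADD dst, a, b    : regs[dst] = regs[a] + regs[b]
    public static final int MUL   = 2; // MUL dst, a, b    : regs[dst] = regs[a] * regs[b]
    public static final int HALT  = 3; // HALT src         : return regs[src]

    public static int run(int[] code, int registerCount) {
        int[] regs = new int[registerCount]; // virtual registers hold operands and results
        int ip = 0;                          // instruction pointer

        while (true) {
            int opcode = code[ip++];         // fetch
            switch (opcode) {
                case LOADI: {
                    int dst = code[ip++];    // operands are encoded in the instruction
                    regs[dst] = code[ip++];
                    break;
                }
                case ADD: {
                    int dst = code[ip++];
                    int a = code[ip++];
                    int b = code[ip++];
                    regs[dst] = regs[a] + regs[b];
                    break;
                }
                case MUL: {
                    int dst = code[ip++];
                    int a = code[ip++];
                    int b = code[ip++];
                    regs[dst] = regs[a] * regs[b];
                    break;
                }
                case HALT:
                    return regs[code[ip++]];
                default:
                    throw new IllegalStateException("Unknown opcode: " + opcode);
            }
        }
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4 using two registers
        int[] program = {
            LOADI, 0, 2,
            LOADI, 1, 3,
            ADD,   0, 0, 1,
            LOADI, 1, 4,
            MUL,   0, 0, 1,
            HALT,  0
        };
        System.out.println(run(program, 2));
    }
}
```

   Here every instruction names the registers it reads and writes, so instructions are longer but fewer of them are needed; the stack-based form uses shorter instructions that implicitly operate on the top of the stack.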

Note: The information above is derived from a blog post by Mark Sinnathamby that provides a great comparison between stack-based and register-based virtual machines - see references at the end.

   Traditionally, most virtual machines intended for actual execution are stack-based, a trend that started with Pascal's p-machine and continues today with Java's JVM and Microsoft's .NET environment.
