the Jau Virtual Machine: July 2005

Friday, July 29, 2005

The second commit?

If you'd kept an eye on the repository, you'd know I've been working on the Frame class, and I'm pretty much done with it for now. =)

As I said, there were different possibilities regarding the implementation of this class - let me present you with two.

Frames must store a stack for the function. This frame's size is well known, so this is not a problem. There are two kinds of things the frame must hold: primitive values (ints, floats, longs and doubles - since booleans, bytes, shorts and chars are stored as ints) and references (subclasses of Object). From these: ints, floats and references take 32 bits for 1 word; and, longs and doubles take 64 bits for 2 words.

First approach: use an Object[] and box primitive values. For longs and doubles store the boxed Long or Double in the first word and null in the second word.

Second approach: use 3 arrays (an int[], a double[] and an Object[]) one to hold 32 bit primitives, another for 64 bit ones and one for references; the double[] can have half the size.

The first approach is conceptually simpler, and it also makes all operations simpler. The second approach was tested to be at least two times faster than the first (even without taking into account the extra garbage the first produces).

The rational for the second approach is that: no primitives are boxed; ints are more common than floats (and doubles more common than longs); intBitsToFloat (and longBitsToDouble, etc) are fast; space wast is minimized (vs. having 5 arrays).

I took the simpler, though slower, approach for now. I have, however, tried to encapsulate inside the Frame class everything necessary to make the move later. Here goes the class's current interface (you can also check an implementation):

Frame.java

package net.sf.jauvm.vm;

import java.io.Serializable;

public final class Frame implements Cloneable, Serializable {
    public Frame getParent();
    public int getRet();
    // public InterpretableMethod getCode();

    public Integer popInt();
    public void pushInt(Integer val);
    public Integer getInt(int var);
    public void putInt(int var, Integer val);
    public Long popLong();
    public void pushLong(Long val);
    public Long getLong(int var);
    public void putLong(int var, Long val);
    public Float popFloat();
    public void pushFloat(Float val);
    public Float getFloat(int var);
    public void putFloat(int var, Float val);
    public Double popDouble();
    public void pushDouble(Double val);
    public Double getDouble(int var);
    public void putDouble(int var, Double val);
    public Object popObject();
    public void pushObject(Object val);
    public Object getObject(int var);
    public void putObject(int var, Object val);

    public Frame popParameters(Class<?>[] parameterTypes, int ret, /*InterpretableMethod code,*/ int maxStack, int maxLocals);
    public Object[] popParameters(Class<?>[] parameterTypes);

    public void pop();
    public void pop2();
    public void dup();
    public void dupBnth1();
    public void dupBnth2();
    public void dup2();
    public void dup2Bnth1();
    public void dup2Bnth2();
    public void swap();

    public boolean isImmutable();
    public void makeParentsImmutable();
    public Frame clone() throws CloneNotSupportedException;
}

The first set of methods are getters for: the return address; this frame's parent frame; and, the corresponding method's code (unimplemented) - these properties are constant.

The second set are basic stack operations. Popping off and pushing into references and primitives, setting and getting reference or primitive local variables. These functions take and return boxed primitives due to the fact that Frames are implemented with an Object[]. However, due to Java 5.0's auto-(un)boxing, this interface can be changed to use primitive values, almost without modification to its clients (besides a recompile).

The third set are methods that help in handling invocations. The first one creates a new Frame for an interpretable invocation, popping the parameters into it. The second one returns an Object[] suitable for invocation through reflection.

The fourth set are the general purpose stack operations of the JVM, that benefit with knowledge of the internal structure of Frame objects.

And, the last set manages the immutability of Frame objects -frames captured by continuations must remain immutable. Accordingly: when you do a new Continuation() you call makeParentsImmutable() on the current frame, effectively marking all parent frames as immutable; when you need to modify a Frame you check its immutability with isImmutable() and, if necessary, you clone that Frame.

That's all for now. I'll work on that InterpretableMethod class now.

Tuesday, July 26, 2005

The first commit...

OK, so I did my first commit today. You can keep track of the project's code by either browsing it through the web-based viewer (which is usually a couple hours off), or by using the anonymous access:

cvs -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jauvm login
cvs -z3 -d:pserver:anonymous@cvs.sourceforge.net:/cvsroot/jauvm co -P jauvm

You may be wondering what took me so long. It's true, I've been available since Friday when I last posted, and yet no news.

I've been struggling to solve my first problem: the promised Monitor class. That class involved the creation of methods to implement unbalanced monitorenter and monitorexit operations. Although those can't be expressed directly in Java, they could easily be implemented directly in JVM bytecode.

To avoid having an extra compile-time dependency, I'd decided to use ASM, defining the class in XML. It all worked pretty well (including a couple of ANT tasks to automate it), until I actually tried running the code, just to find out that the JVM checks to guarantee at runtime that no method does an unbalanced locking operation.

The check was added in 1.4, and is - in my opinion - pretty stupid (sorry clever guys at Sun, but read on). Unbalanced locks are useful, so much so that they are present in rival .NET, and consequently a form of those appeared in 1.5 in the new java.util.concurrent package.

This new API though, however “dangerous” for purists, does not suit my needs, because, au contraire of what happens in .NET, this it's not “compatible” with the common synchronized block (in the sense that it's not possible to unbalancedly synchronize on an arbitrary object, you can only lock locks). The API's existence however, proves it can be done - most likely through JNI (I know, that sucks, but that's the JVM we've got).

So the bad news is, in general, synchronization won't work for the time being. Your options will be: using java.util.concurrent.locks.Locks where you can; or, waiting for the JNI implementation of Monitor.enter and Monitor.exit, if you can tolerate JNI.

Note that this affects all synchronization, meaning all the usages of the synchronized keyword (blocks and methods) in interpreted code (and not just the direct usage of the Monitor class): these will all throw UnsupportedOperationExceptions until further notice. This is so, because the monitorenter and monitorexit opcodes would have themselves to be implemented through Monitor. Sorry there really seams to be nothing I can do about this.

I won't just give up over this - I knew synchronization would be a problem, and it's really not an unsolvable one (it's just nasty when you can't use plain Java to do safe, simple stuff like this).

I'll move on, for now, and will look into the design of Frame objects - which are responsible for keeping the interpreter's state. There are a few different possibilities: some faster, others simpler - as usual. I'll go with the simpler version for now, probably, but I'm trying to abstract enough of the inner workings of frames inside the Frame class, to make it easy to latter (as in during or after the summer =) replace it with the faster version.

That's all that's been going on, remember to keep an eye on the repository if you're interested.

Friday, July 22, 2005

User Guide - Part Four

In my previous installment I showed you some examples of using continuations. From them you can see continuations are a sort of non-local jumps (a bit like exceptions). Isn't that dangerous? Yes, it is - but fun! Let's have an example:

@interpretable Continuation aMethod() {
    return new Continuation();
}

@interpretable void anotherMethod() {
    Continuation cont = aMethod();

    synchronized (this) {
        cont.returnTo(cont);
    }
}

Note the synchronized block. Now that's nasty, right? If you jump somewhere else, just like that, this will remain locked, right? Well, no... Remember continuations are “a bit like exceptions” and the JauVM will handle the unlocking for you, just like the JVM does with exceptions. We have a new problem though:

@interpretable Continuation aMethod() {
    return new Continuation();
}

@interpretable void anotherMethod() {
    Continuation cont = null;

    synchronized (this) {
        cont = aMethod();
    }

    cont.returnTo(cont);
}

What's happening here? Well, you acquire the lock, then you save the Continuation and release the lock. Afterwards you invoke the Continuation, and you're suddenly inside the synchronized block without the lock.

Now wait a minute. Can't the JauVM reacquire the lock for me? Unfortunately no, it's not that simple. And we have the same problem with try ... finally too - finallys are ran, but we have no “initially”.

To amend this issue an new Throwable was devised - the continuing class. It can be used to tag a catch clause of a try block as the code to be run when that try is reentered. Confusing? Examples will help.

In the presence of continuations code, a synchronized should look like this:

synchronized (this) {
    try {
        // ...
    } catch (continuing c) {
        Monitor.enter(this);
    }
}

And a try ... finally (safeguarding a JDBC connection, in this case) should be:

Connection conn = pool.getConnection();
try {
    // ...
} catch (continuing c) {
    conn = pool.getConnection();
} finally {
    conn.close();
}

As you can see, the catch continuing clause is about reacquiring the resources you've freed when you first left the try block.

But what's this continuing class like? Here it is:

public final class continuing extends Error {
    private continuing() {
    }
}

The class is final and the constructor private because there should be no objects of this kind - again this is just a tag! It extends Error for it mustn't be a checked exception, and shouldn't normally be caught either - though it doesn't make a difference if you catch Throwable, only continuing works.

Also, you're probably curious about that Monitor class. Well, it will have to be written in assembly, and I haven't made up my mind about an assembler yet (Jamaica, Jasmin...) It'll have at least enter and exit static methods, but you'll see.

And that's it, for now! User guide over. Not complete, but I hope I've passed along the general picture. By the way, I'm open to suggestions for names for these all classes and methods (especially the continuing exception, which I particularly dislike).

Thanks for “listening”!

Monday, July 18, 2005

User Guide - Part Three

I was out for this weekend and will be busy the next few days, but today I'll talk about continuations. I really don't think I'm up to the task of explaining continuations, however this is a good introduction on the topic.

So, what will continuations be like? In the JauVM a Continuation is Java object, that represents a snapshot of the execution state of an Interpreter object, including the values of local variables. Here is its implementation:

Continuation.java

package net.sf.jauvm;

import java.io.Serializable;

public class Continuation implements Serializable {
    public Continuation() {
        throw new UnsupportedOperationException();
    }

    public void returnTo() {
        throw new UnsupportedOperationException();
    }

    public void returnTo(Object o) {
        throw new UnsupportedOperationException();
    }

    public void throwTo(Throwable t) {
        throw new UnsupportedOperationException();
    }

    protected void continueFrom() {
        throw new UnsupportedOperationException();
    }
}

The first thing to note are the methods' implementations. All those operations are unsupported because you're not supposed to use them in your regular Java code. A Continuation is a special object, recognized by JauVM's Interpreter, and whose implementation is closely tied to it. A continuation can not be reified or invoked in code ran by the regular JVM.

To capture a continuation, you have to create a new Continuation object. For example:

@interpretable void aMethod() {
    Continuation cont = new Continuation();
}

In the above code, the variable cont represents the execution state of the current caller of aMethod.

Since cont is a regular Java object you can return it, store it in a variable, assign it to a field of another object - whatever you like. You can also call any of it's instance methods, namely the overloaded returnTo method. When you make such a call, the current execution state of the program is discarded and the snapshot of the program represented by the Continuation object is resumed in its place. You may also pass an argument to that method call, which becomes the return value of the method in which the Continuation was captured. For instance:

@interpretable Object aMethod() {
    return new Continuation();
}

@interpretable void anotherMethod() {
    Object obj = aMethod();

    if (obj instanceof Continuation) {
        System.out.println("obj is a continuation");
        Continuation cont = (Continuation) obj;
        cont.returnTo(100);
    }
    
    System.out.println("obj is " + obj.toString());
}

Invoking anotherMethod yields the following output:

obj is a Continuation
obj is 100

When returnTo is invoked, the program jumps back to the call to aMethod, but this time aMethod returns the Integer 100 passed into returnTo.

What about throwTo? A continuation can be thought as representing the return from a method invocation. In Java, in addition to a normal return, a method may return due to an exception. The throwTo method throws its argument in the context of the Continuation after it's restored. For example:

@interpretable Object aMethod() {
    return new Continuation();
}

@interpretable void anotherMethod() {
    Object obj = null;

    try {
        obj = aMethod();
    } catch (RuntimeException e) {
        System.out.println(e);
    }

    if (obj instanceof Continuation) {
        System.out.println("obj is a Continuation");
        Continuation cont = (Continuation) obj;
        cont.throwTo(new RuntimeException("thrown from aMethod"));
    }
}

Invoking anotherMethod yields the following output:

obj is a Continuation
RuntimeException: thrown from aMethod

This isn't all there is to know about continuations, but this post is already huge, the rest will have to wait. Keep tuned!

Thursday, July 14, 2005

User Guide - Part Two

Yesterday I said I was going to show what tail-calls look like, not how they will look like. That's because nothing will be changed, except perhaps you, the programmer - who may start using them a lot more.

But what are tail-calls anyway? Tail-calls are function calls - or method calls, in Javanese - done in tail position. What the hell is tail position? Well, something is in tail position, if it's done just before you do a return.

Confusing? Lets see some examples of what this means in Java:

// first example
methodCall();
return;

// second example
return methodCall();

As you can see, tail-calls come in two shapes in Java: they're either a method call followed by a void return, or a return of a method call.

Is it that simple? Yep, pretty much. Well, no. In fact, tail-calls always look like this, but these aren't always tail-calls.

Imagine, in our first example above, methodCall returns an int. This int has to be discarded by the JVM, before control is returned to our parent method, so the method call isn't quite the last thing that's being done before the return.

Now take our second example, and consider that methodCall returns a float but the method we are returning from returns a double. The JVM must coerce the float returned by methodCall to a double before returning from this method, so again that function call isn't the last thing done before the return. Depending on your compiler (which is javac, most likely) some coercions do work (byte, short and char to int, and coercions amongst objects without casts).

All this details mean that, if you do come to depend on tail-calls, you must beware of these issues. Other, simpler, examples where “tail-calls don't work” are:

// first example
synchronized (obj) {
    return methodCall();
}

// second example
try {
    return methodCall();
} catch (Exception e) {
    // ...
} finally {
    // ...
}

Clearly, on both examples, there are things to be done after the method call - to release the lock, to handle exceptions and to run finallies - so these aren't really tail-calls.

But hey, what's so special about tail-calls? Well, remember all that stuff about them being the very last thing done in a method? If there's nothing more to be done in that method, you can simply jump to the other method and ignore the caller and its frame. This both reduces stack space, allows the earlier collection of some local variables, and promotes a recursion oriented programming style.

That's all there is to know about tail-calls. Tomorrow, continuations will follow.

Wednesday, July 13, 2005

User Guide - Part One

As I promised yesterday, today, I'll start a series of entries with a very rough first user guide for the JauVM.

Lets start with the interpreter itself.

In the proposal text there's a brief description on what Interpreter objects should look like. Lets elaborate a bit on that:

Interpreter.java

package net.sf.jauvm;

public class Interpreter implements Runnable {
    private final Runnable r;

    public Interpreter(final Runnable r) {
        this.r = r;
    }

    public Interpreter(final Continuation c) {
        this.r = new Runnable() {
            public @interpretable void run() {
                c.continueFrom();
            }
        };
    }

    public void run() {
        // interpret r.run()
    }
}

This is pretty much what I had already described, though there are some new things.

As you can see, I've “added” a second constructor that takes a Continuation which is then continued when the interpreter is run. This is handy when you wish to call a continuation you've just deserialized on a new Interpreter object.

It also serves the purpose of showing you how to write an interpretable method: just tag it with the @interpretable annotation. Here is the code for this annotation:

interpretable.java

package net.sf.jauvm;

import java.lang.annotation.*;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface interpretable {}

The fact that an Interpreter implements Runnable facilitates the creation of an interpreter on a new thread, look:

new Thread(new Interpreter(new Runnable() {
    public @interpretable void run() {
        // ...
    }
})).start();

Tomorrow, I'll shed some light what tail-calls look like.

No code yet?

Well, the exam season at my University ends on July 23, and my last exam is on July 21. So, the Summer of Code will, unfortunately, have to wait a bit longer...

That doesn't mean I won't do a thing, of course - just that it won't be full-time dedication.

Bah... no more excuses, back to code! OK, then - say no more!

In fact, I wanted to past some code here the other day; the geek in me wanted it highlighted, though. So, today, I worked out this little script (while studying for an exam) to do just that:

code2html

#!/bin/bash

enscript --color -E -Whtml -p - ${1-'-'} |

sed '
    1,/<PRE>/d
    /<\/PRE>/,$d

    s/<B><FONT COLOR="\([^"]*\)">/<b style="color:\1">/g
    s/<I><FONT COLOR="\([^"]*\)">/<i style="color:\1">/g
    s/<FONT COLOR="\([^"]*\)"><B>/<b style="color:\1">/g
    s/<FONT COLOR="\([^"]*\)"><I>/<i style="color:\1">/g
    s/<FONT COLOR="\([^"]*\)">/<span style="color:\1">/g

    s/<\/FONT><\/B>/<\/b>/g
    s/<\/FONT><\/I>/<\/i>/g
    s/<\/FONT>/<\/span>/g
    ' |

sed '
    1s/.*/<pre><code>&/
    $s/.*/&<\/code><\/pre>/
    '

Nothing to do with the JauVM, right?

Well, with this great little tool at hand, tomorrow, I'll start a series of posts showing you - the potential user of this library - what will your code look like.

Friday, July 08, 2005

Yay Sourceforge!

I finally got the JauVM accepted at Sourceforge. A link to the project page is - you guessed it - in the links section.

Why am I this happy about it? Well, it was quite an adventure... read on.

The project was rejected upon first review, but it's done now:

SourceForge.net Project Approved

Your project registration for SourceForge.net has been approved.

Project Information:

Project Descriptive Name:  the Jau Virtual Machine
Project Unix Name:         jauvm
CVS Server:                cvs.sourceforge.net
Shell Server:              shell.sourceforge.net
Web Server:                jauvm.sourceforge.net

[...]

 -- the SourceForge.net crew

It turns out the rejection was due to my description being too vague. I'd linked to the proposal text, but it seems the reviewers just don't follow links.

In the end, it all went pretty smoothly: I supplied additional info - more or less copy-pasted the first two paragraphs of the proposal text - and before I knew it, the registration was accepted, in 2 business days instead of the expected 5.

Yay Sourceforge!

Saturday, July 02, 2005

But... what's with the funny name?

Where does it come from?
How is it pronounced?

If it's ever to get used, this is bound to become one of those projects with these questions on its FAQ...

So, I'll just satisfy your curiosity right away!

‘Jau’ is Portuguese for ‘Javanese’ - Of or relating to Java or its people, language, or culture. (Portuguese fellows in amazement, check your dictionaries)

It's pronouced ‘Jou’ as in Java and out.

Pretty cool, ain't it?

I'm actually really proud of it - that is, of knowing the one who thought it up for me... many thanks!

Friday, July 01, 2005

So... what does it do?

Another Virtual Machine!? What for?

Well, read the proposal yourself:

The goal of my project (it doesn't have a name yet, unfortunately) is the implementation of a JVM bytecode interpreter in Java. This interpreter, hereby called the “Secondary JVM”, will be executed by the JVM itself, hereby called the “Primary JVM”.

The “Secondary JVM” will not need to implement all the features of a standard JVM, since it can rely on the “Primary JVM” for things like class loading, bytecode verification, garbage collection and heap management. The “Secondary JVM” will however manage its own stack. This will allow the “Secondary JVM” to provide two popular features the Java Platform currently lacks: general tail call elimination and serializable continuations.

Tail call elimination (or optimization) is important since it opens up the possibility of a functional/recursion-oriented programming style. Continuations are even more fundamental, since it can be proved that all control structures can be implemented with continuations.

Having these two features implemented “at the VM level” (the “Secondary JVM” in this case) greatly simplifies the implementation of compilers for “functional” languages like Scheme or Ruby targeting the JVM, as well as the creation of web-application frameworks like Cocoon or RIFE.

The following web page, at IBM developerWorks, supports the idea of using continuations for the development of complex web-applications (Use continuations to develop complex Web applications); and, there's a long-standing RFE at Sun's Bug Database asking for the implementation of tail call elimination at the JVM level (Bug ID: 4726340 RFE: Tail Call Optimization).

The “Secondary JVM” will be just a regular Java object, probably implementing the Runnable interface. It will itself take a Runnable as a parameter on construction. Upon calling its run method, the “Secondary JVM” will interpret the run method of the Runnable it was constructed with. This interpretation will probably rely on some form of JIT compilation, but the “Secondary JVM” will keep its own stack. It will be possible to create multiple “Secondary JVMs”, and run them on separate threads.

Tail call elimination will be done by the “Secondary JVM”, and it will be possible to capture the execution state through the creation of Continuation objects - which will be serializable.

It will be important, though challenging, to respect the Java class-loading semantics.

Only methods tagged as “interpretable” will be run by the “Secondary JVM”, other methods will be delegated to the “Primary JVM” for execution. This tagging will probably be done through the use of annotations, and can be seen as somewhat similar to marking a method as native. There will be some restrictions on what methods can be tagged as “interpretable” - static constructors can't, and neither can native methods, obviously - it is not yet clear, however, the extent of these restrictions, though most methods will certainly qualify.

Programs using this library can be seen as a combination of methods specified to run on the “Primary” or “Secondary JVMs”. The execution of a Java program will always start on the “Primary JVM”, which may then create a “Secondary JVM” to run certain methods, benefiting from the provided features. The “Secondary JVM” can then delegate certain methods that do not require those features back to “Primary JVM” for performance.

The idea for this library came from the need to support these features (tail calls and continuations) in yet another Scheme-to-JVM compiler I might be creating next year for school. Thinking this might be equally useful to someone creating a Ruby-to-JVM compiler, for instance, I decided to code it as a separate library, and make it freely available. Google's “Summer of Code” may just be the incentive I need to have it finished during this summer. Irrespectively of the approval of this application, I will go forward with the project, which I intend to release under either the BSD or the Apache licenses.

Prototyping work on this project has already begun, and more intensive work will follow as soon as the examinations season at my University ends, in mid-July. At that time, I will probably apply for hosting at Sourceforge, or a similar site. By September, there must already exist a stable interface for all features, as well as a usable implementation. Whatever the state of the project in September, work on it will not stop by then. I intend to maintain the project, further improving it both performance- and feature-wise.

So, that's it! Keep tuned!

Welcome!

Not long ago, I got an e-mail that left me in shock:

Congratulations!

Dear Nuno Cruces,

We are delighted to tell you that your project (attached) has been
selected to receive a grant in Google's Summer of Code program.

[...]

Best regards,
The Summer of Code team

I've learnt a lot about the IRS and tax regulations since. Hopefully I'll learn a lot more - and not just about tax!

But all that - the tax and all the paperwork, I mean - hasn't taken the fun of it. Honestly, I couldn't care less about the money. Sure, it can buy me a brand new Intel Powerbook, but that's not why I'm doing it.

I'm doing it because it's cool.
I'm doing it because it was something I had just thought up before Google's program was announced, and I decided to give it a try.
I'm doing it because this way it's even more of a challenge, and that's the best way to get me working.

I didn't actually expect to be accepted.
I didn't really think I had worded the proposal appropriately, I thought it was too vague.
I didn't think anybody would interested enough in my project.

It seems I was wrong.

You know that feeling you get when you have this crazy ideia, and you're able to share your enthusiasm with someone?

I know now - it's great.

So, this is what this blog is about: me coding all summer, waiving my vacations, and being mightily enthusiastic about it!

Welcome!

the Jau Virtual Machine