Tuesday, December 09, 2014

A deeper look at default methods conflicts and proxies

So what are default interface methods?

(All my examples will use a very syntax very near to Java.)

In essence they are a way to add methods to an interface, without requiring the implementing class to define the implementation. Example:

 interface A{
default int foo() {return 1}
}
class B implements A{}
assert new B().foo() == 1
The default keyword is here used to start the method declaration and as a flag for the resulting method. B will not have its own implementation of foo, still I will be able to call foo() though an instance of B.

All fine?

Well... what happens if there are conflicts? Example:
 interface A {
default int foo(){return 1}
}
interface B {
default int foo(){return 2}
}
class C implements A,B{}
This results in "error: class C inherits unrelated defaults for foo() from types A and B" when you compile it in Java. The problem is easily solved, by writing a foo method in C and then call the implementation you want with A.super.foo() or B.super.foo()

And that's the point where most tutorials end.

I would like to go further. The "promise" was interface evolution, under which I understand that you can add methods to interfaces without having to worry too much about the implementing class not working anymore. So let me first describe the situation we are really coming from:
 interface A{
void foo()
}
class B implements A{
void foo(){System.out.println("B.foo");}
}
Let us assume A is coming from a library, B is your code, that implements the library interface and B is then supposed to be used in the library and will call foo, which here then results in B.foo being printed.
Now imagine the library gets to a new version and A is changed to this:
interface A{
void foo()
void bar()
}
And because your B is compiled against an older version of the library, you don't have an implementation of bar(). As long as bar() is not called, there won't be a problem. But of course the method was added for a purpose, so the library will call it, resulting a injava.lang.AbstractMethodError. The "evolution" part in Java 8 default methods now is, that you can make this method a default method
 interface A{
void foo()
default void bar() {System.out.println("A.bar");}
}
Now the library code can call bar on B and have a fallback to bar in A, thus avoiding the AbstractMethodError

But I mentioned conflicts. For this we make the example slightly bigger
 interface A{}
interface B{}
class C implements A,B{}
Again we say A comes from a library, let us call it for simplicity A as well. Let us also say B comes from library B, and C is your code that happens to implement the interfaces from both libraries, maybe to produce some kind of adapter. That's the starting situation. Now both libraries take a liking in Java8 and add default methods to their interfaces:
 interface A{
default int foo(){1}
}
interface B{
default int foo(){2}
}
Remember, C would now not compile anymore, but since you compiled against older versions, C stays precompiled, so there is no chance for javac to complain here. But what happens to the code in library A calling A.foo? Well... that would be: "java.lang.IncompatibleClassChangeError: Conflicting default methods: A.foo B.foo" And as far as I know, there is no way around this. That's where the evolution fails.

Another problem case are Java's reflective proxy. There you create a class implementing a series of interfaces by providing an invocation handler, which itself does not implement those interfaces. All method calls will then end up in an invoke method, with a Method instance, describing the method that is supposed to be called. The problem with this environment is, that you cannot call the default method at all. To be able to reflectively call the default method, you need an instance. But since your proxy is the instance and since the proxy will delegate those calls to the invoke method, but you have no such instance. Meaning, Proxy is essentially broken with reflection...

In theory there is a workaround with MethodHandles. This blog here describes how: https://rmannibucau.wordpress.com/2014/03/27/java-8-default-interface-methods-and-jdk-dynamic-proxies But the text there is really incomplete without the code in the second comment. To call a default method, you need to do an invokespecial. The standard invokevirtual Java does to call normal instance methods will result in the errors I mentioned already. Invokespecial allows to bypass inheritance to some extend and target a specific implementation in a specific class. It is for example used for super based method calls. But the usage of this instruction is restricted. The call must be done from the same class... which is not accessible to us. The second comment in the blog of rmannibucau is now using reflection to access a hidden constructor, which allows for private access. That means all handles produced from that lookup object will be treated as if they are from the class and not from outside. This allows calling private methods (hence private access), but also access to invokespecial (unreflectSpecial ensures that).

But what if you have a security manager that does not allow this private access? Simply allowing this access would be a problem, since with that logic you can call anything, that is supposed to be private. The logic for MethodHandles will do a security check only once, and if that check is already passed when the lookup object is created, then a SecurityManager really has no other choice, does it? The only way your proxy can work then is by redirecting the call to the proxied instance. If that it is not the intention to do this for all calls, then you lost.

So what do we do then? Produce our own proxy class at runtime? Well... ignoring that generating classes like this easily leads to class loader related problems, the very same security manager can easily prevent that as well. After all, you have to define your class somewhere.

I guess in total, there is no sure way for the Proxy version to work properly. And the conflict case is obviously also not covered.

I would say, that if default methods had been intended as a safe way to evolve interfaces, then they are a failure. Everything is fine, as long as you stay with a single interface only. If there are multiple interfaces, then things can still go fine, if the interfaces are all from the same library. But if you have to bridge interfaces of different libraries, then you can expect trouble. For me this means I cannot use default methods whenever I write a library, without giving that change the same considerations as if it would be a normal interface method. That means for me it is to be seen as a breaking change. That's a -1 on maintenance.

So in general I actually cannot advice their usage to evolve existing interfaces unless you really, really have thought this through.

Tuesday, November 25, 2014

A Joint Compiler for Groovy and Java using the Processing API?

We had this year a Google Summer of Code project (actually 2) with the goal to write a stubless joint compiler for Groovy using the javac API or at least finding out if it can be done. The idea was to have a two way communication between the compilers, adapting the AST to what each compiler needs. This would allow to compile both languages at once in a single pass, without creating a lot of potentially unused files with potential errors in them as well.

Well, since that did not work out particularly well, I started with a more simple approach leveraging the Java processing API. It surely is no secret that a @SupportedAnnotationTypes("*") will cause the annotation processor described with this to be applied to all classes javac is going to compile. Interesting is how javac behaves if a class cannot be resolved. In that case the symbol gets an error marking, but you can still get information about it.

So I thought, it could be a good idea to just use what javac offers in the processing API to produce a bunch of ClassNode instances our Groovy compiler will understand to then compile first the Groovy code and later use the produced final class files to compile the Java code in a second javac run. The big advantages are: no stubs and java can see the effects of ast transformations in Groovy.

Simple tests showed the approach working well. I just used dummies for the missing classes and let the Groovy compiler fill them later. This worked well till the point where I made a bigger test using the Groovy build itself... and spending hours working myself through the sparse documentation of that API. When I tried out the final version I did find the big flaw in this approach...

Let us assume we have a "import x.y.*; package foo; class FromJava extends FromGroovy {}" where FromJava is Java code and FromGroovy is groovy code. While I can know the full name for FromJava as foo.FromJava and while javac is so kind to tell me that name, we have a big problem with FromGroovy. FromGroovy could have the full name x.y.FromGroovy or foo.FromGroovy. Since javac cannot resolve the missing FromGroovy class, all I will get is a vanilla FromGroovy. In the Groovy compiler on the other hand I only have the full name. And since the vanilla name is not clear enough to find the correct class with the full name, I would need package and imports to maybe create a lookup of some kind myself. But the processing API does not provide information about imports. And that's where this approach gets a burial.

So either to use javac internal apis to get the needed information or no joint compilation.

But using the javac internal api is something I wanted to avoid for this approach. For one it is an internal API and as such not really adapted for outside use and for a number two... the API is complex and difficult to work through. Pairing up with somone knowing javac very well I could probably write a proper joint compiler within a few days. But that's no option here.


Tuesday, January 28, 2014

What class duplication is and how it happens

From time to time we get a question on the lists that turns out to be related to class duplication. In short class duplication is the problem of having two classes of the same name, that are not equal. And that gives all sorts of problems.

In a command line Java application, you usually don't have that sort of problem, because there is only on significant class loader. You have there several loaders too - like the bootstrap loader and the application loader, but what you care about are usually classes given to the JVM by the class path. And these classes are then loaded with the application loader. In my time with Groovy I really had to fight some ugly class loader problems that go beyond mere duplication. They are sometimes so difficult to debug, that I count them to the worst kind of bugs you can have actually. But here we concentrate on class duplication.

Some Basics

First of all you have to imagine that all class loaders together form a tree. The application and bootstrap loaders will be on top forming the root, any other created loader will be a node or leave in that tree. Every class loader has a parent class loader, which the class loader is supposed to delegate loadClass calls to. If the parent doesn't know the class and is not able to create it, then a ClassNotFoundException is thrown and caught by the child node, that requested the class. This child node then has the opportunity to create the class itself, or throw the exception again. In the worst case this goes down to the node doing the original request and then may ultimately throw a ClassNotFoundException for the user.  The class loader creating the class is called defining loader. If you have a Class object from that, you can get the class loader of that class and you will get the defining class loader. For example in Groovy, if your are executing a script in source form from the command line, then this.getClass().getClassLoader() will return an instance of InnerLoader or GroovyClassLoader. I have to mention, that if you don't set the parent, the parent might be null if you request it with classLoader.getParent(), but that does not mean there is no parent. Instead the parent is then the bootstrap class loader. It depends on the implementation though if null is used.

Class loader constraints

Class loader constraints ensure the well behaving of libraries and Java applications.They are described in for example http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3.4 But I will try to form this in a bit less complicated and mathematic language. Basically, in the JVM a class as an Class object is not the same as the class you have in source code. Instead you have an object basically defined by a pair of name and loader. A different name with the same loader means a different Class object. A different defining loader but same name, means also a different Class object (class duplication!). The constraints basically translate to this for loadClass calls:
  • A class loader that returned the Class object c for a given name String n, has to always return the same c (referential identity) for a name equal to n (equals)
  • A class loader has to ask the parent to load a class first
The first point goes beyond the defining loader. Of course there should be always
c.getClassLoader().loadClass(c.getName()) == c
but also
 Class c1 = loader.loadClass("Foo")
Class c2 = loader.loadClass("Foo")
assert c1==c2
at any time, even if c1.getClassLoader()!=loader.

Class duplication example

Trouble comes in an environment with complex class loader setup. But before we get into that let me try to illustrate the problem a bit:
 // loader1 will be able to load the class Foo from a jar
def loader1 = new URLClassLoader(...)
// loader2 will be able to load a class of the same name from the same jar
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Foo")
// loader1 is the defining loader for Foo in class1
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// class 1 and class 2 are not the same !
assert class1!=class2

In this example we have the loaders loader1 and loader2, which each can load the class named Foo from a jar.This is not a violation of the constraints I mentioned above. And this example alone does not yet illustrate the full scope of the problem.

When Foo is not Foo

Imagine you have written Java code like this:
 public class Bar {
public void foo(Foo f){}
}
The important part here is that loading the class Bar, will require loading of class Foo as well, since Bar depends on Foo. The loader used to load Foo will be the same that defines Bar. that means the defining loader for Foo will be either a parent of the loader for Bar, or the loader for Bar itself. Let us now come back to our class duplication example from before, but slightly modified:
 // loader1  and loader2 will be able to load the classes
// Bar and Foo from a jar
def loader1 = new URLClassLoader(...)
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Bar")
// loader1 is the defining loader for Bar in class1 and for Foo
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// create a Bar instance
def bar = class1.newInstance()
// create a Foo instance
def foo = class2.newInstance()
// call Bar#foo(Foo)
bar.foo(foo) // Exception!!
The last line here fails, because the Foo we give as argument in the method call is no Foo for the Bar in bar. the Foo known to Bar is one with the defining loader loader1, and the Foo we give in has the defining loader2. This is not limited to method calls, setting fields or even casts have the same behavior. In case of a cast Groovy will then maybe report something like this: GroovyCastException: Cannot cast object 'Foo@1234abcd' with  class 'Foo' to class 'Foo'

This is no problem in Groovy (or Java), this is a problem of your classloader setup.

Diagnose and Solve

Of course a simple test like
foo.getClass.getName().equals("Foo") && Foo.class!=foo.getClass()
can already give some hint for a class duplication problem, since the condition is only true if foo is an instanceof Foo, but not the Foo we used here. One program that can shed some light on the structure is this:
 def printLoader(Class c) {
def loader = c.classLoader

while (loader!=null) {
println "class loader is an instance of ${loader.getClass()} called $loader"
loader = loader.parent
}
println "<bootstrap loader>"
}
If applied here to foo.getClass() and Foo.class you can compare the outputs and should see that at least the first line differs. The fix is more easy said than done. Only a loader common to both should define Foo. Either that has to be done introducing a new class loader, or a class loader that takes URLs has to handle the jar containing Foo (and all dependencies).