Tuesday, January 28, 2014

What class duplication is and how it happens

From time to time we get a question on the lists that turns out to be related to class duplication. In short class duplication is the problem of having two classes of the same name, that are not equal. And that gives all sorts of problems.

In a command line Java application, you usually don't have that sort of problem, because there is only on significant class loader. You have there several loaders too - like the bootstrap loader and the application loader, but what you care about are usually classes given to the JVM by the class path. And these classes are then loaded with the application loader. In my time with Groovy I really had to fight some ugly class loader problems that go beyond mere duplication. They are sometimes so difficult to debug, that I count them to the worst kind of bugs you can have actually. But here we concentrate on class duplication.

Some Basics

First of all you have to imagine that all class loaders together form a tree. The application and bootstrap loaders will be on top forming the root, any other created loader will be a node or leave in that tree. Every class loader has a parent class loader, which the class loader is supposed to delegate loadClass calls to. If the parent doesn't know the class and is not able to create it, then a ClassNotFoundException is thrown and caught by the child node, that requested the class. This child node then has the opportunity to create the class itself, or throw the exception again. In the worst case this goes down to the node doing the original request and then may ultimately throw a ClassNotFoundException for the user.  The class loader creating the class is called defining loader. If you have a Class object from that, you can get the class loader of that class and you will get the defining class loader. For example in Groovy, if your are executing a script in source form from the command line, then this.getClass().getClassLoader() will return an instance of InnerLoader or GroovyClassLoader. I have to mention, that if you don't set the parent, the parent might be null if you request it with classLoader.getParent(), but that does not mean there is no parent. Instead the parent is then the bootstrap class loader. It depends on the implementation though if null is used.

Class loader constraints

Class loader constraints ensure the well behaving of libraries and Java applications.They are described in for example http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3.4 But I will try to form this in a bit less complicated and mathematic language. Basically, in the JVM a class as an Class object is not the same as the class you have in source code. Instead you have an object basically defined by a pair of name and loader. A different name with the same loader means a different Class object. A different defining loader but same name, means also a different Class object (class duplication!). The constraints basically translate to this for loadClass calls:
  • A class loader that returned the Class object c for a given name String n, has to always return the same c (referential identity) for a name equal to n (equals)
  • A class loader has to ask the parent to load a class first
The first point goes beyond the defining loader. Of course there should be always
c.getClassLoader().loadClass(c.getName()) == c
but also
 Class c1 = loader.loadClass("Foo")
Class c2 = loader.loadClass("Foo")
assert c1==c2
at any time, even if c1.getClassLoader()!=loader.

Class duplication example

Trouble comes in an environment with complex class loader setup. But before we get into that let me try to illustrate the problem a bit:
 // loader1 will be able to load the class Foo from a jar
def loader1 = new URLClassLoader(...)
// loader2 will be able to load a class of the same name from the same jar
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Foo")
// loader1 is the defining loader for Foo in class1
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// class 1 and class 2 are not the same !
assert class1!=class2

In this example we have the loaders loader1 and loader2, which each can load the class named Foo from a jar.This is not a violation of the constraints I mentioned above. And this example alone does not yet illustrate the full scope of the problem.

When Foo is not Foo

Imagine you have written Java code like this:
 public class Bar {
public void foo(Foo f){}
}
The important part here is that loading the class Bar, will require loading of class Foo as well, since Bar depends on Foo. The loader used to load Foo will be the same that defines Bar. that means the defining loader for Foo will be either a parent of the loader for Bar, or the loader for Bar itself. Let us now come back to our class duplication example from before, but slightly modified:
 // loader1  and loader2 will be able to load the classes
// Bar and Foo from a jar
def loader1 = new URLClassLoader(...)
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Bar")
// loader1 is the defining loader for Bar in class1 and for Foo
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// create a Bar instance
def bar = class1.newInstance()
// create a Foo instance
def foo = class2.newInstance()
// call Bar#foo(Foo)
bar.foo(foo) // Exception!!
The last line here fails, because the Foo we give as argument in the method call is no Foo for the Bar in bar. the Foo known to Bar is one with the defining loader loader1, and the Foo we give in has the defining loader2. This is not limited to method calls, setting fields or even casts have the same behavior. In case of a cast Groovy will then maybe report something like this: GroovyCastException: Cannot cast object 'Foo@1234abcd' with  class 'Foo' to class 'Foo'

This is no problem in Groovy (or Java), this is a problem of your classloader setup.

Diagnose and Solve

Of course a simple test like
foo.getClass.getName().equals("Foo") && Foo.class!=foo.getClass()
can already give some hint for a class duplication problem, since the condition is only true if foo is an instanceof Foo, but not the Foo we used here. One program that can shed some light on the structure is this:
 def printLoader(Class c) {
def loader = c.classLoader

while (loader!=null) {
println "class loader is an instance of ${loader.getClass()} called $loader"
loader = loader.parent
}
println "<bootstrap loader>"
}
If applied here to foo.getClass() and Foo.class you can compare the outputs and should see that at least the first line differs. The fix is more easy said than done. Only a loader common to both should define Foo. Either that has to be done introducing a new class loader, or a class loader that takes URLs has to handle the jar containing Foo (and all dependencies).