Tuesday, January 28, 2014

What class duplication is and how it happens

From time to time we get a question on the lists that turns out to be related to class duplication. In short, class duplication is the problem of having two classes of the same name that are not equal. And that causes all sorts of problems.

In a command line Java application you usually don't have that sort of problem, because there is only one significant class loader. There are several loaders there too - like the bootstrap loader and the application loader - but what you usually care about are the classes given to the JVM by the class path, and those are loaded by the application loader. In my time with Groovy I really had to fight some ugly class loader problems that go beyond mere duplication. They are sometimes so difficult to debug that I count them among the worst kinds of bugs you can have. But here we will concentrate on class duplication.

Some Basics

First of all you have to imagine that all class loaders together form a tree. The application and bootstrap loaders will be at the top forming the root; any other loader will be a node or leaf in that tree. Every class loader has a parent class loader, to which it is supposed to delegate loadClass calls. If the parent doesn't know the class and is not able to create it, a ClassNotFoundException is thrown and caught by the child node that requested the class. This child node then has the opportunity to create the class itself, or to throw the exception again. In the worst case this goes down to the node doing the original request, which may then ultimately throw a ClassNotFoundException for the user. The class loader creating the class is called the defining loader. If you have a Class object, you can ask it for its class loader and you will get the defining loader. For example in Groovy, if you are executing a script in source form from the command line, then this.getClass().getClassLoader() will return an instance of InnerLoader or GroovyClassLoader. I have to mention that if you don't set the parent, classLoader.getParent() may return null, but that does not mean there is no parent. Instead the parent is then the bootstrap class loader. It depends on the implementation whether null is used though.
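
A quick probe of all this, assuming you run it as a Groovy script from the command line (the exact loader class you see may vary by Groovy version):

println this.class.classLoader           // e.g. an InnerLoader or GroovyClassLoader instance
println this.class.classLoader.parent    // the loader it delegates to
println String.classLoader               // null - String comes from the bootstrap loader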

Class loader constraints

Class loader constraints ensure the good behavior of libraries and Java applications. They are described, for example, in http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3.4 - but I will try to put this in a less complicated and less mathematical language. Basically, in the JVM a class as a Class object is not the same as the class you have in source code. Instead you have an object essentially defined by a pair of name and loader. A different name with the same loader means a different Class object. A different defining loader with the same name also means a different Class object (class duplication!). For loadClass calls the constraints basically translate to this:
  • A class loader that returned the Class object c for a given name n has to always return the same c (referential identity) for any name equal to n (equals)
  • A class loader has to ask the parent to load a class first
The first point goes beyond the defining loader. Of course there should always be
c.getClassLoader().loadClass(c.getName()) == c
but also
Class c1 = loader.loadClass("Foo")
Class c2 = loader.loadClass("Foo")
assert c1 == c2
at any time, even if c1.getClassLoader() != loader.

Class duplication example

Trouble comes in environments with a complex class loader setup. But before we get into that, let me try to illustrate the problem a bit:
// loader1 will be able to load the class Foo from a jar
def loader1 = new URLClassLoader(...)
// loader2 will be able to load a class of the same name from the same jar
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Foo")
// loader1 is the defining loader for Foo in class1
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// class1 and class2 are not the same!
assert class1 != class2

In this example we have the loaders loader1 and loader2, each of which can load the class named Foo from a jar. This is not a violation of the constraints I mentioned above, and this example alone does not yet illustrate the full scope of the problem.

When Foo is not Foo

Imagine you have written Java code like this:
public class Bar {
    public void foo(Foo f){}
}
The important part here is that loading the class Bar will require loading the class Foo as well, since Bar depends on Foo. The loader used to load Foo will be the same one that defines Bar. That means the defining loader for Foo will be either a parent of the loader for Bar, or the loader for Bar itself. Let us now come back to our class duplication example from before, but slightly modified:
// loader1 and loader2 will be able to load the classes
// Bar and Foo from a jar
def loader1 = new URLClassLoader(...)
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Bar")
// loader1 is the defining loader for Bar in class1 and for Foo
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// create a Bar instance
def bar = class1.newInstance()
// create a Foo instance
def foo = class2.newInstance()
// call Bar#foo(Foo)
bar.foo(foo) // Exception!!
The last line here fails, because the Foo we pass as argument in the method call is no Foo for the Bar in bar. The Foo known to Bar is the one with the defining loader loader1, while the Foo we pass in has the defining loader loader2. This is not limited to method calls; setting fields or even casts show the same behavior. In case of a cast Groovy may then report something like this: GroovyCastException: Cannot cast object 'Foo@1234abcd' with class 'Foo' to class 'Foo'

This is not a problem in Groovy (or Java); it is a problem of your class loader setup.

Diagnose and Solve

Of course a simple test like
foo.getClass().getName().equals("Foo") && Foo.class != foo.getClass()
can already give a hint about a class duplication problem, since the condition is only true if foo's class is named Foo, but is not the Foo we used here. One small program that can shed some light on the structure is this:
def printLoader(Class c) {
    def loader = c.classLoader
    while (loader != null) {
        println "class loader is an instance of ${loader.getClass()} called $loader"
        loader = loader.parent
    }
    println "<bootstrap loader>"
}
If you apply this to foo.getClass() and Foo.class you can compare the outputs, and you should see that at least the first line differs. The fix is easier said than done. Only a loader common to both should define Foo. Either that has to be done by introducing a new class loader, or a class loader that takes URLs has to handle the jar containing Foo (and all its dependencies).
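
As a minimal sketch of that fix - the jar URLs here are just placeholders - you can give both loaders a common parent that handles the jar containing Foo:

// one loader defines Foo, both children delegate to it first
def shared  = new URLClassLoader([new URL('file:foo-api.jar')] as URL[])
def loader1 = new URLClassLoader([new URL('file:app1.jar')] as URL[], shared)
def loader2 = new URLClassLoader([new URL('file:app2.jar')] as URL[], shared)

// now there is exactly one defining loader for Foo
assert loader1.loadClass('Foo') == loader2.loadClass('Foo')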

Sunday, October 14, 2012

Open Blocks and MOP 2

In my last post I was describing how owner, delegate and this are needed in open blocks and how builders and resolve strategies change the behaviour of an open block. I also stated that this situation is not very satisfying for the next MOP.

The question then is if there is a deeper principle we can use. The reason why many see a groovy.lang.Closure not as a closure in the more functional sense is partially because it resolves names dynamically. Dynamically in the sense that a closure in the functional sense, once created at runtime, has all names resolved, while we on the other hand resolve names on demand. I will call the imaginary part doing this the "dynamic resolver". This dynamic resolver is currently either owner or delegate, or a mix between them. But the important part is that we have something that resolves the name for us.

Now I had the following idea for the resolving of the implicit-this part, consisting of two parts... Part one is that I want to let each sub block share the same resolver unless a new resolver is set. Newly created Closures would then get that resolver. The standard resolver would resolve everything against the surrounding class. And if I say "standard" and "new", this means you can set a new resolver, which all sub blocks will then share. The question is if we can still cover all the bases with this approach, and if it is actually better than the old way.

Not-nested Builder

Let us assume we have a builder that captures each method call; thus if the Closure delegate is set with delegate-only, we will never call the surrounding class. This is actually very much the resolver idea. But why was this not enough in the past? Because we may want to go to the class as well, either after or before the delegate is asked. In my idea the resolver would have to do that. To be able to do this though, we somehow need a resolver for the class too, which I will declare to be always available through the API, so it can be called from a custom resolver as well. This way we have the default OWNER_ONLY and can create DELEGATE_FIRST, DELEGATE_ONLY and OWNER_FIRST quite easily, as the little sketch below illustrates.
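
To make this a bit more tangible, here is a purely hypothetical sketch - none of these names exist in Groovy, they only illustrate how the strategies could be composed out of resolvers:

// hypothetical API, invented for illustration only
interface Resolver {
    // returns a call target for the name, or null if unknown
    Object resolve(String name, Object[] args)
}

// the resolver for the surrounding class is assumed to be always available;
// DELEGATE_FIRST then simply means: ask the builder, then fall back to it
Resolver delegateFirst(Resolver builder, Resolver classResolver) {
    return { String name, Object[] args ->
        builder.resolve(name, args) ?: classResolver.resolve(name, args)
    } as Resolver
}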

Open Sub Blocks

For the usage of an open block in a Closure, the idea says we just reference the resolver from the owner. This realizes OWNER_ONLY/OWNER_FIRST by default. Just like today, we would kind of go to the owner and resolve the call there in a further step, which may end up in a delegate or another owner to continue from there. The difference with the approach here is that we don't actually go to the owner. Instead we give the resolver to our new Closure, and the Closure's behaviour will then be defined by it. If this sub Closure has no delegate set, then OWNER_ONLY and OWNER_FIRST in the old way are equal, since if there is no delegate set we don't go that path at all. If the delegate is set we have a nested builder, which I will handle later. So for now it is important to note that regardless of what the parent Closure has set as resolver, we effectively realize an OWNER_ONLY/OWNER_FIRST.

Nested Builder

So going back to the sub Closure with a delegate set, we first have to note that a builder of some kind will set that delegate, meaning we can set a new resolver here as well. Now with OWNER_ONLY we would not realize any builder, since the builder would never be called. With OWNER_FIRST we first ask the parent and then maybe the delegate, depending on whether there was a response to the request before or not. If we set a resolver that first asks the old resolver and then the builder, we get this strategy. With DELEGATE_ONLY we have a builder that does not ask the parent at all; a resolver that resolves everything against the builder only will realize this. DELEGATE_FIRST means we first ask the builder and then the parent. Here it is clear that a resolver that first asks the builder and then the old resolver gives us that as well.

To Self

TO_SELF is the only strategy I have not mentioned yet. I see no use for this strategy in terms of builders, but it can be made too, by a resolver that resolves against the Closure class, ignoring owner and delegate of course.

Differences

The most obvious difference between this way and the old way is that instead of potentially going up the tree of Closures to the surrounding class, and in the worst case going back down again, we have a chain of resolvers, whose length depends on the number of nested builders. The other obvious difference is that instead of depending on a predefined strategy and a combination of owner and delegate, we get rid of the owner completely and have a resolver instead of the delegate.

Further differences come of course with the details of the resolver. For MOP2 the basis should be something that answers the request for a method call with a method, not with the result of the method call. And it should answer whether the method call is allowed to be cached or not. With the current way of piping everything through the MOP methods on Closure those two goals are impossible. A general meta class as for a normal class is not enough here, since we have to handle the "implicit this" differently. Anything I came up with so far looked quite complicated - complex as a diagram and difficult to explain too. With this concept I can say that everything goes to the resolver and be done. I think that is easier to understand.

For the resolver itself the question is still open whether it should simply be an object and we go by the MOP methods (returning methods now) comparable to today, or whether it should be an explicit resolver thingy, maybe even as a general MOP2 element.

I guess you can call this a draft so far ;)

Monday, October 08, 2012

Owner, Delegate and (implicit) this in an Open Block

Many know of course groovy.lang.Closure, they know about delegates and such, but maybe not so many know why these things exist.

What is an Open Block?

That is basically what is represented at runtime as groovy.lang.Closure, or short Closure. Please note that I don't write closure, since an open block can be a kind of closure, but is less limited than a closure. Open blocks are not lambda expressions either, since they can contain 0-n statements. For more syntactical information please go to http://groovy.codehaus.org/Closures.

The captured call

http://groovy.codehaus.org/Builders shows some examples of what builders are, but essentially they are hierarchical structures with a capturing ability. You don't really need to nest one open block into another to get that, of course. But those builders are exactly the reason why we have owner, delegate and "this".

To explain this in more detail, let us start with a normal Java block
{ foo () }
You notice this is a simple call to a method named foo that is supposed to be defined somewhere outside of the code part we look at. For example foo might be defined in the same class that contains this code block. It is clear that foo() is equal to this.foo() here. In the case above I call that an "implicit this". There are languages that don't have that, of course - Smalltalk and JavaScript come to my mind. In Java, "this" and the "implicit this" always refer to the enclosing class.

"Implicit this" in Groovy

Groovy does this a bit differently for open blocks. In Groovy the implicit this is like a reference to the capturing mechanism built into the open block, realized by the Groovy MOP, for use by a builder. Therefore in Groovy the call will be resolved to the owner, the delegate or to "this". Actually, Groovy is the only programming language I know in which "this" and "implicit this" are essentially different. I know of differing type variants for them in other languages, and many don't even have an "implicit this"... but if they have one, it normally means the two are aligned.

Coming from wanting to support the Groovy builder structure, it is clear we want some kind of capturing, thus it is clear that we cannot simply do the call on the class outside. On the other hand, assume you have an XML builder that is supposed to turn all calls into xml tags. How would you distinguish a programmer wanting to produce <foo/> (whatever sense that may have) from one wanting to call a method from the class, to for example increase a counter, prepare some state or something similar? Led by this thought we decided to make the "implicit this" different from the explicit one.

People who know the pre-1.0 history of Groovy a bit can tell that in the early days a "this" in such a block referred to the groovy.lang.Closure instance. At another point this was changed into having to pass the builder instance around, making "this" and "implicit this" equal. Neither version was what we wanted. The first version conflicts with the Java style, making it a quite phony structure. The second version is even more phony, and it made builders absolutely not a nice experience. The compromise solution was then to have the explicit "this" bypass the MOP of the groovy.lang.Closure part and let it call into the surrounding class directly, while the "implicit this" goes through the groovy.lang.Closure MOP part. Ignoring owner and differing resolve strategies, the MOP at this point is simply: look if a delegate is set, and if it is, try calling the method on the delegate. If the call succeeds, we are done; if not, fall back to "this".

Open Blocks interacting with Builders

Having a delegate we can realize many builders already quite easily. But the devil is in the details. Assume we use our xml builder like this:
xml.outerElement {
  10.times { innerElement () }
}
This is supposed to produce an element named outerElement, containing 10 times an <innerElement/> part. You may ask why this is difficult. There is one thing about builders you have to know, and that is that for each element of the hierarchy, that means for each Closure, you have to set a delegate. You can do this only if you are capturing the method call. But since we capture only calls with "implicit this", the times call will not be captured. It is a qualified call through the usage of the number 10. Still we want to refer to the builder delegate "xml" from within the block given to the times call. Since we cannot set the delegate for that one from the builder, we need a different MOP here. And this is the point where "owner" comes into play. The "owner" is the "structure" containing/owning our Closure - either a class, or another Closure. So in the example above, the Closure in the outerElement call is the owner of the Closure in the times call. We then change our MOP to not simply fall back to "this", but to fall back to "owner" instead. Then our innerElement() call will first be resolved against the delegate that times set. But since times did not set one, it will go to the owner, which is the Closure given to the outerElement call. That Closure has a delegate set through our xml builder. With that we then create our <innerElement/>, as we want it. The toy builder below shows this mechanism in code.
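
To see the mechanism in one runnable piece, here is a minimal capture-all builder - a sketch, not Groovy's real MarkupBuilder - where setting the delegate on the outer Closure is enough for the inner block to reach the builder through its owner:

class TinyXml {
    StringBuilder out = new StringBuilder()
    def methodMissing(String name, args) {
        out << "<$name>"
        if (args && args[0] instanceof Closure) {
            Closure body = args[0]
            body.delegate = this   // hand the builder to the block, strategy stays the default
            body.call()
        }
        out << "</$name>"
    }
}

def xml = new TinyXml()
xml.outerElement {
    10.times { innerElement() }   // times is a qualified call and not captured;
                                  // innerElement() reaches the builder via the owner
}
assert xml.out.toString().startsWith('<outerElement><innerElement>')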

This means all four parts, owner, delegate, this and implicit this, are required elements for the Groovy MOP.

Resolving Strategies

In the part above I stated that the delegate will be resolved against first. That is actually not always right. It depends on what the builder sets as strategy, and the default in Groovy is to resolve against the owner first. If you think back to when "this" referred to the Closure itself and not the surrounding class, it should be clear that this kind of strategy is a leftover from back then. Because without it, you would not be able to call any method from the class if you have a "capture them all" builder, like an xml builder usually is. So the reason that this is the default is historic. There have been heated debates about what the default should be, and there are multiple ways, all with pros and cons, but this default was the result back then. Later we added a set of strategies you can use... DELEGATE_FIRST to look first at the delegate and then at the owner, DELEGATE_ONLY to stop after the delegate, OWNER_FIRST the default, OWNER_ONLY to stop resolving the call after looking at the owner, and there is TO_SELF as well, which resolves the call against the Closure itself only. Again I have to add a detail to the MOP here. Even though they are called OWNER_FIRST and DELEGATE_FIRST, the first thing we will do is try to resolve the call against the Closure itself. That makes the default MOP a 3-step procedure: try to resolve the method against the Closure instance, then the owner, then the delegate. It is similar for DELEGATE_FIRST or the "only" variants.
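
A small script shows the difference a strategy makes; methodMissing on the delegate plays the role of a capture-all builder here:

def foo() { 'from the script' }                 // the owner side

class Trap {                                    // the delegate side
    def methodMissing(String name, args) { "trapped $name" }
}

def cl = { foo() }
cl.delegate = new Trap()

cl.resolveStrategy = Closure.OWNER_FIRST        // the default
assert cl() == 'from the script'

cl.resolveStrategy = Closure.DELEGATE_FIRST
assert cl() == 'trapped foo'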

At this point you may also have noticed a practical difference between DELEGATE_FIRST and OWNER_FIRST. If you use an all-capturing builder inside of an all-capturing builder structure and you use owner-first, then the method call will be trapped by the outer builder. If you use delegate-first, the inner builder will trap it instead. It depends on your use cases which is better in your situation.


Static Type Checking

With Groovy 2.0, Groovy now also offers an optional static type checker that allows static type checking for a subset of Groovy programs. Things like a builder are highly dynamic structures and difficult to check statically. Languages like Kotlin and Scala have a problem here. Sure, you can make builders in them, but when it comes to nested builders you have to be more verbose and pass the builder around all the time - something similar to the early stages of Groovy. And I don't even want to mention that for a html builder, for example, you have to define methods for every element somewhere. Considering xml and its practically infinite number of elements, you get into trouble here, even if you ignore owner, delegate, this and implicit this, as well as resolving strategies.

In Groovy++ the "solution" was to make a kind of mixed mode that simply doesn't fail compilation if a method is not found in the normal static context. In fact that was always one of the points of conflict between the Groovy team and Alex. We, and that especially includes me, found that ignoring a missing method defies the purpose of a static compilation and that you lose most of its benefits. The only remaining one actually is that everything else runs near Java speed. But static type safety is completely lost.

The idea Alex did not come up with was a helper annotation called @DelegatesTo, from Peter Niederwieser. The idea is to mark the Closure parameter in the builder method with that annotation, to tell the compiler what kind of delegate this method will use, maybe including the resolve strategy. We already have a framework for "static type checker plugins", allowing to hook into the type checker and influence how method calls are resolved. We hope that with a combination of both we can even solve cases like the xml builder. Of course this is current development and targeted for Groovy 2.1. We have to see what Cedric will come up with in the end, but what we discussed so far sounded promising, and it finally solves a longstanding problem.
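
As a usage sketch - written with the names as they later shipped in Groovy 2.1, and NodeSpec being an invented example class:

import groovy.transform.TypeChecked

class NodeSpec {
    void leaf() { println 'leaf' }
}

// @DelegatesTo tells the type checker which type the implicit-this
// calls inside the Closure will be resolved against
void build(@DelegatesTo(NodeSpec) Closure body) {
    body.delegate = new NodeSpec()
    body.resolveStrategy = Closure.DELEGATE_FIRST
    body()
}

@TypeChecked
void test() {
    build { leaf() }   // type checks, because leaf() is known on NodeSpec
}

test()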

Groovy 3

The next major version, Groovy 3.0, will come in 2013 and include a new MOP that is still supposed to get its full shape. Implementation-wise, the way owner, delegate and (implicit) this are used together with the resolving strategy actually poses quite a problem for an efficient implementation. We don't want to force users to use @DelegatesTo; that annotation is only a helper for static compilation. The problem stems from the fact that we have a quite long method resolving process here. We may have to go through a longer chain of Closure objects and have to test each time if a method is present or not. And if the delegate is changed later on, caching becomes almost impossible. This is a problem for the current implementation, for the invokedynamic port, and for anything in Groovy 3 as well, as long as the capabilities are supposed to stay the same. Probably I will be using a series of SwitchPoints to solve this; I have yet to see if that really gives me the desired speed. But it is not only speed that matters to me here for Groovy 3. One goal in Groovy 3 is to make the MOP easier, and this kind of mechanism is not easy. I hope that with the help of the community I will be able to solve this problem as well.

Sunday, January 08, 2012

The invokedynamic API

I thought I should write a bit about the invokedynamic API, since the API is very powerful and flexible, but sometimes takes a bit of getting used to. I won't write about everything or even in detail, just some hints for thinking about a few elements - mostly the ones I used for Groovy, which are most probably the ones you will use as well.

Let us say we are in the situation that we already have the method we want to call as a handle, and we have the target type of our call site. I call that method handle SM (for selected method) and the target type TT - for obvious reasons.

Now normally in Groovy you would do the following: you transform the arguments into a form you can use for calling the method. If you for example have an int, but want to call a method taking a long, then you have to transform the int into a long. You normally do this by using a general type transforming method that takes the argument and a goal type, inspects the argument, and then goes through quite big code parts for all the transformations that are allowed. We do this not during method selection itself, but for each call of course. What we do is kind of like wrapping the method object into something new that we can call in our call site, and that for each call will test if it has to apply the big standard transformation - and well, apply it too.

If you read that big transformation code, you will find a direct transformation of the arguments to whatever the SM needs. In invokedynamic we work slightly differently. Instead of having only one transformation we have many small ones that we combine into a specialized one. We do this by taking our SM, applying transformers to it, and using the result for our method calls. Since the main work is now no longer correcting the big transformer code, but combining the many small transformers, the workflow feels kind of reversed to me. You now want to apply transformers to turn your SM into one that accepts the target type TT.

Some notes for better understanding:

  • One thing to remember when working with MethodHandles is that a handle has no receiver. In a method call foo.bar(x) we normally say that foo is the receiver (well, the class of foo), bar the method name and x the argument. A MethodHandle is itself the receiver of its invocations. So if we work on its arguments, then the first argument is the receiver of our method call. A MethodHandle for the above would maybe have this MethodType: (Foo,X)Object - taking the foo receiver and an x argument, returning an Object.
  • The classes you work with mostly... You have MethodType to describe a method's parameters, including return type and receiver. There is MethodHandle of course, the core of it all, representing your SM. And there is a collection of helper methods in the class MethodHandles - note the additional s. Well, and SwitchPoint, but I haven't really used that one yet... that is still to come soon.
  • Type matching... your SM has to be changed by the usage of the transforms to fit the requested target type of your call site. For this you use for example asType. The warm-up sketch below shows these pieces in action.
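
As a warm-up, here is how these classes look in action - in Groovy, and using invokeWithArguments, since the signature-polymorphic invoke/invokeExact are awkward to call outside plain Java:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
// String#concat(String): the receiver becomes the first parameter of the handle
def concat = lookup.findVirtual(String, 'concat', MethodType.methodType(String, String))
println concat.type()    // (String,String)String - receiver first, then the argument
assert concat.invokeWithArguments('foo', 'bar') == 'foobar'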

But without further delay...

MethodHandles#dropArguments is one quite useful method that at the same time shows very well the reversed thinking that needs to be applied. This is a transform that will drop arguments, but it does not do so right away. In the old thinking we have for example the arguments a,b,c, and dropping one would give us a,b. This transform *will* do exactly this, but what we look at gives a different impression. We have a handle that takes A,B (A being the type of a and B being the type of b), and dropping then means we take that handle and produce a new one that can take A,B,C. This new handle will then ignore the argument c of type C, and in doing so it does exactly what I described above. But if we debug the code producing the combinations of transformers, we see a method handle that gets the dropArguments transformation applied and now takes one more argument instead of one less. So again... assuming you want to get rid of an argument, you start with the handle that is without it, and you have to transform it into a handle that takes it... for me that is reversed thinking and it really causes me problems sometimes. You may want to apply this one if your SM takes fewer arguments than provided.
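
A small sketch of that reversed view (setup as in the warm-up above):

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def concat = lookup.findVirtual(String, 'concat', MethodType.methodType(String, String))
// (String,String)String becomes (String,String,Object)String;
// the extra argument is accepted and then simply ignored
def dropped = MethodHandles.dropArguments(concat, 2, Object)
assert dropped.invokeWithArguments('foo', 'bar', 42) == 'foobar'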

MethodHandle#bindTo is also one I use often. Assuming we have a handle taking A,B,C and A will be the same for each call, we can bind it to produce a new handle taking B,C... but this works only for the first argument. So for example, if you have a method call that will be done through the meta class system, then this mostly results in calling a method MetaClass#invokeMethod. But the receiver there is not the receiver from my call site, so I bind the first argument, the meta class. A changed meta class would require a new method selection, so it is kind of constant for this selection. You may want to apply this one if your SM has one more argument than the ones provided and it is the first one.
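
For example, binding the receiver of String#concat:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def concat = lookup.findVirtual(String, 'concat', MethodType.methodType(String, String))
// (String,String)String becomes (String)String, the receiver is now fixed
def hello = concat.bindTo('Hello, ')
assert hello.invokeWithArguments('world') == 'Hello, world'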

But what to do if our SM would like to take more than one extra argument? In this case you use MethodHandles#insertArguments. For example your SM would like to take A,B,C, your TT is A,B and you want to bind a c. Then you can use newHandle = insertArguments(oldHandle,2,c). If it were A,B,C,D and you want to bind C and D, then it is insertArguments(oldHandle,2,c,d). If you have A,B,C,D and you want to bind B and D, then it is for example (there are two ways): newHandle = insertArguments(oldHandle,1,b); newHandle = insertArguments(newHandle,2,d). You may have noticed the chaining of transforms in this one. The first makes A,B,C,D into A,C,D, moving D from position 3 to position 2. The second then makes it A,C. You may want to apply this one if your SM has more arguments than the ones provided.
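
A sketch with Math#max, fixing the second of its two arguments:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def max = lookup.findStatic(Math, 'max', MethodType.methodType(int.class, int.class, int.class))
// (int,int)int becomes (int)int by inserting 10 at position 1
def atLeastTen = MethodHandles.insertArguments(max, 1, 10)
assert atLeastTen.invokeWithArguments(3) == 10
assert atLeastTen.invokeWithArguments(42) == 42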


MethodHandle#asCollector is also a nice one. There are several invokeMethod methods in Groovy that take an Object[] as argument to contain all the arguments, and then delegate calls to builders or do other things. The call site you start with though may not provide the Object[], but the arguments one by one. So your TT may look like this: A,B,C,D and your SM may accept this: A,Object[]. Meaning we somehow have to wrap B,C,D into an Object[], if thought from the perspective of the arguments. And that is why the method is called asCollector. What you see on your method handle is that with handle = handle.asCollector(Object[].class, 3), your A,Object[] handle is now an A,Object,Object,Object handle. You will need asType to get to the final form. Should you want to collect the arguments at a different place than the last one, you may have to permute the arguments. The opposite is asSpreader, but I haven't used that one yet. I may do so when I extend the implementation to include the spread operator. You may want to apply this one if your SM has arguments you want to collect into an array.
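
String#format is a handy Object[]-taking target to show this on:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def format = lookup.findStatic(String, 'format',
        MethodType.methodType(String, String, Object[].class))
// (String,Object[])String becomes (String,Object,Object)String
def collected = format.asCollector(Object[].class, 2)
assert collected.invokeWithArguments('%s-%s', 'a', 'b') == 'a-b'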

Sometimes we have to make a transformation that changes the runtime class of a reference type. This is of course not possible, because the JVM is strongly typed - but we can create a new object with the desired result. For example, Groovy has GString, which is a kind of String. A method call done with a GString argument is supposed to be able to call a method that takes a String. Now GString and String are not in a subclass relation, meaning we have to do a real type transformation here, and MethodHandles#filterArguments will help us with that. A filter takes one argument and returns the transformed value. For our case it should take a GString and return a String. Let us say our SM takes A,String,String,B; TT is A,GString,GString,B and our filter MethodHandle takes GSTRING. Then we can simply do newHandle = MethodHandles.filterArguments(oldHandle, 1, GSTRING, GSTRING) to produce a handle for A,GString,GString,B. Again I had some problems with the reversed thinking here: the filter's return type must match the type in the SM, and its argument type the type of our TT. If you think about it, it is quite clear and obvious, but when I work on a program I always have to stop here and rethink. Anyway... you may want to apply this one if your SM argument differs from the provided argument in a way that boxing and casting alone cannot do the transformation required.
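
The GString-to-String filter itself is Groovy-internal, so this sketch uses Object#toString as a stand-in filter; the pattern is the same:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def concat = lookup.findVirtual(String, 'concat', MethodType.methodType(String, String))
// the filter is (Object)String: its return type matches the SM's parameter type,
// its own parameter type is what the new handle will accept
def toStr = lookup.findVirtual(Object, 'toString', MethodType.methodType(String))
def filtered = MethodHandles.filterArguments(concat, 1, toStr)
println filtered.type()   // (String,Object)String
assert filtered.invokeWithArguments('x=', 42) == 'x=42'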

And the last one I want to mention: MethodHandles#guardWithTest. In many situations you have to handle the invalidation of your call site to some extent. One way (if you use a MutableCallSite) is to have a guard that causes the target of your call site to be exchanged by a new method selection. Let us assume we have a call foo(a,x) and the methods foo(Object,Object) and foo(Object,String). Now x may be a String, and then we want to call foo(Object,String), or we want to call foo(Object,Object) if it is not. Let us assume there will be three calls: the first done with an Integer, the second with a String and the last one with an Integer again. We will have to exchange the method each time. A guard takes a number of arguments (0-N) and returns a boolean, which will be used to determine if we call our normal target or a fallback handle. For this case I have used a handle called SAME_CLASS. It takes a class and an Object argument and tests if argument.getClass() is equal to the provided class. Since the first argument, the class, is fixed, I bind it upfront, leaving me a guard handle that takes an Object argument only. guardWithTest does not support arbitrary positions, but we want to check the second argument to the method, not the first... and there is the receiver too. If you did not jump to this section, you may notice we are in a situation in which we have fewer arguments than given... only this time it is not the SM, but our guard. But that doesn't matter. Still we drop the first two to get a handle Object,Object,Object, which will ignore the argument at position 0, the receiver, and the argument at position 1, but test the one at position 2. Then it is simply newHandle = MethodHandles.guardWithTest(guard, oldHandle, fallback) and we are done. If you want to have multiple guards, you continue to apply this kind of transform. I haven't found a way to combine the guards themselves to have only one guardWithTest call. But maybe that isn't needed anyway. You may apply this one if your call site needs to be invalidated based on the given arguments.
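
SAME_CLASS is Groovy-internal, so this self-contained sketch uses an instanceof check as the guard; the shape of the transform is the same:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

class Targets {
    static boolean isString(Object o) { o instanceof String }  // the guard
    static String forString(Object o) { "String: $o" }         // the guarded target
    static String fallback(Object o)  { "other: $o" }          // taken when the guard fails
}

def lookup = MethodHandles.lookup()
def type = MethodType.methodType(String, Object)
def guarded = MethodHandles.guardWithTest(
        lookup.findStatic(Targets, 'isString', MethodType.methodType(boolean.class, Object)),
        lookup.findStatic(Targets, 'forString', type),
        lookup.findStatic(Targets, 'fallback', type))

assert guarded.invokeWithArguments('hi') == 'String: hi'
assert guarded.invokeWithArguments(42) == 'other: 42'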

This is just a short overview, by far not complete and no replacement for reading the javadoc - just a little guide that may be of help.

Thursday, November 03, 2011

The Java Way, simple Type Inference and Flow Sensitive Typing

In "Groovy static type checker: status update" C├ędric gave one of his favorite type checking examples. Even though the example made some things clear that I was not so clear about in my last blog post, I still think we need to look at this example a bit more in detail.

Perfectly fine Groovy code like this:

class A {
    void foo() {}
}
def name = new A()
name.foo()
name = 1
name = '123'
name.toInteger()

is a problem for a static type checker and I want to explain once more in a bit more detail why. There are currently 3 ways to approach this code...

The Way of Java
In this version we try to be as much like Java as possible, but obviously we have to give def a meaning. In standard Groovy it is basically equal to Object. As I wrote in my last blog post, the view on types in Groovy is a tiny bit different compared to Java. Anyway, if def is just an alias for Object and we compile this code with Java's type rules, then the code will not compile:

class A {
    void foo() {}
}
def name = new A() // name is of type Object
name.foo() // error: foo() does not exist on Object
name = 1 // assigning Integer to Object typed name is allowed
name = '123' // assigning String to Object typed name is allowed
name.toInteger() // error: toInteger() does not exist on Object

While the assignments would all work just fine, the method calls will not pass, since those methods are not available on Object.

Simple Type Inference
This seems to be the way groovypp goes. If I am wrong about it, feel free to correct me. Again we have to give def a meaning, and this time we use the right-hand side of the assignment to do so. For the remaining code we stay more or less with the Java rules. The result is then this:

class A {
    void foo() {}
}
def name = new A() // name is of type A
name.foo() // no problem, foo() exists on A
name = 1 // error: assigning Integer to A typed name
name = '123' // error: assigning String to A typed name
name.toInteger() // error: toInteger() does not exist on A

Instead of the two problems from before we now have 3, 2 of them at a different position in our code. If we want that piece of code to compile, those two approaches won't do it.

Flow Sensitive Typing
In this third version we still have to give def a meaning, but this time the meaning is not fixed:

class A {
    void foo() {}
}
def name = new A() // name is of flow type A
name.foo() // no problem, foo() exists on A
name = 1 // name becomes of flow type Integer
name = '123' // name becomes of flow type String
name.toInteger() // no problem, toInteger() does exist on String

With this we reach our goal - who would have guessed that ;)

The difficult thing for a Java programmer here is probably that name is not of a fixed type. In fact, looking at many papers in the area of formal semantics, most type systems out there use the flow type only for type checks, not to actually give a variable a type. On the other hand, if you look at the Java way, you could say that simple type inference is just an enhancement: instead of letting the user write the type, the compiler will set the type. There are actually many old languages that support that kind of logic; this is really nothing new. Still, if we see that as an enhancement, then letting the compiler set the type of a variable automatically at more than one place can be considered just the next step.

But I am getting side tracked... I only wanted to show why exactly this example is causing a problem or why not. That is all. [UPDATE: I had to reformat the article a bit, because I had problems with my JavaScript based syntax highlighter and with the line length of some code examples]

Wednesday, October 26, 2011

Flow Sensitive Typing?

While we (Guillaume, Cedric and myself) had a meeting in Paris, we talked about the typing system of Grumpy a bit.

Coming from a dynamic language, going static often feels quite limiting. For me the main point of a static type system is to ensure the code I just wrote is not totally stupid. Actually many would say the purpose is to ensure the correctness of the code, but maybe I am more of a dynamic guy, because I think this is impossible for a poor compiler to achieve. So a static compiler usually checks

  • method calls, to ensure the method I want to call really exists
  • fields/properties, to ensure they exist
  • check assignments, to ensure right-hand-side and left-hand-side have compatible types
  • check type usage, for complying with the type system (including generics, class declarations and so on)
  • and others...
So usually if a compiler detects a violation it will cause a compilation error, and if it cannot check things the code will probably include runtime checks.

Optional Typing

Now Groovy has what we call optional typing. In Groovy the compiler won't check fields, properties or methods for their existence, since the compiler cannot be totally sure we really mean some entity that exists at compile time. Groovy allows you to create/remove/add/replace methods at runtime and attach them to classes via their MetaClass. A program that would not compile statically may run just fine with those additions. What the Groovy compiler does though is to check type usage to some extent. So you can for example not extend an interface. The Groovy compiler has to do this because the JVM is also typed to some extent and doesn't directly allow arbitrary type systems. Sure, there are ways around that, but they always mean reducing the high integration with Java, and we don't want that.
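
A quick example of why the compiler cannot check method existence - a method that only exists because we attach it at runtime:

// added at runtime via the MetaClass; no static check could know about it
String.metaClass.shout = { -> delegate.toUpperCase() + '!' }
assert 'groovy'.shout() == 'GROOVY!'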

Another aspect is assignments. If you assign a value to a typed variable in Groovy, the compiler won't ensure this works at runtime, but it will add a cast that may fail. Effectively this means for a typed variable in Groovy: we guarantee that if the assignment works, the value assigned to the variable will be at least of the declared type.

This implies for example for a String typed variable that if you assign a non-String to it, the value's toString() method is called. We call that way of casting a Groovy cast, and the rules for it are actually quite complex.
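
A tiny example of such a Groovy cast:

String s = 42        // the Groovy cast calls toString() on the Integer
assert s == '42'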

Still, there are enough cases we could actually check if we knew the type of the right-hand side. In general we don't know that type, because for example a method call is done there, and then we cannot ensure the type: by the time we actually reach that point in the code, the method might have been replaced.

Strong Typing
If you follow the discussions about typing, you will most probably see very fast that dynamic and static typing might be kind of defined, but beyond that there are often conflicting definitions for other terms. For example some say that Groovy is not strongly typed, while Java is. In my definition strong typing means that a value cannot change its type at runtime without first creating a new value. In Java we have this situation for example with boxing. You can assign an int to an Object, but not without the value being boxed, thus a new value being created. Now in Groovy this is just the same. An int cannot become an Integer or a String just like that. We depend on the type system enforced on us by the JVM, and the JVM is strongly typed... well, that may change in the future, but for now it is strongly typed. In Groovy you can add methods, for example, and with that change the interface a value provides, but there is no way for a value of a certain class to become the same value with a totally different class, without a new value being created and used instead.

Flow Sensitive Typing
Flow Sensitive Typing is not unknown in the world. It is normally used, for example, to find the type of a complex expression, which is then checked against the type actually allowed in an assignment. Now in Groovy we want to go a bit of a different way. Basically we don't want a fixed static type; instead each assignment can specify a new one. If you defined a variable using "def", then in normal Groovy all assignments to it are allowed. Basically we see "def" as Object in Groovy. But if you want static method checks, you still want something like "def str = "my string"; println str.toUpperCase()" to work. This case can so far also be solved by type inference. But in Groovy you can also do this: "def v = 1; v = v.toString(); println v.toUpperCase()". Even though we start out with an int, we later assign a String to v. If we work only with a simple inferencing system, this will not compile. But since it is of course our goal to make a wide range of Groovy programs available even in a statically checked version, we would of course like to allow this. And a simple flow analysis can indeed give us the information that the flow type of v changes to String, and thus the toUpperCase() method exists. In other words, this would compile. Taking into consideration that "def" in Groovy doesn't mean much more than Object, we don't want this to be limited to "def" only. We also want to allow this: "Object v = 1; v = v.toString(); println v.toUpperCase()". Java would not allow this. Sure, you can assign the 1, you can even call toString() and assign the result to v, but because v is declared as Object, the compiler would start barking at you for the toUpperCase() call. Our thinking is that there is not really a need to limit the type system like this. As in Groovy, we would again give the guarantee that v is at least an Object. But the specific type depends in Groovy on the runtime type, in Grumpy on the flow type. Something Grumpy would for example still not allow is "int i = new Object()".

But as of now, this flow sensitive type system has not yet been approved of by the community.


Feeling Grumpy?

Grumpy might have been mentioned a few times here and there already, but what is it?

It is a little project for Groovy we are currently working on, and its working title is Grumpy. Grumpy is not the final name; we use it just until we find a final and more serious name.

The goal of Grumpy is to have a static type checker for Groovy, driven by an annotation. This will be optional and will not cause any changes to normal Groovy. We see this as part of the Java migration story, in which people may not want the dynamic type system of Groovy everywhere. Grumpy is also no static compiler; it will be normal Groovy under the hood. I will not exclude a future static compiler based on Grumpy, but that is not its primary purpose. Also, we don't want to compete with Scala here. For my taste their type system is too complex. We may go beyond Java's system though, if we think it makes sense.

Basically there is already a static type checker in Groovy++ (and a static compiler), but integrating that into Groovy just for type checking would mean either integrating huge parallel structures that no one but the Groovy++ people understand, or investing a lot of time to transfer everything over - probably more than it takes to write the type checker anew. Thus we decided to make a new type checker and to try to involve the community as much as possible at every step.


How will it work?
So far you will be able to annotate methods or classes, and an AST transform will then perform type checking on that code. We don't want any new syntax constructs for Grumpy. This means only a subset of Groovy will be accepted by Grumpy.

Can I mix dynamic typed code and Grumpy?
Yes you can. If you use the per-method annotations you can just use different methods for the dynamic and the grumpy code. So far we have no annotation that turns dynamic typing back on in case you used the class level annotation, but that may follow in the future.
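
As a sketch of that mixing - using @TypeChecked, the annotation name this work later shipped under in official Groovy, so take the name with a grain of salt:

import groovy.transform.TypeChecked

@TypeChecked
int statically(int i) { i * 2 }        // checked at compile time

def dynamically(x) { x.whatever() }    // normal dynamic Groovy, not checked

assert statically(21) == 42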

When will it be available?
Grumpy will be part of Groovy 1.9; the next beta will already include a pretty advanced Grumpy.

What will Grumpy do about the GDK?
For those that don't know... the GDK is a set of methods we use to enhance standard Java classes. For example we have an "each" method on Collection, to iterate over the collection using a Groovy Closure. Grumpy will support all GDK methods, but to enable type checking inside those Closure blocks there will have to be much more work.

Much more work?
In, for example, "int i=1; [1,2].each {i += it}" you want to ensure nothing is added to i that is not compatible with int. Since we cannot possibly let the compiler know in code how each of those GDK methods works, we will have to find a way to do that more automatically. The Java type system with generics is not really able to express our needs here. For example, if you iterate a Map using each, you have one variant that takes a Map.Entry and another that takes key and value (see the snippet below). If you just declare everything as Object, you won't gain all that much in terms of type checking. Most probably we will have a second annotation for this kind of thing, in which we will store a String with extra type information. The goal here is to let the compiler add this annotation on its own for Groovy code, but for predefined Java code we will of course have to depend on the user doing that, or live without the type information.
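
For illustration, the two each variants on a Map look like this:

// one Closure parameter: you get a Map.Entry
[a: 1, b: 2].each { Map.Entry e -> println "$e.key -> $e.value" }
// two Closure parameters: you get key and value separately
[a: 1, b: 2].each { k, v -> println "$k -> $v" }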

Anyway, I am sure Grumpy will be an interesting project. Feel free to suggest names.