Tuesday, January 28, 2014

What class duplication is and how it happens

From time to time we get a question on the lists that turns out to be related to class duplication. In short, class duplication is the problem of having two classes of the same name that are not equal. And that causes all sorts of problems.

In a command line Java application you usually don't have that sort of problem, because there is only one significant class loader. There are several loaders there too - like the bootstrap loader and the application loader - but what you usually care about are the classes given to the JVM by the class path, and those are loaded by the application loader. In my time with Groovy I really had to fight some ugly class loader problems that go beyond mere duplication. They are sometimes so difficult to debug that I count them among the worst kinds of bugs you can have. But here we will concentrate on class duplication.

Some Basics

First of all you have to imagine that all class loaders together form a tree. The application and bootstrap loaders will be at the top forming the root; any other loader will be a node or leaf in that tree. Every class loader has a parent class loader, to which it is supposed to delegate loadClass calls. If the parent doesn't know the class and is not able to create it, a ClassNotFoundException is thrown and caught by the child node that requested the class. This child node then has the opportunity to create the class itself, or to throw the exception again. In the worst case this goes down to the node doing the original request, which may then ultimately throw a ClassNotFoundException for the user. The class loader creating the class is called the defining loader. If you have a Class object, you can ask it for its class loader and you will get the defining loader. For example in Groovy, if you are executing a script in source form from the command line, then this.getClass().getClassLoader() will return an instance of InnerLoader or GroovyClassLoader. I have to mention that if you don't set the parent, classLoader.getParent() may return null, but that does not mean there is no parent. Instead the parent is then the bootstrap class loader. It depends on the implementation whether null is used though.
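
A quick probe of all this, assuming you run it as a Groovy script from the command line (the exact loader class you see may vary by Groovy version):

println this.class.classLoader           // e.g. an InnerLoader or GroovyClassLoader instance
println this.class.classLoader.parent    // the loader it delegates to
println String.classLoader               // null - String comes from the bootstrap loader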

Class loader constraints

Class loader constraints ensure the good behavior of libraries and Java applications. They are described, for example, in http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3.4 - but I will try to put this in a less complicated and less mathematical language. Basically, in the JVM a class as a Class object is not the same as the class you have in source code. Instead you have an object essentially defined by a pair of name and loader. A different name with the same loader means a different Class object. A different defining loader with the same name also means a different Class object (class duplication!). For loadClass calls the constraints basically translate to this:
  • A class loader that returned the Class object c for a given name n has to always return the same c (referential identity) for any name equal to n (equals)
  • A class loader has to ask the parent to load a class first
The first point goes beyond the defining loader. Of course there should always be
c.getClassLoader().loadClass(c.getName()) == c
but also
Class c1 = loader.loadClass("Foo")
Class c2 = loader.loadClass("Foo")
assert c1 == c2
at any time, even if c1.getClassLoader() != loader.

Class duplication example

Trouble comes in environments with a complex class loader setup. But before we get into that, let me try to illustrate the problem a bit:
// loader1 will be able to load the class Foo from a jar
def loader1 = new URLClassLoader(...)
// loader2 will be able to load a class of the same name from the same jar
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Foo")
// loader1 is the defining loader for Foo in class1
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// class1 and class2 are not the same!
assert class1 != class2

In this example we have the loaders loader1 and loader2, each of which can load the class named Foo from a jar. This is not a violation of the constraints I mentioned above, and this example alone does not yet illustrate the full scope of the problem.

When Foo is not Foo

Imagine you have written Java code like this:
public class Bar {
    public void foo(Foo f){}
}
The important part here is that loading the class Bar will require loading the class Foo as well, since Bar depends on Foo. The loader used to load Foo will be the same one that defines Bar. That means the defining loader for Foo will be either a parent of the loader for Bar, or the loader for Bar itself. Let us now come back to our class duplication example from before, but slightly modified:
// loader1 and loader2 will be able to load the classes
// Bar and Foo from a jar
def loader1 = new URLClassLoader(...)
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Bar")
// loader1 is the defining loader for Bar in class1 and for Foo
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// create a Bar instance
def bar = class1.newInstance()
// create a Foo instance
def foo = class2.newInstance()
// call Bar#foo(Foo)
bar.foo(foo) // Exception!!
The last line here fails, because the Foo we pass as argument in the method call is no Foo for the Bar in bar. The Foo known to Bar is the one with the defining loader loader1, while the Foo we pass in has the defining loader loader2. This is not limited to method calls; setting fields or even casts show the same behavior. In case of a cast Groovy may then report something like this: GroovyCastException: Cannot cast object 'Foo@1234abcd' with class 'Foo' to class 'Foo'

This is not a problem in Groovy (or Java); it is a problem of your class loader setup.

Diagnose and Solve

Of course a simple test like
foo.getClass().getName().equals("Foo") && Foo.class != foo.getClass()
can already give a hint about a class duplication problem, since the condition is only true if foo's class is named Foo, but is not the Foo we used here. One small program that can shed some light on the structure is this:
def printLoader(Class c) {
    def loader = c.classLoader
    while (loader != null) {
        println "class loader is an instance of ${loader.getClass()} called $loader"
        loader = loader.parent
    }
    println "<bootstrap loader>"
}
If you apply this to foo.getClass() and Foo.class you can compare the outputs, and you should see that at least the first line differs. The fix is easier said than done. Only a loader common to both should define Foo. Either that has to be done by introducing a new class loader, or a class loader that takes URLs has to handle the jar containing Foo (and all its dependencies).
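
As a minimal sketch of that fix - the jar URLs here are just placeholders - you can give both loaders a common parent that handles the jar containing Foo:

// one loader defines Foo, both children delegate to it first
def shared  = new URLClassLoader([new URL('file:foo-api.jar')] as URL[])
def loader1 = new URLClassLoader([new URL('file:app1.jar')] as URL[], shared)
def loader2 = new URLClassLoader([new URL('file:app2.jar')] as URL[], shared)

// now there is exactly one defining loader for Foo
assert loader1.loadClass('Foo') == loader2.loadClass('Foo')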

Sunday, October 14, 2012

Open Blocks and MOP 2

In my last post I was describing how owner, delegate and this are needed in open blocks and how builders and resolve strategies change the behaviour of an open block. I also stated that this situation is not very satisfying for the next MOP.

The question then is if there is a deeper principle we can use. The reason why many see a groovy.lang.Closure not as a closure in the more functional sense is partially because it resolves names dynamically. Dynamically in the sense that a closure in the functional sense, once created at runtime, has all names resolved, while we on the other hand resolve names on demand. I will call the imaginary part doing this the "dynamic resolver". This dynamic resolver is currently either owner or delegate, or a mix between them. But the important part is that we have something that resolves the name for us.

Now I had the following idea for the resolving of the implicit-this part, consisting of two parts... Part one is that I want to let each sub block share the same resolver unless a new resolver is set. Newly created Closures would then get that resolver. The standard resolver would resolve everything against the surrounding class. And if I say "standard" and "new", this means you can set a new resolver, which all sub blocks will then share. The question is if we can still cover all the bases with this approach, and if it is actually better than the old way.

Not-nested Builder

Let us assume we have a builder that captures each method call; thus if the Closure delegate is set with delegate-only, we will never call the surrounding class. This is actually very much the resolver idea. But why was this not enough in the past? Because we may want to go to the class as well, either after or before the delegate is asked. In my idea the resolver would have to do that. To be able to do this though, we somehow need a resolver for the class too, which I will declare to be always available through the API, so it can be called from a custom resolver as well. This way we have the default OWNER_ONLY and can create DELEGATE_FIRST, DELEGATE_ONLY and OWNER_FIRST quite easily, as the little sketch below illustrates.
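
To make this a bit more tangible, here is a purely hypothetical sketch - none of these names exist in Groovy, they only illustrate how the strategies could be composed out of resolvers:

// hypothetical API, invented for illustration only
interface Resolver {
    // returns a call target for the name, or null if unknown
    Object resolve(String name, Object[] args)
}

// the resolver for the surrounding class is assumed to be always available;
// DELEGATE_FIRST then simply means: ask the builder, then fall back to it
Resolver delegateFirst(Resolver builder, Resolver classResolver) {
    return { String name, Object[] args ->
        builder.resolve(name, args) ?: classResolver.resolve(name, args)
    } as Resolver
}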

Open Sub Blocks

For the usage of an open block in a Closure, the idea says we just reference the resolver from the owner. This realizes OWNER_ONLY/OWNER_FIRST by default. Just like today, we would kind of go to the owner and resolve the call there in a further step, which may end up in a delegate or another owner to continue from there. The difference with the approach here is that we don't actually go to the owner. Instead we give the resolver to our new Closure, and the Closure's behaviour will then be defined by it. If this sub Closure has no delegate set, then OWNER_ONLY and OWNER_FIRST in the old way are equal, since if there is no delegate set we don't go that path at all. If the delegate is set we have a nested builder, which I will handle later. So for now it is important to note that regardless of what the parent Closure has set as resolver, we effectively realize an OWNER_ONLY/OWNER_FIRST.

Nested Builder

So going back to the sub Closure with a delegate set, we first have to note that a builder of some kind will set that delegate, meaning we can set a new resolver here as well. Now with OWNER_ONLY we would not realize any builder, since the builder would never be called. With OWNER_FIRST we first ask the parent and then maybe the delegate, depending on whether there was a response to the request before or not. If we set a resolver that first asks the old resolver and then the builder, we get this strategy. With DELEGATE_ONLY we have a builder that does not ask the parent at all; a resolver that resolves everything against the builder only will realize this. DELEGATE_FIRST means we first ask the builder and then the parent. Here it is clear that a resolver that first asks the builder and then the old resolver gives us that as well.

To Self

TO_SELF is the only strategy I have not mentioned yet. I see no use for this strategy in terms of builders, but it can be made too, by a resolver that resolves against the Closure class, ignoring owner and delegate of course.

Differences

The most obvious difference between this way and the old way is that instead of potentially going up the tree of Closures to the surrounding class, and in the worst case going back down again, we have a chain of resolvers, whose length depends on the number of nested builders. The other obvious difference is that instead of depending on a predefined strategy and a combination of owner and delegate, we get rid of the owner completely and have a resolver instead of the delegate.

Further differences come of course with the details of the resolver. For MOP2 the basis should be something that answers the request for a method call with a method, not with the result of the method call. And it should answer whether the method call is allowed to be cached or not. With the current way of piping everything through the MOP methods on Closure those two goals are impossible. A general meta class as for a normal class is not enough here, since we have to handle the "implicit this" differently. Anything I came up with so far looked quite complicated - complex as a diagram and difficult to explain too. With this concept I can say that everything goes to the resolver and be done. I think that is easier to understand.

For the resolver itself the question is still open whether it should simply be an object and we go by the MOP methods (returning methods now) comparable to today, or whether it should be an explicit resolver thingy, maybe even as a general MOP2 element.

I guess you can call this a draft so far ;)

Monday, October 08, 2012

Owner, Delegate and (implicit) this in an Open Block

Many know of course groovy.lang.Closure, they know about delegates and such, but maybe not so many know why these things exist.

What is an Open Block?

That is basically what is represented at runtime as groovy.lang.Closure, or short Closure. Please note that I don't write closure, since an open block can be a kind of closure, but is less limited than a closure. Open blocks are not lambda expressions either, since they can contain 0-n statements. For more syntactical information please go to http://groovy.codehaus.org/Closures.

The captured call

http://groovy.codehaus.org/Builders shows some examples of what builders are, but essentially they are hierarchical structures with a capturing ability. You don't really need to nest one open block into another to get that, of course. But those builders are exactly the reason why we have owner, delegate and "this".

To explain this in more detail, let us start with a normal Java block
{ foo () }
You notice this is a simple call to a method named foo that is supposed to be defined somewhere outside of the code part we look at. For example foo might be defined in the same class that contains this code block. It is clear that foo() is equal to this.foo() here. In the case above I call that an "implicit this". There are languages that don't have that, of course - Smalltalk and JavaScript come to my mind. In Java, "this" and the "implicit this" always refer to the enclosing class.

"Implicit this" in Groovy

Groovy does this a bit differently for open blocks. In Groovy the implicit this is like a reference to the capturing mechanism built into the open block, realized by the Groovy MOP, for use by a builder. Therefore in Groovy the call will be resolved to the owner, the delegate or to "this". Actually, Groovy is the only programming language I know in which "this" and "implicit this" are essentially different. I know of differing type variants for them in other languages, and many don't even have an "implicit this"... but if they have one, it normally means the two are aligned.

Coming from wanting to support the Groovy builder structure, it is clear we want some kind of capturing, thus it is clear that we cannot simply do the call on the class outside. On the other hand, assume you have an XML builder that is supposed to turn all calls into xml tags. How would you distinguish a programmer wanting to produce <foo/> (whatever sense that may have) from one wanting to call a method from the class, to for example increase a counter, prepare some state or something similar? Led by this thought we decided to make the "implicit this" different from the explicit one.

People who know the pre-1.0 history of Groovy a bit can tell that in the early days a "this" in such a block referred to the groovy.lang.Closure instance. At another point this was changed into having to pass the builder instance around, making "this" and "implicit this" equal. Neither version was what we wanted. The first version conflicts with the Java style, making it a quite phony structure. The second version is even more phony, and it made builders absolutely not a nice experience. The compromise solution was then to have the explicit "this" bypass the MOP of the groovy.lang.Closure part and let it call into the surrounding class directly, while the "implicit this" goes through the groovy.lang.Closure MOP part. Ignoring owner and differing resolve strategies, the MOP at this point is simply: look if a delegate is set, and if it is, try calling the method on the delegate. If the call succeeds, we are done; if not, fall back to "this".

Open Blocks interacting with Builders

Having a delegate we can realize many builders already quite easily. But the devil is in the details. Assume we use our xml builder like this:
xml.outerElement {
  10.times { innerElement () }
}
This is supposed to produce an element named outerElement, containing 10 times an <innerElement/> part. You may ask why this is difficult. There is one thing about builders you have to know, and that is that for each element of the hierarchy, that means for each Closure, you have to set a delegate. You can do this only if you are capturing the method call. But since we capture only calls with "implicit this", the times call will not be captured. It is a qualified call through the usage of the number 10. Still we want to refer to the builder delegate "xml" from within the block given to the times call. Since we cannot set the delegate for that one from the builder, we need a different MOP here. And this is the point where "owner" comes into play. The "owner" is the "structure" containing/owning our Closure - either a class, or another Closure. So in the example above, the Closure in the outerElement call is the owner of the Closure in the times call. We then change our MOP to not simply fall back to "this", but to fall back to "owner" instead. Then our innerElement() call will first be resolved against the delegate that times set. But since times did not set one, it will go to the owner, which is the Closure given to the outerElement call. That Closure has a delegate set through our xml builder. With that we then create our <innerElement/>, as we want it. The toy builder below shows this mechanism in code.
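
To see the mechanism in one runnable piece, here is a minimal capture-all builder - a sketch, not Groovy's real MarkupBuilder - where setting the delegate on the outer Closure is enough for the inner block to reach the builder through its owner:

class TinyXml {
    StringBuilder out = new StringBuilder()
    def methodMissing(String name, args) {
        out << "<$name>"
        if (args && args[0] instanceof Closure) {
            Closure body = args[0]
            body.delegate = this   // hand the builder to the block, strategy stays the default
            body.call()
        }
        out << "</$name>"
    }
}

def xml = new TinyXml()
xml.outerElement {
    10.times { innerElement() }   // times is a qualified call and not captured;
                                  // innerElement() reaches the builder via the owner
}
assert xml.out.toString().startsWith('<outerElement><innerElement>')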

This means all four parts, owner, delegate, this and implicit this, are required elements for the Groovy MOP.

Resolving Strategies

In the part above I stated that the delegate will be resolved against first. That is actually not always right. It depends on what the builder sets as strategy, and the default in Groovy is to resolve against the owner first. If you think back to when "this" referred to the Closure itself and not the surrounding class, it should be clear that this kind of strategy is a leftover from back then. Because without it, you would not be able to call any method from the class if you have a "capture them all" builder, like an xml builder usually is. So the reason that this is the default is historic. There have been heated debates about what the default should be, and there are multiple ways, all with pros and cons, but this default was the result back then. Later we added a set of strategies you can use... DELEGATE_FIRST to look first at the delegate and then at the owner, DELEGATE_ONLY to stop after the delegate, OWNER_FIRST the default, OWNER_ONLY to stop resolving the call after looking at the owner, and there is TO_SELF as well, which resolves the call against the Closure itself only. Again I have to add a detail to the MOP here. Even though they are called OWNER_FIRST and DELEGATE_FIRST, the first thing we will do is try to resolve the call against the Closure itself. That makes the default MOP a 3-step procedure: try to resolve the method against the Closure instance, then the owner, then the delegate. It is similar for DELEGATE_FIRST or the "only" variants.
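
A small script shows the difference a strategy makes; methodMissing on the delegate plays the role of a capture-all builder here:

def foo() { 'from the script' }                 // the owner side

class Trap {                                    // the delegate side
    def methodMissing(String name, args) { "trapped $name" }
}

def cl = { foo() }
cl.delegate = new Trap()

cl.resolveStrategy = Closure.OWNER_FIRST        // the default
assert cl() == 'from the script'

cl.resolveStrategy = Closure.DELEGATE_FIRST
assert cl() == 'trapped foo'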

At this point you may also have noticed a practical difference between DELEGATE_FIRST and OWNER_FIRST. If you use an all-capturing builder inside of an all-capturing builder structure and you use owner-first, then the method call will be trapped by the outer builder. If you use delegate-first, the inner builder will trap it instead. It depends on your use cases which is better in your situation.


Static Type Checking

With Groovy 2.0, Groovy now also offers an optional static type checker that allows static type checking for a subset of Groovy programs. Things like a builder are highly dynamic structures and difficult to check statically. Languages like Kotlin and Scala have a problem here. Sure, you can make builders in them, but when it comes to nested builders you have to be more verbose and pass the builder around all the time - something similar to the early stages of Groovy. And I don't even want to mention that for a html builder, for example, you have to define methods for every element somewhere. Considering xml and its practically infinite number of elements, you get into trouble here, even if you ignore owner, delegate, this and implicit this, as well as resolving strategies.

In Groovy++ the "solution" was to make a kind of mixed mode that simply doesn't fail compilation if a method is not found in the normal static context. In fact that was always one of the points of conflict between the Groovy team and Alex. We, and that especially includes me, found that ignoring a missing method defies the purpose of a static compilation and that you lose most of its benefits. The only remaining one actually is that everything else runs near Java speed. But static type safety is completely lost.

The idea Alex did not come up with was a helper annotation called @DelegatesTo, from Peter Niederwieser. The idea is to mark the Closure parameter in the builder method with that annotation, to tell the compiler what kind of delegate this method will use, maybe including the resolve strategy. We already have a framework for "static type checker plugins", allowing to hook into the type checker and influence how method calls are resolved. We hope that with a combination of both we can even solve cases like the xml builder. Of course this is current development and targeted for Groovy 2.1. We have to see what Cedric will come up with in the end, but what we discussed so far sounded promising, and it finally solves a longstanding problem.
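
As a usage sketch - written with the names as they later shipped in Groovy 2.1, and NodeSpec being an invented example class:

import groovy.transform.TypeChecked

class NodeSpec {
    void leaf() { println 'leaf' }
}

// @DelegatesTo tells the type checker which type the implicit-this
// calls inside the Closure will be resolved against
void build(@DelegatesTo(NodeSpec) Closure body) {
    body.delegate = new NodeSpec()
    body.resolveStrategy = Closure.DELEGATE_FIRST
    body()
}

@TypeChecked
void test() {
    build { leaf() }   // type checks, because leaf() is known on NodeSpec
}

test()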

Groovy 3

The next major version, Groovy 3.0, will come in 2013 and include a new MOP that is still supposed to get its full shape. Implementation-wise, the way owner, delegate and (implicit) this are used together with the resolving strategy actually poses quite a problem for an efficient implementation. We don't want to force users to use @DelegatesTo; that annotation is only a helper for static compilation. The problem stems from the fact that we have a quite long method resolving process here. We may have to go through a longer chain of Closure objects and have to test each time if a method is present or not. And if the delegate is changed later on, caching becomes almost impossible. This is a problem for the current implementation, for the invokedynamic port, and for anything in Groovy 3 as well, as long as the capabilities are supposed to stay the same. Probably I will be using a series of SwitchPoints to solve this; I have yet to see if that really gives me the desired speed. But it is not only speed that matters to me here for Groovy 3. One goal in Groovy 3 is to make the MOP easier, and this kind of mechanism is not easy. I hope that with the help of the community I will be able to solve this problem as well.

Sunday, January 08, 2012

The invokedynamic API

I thought I should write a bit about the invokedynamic API, since the API is very powerful and flexible, but sometimes takes a bit of getting used to. I won't write about everything or even in detail, just some hints for thinking about a few elements - mostly the ones I used for Groovy, which are most probably the ones you will use as well.

Let us say we are in the situation that we already have the method we want to call as a handle, and we have the target type of our call site. I call that method handle SM (for selected method) and the target type TT - for obvious reasons.

Now normally in Groovy you would do the following: you transform the arguments into a form you can use for calling the method. If you for example have an int, but want to call a method taking a long, then you have to transform the int into a long. You normally do this by using a general type transforming method that takes the argument and a goal type, inspects the argument, and then goes through quite big code parts for all the transformations that are allowed. We do this not during method selection itself, but for each call of course. What we do is kind of like wrapping the method object into something new that we can call in our call site, and that for each call will test if it has to apply the big standard transformation - and well, apply it too.

If you read that big transformation code, you will find a direct transformation of the arguments to whatever the SM needs. In invokedynamic we work slightly differently. Instead of having only one transformation we have many small ones that we combine into a specialized one. We do this by taking our SM, applying transformers to it, and using the result for our method calls. Since the main work is now no longer correcting the big transformer code, but combining the many small transformers, the workflow feels kind of reversed to me. You now want to apply transformers to turn your SM into one that accepts the target type TT.

Some notes for better understanding:

  • One thing to remember when working with MethodHandles is that a handle has no receiver. In a method call foo.bar(x) we normally say that foo is the receiver (well, the class of foo), bar the method name and x the argument. A MethodHandle is itself the receiver of its invocations. So if we work on its arguments, then the first argument is the receiver of our method call. A MethodHandle for the above would maybe have this MethodType: (Foo,X)Object - taking the foo receiver and an x argument, returning an Object.
  • The classes you work with mostly... You have MethodType to describe a method's parameters, including return type and receiver. There is MethodHandle of course, the core of it all, representing your SM. And there is a collection of helper methods in the class MethodHandles - note the additional s. Well, and SwitchPoint, but I haven't really used that one yet... that is still to come soon.
  • Type matching... your SM has to be changed by the usage of the transforms to fit the requested target type of your call site. For this you use for example asType. The warm-up sketch below shows these pieces in action.
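
As a warm-up, here is how these classes look in action - in Groovy, and using invokeWithArguments, since the signature-polymorphic invoke/invokeExact are awkward to call outside plain Java:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
// String#concat(String): the receiver becomes the first parameter of the handle
def concat = lookup.findVirtual(String, 'concat', MethodType.methodType(String, String))
println concat.type()    // (String,String)String - receiver first, then the argument
assert concat.invokeWithArguments('foo', 'bar') == 'foobar'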

But without further delay...

MethodHandles#dropArguments is one quite useful method that at the same time shows very well the reversed thinking that needs to be applied. This is a transform that will drop arguments, but it does not do so right away. In the old thinking we have for example the arguments a,b,c, and dropping one would give us a,b. This transform *will* do exactly this, but what we look at gives a different impression. We have a handle that takes A,B (A being the type of a and B being the type of b), and dropping then means we take that handle and produce a new one that can take A,B,C. This new handle will then ignore the argument c of type C, and in doing so it does exactly what I described above. But if we debug the code producing the combinations of transformers, we see a method handle that gets the dropArguments transformation applied and now takes one more argument instead of one less. So again... assuming you want to get rid of an argument, you start with the handle that is without it, and you have to transform it into a handle that takes it... for me that is reversed thinking and it really causes me problems sometimes. You may want to apply this one if your SM takes fewer arguments than provided.
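
A small sketch of that reversed view (setup as in the warm-up above):

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def concat = lookup.findVirtual(String, 'concat', MethodType.methodType(String, String))
// (String,String)String becomes (String,String,Object)String;
// the extra argument is accepted and then simply ignored
def dropped = MethodHandles.dropArguments(concat, 2, Object)
assert dropped.invokeWithArguments('foo', 'bar', 42) == 'foobar'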

MethodHandle#bindTo is also one I use often. Assuming we have a handle taking A,B,C and A will be the same for each call, we can bind it to produce a new handle taking B,C... but this works only for the first argument. So for example, if you have a method call that will be done through the meta class system, then this mostly results in calling a method MetaClass#invokeMethod. But the receiver there is not the receiver from my call site, so I bind the first argument, the meta class. A changed meta class would require a new method selection, so it is kind of constant for this selection. You may want to apply this one if your SM has one more argument than the ones provided and it is the first one.
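
For example, binding the receiver of String#concat:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def concat = lookup.findVirtual(String, 'concat', MethodType.methodType(String, String))
// (String,String)String becomes (String)String, the receiver is now fixed
def hello = concat.bindTo('Hello, ')
assert hello.invokeWithArguments('world') == 'Hello, world'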

But what to do if our SM would like to take more than one extra argument? In this case you use MethodHandles#insertArguments. For example your SM would like to take A,B,C, your TT is A,B and you want to bind a c. Then you can use newHandle = insertArguments(oldHandle,2,c). If it were A,B,C,D and you want to bind C and D, then it is insertArguments(oldHandle,2,c,d). If you have A,B,C,D and you want to bind B and D, then it is for example (there are two ways): newHandle = insertArguments(oldHandle,1,b); newHandle = insertArguments(newHandle,2,d). You may have noticed the chaining of transforms in this one. The first makes A,B,C,D into A,C,D, moving D from position 3 to position 2. The second then makes it A,C. You may want to apply this one if your SM has more arguments than the ones provided.
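
A sketch with Math#max, fixing the second of its two arguments:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def max = lookup.findStatic(Math, 'max', MethodType.methodType(int.class, int.class, int.class))
// (int,int)int becomes (int)int by inserting 10 at position 1
def atLeastTen = MethodHandles.insertArguments(max, 1, 10)
assert atLeastTen.invokeWithArguments(3) == 10
assert atLeastTen.invokeWithArguments(42) == 42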


MethodHandle#asCollector is also a nice one. There are several invokeMethod methods in Groovy that take an Object[] as argument to contain all the arguments, and then delegate calls to builders or do other things. The call site you start with though may not provide the Object[], but the arguments one by one. So your TT may look like this: A,B,C,D and your SM may accept this: A,Object[]. Meaning we somehow have to wrap B,C,D into an Object[], if thought from the perspective of the arguments. And that is why the method is called asCollector. What you see on your method handle is that with handle = handle.asCollector(Object[].class, 3), your A,Object[] handle is now an A,Object,Object,Object handle. You will need asType to get to the final form. Should you want to collect the arguments at a different place than the last one, you may have to permute the arguments. The opposite is asSpreader, but I haven't used that one yet. I may do so when I extend the implementation to include the spread operator. You may want to apply this one if your SM has arguments you want to collect into an array.
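
String#format is a handy Object[]-taking target to show this on:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def format = lookup.findStatic(String, 'format',
        MethodType.methodType(String, String, Object[].class))
// (String,Object[])String becomes (String,Object,Object)String
def collected = format.asCollector(Object[].class, 2)
assert collected.invokeWithArguments('%s-%s', 'a', 'b') == 'a-b'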

Sometimes we have to make a transformation that changes the runtime class of a reference type. This is of course not possible, because the JVM is strongly typed - but we can create a new object with the desired result. For example, Groovy has GString, which is a kind of String. A method call done with a GString argument is supposed to be able to call a method that takes a String. Now GString and String are not in a subclass relation, meaning we have to do a real type transformation here, and MethodHandles#filterArguments will help us with that. A filter takes one argument and returns the transformed value. For our case it should take a GString and return a String. Let us say our SM takes A,String,String,B; TT is A,GString,GString,B and our filter MethodHandle takes GSTRING. Then we can simply do newHandle = MethodHandles.filterArguments(oldHandle, 1, GSTRING, GSTRING) to produce a handle for A,GString,GString,B. Again I had some problems with the reversed thinking here: the filter's return type must match the type in the SM, and its argument type the type of our TT. If you think about it, it is quite clear and obvious, but when I work on a program I always have to stop here and rethink. Anyway... you may want to apply this one if your SM argument differs from the provided argument in a way that boxing and casting alone cannot do the transformation required.
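
The GString-to-String filter itself is Groovy-internal, so this sketch uses Object#toString as a stand-in filter; the pattern is the same:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

def lookup = MethodHandles.lookup()
def concat = lookup.findVirtual(String, 'concat', MethodType.methodType(String, String))
// the filter is (Object)String: its return type matches the SM's parameter type,
// its own parameter type is what the new handle will accept
def toStr = lookup.findVirtual(Object, 'toString', MethodType.methodType(String))
def filtered = MethodHandles.filterArguments(concat, 1, toStr)
println filtered.type()   // (String,Object)String
assert filtered.invokeWithArguments('x=', 42) == 'x=42'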

And the last one I want to mention: MethodHandles#guardWithTest. In many situations you have to handle the invalidation of your call site to some extent. One way (if you use a MutableCallSite) is to have a guard that causes the target of your call site to be exchanged by a new method selection. Let us assume we have a call foo(a,x) and the methods foo(Object,Object) and foo(Object,String). Now x may be a String, and then we want to call foo(Object,String), or we want to call foo(Object,Object) if it is not. Let us assume there will be three calls: the first done with an Integer, the second with a String and the last one with an Integer again. We will have to exchange the method each time. A guard takes a number of arguments (0-N) and returns a boolean, which will be used to determine if we call our normal target or a fallback handle. For this case I have used a handle called SAME_CLASS. It takes a class and an Object argument and tests if argument.getClass() is equal to the provided class. Since the first argument, the class, is fixed, I bind it upfront, leaving me a guard handle that takes an Object argument only. guardWithTest does not support arbitrary positions, but we want to check the second argument to the method, not the first... and there is the receiver too. If you did not jump to this section, you may notice we are in a situation in which we have fewer arguments than given... only this time it is not the SM, but our guard. But that doesn't matter. Still we drop the first two to get a handle Object,Object,Object, which will ignore the argument at position 0, the receiver, and the argument at position 1, but test the one at position 2. Then it is simply newHandle = MethodHandles.guardWithTest(guard, oldHandle, fallback) and we are done. If you want to have multiple guards, you continue to apply this kind of transform. I haven't found a way to combine the guards themselves to have only one guardWithTest call. But maybe that isn't needed anyway. You may apply this one if your call site needs to be invalidated based on the given arguments.
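
SAME_CLASS is Groovy-internal, so this self-contained sketch uses an instanceof check as the guard; the shape of the transform is the same:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

class Targets {
    static boolean isString(Object o) { o instanceof String }  // the guard
    static String forString(Object o) { "String: $o" }         // the guarded target
    static String fallback(Object o)  { "other: $o" }          // taken when the guard fails
}

def lookup = MethodHandles.lookup()
def type = MethodType.methodType(String, Object)
def guarded = MethodHandles.guardWithTest(
        lookup.findStatic(Targets, 'isString', MethodType.methodType(boolean.class, Object)),
        lookup.findStatic(Targets, 'forString', type),
        lookup.findStatic(Targets, 'fallback', type))

assert guarded.invokeWithArguments('hi') == 'String: hi'
assert guarded.invokeWithArguments(42) == 'other: 42'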

This is just a short overview, by far not complete and no replacement for reading the javadoc - just a little guide that may be of help.

Thursday, November 03, 2011

The Java Way, simple Type Inference and Flow Sensitive Typing

In "Groovy static type checker: status update" C├ędric gave one of his favorite type checking examples. Even though the example made some things clear that I was not so clear about in my last blog post, I still think we need to look at this example a bit more in detail.

Perfectly fine Groovy code like this:

class A {
    void foo() {}
}
def name = new A()
name.foo()
name = 1
name = '123'
name.toInteger()

is a problem for a static type checker and I want to explain once more in a bit more detail why. There are currently 3 ways to approach this code...

The Way of Java
In this version we try to be as much like Java as possible, but obviously we have to give def a meaning. In standard Groovy it is basically equal to Object. As I wrote in my last blog post, the view on types in Groovy is a tiny bit different compared to Java. Anyway, if def is just an alias for Object and we compile this code with Java's type rules, then the code will not compile:

class A {
    void foo() {}
}
def name = new A() // name is of type Object
name.foo() // error: foo() does not exist on Object
name = 1 // assigning Integer to Object typed name is allowed
name = '123' // assigning String to Object typed name is allowed
name.toInteger() // error: toInteger() does not exist on Object

While the assignments would all work just fine, the method calls will not pass, since those methods are not available on Object.

Simple Type Inference
This seems to be the way groovypp goes. If I am wrong about it, feel free to correct me. Again we have to give def a meaning, and this time we use the right-hand side of the assignment to do so. For the remaining code we stay more or less with the Java rules. The result is then this:

class A {
    void foo() {}
}
def name = new A() // name is of type A
name.foo() // no problem, foo() exists on A
name = 1 // error: assigning Integer to A typed name
name = '123' // error: assigning String to A typed name
name.toInteger() // error: toInteger() does not exist on A

Instead of the two problems from before we now have 3, 2 of them at a different position in our code. If we want that piece of code to compile, those two approaches won't do it.

Flow Sensitive Typing
In this third version we still have to give def a meaning, but this time the meaning is not fixed:

class A {
    void foo() {}
}
def name = new A() // name is of flow type A
name.foo() // no problem, foo() exists on A
name = 1 // name becomes of flow type Integer
name = '123' // name becomes of flow type String
name.toInteger() // no problem, toInteger() does exist on String

With this we reach our goal - who would have guessed that ;)

The difficult thing for a Java programmer here is probably that name is not of a fixed type. In fact, looking at many papers in the area of formal semantics, most type systems out there use the flow type only for type checks, not to actually give a variable a type. On the other hand, if you look at the Java way, you could say that simple type inference is just an enhancement: instead of letting the user write the type, the compiler will set the type. There are actually many old languages that support that kind of logic; this is really nothing new. Still, if we see that as an enhancement, then letting the compiler set the type of a variable automatically at more than one place can be considered just the next step.

But I am getting side tracked... I only wanted to show why exactly this example is causing a problem or why not. That is all. [UPDATE: I had to reformat the article a bit, because I had problems with my JavaScript based syntax highlighter and with the line length of some code examples]

Wednesday, October 26, 2011

Flow Sensitive Typing?

While we (Guillaume, Cedric and myself) had a meeting in Paris, we talked about the typing system of Grumpy a bit.

Coming from a dynamic language, going static often feels quite limiting. For me the main point of a static type system is to ensure the code I just wrote is not totally stupid. Actually many would say the purpose is to ensure the correctness of the code, but maybe I am more of a dynamic guy, because I think this is impossible for a poor compiler to achieve. So a static compiler usually checks

  • method calls, to ensure the method I want to call really exists
  • fields/properties, to ensure they exist
  • check assignments, to ensure right-hand-side and left-hand-side have compatible types
  • check type usage, for complying with the type system (including generics, class declarations and so on)
  • and others...
So usually if a compiler detects a violation it will cause a compilation error, and if it cannot check things the code will probably include runtime checks.

Optional Typing

Now Groovy has what we call optional typing. In Groovy the compiler won't check fields, properties or methods for their existence, since the compiler cannot be totally sure we really mean some entity that exists at compile time. Groovy allows you to create/remove/add/replace methods at runtime and attach them to classes via their MetaClass. A program that would not compile statically may run just fine with those additions. What the Groovy compiler does though is to check type usage to some extent. So you can for example not extend an interface. The Groovy compiler has to do this because the JVM is also typed to some extent and doesn't directly allow arbitrary type systems. Sure, there are ways around that, but they always mean reducing the high integration with Java, and we don't want that.
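
A quick example of why the compiler cannot check method existence - a method that only exists because we attach it at runtime:

// added at runtime via the MetaClass; no static check could know about it
String.metaClass.shout = { -> delegate.toUpperCase() + '!' }
assert 'groovy'.shout() == 'GROOVY!'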

Another aspect is assignments. If you assign a value to a typed variable in Groovy, the compiler won't ensure this works at runtime, but it will add a cast that may fail. Effectively this means for a typed variable in Groovy: we guarantee that if the assignment works, the value assigned to the variable will be at least of the declared type.

This implies for example for a String typed variable that if you assign a non-String to it, the value's toString() method is called. We call that way of casting a Groovy cast, and the rules for it are actually quite complex.
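
A tiny example of such a Groovy cast:

String s = 42        // the Groovy cast calls toString() on the Integer
assert s == '42'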

Still, there are enough cases we could actually check if we knew the type of the right-hand side. In general we don't know that type, because for example a method call is done there, and then we cannot ensure the type: by the time we actually reach that point in the code, the method might have been replaced.

Strong Typing
If you follow the discussions about typing, you will most probably see very fast that dynamic and static typing might be kind of defined, but beyond that there are often conflicting definitions for other terms. For example some say that Groovy is not strongly typed, while Java is. In my definition strong typing means that a value cannot change its type at runtime without first creating a new value. In Java we have this situation for example with boxing. You can assign an int to an Object, but not without the value being boxed, thus a new value being created. Now in Groovy this is just the same. An int cannot become an Integer or a String just like that. We depend on the type system enforced on us by the JVM, and the JVM is strongly typed... well, that may change in the future, but for now it is strongly typed. In Groovy you can add methods, for example, and with that change the interface a value provides, but there is no way for a value of a certain class to become the same value with a totally different class, without a new value being created and used instead.

Flow Sensitive Typing
Flow Sensitive Typing is not unknown in the world. It is normally used, for example, to find the type of a complex expression, which is then checked against the type actually allowed in an assignment. Now in Groovy we want to go a bit of a different way. Basically we don't want a fixed static type; instead each assignment can specify a new one. If you defined a variable using "def", then in normal Groovy all assignments to it are allowed. Basically we see "def" as Object in Groovy. But if you want static method checks, you still want something like "def str = "my string"; println str.toUpperCase()" to work. This case can so far also be solved by type inference. But in Groovy you can also do this: "def v = 1; v = v.toString(); println v.toUpperCase()". Even though we start out with an int, we later assign a String to v. If we work only with a simple inferencing system, this will not compile. But since it is of course our goal to make a wide range of Groovy programs available even in a statically checked version, we would of course like to allow this. And a simple flow analysis can indeed give us the information that the flow type of v changes to String, and thus the toUpperCase() method exists. In other words, this would compile. Taking into consideration that "def" in Groovy doesn't mean much more than Object, we don't want this to be limited to "def" only. We also want to allow this: "Object v = 1; v = v.toString(); println v.toUpperCase()". Java would not allow this. Sure, you can assign the 1, you can even call toString() and assign the result to v, but because v is declared as Object, the compiler would start barking at you for the toUpperCase() call. Our thinking is that there is not really a need to limit the type system like this. As in Groovy, we would again give the guarantee that v is at least an Object. But the specific type depends in Groovy on the runtime type, in Grumpy on the flow type. Something Grumpy would for example still not allow is "int i = new Object()".

But as of now, this flow sensitive type system has not yet been approved of by the community.


Feeling Grumpy?

Grumpy might have been mentioned a few times here and there already, but what is it?

It is a little project for Groovy we are currently working on, and its working title is Grumpy. Grumpy is not the final name; we use it just until we find a final and more serious name.

The goal of Grumpy is to have a static type checker for Groovy, driven by an annotation. This will be optional and will not cause any changes to normal Groovy. We see this as part of the Java migration story, in which people may not want the dynamic type system of Groovy everywhere. Grumpy is also no static compiler; it will be normal Groovy under the hood. I will not exclude a future static compiler based on Grumpy, but that is not its primary purpose. Also, we don't want to compete with Scala here. For my taste their type system is too complex. We may go beyond Java's system though, if we think it makes sense.

Basically there is already a static type checker in Groovy++ (and a static compiler), but integrating that into Groovy just for type checking would mean either integrating huge parallel structures that no one but the Groovy++ people understand, or investing a lot of time to transfer everything over - probably more than it takes to write the type checker anew. Thus we decided to make a new type checker and to try to involve the community as much as possible at every step.


How will it work?
So far you will be able to annotate methods or classes, and an AST transform will then perform type checking on that code. We don't want any new syntax constructs for Grumpy. This means only a subset of Groovy will be accepted by Grumpy.

Can I mix dynamic typed code and Grumpy?
Yes you can. If you use the per-method annotations you can just use different methods for the dynamic and the grumpy code. So far we have no annotation that turns dynamic typing back on in case you used the class level annotation, but that may follow in the future.
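
As a sketch of that mixing - using @TypeChecked, the annotation name this work later shipped under in official Groovy, so take the name with a grain of salt:

import groovy.transform.TypeChecked

@TypeChecked
int statically(int i) { i * 2 }        // checked at compile time

def dynamically(x) { x.whatever() }    // normal dynamic Groovy, not checked

assert statically(21) == 42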

When will it be available?
Grumpy will be part of Groovy 1.9; the next beta will already include a pretty advanced Grumpy.

What will Grumpy do about the GDK?
For those that don't know... the GDK is a set of methods we use to enhance standard Java classes. For example we have an "each" method on Collection, to iterate over the collection using a Groovy Closure. Grumpy will support all GDK methods, but to enable type checking inside those Closure blocks there will have to be much more work.

Much more work?
In, for example, "int i=1; [1,2].each {i += it}" you want to ensure nothing is added to i that is not compatible with int. Since we cannot possibly let the compiler know in code how each of those GDK methods works, we will have to find a way to do that more automatically. The Java type system with generics is not really able to express our needs here. For example, if you iterate a Map using each, you have one variant that takes a Map.Entry and another that takes key and value (see the snippet below). If you just declare everything as Object, you won't gain all that much in terms of type checking. Most probably we will have a second annotation for this kind of thing, in which we will store a String with extra type information. The goal here is to let the compiler add this annotation on its own for Groovy code, but for predefined Java code we will of course have to depend on the user doing that, or live without the type information.
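
For illustration, the two each variants on a Map look like this:

// one Closure parameter: you get a Map.Entry
[a: 1, b: 2].each { Map.Entry e -> println "$e.key -> $e.value" }
// two Closure parameters: you get key and value separately
[a: 1, b: 2].each { k, v -> println "$k -> $v" }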

Anyway, I am sure Grumpy will be an interesting project. Feel free to suggest names.