Wednesday, October 26, 2011

Flow Sensitive Typing?

While we (Guillaume, Cedric and myself) had a meeting in Paris, we talked about the typing system of Grumpy a bit.

Coming from a dynamic language and going static often feels quite limiting. For me the main point of a static type system is to ensure the code I just wrote is not totally stupid. Actually many would say the purpose is to ensure the correctness of the code, but maybe I am a more dynamic guy, because I think this is impossible to achieve for a poor compiler. So a static compiler usually checks

  • method calls, to ensure the method I want to call really exists
  • fields/properties, to ensure they exist
  • check assignments, to ensure right-hand-side and left-hand-side have compatible types
  • check type usage, for complying with the type system (including generics, class declarations and so on)
  • and others...
So usually if a compiler detects a violation it will cause a compilation error and if it cannot check things the code will probably include runtime checks.

Optional Typing

Now Groovy has, what we call optional typing. In Groovy the compiler won't check fields, properties or methods on their existence, since the compiler cannot be totally sure we really mean some entity, that exists at compile time. Groovy allows you to create/remove/add/replace methods at runtime and attach them to classes via their MetaClass. A program that would not compile statically, may run just fine with those additions. What the Groovy compiler does though is to check type usage to some extend. So you can for example not extend an interface. The Groovy compiler has to do this, because the JVM is also typed to some extend, and doesn't allow directly for arbitrary type systems. sure there are ways around, but that always means to reduce the high integration with Java, and we don't want that.

Another aspect is for assignments. If you assign a value to a typed variable in Groovy, then the compiler won't ensure this works at runtime, but it will add a cast, that may fail. Effectively this means for a typed variable in Groovy: We guarantee you, that if the assignment works, the value assigned to the variable, will be at least of the declared type.

This implies for example for a String typed variable, that you assign a non String to, that its toString() method is called. We call that way of casting a Groovy cast, and the rules for it are actually quite complex.

Still there are enough cases we could actually check, if we would know the type of the right-hand-side. In general we don't know that type, because for example a method call is done there and then we cannot ensure the type. By the time we actually reach that point in the code, the method might have been replaced.

Strong Typing
If you follow the discussions about typing, then you will most probably see very fast, that dynamic and static typing might be kind of defined, but beyond that, there are often conflicting definitions for other terms. For example some say that Groovy is not strong typed, while Java is. In my definition strong typing means that an value cannot change its type at runtime, without creating first a new value. In Java we have this situation for example for boxing. You can assign an int to an Object, but not without the value being boxed, thus a new value being created. Now in Groovy this is just the same. An int cannot become an Integer or a String, just like that. We depend on the type system, enforced on us by the JVM, and the JVM is strong typed... well that may change in the future, but for now it is strong typed. In Groovy you can add methods for example and with it changing the interface a value provides, but there is no way for a value of a certain class to become the same value with a totally different class, without a new value being created and that one used instead.

Flow Sensitive Typing
Flow Sensitive Typing is not unknown in the world. It is normally used to for example find the type of a complex expression, to then check that with the actual allowed type in an assignment. Now in Groovy we want to go a bit a different way. Basically we want to have not a fixed static type, instead each assignment can specify a new one. If you defined a variable using "def", then in normal Groovy all assignments to it are allowed. Basically we see "def" as Object in Groovy. But if you want static method checks, you still want something like "def str = "my string"; println str.toUpperCase()" to work. This case can so far be solved also by type inference. But in Groovy you can do also this: "def v = 1; v = v.toString(); println v.toUpperCase()". Even though we start out with an int, we assign later a String to v. If we work only with a simple inferencing system, this will not compile. But since it is of course our goal to make a wide range of Groovy programs available even in a static checked version, we would of course like to allow this. And a simple flow analysis can indeed give us the information, that the flow type of v change to String and thus the toUpperCase() method exists. In other words, this would compile. Taking into consideration, that "def" in Groovy doesn't mean much more than Object, we don't want this being limited to "def" only. We want also to allow this: "Object v = 1; v = v.toString(); println v.toUpperCase()" Java would not allow for this. Sure, you can assign the 1, you can even call toString() and assign it to v, but because v is declared as Object, the compiler would start barking at you for the toUpperCase() call. Our thinking is, that there is not really a need to limit the type system like this. As in Groovy, we would again give the guarantee, that v is at least an Object. But the specific type is in Groovy depending on the runtime type, on Grumpy on the flow type. Something Grumpy would for example still not allow is "int i = new Object()"

But till now this flow sensitive type system is not approved of by the community.


Feeling Grumpy?

Grumpy, might have been mentioned a few times here and there already, but what is it? 

It is a little project for Groovy we are currently working on and the project title is for now Grumpy. Grumpy is no final name, we use it really just until we found a final and more serious name.

The goal of Grumpy is to have a static type checker for Groovy driven by an annotation. This will be optional and not cause any changes to normal Groovy. We see this as part of the Java migration story, in which people may not want the dynamic type system of Groovy everywhere. Grumpy is also no static compiler, it will be normal Groovy under the hood. I will not exclude a future static compiler based on Grumpy, but that is not the primary purpose of Grumpy. also we don't want to compete with Scala here. For my taste their type system is too complex. We may go beyond Java's system though, if we think it makes sense.

Basically there is a static type checker with Groovy++ (and a static compiler), but integrating that into Groovy just for type checking will mean either to integrate huge parallel structures, that no one but the Groovy++ people do understand. Or it means to invest a lot of time to transfer everything over, probably more than it takes to write the type checker new. Thus we decided to make a new type checker and try to involve the community as much as possible on every step.


How will it work?
So far you will be able to annotate methods or classes and an AST transform will then perform some type checking on that code. We don't want any new syntax constructs for Grumpy. This means only a subset of Groovy will be accepted by Grumpy.

Can I mix dynamic typed code and Grumpy?
Yes you can. If you use the per method level annotations you can just use a different method for the dynamic or grumpy code. So far we have no annotation that will turn dynamic typing on again for the case you used the class level annotation, but that may follow in the future.

When will it be available?
Grumpy will be part of Groovy 1.9, the next beta will already include a pretty advanced Grumpy.

What will Grumpy do about the GDK?
For those, that don't know... the GDK is a set of methods we use to enhance standard Java classes. For example we have an "each" method on Collections, to iterate over the list using a Groovy Closure. Grumpy will support all GDK methods, but to enable type checking in those Closure blocks, there will have to be much more work.

Much more work?
In for example "int i=1; [1,2].each {i += it}" you want to ensure nothing is added to i, that is not compatible with int. Since we cannot possible let the compiler know about how each of those GDK methods is working by code, we will have to find a way to do that more automatically. The Java type system with generics is not really able to express our needs here. For example if you iterate a Map using each, you have one variant, that takes a Map.Entry, another one, that takes key and value. If you just declare everything as Object, you won't gain all that much in terms of type checking. most probably we will have a second annotation for this kind of thing, in which we will store a String with extra type information. The goal here is to let the compiler add this annotation on its own for Groovy code, but for predefined Java code we will of course have to depend on the user doing that, or not having the type information.

Anyway, I am sure Grumpy will be an interesting project. Feel free to suggest names