Seeing as I've done a slightly better version of the Python code for a previous article, I wouldn't want to seem to be biased, so here's the same approach used for the corresponding Java code.
Whether you think this is better or not, or if you'd prefer it different in some other way, is something you'd be able to tell me if we were pairing. I have to admit that at this point, having made these changes, I now also think that it would have been better to do TDD after all, as, although it wouldn't have been faster in this case, the code might well have been better. Doing this TDD, or pairing, is left as an exercise for the interested reader.
package com.oocode;

import java.io.*;
import java.util.*;

public class Graph2Csv {
    private Map<String, Node> nodes = new HashMap<String, Node>();

    public static void main (String[] args) throws IOException {
        new Graph2Csv().run();
    }

    private void run() throws IOException {
        BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
        String line = input.readLine();
        while(line != null){
            if(line.contains(" -> ")){
                String[] parts = line.split(" -> ");
                String source = parts[0];
                String destination = parts[1].substring(0,parts[1].length()-1);
                node(destination).incoming++;
                node(source).outgoing++;
            }
            line = input.readLine();
        }
        System.out.println("incoming,node name,outgoing");
        for(Node node : nodes.values()){
            System.out.println(node.incoming+","+node.name+","+node.outgoing);
        }
    }

    private Node node(String nodeName) {
        if(!nodes.containsKey(nodeName)){
            nodes.put(nodeName, new Node(nodeName));
        }
        return nodes.get(nodeName);
    }

    private class Node{
        public Node(String nodeName) {
            this.name = nodeName;
        }
        public int incoming,outgoing;
        public String name;
    }
}
Having been for a swim and not thought about the code in a previous article, looking at it again it struck me that it could be simpler, so here's the simpler version:
import sys

nodes = {}
lines = sys.stdin.readlines()[2:-2]

class Node:
    def __init__(self,name):
        self.name=name
        self.incoming=0
        self.outgoing=0

for line in lines:
    source, destination = line.split(" -> ")
    destination = destination[:-2]
    nodes.setdefault(destination,Node(destination)).incoming += 1
    nodes.setdefault(source,Node(source)).outgoing += 1

print "incoming,node name,outgoing"
for node in nodes.values():
    print str(node.incoming)+","+node.name+","+str(node.outgoing)
I'd have posted it as a comment, but I couldn't work out how to get it to format correctly without putting in more effort than it was worth.
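For anyone who wants to experiment with the script above on a current interpreter, here is a minimal Python 3 sketch of the same counting. It is not the original script: the function names "edge_counts" and "to_csv" and the sample input are made up for illustration, and it assumes (like the Java version) that any line without " -> " can be skipped and that each edge line ends in ";".

```python
from collections import defaultdict


def edge_counts(lines):
    """Count incoming and outgoing edges per node from lines such as 'a -> b;'."""
    incoming = defaultdict(int)
    outgoing = defaultdict(int)
    for line in lines:
        if " -> " not in line:
            continue  # assumption: anything that is not an edge line is ignored
        source, destination = line.strip().split(" -> ")
        destination = destination.rstrip(";")
        outgoing[source] += 1
        incoming[destination] += 1
    nodes = sorted(set(incoming) | set(outgoing))
    return [(incoming[name], name, outgoing[name]) for name in nodes]


def to_csv(rows):
    """Render the rows in the same column order as the script above."""
    header = "incoming,node name,outgoing"
    return "\n".join([header] + ["%d,%s,%d" % row for row in rows])


# Hypothetical sample input, only here to show the shape of the data
sample = ["digraph g {", "a -> b;", "a -> c;", "b -> c;", "}"]
print(to_csv(edge_counts(sample)))
```

Keeping the counting in a function that takes a list of lines, rather than reading stdin at the top level, also makes it easy to call from a test.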
In the previous article, I wrote the "graph to csv" converting code in Python, because I thought it would be a suitable tool for the job. A few years ago, I ran a workshop with Keith Braithwaite called "Why Java programmers should learn Python", one of the premisses being that you should choose the correct tool for the job; Java isn't the best language for every problem. Too many Java programmers, particularly many graduates these days whose degree courses are dominated by Java, don't have a wide enough experience of other styles of language. Learning other languages is both good for your brain and often useful.
There are some problems for which using Python can get you a solution sooner than Java, and the "type safety" of Java and other statically typed languages is often less important than you might think. Today, I thought I'd see what the Python code from the previous article looked like if re-written in Java to see if it would shed any light on these assertions.
Here is a Java version of the code - I've tried to make it reasonable - if you can see any improvements for either version, then please feel free to post them as comments, or post on your own blog and put a link in a comment.
package com.oocode;

import java.io.*;
import java.util.*;

public class Graph2Csv {
    public static void main (String[] args) throws IOException{
        Map<String, Integer> incoming = new HashMap<String, Integer>();
        Map<String, Integer> outgoing = new HashMap<String, Integer>();
        Set<String> nodes = new HashSet<String>();
        BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
        String line = input.readLine();
        while(line != null){
            if(line.contains(" -> ")){
                String[] parts = line.split(" -> ");
                String source = parts[0];
                String destination = parts[1].substring(0,parts[1].length()-1);
                increment(incoming, destination);
                increment(outgoing, source);
                nodes.add(source);
                nodes.add(destination);
            }
            line = input.readLine();
        }
        System.out.println("incoming,node name,outgoing");
        for(String node : nodes){
            System.out.println(num(incoming,node)+","+node+","+num(outgoing,node));
        }
    }

    private static int num(Map<String, Integer> incoming, String node) {
        return incoming.containsKey(node) ? incoming.get(node) : 0;
    }

    private static void increment(Map<String, Integer> map, String node) {
        if(!map.containsKey(node)){
            map.put(node, 0);
        }
        map.put(node, map.get(node) + 1);
    }
}
The Java version is slightly longer and I don't think the static typing adds either to the clarity or the correctness of the code. On the other hand there are some great Java IDEs that do most of the typing for you, so overall I probably made the same number of keystrokes in both cases. In this code, I think the new features of Java 5 made it better than if I'd used an older Java.
The type safety, or otherwise, doesn't affect my confidence in either version of the code nearly as much as the fact that neither version has automated tests. I've tested both versions manually with one sample input, and they both look like they work, but if either has a bug, it's not going to be spotted by static type checking, but by testing.
So why didn't I use TDD? Several reasons:
I'm writing it by myself, for myself, rather than as part of a team.
It's a small, standalone, throw away piece of code, not part of a larger project or code that other people are relying on.
I don't care much whether it works for anything other than the one input I wrote it for.
It didn't seem like the sort of code where I'd be quicker doing it TDD.
If you think your favourite language would be just the thing for this problem, then please post a comment containing the code, or a link to the code on your blog.
I've finally got around to looking at some of the new features in Java 5.0. When Java 5.0 first came out, it seemed like the new features were mostly Java (ironically) catching up with C#. (BTW - my background includes several years of Java, then some C#, then some more Java, then some more C# and currently Java again).
What I've seen so far looks good, but there are some things that I'd have preferred to be slightly different. This article is about Annotations, and I might write future articles on some of the other new features in Java 5.0.
Annotations allow you to add metadata; examples include object/relational mapping and specifying test methods. I won't go into details here - just a simple example (it's left to the interested reader to find out more).
To define an annotation type called Meta:
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

@Retention(RetentionPolicy.RUNTIME)
public @interface Meta {
    String data() default "fast";
}
To use that annotation:
...
@Meta(data = "slow")
public void foo(){
...
}
...
To print out the meta data that the annotation defines for the example above:
public static void main(String[] args) {
    for (Method method : MyClass.class.getMethods()) {
        // Ask for Meta directly rather than casting whichever annotation
        // happens to be present; inherited methods may carry other annotations.
        Meta meta = method.getAnnotation(Meta.class);
        if (meta != null) {
            System.out.println(meta.data());
        }
    }
}
When defining an annotation type, you can specify when the annotation should be available; in a nutshell, before compilation (RetentionPolicy.SOURCE), after compilation (RetentionPolicy.CLASS) or at runtime (RetentionPolicy.RUNTIME). The default is RetentionPolicy.CLASS - not what I would have expected. My guess is that the main use of annotations will be for metadata that you want available at runtime, and certainly the examples that I've seen so far are nearly all runtime.
The fact that the syntax looks so similar for an annotation type compared to an interface - only an "@" different - I think is more confusing than helpful. An annotation type cannot extend or implement anything. The "attributes" (things that look like method declarations) cannot have parameters. Also, as you can see from the example above, annotation types implicitly, rather than explicitly, implement "Annotation".
Pair-programming with another one of my wonderful team colleagues last week, we came across a tricky little bug caused by a (correctly behaving) feature of Java Calendar.
We tried to replicate the bug (to do with matching something within a date range) by running the application as described in the bug report. The application worked just fine.
We eventually tracked down the problem - setting the system time to the afternoon we replicated the bug. If we'd started on this bug after lunch, instead of first thing, we'd have found it straight away. I'm sure there's a moral there somewhere!
The HOUR of a Calendar is based on a 12 hour clock - we found some code that was setting the HOUR (to 0) when it should have been setting the HOUR_OF_DAY. Setting the HOUR leaves the AM/PM part of the Calendar alone, so the effect of setting the HOUR to 0 is that before noon the hour part of the date is set to 0, and after noon the hour is set to 12 - this led to the bug we had to fix.
OK - so it's not very interesting - but hopefully at least if you managed to stay awake during the whole of this article you'll remember it next time you come across something using Calendar.HOUR and the alarm bells will ring.
We were lucky that someone spotted this bug - it could have easily been missed. How do you find something like this where the bug only appears in the afternoon?
Also - it feels to me like the feature of being able to set the HOUR rather than the HOUR_OF_DAY is a bit superfluous and almost invites bugs like the one we fixed. How much time is saved by the existence of this feature compared to the time wasted by it? What does the time wasted by this feature tell us about how to write APIs? The experience has reminded me of a talk at a company conference run by one of my previous employers; "Features - just say no!".
A colleague of mine told me how he'd measured the time to calculate the square root of 10 million numbers as part of the evaluation of a product. One of the surprising things is just how fast this is on a modern average specification PC. Have a guess how long this would take in Java doing a simple loop implementation. Answer at the end - it might surprise you.
It seemed to me that this "problem" would be just the sort of thing you could do in the language J very simply; here's some code that works (whether this is how a J master would do it is quite another question):
%: ?10000000#10000000
Yes - that's all!
to explain:
10000000#10000000
gives you a list of 10000000 numbers, each of value 10000000
?10000000#10000000
gives you a list of 10000000 random numbers, each between 0 and 10000000 (it applies the ? function to every element in the list, ? being a function that gives you a random number from 0 up to its parameter)
%: is the square root function, so %: ?10000000#10000000 gives you a list which is the square root of each number in the list.
So - to answer the speed question, on a desktop machine, the Java version took about half a second. (The speed of the J version will have to wait until I find out how to time things in J, probably really easy but I just haven't bothered to look yet; I was more interested in the neatness of the code rather than the performance).
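For readers without J to hand, the one-liner can be approximated in Python 3. This is only a sketch: the function name "random_square_roots" is made up, the count is reduced to 100000 so that it finishes quickly, and it makes no claim about how the timing compares with the Java or J versions.

```python
import math
import random
import time


def random_square_roots(count, upper):
    """Square roots of `count` random integers, each drawn from 0 to upper - 1."""
    return [math.sqrt(random.randrange(upper)) for _ in range(count)]


start = time.perf_counter()
roots = random_square_roots(100000, 10000000)  # smaller count than the article's
elapsed = time.perf_counter() - start
print("%d square roots in %.3f seconds" % (len(roots), elapsed))
```

Raising the first argument to 10000000 gives the same amount of work as the J expression.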
Just a short addition to part 2 - I was writing some code the other day and came across another closure related problem with Python. I wanted to create some buttons, with event callbacks, dynamically; something like this:
class Foo:
    def __init__(self):
        self.callbacks = [lambda event: self.callback(name) for name in ['bob','bill','jim']]

    def callback(self, name):
        print name

    def run(self):
        for callback in self.callbacks:
            callback("some event")
Now, what do you suppose happens when you run "Foo().run()"?
You get:
jim
jim
jim
This is because there is only one "name" variable (so Nat Pryce tells me). A simple solution is to do:
class Foo:
    def __init__(self):
        self.callbacks = [self.makeCallback(name) for name in ['bob','bill','jim']]

    def makeCallback(self, name):
        return lambda event: self.callback(name)

    def callback(self, name):
        print name

    def run(self):
        for callback in self.callbacks:
            callback("some event")
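One other commonly used workaround is to bind the current value as a default argument, because default values are evaluated when the lambda is created rather than when it is called. The sketch below is written for Python 3 (hence "print(name)"), and the "seen" list is a made-up addition that only exists so the result can be inspected.

```python
class Foo:
    def __init__(self):
        # "name=name" captures the value of name at the moment each lambda
        # is created, so every callback keeps its own name
        self.callbacks = [lambda event, name=name: self.callback(name)
                          for name in ['bob', 'bill', 'jim']]
        self.seen = []  # hypothetical record of the names, for inspection only

    def callback(self, name):
        self.seen.append(name)
        print(name)

    def run(self):
        for callback in self.callbacks:
            callback("some event")


foo = Foo()
foo.run()
```

The trade-off is that a caller could accidentally override "name" by passing a second argument, which the "makeCallback" version does not allow.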
I'm sure there are many other solutions too. For the curious, in Ruby the equivalent code would be:
class Foo
  def initialize()
    @callbacks = ['bob','bill','jim'].map {|name| Proc.new {|event| callback(name)}}
  end

  def callback(name)
    print name
  end

  def run()
    @callbacks.each { |callback| callback.call("some event") }
  end
end

Foo.new.run()
which works just fine. However, I still chose Python over Ruby for pragmatic reasons (a bit ironic I know) - the libraries, tools and support are all superior.
Closures in Python (part 2)
This assumes that you've read Martin Fowler's article on closures. Part 1 shows a translation of Martin Fowler's Ruby code into Python, both a direct translation and a more idiomatic translation using Python's "list comprehensions" (which are arguably neater for doing lots of the sorts of things that you use closures for in Smalltalk or Ruby). From this, you might think that Python can handle closures like Ruby or Smalltalk can, but this isn't quite the case.
In the non-list comprehension examples, the "lambda" keyword for creating a closure in Python can only be used with an expression, not any arbitrary code. This happens to be OK for the examples in Martin's article, but consider something just a tiny bit more complicated:
Let's say you wanted to do:
map(lambda each: if each.isManager: each.salary = 2000, employees)
You can't. "if each.isManager: each.salary = 2000" isn't an expression.
Instead, you'd have to define a function (which doesn't take much syntax):
def anonymousFunction(employee):
    if employee.isManager: employee.salary = 2000
then you can do:
map(anonymousFunction, employees)
(As an aside, "map" returns the collection of the result of executing the function on every element of a collection. We just want to execute the function and don't care about the return result; there's no equivalent to that in Python other than just ignoring the return result which is what we'll do. It's not a problem.)
There are other problems too. Consider the following code:
def totalCostOfManagers(emps):
    total = 0
    def anonymousFunction(employee):
        if employee.isManager: total = total + employee.salary
    map(anonymousFunction, emps)
    return total
This looks like it should give you the total of the managers' salaries. (Ignore the fact that there are other ways to do this, it's just an example). Try to execute this and you get:
UnboundLocalError: local variable 'total' referenced before assignment
This is because the "total" inside "anonymousFunction" is different to the "total" inside "totalCostOfManagers". When you do an assignment to a variable, it is created if it didn't already exist (in that scope). (If I find a suitable reference I'll edit this and put it here).
One way around this would be not to try to assign to "total" itself, but rather have "total" refer to a list and assign to an element in that:
def totalCostOfManagers(emps):
    total = [0]
    def anonymousFunction(employee):
        if employee.isManager: total[0] = total[0] + employee.salary
    map(anonymousFunction, emps)
    return total[0]
This is the sort of thing you might also do with an anonymous inner class in Java, where you also can't do assignments to variables in an outer scope.
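Python 3, which appeared after this was written, added a "nonlocal" statement that lets the inner function rebind "total" directly, so the list trick is no longer needed there. A minimal sketch, assuming Python 3; the "Employee" class and the helper name "addSalary" are made up for the example, and a plain loop replaces "map" because "map" is lazy in Python 3 and would not call the function by itself.

```python
class Employee:
    def __init__(self, name, isManager, salary):
        self.name = name
        self.isManager = isManager
        self.salary = salary


def totalCostOfManagers(emps):
    total = 0

    def addSalary(employee):
        nonlocal total  # rebind "total" in the enclosing function's scope
        if employee.isManager:
            total = total + employee.salary

    for employee in emps:
        addSalary(employee)
    return total


tim = Employee("tim", True, 5)
ivan = Employee("ivan", False, 2)
print(totalCostOfManagers([tim, ivan]))
```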
A slightly subtle thing that I haven't mentioned at all so far is the difference between the "closures" you've seen in Python and those in Smalltalk or Ruby. In Smalltalk a closure is an object defining a "value" method. That is, to execute the code of a Smalltalk closure, you'd send it the message "value", with parameters as appropriate. The equivalent in Python would be something like: (given that "emps" has a method "do" that accepts an object with a "value" method)
def totalCostOfManagers(emps):
    total = [0]
    class AnonymousClass:
        def value(self, employee):
            if employee.isManager: total[0] = total[0] + employee.salary
    emps.do(AnonymousClass())
    return total[0]
You can do the equivalent in Java using an anonymous inner class. (Note in Python that "self" (or "this") is explicit, and also has to be included as the first parameter in method definitions.)
(If you want to try this out, you could use the following:
class Employee:
    pass

ivan = Employee()
ivan.name="ivan"
ivan.isManager = False
ivan.salary = 2

tim = Employee()
tim.name="tim"
tim.isManager = True
tim.salary = 5

class Employees:
    def __init__(self):
        self.emps = [tim, ivan]
    def do(self, block):
        for e in self.emps:
            block.value(e)
and execute:
print totalCostOfManagers(Employees())
to see it work.)
You might also consider:
def totalCostOfManagers(emps):
    class AnonymousClass:
        def __init__(self):
            self.total = 0
        def value(self, employee):
            if employee.isManager: self.total = self.total + employee.salary
    block = AnonymousClass()
    emps.do(block)
    return block.total
Note that "__init__" defines the constructor for AnonymousClass, which is called by doing "AnonymousClass()" (there's no "new" keyword needed).
In Smalltalk, closures look like objects with a "value" method. In Python it is more idiomatic to use a function instead, as you've seen earlier. To invoke a Python function, you put "()" after it. So, back to basics; if you have a function "foo":
def foo():
    return "hi mum"
then "foo" is a reference to the function, and "foo()" executes the function, i.e. evaluating: "print foo" results in something like "<function foo at 0x008F7970>" and evaluating "print foo()" results in "hi mum".
So, rather than defining "do" to accept an object with a "value" method, more idiomatic would be to use the built in function "map" and pass it a function (as shown earlier). In Python, you can make any object look like a function by defining a "__call__" method. So back to the example, another way to implement it would be:
def totalCostOfManagers(emps):
    class AnonymousClass:
        def __init__(self):
            self.total = 0
        def __call__(self, employee):
            if employee.isManager: self.total = self.total + employee.salary
    block = AnonymousClass()
    map(block, emps)
    return block.total
(execute "print totalCostOfManagers([tim, ivan])", with "tim" and "ivan" defined as before, to see it work).
Note the regularity in Python of having functions/methods callable (e.g. "foo()"), classes callable (e.g. the constructor "AnonymousClass()") and instances callable (e.g. the instance of "AnonymousClass").
To clarify, try:
class Foo:
    def __call__(self):
        return "hi mum"
    def bar(self):
        return "yo dude"

someFoo = Foo()
print someFoo.bar()
print someFoo()
OK - let's admit the truth - it's looking like a closure style isn't working so great here. Let's just revert to a simple loop:
def totalCostOfManagers(emps):
    total = 0
    for employee in emps:
        if employee.isManager: total = total + employee.salary
    return total
So - what was all that effort for? Why are closures such a Good Thing? Where a closure becomes really neat is if you have to do something more complicated. For example, consider if you have to do something like this:
def totalCostOfManagers(emps):
    total = 0
    try:
        emps.startSomething()
        employee = emps.next()
        while(employee != None):
            if employee.isManager: total = total + employee.salary
            employee = emps.next()
    finally:
        emps.endSomething()
    return total
where "emps" is some object defining "startSomething", "endSomething" and "next" methods that have to be called like in this method, and the "endSomething" has to be called whether you finish looping through all the employees or not.
(You can use:
class Employees:
    def __init__(self):
        self.emps = [tim, ivan]
    def startSomething(self):
        pass
    def endSomething(self):
        pass
    def next(self):
        if len(self.emps) == 0:
            return None
        return self.emps.pop()

print totalCostOfManagers(Employees())
to try this out. Of course, this is not how a real implementation would look, it's just to illustrate the calling method "totalCostOfManagers")
This pattern is typical of some types of code, e.g. database related code. If you had to do lots of things similar to this, but slightly different, you can easily end up with lots of duplicated code.
For example:
def totalCostOfNonManagers(emps):
    total = 0
    try:
        emps.startSomething()
        employee = emps.next()
        while(employee != None):
            if not employee.isManager: total = total + employee.salary
            employee = emps.next()
    finally:
        emps.endSomething()
    return total
In Java, I've seen lots of code like this that is mostly duplicated with just a bit different. At first sight, to many Java developers it might look like too much effort to remove the duplication, but actually it's not too hard. This is a case where closures really help (or anonymous inner classes in Java). You can put the bulk of this code where it belongs, in the Employees class:
class Employees:
    #whatever ...
    def do(self, fun):
        try:
            self.startSomething()
            employee = self.next()
            while(employee != None):
                fun(employee)
                employee = self.next()
        finally:
            self.endSomething()
Then the definition of "totalCostOfManagers" doesn't need to worry about most of that stuff and looks just like the code from earlier (but "emps" is an instance of "Employees" rather than a simple list):
def totalCostOfManagers(emps):
    total = [0]
    def anonymous(employee):
        if employee.isManager: total[0] = total[0] + employee.salary
    emps.do(anonymous)
    return total[0]
However, as we've seen earlier, this would be neater if we could write it as just a simple loop.
Python has a trick up its sleeve, called a "generator". If we define "do" as:
class Employees:
    #whatever ...
    def do(self):
        self.startSomething()
        employee = self.next()
        while(employee != None):
            yield employee
            employee = self.next()
        self.endSomething()
The "yield" keyword creates a "generator", which, as far as a "for" loop is concerned, is a method that looks like it returns a list, and as far as the "generator" method is concerned, looks like a method that "returns" one element at a time. Unfortunately, "'yield' not allowed in a 'try' block with a 'finally' clause", so I've deleted that until someone can tell me a way around that limitation (arghhh!). Anyway, back to the "generator" version of "do"; the calling code now becomes:
def totalCostOfManagers(emps):
    total = 0
    for employee in emps.do():
        if employee.isManager: total = total + employee.salary
    return total
which is nice and simple again.
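On the "try"/"finally" limitation: from Python 2.5 onwards "yield" is allowed inside a "try" block with a "finally" clause, so on a newer interpreter the clean-up can go back in. The sketch below assumes Python 3; the constructor argument and the "events" list are made up so that the order of calls can be seen, and are not part of the original class.

```python
class Employees:
    def __init__(self, emps):
        self.emps = list(emps)
        self.events = []  # hypothetical log, only here to show the call order

    def startSomething(self):
        self.events.append("start")

    def endSomething(self):
        self.events.append("end")

    def next(self):
        if len(self.emps) == 0:
            return None
        return self.emps.pop()

    def do(self):
        try:
            self.startSomething()
            employee = self.next()
            while employee is not None:
                yield employee
                employee = self.next()
        finally:
            self.endSomething()


emps = Employees(["tim", "ivan"])
print(list(emps.do()), emps.events)

abandoned = Employees(["tim", "ivan"])
generator = abandoned.do()
first = next(generator)
generator.close()  # runs the finally block although the loop never finished
print(first, abandoned.events)
```

Calling "close()" explicitly makes the clean-up deterministic; if the generator is simply dropped part way through, the "finally" block runs only when the generator object is finalised.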
Another tweak is also available. If we change the name of "do" to "__iter__":
class Employees:
    #whatever ...
    def __iter__(self):
        self.startSomething()
        employee = self.next()
        while(employee != None):
            yield employee
            employee = self.next()
        self.endSomething()
Then, our calling code becomes:
def totalCostOfManagers(emps):
    total = 0
    for employee in emps:
        if employee.isManager: total = total + employee.salary
    return total
and we've gone back to something that looks really simple!
Our other method now becomes:
def totalCostOfNonManagers(emps):
    total = 0
    for employee in emps:
        if not employee.isManager: total = total + employee.salary
    return total
which removes much of the duplication. Removing the last of the duplication is left as an exercise for the reader.
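One possible answer to that exercise, offered as a sketch rather than the intended solution, is to pass the condition in as a function. It is written for Python 3; "totalCost", its "include" parameter and the small "Employee" class are names made up for the example.

```python
class Employee:
    def __init__(self, name, isManager, salary):
        self.name = name
        self.isManager = isManager
        self.salary = salary


def totalCost(emps, include):
    """Sum the salaries of the employees for which include(employee) is true."""
    return sum(employee.salary for employee in emps if include(employee))


def totalCostOfManagers(emps):
    return totalCost(emps, lambda employee: employee.isManager)


def totalCostOfNonManagers(emps):
    return totalCost(emps, lambda employee: not employee.isManager)


tim = Employee("tim", True, 5)
ivan = Employee("ivan", False, 2)
print(totalCostOfManagers([tim, ivan]), totalCostOfNonManagers([tim, ivan]))
```

Here "emps" can be a plain list or any object that defines "__iter__" as above.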
I miss the really neat syntax of closures in Smalltalk, and the fact that everything just works like it should in Smalltalk with syntax that is truth and beauty. However, despite thinking that Smalltalk is a better language, I use Python in my work for various reasons.
I have never been able to choose the main implementation language for a commercial project. (Yes, now I think of it, really, never.) It's always been dictated either by what's already been done, or for projects that I've joined from the start, it's been chosen by someone else (someone else less technical than me, that is!). These days, that means Java or C#. However, even on these projects, there's always the need for little scripts for automating parts of the development process or doing other useful things. Python is better than Smalltalk for this sort of thing, as it's easy to share Python programs with other people, and to move them from machine to machine. You just copy the relevant text file, edit it in your favourite text editor if required, and "Python whatever.py" and it works. Smalltalk doesn't have that convenience.
Some people use shell scripts for that sort of thing, but Python is a very much better language, and also works cross platform. Compared to Perl or Ruby, Python has very little syntax, which makes it easy to read even if you haven't been doing much Python recently. I'm a "bear of very little brain" (or something like that), so I want to be able to read code without having to remember what the syntax means. With Python, that's very easy. I can also read other people's Python code, which helps!
Python now has a very large following, and it's easy to create Python wrappers for "C" programs, so the quality and quantity of libraries is fantastic. The main Python implementation itself is also stable and reasonably performant.
From a language point of view, Python is less scary to the majority of Java developers than Smalltalk (although that's a deficiency on their part rather than Smalltalk's, sometimes you have to make compromises to get something accepted). Also, people have heard of Python and aren't frightened by it. The common reaction is "oh, that's like Perl but more readable, isn't it?", whereas the reaction to Smalltalk tends to be fear. Python also has some nice language features, such as list comprehensions and generators, which make it easy to write code that's easy to read.
If you want to learn a language with good support for closures, and one of the best languages ever invented, try Smalltalk. If you want to use a language that's easy to learn and useful in your day-to-day work, try Python. If you want to improve as a developer, learn both.
| Ruby | Python (Direct translation, using "lambda") |
def managers(emps)
  return emps.select {|e| e.isManager}
end
|
def managers(emps):
    return filter(lambda e: e.isManager, emps)
|
def highPaid(emps)
  threshold = 150
  return emps.select {|e| e.salary > threshold}
end
|
def highPaid(emps):
    threshold = 150
    return filter(lambda e: e.salary > threshold, emps)
|
def paidMore(amount)
  return Proc.new {|e| e.salary > amount}
end
|
def paidMore(amount):
    return lambda e: e.salary > amount
|
highPaid = paidMore(150)
john = Employee.new
john.salary = 200
print highPaid.call(john)
|
highPaid = paidMore(150)
john = Employee()
john.salary = 200
print highPaid(john)
|
| Ruby | Python (Idiomatic translation, using "list comprehensions") |
def managers(emps)
  return emps.select {|e| e.isManager}
end
|
def managers(emps):
    return [e for e in emps if e.isManager]
|
def highPaid(emps)
  threshold = 150
  return emps.select {|e| e.salary > threshold}
end
|
def highPaid(emps):
    threshold = 150
    return [e for e in emps if e.salary > threshold]
|
It's good for you to learn a new programming language every now and again and the more you'll learn from it the better, so it's probably best to pick something quite different to any that you already know. Therefore, on the recommendation of Romilly Cocking, I've chosen J, invented by Kenneth E. Iverson - the inventor of APL. Romilly assured me that I'd be able to become proficient in a matter of only 2 or 3 years ;-)
J has really succinct syntax for mathematical things.
Some examples:
a list of the numbers 6, 4 and 9 is 6 4 9
each item in that list squared is 6 4 9 ^ 2
2 to the power of each item in that list is 2 ^ 6 4 9
a list from 0 to 9 is i.10
therefore, a list from 1 to 10 is 1 + i.10
J doesn't just handle one dimensional lists of numbers, but rather multidimensional matrices.
For example:
6 4 9 ^/ 2 3
produces a matrix of 6, 4 and 9 squared and 6, 4 and 9 cubed, i.e.:
36 216
16 64
81 729
that matrix with 1 added to every item is simply:
1 + 6 4 9 ^/ 2 3
produces:
37 217
17 65
82 730
Another cute example is a random lottery ticket generator (thanks to Adewale Oshineye for this):
sort 1 + 6?49
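To give a feel for what those J expressions compute, here is a rough Python 3 rendering of the table and lottery examples. It is only a sketch for comparison; the variable names are made up, and it says nothing about how J evaluates these expressions internally.

```python
import random

numbers = [6, 4, 9]
powers = [2, 3]

# 6 4 9 ^/ 2 3 : every number raised to every power, one row per number
table = [[n ** p for p in powers] for n in numbers]

# 1 + 6 4 9 ^/ 2 3 : add 1 to every item of the table
table_plus_one = [[item + 1 for item in row] for row in table]

# sort 1 + 6?49 : six distinct numbers from 1 to 49, in ascending order
ticket = sorted(random.sample(range(1, 50), 6))

print(table)
print(table_plus_one)
print(ticket)
```

The nested comprehensions spell out the looping that J's "/" and its item-by-item arithmetic do implicitly, which is where most of the difference in length comes from.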
I've been learning by following the tutorials that it comes with - they are very good but sometimes go off into mathematics that is way beyond anything I understand.
J also comes with graphing and some demo applications including solitaire! There's so much more but I won't be covering it here - find out for yourself.
Last night I co-ran (with Tim Mackinnon and Steve Freeman) an "Introduction to Smalltalk" workshop for some fellow ThoughtWorks colleagues.
It reminded me (as a now sadly ex-Smalltalker) that Smalltalk is Truth and Beauty in a programming language.
There's very little in the language but what there is is great. It's really powerful, productive, terse and readable. Smalltalk is a big improvement over its successors, such as Ruby and Python. Not only that, but rather than predict the future, Alan Kay invented it (along with the rest of the Smalltalk team at Xerox PARC in the 70s): windows, networked workstations and other things that are now ubiquitous.
I've been trying out a few implementations: VisualWorks (commercial, multiplatform, with a very full featured environment and libraries), Dolphin (commercial, Windows only and very easy to get on with) and Squeak (free open source, multiplatform, with a vibrant community - Alan Kay is the lead of the Squeak team).