There are some programming habits that I see very frequently that lead to code that's more difficult to understand and maintain than necessary. I'll just cover one in this article and then maybe others in future articles.
Consider the run method of Graph2Csv from the previous article. I often see code where variables are declared outside of the scope where they are used. For example:
private void run() throws IOException {
BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
String line = input.readLine();
String[] parts;
String source;
String destination;
while(line != null){
if(line.contains(" -> ")){
parts = line.split(" -> ");
source = parts[0];
destination = parts[1].substring(0,parts[1].length()-1);
...
The variables parts, source, destination could have been declared inside the loop (as shown previously).
I think this habit comes from experience with some other languages, or earlier education, in which it was either necessary, or deemed good practice, to declare variables at the beginning of functions/procedures/methods.
Delaring these variables outside the loop takes slightly more lines without being any clearer. You don't know, without reading through the whole method, whether the variables are accessed later in the method (if declared within the loop then, of course, they can't possibly be being used outside of the loop).
It's not as obvious whether the code within the loop can be extracted as a method. Looking at this code, I'm wondering whether:
if(line.contains(" -> ")){
parts = line.split(" -> ");
source = parts[0];
destination = parts[1].substring(0,parts[1].length()-1);
node(destination).incoming++;
node(source).outgoing++;
}
can be extracted as a method - maybe called "parseLine", but having the declarations outside of this scope makes me unsure. Fortunately, Eclipse is good enough to do the refactoring correctly, even if it doesn't do a fantastic job of it. You end up with the declarations of parts, source, destination being left behind but unused.
The declaration of variables in an outer scope isn't such a big deal in this example because the code is small and simple enough, but when combined with long methods, particularly if the variable is used later in the method for some other purpose (e.g. if there was another loop later in the method), it can be really difficult to see how to split a long method up into understandable parts.
It would be great if Eclipse (or other IDEs) spotted that these variable declarations could be put into an inner scope and offer to refactor the code to move the declarations for you (if you know of a tool or IDE that does this, please post a comment).
Seeing as I've done a slightly better version of the Python code for a previous article, I wouldn't want to seem to be biased, so here's the same approach used for the corresponding Java code.
Whether you think this is better or not, or if you'd prefer it different in some other way, is something you'd be able to tell me if we were pairing. I have to admit that at this point, having made these changes, I now also think that it would have been better to do TDD after all, as, although it wouldn't have been faster in this case, the code might well have been better. Doing this TDD, or pairing, is left as an exercise for the interested reader.
package com.oocode;
import java.io.*;
import java.util.*;
public class Graph2Csv {
private Map<String, Node> nodes = new HashMap<String, Node>();
public static void main (String[] args) throws IOException {
new Graph2Csv().run();
}
private void run() throws IOException {
BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
String line = input.readLine();
while(line != null){
if(line.contains(" -> ")){
String[] parts = line.split(" -> ");
String source = parts[0];
String destination = parts[1].substring(0,parts[1].length()-1);
node(destination).incoming++;
node(source).outgoing++;
}
line = input.readLine();
}
System.out.println("incoming,node name,outgoing");
for(Node node : nodes.values()){
System.out.println(node.incoming+","+node.name+","+node.outgoing);
}
}
private Node node(String nodeName) {
if(!nodes.containsKey(nodeName)){
nodes.put(nodeName, new Node(nodeName));
}
return nodes.get(nodeName);
}
private class Node{
public Node(String nodeName) {
this.name = nodeName;
}
public int incoming,outgoing;
public String name;
}
}
Having been for a swim and not thought about the code in a previous article, looking at it again it struck me that it could be simpler, so here's the simpler version:
import sys
nodes = {}
lines = sys.stdin.readlines()[2:-2]
class Node:
def __init__(self,name):
self.name=name
self.incoming=0
self.outgoing=0
for line in lines:
source, destination = line.split(" -> ")
destination = destination[:-2]
nodes.setdefault(destination,Node(destination)).incoming += 1
nodes.setdefault(source,Node(source)).outgoing += 1
print "incoming,node name,outgoing"
for node in nodes.values():
print str(node.incoming)+","+node.name+","+str(node.outgoing)
I'd have posted it as a comment, but I couldn't work out how to get it to format correctly without putting in more effort than it was worth.
In the previous article, I wrote the "graph to csv" converting code in Python, because I thought it would be a suitable tool for the job. A few years ago, I ran a workshop with Keith Braithwaite called "Why Java programmers should learn Python", one of the premisses being that you should choose the correct tool for the job; Java isn't the best language for every problem. Too many Java programmers, particularly many graduates these days whose degree courses are dominated by Java, don't have a wide enough experience of other styles of language. Learning other languages is both good for your brain and often useful.
There are some problems for which using Python can get you a solution sooner than Java, and the "type safety" of Java and other statically typed languages is often less important than you might think. Today, I thought I'd see what the Python code from the previous article looked like if re-written in Java to see if it would shed any light on these assertions.
Here is a Java version of the code - I've tried to make it reasonable - if you can see any improvements for either version, then please feel free to post them as comments, or post on your own blog and put a link in a comment.
package com.oocode;
import java.io.*;
import java.util.*;
public class Graph2Csv {
public static void main (String[] args) throws IOException{
Map<String, Integer> incoming = new HashMap<String, Integer>();
Map<String, Integer> outgoing = new HashMap<String, Integer>();
Set<String> nodes = new HashSet<String>();
BufferedReader input = new BufferedReader(new InputStreamReader(System.in));
String line = input.readLine();
while(line != null){
if(line.contains(" -> ")){
String[] parts = line.split(" -> ");
String source = parts[0];
String destination = parts[1].substring(0,parts[1].length()-1);
increment(incoming, destination);
increment(outgoing, source);
nodes.add(source);
nodes.add(destination);
}
line = input.readLine();
}
System.out.println("incoming,node name,outgoing");
for(String node : nodes){
System.out.println(num(incoming,node)+","+node+","+num(outgoing,node));
}
}
private static int num(Map<String, Integer> incoming, String node) {
return incoming.containsKey(node) ? incoming.get(node) : 0;
}
private static void increment(Map<String, Integer> map, String node) {
if(!map.containsKey(node)){
map.put(node, 0);
}
map.put(node, map.get(node) + 1);
}
}
The Java version is slightly longer and I don't think the static typing adds either to the clarity or the correctness of the code. On the other hand there are some great Java IDEs that do most of the typing for you, so overall I probably made the same number of keystrokes in both cases. In this code, I think the new features of Java 5 made it better than if I'd used an older Java.
The type safety, or otherwise, doesn't affect my confidence in either version of the code nearly as much as the fact that neither version has automated tests. I've tested both versions manually with one sample input, and they both look like they work, but if either has a bug, it's not going to be spotted by static type checking, but by testing.
So why didn't I use TDD? Several reasons:
I'm writing it by myself, for myself, rather than as part of a team.
It's a small, standalone, throw away piece of code, not part of a larger project or code that other people are relying on.
I don't care much whether it works for anything other than the one input I wrote it for.
It didn't seem like the sort of code where I'd be quicker doing it TDD.
If you think your favourite language would be just the thing for this problem, then please post a comment containing the code, or a link to the code on your blog.
Graphs are great, but sometimes you want to do analysis on some data. Excel and similar tools can be very powerful - many of my "customers" (in the XP sense of the term) in recent years have been Excel super-power-mega users.
I've often seen developers put too much effort into trying to format output for their customer. It can be quicker for the developers and more useful for the customer to provide a csv file and let the customer decide how they want to see the data (of course, suggesting this to the customer and doing this if they agree, not just unilaterally deciding that's what your going to do). This can also apply to data used only by developers.
Here's a little python program which takes the input to Graphviz used in my previous article, and produces a csv file containing the number of "incoming" and "outgoing" dependencies as shown by the graph in that article.
The code was saved as "graph2csv.py" and executed by running "python graph2csv.py < graph.txt > graph.csv", and here's the raw output
and here it is in Excel: ![]()
import sys
incoming = {}
outgoing = {}
nodes = set()
lines = sys.stdin.readlines()[2:-2]
for line in lines:
source, destination = line.split(" -> ")
destination = destination[:-2]
incoming.setdefault(destination,[]).append(source)
outgoing.setdefault(source,[]).append(destination)
nodes.add(source)
nodes.add(destination)
def num(dictionary, node):
if not dictionary.has_key(node):
return "0"
return str(len(dictionary[node]))
print "incoming,node name,outgoing"
for node in nodes:
print num(incoming,node)+","+node+","+num(outgoing,node)
Graphviz is a great tool for the postmodern programmer. It's got a simple to integrate interface of text files and command line execution. Bjorn Freeman-Benson is visiting at the moment. He suggested we demonstrate graphviz using the dependencies between eclipse plugins. So here's the code that reads the eclipse manifest files and the resulting graph. This graph shows the dependencies between a significant subset of the standard eclipse 3.1 plugins. Bjorn and I have paired on this to bring you "postable" code that doesn't embarrass us too much.
import sys, time
from os import listdir, system
from os.path import join, isdir, exists
import os.path
from zipfile import is_zipfile, ZipFile
class DependencyGrapher:
def run(self, pluginDirectory):
files = [join(pluginDirectory, file) for file in listdir(pluginDirectory)]
graphText = self.process_files(files)
f=open("graph.txt","w")
graphText = 'digraph G {\n "' +\
"generated on " + time.asctime() + '";\n' +\
graphText + "\n}"
f.write(graphText)
f.close()
system("dot -Tpng -ograph.png graph.txt")
def process_files(self, files):
result = ""
for file in files:
if isdir(file):
result += self.process_plugin_directory(file)
if is_zipfile(file):
result += self.process_plugin_jar(file)
return result
def process_plugin_directory(self, directory):
propertiesFileName = join(directory,"META-INF","MANIFEST.MF")
if exists(propertiesFileName):
return self.process_properties(self.graphviz_friendly_name_from_plugin_filename(directory), open(propertiesFileName).read())
return ""
def process_plugin_jar(self, jarName):
try:
jar = ZipFile(jarName)
propertiesFile = jar.read("META-INF/MANIFEST.MF")
return self.process_properties(self.graphviz_friendly_name_from_plugin_filename(jarName), propertiesFile)
finally:
jar.close()
def process_properties(self, pluginName, properties):
result = ""
start = properties.find("Require-Bundle:")
if start != -1:
lines = [line.strip() for line in properties[start+len("Require-Bundle:"):].split('\n')]
for line in [line for line in lines if len(line) > 0]:
lastLine = not line.endswith(',')
if not lastLine:
line = line[:-1]
parts = line.split(';')
line = parts[0]
result += self.graphviz_friendly_name(pluginName) + " -> " + self.graphviz_friendly_name(line) + ";\n"
if lastLine:
break
return result
def graphviz_friendly_name(self, line):
return line.replace('.','_')
def graphviz_friendly_name_from_plugin_filename(self, fileName):
return self.graphviz_friendly_name(os.path.split(fileName)[-1].split('_')[0])
if __name__ == "__main__":
if len(sys.argv) != 2:
print "need plugin directory location"
print 'e.g. python dependencies.py "C:\Program Files\eclipse\plugins"'
sys.exit(1)
pluginDirectory = sys.argv[1]
DependencyGrapher().run(pluginDirectory)