October 13, 2004

Closures in Python (part 2)

This assumes that you've read Martin Fowler's article on closures. Part 1 shows a translation of Martin Fowler's Ruby code into Python, both a direct translation and a more idiomatic translation using Python's "list comprehensions" (which are arguably neater for many of the things that you use closures for in Smalltalk or Ruby). From this, you might think that Python can handle closures as well as Ruby or Smalltalk can, but this isn't quite the case.

Limitation of lambda

In the examples that don't use list comprehensions, closures are created with the "lambda" keyword. In Python, the body of a "lambda" must be a single expression; it can't contain statements. This happens to be OK for the examples in Martin's article, but consider something just a tiny bit more complicated.

Let's say you wanted to do:


map(lambda each: if each.isManager: each.salary = 2000, employees)

You can't: "if each.isManager: each.salary = 2000" is a statement, not an expression. Instead, you'd have to define a function (which doesn't take much syntax):


def anonymousFunction(employee):
	if employee.isManager: employee.salary = 2000

then you can do:

	
map(anonymousFunction, employees)

(As an aside, "map" returns the collection of the result of executing the function on every element of a collection. We just want to execute the function and don't care about the return result; there's no equivalent to that in Python other than just ignoring the return result which is what we'll do. It's not a problem.)
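
One caveat for current Python: in Python 3, "map" returns a lazy iterator, so calling it purely for its side effects does nothing until the result is consumed. A plain "for" loop is the usual way to run a function on every element and ignore the results. The sketch below is written for Python 3 and assumes a stand-in "Employee" class with the attributes used in this article (the constructor and the name "giveManagerSalary" are mine, for illustration only).

```python
class Employee:
    """Stand-in for the article's employee objects (hypothetical constructor)."""
    def __init__(self, name, isManager, salary):
        self.name = name
        self.isManager = isManager
        self.salary = salary

def giveManagerSalary(employee):
    # Same body as "anonymousFunction" above, under a descriptive name.
    if employee.isManager:
        employee.salary = 2000

employees = [Employee("tim", True, 5), Employee("ivan", False, 2)]

# In Python 3 this only builds a lazy iterator; nothing has run yet.
lazy = map(giveManagerSalary, employees)
salaries_before = [e.salary for e in employees]

# A plain loop runs the function for its side effect straight away.
for employee in employees:
    giveManagerSalary(employee)
salaries_after = [e.salary for e in employees]

print(salaries_before, salaries_after)
```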

Assignment considered awkward

There are other problems too. Consider the following code:


def totalCostOfManagers(emps):
	total = 0
	def anonymousFunction(employee):
		if employee.isManager: total = total + employee.salary
	map(anonymousFunction, emps)
	return total

This looks like it should give you the total of the managers' salaries. (Ignore the fact that there are other ways to do this; it's just an example.) Try to execute this and you get:

	
UnboundLocalError: local variable 'total' referenced before assignment

This is because the "total" inside "anonymousFunction" is a different variable from the "total" inside "totalCostOfManagers". Assigning to a name anywhere in a function makes that name local to the function, so "total = total + employee.salary" tries to read the local "total" before it has been given a value. (If I find a suitable reference I'll edit this and put it here.)
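
To see the scoping rule in isolation, here is a minimal sketch in Python 3 syntax. The names "outer" and "inner" are made up for the illustration; the error is caught only so that the snippet runs to completion.

```python
def outer():
    total = 0
    def inner():
        # "total" is assigned in this function, so Python treats it as
        # local to inner() throughout; the read below finds it unbound.
        total = total + 1
    try:
        inner()
    except UnboundLocalError as error:
        return "failed: " + type(error).__name__
    return total

result = outer()
print(result)
```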

Work-around for assignment

One way around this would be not to try to assign to "total" itself, but rather have "total" refer to a list and assign to an element in that:


def totalCostOfManagers(emps):
	total = [0]
	def anonymousFunction(employee):
		if employee.isManager: total[0] = total[0] + employee.salary
	map(anonymousFunction, emps)
	return total[0]

This is the sort of thing you might also do with an anonymous inner class in Java, where you also can't do assignments to variables in an outer scope.
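
Later versions of Python address this more directly: Python 3 has a "nonlocal" statement that lets an inner function rebind a variable of the enclosing function. A sketch of the same function using it, with a plain loop instead of "map" and a hypothetical "Employee" stand-in so that it can be run on its own (the name "addIfManager" is also mine):

```python
class Employee:
    """Stand-in for the article's employee objects (hypothetical constructor)."""
    def __init__(self, isManager, salary):
        self.isManager = isManager
        self.salary = salary

def totalCostOfManagers(emps):
    total = 0
    def addIfManager(employee):
        # Rebind the "total" of the enclosing function rather than
        # creating a new local variable.
        nonlocal total
        if employee.isManager:
            total = total + employee.salary
    for employee in emps:
        addIfManager(employee)
    return total

print(totalCostOfManagers([Employee(True, 5), Employee(False, 2)]))  # prints 5
```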

Making functions look like objects

A slightly subtle thing that I haven't mentioned at all so far is the difference between the "closures" you've seen in Python and those in Smalltalk or Ruby. In Smalltalk a closure is an object defining a "value" method. That is, to execute the code of a Smalltalk closure, you'd send it the message "value", with parameters as appropriate. Given that "emps" has a method "do" that accepts an object with a "value" method, the equivalent in Python would be something like:


def totalCostOfManagers(emps):
	total = [0]
	class AnonymousClass:
		def value(self, employee):
			if employee.isManager: total[0] = total[0] + employee.salary
	emps.do(AnonymousClass())
	return total[0]

You can do the equivalent in Java using an anonymous inner class. (Note in Python that "self" (or "this") is explicit, and also has to be included as the first parameter in method definitions.)

(If you want to try this out, you could use the following:


class Employee:
	pass
	
ivan = Employee()
ivan.name="ivan"
ivan.isManager = False
ivan.salary = 2

tim = Employee()
tim.name="tim"
tim.isManager = True
tim.salary = 5

class Employees:
    def __init__(self):
        self.emps = [tim, ivan]
    def do(self, block):
        for e in self.emps:
            block.value(e)

and execute:


print totalCostOfManagers(Employees())

to see it work.)

You might also consider:


def totalCostOfManagers(emps):
	class AnonymousClass:
		def __init__(self):
			self.total = 0
		def value(self, employee):
			if employee.isManager: self.total = self.total + employee.salary
	block = AnonymousClass()
	emps.do(block)
	return block.total

Note that "__init__" defines the constructor for AnonymousClass, which is called by doing "AnonymousClass()" (there's no "new" keyword needed).

Making objects look like functions

In Smalltalk, closures look like objects with a "value" method. In Python it is more idiomatic to use a function instead, as you've seen earlier. To invoke a Python function, you put "()" after it. So, back to basics: if you have a function "foo":


def foo():
	return "hi mum"

then "foo" is a reference to the function, and "foo()" executes the function, i.e. evaluating: "print foo" results in something like "<function foo at 0x008F7970>" and evaluating "print foo()" results in "hi mum".

So, rather than defining "do" to accept an object with a "value" method, it would be more idiomatic to use the built-in function "map" and pass it a function (as shown earlier). In Python, you can make any object look like a function by giving its class a "__call__" method. So, back to the example, another way to implement it would be:


def totalCostOfManagers(emps):
	class AnonymousClass:
		def __init__(self):
			self.total = 0
		def __call__(self, employee):
			if employee.isManager: self.total = self.total + employee.salary
	block = AnonymousClass()
	map(block, emps)
	return block.total

(execute "print totalCostOfManagers([tim, ivan])", with "tim" and "ivan" defined as before, to see it work).

Note the regularity in Python of having functions/methods callable (e.g. "foo()"), classes callable (e.g. the constructor "AnonymousClass()") and instances callable (e.g. the instance of "AnonymousClass").

To clarify, try:


class Foo:
	def __call__(self):
		return "hi mum"
	def bar(self):
		return "yo dude"
someFoo = Foo()
print someFoo.bar()
print someFoo()
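
The built-in "callable" function reports whether an object can be called this way. A small check in Python 3 syntax, repeating the "Foo" class so that the snippet stands alone:

```python
class Foo:
    def __call__(self):
        return "hi mum"
    def bar(self):
        return "yo dude"

someFoo = Foo()

# The class, the bound method and the instance are all callable.
checks = [callable(Foo), callable(someFoo.bar), callable(someFoo)]
print(checks)

# A plain string is not.
print(callable("hi mum"))
```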

Admitting defeat for now, and moving onto something more difficult

OK - let's admit the truth - it's looking like a closure style isn't working so great here. Let's just revert to a simple loop:


def totalCostOfManagers(emps):
	total = 0
	for employee in emps:
		if employee.isManager: total = total + employee.salary
	return total

So - what was all that effort for? Why are closures such a Good Thing? Where a closure becomes really neat is if you have to do something more complicated. For example, consider if you have to do something like this:


def totalCostOfManagers(emps):
	total = 0
	try:
		emps.startSomething()
		employee = emps.next()
		while(employee != None):
			if employee.isManager: total = total + employee.salary
			employee = emps.next()
	finally:
		emps.endSomething()
	return total

where "emps" is some object defining "startSomething", "endSomething" and "next" methods that have to be called like in this method, and the "endSomething" has to be called whether you finish looping through all the employees or not.

(You can use:


class Employees:
	def __init__(self):
		self.emps = [tim, ivan]
	def startSomething(self):
		pass
	def endSomething(self):
		pass
	def next(self):
		if len(self.emps) == 0:
			return None
		return self.emps.pop()
		
print totalCostOfManagers(Employees())

to try this out. Of course, this is not how a real implementation would look; it's just to illustrate the calling method "totalCostOfManagers".)

Now for some duplication, and how to remove it

This pattern is typical of some types of code, e.g. database related code. If you had to do lots of things similar to this, but slightly different, you can easily end up with lots of duplicated code.

For example:


def totalCostOfNonManagers(emps):
	total = 0
	try:
		emps.startSomething()
		employee = emps.next()
		while(employee != None):
			if not employee.isManager: total = total + employee.salary
			employee = emps.next()
	finally:
		emps.endSomething()
	return total

In Java, I've seen lots of code like this: mostly duplicated, with just a small part that differs. At first sight, to many Java developers it might look like too much effort to remove the duplication, but actually it's not too hard. This is a case where closures (or anonymous inner classes in Java) really help. You can put the bulk of this code where it belongs, in the Employees class:


class Employees:

	#whatever ...
	
	def do(self, fun):
		try:
			self.startSomething()
			employee = self.next()
			while(employee != None):
				fun(employee)
				employee = self.next()
		finally:
			self.endSomething()

Then the definition of "totalCostOfManagers" doesn't need to worry about most of that stuff and looks just like the code from earlier (but "emps" is an instance of "Employees" rather than a simple list):


def totalCostOfManagers(emps):
	total = [0]
	def anonymous(employee):
		if employee.isManager: total[0] = total[0] + employee.salary
	emps.do(anonymous)
	return total[0]

However, as we've seen earlier, this would be neater if we could write it as just a simple loop.

Generators

Python has a trick up its sleeve, called a "generator". Suppose we define "do" as:


class Employees:

	#whatever ...

	def do(self):
		self.startSomething()
		employee = self.next()
		while(employee != None):
			yield employee
			employee = self.next()
		self.endSomething()

Using "yield" makes "do" a "generator": as far as a "for" loop is concerned, it looks like a method that returns a list, while inside the method it looks like you "return" one element at a time. Unfortunately, at the time of writing, "'yield' not allowed in a 'try' block with a 'finally' clause", so I've deleted the "try"/"finally" until someone can tell me a way around that limitation (arghhh!). Anyway, back to the "generator" version of "do"; the calling code now becomes:


def totalCostOfManagers(emps):
	total = 0
	for employee in emps.do():
		if employee.isManager: total = total + employee.salary
	return total

which is nice and simple again.
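
For what it's worth, the restriction on "yield" inside "try"/"finally" mentioned above was removed in Python 2.5, so on any current Python the generator can keep the cleanup guarantee. The sketch below is written for Python 3 and uses stand-in "Employee" and "Employees" classes of my own; the "log" list is an assumption added purely to show the order of calls.

```python
class Employee:
    """Stand-in for the article's employee objects (hypothetical constructor)."""
    def __init__(self, isManager, salary):
        self.isManager = isManager
        self.salary = salary

class Employees:
    """Stand-in for the awkward API; "log" exists only to show call order."""
    def __init__(self, emps):
        self.emps = list(emps)
        self.log = []
    def startSomething(self):
        self.log.append("start")
    def endSomething(self):
        self.log.append("end")
    def next(self):
        if len(self.emps) == 0:
            return None
        return self.emps.pop()
    def do(self):
        try:
            self.startSomething()
            employee = self.next()
            while employee is not None:
                yield employee
                employee = self.next()
        finally:
            # Runs when iteration finishes normally, or when the generator
            # is closed (explicitly via close(), or on garbage collection).
            self.endSomething()

def totalCostOfManagers(emps):
    total = 0
    for employee in emps.do():
        if employee.isManager:
            total = total + employee.salary
    return total

staff = Employees([Employee(True, 5), Employee(False, 2)])
print(totalCostOfManagers(staff))
print(staff.log)
```

If the caller abandons the loop early, the cleanup only happens once the generator is closed, which may be later than with the closure-based "do".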

Generators as Iterators

Another tweak is also available. If we change the name of "do" to "__iter__":


class Employees:

	#whatever ...

	def __iter__(self):
		self.startSomething()
		employee = self.next()
		while(employee != None):
			yield employee
			employee = self.next()
		self.endSomething()

Then, our calling code becomes:


def totalCostOfManagers(emps):
	total = 0
	for employee in emps:
		if employee.isManager: total = total + employee.salary
	return total

and we've gone back to something that looks really simple!

Our other method now becomes:


def totalCostOfNonManagers(emps):
	total = 0
	for employee in emps:
		if not employee.isManager: total = total + employee.salary
	return total

which removes much of the duplication. Removing the last of the duplication is left as an exercise for the reader.
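
One possible answer to the exercise, as a sketch in Python 3 syntax: pass the test in as a function, so that both totals share one loop. The name "totalCost" and the stand-in "Employee" class are mine; any iterable of employees will do for "emps".

```python
class Employee:
    """Stand-in for the article's employee objects (hypothetical constructor)."""
    def __init__(self, isManager, salary):
        self.isManager = isManager
        self.salary = salary

def totalCost(emps, predicate):
    # Sum the salaries of the employees for which predicate(employee) is true.
    total = 0
    for employee in emps:
        if predicate(employee):
            total = total + employee.salary
    return total

def totalCostOfManagers(emps):
    return totalCost(emps, lambda employee: employee.isManager)

def totalCostOfNonManagers(emps):
    return totalCost(emps, lambda employee: not employee.isManager)

staff = [Employee(True, 5), Employee(False, 2)]
print(totalCostOfManagers(staff), totalCostOfNonManagers(staff))  # prints 5 2
```

A "lambda" is enough here because each test is a single expression.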

Conclusion

I miss the really neat syntax of closures in Smalltalk, and the fact that everything just works like it should in Smalltalk with syntax that is truth and beauty. However, despite thinking that Smalltalk is a better language, I use Python in my work for various reasons.

I have never been able to choose the main implementation language for a commercial project. (Yes, now I think of it, really, never.) It's always been dictated either by what's already been done, or for projects that I've joined from the start, it's been chosen by someone else (someone else less technical than me, that is!). These days, that means Java or C#. However, even on these projects, there's always the need for little scripts for automating parts of the development process or doing other useful things. Python is better than Smalltalk for this sort of thing, as it's easy to share Python programs with other people, and to move them from machine to machine. You just copy the relevant text file, edit it in your favourite text editor if required, run "python whatever.py" and it works. Smalltalk doesn't have that convenience.

Some people use shell scripts for that sort of thing, but Python is a very much better language, and also works cross platform. Compared to Perl or Ruby, Python has very little syntax, which makes it easy to read even if you haven't been doing much Python recently. I'm a "bear of very little brain" (or something like that), so I want to be able to read code without having to remember what the syntax means. With Python, that's very easy. I can also read other people's Python code, which helps!

Python now has a very large following, and it's easy to create Python wrappers for "C" programs, so the quality and quantity of libraries is fantastic. The main Python implementation itself is also stable and reasonably performant.

From a language point of view, Python is less scary to the majority of Java developers than Smalltalk (although that's a deficiency on their part rather than Smalltalk's; sometimes you have to make compromises to get something accepted). Also, people have heard of Python and aren't frightened by it. The common reaction is "oh, that's like Perl but more readable, isn't it?", whereas the reaction to Smalltalk tends to be fear. Python also has some nice language features, such as list comprehensions and generators, which make it easy to write code that's easy to read.

If you want to learn a language with good support for closures, and one of the best languages ever invented, try Smalltalk. If you want to use a language that's easy to learn and useful in your day-to-day work, try Python. If you want to improve as a developer, learn both.

Posted by ivan at October 13, 2004 8:57 PM
Copyright (c) 2004-2008 Ivan Moore
Comments

Assuming emps is iterable, why not:

total = sum([employee.salary for employee in emps if employee.isManager])

in Python 2.3?

Posted by: Yawar Amin at October 15, 2004 9:07 AM

Many thanks for your comments.
Dave - (1) interesting way to do this - but I don't think it's very readable, and my main point was just that lambdas are restricted to a single expression.
(2) I like this - it's more readable than what I did in the examples.
(3) That was deliberate - it was meant to be some API that isn't very nice and to show how this can be hidden by a generator

Both Dave and Yawar - the "sum" thing is much neater - in trying to make a simple example I've made it simple enough to be made even simpler than I meant (if that makes any sense to you!)

Posted by: ivan at October 15, 2004 10:38 AM

Thanks Ivan - I enjoyed the Python insights. Makes me want to learn some more. Hopefully I will get to try it out for real one day.

Posted by: Mike Hill at October 18, 2004 10:53 AM

http://boo.codehaus.org/Closures

http://boo.codehaus.org/Martin+Fowler%27s+closure+examples+in+boo

Posted by: DougHolton at January 3, 2005 7:50 PM