Monday, December 2, 2013

Dealing with unexpected sharing in languages

Most programming languages use data sharing to improve performance, but this can bite you in the a** when you didn't expect sharing.  Here's a simple example (Python) that I ran into recently, along with a simple solution.

I had this code:

respdict={}
for val in ['foo','bar','baz']:
  respdict[val] = lambda arg: (arg == val+"")
myfunc = respdict['foo']
print(myfunc('foo'))
print(myfunc('bar'))

False
False

hunh????  I expected True and False.

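What's happening is that 'val' is being shared across all the lambdas (anonymous functions) that are being "created" in the for-loop.  Each lambda refers to the variable 'val' itself, not to the value it held when the lambda was created, and it only looks that variable up when it is called.  By then the loop has finished and 'val' is 'baz', so both calls compare against 'baz'.  While this sounds crazy, it's actually fairly common: closures in many languages capture variables rather than values.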
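To make the sharing visible, here is a minimal sketch of the same loop wrapped in a function. The names build_checkers and foo_checker are made up for illustration and are not from the code above. It shows that the lambda stored under 'foo' actually compares against the last value of the loop variable.

```python
def build_checkers(words):
    """Build one lambda per word, the naive way (hypothetical helper)."""
    checkers = {}
    for val in words:
        # The lambda remembers the variable `val`, not its current value.
        checkers[val] = lambda arg: arg == val
    return checkers

checkers = build_checkers(['foo', 'bar', 'baz'])
foo_checker = checkers['foo']

# `val` was 'baz' when the loop ended, so that is what every lambda sees.
print(foo_checker('foo'))  # False
print(foo_checker('baz'))  # True
```

Every entry in the dict behaves the same way, which is why looking up 'foo' or 'bar' makes no difference.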

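There are more elaborate per-language solutions, but here's a much simpler one: "force" the language to give each lambda its own copy of the inner value.  In Python, the reliable way to do that is to pass the value through a small helper function, so that every call gets its own local 'val'.  A no-op operation on the data is not enough by itself: the first version already had val+"" inside the lambda and still failed, because it's the variable that is shared, not the string.  In languages where the data itself really is shared, numbers usually aren't shared so it's a non-issue, and for other data types you need to call an explicit copy function.  In languages with smart optimizing compilers, watch out for the optimizer, which can out-smart you, work out that the operation is a no-op, and remove it.  If this happens to you, just be extra devious, e.g. hit the string with a regexp.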

respdict={}
def mkfunc(val):
  return lambda arg: (arg == val)
for val in ['foo','bar','baz']:
  respdict[val+""] = mkfunc(val+"")

myfunc = respdict['foo']
print(myfunc('foo'))
print(myfunc('bar'))

True
False
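
For comparison, here is a hedged sketch of two other common Python ways to pin the value down when the lambda is created, without a helper like mkfunc. The function names build_with_default and build_with_partial are invented for this example. The first uses a default argument, which is evaluated once when the lambda is defined; the second uses functools.partial from the standard library.

```python
import functools
import operator

def build_with_default(words):
    # `val=w` is evaluated at definition time, so each lambda keeps its own value.
    return {w: (lambda arg, val=w: arg == val) for w in words}

def build_with_partial(words):
    # partial(operator.eq, w) stores w now; calling it with arg computes w == arg.
    return {w: functools.partial(operator.eq, w) for w in words}

for build in (build_with_default, build_with_partial):
    myfunc = build(['foo', 'bar', 'baz'])['foo']
    print(myfunc('foo'))  # True
    print(myfunc('bar'))  # False
```

The default-argument form has one catch: a caller who passes a second argument overrides the stored value, so the helper-function version is the safer choice when the lambdas are handed to code you don't control.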