Earlier today I was having trouble trying to pickle a namedtuple instance. As a sanity check, I tried running some code that was posted in another answer. Here it is, simplified a little more:
    from collections import namedtuple
    import pickle

    P = namedtuple("P", "one two three four")

    def pickle_test():
        abe = P("abraham", "lincoln", "vampire", "hunter")
        f = open('abe.pickle', 'w')
        pickle.dump(abe, f)
        f.close()

    pickle_test()
I then changed two lines of this to use my named tuple:
    from collections import namedtuple
    import pickle

    P = namedtuple("my_typename", "A B C")

    def pickle_test():
        abe = P("ONE", "TWO", "THREE")
        f = open('abe.pickle', 'w')
        pickle.dump(abe, f)
        f.close()

    pickle_test()
However, this gave me the error:

      File "/path/to/anaconda/lib/python2.7/pickle.py", line 748, in save_global
        (obj, module, name))
    pickle.PicklingError: Can't pickle <class '__main__.my_typename'>: it's not found as __main__.my_typename
i.e. the pickle module is looking for my_typename. I changed the line

    P = namedtuple("my_typename", "A B C")

to

    P = namedtuple("P", "A B C")

and it worked.
I looked at the source of namedtuple.py, and at the end there is something that looks relevant, but I don’t fully understand what is happening:
    # For pickling to work, the __module__ variable needs to be set to the frame
    # where the named tuple is created.  Bypass this step in enviroments where
    # sys._getframe is not defined (Jython for example) or sys._getframe is not
    # defined for arguments greater than 0 (IronPython).
    try:
        result.__module__ = _sys._getframe(1).f_globals.get('__name__', '__main__')
    except (AttributeError, ValueError):
        pass

    return result
So my question is: what exactly is going on? Why does the typename argument need to match the name of the factory for this to work?
The section titled What can be pickled and unpickled? of the Python documentation indicates that only “classes that are defined at the top level of a module” can be pickled. namedtuple() is a factory function that effectively defines a class (my_typename(tuple) in your second example), but your code does not assign the manufactured type to a variable named my_typename at the top level of the module.
This matters because pickle saves only the “fully qualified” name of such things, not their code, and they must be importable from the module they’re in using this name in order to be unpickled later (hence the requirement that the module must contain the named object at the top level).
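The name lookup described above can be sketched roughly as follows. This is a simplified illustration, not pickle's actual implementation. It assumes Python 3.6+ (for the module keyword of namedtuple()), and demo_mod and lookup are hypothetical names; a stand-in module is registered in sys.modules so that the result does not depend on how the script is launched.

```python
import pickle
import sys
import types
from collections import namedtuple

# Hypothetical stand-in module, registered so pickle can find it by name.
demo = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo

# The class reports its name as "my_typename", but the module only
# exposes it under the attribute "P".
demo.P = namedtuple("my_typename", "A B C", module="demo_mod")
instance = demo.P("ONE", "TWO", "THREE")


def lookup(cls):
    """Simplified version of the check: fetch <module>.<name> for the class."""
    module = sys.modules[cls.__module__]
    return getattr(module, cls.__name__, None)


# demo_mod has no attribute called "my_typename", so the lookup comes back empty.
found = lookup(demo.P) is demo.P

try:
    pickle.dumps(instance)
    pickled = True
except pickle.PicklingError as exc:
    pickled = False
    print("pickling failed:", exc)

print("found by name:", found)
```

The class is reachable as demo_mod.P, but it identifies itself as demo_mod.my_typename, and that second name is the one pickle tries to resolve.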
One workaround for the problem makes this requirement concrete: change one line of the code so that the type named my_typename is defined at the top level:

    P = my_typename = namedtuple("my_typename", "A B C")
Alternatively, you could just give the namedtuple the name "P" instead of "my_typename":

    P = namedtuple("P", "A B C")
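Both workarounds can be checked with a quick round trip. The sketch below assumes Python 3.6+ and again uses a hypothetical stand-in module named demo_mod, so that it behaves the same no matter how it is run; in a normal script the names would simply live at the top level of your own module.

```python
import pickle
import sys
import types
from collections import namedtuple

# Hypothetical stand-in module, registered so pickle can import it by name.
demo = types.ModuleType("demo_mod")
sys.modules["demo_mod"] = demo

# Workaround 1: also bind the class under the name it reports.
demo.P = demo.my_typename = namedtuple("my_typename", "A B C", module="demo_mod")
abe = demo.P("ONE", "TWO", "THREE")
payload = pickle.dumps(abe)
restored = pickle.loads(payload)

# Workaround 2: make the typename match the variable it is assigned to.
demo.Q = namedtuple("Q", "A B C", module="demo_mod")
restored_q = pickle.loads(pickle.dumps(demo.Q("ONE", "TWO", "THREE")))

# The payload carries the module and type names, not the class definition.
print(b"demo_mod" in payload, b"my_typename" in payload)
print(restored, restored_q)
```

In both cases the name stored in the pickle resolves back to the same class object, which is all that unpickling needs.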
As for what the namedtuple.py source code you were looking at does: it’s trying to determine the name of the module the caller (the creator of the namedtuple) is in, because the author knows that pickle might try to use it to import the definition when unpickling, and that folks commonly assign the result to a variable with the same name that they passed to the factory function (but you didn’t in the second example).
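To see the effect of that frame inspection, here is a small sketch. It assumes CPython (sys._getframe is an implementation detail, as the quoted comment notes) and Python 3.6+ for the module keyword; caller_module_name is a hypothetical helper that mirrors the quoted line, not part of the standard library.

```python
import sys
from collections import namedtuple


def caller_module_name():
    """Hypothetical helper: name of the module that called this function."""
    return sys._getframe(1).f_globals.get("__name__", "__main__")


P = namedtuple("P", "A B C")

# namedtuple() recorded the module it was called from on the new class.
same_module = P.__module__ == caller_module_name()

# Python 3.6+ also lets the caller state the module explicitly.
Q = namedtuple("Q", "A B C", module="some_other_module")

print(P.__module__, same_module, Q.__module__)
```

Note that only the module half of the qualified name is filled in automatically; the name half is whatever typename you passed, which is why it has to match a top-level variable.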