I often include this, or something close to it, in Python scripts and IPython notebooks.
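```python
import cPickle  # Python 2; on Python 3 the module is named pickle

def unpickle(filename):
    # Open in binary mode: pickle data is bytes, not text.
    with open(filename, 'rb') as f:
        obj = cPickle.load(f)
    return obj
```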
This seems like a common enough use case that the standard library should provide a function that does the same thing. Is there such a function? If there isn’t, how come?
Most of the serialization libraries in the stdlib and on PyPI have a similar API. I’m pretty sure it was marshal that set the standard,* and PyYAML, etc. have just followed in its footsteps.
So, the question is, why was marshal designed that way?
Well, you obviously need loads and dumps; you couldn’t build those on top of a filename-based function, and to build them on top of a file-object-based function you’d need StringIO, which didn’t come until later.
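To make that dependency concrete, here is a minimal sketch of string-based functions built on top of the file-object-based ones. It assumes Python 3 and uses pickle with io.BytesIO (the modern in-memory file object) as stand-ins; the names dumps_via_dump and loads_via_load are hypothetical, not part of any library.

```python
import io
import pickle


def dumps_via_dump(obj):
    """Serialize obj to bytes using only the file-object-based pickle.dump."""
    buf = io.BytesIO()          # in-memory file object; nothing touches disk
    pickle.dump(obj, buf)
    return buf.getvalue()


def loads_via_load(data):
    """Deserialize bytes using only the file-object-based pickle.load."""
    return pickle.load(io.BytesIO(data))
```

Without an in-memory file object there is nothing to hand to dump, which is why the string-based pair had to exist in its own right.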
You don’t necessarily need load and dump, because those could be built on top of loads and dumps. But doing so could have major performance implications: you can’t save anything to the file until you’ve built the whole serialized form in memory, and conversely you can’t start deserializing until you’ve read the whole file into memory, which could be a problem for huge objects.
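A rough sketch of what that would look like, again assuming Python 3 and pickle as the example module (dump_via_dumps and load_via_loads are hypothetical names). The comments mark where the whole serialized form has to sit in memory.

```python
import pickle


def dump_via_dumps(obj, f):
    """Write obj to the binary file object f using only pickle.dumps."""
    data = pickle.dumps(obj)    # the entire serialized form is built in memory first
    f.write(data)               # only then does anything reach the file


def load_via_loads(f):
    """Read one object from the binary file object f using only pickle.loads."""
    data = f.read()             # the entire rest of the file is read into memory
    return pickle.loads(data)
```

Note that this version of load also consumes everything left in the stream, so it behaves differently from a real load if the file holds more than one object.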
You definitely don’t need loadf and dumpf functions based on filenames, because those can be built trivially on top of load and dump, with no performance implications and no tricky considerations that a user is likely to get wrong.
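For illustration, this is roughly all such filename-based wrappers would amount to. It is a sketch assuming Python 3 and pickle; loadf and dumpf are hypothetical names that do not exist in the standard library, and the protocol default is my own choice.

```python
import pickle


def dumpf(obj, filename, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle obj to the file at filename (hypothetical helper)."""
    with open(filename, "wb") as f:     # binary mode: pickle data is bytes
        pickle.dump(obj, f, protocol)


def loadf(filename):
    """Unpickle and return the object stored at filename (hypothetical helper)."""
    with open(filename, "rb") as f:
        return pickle.load(f)
```

Both still stream through the file object, so nothing extra is held in memory compared with calling dump and load directly.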
On the one hand, it would be convenient to have them anyway, and there are some libraries, like ElementTree, that do have analogous functions. It may only save a few seconds and a few lines per project, but multiply that by thousands of projects…
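As a small example of the ElementTree case, both ElementTree.write and parse in xml.etree.ElementTree accept a filename directly, so no explicit open is needed. The element names and the temporary path below are made up for the demonstration.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Build a tiny document (element and attribute names are illustrative only).
root = ET.Element("config")
ET.SubElement(root, "item", name="answer").text = "42"

path = os.path.join(tempfile.mkdtemp(), "config.xml")

ET.ElementTree(root).write(path)    # takes a filename, opens the file itself
tree = ET.parse(path)               # likewise accepts a filename

loaded = tree.getroot().find("item").text
```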
On the other hand, it would make Python larger. Not so much the extra 1K to download and install if you added these two functions to every module (although that did mean a lot more back in the 1.x days than nowadays…), but more to document, more to learn, more to remember. And of course more code to maintain: every time you need to fix a bug in marshal.dumpf you have to remember to go check json.dumpf to make sure it doesn’t need the same change, and sometimes you won’t remember.
Balancing those two considerations is really a judgment call. One someone made decades ago and probably nobody has really discussed since. If you think there’s a good case for changing it today, you can always post a feature request on the issue tracker or start a thread on
* Not in the original 1991 version of marshal.c; that just had load and dump. Guido added loads and dumps in 1993 as part of a change whose main description was “Add separate main program for the Mac: macmain.c”, presumably because something inside the Python interpreter needed to dump and load to strings.**
** marshal is used as the underpinnings for things like importing .pyc files. This also means (at least in CPython) it’s not just implemented in C, but statically built into the core of the interpreter itself. While I think it could actually be turned into a regular module since the 3.4 import changes, it definitely couldn’t have been back in the early days. So, that’s extra motivation to keep it small and simple.