Flask: Using a global variable to load data files into memory

I have a large XML file which is opened, loaded into memory and then closed by a Python class. A simplified example would look like this:

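class Dictionary(object):
   def __init__(self, filename):
      # Read the whole file once; the with block closes it afterwards
      with open(filename) as f:
         self.contents = f.readlines()

   def getDefinitionForWord(self, word):
      # returns a word, using etree parser
      pass  # parsing omitted in this simplified example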

And in my Flask application:

from dictionary import Dictionary
dictionary = Dictionary('dictionary.xml')
print('dictionary object created')

def home():
   word = dictionary.getDefinitionForWord('help')

I understand that in an ideal world, I would use a database instead of XML and make a new connection to this database on every request.

I understood from the docs that the Application Context in Flask means that each request will cause dictionary = Dictionary('dictionary.xml') to run again, opening the file on disk and re-reading the whole thing into memory. However, when I look at the debug output, I see the 'dictionary object created' line printed exactly once, despite connecting from multiple sources (different sessions?).

My first question is:

Since my application seems to load the XML file only once, can I assume that it resides in memory globally and can be safely read by a large number of simultaneous requests, limited only by the RAM on my server? If the XML is 50MB, then it would take approx. 50MB in memory and be served up to simultaneous requests at high speed… I’m guessing it’s not that easy.

And my second question is:

If this is not the case, what sort of limits am I going to hit on my ability to handle large amounts of traffic? How many requests can I handle if I have a 50MB XML being repeatedly opened, read from disk, and closed? I presume one at a time.

I realise this is vague and dependent on hardware, but I’m new to Flask, Python, and programming for the web, and I'm just looking for guidance.


Best answer

It is safe to keep it that way as long as the global object is not modified. This follows from how WSGI applications run, as explained in the Werkzeug docs (Werkzeug is the library Flask is built on top of).

The data is kept in the memory of each worker process of the WSGI app server. So it is loaded once per worker rather than once overall, but the number of worker processes is small and fixed; it does not depend on the number of sessions or the amount of traffic.
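As a rough illustration of what happens inside one worker process, here is a minimal sketch using only the standard library. The Dictionary class, its load_count counter and handle_request are hypothetical stand-ins (not Flask APIs, and not the asker's real class); threads play the role of simultaneous requests.

```python
import threading


class Dictionary:
    """Hypothetical stand-in for the question's class: loaded once, then read-only."""

    load_count = 0  # counts how many times the data was loaded in this process

    def __init__(self, entries):
        Dictionary.load_count += 1
        self._entries = dict(entries)

    def get_definition(self, word):
        # Read-only lookup; nothing here mutates shared state
        return self._entries.get(word)


# Module level: this runs once, when the worker process imports the module
dictionary = Dictionary({"help": "assistance"})


def handle_request(word, results, index):
    # Each simulated request only reads from the shared global object
    results[index] = dictionary.get_definition(word)


results = [None] * 8
threads = [
    threading.Thread(target=handle_request, args=("help", results, i))
    for i in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(Dictionary.load_count, results[0])
```

Every simulated request sees the same object, and the load happens once for the process. A server running several worker processes would repeat that load once in each of them.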

So, it is possible to keep it that way.

That said, in your place I would use a proper database. If you have 16 workers, your data will take at least 800 MB of RAM (16 × 50 MB; the number of workers is usually about twice the number of processors). If the XML grows and you eventually decide to switch to a database service, you will need to rewrite your code.
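The arithmetic behind that figure can be sketched as follows. The function name is made up for illustration, and the result should be read as a lower bound, since the in-memory form of the data is usually larger than the file on disk.

```python
def estimated_memory_mb(workers, data_mb):
    """Lower-bound estimate: every worker process holds its own copy of the data."""
    return workers * data_mb


# Figures from the answer: 16 workers, 50 MB of XML
print(estimated_memory_mb(16, 50))
```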

If the reason to keep it in memory is that PostgreSQL and MySQL are too slow, you could use an SQLite database stored on an in-memory filesystem such as ramfs or tmpfs. That gives you the speed and the SQL interface, and it will probably use less RAM. Migrating to PostgreSQL or MySQL later would be much easier too (in terms of code).
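A minimal sketch of the SQLite approach, using Python's built-in sqlite3 module. The table layout, the function names and the sample entries are assumptions for illustration. A temporary directory stands in for the tmpfs mount here; on a real server you would point db_path at a directory on tmpfs (on many Linux systems /dev/shm is one).

```python
import os
import sqlite3
import tempfile

# Assumption: in production this directory would live on a tmpfs/ramfs mount
db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, "dictionary.db")


def build_database(path, entries):
    """Create the (hypothetical) definitions table and load it once."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS definitions ("
            "word TEXT PRIMARY KEY, definition TEXT NOT NULL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO definitions VALUES (?, ?)", entries
        )
    conn.close()


def get_definition(path, word):
    """Open a short-lived connection per lookup, as a request handler might."""
    conn = sqlite3.connect(path)
    try:
        row = conn.execute(
            "SELECT definition FROM definitions WHERE word = ?", (word,)
        ).fetchone()
    finally:
        conn.close()
    return row[0] if row else None


build_database(db_path, [("help", "assistance"), ("flask", "a small bottle")])
print(get_definition(db_path, "help"))
```

Because all workers read the same database file, the data exists once on the machine rather than once per worker, and the query code stays close to what a PostgreSQL or MySQL version would look like.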