The typical MySQLdb library query can use a lot of memory and perform poorly in Python, when a large result set is generated. For example:
cursor.execute("SELECT id, name FROM `table`") for i in xrange(cursor.rowcount): id, name = cursor.fetchone() print id, name
There is an optional cursor that will fetch just one row at a time, really speeding up the script and cutting the memory footprint of the script a lot.
import MySQLdb import MySQLdb.cursors conn = MySQLdb.connect(user="user", passwd="password", db="dbname", cursorclass = MySQLdb.cursors.SSCursor) cur = conn.cursor() cur.execute("SELECT id, name FROM users") row = cur.fetchone() while row is not None: doSomething() row = cur.fetchone() cur.close() conn.close()
But I can’t find anything about using
SSCursor with with nested queries. If this is the definition of
def doSomething() cur2 = conn.cursor() cur2.execute('select id,x,y from table2') rows = cur2.fetchall() for row in rows: doSomethingElse(row) cur2.close()
then the script throws the following error:
_mysql_exceptions.ProgrammingError: (2014, "Commands out of sync; you can't run this command now")
It sounds as if
SSCursor is not compatible with nested queries. Is that true? If so that’s too bad because the main loop seems to run too slowly with the standard cursor.
This problem in discussed a bit in the MySQLdb User’s Guide, under the heading of the
threadsafety attribute (emphasis mine):
The MySQL protocol can not handle multiple threads using the same
connection at once. Some earlier versions of MySQLdb utilized locking
to achieve a threadsafety of 2. While this is not terribly hard to
accomplish using the standard Cursor class (which uses
mysql_store_result()), it is complicated by SSCursor (which uses
mysql_use_result(); with the latter you must ensure all the rows have
been read before another query can be executed.
The documentation for the MySLQ C API function
mysql_use_result() gives more information about your error message:
mysql_use_result(), you must execute
NULLvalue is returned, otherwise, the unfetched rows are
returned as part of the result set for your next query. The C API
gives the error
Commands out of sync; you can't run this command now
if you forget to do this!
In other words, you must completely fetch the result set from any unbuffered cursor (i.e., one that uses
mysql_use_result() instead of
mysql_store_result() – with MySQLdb, that means
SSDictCursor) before you can execute another statement over the same connection.
In your situation, the most direct solution would be to open a second connection to use while iterating over the result set of the unbuffered query. (It wouldn’t work to simply get a buffered cursor from the same connection; you’d still have to advance past the unbuffered result set before using the buffered cursor.)
If your workflow is something like “loop through a big result set, executing N little queries for each row,” consider looking into MySQL’s stored procedures as an alternative to nesting cursors from different connections. You can still use MySQLdb to call the procedure and get the results, though you’ll definitely want to read the documentation of MySQLdb’s
callproc() method since it doesn’t conform to Python’s database API specs when retrieving procedure outputs.
A second alternative is to stick to buffered cursors, but split up your query into batches. That’s what I ended up doing for a project last year where I needed to loop through a set of millions of rows, parse some of the data with an in-house module, and perform some
UPDATE queries after processing each row. The general idea looks something like this:
QUERY = r"SELECT id, name FROM `table` WHERE id BETWEEN %s and %s;" BATCH_SIZE = 5000 i = 0 while True: cursor.execute(QUERY, (i + 1, i + BATCH_SIZE)) result = cursor.fetchall() # If there's no possibility of a gap as large as BATCH_SIZE in your table ids, # you can test to break out of the loop like this (otherwise, adjust accordingly): if not result: break for row in result: doSomething() i += BATCH_SIZE
One other thing I would note about your example code is that you can iterate directly over a cursor in MySQLdb instead of calling
fetchone() explicitly over
xrange(cursor.rowcount). This is especially important when using an unbuffered cursor, because the
rowcount attribute is undefined and will give a very unexpected result (see: Python MysqlDB using cursor.rowcount with SSDictCursor returning wrong count).