Why is my Python app stalled with ‘system’ / kernel CPU time

First off I wasn’t sure if I should post this as a Ubuntu question or here.
But I’m guessing it’s more of an Python question than a OS one.

My Python application is running on top of Ubuntu on a 64 core AMD server.
It pulls images from 5 GigE cameras over the network by calling out to a .so through ctypes and then processes them.
I am seeing frequent pauses in my application causing frames from the cameras to be dropped by the external camera library.

To debug this I’ve used the popular psutil Python package with which I log out CPU stats every 0.2 seconds in a separate thread.
I sleep for 0.2 seconds in that thread and when that sleep takes substantially longer I also see camera frames being dropped.
I have seen pauses up to 17 seconds long!
Most of my processing is either in OpenCV or Numpy (both of which release the GIL) or in one part of the app a multiprocessing.Pool
with 59 processes (this it to get around the Python GIL).

My debug logging shows very high ‘system’ (i.e. kernel) CPU time on many of my process’ threads when the pauses happen.

For example. I see CPU times as follows (usually every 0.2 seconds) and then suddenly a big jump
(‘Process’ numbers are in CPU utilization, i.e. 1 CPU fully used would be 1, Linux top showing 123% would be 1.2):

Process user | Process system | OS system % | OS idle %
19.9         | 10.5           | 6           | 74 
5.6          | 2.3            | 4           | 87
6.8          | 1.7            | 11          | 75
4.6          | 5.5            | 43          | 52
0.5          | 26.4           | 4           | 90

I don’t know why the high OS system usage is reported one line before matching high process system usage.
The two match up since 26.4 of 64 cores = 41%. At that point my application experienced an approximately 3.5 second pause
(as determined by my CPU info logging thread using OpenCV’s cv2.getTickCount() and also the jump in time stamps in the Python logging output)
causing multiple camera frames to be dropped.

When this happens I have also logged the CPU info for each thread of my process.
For the example above 25 threads were running at a ‘system’ CPU utilization of 0.9 and a few more at 0.6, which matches the total for the process of 26.4 above.
At that point there were about 183 threads running.

This pause usually seems to happen close after the multiprocessing pool is used (it’s used for short bursts) but by no means happens every time the pool is used.
Also, if I halve the amount of processing that needs to happen outside the pool then no camera skipping happens.

Question: how can I determine why OS ‘system’ / kernel time suddenly goes through the roof? Why would that happen in a Python app?

And more importantly: any ideas why this is happening and how to avoid it?

Notes:

  • This runs as root (it has to for the camera library unfortunately) from upstart
  • When the cameras are turned off the app restarts (using respawn in upstart) and this happens multiple times a day so it’s not due to being long running, I have also seen this happen very soon after the process starts
  • It is the same code being run over and over, it’s not due to running a different branch of my code
  • Currently has a nice of -2, I have tried removing the nice with no affect
  • Ubuntu 12.04.5 LTS
  • Python 2.7
  • Machine has 128GB of memory which I am no where near using

Best answer

OK. I have the answer to my own question. Yes, it’s taken me over 3 months to get this far.

It appears to be GIL thrashing in Python that is the reason for the massive ‘system’ CPU spikes and associated pauses. Here is a good explanation of where the thrashing comes from. That presentation also pointed me in the right direction.

Python 3.2 introduced a new GIL implementation to avoid this thrashing. The result can be shown with a simple threaded example (taken from the presentation above):

from threading import Thread
import psutil

def countdown():
    n = 100000000
    while n > 0:
        n -= 1

t1 = Thread(target=countdown)
t2 = Thread(target=countdown)
t1.start(); t2.start()
t1.join(); t2.join()

print(psutil.Process().cpu_times())    

On my Macbook Pro with Python 2.7.9 this uses 14.7s of ‘user’ CPU and 13.2s of ‘system’ CPU.

Python 3.4 uses 15.0s of ‘user’ (slightly more) but only 0.2s of ‘system’.

So, the GIL is still in place, it still only runs as fast as when the code is single threaded, but it avoids all the GIL contention of Python 2 that manifests as kernel (‘system’) CPU time. This contention, I believe, is what was causing the issues of the original question.

Update

An additional cause to the CPU problem was found to be with OpenCV/TBB. Fully documented in this SO question.