Understanding os.fork and Queue.Queue

I wanted to implement a simple Python program using parallel execution. It’s I/O-bound, so I figured threads would be appropriate (as opposed to processes). After reading the documentation for Queue and fork, I thought something like the following might work.

import os
import Queue

q = Queue.Queue()

if os.fork():            # parent: fork() returns the child's PID
    while True:
        print q.get()
else:                    # child: fork() returns 0
    [q.put(x) for x in range(10)]

However, the get() call never returns. I thought it would return once the other thread executes a put() call. Using the threading module, things behave more like I expected:

import threading
import Queue

q = Queue.Queue()

def consume(q):
    while True:
        print q.get()

worker = threading.Thread(target=consume, args=(q,))
worker.start()

[q.put(x) for x in range(10)]

I just don’t understand why the fork approach doesn’t do the same thing. What am I missing?

Best answer

The POSIX fork system call creates a new process rather than a new thread inside the same address space:

The fork() function shall create a new process. The new process (child process) shall be an exact copy of the calling process (parent process) except as detailed below: […]

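So in your first example the Queue is duplicated rather than shared: the parent and the child each end up with their own independent copy. The put() calls in one process fill only that process's copy, so the get() in the other process blocks forever on a queue that never receives anything.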
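A minimal sketch of that copying behaviour, written for Python 3 (where the module is named queue rather than Queue) and assuming a POSIX system, since os.fork is not available on Windows. The function name put_in_child_then_check_parent is purely illustrative. The child fills its copy of the queue and exits; the parent then finds its own copy still empty:

```python
import os
import queue


def put_in_child_then_check_parent():
    """Fork, let the child fill its copy of the queue, then inspect the parent's copy."""
    q = queue.Queue()
    pid = os.fork()
    if pid == 0:
        # Child process: os.fork() returned 0. These puts only touch the
        # child's private copy of q.
        for x in range(10):
            q.put(x)
        os._exit(0)  # leave immediately, without running the parent's code below
    # Parent process: os.fork() returned the child's PID.
    os.waitpid(pid, 0)  # wait until the child has finished its puts
    return q.qsize()  # the parent's copy never received anything


if __name__ == "__main__":
    print(put_in_child_then_check_parent())  # prints 0
```

The child calls os._exit rather than returning so that it does not fall through into the rest of the parent's program (or flush stdio buffers it inherited from the parent).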

You can use multiprocessing.Queue instead, or just use threads as in your second example 🙂
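A sketch of the multiprocessing.Queue route, again for Python 3 (3.4+ for get_context). It assumes a POSIX system because it pins the "fork" start method to mirror the os.fork setup in the question; the helper names produce and collect_from_child and the None sentinel are illustrative choices, not part of any API. Because multiprocessing.Queue is backed by a pipe, items put in the child actually reach the parent:

```python
import multiprocessing


def produce(q, n):
    """Runs in the child process: send n items, then a None sentinel."""
    for x in range(n):
        q.put(x)
    q.put(None)  # sentinel: tells the parent there is nothing more to read


def collect_from_child(n=10):
    """Start a producer process and return everything it sent."""
    # Pin the "fork" start method to mirror os.fork() in the question
    # (POSIX only).
    ctx = multiprocessing.get_context("fork")
    q = ctx.Queue()  # pipe-backed, so items really cross the process boundary
    producer = ctx.Process(target=produce, args=(q, n))
    producer.start()
    received = []
    while True:
        item = q.get()  # blocks until the child has put something
        if item is None:
            break
        received.append(item)
    producer.join()  # safe now: the queue has been fully drained
    return received


if __name__ == "__main__":
    for item in collect_from_child():
        print(item)
```

The parent drains the queue before calling join(), since joining a process that still has unconsumed items in a multiprocessing.Queue can deadlock.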

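By the way, using list comprehensions just for side effects isn’t good practice: the comprehension builds and immediately discards a list of put()'s return values (all None), and it hides the fact that the point of the line is the side effect. Use a plain for loop instead: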

for x in range(10):
    q.put(x)
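A small Python 3 illustration of the wasted list (the variable name wasted is just for demonstration): the comprehension collects put()'s return value for every item, while the loop does the same work without building anything:

```python
import queue

q = queue.Queue()

# The comprehension runs q.put() for its side effect but also builds a
# list of put()'s return values, which are always None.
wasted = [q.put(x) for x in range(3)]
print(wasted)  # [None, None, None]

# The plain loop performs the same puts without a throwaway list.
for x in range(3):
    q.put(x)
print(q.qsize())  # 6: three items from each version
```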