If you've ever had to process thousands of files, make hundreds of HTTP requests, or speed up a heavy algorithm, you know that traditional synchronous Python can be frustratingly slow. The solution? Concurrency and parallelism. In this complete guide, you'll master multithreading and multiprocessing in Python, understanding when and how to use each technique to squeeze every drop of performance from your code.
We'll explore everything from the fundamental concepts — threads, processes, and the infamous GIL — to practical implementations with ThreadPoolExecutor, ProcessPoolExecutor, locks, and queues. By the end, you'll have a complete arsenal to write real parallel Python code.
## Understanding Threads and Processes
Before we write code, we need to understand what threads and processes are — and why the difference between them is crucial in Python.
A process is an instance of a running program. Each process has its own memory space, its own resources, and is isolated from others. Think of opening Chrome three times: each window is a separate process. Processes are expensive to create but fully independent.
A thread is the smallest unit of processing within a process. All threads within the same process share the same memory space. Think of tabs inside a single Chrome window: they all share resources but can block each other if not managed properly.
In Python, this distinction matters even more because of a peculiarity of the standard implementation: the GIL.
## The GIL (Global Interpreter Lock)
The Global Interpreter Lock is a mechanism in CPython (the standard Python implementation) that allows only one thread to execute Python bytecode at a time. This means that for CPU-bound tasks (tasks dominated by computation), multiple threads bring no performance gain — in fact, they can even degrade performance due to thread-switching overhead.
So why use threads in Python at all? The answer lies in I/O-bound tasks: input/output operations like HTTP requests, file reads, or database queries. During these operations, the thread sits idle waiting for a response, and the GIL is released — allowing other threads to run. This is where multithreading shines.
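To see this in practice, here's a minimal sketch using `time.sleep` as a stand-in for real I/O: because the GIL is released during the sleep, five waiting threads overlap, and five 0.2-second waits finish in roughly 0.2 seconds total instead of 1 second.

```python
import threading
import time

def io_task():
    # Simulates an I/O wait; time.sleep releases the GIL,
    # so other threads can run during the pause.
    time.sleep(0.2)

start = time.time()
threads = [threading.Thread(target=io_task) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start

# The five 0.2s waits overlap: total time is close to 0.2s, not 1s.
print(f"5 overlapping waits took {elapsed:.2f}s")
```

Replace `time.sleep` with a CPU-heavy loop and the overlap disappears: the GIL is held while Python bytecode runs, so the threads take turns instead.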
To understand the GIL more deeply, I recommend reading What Is the Python Global Interpreter Lock (GIL)? on Real Python, which explains the topic clearly and in depth.
## Multithreading with threading.Thread
The threading module provides the basic API for creating and managing threads in Python. Let's start with a simple example:
```python
import threading
import time

def task(name, seconds):
    print(f"Thread {name}: starting")
    time.sleep(seconds)
    print(f"Thread {name}: completed after {seconds}s")

threads = []
for i in range(5):
    t = threading.Thread(target=task, args=(f"T{i}", i + 1))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print("All threads completed!")
```
We call start() to launch each thread and join() to wait for all of them to finish before proceeding. Without join(), the main program could terminate before the child threads complete their work.
The official threading module documentation covers all the details about locks, semaphores, events, and other synchronization mechanisms.
## Thread Subclasses
A more organized approach is to create subclasses of Thread:
```python
import threading
import time

class MyDownload(threading.Thread):
    def __init__(self, url):
        super().__init__()
        self.url = url
        self.result = None

    def run(self):
        # Simulates a network download
        time.sleep(2)
        self.result = f"Data from {self.url}"

downloads = [MyDownload(f"https://api.example.com/item/{i}") for i in range(10)]
for d in downloads:
    d.start()
for d in downloads:
    d.join()
    print(d.result)
```
This approach is especially useful when you need to encapsulate state and behavior in reusable objects.
## ThreadPoolExecutor: The Modern Way
Managing threads manually works, but it's tedious. The concurrent.futures module, introduced in Python 3.2 and significantly improved over time, offers a much more elegant API through the ThreadPoolExecutor:
```python
from concurrent.futures import ThreadPoolExecutor
import time

def download_file(url):
    print(f"Downloading: {url}")
    time.sleep(2)
    return f"Contents of {url}"

urls = [f"https://site.com/file_{i}.zip" for i in range(20)]

with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(download_file, urls))

print(f"Downloaded {len(results)} files successfully!")
```
The ThreadPoolExecutor automatically manages a pool of reusable threads. The max_workers parameter sets how many threads are kept in the pool. With the context manager (with), threads are automatically cleaned up when exiting the block.
You can also use submit() to get Future objects, which let you track the progress of each task individually:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed

with ThreadPoolExecutor(max_workers=5) as executor:
    futures = {executor.submit(download_file, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        try:
            result = future.result()
            print(f"{url} completed: {result[:30]}...")
        except Exception as e:
            print(f"{url} failed: {e}")
```
Check the official concurrent.futures documentation to explore all advanced features, including callbacks and timeouts.
## Synchronization: Locks and Queues
When multiple threads share data, race conditions emerge. Two threads might try to modify the same variable at the same time, corrupting the result. The classic solution is using locks:
```python
import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    for _ in range(100000):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final counter: {counter}")  # Always 1,000,000
```
Without the lock, the result would be unpredictable — likely less than 1,000,000 — due to race conditions. The with lock context manager ensures the lock is always released, even when exceptions occur.
For communication between threads, the queue (queue.Queue) is the ideal tool. It is already thread-safe, eliminating the need for manual locks:
```python
from queue import Queue
import threading
import time

def producer(queue):
    for i in range(10):
        item = f"Item {i}"
        queue.put(item)
        print(f"Produced: {item}")
        time.sleep(0.5)
    queue.put(None)  # End signal

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(f"Consumed: {item}")
        time.sleep(1)
    print("Consumer finished")

queue = Queue()
t_producer = threading.Thread(target=producer, args=(queue,))
t_consumer = threading.Thread(target=consumer, args=(queue,))
t_producer.start()
t_consumer.start()
t_producer.join()
t_consumer.join()
```
The producer-consumer pattern with queues is extremely versatile and appears in real-world applications like log processing, web scraping, and ETL pipelines. The official queue module documentation details the available queue types: FIFO, LIFO, and priority queues.
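As a quick illustration of one of those variants, here's a minimal sketch of `queue.PriorityQueue`, where items come out in priority order rather than insertion order:

```python
from queue import PriorityQueue

pq = PriorityQueue()
# Items are (priority, payload) tuples; lower numbers come out first.
pq.put((3, "low priority"))
pq.put((1, "urgent"))
pq.put((2, "normal"))

order = [pq.get()[1] for _ in range(3)]
print(order)  # ['urgent', 'normal', 'low priority']
```

Like `Queue`, `PriorityQueue` is thread-safe, so producers and consumers can share it without extra locking.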
To dive deeper into concurrency patterns, also check out our complete guide on context managers in Python, which shows how to use the with pattern to safely manage resources.
## Multiprocessing with ProcessPoolExecutor
For tasks that demand serious CPU power — like image processing, scientific calculations, or machine learning — multithreading doesn't help due to the GIL. The solution is multiprocessing, which creates separate processes, each with its own Python interpreter and its own GIL.
The ProcessPoolExecutor from the concurrent.futures module offers the same elegant API as ThreadPoolExecutor, but using processes instead of threads:
```python
from concurrent.futures import ProcessPoolExecutor
import math

def count_primes(limit):
    primes = []
    for num in range(2, limit):
        if all(num % i != 0 for i in range(2, int(math.sqrt(num)) + 1)):
            primes.append(num)
    return len(primes)

# The __main__ guard is required on platforms that spawn new processes
# (Windows, and macOS since Python 3.8), or the workers would re-run
# this block recursively.
if __name__ == "__main__":
    ranges = [10000, 20000, 30000, 40000, 50000, 60000]
    with ProcessPoolExecutor(max_workers=4) as executor:
        results = list(executor.map(count_primes, ranges))
    for r, total in zip(ranges, results):
        print(f"Primes up to {r}: {total}")
```
The performance difference in CPU-bound tasks is dramatic. While threads would be limited by the GIL and run virtually sequentially, processes distribute the work across CPU cores, achieving near-linear performance gains.
It's important to note that ProcessPoolExecutor has some limitations: function arguments and return values must be picklable (serializable), and the overhead of creating processes is higher than threads. For very small tasks, the process creation cost can outweigh the benefit.
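A minimal sketch of that pickling constraint: lambdas and locally defined functions can't be serialized, which is why task functions for ProcessPoolExecutor should be defined at module level.

```python
import pickle

# Module-level functions are picklable and can be sent to worker processes.
# Lambdas (and functions defined inside other functions) are not:
try:
    pickle.dumps(lambda x: x * 2)
    lambda_picklable = True
except Exception as exc:
    lambda_picklable = False
    print(f"Cannot pickle a lambda: {type(exc).__name__}")
```

The same restriction applies to arguments and return values: stick to plain data structures (numbers, strings, lists, dicts) and the serialization just works.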
The official multiprocessing module documentation offers more advanced features like inter-process queues, shared memory, and pipe connections for scenarios requiring finer control.
Another resource that can complement your knowledge is the guide on args and kwargs in Python, essential for creating flexible functions that work well with executors.
## Multithreading vs Multiprocessing: When to Use Each
Choosing between threads and processes is one of the most important decisions in Python concurrent programming. Here's a practical summary:
**Use Multithreading when:**
- The task is I/O-bound (HTTP requests, file reads, database queries)
- You need to share state between tasks frequently
- The number of concurrent tasks is very large (thousands)
- Overhead needs to be minimal
**Use Multiprocessing when:**
- The task is CPU-bound (image processing, mathematical computations)
- You need to leverage multiple CPU cores
- Data transfer between tasks is small
- Process isolation is desirable (security, stability)
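One way to encode this decision in code — a hypothetical `make_executor` helper, not a standard API — might look like this:

```python
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor
import os

def make_executor(io_bound, max_workers=None):
    """Hypothetical helper: pick an executor based on the workload type."""
    if io_bound:
        # I/O-bound: threads are cheap, and the GIL is released while waiting
        return ThreadPoolExecutor(max_workers=max_workers or (os.cpu_count() or 1) * 2)
    # CPU-bound: separate processes sidestep the GIL
    return ProcessPoolExecutor(max_workers=max_workers or os.cpu_count())

# Usage with a trivial I/O-style workload:
with make_executor(io_bound=True) as ex:
    results = list(ex.map(len, ["a", "bb", "ccc"]))
print(results)  # [1, 2, 3]
```

Because both executors share the same interface, code written this way can switch between threads and processes without touching the task logic.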
The article Difference Between Multithreading vs Multiprocessing in Python on GeeksforGeeks offers a detailed comparison table that can help you decide.
## Practical Example: Concurrent Web Scraping
Let's put it all together in a realistic example: a web scraper that downloads multiple pages simultaneously:
```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import requests
import time

def scrape_url(url):
    try:
        resp = requests.get(url, timeout=10)
        return url, resp.status_code, len(resp.text)
    except Exception as e:
        return url, None, str(e)

urls = [
    "https://python.org",
    "https://docs.python.org/3/",
    "https://pypi.org",
    "https://realpython.com",
    "https://github.com/python",
    "https://stackoverflow.com/questions/tagged/python",
    "https://www.geeksforgeeks.org/python-programming-language/",
    "https://pandas.pydata.org",
]

print("Starting synchronous scraping...")
start = time.time()
for url in urls:
    result = scrape_url(url)
    print(f"{result[0]}: {result[1]}")
print(f"Sync time: {time.time() - start:.2f}s")

print("\nStarting concurrent scraping...")
start = time.time()
with ThreadPoolExecutor(max_workers=4) as executor:
    futures = {executor.submit(scrape_url, url): url for url in urls}
    for future in as_completed(futures):
        url, status, data = future.result()
        print(f"{url}: {status}")
print(f"Concurrent time: {time.time() - start:.2f}s")
```
The time difference between the synchronous and concurrent versions is impressive. In real-world tests, scraping with ThreadPoolExecutor is typically 3 to 8 times faster, depending on the number of URLs and network latency.
The Real Python concurrency guide expands on these examples with benchmarks and detailed comparisons between threading, multiprocessing, and asyncio.
## Best Practices and Common Pitfalls
Concurrency is powerful, but it also brings challenges. Here are the most common pitfalls and how to avoid them:
1. Excessive state sharing: The less state shared between threads/processes, the better. Prefer passing data as arguments and returning results, rather than modifying global variables.
2. Forgetting join(): Always wait for your threads/processes to finish with join() or by using context managers.
3. Deadlocks: When two threads wait for each other to release a lock. Use with lock and avoid acquiring multiple locks at the same time.
4. Too many workers: Creating too many threads or processes degrades performance. For I/O-bound tasks, twice the number of CPUs usually works well; for CPU-bound tasks, use the number of cores on your machine.
5. Ignoring exceptions in threads: Exceptions inside threads don't propagate automatically to the main thread. Always use future.result() or try/except inside the target function.
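To illustrate pitfall 5, a short sketch: an exception raised inside a worker is stored in the `Future` and re-raised only when you call `future.result()`, so that call is where you catch it.

```python
from concurrent.futures import ThreadPoolExecutor

def flaky(n):
    if n == 2:
        raise ValueError("bad input")
    return n * 10

with ThreadPoolExecutor(max_workers=2) as ex:
    futures = [ex.submit(flaky, n) for n in range(4)]
    results = []
    for f in futures:
        try:
            results.append(f.result())  # re-raises the worker's exception here
        except ValueError as e:
            results.append(f"error: {e}")

print(results)  # [0, 10, 'error: bad input', 30]
```

Without the `try/except` around `result()`, the failing task would have raised silently inside the pool and you would never have seen the error.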
For a more in-depth study of multithreading in Python, the tutorial Multithreading in Python on GeeksforGeeks covers everything from basics to advanced topics like semaphores and synchronization barriers.
## Performance Considerations
Here's a practical reference table for choosing the right tool:
| Scenario | Recommended Tool | Expected Gain |
|---|---|---|
| 100 HTTP requests | ThreadPoolExecutor (10-20 workers) | 5x to 10x faster |
| Process 50 large images | ProcessPoolExecutor (4-8 workers) | 3x to 8x faster |
| Thousands of light I/O tasks | asyncio (async/await) | 10x to 100x faster |
| Intense math algorithm | ProcessPoolExecutor (num_cores) | Nx faster (N = cores) |
| ETL pipeline with stages | Queue + Thread/Process | 2x to 5x faster |
## Conclusion
Mastering multithreading and multiprocessing is essential for any Python developer who wants to write performant and scalable applications. In this guide, you learned:
- The fundamental difference between threads (shared memory) and processes (isolated space)
- How the GIL impacts each approach
- How to use ThreadPoolExecutor for I/O-bound tasks
- How to use ProcessPoolExecutor for CPU-bound tasks
- Synchronization techniques with locks and queues
- Practical patterns like producer-consumer
- Real-world examples of concurrent web scraping
The secret to successful concurrent code lies in choosing the right tool for each type of task. Start with ThreadPoolExecutor for I/O operations and graduate to ProcessPoolExecutor when you need to crunch heavy data. And remember: always measure performance before and after — premature optimization is the root of all evil!
Enjoyed the guide? Explore other tutorials on the Universo Python blog to keep leveling up your skills in concurrent programming and other advanced Python topics.