As the name suggests, you can use ThreadPool to manage thread pools in Python.
Disclaimer
I'll simplify various concepts here, as it's only an introduction.
Python Threads in short
A Python process is a running instance of a Python program, and it starts with a single thread, the main thread.
It usually executes specific instructions in one thread, but you can create more threads to execute some tasks concurrently.
The built-in ThreadPool class can ease the configuration while providing sensible defaults.
Of course, you may want to manage threads manually, starting and closing them exactly when you need to, but that gets significantly harder as the number of tasks increases, and the class already handles that bookkeeping for you.
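To make the comparison concrete, here is a minimal sketch of the manual approach using the standard threading module. The names square and run_manually are hypothetical, chosen only for illustration; the point is how much bookkeeping (creating, starting, joining, collecting results) falls on you.

```python
import threading


def square(n, results, index):
    # Hypothetical task: store n squared in the slot reserved for this task
    results[index] = n * n


def run_manually(numbers):
    results = [None] * len(numbers)
    threads = []
    for index, n in enumerate(numbers):
        # One thread per task: we create, start, and track each one ourselves
        t = threading.Thread(target=square, args=(n, results, index))
        t.start()
        threads.append(t)
    for t in threads:
        # Block until every thread has finished
        t.join()
    return results


if __name__ == "__main__":
    print(run_manually([1, 2, 3]))  # [1, 4, 9]
```

Note that this starts one thread per task, which does not scale well to thousands of tasks. A pool caps the number of threads and reuses them.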
Multiprocessing in short
With ThreadPool, you basically get "reusable threads" to execute tasks. The class abstracts the complexity:
- you don't have to select a thread for your task
- you don't have to start the thread manually
- you don't have to wait for the task to complete
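- it runs everything in threads of the current process (the "local and remote concurrency" mentioned in the documentation describes the wider multiprocessing package, not ThreadPool itself)
There is so much more to say about multiprocessing, but, as a beginner, such a built-in tool can be beneficial: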
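The "reusable threads" idea can be observed with a small sketch. The helper names which_thread and run_in_pool are assumptions made for this example: six tasks are handed to a pool of two workers, and each task reports the name of the thread that ran it.

```python
import threading
from multiprocessing.pool import ThreadPool


def which_thread(item):
    # Hypothetical task: report which worker thread handled the item
    return item, threading.current_thread().name


def run_in_pool(items, workers=2):
    # The pool chooses a worker for each item and starts the threads for us
    with ThreadPool(workers) as pool:
        return pool.map(which_thread, items)


if __name__ == "__main__":
    for item, name in run_in_pool(range(6)):
        print(item, name)
```

With two workers, at most two distinct thread names should appear across the six results, which shows that the same threads are picked up again for later tasks.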
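from multiprocessing.pool import ThreadPool

# myfunc and some_list are placeholders for your own handler and data
if __name__ == '__main__':
    # The with block shuts the pool down once all results are consumed
    with ThreadPool(5) as pool:
        for result in pool.imap_unordered(myfunc, some_list):
            print(result)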
Here, we define a pool of 5 worker threads and apply myfunc to each element of some_list. If you have a list of files or URLs to process, the pool can speed up the execution, because the threads overlap the time spent waiting on I/O.
N.B.: pool.imap_unordered is a variant of pool.imap that yields each result as soon as it is ready, instead of preserving the order of the input. It might be slightly faster in some cases.
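The difference between the two methods is easier to see with tasks of uneven duration. In this sketch, slow_echo and compare are hypothetical helpers: each task sleeps for the given delay and returns it.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_echo(delay):
    # Hypothetical task: sleep for `delay` seconds, then return the delay
    time.sleep(delay)
    return delay


def compare(delays):
    # One worker per task, so all tasks start at roughly the same time
    with ThreadPool(len(delays)) as pool:
        ordered = list(pool.imap(slow_echo, delays))
        unordered = list(pool.imap_unordered(slow_echo, delays))
    return ordered, unordered


if __name__ == "__main__":
    ordered, unordered = compare([0.3, 0.1, 0.2])
    print(ordered)    # same order as the input
    print(unordered)  # completion order, which depends on timing
```

With imap, the first result cannot be consumed until the slowest first task is done. With imap_unordered, the shortest tasks typically come out first.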
The big caveat
Obviously, if you misuse it, ThreadPool can have unexpected effects, but it's designed to ease the implementation and prevent common mistakes.
In my experience, it should not be used for writing large files unless you process them in chunks in your handler (myfunc), which Python allows you to do quite easily.
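As an illustration of chunked processing, here is a sketch of a handler that copies a file piece by piece instead of loading it in memory at once. The names copy_in_chunks and copy_all, as well as the 64 KiB chunk size, are assumptions for this example and should be adapted to your workload.

```python
import os
import tempfile
from multiprocessing.pool import ThreadPool

CHUNK_SIZE = 64 * 1024  # assumed chunk size, tune it for your workload


def copy_in_chunks(paths):
    # Hypothetical handler: copy src to dst one chunk at a time
    src, dst = paths
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(CHUNK_SIZE)
            if not chunk:
                break
            fout.write(chunk)
            written += len(chunk)
    return dst, written


def copy_all(pairs, workers=5):
    # Each (src, dst) pair is handled by one worker thread of the pool
    with ThreadPool(workers) as pool:
        return list(pool.imap_unordered(copy_in_chunks, pairs))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        src = os.path.join(folder, "source.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 200_000)
        print(copy_all([(src, os.path.join(folder, "copy.bin"))]))
```

Each thread only ever holds one chunk in memory, so five workers copying large files stay within a small, predictable memory budget.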
Better implementations
Please refer to the documentation for better implementations of ThreadPool.
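For example, you'll see that Python devs recommend using concurrent.futures.ThreadPoolExecutor instead of ThreadPool, because it has a simpler interface designed around threads, and it returns Future objects that are compatible with many other libraries, including asyncio: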
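from concurrent.futures import ThreadPoolExecutor

# myfunc and some_list are placeholders for your own handler and data
if __name__ == '__main__':
    with ThreadPoolExecutor(max_workers=5) as executor:
        # map() yields the results in the same order as the input
        for result in executor.map(myfunc, some_list):
            print(result)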
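The Future objects mentioned in the documentation become visible when you use submit instead of map. The following is a minimal sketch, where double and run_with_futures are hypothetical names: each call to submit returns a Future, and as_completed yields them as they finish.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def double(n):
    # Hypothetical task used only for illustration
    return n * 2


def run_with_futures(numbers, workers=5):
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as executor:
        # Remember which input each Future belongs to
        futures = {executor.submit(double, n): n for n in numbers}
        for future in as_completed(futures):
            # result() returns the value, or re-raises the task's exception
            results[futures[future]] = future.result()
    return results


if __name__ == "__main__":
    print(run_with_futures([1, 2, 3]))
```

Keeping a dictionary from Future to input is a common way to match results to inputs when the completion order is not guaranteed.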
Bottom line
Thread pools allow you to manage threads conveniently and efficiently.
The built-in mapper is pretty handy for applying a function to each element of a list.