2023-09-15

The Celery project, an often-used Python library for running “background tasks” for synchronous web frameworks, describes itself as:

Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.

The documentation goes into great detail about how to configure Celery with its plethora of options, but it does not focus much on the high-level architecture or how messages pass between the components. Celery is extremely flexible (almost every component can be easily replaced!), but this can make it hard to understand. I attempt to break it down to the best of my understanding below.

Celery has a few main components:

1. Your application code, including any Task objects you’ve defined. (Usually called the “client” in Celery’s documentation.)
2. A broker or message transport.
3. One or more Celery workers.
4. A (results) backend.

A simplified view of Celery components.

In order to use Celery you need to:

1. Instantiate a Celery application (which includes configuration, such as which broker and backend to use and how to connect to them) and define one or more Task definitions.
2. Run a broker.
3. Run one or more Celery workers.
4. (Maybe) run a backend.

If you’re unfamiliar with Celery, below is an example. It declares a simple add task using the @app.task decorator and requests that the task be executed in the background twice (add.delay(...)). The results are then fetched (asyncresult_1.get()) and printed. Place this in a file named my_app.py:

    from celery import Celery

    app = Celery(
        "my_app",
        backend="rpc://",
        broker="amqp://guest@localhost//",
    )


    @app.task()
    def add(a: int, b: int) -> int:
        return a + b


    if __name__ == "__main__":
        # Request that the tasks run and capture their async results.
        asyncresult_1 = add.delay(1, 2)
        asyncresult_2 = add.delay(3, 4)

        result_1 = asyncresult_1.get()
        result_2 = asyncresult_2.get()

        # Should result in 3, 7.
        print(f"Results: {result_1}, {result_2}")

Usually you don’t care which worker a task runs on or how it gets there, but sometimes you need to! We can break the components down further to reveal more detail:

The Celery components broken into sub-components.

Broker

The message broker is a piece of off-the-shelf software which takes task requests and queues them until a worker is ready to process them. Common options include RabbitMQ or Redis, although your cloud provider might have a custom one.

The broker may have some sub-components, including an exchange and one or more queues. (Note that Celery tends to use AMQP terminology and sometimes emulates features which do not exist on other brokers.)

Configuring your broker is beyond the scope of this article (and depends heavily on workload). The Celery routing documentation has more information on how and why you might route tasks to different queues.
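As a rough illustration of routing, here is a minimal configuration sketch. It uses Celery's task_routes setting to send the add task from the example above to its own queue. The queue name "math" is hypothetical, and the task name "my_app.add" assumes the task is defined as in my_app.py.

```python
# Configuration fragment (e.g. a celeryconfig.py module); a sketch, not a
# recommended production setup.
# "math" is a hypothetical queue name chosen for illustration.
# "my_app.add" assumes the task name generated for the earlier example.
task_routes = {
    "my_app.add": {"queue": "math"},
}
```

A worker started with `celery -A my_app worker -Q math` would then consume only from that queue, while tasks without a matching route keep going to the default queue.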

Workers

Celery workers fetch queued tasks from the broker and then run the code defined in your task; they can optionally return a value via the results backend.

Celery workers have a “consumer” which fetches tasks from the broker. By default it requests many tasks at once, equivalent to “prefetch multiplier × concurrency”. (If your prefetch multiplier is 5 and your concurrency is 4, it attempts to fetch up to 20 queued tasks from the broker.) Once fetched, it places them into an in-memory buffer. The task pool then runs each task via its Strategy; for a normal Celery Task, the task pool essentially executes tasks from the consumer’s buffer.

The worker also handles scheduling tasks to run in the future (by queueing them in-memory), but we will not go deeper into that here.

Using the “prefork” pool, the consumer and task pool are separate processes, while the “gevent”/“eventlet” pool uses coroutines, and the “threads” pool uses threads. There’s also a “solo” pool which can be useful for testing (everything is run in the same process: a single task runs at a time and blocks the consumer from fetching more tasks).
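To make the prefetch arithmetic concrete, here is a toy model written with only the standard library. It is not how Celery is implemented: the deque standing in for the broker queue, the consume helper, and the thread pool are all assumptions made for illustration. (In a real worker the two numbers come from the worker_prefetch_multiplier and worker_concurrency settings.)

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor


def prefetch_limit(prefetch_multiplier: int, concurrency: int) -> int:
    """Maximum number of tasks the consumer holds at once."""
    return prefetch_multiplier * concurrency


def add(a: int, b: int) -> int:
    return a + b


# Hypothetical stand-in for a broker queue holding 50 pending task requests.
broker_queue = deque((add, (n, n)) for n in range(50))


def consume(queue: deque, limit: int) -> list:
    """Move up to `limit` task requests from the queue into a local buffer."""
    buffer = []
    while queue and len(buffer) < limit:
        buffer.append(queue.popleft())
    return buffer


concurrency = 4
limit = prefetch_limit(prefetch_multiplier=5, concurrency=concurrency)
buffer = consume(broker_queue, limit)

# The "task pool" executes whatever the consumer has buffered.
with ThreadPoolExecutor(max_workers=concurrency) as pool:
    results = list(pool.map(lambda msg: msg[0](*msg[1]), buffer))

print(f"limit={limit}, buffered={len(buffer)}, still queued={len(broker_queue)}")
```

With a multiplier of 5 and a concurrency of 4, the consumer buffers 20 task requests and leaves the remaining 30 on the queue for other workers (or for its next fetch).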