Understanding TF-IDF with Python example
2022-02-28
A complete guide to writing custom Datasets and DataLoader in PyTorch
2022-03-09
Show all

Setup Celery with Redis for Django Tutorial

9 mins read

When you work on data-intensive applications, long-running tasks can seriously slow down your users. Modern users expect pages to load instantaneously, but data-heavy tasks may take many seconds or even minutes to complete. How can we make sure users have a fast experience while still completing complicated tasks?

Enter Workers + Message Queues

If we want users to experience fast load times in our application, we’ll need to offload some of the work from our web server.

Workers

One way we do this is with asynchronicity. While the webserver loads the next page, a second server is doing the computations that we need in the background.

We call these background, task-based servers “workers.” We call these background, task-based servers “workers.” While you typically only have one or a handful of web servers responding to user requests, you can have many worker servers that process tasks in the background. These workers can then make changes in the database, update the UI via webhooks or callbacks, add items to the cache, process files, send emails, queue future tasks, and more! All while our main web server remains free to respond to user requests.

Message Queues

We tell these workers what to do via a message queue. Put simply, a queue is a first-in, first-out data structure. When we store messages in a queue the first one we place in the queue will be the first to be processed. All tasks will be started in the order we add them. When a worker becomes available, it takes the first task from the front of the queue and begins processing. If we have many workers, each one takes a task in order. The queue ensures that each worker only gets one task at a time and that each task is only being processed by one worker.

What Is Celery? Worker Management for Python Tasks

Celery allows Python applications to quickly implement task queues for many workers.

It takes care of the hard part of receiving tasks and assigning them appropriately to workers.

You use Celery to accomplish a few main goals:

  1. Define independent tasks that your workers can do as a Python function
  2. Listen to a message broker (we’ll use Redis in this example) to get new task requests
  3. Assign those requests to workers to complete the task
  4. Monitor the progress and status of tasks and workers

Overview of Celery + Django

In this example, we’ll use Celery inside a Django application to background long-running tasks.

Since we want Celery to have access to our database, models, and logic, we’ll define the worker tasks inside of our Django application.

However, these tasks will not run on our main Django webserver. Instead, Celery will manage separate servers that can run the tasks simultaneously in the background.

Since we need that queue to be accessible to both the Django webserver (to add new tasks) and the worker servers (to pick up queued tasks), we’ll use an extra server that works as a message broker.

That message broker server will use Redis — an in-memory data store — to maintain the queue of tasks.

Django adds tasks to Redis; Redis feeds tasks to Celery

To recap: Django creates a task (Python function) and tells Celery to add it to the queue. Celery puts that task into Redis (freeing Django to continue working on other things). On a separate server, Celery runs workers that can pick up tasks. Those workers listen to Redis. When the new task arrives, one worker picks it up and processes it, logging the result back to Celery.

Celery Tutorial in a Django Application Using Redis

Hopefully, by now, you can see why Celery is so useful. It helps us quickly create and manage a system for asynchronous, horizontally-scaled infrastructure. We can continue to add workers as the number of tasks increases, and each worker will remove tasks from the queue in order — allowing us to process many tasks simultaneously. So, how does it actually work in practice? Here are the steps:

Requirements

  • Install Redis & start it up locally at port 6379
  • pip install django
  • pip install celery
  • pip install redis

1. Set Up Django

Let’s create a new Django project to test out Celery:

django-admin startproject celery_tutorial

Set up the models:

cd celery_tutorial/
python manage.py migrate

Make sure it works:

python manage.py runserver

Visit http://localhost:8000/

2. Add Celery config to Django

From the folder where manage.py is:

cd celery_tutorial/

You should now be in the folder where settings.py is.

We need to set up Celery with some config options. Create a new file called celery.py :

from __future__ import absolute_import, unicode_literals

import os

from celery import Celery

# set the default Django settings module for the 'celery' program.
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'celery_tutorial.settings')

app = Celery('celery_tutorial')

# Using a string here means the worker doesn't have to serialize
# the configuration object to child processes.
# - namespace='CELERY' means all celery-related configuration keys
#   should have a `CELERY_` prefix.
app.config_from_object('django.conf:settings', namespace='CELERY')

# Load task modules from all registered Django app configs.
app.autodiscover_tasks()

This file creates a Celery app using the Django settings from our project. The last line tells Celery to try to automatically discover a file called tasks.py in all of our Django apps.

Save that file.

We also want Celery to start automatically whenever Django starts. So, update __init__.py in the same folder as settings.py and celery.py :

from __future__ import absolute_import, unicode_literals

# This will make sure the app is always imported when
# Django starts so that shared_task will use this app.
from .celery import app as celery_app

__all__ = ('celery_app',)

Finally, we need to tell Celery how to find Redis. So, open settings.py and add this line:

CELERY_BROKER_URL = 'redis://localhost:6379'

3. Create Your First Task

If you have an existing Django project, you can now create a file called tasks.py inside any app. Celery will automatically detect that file and look for worker tasks you define there.

For simplicity, though, we’re going to create our first task in celery_tutorial/celery.py , so re-open that file and add this to the bottom:

@app.task(bind=True)
def debug_task(self):
    print('Request: {0!r}'.format(self.request))

This simple task just prints all the metadata about the request when the task is received.

It’s not a super useful task, but it will show us that Celery is working properly and receiving requests.

4. Let’s Queue Our First Task!

Okay, just to recap. We…

  • Created a Celery app instance that manages all tasks in our application
  • Started Redis and gave Celery the address to Redis as our message broker
  • Created our first task so the worker knows what to do when it receives the task request

Now, the only thing left to do is queue up a task and start the worker to process it.

Queuing the task is easy using Django’s shell :

python manage.py shell

Let’s import the task and queue it up:

>>> from celery_tutorial.celery import debug_task
>>> debug_task.delay()
<AsyncResult: fe261700-2160-4d6d-9d77-ea064a8a3727>

We use .delay() to tell Celery to add the task to the queue. We got back a successful AsyncResult — that task is now waiting in Redis for a worker to pick it up!

5. Start a Worker to Process the Task

There’s a task waiting in the Redis queue. Let’s start up a worker to go get and process the task. Remember the task was just to print the request information, so this worker won’t take long. All we have to do is run Celery from the command line with the path to our config file. Make sure you’re in the base directory (the one with manage.py) and run:

celery -A celery_tutorial.celery worker --loglevel=info

You should see Celery start up, receive the task, print the answer, and update the task status to “SUCCESS”:

-------------- celery@Bennetts-MacBook-Pro.local v4.4.2 (cliffs)
--- ***** ----- 
-- ******* ---- macOS-10.15.3-x86_64-i386-64bit 2020-04-18 20:41:52
- *** --- * --- 
- ** ---------- [config]
- ** ---------- .> app:         celery_tutorial:0x1107b5a90
- ** ---------- .> transport:   redis://localhost:6379//
- ** ---------- .> results:     disabled://
- *** --- * --- .> concurrency: 8 (prefork)
-- ******* ---- .> task events: OFF (enable -E to monitor tasks in this worker)
--- ***** ----- 
 -------------- [queues]
                .> celery           exchange=celery(direct) key=celery
                

[tasks]
  . celery_tutorial.celery.debug_task

[INFO/MainProcess] Connected to redis://localhost:6379//
[INFO/MainProcess] mingle: searching for neighbors
[2INFO/MainProcess] mingle: all alone

[INFO/MainProcess] Received task: celery_tutorial.celery.debug_task[fe261700-2160-4d6d-9d77-ea064a8a3727]  [WARNING/ForkPoolWorker-8] Request: <Context: {'lang': 'py', 'task': 'celery_tutorial.celery.debug_task', 'id': 'fe261700-2160-4d6d-9d77-ea064a8a3727', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'retries': 0, 'timelimit': [None, None], 'root_id': 'fe261700-2160-4d6d-9d77-ea064a8a3727', 'parent_id': None, 'argsrepr': '()', 'kwargsrepr': '{}', 'origin': 'gen3931@Bennetts-MacBook-Pro.local', 'reply_to': 'f8232d33-d7ee-3912-814c-6d531e3e9259', 'correlation_id': 'fe261700-2160-4d6d-9d77-ea064a8a3727', 'hostname': 'celery@Bennetts-MacBook-Pro.local', 'delivery_info': {'exchange': '', 'routing_key': 'celery', 'priority': 0, 'redelivered': None}, 'args': [], 'kwargs': {}, 'is_eager': False, 'callbacks': None, 'errbacks': None, 'chain': None, 'chord': None, 'called_directly': False, '_protected': 1}>[INFO/ForkPoolWorker-8] Task celery_tutorial.celery.debug_task[fe261700-2160-4d6d-9d77-ea064a8a3727] succeeded in 0.0015866540000000207s: None

Woohoo! We’re now using Celery — just that easy.

How You Might Deploy It

Well, it’s working locally, but how would it work in production?

Basically, no matter what cloud infrastructure you’re using, you’ll need at least 3 servers:

  1. Django webserver
  2. Redis message broker
  3. Celery worker server

The cool thing about Celery is its scalability. So you can add many Celery servers, and they’ll discover one another and coordinate, using Redis as the communication channel. This allows for a very high throughput of tasks. As you add more tasks to the queue (e.g. from more users), you can add more worker servers to scale with demand. Individual worker tasks can also trigger new tasks or send signals about their status to other parts of the application. This means that decoupled, microservice-based applications can use Celery to coordinate and trigger tasks across services.

Just Scratching the Surface

Celery is an incredibly powerful tool. Most major companies that use Python on the backend are also using Celery for asynchronous tasks that run in the background.

As a Python developer, I don’t hear enough people talking about Celery and its importance. I’m a huge fan of its simplicity and scalability.

Source:

https://medium.com/swlh/python-developers-celery-is-a-must-learn-technology-heres-how-to-get-started-578f5d63fab3

Amir Masoud Sefidian
Amir Masoud Sefidian
Machine Learning Engineer

Comments are closed.