A cookie is just a key-value pair that is stored in the user’s browser. A cookie is sent to your browser as part of the HTTP response that contains the web page you requested.
When your browser receives a cookie, it stores it, and sends it back to the server with every subsequent request it makes on the same website.
Typical information stored in cookies:
- Session IDs (see below)
- Tracking IDs (Google Analytics, etc.)
- User preferences (preferred language or currency, etc.)
For larger or sensitive data, you typically store the values in the session. The cookie is only there to identify the right session.
A cookie can be configured to only live until the browser window is closed, or have a configurable lifetime (1 week, 1 month, 1 year, whatever). If you visit the website again during this period, your browser will send the cookie with every request.
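As a rough illustration of the two lifetimes described above, here is a minimal sketch that builds the value of a Set-Cookie header with Python's standard http.cookies module. The helper name build_set_cookie and the cookie name lang are assumptions made for this example.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age=None):
    """Return the value of a Set-Cookie header for a single cookie.

    With max_age=None the cookie has no explicit lifetime, so the browser
    keeps it only until it is closed. Otherwise max_age is the lifetime
    in seconds.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age is not None:
        cookie[name]["max-age"] = max_age
    return cookie[name].OutputString()


# A cookie that disappears when the browser closes.
session_only = build_set_cookie("lang", "en")

# A cookie that the browser keeps sending for one week.
one_week = build_set_cookie("lang", "en", max_age=7 * 24 * 3600)
```

The only difference between the two headers is the Max-Age attribute; the key-value pair itself is the same.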
A session is a set of data that is stored on the server, usually as key-value pairs. A session is assigned a pseudo-random, secret ID that is usually stored in the user’s browser using a cookie, for example SESSID=abcdef123456789. The session ID typically matches the name of a file containing the session data on the server.
Sessions are usually short-lived, and automatically deleted if unused for some time (20 minutes or so).
Typical information stored in a session:
- ID of the user currently logged in
- Shopping cart
- … anything you can think of, that can be safely deleted when the session expires
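To make the idea concrete, here is a minimal in-memory sketch of such a session store, assuming a 20-minute idle timeout. The class name SessionStore and its methods are hypothetical, and a real framework would persist the data somewhere more durable than a dictionary.

```python
import secrets
import time


class SessionStore:
    """In-memory session store keyed by a secret, pseudo-random ID."""

    def __init__(self, idle_timeout=20 * 60):
        self.idle_timeout = idle_timeout  # seconds of inactivity allowed
        self._sessions = {}               # session ID -> (last_used, data)

    def create(self):
        """Create an empty session and return its ID."""
        session_id = secrets.token_hex(16)  # 32 hex characters, hard to guess
        self._sessions[session_id] = (time.time(), {})
        return session_id

    def get(self, session_id, now=None):
        """Return the session's data dict, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        last_used, data = entry
        if now - last_used > self.idle_timeout:
            del self._sessions[session_id]  # unused for too long: delete it
            return None
        self._sessions[session_id] = (now, data)  # refresh the idle timer
        return data


store = SessionStore()
sid = store.create()
store.get(sid)["user_id"] = 42
store.get(sid)["cart"] = ["book"]
```

Only the value of sid would travel to the browser in a cookie; the user ID and the cart never leave the server.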
Let’s say I visit a website for the first time. The website detects that I didn’t send a session cookie, so it creates a session for me. It creates a session file on the server, such as
Then it sends a cookie header with the HTTP response that contains the web page:
HTTP/1.1 200 OK
My browser stores this cookie. If I visit another page on the same server, my browser will send this cookie with the request:
GET /cart HTTP/1.1
When receiving the second request, the server can check if there’s a session file with this ID, and use it to retrieve the session data.
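The exchange described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the cookie name SESSID, the temporary session directory, the JSON file format and the handle_request function are all made up for this example, and real frameworks do considerably more.

```python
import json
import secrets
import tempfile
from http.cookies import SimpleCookie
from pathlib import Path

SESSION_DIR = Path(tempfile.mkdtemp())  # hypothetical session directory
COOKIE_NAME = "SESSID"                  # hypothetical cookie name


def is_valid_id(session_id):
    """Accept only IDs shaped like the ones we generate (32 hex chars)."""
    return len(session_id) == 32 and set(session_id) <= set("0123456789abcdef")


def handle_request(request_headers):
    """Return (response_headers, session_data) for one request."""
    cookies = SimpleCookie(request_headers.get("Cookie", ""))
    morsel = cookies.get(COOKIE_NAME)
    session_id = morsel.value if morsel is not None else None

    # Known session: the file named after the ID holds the data.
    if session_id and is_valid_id(session_id):
        session_file = SESSION_DIR / session_id
        if session_file.exists():
            return {}, json.loads(session_file.read_text())

    # No cookie, or no matching file: create a session and send the cookie.
    session_id = secrets.token_hex(16)
    (SESSION_DIR / session_id).write_text(json.dumps({}))
    return {"Set-Cookie": f"{COOKIE_NAME}={session_id}; Path=/"}, {}


# First visit: no cookie is sent, so the server creates a session.
first_headers, _ = handle_request({})

# Second visit: the browser sends the cookie back and the session is found.
cookie_pair = first_headers["Set-Cookie"].split(";")[0]
second_headers, session = handle_request({"Cookie": cookie_pair})
```

Checking the shape of the ID before touching the file system matters here, because the ID comes from the client and is used as a file name.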
Your web programming language will offer support for sessions, and should handle most of this complexity for you. You can usually directly use the session array/object, which will be already populated with the session data specific to the user visiting your website, and will be automatically saved if you update the session data; this should be totally transparent to you.
When logging in a user to your website, always store the user ID in the session. Never trust a user ID stored in a cookie to load user data.
It’s very easy to forge a cookie. If you were to load user information based on a user ID stored in a cookie, it would be easy to change the user ID in this cookie to gain access to any user’s account on your website.
On the other hand, if you store the user ID in the session, which is assigned a pseudo-random session ID, it will be hard for an attacker to guess the session ID that is currently assigned to the user.
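The difference can be shown with a small sketch. Everything in it is hypothetical (the USERS table, the cookie names, the function names); it only contrasts trusting a client-supplied user ID with trusting a server-side session.

```python
import secrets

USERS = {1: "alice", 2: "bob"}  # hypothetical user table
SESSIONS = {}                   # session ID -> session data, kept on the server


def log_in(user_id):
    """Create a session for the user and return the cookies to send."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user_id": user_id}
    return {"SESSID": session_id}


def current_user_insecure(cookies):
    # Trusts a value that the client fully controls.
    return USERS.get(int(cookies.get("user_id", 0)))


def current_user(cookies):
    # Trusts only what the server stored under a hard-to-guess ID.
    session = SESSIONS.get(cookies.get("SESSID"))
    return USERS.get(session["user_id"]) if session else None


bob_cookies = log_in(2)

# An attacker edits the cookies in their own browser.
forged = {"user_id": "1", "SESSID": "guess"}
```

With the forged cookies, current_user_insecure returns alice's account, while current_user finds no session and returns None.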
HTTP is a stateless protocol – the server is not required to retain information or status about each user for the duration of multiple requests.
For smart web applications, however, this isn’t good enough. You want to log in to an application and have it remember you across requests. A good example is maintaining a “shopping cart” at some merchandise website, which you gradually fill as you browse through the products that interest you.
To solve this problem, HTTP cookies were invented by Netscape back in the 1990s. Cookies are formally defined in RFC 2965 (since superseded by RFC 6265), but to spare you all that jabber, cookies can be described very simply.
A cookie is just an arbitrary string sent by the server to the client as part of the HTTP response. The client will then return this cookie back to the server in subsequent requests. The information stored in the cookie is opaque to the client – it’s only for the server’s own use. This scheme allows the client to identify itself back to the server with some state the server has assigned it. Here’s a more detailed flow of events:
- The client connects to the server for the first time, and sends a normal HTTP request (say, a simple GET for the main page).
- The server wants to track the client’s state and in its HTTP response (which contains the page contents) attaches a Set-Cookie header. This header’s information is a set of key, value pairs, where both keys and values are strings that make sense for the server, but for the client are a black box.
- In subsequent requests the client makes to the server, it adds a Cookie header in the HTTP requests it sends, with the cookie information the server specified in previous responses.
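The client's side of this flow can be sketched as a tiny cookie jar. This is a simplified illustration, not how any particular browser works: the SimpleJar class is hypothetical, and it ignores expiration, paths and the other details mentioned later. For real code, the standard library offers http.cookiejar.

```python
from http.cookies import SimpleCookie


class SimpleJar:
    """Remembers the latest cookies per host and replays them."""

    def __init__(self):
        self._cookies = {}  # host -> {cookie name: cookie value}

    def store(self, host, response_headers):
        """Record cookies from a list of (header name, value) pairs."""
        for name, value in response_headers:
            if name.lower() == "set-cookie":
                # The values are a black box to the client; it just keeps them.
                for key, morsel in SimpleCookie(value).items():
                    self._cookies.setdefault(host, {})[key] = morsel.value

    def cookie_header(self, host):
        """Return the Cookie header value for a host, or None."""
        cookies = self._cookies.get(host)
        if not cookies:
            return None
        return "; ".join(f"{key}={value}" for key, value in cookies.items())


jar = SimpleJar()
jar.store("example.com", [
    ("Content-Type", "text/html"),
    ("Set-Cookie", "sessionid=abc123; Path=/"),
    ("Set-Cookie", "lang=en"),
])
header = jar.cookie_header("example.com")
```

Note that attributes such as Path are instructions for the client and are not echoed back; only the name=value pairs return to the server.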
Implementation-wise, the client stores the latest cookie received from various servers (which are easily identifiable by their URLs). Even if the next time the client accesses the server is a few days after the previous request, it will still send this information (assuming the cookie hasn’t expired), and the server will be able to identify it. This is why I can point my browser to Amazon today, not having visited it for some weeks, and the website will greet me with “Hello, Eli”.
The above is a necessarily simplified explanation of cookies – I have no intention of repeating the contents of the RFC here. There are a lot of details I’ve left out, like expiration time, filtering of cookies by paths, the various size and count limits that user agents (web browsers, etc.) must abide by, and so on. However, it’s enough detail for the needs of this article, so let’s see some code.
This is the second in a series of three articles.
Sessions are Django’s high-level tool for keeping persistent state for users on the server. Sessions let you store arbitrary data per visitor, and have this data available the next time the visitor visits the site. As we’ll learn in this article, sessions are still based on cookies, but cookie management is abstracted away, handling a lot of issues along the way – sessions provide a more convenient, robust and safe way to store the data.
Example – using sessions
First, it’s useful to see an easy example of using Django sessions. Here’s a simple view that uses sessions to count the number of times a user has triggered it.
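from django.http import HttpResponse

# The view name is illustrative; the original listing lost its def line.
def test_count_session(request):
    if 'count' in request.session:
        request.session['count'] += 1
        return HttpResponse('new count=%s' % request.session['count'])
    else:
        request.session['count'] = 1
        return HttpResponse('No count in session. Setting to 1')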
If we compare this to the cookies usage example from part I, a couple of differences are apparent:
- Sessions are more uniform – a single session attribute of the request is used for both querying and modifying the session.
- While in this example we’re only using an integer as the value, the session attribute acts as a dictionary, allowing string keys and almost arbitrary Python objects as values.
However, looking at the actual HTTP traffic for this view, we notice yet another, very important difference. Here’s the cookie the view returns to the user for this particular instance of the application on my machine:
Set-Cookie: sessionid=a92d67e44a9b92d7dafca67e507985c0;
    expires=Thu, 07-Jul-2011 04:16:28 GMT;
There’s no count=1 (or other numeric value) here – the cookie just sets some unique sessionid. We’ll see what this means shortly, but I’ll just note that this is a very important feature of session management. Think about the security implications, for instance. Suppose the user gets a prize the 10th time he triggers the view. With a simple cookie passing the count into the user’s browser, this would be very easy to spoof. With a session, however, the user has no idea what the correct sessionid is – in fact, no such sessionid exists yet – so the user has no real way of spoofing his 10th visit.
Deciphering the session ID
By default, Django’s session module stores sessions in the app’s main DB, in table django_session with this schema:
CREATE TABLE "django_session" (
    "session_key" varchar(40) NOT NULL PRIMARY KEY,
    "session_data" text NOT NULL,
    "expire_date" datetime NOT NULL
);
session_key is the ID placed in the cookie, and session_data contains the actual session data in encoded format. Here’s how to decipher the session ID we’ve seen above:
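from django.contrib.sessions.models import Session
sess = Session.objects.get(pk='a92d67e44a9b92d7dafca67e507985c0')
print(sess.session_data)   # the encoded form stored in the DB
print(sess.get_decoded())  # the decoded session dictionary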
As you can see, Django stores the request.session dictionary in the DB, in an encoded manner. Django can recover it from the DB by using the session ID the user’s browser returns in a cookie. All of this is done transparently by Django’s session module – the application’s view just has a simple access to the request.session dictionary. Let’s dive into the guts of Django to understand how it manages to make this work.
The first layer of magic I’d like to unwrap has to deal with the session attribute of django.http.HttpRequest. How does the session information even get there, and how can the view change the session by simply modifying the attribute?
The answer is Django’s middleware. To borrow a quote from the Django Book:
[…] Django’s middleware framework, which is a set of hooks into Django’s request/response processing. It’s a light, low-level “plug-in” system capable of globally altering both Django’s input and output.
We can think of middleware in the following way. The normal flow of data around the view we’re coding in Django looks like this:
The view accepts an HTTP request object, does some application-specific work based on its contents and eventually returns an HTTP response object. Middleware makes this process a bit more complicated:
This is done by allowing the programmer to write “hook classes” with special methods that the middleware framework knows about. These hooks can be registered in the MIDDLEWARE_CLASSES setting in settings.py (renamed MIDDLEWARE in Django 1.10 and later). Note that the django.contrib.sessions.middleware.SessionMiddleware class is there by default. Looking at its source code, it has two middleware hooks – process_request and process_response.
process_request pulls the session key (ID) from a cookie. We can see that sessionid is actually a configurable name – SESSION_COOKIE_NAME, set by default in Django’s global settings to sessionid. The request.session attribute is then populated to contain a “session store” object. More on this object a bit later.
process_response saves the session store object (thus making the changes the view did persistent) and attaches a cookie to the response sent to the client. To save on traffic, it does that only if the view actually modified the session, or if the SESSION_SAVE_EVERY_REQUEST setting is set.
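To show the shape of these two hooks without depending on Django, here is a minimal stand-alone sketch. It is not Django's actual code: the Request, Response, Session and SketchSessionMiddleware classes and the DB dictionary are stand-ins invented for this example, and the SESSION_SAVE_EVERY_REQUEST option is left out.

```python
import secrets

SESSION_COOKIE_NAME = "sessionid"
DB = {}  # stand-in for the session table: session key -> dict


class Request:
    def __init__(self, cookies=None):
        self.COOKIES = cookies or {}


class Response:
    def __init__(self):
        self.cookies = {}


class Session(dict):
    """A dict that remembers whether it was modified."""

    def __init__(self, key, data):
        super().__init__(data)
        self.key = key
        self.modified = False

    def __setitem__(self, name, value):
        super().__setitem__(name, value)
        self.modified = True


class SketchSessionMiddleware:
    def process_request(self, request):
        # Pull the session key from the cookie and attach a session object.
        key = request.COOKIES.get(SESSION_COOKIE_NAME)
        request.session = Session(key, DB.get(key, {}))

    def process_response(self, request, response):
        session = request.session
        if session.modified:  # save and set the cookie only when needed
            if session.key not in DB:
                session.key = secrets.token_hex(16)  # never adopt a client-chosen key
            DB[session.key] = dict(session)
            response.cookies[SESSION_COOKIE_NAME] = session.key
        return response


def view(request):
    request.session["count"] = request.session.get("count", 0) + 1
    return Response()


def run(cookies):
    """Push one request through the hooks and the view."""
    middleware = SketchSessionMiddleware()
    request = Request(cookies)
    middleware.process_request(request)
    return middleware.process_response(request, view(request))


first = run({})
key = first.cookies[SESSION_COOKIE_NAME]
second = run({SESSION_COOKIE_NAME: key})
```

The view only touches request.session; loading, saving and the cookie are handled entirely by the two hooks around it.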
This explains how the sessions are translated to cookies. But clearly, a lot of the logic is still hidden, implemented in the store object of sessions. Let’s see how that works.
Sessions can use one of several storage “engines” (backends). This is configurable via the SESSION_ENGINE setting, which points to django.contrib.sessions.backends.db by default – the application’s main database (as mentioned above in “Deciphering the session ID”). If you look at the sessions/backends directory in Django’s source you’ll see other available engines, but unless your needs are very special, you’re probably OK with the default one.
Each storage engine exports a SessionStore class which derives from SessionBase. This common base implements most of the functionality of session stores, relying on methods from its specializations to abstract away the actual method of storing the data – whether in DB, file, in-memory cache or some other way. The DB-backed store uses Django’s standard ORM, with the model defined in module sessions.models.
To understand how all these classes play together, let’s follow through what happens when the user tries to access request.session in a view, assuming the default DB store:
- Session middleware’s process_request sets request.session to be an instance of db.SessionStore with session_key passed into the constructor.
- The constructor of SessionStore defers to the constructor of SessionBase, which stores the session key for later use.
- Note that the session isn’t loaded right away from the DB. This is lazy loading – the actual data is loaded when it’s actually being accessed.
- process_request is done at this point, so the HTTP request is passed into the view. Suppose we now read its count key, as in the example above.
- SessionBase implements a dict-like interface, and in particular __getitem__, which takes the key from a _session attribute, which is itself a property, deferring reads to the _get_session method.
- _get_session does the actual lazy loading, using the load method.
- load is one of the methods related to the actual storage, so SessionBase doesn’t implement it. Instead db.SessionStore implements it and uses the session DB model to load the value from the DB based on the key, decoding it first.
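The lazy-loading chain in the steps above can be sketched roughly as follows. This is a simplified imitation rather than Django's source: the method names follow the text, but the _session_cache attribute, the DictBackedStore backend and its load_calls counter are assumptions added for illustration.

```python
class SessionBase:
    """Dict-like base class; storage is left to subclasses."""

    def __init__(self, session_key=None):
        self.session_key = session_key  # stored for later use
        self._session_cache = None      # nothing is loaded yet

    def _get_session(self):
        # Load from storage only on first access, then reuse the result.
        if self._session_cache is None:
            self._session_cache = self.load()
        return self._session_cache

    _session = property(_get_session)

    def __getitem__(self, key):
        return self._session[key]

    def __setitem__(self, key, value):
        self._session[key] = value

    def load(self):
        raise NotImplementedError("storage backends implement load()")


class DictBackedStore(SessionBase):
    """Hypothetical backend; a plain dict plays the role of the DB."""

    table = {"abc": {"count": 3}}
    load_calls = 0  # counts trips to the "database"

    def load(self):
        DictBackedStore.load_calls += 1
        return dict(self.table.get(self.session_key, {}))


store = DictBackedStore("abc")      # no load happens here
calls_before = DictBackedStore.load_calls
count = store["count"]              # first access triggers load()
count_again = store["count"]        # served from the cached dict
calls_after = DictBackedStore.load_calls
```

A view that never touches the session therefore never pays for a database query.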
This is about it, except one small detail. How does encoding and decoding work? Let’s look at encode:
def encode(self, session_dict):
    "Returns the given session dictionary pickled and encoded as a string."
    pickled = pickle.dumps(session_dict, pickle.HIGHEST_PROTOCOL)
    hash = self._hash(pickled)
    return base64.encodestring(hash + ":" + pickled)
The session dictionary is pickled. Then, a hash is computed and prepended to the pickle string. Finally the whole string is encoded in base 64, which is stored in the session_data field of the DB table.
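For readers who want to try the round trip, here is a Python 3 sketch of the same scheme together with a matching decode. The listing above does not show _hash, so this version substitutes HMAC-SHA256 with a made-up SECRET_KEY; both are assumptions, not what Django necessarily does.

```python
import base64
import hashlib
import hmac
import pickle

SECRET_KEY = b"not-a-real-secret"  # hypothetical; a real key must stay private


def _hash(pickled):
    """Keyed hash of the pickled bytes (an assumed stand-in for _hash)."""
    digest = hmac.new(SECRET_KEY, pickled, hashlib.sha256).hexdigest()
    return digest.encode("ascii")


def encode(session_dict):
    """Pickle the dict, prepend its hash, and base64-encode the result."""
    pickled = pickle.dumps(session_dict, pickle.HIGHEST_PROTOCOL)
    return base64.b64encode(_hash(pickled) + b":" + pickled).decode("ascii")


def decode(session_data):
    """Reverse encode(); return an empty dict if the hash does not match."""
    raw = base64.b64decode(session_data)
    stored_hash, pickled = raw.split(b":", 1)  # the hex hash contains no ':'
    if not hmac.compare_digest(stored_hash, _hash(pickled)):
        return {}  # tampered or corrupted data is never unpickled
    return pickle.loads(pickled)


encoded = encode({"count": 1})
decoded = decode(encoded)
```

The hash is checked before unpickling on purpose: unpickling data that someone else could have modified is unsafe, so the check has to come first.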
In this article, we’ve seen how to use Django sessions, what happens on the low-level of HTTP requests and responses when sessions are being used, and how sessions are actually implemented by Django. While I didn’t cover every little detail, I hope there’s enough information to understand the big picture. If there’s any important information you think I may have missed, please let me know.