
Relation between sessions and cookies

Cookie

A cookie is just a key-value pair that is stored in the user’s browser. A cookie is sent to your browser as part of the HTTP response that contains the web page you requested.

When your browser receives a cookie, it stores it, and sends it back to the server with every subsequent request it makes on the same website.

Because cookies are part of the HTTP request and response headers, they are somewhat limited in size.

Typical information stored in cookies:

  • Session IDs (see below)
  • Tracking IDs (Google Analytics, etc.)
  • User preferences (preferred language or currency, etc.)

For larger or sensitive data, you typically store the values in the session. The cookie is only there to identify the proper session.

A cookie can be configured to only live until the browser window is closed, or have a configurable lifetime (1 week, 1 month, 1 year, whatever). If you visit the website again during this period, your browser will send the cookie with every request.
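The two lifetimes can be illustrated with Python's standard http.cookies module. This is only a sketch: the helper name build_set_cookie and the cookie names and values are assumptions made for the example.

```python
from http.cookies import SimpleCookie


def build_set_cookie(name, value, max_age=None):
    """Return the value of a Set-Cookie header (hypothetical helper).

    With max_age=None the cookie carries no lifetime attribute, so browsers
    treat it as a session cookie and drop it when they are closed.
    """
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    if max_age is not None:
        cookie[name]["max-age"] = max_age  # lifetime in seconds
    return cookie[name].OutputString()


# Lives until the browser is closed.
browser_session_cookie = build_set_cookie("SESSID", "abcdef123456789")
# Lives for one week.
one_week_cookie = build_set_cookie("lang", "en", max_age=7 * 24 * 3600)
```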

Session

A session is a set of data that is stored on the server, usually as key-value pairs. A session is assigned a pseudo-random, secret ID that is usually stored in the user’s browser using a cookie, for example SESSID=abcdef123456789. The session ID typically matches the name of a file containing the session data on the server.

Sessions are usually short-lived, and automatically deleted if unused for some time (20 minutes or so).

Typical information stored in a session:

  • ID of the user currently logged in
  • Shopping cart
  • … anything you can think of, that can be safely deleted when the session expires
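As a rough sketch of the idea, here is an in-memory session store with an idle timeout. The class and method names are hypothetical, and a real framework would normally persist sessions in files, a database or a cache rather than in a dict.

```python
import secrets
import time


class InMemorySessionStore:
    """Hypothetical server-side store: session ID -> (last_used, data)."""

    def __init__(self, idle_timeout=20 * 60):
        self.idle_timeout = idle_timeout  # seconds of inactivity allowed
        self._sessions = {}

    def create(self, now=None):
        now = time.time() if now is None else now
        session_id = secrets.token_hex(16)  # pseudo-random, hard to guess
        self._sessions[session_id] = (now, {})
        return session_id

    def get(self, session_id, now=None):
        """Return the session's data dict, or None if unknown or expired."""
        now = time.time() if now is None else now
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        last_used, data = entry
        if now - last_used > self.idle_timeout:
            del self._sessions[session_id]  # expired: safe to throw away
            return None
        self._sessions[session_id] = (now, data)  # refresh the idle timer
        return data
```

The now parameter exists only so that expiry can be exercised without waiting; callers would normally omit it.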

Example

Let’s say I visit a website for the first time. The website detects that I didn’t send a session cookie, so it creates a session for me. It creates a session file on the server, such as /tmp/sess_abcdef123456789.

Then it sends a cookie header with the HTTP response that contains the web page:

HTTP/1.1 200 OK
Set-Cookie: SESSID=abcdef123456789

My browser stores this cookie. If I visit another page on the same server, my browser will send this cookie with the request:

GET /cart HTTP/1.1
Cookie: SESSID=abcdef123456789

When receiving the second request, the server can check if there’s a session file with this ID, and use it to retrieve the session data.
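The exchange above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular framework does it: a module-level dict named SESSIONS stands in for the session files, and handle_request is a hypothetical function.

```python
import secrets
from http.cookies import SimpleCookie

SESSIONS = {}  # session ID -> session data; stands in for the session files


def handle_request(cookie_header):
    """Return (session_data, set_cookie_value_or_None) for one request."""
    cookies = SimpleCookie()
    cookies.load(cookie_header or "")
    morsel = cookies.get("SESSID")
    if morsel is not None and morsel.value in SESSIONS:
        # Known visitor: reuse the stored data, no new cookie needed.
        return SESSIONS[morsel.value], None
    # No cookie, or an ID the server never issued: start a fresh session.
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {}
    return SESSIONS[session_id], "SESSID=%s" % session_id
```

On the first call the second element of the result is the value to send in a Set-Cookie header; passing that same string back as the Cookie header on the next call returns the same session data.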

Your web programming language will offer support for sessions, and should handle most of this complexity for you. You can usually directly use the session array/object, which will be already populated with the session data specific to the user visiting your website, and will be automatically saved if you update the session data; this should be totally transparent to you.

Security

When logging in a user to your website, always store the user ID in the session. Never trust a user ID stored in a cookie to load user data.

It’s very easy to forge a cookie. If you were to load user information based on a user ID stored in a cookie, it would be easy to change the user ID in this cookie to gain access to any user’s account on your website.

On the other hand, if you store the user ID in the session, which is assigned a pseudo-random session ID, it will be hard for an attacker to guess the session ID that is currently assigned to the user.
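To make the difference concrete, here is a small sketch comparing the two approaches. The USERS table, the SESSIONS dict and all function names are hypothetical; cookies are modelled as a plain dict of name to value.

```python
import secrets

USERS = {1: "alice", 2: "bob"}  # hypothetical user table
SESSIONS = {}                   # session ID -> session data


def log_in(user_id):
    """Create a session for the user and return the ID to put in the cookie."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"user_id": user_id}
    return session_id


def current_user_insecure(cookies):
    # Trusts a value the client fully controls: anyone can send user_id=2.
    return USERS.get(int(cookies.get("user_id", 0)))


def current_user(cookies):
    # Only a session ID that the server itself issued maps to a user.
    session = SESSIONS.get(cookies.get("SESSID"))
    return USERS.get(session["user_id"]) if session else None
```

With the insecure version, sending the cookie user_id=2 is enough to be treated as bob. With the session-based version, an attacker would have to guess a 128-bit random value.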

 

HTTP is a stateless protocol – the server is not required to retain information or status about each user for the duration of multiple requests.

For smart web applications, however, this isn’t good enough. You want to log in to an application and have it remember you across requests. A good example is maintaining a “shopping cart” at some merchandise website, which you gradually fill as you browse through the products that interest you.

To solve this problem, HTTP cookies were invented by Netscape back in the 1990s. Cookies are formally defined in RFC 2965 (since obsoleted by RFC 6265), but to spare you all that jabber, cookies can be described very simply.

A cookie is just an arbitrary string sent by the server to the client as part of the HTTP response. The client will then return this cookie back to the server in subsequent requests. The information stored in the cookie is opaque to the client – it’s only for the server’s own use. This scheme allows the client to identify itself back to the server with some state the server has assigned it. Here’s a more detailed flow of events:

  1. The client connects to the server for the first time, and sends a normal HTTP request (say, a simple GET for the main page).
  2. The server wants to track the client’s state and in its HTTP response (which contains the page contents) attaches a Set-Cookie header. This header’s information is a set of key, value pairs, where both keys and values are strings that make sense for the server, but for the client are a black box.
  3. In subsequent requests the client makes to the server, it adds a Cookie header in the HTTP requests it sends, with the cookie information the server specified in previous responses.

Implementation-wise, the client stores the latest cookie received from various servers (which are easily identifiable by their URLs). Even if the next time the client accesses the server is a few days after the previous request, it will still send this information (assuming the cookie hasn’t expired), and the server will be able to identify it. This is why I can point my browser to Amazon today, not having visited it for some weeks, and the website will greet me with “Hello, Eli”.
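The client side of this can be sketched as a tiny cookie jar keyed by host. TinyCookieJar is a hypothetical class written for this illustration; it deliberately ignores expiration, paths and the other details a real user agent has to handle.

```python
from http.cookies import SimpleCookie


class TinyCookieJar:
    """Hypothetical client-side store: host -> {cookie name: value}."""

    def __init__(self):
        self._by_host = {}

    def store(self, host, set_cookie_value):
        """Remember the cookie from one Set-Cookie header sent by host."""
        parsed = SimpleCookie()
        parsed.load(set_cookie_value)
        jar = self._by_host.setdefault(host, {})
        for name, morsel in parsed.items():
            jar[name] = morsel.value  # a newer value replaces the older one

    def cookie_header(self, host):
        """Return the Cookie header value to send to host, or None."""
        jar = self._by_host.get(host)
        if not jar:
            return None
        return "; ".join("%s=%s" % item for item in jar.items())
```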

The above is a necessarily simplified explanation of cookies – I have no intention of repeating the contents of the RFC here. There are a lot of details I’ve left out, like expiration time, filtering of cookies by paths, various size and count limits that user agents (web browsers, etc.) must abide by, and so on. However, it’s sufficient detail for the needs of this article, so let’s see some code.

Setting cookies in Python, without Django

The following demonstrates how to set cookies from a Python server-side application without using Django. For simplicity, I’ll just use the web server built into the Python standard library (the module names below are those of Python 2):

from BaseHTTPServer import HTTPServer
from SimpleHTTPServer import SimpleHTTPRequestHandler
import Cookie

class MyRequestHandler(SimpleHTTPRequestHandler):
    def do_GET(self):
        content = "<html><body>Path is: %s</body></html>" % self.path
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.send_header('Content-length', str(len(content)))

        cookie = Cookie.SimpleCookie()
        cookie['id'] = 'some_value_42'

        self.wfile.write(cookie.output())
        self.wfile.write('\r\n')

        self.end_headers()
        self.wfile.write(content)

server = HTTPServer(('', 59900), MyRequestHandler)
server.serve_forever()

This is a very simple application that just shows the path that the client requested. The more interesting thing happens under the covers – the application also sets a cookie. If we examine the HTTP response sent by this application to a client that connected to it, we’ll see this among the headers:

Set-Cookie: id=some_value_42

In a similar manner, the Cookie module allows the server to parse cookies returned by the client in Cookie headers, using the load method.
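In Python 3 the same standard-library modules are named http.server and http.cookies. Here is a minimal sketch of a handler that both reads the Cookie header with load and sets the cookie when it is missing. The names parse_cookie_header, CookieHandler and make_server are assumptions made for this example.

```python
from http.cookies import SimpleCookie
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_cookie_header(header_value):
    """Turn a Cookie header value into a plain {name: value} dict."""
    cookie = SimpleCookie()
    cookie.load(header_value or "")
    return {name: morsel.value for name, morsel in cookie.items()}


class CookieHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        cookies = parse_cookie_header(self.headers.get("Cookie"))
        text = "Path is: %s, id cookie: %s" % (self.path, cookies.get("id"))
        body = text.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        if "id" not in cookies:
            # First visit: hand the client a cookie for next time.
            self.send_header("Set-Cookie", "id=some_value_42")
        self.end_headers()
        self.wfile.write(body)


def make_server(port=59900):
    """Build the server without starting it; call serve_forever() to run."""
    return HTTPServer(("", port), CookieHandler)
```

Running make_server().serve_forever() and requesting any path twice from the same browser should show the id cookie being echoed back on the second request.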

Setting and reading cookies with Django

Django makes setting and reading cookies almost trivial. Here’s a simple view that checks whether the client sent the id cookie in its request, and if it didn’t, sends the cookie to the client (so that the client will have it for the next request):

from django.http import HttpResponse

def test_cookie(request):
    if 'id' in request.COOKIES:
        # The client sent the cookie back: read its value
        cookie_id = request.COOKIES['id']
        return HttpResponse('Got cookie with id=%s' % cookie_id)
    else:
        # No cookie yet: attach one to the response
        resp = HttpResponse('No id cookie! Sending cookie to client')
        resp.set_cookie('id', 'some_value_99')
        return resp

As you can see, cookies are taken from the COOKIES dict-like attribute of Django’s HttpRequest, and set by calling the set_cookie method of HttpResponse. Couldn’t be any simpler. What we’re really here for is to understand how these things work under the hood of Django, so let’s dive in.

How cookies are implemented in Django

The recommended way to deploy Django applications is with WSGI, so I’ll focus on the WSGI backend implemented in Django. This is a good place to mention that at the time of this writing, I’m looking into the source code of Django 1.3, which is installed in site-packages/django in the usual installation structure of Python.

Looking at Django’s WSGIRequest class (which inherits from http.HttpRequest) we can see that COOKIES is a property that hides a dict attribute named self._cookies behind a getter/setter pair. The dict is initialized in _get_cookies:

def _get_cookies(self):
    if not hasattr(self, '_cookies'):
        self._cookies = http.parse_cookie(self.environ.get('HTTP_COOKIE', ''))
    return self._cookies

This appears to be a lazy initialization that should aid performance – if the view doesn’t want to look into the cookies of a request, there’s no need to parse them. Cookies are taken from the HTTP_COOKIE entry of the request’s environment object, per the WSGI specification. What about http.parse_cookie? This is a utility method in Django’s HTTP module:
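The lazy-initialization pattern can be shown in isolation. The class below is a hypothetical stand-in, not Django's code; the parse_calls counter exists only to make the laziness observable.

```python
from http.cookies import SimpleCookie


class LazyCookieRequest:
    """Hypothetical request wrapper; not Django's actual class."""

    def __init__(self, environ):
        self.environ = environ
        self.parse_calls = 0  # only here to make the laziness observable

    @property
    def COOKIES(self):
        # Parse the header on first access only, then reuse the result.
        if not hasattr(self, "_cookies"):
            self.parse_calls += 1
            parsed = SimpleCookie()
            parsed.load(self.environ.get("HTTP_COOKIE", ""))
            self._cookies = {k: m.value for k, m in parsed.items()}
        return self._cookies
```

A view that never touches COOKIES pays nothing for parsing, and one that reads it several times pays only once.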

def parse_cookie(cookie):
    if cookie == '':
        return {}
    if not isinstance(cookie, Cookie.BaseCookie):
        try:
            c = SimpleCookie()
            c.load(cookie, ignore_parse_errors=True)
        except Cookie.CookieError:
            # Invalid cookie
            return {}
    else:
        c = cookie
    cookiedict = {}
    for key in c.keys():
        cookiedict[key] = c.get(key).value
    return cookiedict

As you can see, it uses the Cookie module from the standard library to parse the cookie with the load method, similarly to what I mentioned above for the non-Django code.

Setting cookies on a response is done with the set_cookie method of HttpResponse. This method simply writes down the new cookie in its self.cookies attribute. WSGIHandler then adds the cookies to its response headers when sending the response.
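The two steps (record the cookie on the response, then turn the recorded cookies into headers at send time) might look roughly like this. TinyResponse and response_headers are hypothetical names for this sketch and do not reproduce Django's implementation.

```python
from http.cookies import SimpleCookie


class TinyResponse:
    """Hypothetical stand-in for a response object; not Django's class."""

    def __init__(self, body=""):
        self.body = body
        self.cookies = SimpleCookie()  # cookies waiting to be sent

    def set_cookie(self, key, value="", max_age=None, path="/"):
        # Just record the cookie; nothing is sent yet.
        self.cookies[key] = value
        self.cookies[key]["path"] = path
        if max_age is not None:
            self.cookies[key]["max-age"] = max_age


def response_headers(response):
    """What a handler might do at send time: one header per cookie."""
    headers = [("Content-Type", "text/html")]
    for morsel in response.cookies.values():
        headers.append(("Set-Cookie", morsel.OutputString()))
    return headers
```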

Wrapping up

As you can see, cookies are relatively easy to handle in Python, and in particular with Django. That said, when writing a Django application it’s rare to need cookies directly, because cookies are a fairly low-level building block. Django’s higher-level session framework is much easier to use and is the recommended way to implement persistent state in applications. The next part of the article will examine how to use Django sessions and how they work under the hood.

This is the second in a series of three articles.

Sessions are Django’s high-level tool for keeping persistent state for users on the server. Sessions let you store arbitrary data per visitor, and have this data available the next time the visitor visits the site. As we’ll learn in this article, sessions are still based on cookies, but cookie management is abstracted away, which takes care of a lot of issues along the way and makes sessions a more convenient, robust and safe way to store the data.

Example – using sessions

First, it’s useful to see an easy example of using Django sessions. Here’s a simple view that uses sessions to count the number of times a user has triggered it [1].

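from django.http import HttpResponse

def test_count_session(request):
    if 'count' in request.session:
        # Returning visitor: bump the counter stored in the session
        request.session['count'] += 1
        return HttpResponse('new count=%s' % request.session['count'])
    else:
        # First visit: start counting
        request.session['count'] = 1
        return HttpResponse('No count in session. Setting to 1')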

If we compare this to the cookies usage example from part I, a couple of differences are apparent:

  • Sessions are more uniform – a single session attribute of the request is used for both querying and modifying the session.
  • While in this example we’re only using an integer as the value, the session attribute acts as a dictionary, allowing string keys and almost arbitrary Python objects as values [2].

However, looking at the actual HTTP traffic for this view, we notice yet another, very important difference. Here’s the cookie the view returns to the user for this particular instance of the application on my machine:

Set-Cookie: sessionid=a92d67e44a9b92d7dafca67e507985c0;
           expires=Thu, 07-Jul-2011 04:16:28 GMT;
           Max-Age=1209600;
           Path=/

There’s no count=1 (or other numeric value) here – the cookie just sets some unique sessionid. We’ll see what this means shortly, but I’ll just note that this is a very important feature of session management. Think about the security implications, for instance. Suppose the user gets a prize for the 10th time he triggers the view. With a simple cookie passing the count into the user’s browser this would be something very easy to spoof. With a session, however, the user has no idea what the correct sessionid would be – in fact, no such sessionid exists yet, so the user has no real way of spoofing his 10th visit [3].

Deciphering the session ID

By default, Django’s session module stores sessions in the app’s main DB, in table django_session with this schema:

CREATE TABLE "django_session" (
    "session_key" varchar(40) NOT NULL PRIMARY KEY,
    "session_data" text NOT NULL,
    "expire_date" datetime NOT NULL
);

session_key is the ID placed in the cookie, and session_data contains the actual session data in encoded format. Here’s how to decipher the session ID we’ve seen above:

from django.contrib.sessions.models import Session
#...
sess = Session.objects.get(pk='a92d67e44a9b92d7dafca67e507985c0')
print(sess.session_data)
print(sess.get_decoded())

This prints:

ZmEyNDVhNTBhMTk2ZmRjNzVlYzQ4NTFjZDk2Y2UwODc3YmVjNWVjZjqAAn1xAVUFY291bnRxAksG
cy4=

{'count': 6}

As you can see, Django stores the request.session dictionary in the DB, in an encoded manner. Django can recover it from the DB by using the session ID the user’s browser returns in a cookie. All of this is done transparently by Django’s session module – the application’s view just has a simple access to the request.session dictionary. Let’s dive into the guts of Django to understand how it manages to make this work.
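The encoded value shown above can also be taken apart by hand with nothing but the standard library, which hints at the format: a hash, a colon, then the pickled dictionary, all base64-encoded. This is a sketch for inspection only; it does not verify the hash.

```python
import base64
import pickle

# The session_data value printed above, with the line break removed.
session_data = (
    "ZmEyNDVhNTBhMTk2ZmRjNzVlYzQ4NTFjZDk2Y2UwODc3YmVjNWVjZjqAAn1xAVUFY291bnRxAksG"
    "cy4="
)

raw = base64.b64decode(session_data)
digest, _, pickled = raw.partition(b":")  # "<hash>:<pickled dict>"

# pickle.loads can run arbitrary code, so only ever unpickle trusted data.
decoded = pickle.loads(pickled)

print(digest.decode("ascii"))
print(decoded)
```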

Session middleware

The first layer of magic I’d like to unwrap has to do with the session attribute of django.http.HttpRequest. How does the session information even get there, and how can the view change the session by simply modifying the attribute?

The answer is Django’s middleware. To borrow a quote from the Django Book:

[…] Django’s middleware framework, which is a set of hooks into Django’s request/response processing. It’s a light, low-level “plug-in” system capable of globally altering both Django’s input and output.

We can think of middleware in the following way. The normal flow of data around the view we’re coding in Django is simple: the view accepts an HTTP request object, does some application-specific work based on its contents and eventually returns an HTTP response object.

Middleware makes this process a bit more complicated: each request passes through the registered middleware before it reaches the view, and each response passes through the middleware again on its way back to the client.

This is done by allowing the programmer to write “hook classes” with special methods that the middleware framework knows about [4]. These hooks can be registered in the MIDDLEWARE_CLASSES setting in settings.py. Note that the django.contrib.sessions.middleware.SessionMiddleware class is there by default. Looking at its source code, it has two middleware hooks – process_request and process_response [5].

process_request pulls the session key (ID) from a cookie. We can see that sessionid is actually a configurable name – SESSION_COOKIE_NAME, set by default in Django’s global settings to sessionid. The request.session attribute is then populated to contain a “session store” object. More on this object a bit later.

process_response saves the session store object (thus making the changes the view did persistent) and attaches a cookie to the response sent to the client. To save on traffic, it does that only if the view actually modified the session, or if the SESSION_SAVE_EVERY_REQUEST setting is set.
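The shape of these two hooks can be sketched without Django. Everything here is hypothetical (TinySession, TinySessionMiddleware, the STORE dict standing in for the database), and the request and response are simple stand-in objects. The real middleware also handles expiry, cookie attributes and the SESSION_SAVE_EVERY_REQUEST setting.

```python
import secrets
from types import SimpleNamespace

SESSION_COOKIE_NAME = "sessionid"
STORE = {}  # session key -> saved dict; stands in for the database


class TinySession(dict):
    """Dict that remembers whether it was changed (hypothetical)."""

    def __init__(self, key, data):
        super().__init__(data)
        self.key = key
        self.modified = False

    def __setitem__(self, name, value):
        super().__setitem__(name, value)
        self.modified = True


class TinySessionMiddleware:
    def process_request(self, request):
        # Pull the session key from the cookie and attach a session object.
        key = request.COOKIES.get(SESSION_COOKIE_NAME) or secrets.token_hex(16)
        request.session = TinySession(key, STORE.get(key, {}))

    def process_response(self, request, response):
        # Save and re-send the cookie only if the view changed the session.
        if request.session.modified:
            STORE[request.session.key] = dict(request.session)
            response.cookies[SESSION_COOKIE_NAME] = request.session.key
        return response


# Usage with stand-in request/response objects:
middleware = TinySessionMiddleware()
request = SimpleNamespace(COOKIES={})
response = SimpleNamespace(cookies={})
middleware.process_request(request)
request.session["count"] = 1  # what a view would do
middleware.process_response(request, response)
```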

This explains how the sessions are translated to cookies. But clearly, a lot of the logic is still hidden, implemented in the store object of sessions. Let’s see how that works.

Session store

Sessions can use one of several storage “engines” (backends). This is configurable via the SESSION_ENGINE setting, which points to django.contrib.sessions.backends.db by default – the application’s main database (as mentioned above in “Deciphering the session ID”). If you look at the sessions/backends directory in Django’s source you’ll see other available engines, but unless your needs are very special, you’re probably OK with the default one.

Each storage engine exports a SessionStore class which derives from SessionBase. This common base implements most of the functionality of session stores, relying on methods from its specializations to abstract away the actual method of storing the data – whether in DB, file, in-memory cache or some other way. The DB-backed store uses Django’s standard ORM, with the model defined in module sessions.models.

To understand how all these classes play together, let’s follow through what happens when the user tries to access request.session in a view, assuming the default DB store:

  • Session middleware’s process_request sets request.session to be an instance of db.SessionStore with session_key passed into the constructor.
  • The constructor of SessionStore defers to the constructor of SessionBase, which stores the session key for later use.
    • Note that the session isn’t loaded right away from the DB. This is lazy loading – the actual data is loaded when it’s actually being accessed.
  • process_request is done at this point, so the HTTP request is passed into the view. Suppose we now read its count key, as in the example above.
  • SessionBase implements a dict-like interface [6], and in particular __getitem__, which takes the key from a _session attribute, which in itself is a property, deferring reads to the _get_session method.
  • _get_session does the actual lazy loading, using the load method.
  • load is one of the methods related to the actual storage, so SessionBase doesn’t implement it. Instead db.SessionStore implements it and uses the session DB model to load the value from the DB based on the key, decoding it first.
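The chain described in the steps above (dict-like access, a property, a lazy load, a storage-specific load method) can be mimicked in a few lines. This is a simplified sketch with hypothetical names, not Django's source; FAKE_DB stands in for the django_session table.

```python
class TinySessionBase:
    """Dict-like front end; subclasses supply the actual storage."""

    def __init__(self, session_key=None):
        self.session_key = session_key  # stored now, used on first access
        self.modified = False

    def load(self):
        raise NotImplementedError  # storage-specific

    def _get_session(self):
        if not hasattr(self, "_session_cache"):
            self._session_cache = self.load()  # the lazy load
        return self._session_cache

    _session = property(_get_session)

    def __getitem__(self, key):
        return self._session[key]

    def __setitem__(self, key, value):
        self._session[key] = value
        self.modified = True

    def __contains__(self, key):
        return key in self._session


FAKE_DB = {"a92d": {"count": 6}}  # stands in for the django_session table


class TinyDbStore(TinySessionBase):
    loads = 0  # counts trips to the "database"

    def load(self):
        TinyDbStore.loads += 1
        return dict(FAKE_DB.get(self.session_key, {}))
```

Constructing TinyDbStore("a92d") touches nothing; the first read of store["count"] triggers the single load, and later reads and writes reuse the cached dict.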

This is about it, except for one small detail. How do encoding and decoding work? Let’s look at encode:

def encode(self, session_dict):
    "Returns the given session dictionary pickled and encoded as a string."
    pickled = pickle.dumps(session_dict, pickle.HIGHEST_PROTOCOL)
    hash = self._hash(pickled)
    return base64.encodestring(hash + ":" + pickled)

The session dictionary is pickled. Then, a hash is computed and prepended to the pickle string. Finally the whole string is encoded in base 64, which is stored in the session_data field of the DB table.
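For readers on Python 3, where base64.encodestring no longer exists, here is a hedged sketch of the same encode step together with a matching decode. The SECRET value and the HMAC-SHA1 construction in _hash are assumptions made for the example (the article does not show how _hash is computed), and the decode function is an illustration of how the prepended hash can be checked before unpickling.

```python
import base64
import hashlib
import hmac
import pickle

SECRET = b"hypothetical-secret-key"  # assumption: stands in for a real secret


def _hash(pickled):
    # Assumed construction: keyed hash of the pickled bytes, as hex text.
    return hmac.new(SECRET, pickled, hashlib.sha1).hexdigest().encode("ascii")


def encode(session_dict):
    """Pickle the dict, prepend its hash, and base64-encode the result."""
    pickled = pickle.dumps(session_dict, pickle.HIGHEST_PROTOCOL)
    return base64.b64encode(_hash(pickled) + b":" + pickled).decode("ascii")


def decode(session_data):
    """Reverse encode(); return {} if the hash does not match."""
    try:
        raw = base64.b64decode(session_data)
    except ValueError:
        return {}
    digest, _, pickled = raw.partition(b":")
    if not hmac.compare_digest(digest, _hash(pickled)):
        return {}  # tampered or corrupted: never unpickle it
    return pickle.loads(pickled)
```

Checking the hash first matters because unpickling attacker-controlled bytes can execute arbitrary code.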

Conclusion

In this article, we’ve seen how to use Django sessions, what happens at the low level of HTTP requests and responses when sessions are being used, and how sessions are actually implemented by Django. While I didn’t cover every little detail, I hope there’s enough information to understand the big picture. If there’s any important information you think I may have missed, please let me know.

[1] For the sake of this article, I’m ignoring cookie/session expiration issues. Assume they never expire.
[2] By “almost arbitrary” I refer to Python objects that are pickle-able.
[3] I say no “real” way because this scheme is, of course, not entirely secure – so don’t bet real money on it. Depending on the exact configuration and usage of sessions by the application, by having access to the traffic from other users, the attacker can possibly spoof a session ID.
[4] These middleware classes are a prime example of Python’s duck typing. No need to adhere to any specified interface; no need to derive from some common base or explicitly set the hook methods. Just implement the methods you need in a class, and register that class with the framework. Python’s reflection and duck typing capabilities are then used to automatically discover and use these hooks.
[5] Take a moment to review the Django middleware docs to understand how to use these hooks.
[6] More formally, a Python mapping type.

 

https://stackoverflow.com/questions/32563236/relation-between-sessions-and-cookies

Amir Masoud Sefidian
Data Scientist, Machine Learning Engineer, Researcher, Software Developer
