First, to be honest, this post is the result of personal frustration. I learned Python writing gaeutilities, and it’s still my language of choice, even if App Engine isn’t my choice for application hosting. So I often search for gaeutilities, find people asking for support on different forums, and answer them. It seems I’m not the only one who performs this search, though. Most of the time when I get to a question about it, the author of gae-sessions has already posted a reply saying the person should use his library instead. In most cases, the issue the person had would be exactly the same with gae-sessions. Basically, the poor developer looking to solve a problem is having their time wasted. Here are some examples:
http://groups.google.com/group/google-appengine-python/browse_thread/thread/bd890af0db6549c1
http://stackoverflow.com/questions/3288565/gaeutilities-session-problem/4368336#4368336
So, here are the benchmarks:
https://github.com/dound/gae-sessions/wiki/comparison-with-alternative-libraries
And here’s the script:
https://gist.github.com/401981
Now, to touch on a few items.
Complexity
Yes, there’s almost five times as much code for the gaeutilities session as there is for gae-sessions. But wait, you haven’t gotten to the real complexity yet. Gaeutilities ships with an entire settings file, https://github.com/joerussbowman/gaeutilities/blob/master/appengine_utilities/settings_default.py, which has 12 parameters for session alone. That’s because the gaeutilities session is much more configurable and secure than its competitors. More on why below.
Security
The other session libraries are not as secure as gaeutilities for HTTP requests. I’ll say it again: the other session libraries are not as secure as gaeutilities for HTTP requests. Long before Firesheep came out, I realized that session hijacking is amazingly simple against most session libraries. Since HTTPS for third-party domains isn’t an option, I wanted to write something that would at least be more secure, though I’ll admit it still lacks the security offered by an encrypted connection.
Gaeutilities rotates the session token every x seconds (5 by default) and keeps the 3 most recent tokens valid. OK, that’s a mouthful. Go to http://gaeutilities.appspot.com/session and scroll down a little until you see the really long, ugly string of characters above the results table. That’s the session ID stored as a cookie in your browser. Now start hitting refresh, and you’ll see that it changes fairly often. Most libraries never change this token, and that is how Firesheep attacks them: a sniffed cookie stays usable for as long as the session lives. Gaeutilities also includes User-Agent and IP validation by default.
There is a performance cost for this. Every time the token changes, it has to be written to the datastore to keep it up to date. This is the main reason most people give for not using gaeutilities: they don’t feel they need the security and want more performance.
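The rotation scheme can be sketched as follows: issue a new token once the newest one is older than a TTL, accept only the last few tokens, and tie them to the client’s User-Agent and IP. This is a hypothetical Python 3 illustration, not gaeutilities code. `RotatingToken` and its parameters are invented names. Keeping several recent tokens valid is presumably what lets requests that are already in flight with a slightly older cookie still succeed.

```python
import secrets
import time


class RotatingToken:
    """Hypothetical sketch of time-based session token rotation.

    A new token is issued once the newest one is older than `ttl` seconds,
    and only the `keep` most recent tokens are accepted.
    """

    def __init__(self, ttl=5, keep=3, user_agent=None, ip=None, clock=time.time):
        self.ttl = ttl
        self.keep = keep
        self.user_agent = user_agent
        self.ip = ip
        self.clock = clock  # injectable so rotation can be tested
        self.tokens = [secrets.token_hex(32)]  # newest token is last
        self.issued_at = clock()

    def current(self):
        """Return the token to send back in the cookie, rotating if stale."""
        now = self.clock()
        if now - self.issued_at >= self.ttl:
            self.tokens.append(secrets.token_hex(32))
            self.tokens = self.tokens[-self.keep:]  # forget the oldest tokens
            self.issued_at = now
            # A real session library would persist the new token here,
            # which is the extra datastore write discussed above.
        return self.tokens[-1]

    def validate(self, token, user_agent=None, ip=None):
        """Accept only a recent token presented from the same UA and IP."""
        if self.user_agent is not None and user_agent != self.user_agent:
            return False
        if self.ip is not None and ip != self.ip:
            return False
        return token in self.tokens
```

With `ttl=5` and `keep=3`, a token captured from one response stops being accepted after three more rotations. A very large `ttl` effectively disables rotation and the writes that go with it.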
Actually, this is where the point about gaeutilities being configurable comes in. In the settings file (which you need to copy from settings_default.py to settings.py in order to use it), change “SESSION_TOKEN_TTL” to something like 9999999999, and the token won’t rotate anymore. A future idea might be to make 0 or None an option for that setting; I’ll keep that in mind.
Performance
OK, gaeutilities uses an entity for each piece of session data. So yeah, if you run an integer-writing test with code like this:
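class WriteInts(req_handler_cls):
    def get(self, n):
        # req_handler_cls, get_session, start_session and save_session
        # are provided by the benchmark script linked above.
        session = get_session(self)
        start_session(session)
        # one session key (one gaeutilities entity) per int
        for i in xrange(int(n)):
            session['i%d' % i] = i
        save_session(session)
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('this page wrote %s ints to the session' % n)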
Well, with one entity per int, that’s going to create a lot of entities for something like WriteInts(req, 10000).
The libraries that use a single entity to store all data (and are therefore subject to its 1MB limit) are, of course, going to have a much easier time with this, because they’re only working with one entity. However, I did say that gaeutilities uses one entity for each session data item. So, what if the gaeutilities test were written like this?
class WriteInts(req_handler_cls):
    def get(self, n):
        session = get_session(self)
        start_session(session)
        # keep all ints under a single session key
        session["ints"] = {}
        for i in xrange(int(n)):
            session["ints"]['i%d' % i] = i
        save_session(session)
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('this page wrote %s ints to the session' % n)
I’d imagine the test would be a lot quicker. However, truth be told, other libraries may still be faster at this point. Gaeutilities will write every time you add an int, which hurts performance: it would write on each pass through the loop. So maybe something like this would be better:
class WriteInts(req_handler_cls):
    def get(self, n):
        session = get_session(self)
        start_session(session)
        # build the data in a plain dict first...
        ints = {}
        for i in xrange(int(n)):
            ints['i%d' % i] = i
        # ...then assign it to the session once
        session["ints"] = ints
        save_session(session)
        self.response.headers['Content-Type'] = 'text/plain'
        self.response.out.write('this page wrote %s ints to the session' % n)
This optimization could go further, dropping the dict in favor of a list since the keys are based on integers anyway, but I think the point has been made.
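To make the write counts concrete, here is a toy Python 3 model. It is not gaeutilities code. It assumes, as described above, that every top-level session key assignment costs one datastore write. `PerItemSession` and the three helper functions are hypothetical names. Nothing is actually persisted.

```python
class PerItemSession(dict):
    """Hypothetical stand-in for a session that stores one entity per key.

    Every top-level assignment is counted as one datastore write.
    """

    def __init__(self):
        super().__init__()
        self.writes = 0

    def __setitem__(self, key, value):
        self.writes += 1  # one put() per top-level key assignment
        super().__setitem__(key, value)


def one_key_per_int(n):
    """Like the first WriteInts: every int gets its own session key."""
    session = PerItemSession()
    for i in range(n):
        session["i%d" % i] = i
    return session.writes


def build_then_assign_dict(n):
    """Like the last WriteInts: build a plain dict, assign it once."""
    session = PerItemSession()
    ints = {}
    for i in range(n):
        ints["i%d" % i] = i  # plain dict, no session writes yet
    session["ints"] = ints  # a single write
    return session.writes


def build_then_assign_list(n):
    """The list variant: the keys were just integers anyway."""
    session = PerItemSession()
    session["ints"] = list(range(n))
    return session.writes
```

Here `one_key_per_int(10000)` returns 10000, while both build-then-assign variants return 1. If each write is a datastore round trip, as with one entity per item, that gap is what the benchmark is mostly measuring.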
Lazy Loading
I don’t think whoever proclaimed that gaeutilities doesn’t do lazy loading had read its source code. The code speaks for itself:
def __getitem__(self, keyname):
    """
    Get item from session data.
    keyname: The keyname of the mapping.
    """
    # flash messages don't go in the datastore
    if self.integrate_flash and (keyname == u"flash"):
        return self.flash.msg
    if keyname in self.cache:
        return self.cache[keyname]
    if keyname in self.cookie_vals:
        return self.cookie_vals[keyname]
    if hasattr(self, u"session"):
        data = self._get(keyname)
        if data:
            try:
                if data.model != None:
                    self.cache[keyname] = data.model
                    return self.cache[keyname]
                else:
                    self.cache[keyname] = pickle.loads(data.content)
                    return self.cache[keyname]
            except:
                self.delete_item(keyname)
        else:
            raise KeyError(unicode(keyname))
    raise KeyError(unicode(keyname))
No session item is retrieved until you try to access it. Once you’ve accessed it, it is cached for the rest of the request, so it doesn’t need to be retrieved from memcache or the datastore again. I’m not sure how that isn’t lazy loading.
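The same pattern can be reduced to a minimal Python 3 sketch. The cache is checked first, storage is touched only on a miss, and the unpickled value is kept for the rest of the request. `LazySession` is a hypothetical class, and `backend` is a plain dict of pickled values standing in for memcache and the datastore. A `fetches` counter makes the laziness visible.

```python
import pickle


class LazySession:
    """Hypothetical sketch of lazy loading with a per-request cache."""

    def __init__(self, backend):
        self.backend = backend  # key -> pickled value (stand-in for storage)
        self.cache = {}         # values already loaded during this request
        self.fetches = 0        # how many times storage was consulted

    def __getitem__(self, key):
        if key in self.cache:
            return self.cache[key]  # no storage access on repeat reads
        self.fetches += 1
        if key not in self.backend:
            raise KeyError(key)
        value = pickle.loads(self.backend[key])
        self.cache[key] = value
        return value
```

Creating the session fetches nothing. Reading `session["user"]` twice costs one fetch, and keys that are never read are never loaded.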
Scalability
Well, it supports more than 1MB of data, and it handles storing model entities by automatically creating references to them. Sure, it’s up to the developer to decide how best to manage their data, but I seriously doubt people are writing 10,000 ints to a session object over and over. Gaeutilities doesn’t limit developers by locking them into a single entity for storage; gaeutilities could store a whole lot of gae-sessions sessions.
Not to mention the durability gaeutilities provides against failed datastore writes. Face it, they happen, a lot more often than any of us want to see. I got frustrated with them early on and wrote session to handle this by also storing writes in memcache and keeping track of the two to make sure they stay in sync.
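One way to picture that idea is the Python 3 sketch below, which is not the gaeutilities implementation. Every value goes to the cache. The datastore write is attempted too, and any key whose datastore write failed is remembered so it can be retried later. `FlakyStore`, `DurableWriter` and `resync` are hypothetical names, and plain dicts stand in for memcache and the datastore.

```python
class FlakyStore(dict):
    """Hypothetical datastore stand-in whose put() can be told to fail."""

    def __init__(self):
        super().__init__()
        self.fail_next = False

    def put(self, key, value):
        if self.fail_next:
            self.fail_next = False
            raise IOError("simulated datastore write failure")
        self[key] = value


class DurableWriter:
    """Write to a cache and a datastore, tracking keys that are out of sync."""

    def __init__(self, cache, store):
        self.cache = cache
        self.store = store
        self.unsynced = set()  # keys whose newest value is only in the cache

    def write(self, key, value):
        self.cache[key] = value  # the cache always holds the newest value
        try:
            self.store.put(key, value)
            self.unsynced.discard(key)
        except IOError:
            self.unsynced.add(key)  # datastore is now behind the cache

    def read(self, key):
        # Prefer the cache; fall back to the datastore on a miss.
        if key in self.cache:
            return self.cache[key]
        return self.store[key]

    def resync(self):
        """Retry datastore writes for keys that only made it to the cache."""
        for key in list(self.unsynced):
            if key in self.cache:  # a real cache may have evicted it
                self.write(key, self.cache[key])
```

If a datastore write fails, reads still return the new value from the cache, and a later `resync()` pushes it to the datastore. The sketch cannot recover a value that the cache evicts before the retry.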
The future
Now, work is starting on sessions2 for gaeutilities. It’s going to be an improvement on the current session library, not a rewrite. The primary change is that session will move to a decorator pattern, which will help with the following:
- No more accidentally instantiating Session() twice in a request and deleting the session object by accident.
- Deferred writes to speed up performance. I’m currently evaluating two methods for this. The first is to cache all writes until the end of the request and then do a single db.put() for all new or changed session values. The second is to implement asynchronous datastore writes using RPC calls that will be validated after the output is sent.
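Both list items can be illustrated with a rough Python 3 sketch of the first approach. A decorator creates exactly one session per request and reuses it if a nested handler is called. It records which keys changed and flushes them in one batch after the handler returns. `with_session`, `DeferredSession`, `FakeStore` and `put_multi` are hypothetical names, not the sessions2 API. The single `put_multi` call per request plays the role of the one batched db.put().

```python
import functools


class FakeStore(dict):
    """Dict stand-in for the datastore that counts batched writes."""

    def __init__(self):
        super().__init__()
        self.batches = 0

    def put_multi(self, values):
        self.batches += 1  # one round trip for the whole batch
        self.update(values)


class DeferredSession(dict):
    """Session that records changed keys instead of writing them."""

    def __init__(self, initial):
        super().__init__(initial)
        self.dirty = set()

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.dirty.add(key)


def with_session(store):
    """Decorator: one session per request, one batched write at the end."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(request, *args, **kwargs):
            if hasattr(request, "session"):
                # Reuse the existing session instead of creating a second one.
                return handler(request, *args, **kwargs)
            request.session = DeferredSession(store)
            try:
                result = handler(request, *args, **kwargs)
                changed = {k: request.session[k] for k in request.session.dirty}
                if changed:
                    store.put_multi(changed)  # deferred, batched write
                return result
            finally:
                del request.session  # a failed handler writes nothing
        return wrapper
    return decorator
```

A handler that sets 1000 keys results in a single `put_multi` call, and a decorated handler that calls another decorated handler shares one session rather than instantiating a second one.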