FEB 8 12

Three projects to boost your Python performance

As mentioned before, we at ESN use Python for most things we do, and it is the backbone language of our real-time web development framework, Planet.

Let’s face it, performance matters. If you can do more with less hardware, your website will cost less to operate, and if profit is what you are after, lower operating costs mean higher margins.

Our hunt for increased performance sent us down the path of developing a couple of Python extensions, which we’ve made available on GitHub under the permissive BSD license. We prefix the project names with “Ultra” to mark them as a family of projects built for performance (or for the ultimate, if you will).

Though they are compatible with any CPython flavour, we especially enjoy the increased performance when using them with the well-engineered gevent project and its socket monkey patching.

So what have we done?

UltraJSON:

A fully JSON-standard-compliant encoder and decoder with unsurpassed performance, written entirely in C. Visit GitHub or PyPI.

UltraMySQL:

A MySQL client written entirely in C/C++. Visit GitHub or PyPI.

UltraMemcache:

A Memcache client written entirely in C/C++. Visit GitHub or PyPI.

What kind of performance boost can I expect?

Short answer: it depends. We experienced a 15-35% increase in web requests per second when phasing out simplejson, gevent-mysql and gevent-memcache. It is worth mentioning that all of the modules we phased out already contain bits and pieces of natively compiled code. You may get more, you may get less, but considering that compiled native code is usually 2-5 times faster than Python code, you can expect your effective CPU usage to go down.

by Jonas Tärnström

MAY 31 11

A faster MySQL driver for Python and gevent

During the later development and optimization phases of our real-time web framework Planet, we at ESN began looking more closely at targeted C/C++ optimizations of our Python code. Not being the typical Python hacker at ESN, I was asked to put together a proof-of-concept MySQL driver for Python written in pure C/C++.

Luckily for me, I didn’t have to start from scratch. The gevent port of the Concurrence project had already done most of the protocol research, and its source code served as the perfect blueprint. My work became mostly a matter of porting Python and Pyrex to C/C++ while adding my own touch of low-level optimizations.

Integration code and the base driver

The base driver and the language-specific integration code exchange a set of function pointers, which allows the base driver to call into the integration code. The integration code is responsible for creating sockets, handling blocking I/O, result sets, character set conversions and type conversions. The base driver deals with the MySQL protocol and, may I say, in a very performant manner :)
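The split described above can be sketched in Python, with callables standing in for the C function pointers. This is only an illustration of the pattern, not the driver's actual code: the names IOCallbacks, BaseDriver and in_memory_callbacks are hypothetical. The one protocol detail used is the standard MySQL packet header (a 3-byte little-endian payload length followed by a 1-byte sequence id).

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class IOCallbacks:
    """Hypothetical callback table supplied by the integration code."""
    recv: Callable[[int], bytes]   # return up to n bytes, blocking as needed
    send: Callable[[bytes], None]  # write all the given bytes


class BaseDriver:
    """Knows the wire protocol only; all I/O goes through the callbacks."""

    def __init__(self, io: IOCallbacks) -> None:
        self.io = io

    def _recv_exact(self, n: int) -> bytes:
        # Keep calling recv until exactly n bytes have arrived.
        buf = b""
        while len(buf) < n:
            chunk = self.io.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("connection closed mid-packet")
            buf += chunk
        return buf

    def read_packet(self) -> Tuple[int, bytes]:
        # MySQL packet header: 3-byte little-endian length + 1-byte sequence id.
        header = self._recv_exact(4)
        length = int.from_bytes(header[:3], "little")
        sequence_id = header[3]
        return sequence_id, self._recv_exact(length)


def in_memory_callbacks(data: bytes) -> IOCallbacks:
    """Stand-in integration code that serves bytes from memory, two at a time."""
    stream = bytearray(data)
    sent = []

    def recv(n: int) -> bytes:
        chunk = bytes(stream[: min(n, 2)])
        del stream[: len(chunk)]
        return chunk

    return IOCallbacks(recv=recv, send=sent.append)


# One packet with sequence id 0 and the 5-byte payload b"hello".
packet = (5).to_bytes(3, "little") + bytes([0]) + b"hello"
driver = BaseDriver(in_memory_callbacks(packet))
sequence_id, payload = driver.read_packet()
print(sequence_id, payload)
```

Swapping in_memory_callbacks for callbacks backed by a blocking socket or a gevent socket would leave BaseDriver untouched, which is the point of the design.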

Not only for gevent and Python!

While the emphasis has been on creating a Python driver for gevent, I’m also shipping integration code for “normal” blocking Python, called simply CPython. Beyond that, nothing stops anyone from developing bindings for other languages, Ruby for instance. I’m very much looking forward to seeing that happen!

Fibers, threads or greenlets

Depending on which approach the I/O-specific integration code takes, it’s possible to adapt the base driver to any type of fiber, thread or greenlet environment, provided that it either switches back to the operating system when blocking for I/O or yields to another user-space thread (as fibers and greenlets do) and preserves the stack while doing so.

Performance benchmarks

I’ll publish more detailed statistics later on, but my initial benchmarks showed a 3-8 times performance gain over the gevent-mysql driver. Until I find the time to put up better benchmarks, you’re just gonna have to take my word for it, or feel free to contribute your own results!
Check out the project on GitHub here!

by Jonas Tärnström

MAR 2 11

Ultra fast JSON encoder and decoder for Python

We do a lot of JSON encoding and decoding here at ESN. Python 2.6 ships with an accurate but rather slow implementation, which we’ve replaced with simplejson. There’s a lot of stuff going on with JavaScript and JSON today, and I thought maybe this was a place where my good old C optimization skills could be of good use. To be honest, I also wanted to prove that I still had the skills.

UltraJSON

Not being able to stay out of this mess, I spent a weekend researching the quickest and (perhaps) also the dirtiest way of encoding and decoding JSON. I call the result UltraJSON, and by my preliminary and perhaps somewhat limited benchmarks it’s the fastest JSON encoder and decoder I’ve found so far (and if it’s not, I’m gonna make it faster!).

Python bindings

Neither the decoder nor the encoder part of UltraJSON is specific to any language. It can be integrated with almost anything, and since I wanted my colleagues to use it, I implemented Python bindings for it as the module ‘ujson’.

UPDATE: UltraJSON is now available on PyPI as the package ujson. Installation should be simple through easy_install or pip!
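Once installed, basic usage looks much like the standard json module. The snippet below is a minimal sketch that assumes a release of ujson exposing dumps and loads; optional keyword arguments differ between the two libraries and are left out here. The fallback import is only there so the sketch still runs on a machine without ujson.

```python
try:
    import ujson as json_impl   # installed with: pip install ujson
except ImportError:
    import json as json_impl    # fallback so the sketch runs without ujson

# Made-up sample data for illustration.
message = {"user": "jonas", "ids": [1, 2, 3], "online": True}

encoded = json_impl.dumps(message)   # Python object -> JSON text
decoded = json_impl.loads(encoded)   # JSON text -> Python object

print(encoded)
print(decoded == message)
```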

Current benchmarks:

64-bit benchmarks Linux
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
OS Version: Ubuntu 10.10
System Type: x64-based PC
Processor: Intel(R) Core(TM) i5-2300 CPU @ 2.80GHz
Total Physical Memory: 4096 MB

Array with 256 utf-8 strings:
ujson encode      : 2874.54652 calls/sec
simplejson encode : 1539.47999 calls/sec
cjson encode      : 132.33571 calls/sec

ujson decode      : 2072.09417 calls/sec
cjson decode      : 991.20903 calls/sec
simplejson decode : 310.75309 calls/sec

Medium complex object:
ujson encode      : 19001.01929 calls/sec
simplejson encode : 3512.29205 calls/sec
cjson encode      : 3063.69959 calls/sec

ujson decode      : 12791.80993 calls/sec
cjson decode      : 8288.32916 calls/sec
simplejson decode : 6640.22169 calls/sec

Array with 256 strings:
ujson encode      : 40161.78453 calls/sec
simplejson encode : 19301.40779 calls/sec
cjson encode      : 12337.13166 calls/sec

ujson decode      : 36944.81317 calls/sec
cjson decode      : 30187.40167 calls/sec
simplejson decode : 25105.56562 calls/sec

Array with 256 doubles:
ujson encode      : 6054.71950 calls/sec
simplejson encode : 2912.44353 calls/sec
cjson encode      : 3539.51228 calls/sec

ujson decode      : 27794.29735 calls/sec
cjson decode      : 14892.38775 calls/sec
simplejson decode : 14879.00070 calls/sec

Array with 256 True values:
ujson encode      : 168086.95325 calls/sec
simplejson encode : 49348.93309 calls/sec
cjson encode      : 67392.90623 calls/sec

ujson decode      : 139359.25968 calls/sec
cjson decode      : 82552.26652 calls/sec
simplejson decode : 114998.51396 calls/sec

Array with 256 dict{string, int} pairs:
ujson encode      : 24125.68837 calls/sec
simplejson encode : 5751.74871 calls/sec
cjson encode      : 4735.65147 calls/sec

ujson decode      : 17176.70493 calls/sec
cjson decode      : 13420.93963 calls/sec
simplejson decode : 9854.27352 calls/sec

Dict with 256 arrays with 256 dict{string, int} pairs:
ujson encode      : 86.52449 calls/sec
simplejson encode : 17.46117 calls/sec
cjson encode      : 18.31323 calls/sec

ujson decode      : 49.54660 calls/sec
cjson decode      : 38.34094 calls/sec
simplejson decode : 28.18035 calls/sec
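For anyone who wants to produce numbers in the same calls/sec format on their own machine, here is a rough harness. It is not the script used for the figures above: the test data is a made-up stand-in for the “256 doubles” case, the function name calls_per_sec is hypothetical, and ujson is only included if it happens to be installed.

```python
import json
import timeit


def calls_per_sec(func, arg, number=500):
    """Time `number` calls of func(arg) and convert to calls per second."""
    elapsed = timeit.timeit(lambda: func(arg), number=number)
    return number / elapsed


encoders = {"json": json.dumps}
decoders = {"json": json.loads}
try:
    import ujson
    encoders["ujson"] = ujson.dumps
    decoders["ujson"] = ujson.loads
except ImportError:
    pass  # benchmark only the standard library if ujson is missing

# Stand-in test data: an array with 256 doubles.
test_data = [i * 1.5 + 0.25 for i in range(256)]
encoded = json.dumps(test_data)

results = {}
for name, dumps in encoders.items():
    results[name + " encode"] = calls_per_sec(dumps, test_data)
for name, loads in decoders.items():
    results[name + " decode"] = calls_per_sec(loads, encoded)

for label, rate in results.items():
    print(f"{label:<14}: {rate:.5f} calls/sec")
```

Absolute numbers depend heavily on hardware, Python version and input data, so only comparisons made on the same machine with the same data are meaningful.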

More on GitHub!

I’d love to see more people using and contributing to this project, so please check out my GitHub repository!

Bindings to more languages would be awesome!

by Jonas Tärnström

OCT 15 10

Reaching out of the browser sandbox using jQuery Title Alert

Most websites that we develop at ESN are social websites with real-time features. Real-time features on these websites might be contact lists with chat, Counter-Strike game lobbies, social activity feeds, and so on. Having these features in real-time really enhances the overall functionality of the websites, especially together with ways of making users aware of the updates, like animations and toasters.

These websites, as well as many others on the net like Facebook, Twitter, Gmail and Last.fm (the list goes on forever), are increasingly taking the role that was previously held by normal applications like Thunderbird, Emacs, Skype, etc. When using such a web application, it’s normal user behavior to multitask between different web pages and programs, just like one would with a normal application such as Skype. However, OS-level programs have ways to alert the user when certain events occur (e.g. someone sends you a message), something web applications somewhat lack because they live in the browser sandbox. This is a problem, because a real-time user-to-user chat is far less useful if one has to manually check for new messages while another website or program has focus. One solution is to show notification messages in the browser’s title bar.

That’s why I wrote jQuery Title Alert about a year ago. It’s a small jQuery plugin for displaying a flashing notification message in the browser’s title bar. It supports setting different intervals and timings that define the appearance of the notification, as well as options for stopping the title notification when the browser gets focus.

Hopefully someone will find the plugin very useful as a complement to a website that uses a real-time service like BeaconPush.

Try it!

Here is an example of how to use it:
(See the documentation for all available options)

$.titleAlert("New chat message!", {
    stopOnFocus:true,
    duration:60000,
    interval:500
});

Get it!

Go ahead and grab it in my original blog post, or at the project GitHub page. And feel free to fork it as much as you want!

by Jonatan Heyman

SEP 15 10

Endless scroller jQuery plugin

In a recent web project, I needed to build automatic pagination for end-users, which fetches more content when the user scrolls the page. Examples of this are the Facebook log and the shoe browsing on http://www.mirapodo.de/herrenschuhe/ (not affiliated with us). When the user gets to the end of the page, it fetches more content. This way, you don’t have to load more than necessary on the first page load, and the user benefits, since she doesn’t have to click numbers in a pagination widget. The result is the jquery.esn.autobrowse.js jQuery plugin, which does that for any page.

It loads JSON data from the server and renders it client-side. One of the problems with typical auto-pagination is that when the user clicks on a link in the content and then clicks “back” in the browser, he or she ends up at the top of the page and has to scroll all the way down again (even Facebook suffers from this problem at the time of writing, in 2010). In this plugin, you can choose to use the browser’s local storage to cache the fetched content, and it keeps track of how far you have scrolled on the page. This means that if the user clicks a link and then goes back, she will see the page as she left it.

The browser cache is implemented using the jStorage plugin (slightly modified for this autobrowse plugin). It presents a cross-browser interface for saving arbitrary data in the browser storage. The storage methods differ between browsers, however, so IE, for instance, doesn’t have as much storage space as Chrome or Firefox. The cache falls back to an ajax request if the storage fails. The modified version of jStorage notifies the autobrowse plugin when the storage fails (i.e. when the data didn’t fit into the space available).

Try the demo and download. Report bugs or feature requests to me on GitHub or simply fork it!
by Micael Sjölund

JUN 18 10

Python memory management and TCMalloc

At ESN, we have recently experienced problems with Python not giving back memory to the OS on Linux. It reuses allocated memory internally, but never releases free memory back to the OS. This causes problems with monitoring, as it becomes difficult to see trends or temporary memory usage spikes. At first we thought we had a Python memory leak on our hands. Others seem to have similar problems; for example, there is a Stack Overflow entry about it. We investigated this problem and solved it using TCMalloc, a malloc replacement that is part of Google Performance Tools, and some appropriate tuning. This post details the results of our investigation and our solution.
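A quick way to watch this behaviour yourself is to sample the resident set size of the process before and after a burst of small allocations. The sketch below is not the tooling we used; it is a minimal example that assumes a Linux /proc filesystem (the helper rss_bytes is a hypothetical name and returns None elsewhere). Whether the last number drops back depends on the allocator and on where the surviving objects sit in memory.

```python
import os


def rss_bytes():
    """Resident set size of this process in bytes, or None without /proc."""
    try:
        with open("/proc/self/statm") as f:
            # Second field of statm is the resident size, counted in pages.
            resident_pages = int(f.read().split()[1])
    except (OSError, ValueError, IndexError):
        return None
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


before = rss_bytes()
data = [str(i) * 10 for i in range(500000)]   # many small allocations
during = rss_bytes()
del data                                      # everything is freed again
after = rss_bytes()

print("before:", before)
print("during:", during)
print("after :", after)
```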

Python’s default memory management is based on two basic techniques:

  • Using malloc, on Linux-systems usually the GLIBC version, to allocate memory from the OS.
  • A custom memory allocator for small objects, to reduce the number of malloc calls.

To understand why Python does not give back memory to the OS, we have to dig into how GLIBC’s malloc works and in particular how Linux memory management works.

In Linux, there are two ways to allocate memory:

  • Through the brk()/sbrk() syscalls. These are used to increase or decrease a contiguous amount of memory allocated to the process. It is always provided as one contiguous chunk, so you can only free memory at the end of the allocated memory; you cannot have “holes”.
  • Through the mmap() syscall. With mmap, you can allocate an arbitrary size of memory and map it wherever you like in the virtual address space of the process. You can also release memory allocated by mmap using munmap(), meaning you can have “holes” in your allocated memory. In many respects, allocating memory through mmap() is similar to, and as flexible as, using malloc. There is, however, a performance penalty for using mmap. The reason is that the OS, to be POSIX compliant, has to zero the memory before giving it to the process. Because of this, mmap is traditionally only used for larger allocations that are not so frequent.
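Python’s standard mmap module exposes the second mechanism directly, which makes it easy to try out. The following is a small sketch, not part of our setup: it creates an anonymous mapping, checks that the memory arrives zero-filled, writes to it, and then releases it again.

```python
import mmap

SIZE = 16 * mmap.PAGESIZE

# Passing -1 as the file descriptor requests an anonymous mapping,
# i.e. plain memory that is not backed by a file.
region = mmap.mmap(-1, SIZE)

is_zeroed = region[:SIZE] == bytes(SIZE)   # fresh pages arrive zero-filled
region[:5] = b"hello"
first_bytes = region[:5]

# Closing the object unmaps the region (munmap on Unix), regardless of
# what else the process has allocated before or after it.
region.close()

print(is_zeroed, first_bytes)
```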

The picture below shows the virtual address space of a process. The first segment, marked as brk, is the memory allocated using brk()/sbrk() calls. The end of the brk segment is called the program break (which is the reason for the syscall names). Using sbrk()/brk(), it is possible to move this break. With mmap() you can place arbitrary chunks of memory into the address space.

(Figure: how Unix uses brk() and mmap())

GLIBC’s malloc uses both brk and mmap. It uses brk for small allocations (on 64-bit the default threshold is lower than 64MB, but it is dynamically adjusted and can be tuned, as explained in a message from the libc mailing list) and mmap for larger allocations. Allocations inside the memory obtained through brk are managed by malloc internally, potentially leading to fragmentation.

The problem arises when many allocations occur, followed by almost all memory being freed. If the memory that is still allocated is high up in the brk segment, malloc will not be able to release the memory below it to the OS. A typical scenario is a long, memory-consuming computation whose result you store. The result is then likely to be in the upper part of the brk segment.
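A toy model makes the effect concrete. The sketch below is a deliberate simplification and not how malloc is implemented: the heap is a list of equally sized blocks, and, as with a brk-style segment, only free blocks at the very end can be handed back. The function name releasable_blocks is made up for this example.

```python
def releasable_blocks(in_use):
    """Count trailing free blocks: the only ones a brk-style heap can return."""
    count = 0
    for used in reversed(in_use):
        if used:
            break
        count += 1
    return count


# 1000 blocks were allocated and 999 were freed afterwards. In the first case
# the surviving block (the stored result) was allocated last and sits at the top.
result_at_top = [False] * 999 + [True]

# Same amount of free memory, but the surviving block sits at the bottom.
result_at_bottom = [True] + [False] * 999

stuck = releasable_blocks(result_at_top)
freed = releasable_blocks(result_at_bottom)

print("releasable with result at top   :", stuck)
print("releasable with result at bottom:", freed)
```

With the surviving block at the top, nothing can be returned even though 999 of 1000 blocks are free; with it at the bottom, all 999 free blocks can go back to the OS.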

While malloc can be tuned to use mmap at lower thresholds, it does not have the ability to manage smaller allocations inside a block allocated using mmap. Python’s own allocator for small objects can help, but Python does not use it for all objects.

Our solution was to use TCMalloc, a malloc replacement that is part of Google Performance Tools. TCMalloc can be tuned to only use mmap, and it waits before releasing memory to the OS, which reduces the number of OS calls for applications that use malloc frequently.

We compiled a version of Python with TCMalloc that only uses mmap. When testing the new Python in one of our largest projects, we found that not only did Python give back memory to the OS correctly, it also had a reduced memory usage and no apparent CPU penalty for using mmap instead of brk.

Planet, our development platform for the social real-time web, now ships with Python using TCMalloc as standard.

by Marcus Nilsson

JUN 8 10

Debugging NPAPI plugins in Visual Studio

As a follow-up to my last post about a sample boilerplate plugin for NPAPI, I’d like to give some advice on how to debug NPAPI plugins in Mozilla Firefox using Visual Studio 2008 (or similar).

Disable Mozilla Crash reporting

In order to debug your plugin you must first make sure that Mozilla Firefox doesn’t try to handle crashes and exceptions on its own. The easiest way of doing this is to set a System environment variable (Control panel->System->Advanced->Environment Variables). Name the variable MOZ_CRASHREPORTER_DISABLE and set its value to “1” (without the quotes, of course). Restart most things after this, or at least Visual Studio and any running Firefox instances.

Making sure Mozilla loads the right plugin

  1. Start Visual Studio and open your project. Make note of the full path to the plugin DLL your project builds.
  2. Start regedit and change the “path” value for your plugin’s registry entry to the full path of your plugin DLL. Take a look at the registry patch found in the boilerplate project for further reference.
  3. Create a HTML test file which instances your plugin using the <embed> or <object> tag. You can use mine as a reference.
  4. Open Project properties (ALT+F7) for your plugin project.
  5. Go to the Debugging page.
  6. As Command, browse to the location of firefox.exe.
  7. As Command Arguments, enter the location of your test file written as a URL (like file:///c:/mydir/myfile.html).
  8. Save/Apply the changes.
  9. Set a breakpoint early in the program, preferably in some of the boilerplate like the NP_Initialize function. This is just to make sure everything works out.
  10. Start debugging (press F5). If the breakpoint we set is hit, everything works.

Troubleshooting the debugging

The most common problem is that Firefox for some reason loads another plugin DLL, or one that doesn’t match your PDB files. Make sure everything in step 2 checks out.

Happy debugging!

by Jonas Tärnström

JUN 8 10

Think twice before “helping” your visitors

The website of SEB, a major bank in Sweden, has a lot of annoying design choices.

Never mind the frames, as I’m sure they have lots of enterprisey technical reasons to use them in the year 2010. Never mind select boxes with “go” buttons instead of plain links.

The most annoying thing is that when logging in, they have decided to “help” you press TAB.

Login screen of SEB

After typing in my 10 digit ID-number, SEB has decided to help me set the focus to the next inputbox.

So, every single time I try to log in, I forget that they are “helping” me and I press TAB after completing the first inputbox.

This is the way all other sites and desktop apps work. But, at SEB, this of course makes the login button get focus instead (since focus has shifted to the second input by the time I press TAB).

The intent is good, but the result is horrible. Instead of trying to come up with clever ideas like this, make sure your website follows current UI standards. That’s the best help your visitors can get.

by Markus Thurlin

JUN 7 10

Useful tools for JBoss Netty

At ESN, we’re avid users of the Java networking library JBoss Netty. It sports a really well-designed API for working with non-blocking I/O, and great performance! It was written by Trustin Lee of Apache MINA fame.

Our real-time push server, Orbit, is written in Java and based on the aforementioned library. Orbit is used both in our real-time Planet framework and in Beacon, our cloud-hosted push service. To support the latest in HTML5 development, it naturally offers both Comet and Web Sockets as transport alternatives.

During the months of developing Orbit, a couple of well-known standards and protocols were implemented, with HTTP, Web Sockets and Thrift being the most used. However, after finishing and stabilizing these, it felt a bit immoral to keep them closed source, especially since we use a lot of open-source software ourselves.

Giving back

Most of what could be re-used was extracted from our real-time push server and put up on GitHub. It is licensed under the liberal MIT license and will hopefully be put to good use.

Right now it features:

  • HTTP File Server
  • HTTP Cache
  • HTTP Router
  • Bandwidth meter
  • Asynchronous Web Socket client
  • Thrift processor

More details can be found in the netty-tools repository on GitHub.

Despite trying to minimize our use of Java, we still use it for certain projects. The performance and tools available make it very hard to resist sometimes. However, we try to limit its use to infrastructural projects with well-defined responsibilities, such as back-end servers or IDE development.

Our goal of sharing and open-sourcing as much ESN technology as possible has started. Keep your eyes open for more!

by Carl Byström

JUN 3 10

Boilerplate for a NPAPI plugin

The world of NPAPI is a strange and odd world, kind of like a puzzle with the right answers spread all over the Internet. Getting started with a portable project can be quite cumbersome, and the browsers do very little, sometimes nothing, to aid you with debugging and deployment.

Therefore I’ve put together a sample plugin with the boilerplate and everything necessary to build an (almost) empty NPAPI plugin for Windows and OS X. My hope is that this can get you going faster, with fewer problems and frustrations. The plugin has been tested and should work with most browsers compatible with NPAPI. These include Google Chrome, Firefox, Opera and Safari. I’ve created ready-to-use project files for Xcode (3.2) and Visual Studio (2008). It’s probably possible (with little or no modification) to build this plugin on various Unix flavors such as Linux, but I simply haven’t tried it. I’m not claiming that I know everything on this subject, but I do know that this boilerplate is in its essence the same code as we have shipped to several satisfied customers of ESN.

Feel free to fork the project on GitHub!

by Jonas Tärnström