Python memory management and TCMalloc
At ESN, we recently ran into problems with Python not giving memory back to the OS on Linux. Python reuses allocated memory internally, but does not release free memory back to the OS. This causes problems with monitoring, as it becomes difficult to see trends or temporary spikes in memory usage. At first we thought we had a Python memory leak on our hands. Others seem to have similar problems; for example, there is a Stack Overflow entry about it. We investigated the problem and solved it using TCMalloc, a malloc replacement that is part of Google Performance Tools, together with some appropriate tuning. This post details the results of our investigation and our solution.
Python’s default memory management is based on two basic techniques:
- Using malloc, on Linux systems usually the GLIBC version, to allocate memory from the OS.
- A custom memory allocator for small objects, to reduce the number of malloc calls.
To understand why Python does not give back memory to the OS, we have to dig into how GLIBC’s malloc works and in particular how Linux memory management works.
In Linux, there are two ways to allocate memory:
- Through the brk()/sbrk() syscalls. These grow or shrink a single contiguous region of memory allocated to the process. Because the region is always one contiguous chunk, you can only free memory at its end; you cannot have “holes”.
- Through the mmap() syscall. With mmap, you can allocate a region of arbitrary size and map it wherever you like in the virtual address space of the process. You can also release memory allocated by mmap using munmap(), meaning you can have “holes” in your allocated memory. In many respects, allocating memory through mmap() is similar to, and as flexible as, using malloc. There is, however, a performance penalty for using mmap. The reason is that the OS, to be POSIX compliant, has to zero the memory before giving it to the process. Because of this, mmap is traditionally only used for larger, less frequent allocations.
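The mmap behaviour described in the list above can be observed from Python itself. The following is a minimal sketch using the standard mmap module; the helper name allocate_anonymous and the 16-page size are only illustrative choices, not something taken from our code base.

```python
import mmap


def allocate_anonymous(size):
    """Map `size` bytes of anonymous memory (not backed by a file)."""
    # fileno=-1 requests an anonymous mapping; the OS hands out zero-filled pages.
    return mmap.mmap(-1, size)


page = mmap.PAGESIZE
region = allocate_anonymous(16 * page)

# Freshly mapped anonymous memory reads as zeros.
is_zeroed = region[:page] == b"\x00" * page

# The region behaves like a writable byte buffer.
region[0:5] = b"hello"
first_bytes = region[0:5]

# close() unmaps the region (munmap on Unix), independently of any
# other mappings that happen to surround it in the address space.
region.close()
is_closed = region.closed
```

Because each mapping is released on its own, a freed region leaves a “hole” rather than pinning everything below it, which is exactly what the brk segment cannot do.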
The picture below shows the virtual address space of a process. The first segment, marked as brk, is the memory allocated using brk()/sbrk() calls. The end of the brk segment is called the program break (which is the reason for the syscall names). Using sbrk()/brk(), it is possible to move this break. With mmap() you can place arbitrary chunks of memory into the address space.
GLIBC’s malloc uses both brk and mmap. It uses brk for small allocations and mmap for larger ones. The threshold between the two starts at 128 KB by default, is adjusted dynamically up to a maximum of 32 MB on 64-bit systems, and can be tuned, as explained in a message from the libc mailing list. Allocations inside the memory obtained through brk are managed by malloc internally, potentially leading to fragmentation.
The problem arises when many allocations occur, followed by almost all of the memory being freed. If the memory that is still allocated sits high up in the brk segment, malloc cannot release the free memory below it to the OS. A typical scenario is a long, memory-consuming computation whose result is stored at the end. The result is then likely to be in the upper part of the brk segment.
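To make the effect concrete, here is a toy model of the brk segment. It is only an illustration of the reasoning above, not how malloc is implemented; the function name releasable_from_top and the block sizes are made up for the example.

```python
def releasable_from_top(blocks):
    """Return how many bytes could be given back by lowering the program break.

    `blocks` is a list of (size_in_bytes, in_use) tuples ordered from the
    bottom of the brk segment to the top. Only free blocks at the very top
    can be released, because the segment has to stay contiguous.
    """
    released = 0
    for size, in_use in reversed(blocks):
        if in_use:
            break  # a live block pins everything below it
        released += size
    return released


MB = 1024 * 1024

# A long computation freed 300 MB of temporaries, but its 1 MB result
# was allocated last and therefore sits at the top of the segment.
result_on_top = [(100 * MB, False), (200 * MB, False), (1 * MB, True)]

# The same blocks, with the result at the bottom instead.
result_at_bottom = [(1 * MB, True), (100 * MB, False), (200 * MB, False)]

stuck = releasable_from_top(result_on_top)
freed = releasable_from_top(result_at_bottom)
```

In the first layout nothing can be returned to the OS even though 300 MB are free; in the second, all 300 MB can be released.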
While malloc can be tuned to use mmap at lower thresholds, it cannot manage smaller allocations inside a block allocated using mmap. Python’s own allocator for small objects can help, but Python does not use it for all objects.
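As an illustration of the tuning mentioned above, the mmap threshold can be lowered from inside a running Python process by calling glibc’s mallopt through ctypes. This is a sketch under assumptions: it presumes a glibc-based system, the constant -3 is the value of M_MMAP_THRESHOLD in glibc’s malloc.h, and the helper name set_mmap_threshold and the 64 KB value are ours. It is not what we ended up deploying.

```python
import ctypes

# Value of M_MMAP_THRESHOLD in glibc's <malloc.h> (assumes glibc).
M_MMAP_THRESHOLD = -3


def set_mmap_threshold(num_bytes):
    """Ask glibc to serve allocations of at least `num_bytes` via mmap.

    Returns True or False depending on whether mallopt accepted the value,
    or None when the C library does not export mallopt.
    """
    try:
        libc = ctypes.CDLL(None)  # symbols already loaded into this process
        mallopt = libc.mallopt
    except (OSError, AttributeError, TypeError):
        return None
    mallopt.argtypes = [ctypes.c_int, ctypes.c_int]
    mallopt.restype = ctypes.c_int
    # mallopt returns 1 on success and 0 on error.
    return mallopt(M_MMAP_THRESHOLD, num_bytes) == 1


outcome = set_mmap_threshold(64 * 1024)
```

Setting the threshold explicitly also turns off glibc’s dynamic adjustment of it. Allocations below the threshold still end up in the brk segment, so this alone does not remove the problem.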
Our solution was to use TCMalloc, a malloc replacement that is part of Google Performance Tools. TCMalloc can be tuned to only use mmap, and it waits before releasing memory to the OS, which reduces the number of OS calls for applications that use malloc frequently.
We compiled a version of Python with TCMalloc that only uses mmap. When testing the new Python in one of our largest projects, we found that not only did Python give memory back to the OS correctly, it also used less memory and showed no apparent CPU penalty for using mmap instead of brk.
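For quick experiments there is an alternative to rebuilding Python: preloading TCMalloc into a stock interpreter. Note that this is a different technique from the compiled-in build described above. The sketch below assumes the TCMALLOC_SKIP_SBRK and TCMALLOC_RELEASE_RATE environment variables from the Google Performance Tools documentation; the library path and the helper names are hypothetical and will differ per system.

```python
import os
import subprocess
import sys

# Hypothetical install location; adjust to where libtcmalloc lives on your system.
TCMALLOC_LIB = "/usr/lib/libtcmalloc.so"


def tcmalloc_env(lib_path=TCMALLOC_LIB, release_rate="10.0"):
    """Build an environment that preloads TCMalloc and steers it towards mmap."""
    env = dict(os.environ)
    env["LD_PRELOAD"] = lib_path
    # Do not obtain memory from the system via sbrk.
    env["TCMALLOC_SKIP_SBRK"] = "true"
    # Higher values make TCMalloc return free memory to the OS faster.
    env["TCMALLOC_RELEASE_RATE"] = release_rate
    return env


def run_with_tcmalloc(script, lib_path=TCMALLOC_LIB):
    """Run `script` in a child interpreter with TCMalloc preloaded."""
    return subprocess.call([sys.executable, script], env=tcmalloc_env(lib_path))


# Only build the environment here; run_with_tcmalloc() is left for the reader to call.
env = tcmalloc_env()
```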
Planet, our development platform for the social real-time web, now ships with Python using TCMalloc as standard.
Debugging NPAPI plugins in Visual Studio
As a follow-up to my last post about a sample boilerplate plugin for NPAPI, I’d like to give some advice on how to debug NPAPI plugins in Mozilla Firefox using Visual Studio 2008 (or similar).
Disable Mozilla Crash reporting
In order to debug your plugin, you must first make sure that Mozilla Firefox doesn’t try to handle crashes and exceptions on its own. The easiest way of doing this is to set a system environment variable (Control Panel->System->Advanced->Environment Variables). Name the variable MOZ_CRASHREPORTER_DISABLE and set its value to “1” (without the quotes, of course). Afterwards, restart anything that needs to see the new variable, at the very least Visual Studio and any running Firefox instances.
Making sure Mozilla loads the right plugin
- Start Visual Studio and open your project. Make note of the full path to the plugin DLL your project builds.
- Start regedit and change the “path” value for your plugin’s registry entry to the full path of your plugin DLL. Take a look at the registry patch found in the boilerplate project for further reference.
- Create an HTML test file that instantiates your plugin using the <embed> or <object> tag. You can use mine as a reference.
- Open Project properties (ALT+F7) for your plugin project.
- Go to the Debugging section.
- As Command, browse to the location of firefox.exe.
- As Command Arguments, enter the location of your test file written as a URL (like file:///c:/mydir/myfile.html).
- Save/Apply the changes.
- Set a breakpoint early in the program, preferably in boilerplate code such as the NP_Initialize function. This is just to verify that everything works.
- Start debugging (press F5). If the breakpoint is hit, everything is set up correctly.
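Getting the file URL for Command Arguments right by hand is easy to fumble. As a small convenience, the values from the steps above can be derived with a few lines of Python. This is only a sketch: the function name debugger_settings and the Firefox install path are hypothetical, and the result still has to be entered into the project properties manually.

```python
from pathlib import PureWindowsPath


def debugger_settings(firefox_exe, test_page):
    """Return the Command, Command Arguments and environment for the debug session."""
    return {
        "command": firefox_exe,
        # as_uri() turns c:\mydir\myfile.html into a file:/// URL.
        "arguments": PureWindowsPath(test_page).as_uri(),
        "environment": {"MOZ_CRASHREPORTER_DISABLE": "1"},
    }


settings = debugger_settings(
    r"C:\Program Files\Mozilla Firefox\firefox.exe",  # hypothetical install path
    r"c:\mydir\myfile.html",
)
```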
Troubleshooting the debugging
The most common problem is that Firefox, for some reason, loads a different plugin DLL, or one that does not match your PDB files. Make sure everything in step 2 (the registry path) checks out.
Happy debugging!
Think twice before “helping” your visitors
The website of SEB, a major bank in Sweden, has a lot of annoying design choices.
Never mind the frames, as I’m sure they have lots of enterprisey technical reasons to use them in the year 2010. Never mind the select boxes with “go” buttons instead of plain links.
The most annoying thing is that when logging in, they have decided to “help” you press TAB.
After typing in my 10 digit ID-number, SEB has decided to help me set the focus to the next inputbox.
So, every single time I try to log in, I forget that they are “helping” me and I press TAB after completing the first input box.
This is how all other sites and desktop apps work. But at SEB, this of course moves focus to the login button instead (since focus has already shifted to the second input by the time I press TAB).
The intent is good, but the result is horrible. Instead of trying to come up with clever ideas like this, make sure your website follows current UI standards. That’s the best help your visitors can get.
Useful tools for JBoss Netty
At ESN, we’re avid users of the Java networking library JBoss Netty. Written by Trustin Lee of Apache MINA fame, it sports a really well-designed API for working with non-blocking I/O, and great performance!
Our real-time push server, Orbit, is written in Java and based on the aforementioned library. Orbit is used both in our real-time Planet framework and in Beacon, our cloud-hosted push service. To support the latest in HTML 5 development, it naturally supports both Comet and Web Sockets as transport alternatives.
During the months of developing Orbit, a couple of well-known standards and protocols were implemented, with HTTP, Web Sockets and Thrift being the most used. However, after finishing and stabilizing these, it felt a bit immoral to keep them closed source, especially since we use a lot of open-source software ourselves.
Giving back
Most of what could be reused was extracted from our real-time push server and put up on GitHub. It is licensed under the liberal MIT license and will hopefully come to good use.
Right now it features:
- HTTP File Server
- HTTP Cache
- HTTP Router
- Bandwidth meter
- Asynchronous Web Socket client
- Thrift processor
More details can be found in the netty-tools repository on GitHub.
Despite trying to minimize our use of Java, we still use it for certain projects. The performance and the tools available make it very hard to resist sometimes. However, we try to limit its use to infrastructural projects with well-defined responsibilities, such as back-end servers or IDE development.
Our goal of sharing and open-sourcing as much ESN technology as possible has started. Keep your eyes open for more!
Boilerplate for a NPAPI plugin
The world of NPAPI is a strange and odd one, kind of like a puzzle with the right answers spread all over the Internet. Getting started with a portable project can be quite cumbersome, and the browsers do very little, and sometimes nothing, to aid you with debugging and deployment.
Therefore I’ve put together a sample plugin with the boilerplate and everything necessary to build an (almost) empty NPAPI plugin for Windows and OS X. My hope is that this can get you going faster, with fewer problems and frustrations. The plugin has been tested and should work with most browsers compatible with NPAPI. These include Google Chrome, Firefox, Opera and Safari. I’ve created ready-to-use project files for Xcode (3.2) and Visual Studio (2008). It’s probably possible (with little or no modification) to build this plugin on various Unix flavors such as Linux, but I simply haven’t tried it. I’m not claiming that I know everything on this subject, but I do know that this boilerplate is in its essence the same code as we have shipped to several satisfied customers of ESN.
Feel free to fork the project on GitHub!

