JUN 18 10

Python memory management and TCMalloc

At ESN, we have recently experienced problems with Python not giving memory back to the OS on Linux. It reuses allocated memory internally, but never releases free memory back to the OS. This causes problems with monitoring, as it becomes difficult to see trends or temporary spikes in memory usage. At first we thought we had a Python memory leak on our hands. Others seem to have similar problems; for example, there is a Stack Overflow entry about it. We investigated the problem and solved it using TCMalloc, the malloc replacement that is part of Google Performance Tools, together with some appropriate tuning. This post details the results of our investigation and our solution.

Python’s default memory management is based on two basic techniques:

  • Using malloc (on Linux systems usually the GLIBC version) to allocate memory from the OS.
  • A custom memory allocator for small objects, to reduce the number of malloc calls.

To understand why Python does not give back memory to the OS, we have to dig into how GLIBC’s malloc works and in particular how Linux memory management works.

In Linux, there are two ways to allocate memory:

  • Through the brk()/sbrk() syscalls. These grow or shrink a single contiguous region of memory allocated to the process. Because the region is always one contiguous chunk, memory can only be freed at its end; it cannot contain “holes”.
  • Through the mmap() syscall. With mmap, you can allocate an arbitrary amount of memory and map it wherever you like in the virtual address space of the process. You can also release memory allocated by mmap using munmap(), meaning you can have “holes” in your allocated memory. In many respects, allocating memory through mmap() is similar to, and as flexible as, using malloc. There is, however, a performance penalty for using mmap. The reason is that the OS, to be POSIX compliant, has to zero the memory before giving it to the process. Because of this, mmap is traditionally only used for larger, less frequent allocations.
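
The mmap()/munmap() pair described above can be tried directly from Python. The following is a minimal sketch using the standard mmap module; the function name allocate_and_release is made up for this illustration. It maps an anonymous region, checks that the fresh pages read as zeros, writes to the region, and then unmaps it.

```python
import mmap


def allocate_and_release(size=4 * mmap.PAGESIZE):
    """Map an anonymous region, inspect it, write to it, then unmap it."""
    region = mmap.mmap(-1, size)  # -1 means anonymous memory, not a file
    try:
        # Freshly mapped anonymous pages read back as zero bytes.
        starts_zeroed = region[:size] == bytes(size)
        # The region behaves like ordinary writable memory.
        region[:5] = b"hello"
        first_bytes = region[:5]
    finally:
        region.close()  # releases the mapping (munmap on Unix)
    return starts_zeroed, first_bytes, region.closed


if __name__ == "__main__":
    print(allocate_and_release())
```

Because each mapping is released independently of every other mapping, freeing one never depends on what else is still allocated.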

The picture below shows the virtual address space of a process. The first segment, marked as brk, is the memory allocated using brk()/sbrk() calls. The end of the brk segment is called the program break (which is the reason for the syscall names). Using sbrk()/brk(), it is possible to move the program break. With mmap() you can place arbitrary chunks of memory into the address space.

[Figure: how Unix uses brk() and mmap()]

GLIBC’s malloc uses both brk and mmap. It uses brk for small allocations and mmap for larger ones. On 64-bit systems the default threshold is lower than 64MB, but it is dynamically adjusted and can be tuned, as explained in a message from the libc mailing list. Allocations inside the memory obtained through brk are managed by malloc internally, which can lead to fragmentation.

The problem arises when many allocations are followed by almost all of the memory being freed. If the memory that is still allocated sits high up in the brk segment, malloc cannot release the free memory below it to the OS. A typical scenario is a long, memory-consuming computation whose result is stored. The result is then likely to end up in the upper part of the brk segment.
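
To make the scenario concrete, here is a toy model of a brk-style segment. It is only an illustration of the reasoning above, not how GLIBC's malloc is implemented; the function releasable_bytes and the block sizes are invented for the example. Only free blocks that lie above the highest live block can be given back by lowering the end of the segment.

```python
def releasable_bytes(blocks):
    """Return how many bytes at the top of a brk-style segment could be released.

    `blocks` is a list of (size, in_use) pairs ordered from the bottom of
    the segment to the top. Free blocks below a live block do not count,
    because the segment can only shrink from its end.
    """
    releasable = 0
    for size, in_use in reversed(blocks):
        if in_use:
            break  # a live block pins everything below it
        releasable += size
    return releasable


# 1000 freed temporaries with one small live result stored at the top.
pinned = [(1024, False)] * 1000 + [(64, True)]
# The same blocks, but with the live result at the bottom of the segment.
unpinned = [(64, True)] + [(1024, False)] * 1000

if __name__ == "__main__":
    print(releasable_bytes(pinned))    # 0
    print(releasable_bytes(unpinned))  # 1024000
```

In the first layout a single 64-byte result keeps roughly 1MB of free memory attached to the process.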

While malloc can be tuned to use mmap at lower thresholds, it cannot manage several smaller allocations inside a single block allocated using mmap. Python’s own allocator for small objects can help, but Python does not use it for all objects.
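
For reference, the threshold tuning mentioned above can be sketched from Python through ctypes. This is a hedged illustration rather than what we deployed: it assumes a GLIBC-based Linux system, takes the M_MMAP_THRESHOLD constant (-3) from GLIBC's malloc.h, and uses a helper name, set_mmap_threshold, that is made up here. On systems without mallopt it simply reports failure.

```python
import ctypes

M_MMAP_THRESHOLD = -3  # constant from GLIBC's <malloc.h>


def set_mmap_threshold(num_bytes):
    """Ask GLIBC's malloc to serve requests of at least num_bytes via mmap.

    Returns True if mallopt accepted the value, False if mallopt is not
    available (for example on a non-GLIBC system) or rejected it.
    """
    try:
        libc = ctypes.CDLL(None)  # symbols already loaded in this process
        mallopt = libc.mallopt
    except (OSError, AttributeError, TypeError):
        return False
    mallopt.argtypes = [ctypes.c_int, ctypes.c_int]
    mallopt.restype = ctypes.c_int
    return mallopt(M_MMAP_THRESHOLD, num_bytes) == 1  # mallopt returns 1 on success


if __name__ == "__main__":
    print(set_mmap_threshold(64 * 1024))
```

Setting the threshold explicitly also turns off GLIBC's dynamic adjustment of it. Every allocation above the threshold then gets its own mapping, which is why lowering the value does not help with many small objects.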

Our solution was to use TCMalloc, the malloc replacement that is part of Google Performance Tools. TCMalloc can be tuned to use only mmap, and it waits before releasing memory to the OS, which reduces the number of OS calls for applications that use malloc frequently.
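
A quick way to experiment with this setup without rebuilding Python is to preload the TCMalloc shared library into an unmodified interpreter. The sketch below is an alternative to the compiled-in approach, not a description of it. TCMALLOC_SKIP_SBRK is the TCMalloc setting that disables sbrk; the helper name tcmalloc_env and the library path in the usage note are hypothetical and depend on where TCMalloc is installed.

```python
import os


def tcmalloc_env(lib_path, base_env=None):
    """Build an environment that preloads TCMalloc and disables sbrk.

    lib_path is the location of the TCMalloc shared library (an assumption
    of this sketch; it varies between systems). base_env defaults to a copy
    of the current environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD")
    # Keep any library that is already being preloaded.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    # Tell TCMalloc to obtain memory through mmap only.
    env["TCMALLOC_SKIP_SBRK"] = "true"
    return env
```

The result can be passed to a child process, for example subprocess.run([sys.executable, "app.py"], env=tcmalloc_env("/usr/lib/libtcmalloc.so")), where both app.py and the library path are placeholders.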

We compiled a version of Python with TCMalloc that only uses mmap. When testing the new Python in one of our largest projects, we found that not only did Python give back memory to the OS correctly, it also had a reduced memory usage and no apparent CPU penalty for using mmap instead of brk.
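
Whether a given interpreter gives memory back can be checked by watching its resident set size around a large allocation. The following is a rough sketch assuming a Linux /proc filesystem; the helper names and the size of the test allocation are made up for illustration, and the numbers it prints depend on the allocator and the Python version.

```python
def parse_vm_rss_kb(status_text):
    """Extract the VmRSS value in kB from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def current_rss_kb():
    """Resident set size of this process in kB, or None if unavailable."""
    try:
        with open("/proc/self/status") as status_file:
            return parse_vm_rss_kb(status_file.read())
    except OSError:
        return None  # no /proc filesystem on this system


if __name__ == "__main__":
    before = current_rss_kb()
    data = [str(i) * 10 for i in range(1_000_000)]  # many small objects
    peak = current_rss_kb()
    del data
    after = current_rss_kb()
    print("before:", before, "peak:", peak, "after:", after)
```

If the value after the deletion stays close to the peak, the freed memory is still attached to the process.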

Planet, our development platform for the social real-time web, now ships with Python using TCMalloc as standard.

by Marcus Nilsson
  • Pytrade

    Thanks for the very helpful post.

    “We compiled a version of Python with TCMalloc that only uses mmap”

    Do you know how to make TCMalloc only use mmap?

  • http://www.esn.me Jonas Tärnström

    Hi,
    Just set the environment variable TCMALLOC_SKIP_SBRK to true.

    More on this on:
    http://google-perftools.googlecode.com/svn/trunk/doc/tcmalloc.html

  • Masebase

    Ran into this issue a couple of years ago with some C++ programs. Never understood it totally just changed malloc. Your post helps put all the pieces together. Thanks for sharing.

    My poorly written discussion of the issue.
    http://dev.pezad.com/2008/09/memory-allocation.html

    I used jemalloc to start with, which seems to hand back memory the best, but is less stable than Google’s.

  • eastlandgrl

    interesting, thanks

  • Delijati

    Wow nice this should be a part of the vanilla python …

  • excetara2

    Can you provide instructions how to compile Python with TCMalloc as the replacement?? I am having the same issue in linux (Debian based).

  • Antoine

    This should be fixed in 3.3: http://hg.python.org/cpython/rev/f8a697bc3ca8

    (when you encounter issues like this, it’s always nice to report them to the bug tracker: http://bugs.python.org )
