Fixing Memory Fragmentation in Embedded Systems with TCMalloc

While developing a video processing device, we ran into a strange phenomenon: the amount of memory used by our program kept growing. After several weeks of running, the program hit an OOM (Out of Memory) condition and was killed by the Linux operating system. At first I suspected a memory leak, so I checked every call to malloc() and free(), but found no problem. I then used detection tools such as valgrind, with the same result. So I decided to dig into the malloc implementation of the GNU C library (glibc) to find out whether we were using it improperly.

In the source code comments I read: “Also, in practice, programs tend to have runs of either small or large requests, but less often mixtures, so consolidation is not invoked all that often in most programs. And the programs that it is called frequently in otherwise tend to fragment.” They also mention that for long-lived programs, special attention should be paid to the trim threshold and the mmap control parameters.
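The trim threshold and mmap threshold mentioned above can be adjusted at run time through glibc's mallopt() interface. The following is only a minimal sketch, assuming a glibc target: the helper names ConfigureGlibcMalloc and TrimHeapNow are made up for illustration, and suitable threshold values depend entirely on the workload. It is not what we ended up using in the project.

```cpp
#include <cassert>
#include <malloc.h>   // glibc-specific: mallopt(), malloc_trim()

// Hypothetical helper: set the two tunables the glibc comments point at.
// mallopt() returns 1 on success and 0 on error.
// Note: setting either value explicitly disables glibc's dynamic
// adjustment of the mmap threshold.
bool ConfigureGlibcMalloc(int trim_threshold_bytes, int mmap_threshold_bytes) {
    // Free memory at the top of the heap above this size is returned to the OS.
    const int trim_ok = mallopt(M_TRIM_THRESHOLD, trim_threshold_bytes);
    // Requests at or above this size are served by mmap() instead of the heap,
    // so freeing them gives the pages back immediately.
    const int mmap_ok = mallopt(M_MMAP_THRESHOLD, mmap_threshold_bytes);
    return trim_ok == 1 && mmap_ok == 1;
}

// Hypothetical helper: ask glibc to release free heap memory right now.
// Returns true if some memory was actually given back to the system.
bool TrimHeapNow() {
    return malloc_trim(0) == 1;
}
```

A long-lived program might call ConfigureGlibcMalloc(64 * 1024, 256 * 1024) once at startup (the numbers are illustrative, not recommendations) and TrimHeapNow() after a large burst of frees. Tuning like this can reduce retained memory, but it does not remove fragmentation caused by long-lived objects scattered across the heap.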

Our program’s memory usage pattern did not fit that description. It mixes small and large requests, and it creates many objects that are held for a long time. After some searching, I found that many people had run into a similar problem and had switched to other allocator libraries to deal with it.
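To see why this pattern hurts, here is a toy model rather than a real allocator: the heap is a row of equal-sized slots, and long-lived objects sit between short-lived ones. All names (ArenaStats, Measure, InterleavedArena) are invented for this sketch. After the short-lived objects are freed, half of the arena is free, yet no request larger than one slot can be satisfied.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of a heap: each element is one fixed-size slot, true = in use.
struct ArenaStats {
    std::size_t free_total;        // number of free slots overall
    std::size_t largest_free_run;  // longest run of adjacent free slots
};

// Count the free slots and the longest contiguous free run.
ArenaStats Measure(const std::vector<bool>& used) {
    ArenaStats stats{0, 0};
    std::size_t run = 0;
    for (bool in_use : used) {
        if (in_use) {
            run = 0;
        } else {
            ++stats.free_total;
            ++run;
            if (run > stats.largest_free_run) stats.largest_free_run = run;
        }
    }
    return stats;
}

// Fill the arena, then free every other slot: the freed slots stand for
// short-lived objects, the remaining ones for long-lived objects.
std::vector<bool> InterleavedArena(std::size_t slots) {
    std::vector<bool> used(slots, true);
    for (std::size_t i = 0; i < slots; i += 2) used[i] = false;
    return used;
}
```

For a 16-slot arena, Measure(InterleavedArena(16)) reports 8 free slots but a largest free run of 1, so a 2-slot request would force the heap to grow even though half of it is unused.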

There are several third-party memory management libraries, such as tcmalloc, jemalloc, Hoard, and Lockless. Two of them are especially well known: tcmalloc, which Google uses, and jemalloc, which Facebook uses. I chose tcmalloc for our project because its authors describe it as “The fastest malloc we’ve seen; works particularly well with threads and STL”. However, we still saw memory growth. After further study, I found that calling “MallocExtension::instance()->ReleaseFreeMemory()” frequently keeps memory usage stable. As of this writing, our devices have been running continuously for over two years.
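One way to make that call “frequently” without scattering it through the code is a small rate limiter driven from the main loop. This is a sketch under assumptions, not our production code: the PeriodicRelease class is hypothetical, and the release action is passed in as a callback so the example compiles without tcmalloc installed. The interval is also a guess that would need tuning on a real device.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical helper: runs a release callback at most once per interval.
class PeriodicRelease {
public:
    using Clock = std::chrono::steady_clock;

    PeriodicRelease(Clock::duration interval, std::function<void()> release)
        : interval_(interval), release_(std::move(release)) {
        assert(release_ && "a release callback is required");
    }

    // Call this from the main loop. Returns true if the callback ran.
    // The first call always runs the callback.
    bool Tick(Clock::time_point now = Clock::now()) {
        if (has_run_ && now - last_run_ < interval_) return false;
        release_();
        last_run_ = now;
        has_run_ = true;
        return true;
    }

private:
    Clock::duration interval_;
    std::function<void()> release_;
    Clock::time_point last_run_{};
    bool has_run_ = false;
};

// With gperftools, the callback would wrap the call quoted above, e.g.
//   #include <gperftools/malloc_extension.h>
//   PeriodicRelease releaser(std::chrono::seconds(60),
//       [] { MallocExtension::instance()->ReleaseFreeMemory(); });
// and the main loop would call releaser.Tick() once per iteration.
```

Passing the time point in explicitly keeps the helper easy to test; in normal use Tick() is called with no argument and reads the steady clock itself.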

Third-party implementations emphasize fragmentation avoidance and also offer higher performance. The following program is a good example: it takes 1177 seconds with glibc 2.15 and 308 seconds with tcmalloc 1.8.2.

#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <string.h>
#include <time.h>

#define MAX_OBJECT_NUMBER       (1024)
#define MAX_MEMORY_SIZE         (1024*100)

struct BufferUnit {
    int   size;
    char* data;
};

struct BufferUnit   buffer_units[MAX_OBJECT_NUMBER];

/* Fill every empty slot with a newly allocated buffer of buffer_size bytes. */
void MallocBuffer(int buffer_size) {
    for (int i = 0; i < MAX_OBJECT_NUMBER; ++i) {
        if (NULL != buffer_units[i].data)   continue;

        buffer_units[i].data = (char*)malloc(buffer_size);
        if (NULL == buffer_units[i].data)  continue;

        /* Touch the memory so the pages are actually committed. */
        memset(buffer_units[i].data, 0x01, buffer_size);
        buffer_units[i].size = buffer_size;
    }
}

/* Free either the left half or the right half of the slots. */
void FreeHalfBuffer(bool left_half_flag) {
    int half_index = MAX_OBJECT_NUMBER / 2;
    int min_index = 0;
    int max_index = MAX_OBJECT_NUMBER - 1;
    if (left_half_flag)
        max_index = half_index;
    else
        min_index = half_index;

    for (int i = min_index; i <= max_index; ++i) {
        if (NULL == buffer_units[i].data) continue;

        free(buffer_units[i].data);
        buffer_units[i].data = NULL;
        buffer_units[i].size = 0;
    }
}

int main() {
    memset(&buffer_units, 0x00, sizeof(buffer_units));
    int decrease_buffer_size = MAX_MEMORY_SIZE;
    bool left_half_flag = false;
    time_t start_time = time(0);
    while (1) {
        /* Assumed loop body: refill the empty slots, free one half,
           then shrink the request size by one byte. */
        MallocBuffer(decrease_buffer_size);
        FreeHalfBuffer(left_half_flag);
        left_half_flag = !left_half_flag;
        --decrease_buffer_size;
        if (0 == decrease_buffer_size) break;
    }
    /* Release whatever is still allocated. */
    FreeHalfBuffer(true);
    FreeHalfBuffer(false);

    time_t end_time = time(0);
    long elapsed_time = (long)difftime(end_time, start_time);

    printf("Used %ld seconds. \n", elapsed_time);
    return 0;
}
