How big is too big?

Recently, I have been trying to see how large a memory chunk I can allocate using malloc. More precisely, I was trying to allocate more than twenty large chunks, each of the order of 0.1 GB, and my desktop (Mac, 4 x 2.5 GHz PowerPC G5, with 4 GB DDR2 SDRAM) was giving up with error messages of the following type:

muse_evolver_fftw.out(2491) malloc: *** vm_allocate(size=268435456) failed (error code=3)
muse_evolver_fftw.out(2491) malloc: *** error: can't allocate region
muse_evolver_fftw.out(2491) malloc: *** set a breakpoint in szone_error to debug
0
Bus error

So, I wanted to figure out what the limit is, and whether there is a way to overcome it.
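
Incidentally, the "0" and the "Bus error" above suggest that malloc returned NULL and my code then went ahead and dereferenced the null pointer. Here is a minimal sketch of the allocation pattern in question, with the checks that make the failure graceful (the chunk count and size are illustrative, not my actual simulation code):

#include <stdio.h>
#include <stdlib.h>

#define NCHUNKS 25
#define CHUNK (100UL * 1024 * 1024)   /* roughly 0.1 GB per chunk */

int main(void)
{
    void *a[NCHUNKS];
    int n;

    for (n = 0; n < NCHUNKS; n++) {
        a[n] = malloc(CHUNK);
        if (a[n] == NULL) {
            fprintf(stderr, "malloc failed at chunk %d (after %.2f GB)\n",
                    n, n * (double)CHUNK / (1024.0 * 1024.0 * 1024.0));
            break;
        }
    }
    while (n-- > 0)   /* free only what was actually allocated */
        free(a[n]);
    return 0;
}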

After a bit of Googling, I found that Chris Adams, on his page, gives a C program which you can compile and run on your machine to figure out this limit; I compiled and ran it on my desktop and got the following output:

Determining maximum single allocation: block size = 65536, memset=0
maxmalloc(4123) malloc: *** vm_allocate(size=2363490304) failed (error code=3)
maxmalloc(4123) malloc: *** error: can't allocate region
maxmalloc(4123) malloc: *** set a breakpoint in szone_error to debug
Maximum single allocation: 2.20Gb (36064 65536 blocks)
Determining maximum allocation size: small block size = 1024, memset=0
maxmalloc(4123) malloc: *** vm_allocate(size=8421376) failed (error code=3)
maxmalloc(4123) malloc: *** error: can't allocate region
maxmalloc(4123) malloc: *** set a breakpoint in szone_error to debug
Maximum total allocation: 1.00Gb (1777660 1024 blocks)
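
I have not reproduced Chris Adams’s program here, but the idea behind such a probe is straightforward: keep doubling the request until malloc refuses, and report the last size that worked. A rough sketch of my own (not his code) for the single-allocation half of the test:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t size = 65536;   /* start from one 64 KB block */
    size_t last_ok = 0;
    void *p;

    /* Keep doubling the request until malloc refuses; the last
       successful size brackets the single-allocation limit. */
    while ((p = malloc(size)) != NULL) {
        free(p);
        last_ok = size;
        if (size > (size_t)-1 / 2)   /* avoid overflow on doubling */
            break;
        size *= 2;
    }
    printf("largest single malloc that succeeded: %.2f GB\n",
           last_ok / (1024.0 * 1024.0 * 1024.0));
    return 0;
}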

Thus, though the physical memory available to me is large, malloc gives up well short of it: about 2.2 GB for a single allocation, and only about 1 GB in total for many small blocks, out of the 4 GB available.

The fact that malloc has an upper limit on the memory it can allocate has a bearing on how large a simulation I can run, since I run 3D simulations using FFTW. One of the easy ways to make FFTW run faster is to use threads; it requires only a few extra lines in your code (see the sketch below) and can speed up the calculations quite a bit. However, such shared-memory parallelization runs into exactly this problem when one has to simulate large system sizes. In such cases, it might be essential to distribute both memory and processes; in other words, to use MPI and similar parallelizations. For now, I have partly overcome the problem by malloc-ing and free-ing arrays as I go, so that at no time do I hold such a large number of arrays in memory at once. This mitigates the problem a bit, albeit at the cost of frequent allocation and deallocation.
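
For concreteness, here is a sketch of those few extra lines, using FFTW 3’s threaded interface; the grid size, thread count and transform are placeholders, and you need to link with -lfftw3_threads -lfftw3 -lpthread -lm. (At N = 256, each array of complex doubles below is exactly the 268435456 bytes that vm_allocate refused above.)

#include <stdio.h>
#include <fftw3.h>

#define N 256   /* illustrative; N^3 complex doubles = 268435456 bytes */

int main(void)
{
    fftw_complex *in, *out;
    fftw_plan plan;

    if (!fftw_init_threads()) {     /* once, before creating any plans */
        fprintf(stderr, "fftw_init_threads failed\n");
        return 1;
    }
    fftw_plan_with_nthreads(4);     /* plans made after this use 4 threads */

    in  = fftw_malloc(sizeof(fftw_complex) * N * N * N);
    out = fftw_malloc(sizeof(fftw_complex) * N * N * N);
    if (in == NULL || out == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    plan = fftw_plan_dft_3d(N, N, N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    /* ... fill `in` with data, then: */
    fftw_execute(plan);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    fftw_cleanup_threads();
    return 0;
}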

Disclaimer: I do not know much about malloc, its limits, or shared- and distributed-memory processes. All of the above I gathered via Googling. So, if any of you find mistakes in what I have written above, and/or have solutions for my problem, please feel free to leave a note in the comments below; I will hoist them into the post.


6 Responses to “How big is too big?”

  1. Siddhartha V. Tambat Says:

    Hi Guru,

    Do you run Linux on your desktop? If so, the virtual memory layout for a Linux process is such that the maximum heap space is 1 GB (out of the 3 GB of virtual memory available per process). In practice, this is reduced to about 0.8 GB if you use dynamically linked binaries because shared libraries reside at the higher end of the heap area. So, malloc, which uses sbrk to allocate memory in the heap area, fails if heap usage exceeds about 0.8 GB. However, one can use mmap to allocate memory if space is still available on the system. This version of malloc uses mmap to allocate space if sbrk fails (ftp://gee.cs.oswego.edu/pub/misc/malloc.c). Try using it with your code and let me know if it works.
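
    For illustration, a minimal sketch of the mmap route (my own, assuming a POSIX system; the region size is arbitrary):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t size = 256UL * 1024 * 1024;   /* illustrative: one 0.25 GB region */

        /* Ask the kernel for anonymous pages directly, bypassing the
           sbrk-managed heap that malloc draws from. */
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON,   /* MAP_ANONYMOUS on some systems */
                       -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        memset(p, 0, size);   /* touch the pages to force real allocation */
        munmap(p, size);
        return 0;
    }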

    I had faced a similar problem in running my simulations many years back and the above version of malloc worked for me. In fact, I found the above info in my research log of Apr’01. Luckily for you, the link still works!

    Regards,
    -Siddhartha.

  2. Guru Says:

    Dear Tambat,

    Thanks for stopping by, and for the pointer; my desktop is a Mac, and though in principle it is like Linux, its behaviour, most of the time, isn’t. This is yet another instance. In the end, I did do what you suggest: I took my code to a Linux cluster to run. However, I did not use mmap on the Linux machine. I will try that and let you know how it goes.

    Guru

  3. Joshua Says:

    I’ve been testing out some of the limits of FFTW in C lately, and quickly ran into the same problems. When mallocing and freeing multiple arrays for the FFT, did you use the additive property of the FFT? The time-domain array is broken up into N equal interleaved parts (with two parts, one holds elements n = 0, 2, 4, … and the other holds n = 1, 3, 5, …), the FFT of each part is computed separately, and the partial transforms are then recombined element-wise after multiplying by twiddle factors of the form e^(-2πik/N).

    I found this calculation here. I know my explanation is a bit of a mess, and I don’t know the technique’s name either.
    http://www.dsprelated.com/showarticle/63.php
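
    The technique is, in fact, the radix-2 decimation-in-time recombination: if E and O are the FFTs of the even- and odd-indexed samples, then X[k] = E[k] + e^(-2πik/N)·O[k] and X[k+N/2] = E[k] - e^(-2πik/N)·O[k]. A sketch of just the combining step, assuming C99 complex arithmetic (the function name is illustrative):

    #include <complex.h>
    #include <math.h>

    /* Combine the FFTs of the even- and odd-indexed samples (each of
       length n/2, in `even` and `odd`) into the length-n FFT `out`,
       so the full-size transform never has to be computed in one shot. */
    void combine_fft(const double complex *even, const double complex *odd,
                     double complex *out, int n)
    {
        for (int k = 0; k < n / 2; k++) {
            double complex w = cexp(-2.0 * M_PI * I * (double)k / n);  /* twiddle */
            out[k]         = even[k] + w * odd[k];
            out[k + n / 2] = even[k] - w * odd[k];
        }
    }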

  4. Guru Says:

    Dear Joshua,

    Thanks a lot for stopping by, and for the very neat trick; I was not aware of this additive property.

    I will try your suggestion and write about it here, soon.

    Guru

  5. Biswajit Says:

    You might be able to get a larger amount allocated if you change the stack size. In many versions of Linux the stack size is set smaller than what the hardware allows.

    Try ulimit -a to see the current limits, and then change the stack size with ulimit -s unlimited. You can set other parameters in a similar way if they are artificially limited.

    Caveat: This is just a guess. I don’t have a clear idea how memory is allocated in today’s huge memory machines.
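
    For what it’s worth, the same limits can be read, and the soft limit raised, from inside the program via getrlimit/setrlimit; a sketch, assuming a POSIX system (the soft limit can only be raised up to the hard limit without privileges):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;

        getrlimit(RLIMIT_STACK, &rl);   /* what ulimit -s reports, in bytes */
        printf("stack: soft = %llu, hard = %llu\n",
               (unsigned long long)rl.rlim_cur,
               (unsigned long long)rl.rlim_max);

        /* Raise the soft limit to the hard limit; the hard (kernel)
           limit itself cannot be exceeded this way. */
        rl.rlim_cur = rl.rlim_max;
        if (setrlimit(RLIMIT_STACK, &rl) != 0)
            perror("setrlimit");
        return 0;
    }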

  6. Guru Says:

    Dear Biswajit,

    You are right; on most machines, the ulimit command helps. However, on Macs I found that there is a hard limit (most probably imposed by the kernel) which I am not able to override. So, I could get my code working on the Linux-based cluster, though not on my Mac desktop.

    Guru
