Posts Tagged ‘FFTW’

How big is too big?

January 31, 2008

Recently, I have been trying to see how large a chunk of memory I can allocate using malloc. More precisely, I was trying to allocate more than twenty large chunks, each of the order of 0.1 GB, and my desktop (a Mac with 4 x 2.5 GHz PowerPC G5 processors and 4 GB of DDR2 SDRAM) was giving up with error messages of the following type:

muse_evolver_fftw.out(2491) malloc: *** vm_allocate(size=268435456) failed (error code=3)
muse_evolver_fftw.out(2491) malloc: *** error: can't allocate region
muse_evolver_fftw.out(2491) malloc: *** set a breakpoint in szone_error to debug
Bus error

So, I wanted to figure out the limit and, if there is one, a way to overcome it.

After a bit of Googling, I found that Chris Adams, on his page, gives a C program that you can compile and run on your machine to find this limit; I compiled and ran it on my desktop and got the following output:

Determining maximum single allocation: block size = 65536, memset=0
maxmalloc(4123) malloc: *** vm_allocate(size=2363490304) failed (error code=3)
maxmalloc(4123) malloc: *** error: can't allocate region
maxmalloc(4123) malloc: *** set a breakpoint in szone_error to debug
Maximum single allocation: 2.20Gb (36064 65536 blocks)
Determining maximum allocation size: small block size = 1024, memset=0
maxmalloc(4123) malloc: *** vm_allocate(size=8421376) failed (error code=3)
maxmalloc(4123) malloc: *** error: can't allocate region
maxmalloc(4123) malloc: *** set a breakpoint in szone_error to debug
Maximum total allocation: 1.00Gb (1777660 1024 blocks)

Thus, although I have 4 GB of physical memory, malloc gives up at roughly 25 to 55% of it: 1.00 GB in total when allocating small blocks, and 2.20 GB for a single allocation.

This upper limit on what malloc can allocate determines how large a simulation I can run, since I run 3D simulations using FFTW. One of the easy ways to make FFTW run faster is to use threads: it requires just a few extra lines in your code and can speed up the calculations quite a bit. However, such shared-memory parallelization seems to run into trouble for large system sizes, since all the threads still share the memory of a single process. In such cases, it might be essential to distribute both the memory and the processes; in other words, to use MPI or a similar distributed-memory parallelization. For now, I have partly overcome the problem by malloc-ing and free-ing the arrays as I go, so that at no time do I have so many arrays in memory at once. This seems to mitigate the problem a bit, albeit at the cost of frequent allocation and freeing.

Disclaimer: I do not know much about malloc, its limits, or shared-memory and distributed-memory processes. I gathered all of the above via Googling. So, if you find any mistakes in what I have written, or have solutions for my problem, please feel free to leave a note in the comments below. I will hoist them to the post.

Fourier transforms

October 19, 2007

The Princeton Companion to Mathematics article on Fourier transforms, written by Terence Tao, is available at Tao’s site; his way of defining the transforms is the same as that used in FFTW, that great discrete Fourier transform code (except for a sign, I think). Tao also gives a link to the same article after it had gone through a bit of editing, which is another great reason to go through the piece — by a careful perusal and comparison of the drafts, one can learn a bit about mathematical writing and editing too!
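For reference, the convention that FFTW documents for its one-dimensional complex transform can be written as below. The symbols X, Y, Z and n are just labels chosen here. Neither direction is normalized, so a forward transform followed by a backward one returns the input scaled by n. How exactly this lines up with the definition in the article (a sign, a normalization factor, or both) is not checked here.

```latex
\begin{align*}
Y_k &= \sum_{j=0}^{n-1} X_j \, e^{-2\pi i\, jk/n} && \text{(forward, \texttt{FFTW\_FORWARD})} \\
Z_j &= \sum_{k=0}^{n-1} Y_k \, e^{+2\pi i\, jk/n} = n\, X_j && \text{(backward, \texttt{FFTW\_BACKWARD})}
\end{align*}
```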