Need to sort millions of positive integers

Daniel Goldman dgoldman at ehdp.com
Sun Mar 1 20:43:40 CET 2009


I'm using GMP to help generate millions of positive integers for a
program I'm developing. The max integer is currently 10^25 (11 bytes).
There are currently about a few million integers, in tests I'm doing.
Later, there could be many millions of integers.

At one step, I have to sort the numbers, to allow subsequent binary
searches. I currently do it with linux sort -n command, using as input
a text file with the large integers in base 10.

The sorting can take a while. I don't think the sort program ever craps
out. But as more numbers have to be sorted, it will eventually take
too long to be practical. Once the numbers are sorted, the following
steps work quickly and fine. The sorting is the bottleneck.

Using larger sizes for sort --buffer-size option helped some. About 20%.
But I need more improvement. I'd like it to sort in an hour to a day.

Any GMP function or other program that might sort the integers faster?
Any particular hardware suggestions related to speeding the sorting?
Any other suggestions related to sorting the integers faster?

Daniel Goldman


More information about the gmp-discuss mailing list