Need to sort millions of positive integers
Paul Zimmermann
Paul.Zimmermann at loria.fr
Sun Mar 1 21:07:42 CET 2009
> Date: Sun, 01 Mar 2009 11:43:40 -0800
> From: Daniel Goldman <dgoldman at ehdp.com>
>
> I'm using GMP to help generate millions of positive integers for a
> program I'm developing. The max integer is currently 10^25 (11 bytes).
> There are currently about a few million integers, in tests I'm doing.
> Later, there could be many millions of integers.
>
> At one step, I have to sort the numbers, to allow subsequent binary
> searches. I currently do it with linux sort -n command, using as input
> a text file with the large integers in base 10.
>
> The sorting can take a while. I don't think the sort program ever craps
> out. But as more numbers have to be sorted, it will eventually take
> too long to be practical. Once the numbers are sorted, the following
> steps work quickly and fine. The sorting is the bottleneck.
>
> Using larger sizes for sort --buffer-size option helped some. About 20%.
> But I need more improvement. I'd like it to sort in an hour to a day.
>
> Any GMP function or other program that might sort the integers faster?
> Any particular hardware suggestions related to speeding the sorting?
> Any other suggestions related to sorting the integers faster?
>
> Daniel Goldman
I believe this is out-of-scope for gmp-discuss. My advice would be to look
at http://en.wikipedia.org/wiki/Sorting_algorithm. Maybe radix sort is
what you want (http://en.wikipedia.org/wiki/Radix_sort). Notice also that
you don't need to sort the numbers for a binary search: look at
http://en.wikipedia.org/wiki/Trie.
Paul Zimmermann
More information about the gmp-discuss
mailing list