Click to See Complete Forum and Search --> : Faster than quicksort (?)


thegreatorangepeel
06-11-2005, 03:04 PM
I rather anticipate that I'm going to hear that this has been done 1000 times before, but here it goes...

In my spare time I brainstorm new sorting algorithms in an attempt to beat out quicksort. I've actually got somthing that shows promise and I'm rather hoping that someone might put the code (http://euclid.nmu.edu/~adrong/tricycle_sort/) through its paces better than I can as my background is with coding, not testing.

Before I get hit with any criticism, I should say that I've been basing my ideas on 2 priciples, the first of wich I just flat out know many will not agree with me on.

1. RAM is cheap. Make use of it.
2. Swapping is expensive. Avoid it.

That said, all I've done is made a binary tree (original concept was trinary but I realized this was wasteful) and dumped the values back into the array. It seems so increadibly simple I wonder what's wrong with it.

I failed to mention that I compiled with "-O3 " optimization.

bwkaz
06-11-2005, 05:40 PM
Originally posted by thegreatorangepeel
In my spare time I brainstorm new sorting algorithms in an attempt to beat out quicksort. You might want to take a college computer-algorithms course then.

You'll find out a couple of things. One is that there are very few tasks which we can put an absolute lower bound on the complexity of (in big-O terms), but sorting is one of them. The other is that the absolute lower bound on the complexity of sorting is O(n*log(n)). You cannot sort any faster than that.

Quicksort is O(n*log(n)).

So, uh, good luck. ;)

all I've done is made a binary tree (original concept was trinary but I realized this was wasteful) and dumped the values back into the array. It seems so increadibly simple I wonder what's wrong with it. Well, the insertion into the tree will take time O(log(n)) for each item, for a total of O(n*log(n)) time. That's already the speed of Quicksort. The inorder traversal of the tree (pulling items back out, then deallocating the tree nodes) takes O(n) time.

So ignoring constant multipliers, you're no faster or slower than Quicksort.

Your constant multiplier may in fact be lower (but this doesn't give you an enormous speed advantage: it's comparable to just getting a processor with double the clock speed and the same internal architecture), but you have an extra C*n term in there too (pulling everything out of the tree). So it offsets the cost a bit.

Also, with a small enough array (fewer than something like 10000 elements), bubblesort is actually faster to run than quicksort. It's an n^2 algorithm, yes, but the constant is quite a bit lower, so for small values of n, it ends up winning anyway. And it takes less programmer time to write and verify. The point is, even quicksort isn't always The Algorithm To Beat -- ESPECIALLY if your array is only 100 elements!. ;)

Try your algorithm against quicksort on an array of 100000 elements instead of 100.

thegreatorangepeel
06-11-2005, 05:47 PM
Try your algorithm against quicksort on an array of 100000 elements instead of 100.
Actually, I ran it at 10,000,000 elements before I posted and is faster, by my own tests, when values are 0 to 12998.

Edit:
Ran it at 100000 as you suggeted. At that size, the difference was negligable until I expanded my dataset to 0-3196. (Although I noticed somthing I found to be bizzare. Quicksort did measurably better against itself at the range 0-3196 than it did from 0-3195)

In short, it's worthwhile for small datasets in very large arrays. Almost never in otherwords.

bwkaz
06-11-2005, 09:43 PM
Umm... huh?

The range of values inside the array normally should have no bearing on the time it'll take to sort that array... Unless the range is significantly smaller than the number of elements. Then you'll have quite a lot of duplicates, which will never need to be swapped. So you'll save a ton of swaps with quicksort (and your algorithm will still take O(n*log(n)) or more to insert all the nodes -- the fact that it keeps a counter once it finds an equal node doesn't affect the time taken to find that node); that could be why quicksort is so much faster.

You should start by populating the array with random numbers all across the range of integers (0 to 4 billion), to try to get as few duplicates as possible. Try both algorithms in that situation.

You also aren't ever rebalancing your tree during the additions. How's your performance look when the array starts out sorted or almost sorted? <evil grin>

For that matter, how are you doing your timing? I can't see any timing code anywhere in main.cc...

thegreatorangepeel
06-11-2005, 10:37 PM
I, quite intentionally, removed the timing.

If you look in main.cc you'll see a comment that says "note that the tricycle sort should favor high repitition." and if you look in tn.h you'll note that the variable 'eq' isn't a pointer to another node at all, but just an integer and then take a quick look at the tn::add(int num). In cases where I have the same value twice I mearly increment 'eq' and put that value into the array 'eq' number of times. This saves some time as I'm not going through the process of instantiating.

Another way to put it might be that each instance of tn (tree node) really has two values. The value to put into the array, the the occurences of that value (e.g. eq is at least 1).

You should start by populating the array with random numbers all across the range of integers (0 to 4 billion)
yea. Did that. Calling the results unfavorable is rather generous.

You also aren't ever rebalancing your tree during the additions. How's your performance look when the array starts out sorted or almost sorted? <evil grin>
lol. Haven't been brave enough to try that one yet. Thought about it though.

Fryguy8
06-12-2005, 12:55 PM
Two things I would like to point out:

1. We are talking about comparison based sorting. A lot of this goes out the window when we sort via other means. For example, radix sort beats quicksort by a lot, if it's usable.

2. As has been mentioned before, the absolute fastest comparison-based sort there is is nlogn. You absolutely cannot be faster than this. Whether you implement it via quicksort, mergesort, or heap sort, short of small constants that aren't going to matter, you can not beat this time.

bwkaz
06-12-2005, 01:25 PM
Originally posted by thegreatorangepeel
"note that the tricycle sort should favor high repitition." and if you look in tn.h you'll note that the variable 'eq' isn't a pointer to another node at all, Yes, I saw that, but it doesn't affect the complexity of your algorithm.

It STILL takes O(log(n)) time (in the best case -- try it when the array is sorted and it'll take O(n) time) to get to that node. Chopping off some very minor constant allocation work doesn't help you.

For instance, suppose you have 10 values per tree node (alternately: you have 10 times the number of array elements as you have values). Your tree will be one-tenth the size that it would otherwise be, but that's insignificant. It'll take n*log(n/10) time to insert all the nodes, but this is just n*(log(n)-1), or is n*log(n)-n, which is still n*log(n) asymptotically. (Plus, you add that n back in when you pull the elements back out of the tree.)

Although actually, quicksort probably won't be any faster either. You'll have fewer than n swaps (swaps are generally the unit of work in in-place sorting algorithms), but only by a constant multiplier, and it'll still go back to n*log(n) asymptotically. (Actually, it may be that the "real" time is n*log(n) - 10n, but I'm not positive on that.)

yea. Did that. Calling the results unfavorable is rather generous. Unfavorable for which sort? Both?

Haven't been brave enough to try that one yet. Thought about it though. Well, your algorithm will turn into O(n^2), not O(n*log(n)). You'll have to search through every node in the tree (until you get to the end) every time you insert another node. Basically, it'll be a linked list rather than a tree.

(This is why red-black trees and AVL trees exist, BTW.)

Originally posted by Fryguy8
1. We are talking about comparison based sorting. A lot of this goes out the window when we sort via other means. For example, radix sort beats quicksort by a lot, if it's usable. Very true. But the catch is "if it's usable" -- comparison based sorting is always usable (as long as there's a total order on your elements, anyway).

gamblor01
06-15-2005, 05:55 PM
Quicksort is O(n*log(n)).

Come on bwkaz...you're not telling the WHOLE truth to that story! In the worst case scenario, Quicksort actually degrades to an O(n^2) algorithm. Under most cases though, it is n*log(n). Not to mention it's not a stable sort. Merge sort on the other hand happens to be stable and is ALWAYS n*log(n). Oddly enough though...it usually seems to be slower than Quicksort (I think it's because the inner loop that's running on them is MUCH smaller in the case of Quicksort).

In any case...I'm going to stop mouthing off to bwkaz since he has solved several of my problems and I'm positive has a much more extensive knowledge in this area, and is already aware of all of the above statements I made. :p

Orange...if you really want to get somewhere, try impelementing sort on a quantum computer. Wasn't it Shor that discovered a MUCH faster factoring algorithm on quantum machines? If you could do sorting in linear time or something like that on a quantum machine...you could probably walk into a CS department and walk out with a PhD!

I'll scope out your code later, but I'm interested in checking it out.

bwkaz
06-15-2005, 07:46 PM
In the worst case scenario, Quicksort actually degrades to an O(n^2) algorithm. Do you happen to remember what that case was? I'm thinking it's when the "pivot" always ends up at the top or bottom end of the range (so a pathological case, but I'm sure it happens sometimes), right?

Not to mention it's not a stable sort. True. So if a stable sort matters, feel free to skip quicksort. ;)

Merge sort on the other hand happens to be stable and is ALWAYS n*log(n). Oddly enough though...it usually seems to be slower than Quicksort (I think it's because the inner loop that's running on them is MUCH smaller in the case of Quicksort). I think it's because Quicksort's constant is smaller, but I'm not sure on that. I don't know what the actual constants are.

In any case...I'm going to stop mouthing off to bwkaz since he has solved several of my problems and I'm positive has a much more extensive knowledge in this area, and is already aware of all of the above statements I made. :p :p

Seriously, though, you do raise some good issues with it. Nobody's going to do comparison sorting faster than O(n*log(n)), though, so it's kinda moot. Especially when C has a qsort function all ready-made for you, and most "scripting" languages have something similar. ;)

Pafnoutios
06-15-2005, 10:38 PM
It seems to me that what he's doing is the same general algorithm as quicksort, just with an extra binary tree in memory to hold the result being more temporally efficient, even if much less memory efficient. This implementation, and the swapping single array implementation seem to be the same base algorithm with different emphases on efficiency: quicksort on memory and this one on time (that is, if he's saving any over swapping with the extra time required to create all the nodes of the binary tree and then de-allocate them).

ooagentbender
06-17-2005, 05:55 AM
heapsort and mergesort are the only algorithms that have a big O that is comparable to quicksort (in fact its the same O(n*lgn) except for the odd case of quicksort). The bottome line is that there is no such thing as sorting faster than n lgn. best case and worst case for heapsort is O(n * lg n). As was stated above best case for many algorithms on small sets will be better than that but for sufficiently large n nothing beats quicksort or heapsort.

PS his sort is basically a heapsort, which is a binary tree, complete or otherwise with all parents being larger in value than their children. If you want the code for a heapsort as I havn't looked at his, I can post it here.

scifire
06-17-2005, 02:36 PM
bzkaz
O(n*log(n)) - you cannot sort any faster than that.

The FlashSort Algorithm
FlashSort sorts n elements in O(n) time. Flash-Sort uses a vector L(k) of length m in a first step for the classification of the elements of array A. Then, in a second step, the resulting counts are accumulated and the L(k) point to the class boundaries. Then the elements are sorted by in situ permutation. During the permutation, the L(k) are decremented by a unit step at each new placement of an element of class k in the array A. A crucial aspect of FlashSort is that for identifying new cycle leaders. A cycle ends, if the the vector L(k) points to the position of an element below the classboundary of class k. The new cycle leader is the element situated in the lowest position complying to the complimentary condition, i.e. for which L(k) points to a position with L(k(A(i))) >= i. Evidently, in addition to the array A of length n which holds the n elements to be sorted, the only auxiliary vector is the L(k)-vector. The size of this vector is equal to the number m of classes which is small compared to n, e.g. m typically may be set to m=0.1 n in case of floating point numbers.
Finally,a small number of partially distinguishable elements are sorted locally within their classes either by recursion or by a simple conventional sort algorithm.


http://www.neubert.net/Flapaper/9802n.htm
;)

bwkaz
06-17-2005, 06:36 PM
That's either not a comparison based sort, or it's not really O(n). (I'll try to understand it first, and then decide which of these it is.)

There is no other alternative. It is utterly IMPOSSIBLE to sort (using a comparison-based algorithm) in less than O(n*log(n))!

Edit: It seems that this is approximately similar to radix-sort (at least according to what I remember about radix-sort; I may be remembering incorrectly though). It is definitely not comparison based.

Fryguy8
06-17-2005, 09:17 PM
Also, it is possible to add code to quicksort so that there is no worst case, and it runs nlgn no matter what, at the cost of a small constant increase.

scifire
06-18-2005, 07:01 AM
bwkaz
I agree with you that there is no comparison based algorithm that is under the limit O(n*log(n))