thesleepylizard
Programmer
Part of my program needs to do something relatively simple:
remember a list of numbers. So I have an class called
CNumberTracker.
It has one member called CheckAndAdd(int). This checks to see
if the number has already been added - if so it returns true.
If not, it adds the number, and returns false.
So anyway this in a server environment, and I need this to be
blindingly fast. Usually, a CNumberTracker will be created,
and anywhere from 1 to 100,000 numbers will be added. The
numbers will generally be anything from 1 to 10,000,000,
really (they are IDs in a pretty large database).
I thought about a few indexing methods. One of them was based
around a huge bitfield, and was extremely fast, but rather
clumsy to implement and could potentially be very wasteful on
memory.
So anyway I wrote a binary search routine. Now I've never
done this before, so I'm interested to know if you have any
suggestions for it - how to make it faster and faster and
faster. Every millisecond counts.
(Incidentally, anyone know a round() function in C++? I
looked everywhere but couldn't find one, so I had to write my
own.)
So here's the basic code. "List" is a
"int*" containing all the numbers, ordered.
"ListCount" says how many "ints" there
are in "List". "MinNumber",
"MaxNumber" and "LastNumber" are some
nice optimizations which in my scenario will make a big
difference.
////////////
Any comments would be appreciated. I've stress tested the
above code with random numbers and it seems to be reliable
and pretty fast, I just want to know if there are any other
optimizations I could make. (Or if this is just totally
wrong)
Cheers,
-blowfly
remember a list of numbers. So I have an class called
CNumberTracker.
It has one member called CheckAndAdd(int). This checks to see
if the number has already been added - if so it returns true.
If not, it adds the number, and returns false.
So anyway this in a server environment, and I need this to be
blindingly fast. Usually, a CNumberTracker will be created,
and anywhere from 1 to 100,000 numbers will be added. The
numbers will generally be anything from 1 to 10,000,000,
really (they are IDs in a pretty large database).
I thought about a few indexing methods. One of them was based
around a huge bitfield, and was extremely fast, but rather
clumsy to implement and could potentially be very wasteful on
memory.
So anyway I wrote a binary search routine. Now I've never
done this before, so I'm interested to know if you have any
suggestions for it - how to make it faster and faster and
faster. Every millisecond counts.
(Incidentally, anyone know a round() function in C++? I
looked everywhere but couldn't find one, so I had to write my
own.)
So here's the basic code. "List" is a
"int*" containing all the numbers, ordered.
"ListCount" says how many "ints" there
are in "List". "MinNumber",
"MaxNumber" and "LastNumber" are some
nice optimizations which in my scenario will make a big
difference.
Code:
int round (double d) {
int i = (int)d;
if (d - i >= .5)
return i + 1;
else
return i;
}
CNumberTracker::CNumberTracker () {
List = NULL;
ListCount = 0;
}
CNumberTracker::~CNumberTracker () {
delete List;
}
bool CNumberTracker::CheckAndAdd (int Number) {
int Index;
//If there are any numbers already added
if (ListCount) {
if (Number > MaxNumber) {
//It's not in the list - let's just slot it at the end
ListCount++;
List = (int*)realloc(List, ListCount * sizeof(int));
List[ListCount-1] = MaxNumber = LastNumber = Number;
return false;
} else if (Number < MinNumber) {
//It's not in the list - let's just slot it at the start.
ListCount++;
List = (int*)realloc(List, ListCount * sizeof(int));
//Move all the numbers up. (how can I avoid this??)
for (int i = ListCount - 1; i > 0; i--) List[i] =
List[i-1];
List[0] = MinNumber = LastNumber = Number;
return false;
} else if (Number == LastNumber) {
//This is the last number they added. Nice optimization.
return true;
}
//Perform binary search.
Index = ListCount / 2;
int MaxDivisor = ListCount * 2; //When divisor is higher
than this, we stop the binary search.
for (int TestCount = 2; ; TestCount++) {
if (List[Index] > Number) {
int Divisor = pow (2, TestCount);
if (Divisor > MaxDivisor) break;
Index -= round((double)ListCount / Divisor);
} else if (List[Index] < Number) {
int Divisor = pow (2, TestCount);
if (Divisor > MaxDivisor) {
Index++; //Important.
break;
}
Index += round((double)ListCount / Divisor);
} else {
return true;
}
}
} else {
//First time we've added a number. Let's set these.
MinNumber = MaxNumber = Number;
Index = 0;
}
//Add the lil' sucker in. Index marks the spot we should
slot it in to.
ListCount++;
List = (int*)realloc(List, ListCount * sizeof(int));
for (int i = ListCount - 1; i > Index; i--)
List[i] = List[i-1];
List[Index] = LastNumber = Number;
return false;
}
////////////
Any comments would be appreciated. I've stress tested the
above code with random numbers and it seems to be reliable
and pretty fast, I just want to know if there are any other
optimizations I could make. (Or if this is just totally
wrong)
Cheers,
-blowfly