Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Binary searching routine. Any thoughts on this code?

Status
Not open for further replies.

thesleepylizard

Programmer
Mar 18, 2002
41
AU
Part of my program needs to do something relatively simple:
remember a list of numbers. So I have an class called
CNumberTracker.

It has one member called CheckAndAdd(int). This checks to see
if the number has already been added - if so it returns true.
If not, it adds the number, and returns false.

So anyway this in a server environment, and I need this to be
blindingly fast. Usually, a CNumberTracker will be created,
and anywhere from 1 to 100,000 numbers will be added. The
numbers will generally be anything from 1 to 10,000,000,
really (they are IDs in a pretty large database).

I thought about a few indexing methods. One of them was based
around a huge bitfield, and was extremely fast, but rather
clumsy to implement and could potentially be very wasteful on
memory.

So anyway I wrote a binary search routine. Now I've never
done this before, so I'm interested to know if you have any
suggestions for it - how to make it faster and faster and
faster. Every millisecond counts.

(Incidentally, anyone know a round() function in C++? I
looked everywhere but couldn't find one, so I had to write my
own.)

So here's the basic code. "List" is a
"int*" containing all the numbers, ordered.
"ListCount" says how many "ints" there
are in "List". "MinNumber",
"MaxNumber" and "LastNumber" are some
nice optimizations which in my scenario will make a big
difference.

Code:
int round (double d) {
	int i = (int)d;
	if (d - i >= .5)
		return i + 1;
	else
		return i;
		
}

CNumberTracker::CNumberTracker () {
	List = NULL;
	ListCount = 0;
}
CNumberTracker::~CNumberTracker () {
	delete List;
}
bool CNumberTracker::CheckAndAdd (int Number) {
	
	int Index;

	//If there are any numbers already added
	if (ListCount) { 

		if (Number > MaxNumber) {

			//It's not in the list - let's just slot it at the end
			ListCount++;
			List = (int*)realloc(List, ListCount * sizeof(int));
			List[ListCount-1] = MaxNumber = LastNumber = Number;
			return false;

		} else if (Number < MinNumber) {

			//It's not in the list - let's just slot it at the start.
			ListCount++;
			List = (int*)realloc(List, ListCount * sizeof(int));

			//Move all the numbers up. (how can I avoid this??)
			for (int i = ListCount - 1; i > 0; i--) List[i] =
List[i-1];
			List[0] = MinNumber = LastNumber = Number;

			return false;

		} else if (Number == LastNumber) {

			//This is the last number they added. Nice optimization.
			return true;

		}

		//Perform binary search.
		Index = ListCount / 2;
		int MaxDivisor = ListCount * 2; //When divisor is higher
than this, we stop the binary search.
		for (int TestCount = 2; ; TestCount++) {

			if (List[Index] > Number) {

				int Divisor = pow (2, TestCount);
				if (Divisor > MaxDivisor) break;
				Index -= round((double)ListCount / Divisor);

			} else if (List[Index] < Number) {

				int Divisor = pow (2, TestCount);
				if (Divisor > MaxDivisor) {
					Index++; //Important.
					break;
				}
				Index += round((double)ListCount / Divisor);

			} else {
				return true;
			}
		}

	} else {
		
		//First time we've added a number. Let's set these.
		MinNumber = MaxNumber = Number;
		Index = 0;

	}

	//Add the lil' sucker in. Index marks the spot we should
slot it in to.
	ListCount++;
	List = (int*)realloc(List, ListCount * sizeof(int));
	for (int i = ListCount - 1; i > Index; i--) 
		List[i] = List[i-1];
	List[Index] = LastNumber = Number;
	
	return false;
}




////////////
Any comments would be appreciated. I've stress tested the
above code with random numbers and it seems to be reliable
and pretty fast, I just want to know if there are any other
optimizations I could make. (Or if this is just totally
wrong)

Cheers,
-blowfly
 
Without digging into your code but only focus on &quot;remember a list of numbers&quot; I'd recommend std::set.

typedef std::set<int> NumberTracker;

NumberTracer nt;

Set: nt.insert(45)
Query if its set: if (nt.find(24)!=nt.end())

1) find/insert are quite fast (O(log n)) as it is implemented as a red-black tree.
2) As it is a tree it is also &quot;automatically&quot; sorted, just ++ your way from nt.begin() to nt.end()


/Per

if (typos) cout << &quot;My fingers are faster than my brain. Sorry for the typos.&quot;;
 
Thanks Per,

Sorry - I haven't used Templates much so I don't know how to get this working.

I entered the following code to a MFC console program:

typedef std::set<int> CNumberTracker;

And got these errors:

: error C2039: 'set' : is not a member of 'std'
: error C2143: syntax error : missing ';' before '<'
: error C2143: syntax error : missing ';' before '<'



Sorry for the trouble, any idea what i'm doing wrong?
 
>Sorry - I haven't used Templates much so I don't know how to get this working

There's a first time for everyone. Hopefully you'll soon realize how cool templates are ....

You probably just forgot a
#include <set>




/Per

if (typos) cout << &quot;My fingers are faster than my brain. Sorry for the typos.&quot;;
 
Solution 2:
As you mentioned is it perhaps a little waste of space (actually a lot waste if we talk about millions of elements) and it would be more efficient to just store a bit thats stating is its set or not. Also setting and quering can be done in constant time as we use the index immedeiately (no searching).

stl has a template class for that, std::bitset.

I made a similar class (before I realized STL had it). Use at will. It performs no range checks, but add those if you need 'em:
Code:
// ------------------------------------------------------------
// class BoolArray. 
// An array of bools (surprised?). The bools are internally
// stored as the bits of an array of BOOL_HOLDER_ELEMENTs.
// ------------------------------------------------------------

// 64 bit storage works fine, but is slightly slower on a 32 bit system.
// Uncomment to activate 64 bools per element storage
//#define USE_64_BIT_STORAGE

#ifdef USE_64_BIT_STORAGE	
	typedef unsigned __int64 BOOL_HOLDER_ELEMENT;
	#define BOOLS_PER_ELEMENT 64
	#define DIV_BY_BPE(x) ((x)>>6) // x >> 6 == x / 64
#else
	typedef unsigned __int32 BOOL_HOLDER_ELEMENT;
	#define BOOLS_PER_ELEMENT 32
	#define DIV_BY_BPE(x) ((x)>>5) // x >> 5 == x / 32
#endif

class BoolArray 
{
public:
	//------------------------------------------------------------------
	// Public Construction
	//------------------------------------------------------------------
	// Constructor.
	// size - max number of bits array shall handle
	// Will internally be rounded up to nearest BOOLS_PER_ELEMENT-1 number.
	// Ie if (size==37) highest bit number is 63	BoolArray(int size):mSize(DIV_BY_BPE(size)+1),mArray(0)
	{
		mArray = new BOOL_HOLDER_ELEMENT[mSize];
	}

	// Destructor 
	~BoolArray() 
	{
		delete [] mArray;
		mArray = 0;
	}

	//------------------------------------------------------------------
	// Public Commands
	//------------------------------------------------------------------

	// Provide functionality to rapidly fill the array with some value
	// using a __int32 variable.
	void Initialize(__int32 initValue)
	{
		memset(mArray,initValue,mSize* sizeof(__int32)*sizeof(BOOL_HOLDER_ELEMENT)/sizeof(__int32));
	}

	// Set bit value
	void Set(unsigned int i)
	{
		unsigned int index;
		BOOL_HOLDER_ELEMENT bit;
		GetIndexAndBit(i,index,bit);

		mArray[index] |= bit; // Set : OR it in
	}

	// Clear bit value
	void Clear(unsigned int i)
	{
		unsigned int index;
		BOOL_HOLDER_ELEMENT bit;
		GetIndexAndBit(i,index,bit);

		mArray[index] &= ~bit;// Clear : AND-INVERSE it out
	}

	//------------------------------------------------------------------
	// Public Queries
	//------------------------------------------------------------------
	bool Get(unsigned int i) const
	{
		unsigned int index;
		BOOL_HOLDER_ELEMENT bit;
		GetIndexAndBit(i,index,bit);

		return (mArray[index] & bit)!=0;
	}

	// Get operator. Ie foo = boolArray[x]
	bool operator[](unsigned int i) const
	{
		return Get(i);
	}

private:
	//------------------------------------------------------------------
	// Disabled Methods
	//------------------------------------------------------------------
	// Default constructor
	BoolArray();
	// Copy constructor
	explicit BoolArray(const BoolArray&);
	// Assignment operator
	BoolArray& operator = (const BoolArray&);

	//------------------------------------------------------------------
	// Private Helper Methods
	//------------------------------------------------------------------
	// GetIndexAndBit
	// Given a bit index, i, it returns
	//   index : Index to the BOOL_HOLDER_ELEMENT array, and
	//   bit   : A BOOL_HOLDER_ELEMENT with the aproperiate bit set.
	static void GetIndexAndBit(unsigned int i, unsigned int& index, BOOL_HOLDER_ELEMENT& bit) 
	{
		index = DIV_BY_BPE(i); // index = i / BOOLS_PER_ELEMENT;
		bit = (BOOL_HOLDER_ELEMENT)1 << (i % BOOLS_PER_ELEMENT);
	}

	//------------------------------------------------------------------
	// Members
	//------------------------------------------------------------------
	int mSize;
	BOOL_HOLDER_ELEMENT* mArray;

}; // class BoolArray

/Per

if (typos) cout << &quot;My fingers are faster than my brain. Sorry for the typos.&quot;;
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top