Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Calculating n, if the nth bit in a mask is set. 4

Status
Not open for further replies.

ChrisDanson

Programmer
Sep 3, 2006
8
GB
Hi,

I'm after a really efficient way of calculating n, given that the nth bit only is set in a bitmask. I have a 24bit mask with 1 bit set in it (it could be any bit). I can think of long winded ways to do it like bitshifting 1 to the left until a bitwise And returns true, but that is not efficient enough for my purposes. I'm going to go and read about the log function as I'm wondering if that will help. I guess a look up table would be ok for small masks but not for ones this big.

Can anyone shed light?

Thanks in advance.

Chris
 
> I'm going to go and read about the log function as I'm wondering if that will help.
That will be truly horrid if you're after performance.

> I guess a look up table would be ok for small masks
If 128 entries is enough, how about
Code:
if ( code >= 0x10000 ) return table[code>>16];
if ( code >= 0x100   ) return table[code>> 8];
return table[code];

Which processor are you running this on?
Would you object to inline assembler?
Some processors have single-instruction find-first-bit-set instructions.

--
 
Hi Salem,

Thanks for the reply. OK so logs are not going to give good performance and unfortunately the lookup table would need to be so large that it would not be practical. I like the idea of the inline assembler but this code has to be platform and processor independant. I will think on. Thanks for your time and suggestions.

Rgds,

Chris
 
What about breaking it into halfs? I believe it should take no more than 5 or 6 comparisons to find the correct bit. Something like this:
Code:
int bitmask = 0x100000;  /* 24-bit mask. */

if ( bitmask & 0xFFF000 )  /* Test high 12 bits. */
{
   if ( bitmask & 0xFC0000 )  /* Test high 6 bits. */
   {
      if ( bitmask & 0xE00000 )  /* Test high 3 bits. */
      {
         if ( bitmask >= 0xC00000 )  /* Test high 2 bits. */
         {
            if ( bitmask == 0x800000 )  /* Test highest bit. */
            {
               return 24;
            }
            else
            {
               return 23
            }
         }
         else
         {
            return 22;
         }
      }
      else  /* It's in the low 3 bits. */
      {
         if ( bitmask & 0x180000 )  /* Test high 2 bits. */
         {
            if ( bitmask == 0x100000 )  /* Test highest bit. */
            {
               return 21;
            }
            else
            {
               return 20;
            }
         }
         else
         {
            return 19;
         }
      }
   }
   else  /* It's in the low 6 bits. */
   {
      if ( bitmask & 0x038000 )  /* Test high 3 bits. */
      {
         ...
      }
      else
      {
         ...
      }
   }
}
else  /* It's in the low 12 bits. */
{
   ...
}
 
> unfortunately the lookup table would need to be so large that it would not be practical
How is 128 bytes a large lookup table?

How many extra instructions do you think more verbose code like cpjust's unrolled code would take, and how many bytes will those instructions take?

How about a 32-byte lookup table which determines the most significant bit in 6 bits, then my code only needs one extra statement.
Code:
if ( code >= 0x40000 ) return table[code>>18];
if ( code >= 0x1000  ) return table[code>>12];
if ( code >= 0x40    ) return table[code>> 6];
return table[code];

--
 
Thanks Salem. I should have spent more time understanding your first response. I've tried it out and my code runs quicker than before. Your help is greatly appreciated.
 
Cheers CP. Nice solution. I hadn't thought of doing it like that. Thanks for your time and help.
 
(1) ChrisDanson, in your original posting, you state that only 1 bit of the 24bit mask is set. If that is the case, have you considered whether the data should really be in the form of a bit-mask, or whether it would be more efficient to store the data as a simple byte, which not only shortens the data, but also avoids any need to find the position of the set bit?

(2) Just for a laugh, here is another suggestion for a possibly-worth-trying-reasonable-performance version (haven't tried it so it might be dreadful) with no conditional jumps whatsoever, and no large lookup tables. It relies utterly on there being only 1 bit in the mask that is set. I note that 24 bits is 3 bytes, and therefore treat it this way: In psedocode:

Two lookup tables of 256 bytes.
LookupOne contains the positions of bits in one byte (so 1 returns 1, 2 returns 2, 4 returns 3, 8 returns 4 etc. etc).
LookupTwo contains 0 in byte 0; 8 in all numbers where only 1 bit is set, and 16 in all numbers where more than 1 bit is set.

PartOne = byteA or byteB or byteC
PartTwo = byteB - byteC
Answer = LookupOne[PartOne] + LookupTwo[PartTwo]

The idea is that PartOne gives you the position within a single byte.

byteB - byteC will be zero if the bit was set in byteA. It will be a simple single-bit number if byteC is zero but the bit was set in byteB. If byteC contains the set bit but the other two bytes are zero, the answer will contain a lot of set bits.
 
Hi Lionel,

Thanks for your interesting post. The bitmask I'm using is actually a 32 bit mask for a debug manager. The top 8 bits identify a debug group (1 of a possible 8), while the remaining 24 bits are used to identify debug channels. That gives me 8 * 24 channels of debug which is about right for the size of the codebase I'm working on.

When a call is made to the debug manager, it's a quick process to see if (a) the group is switched on .. and (b) the channel is switched on (simply using bitwise Ands against a stored configuration of channel/group states). The time consuming part comes when, (if the group and channel are both switched on) I need to access the name of the debug channel from an array of 192 names. This is where the mapping of Nth bit to N comes in. (well N-1 actually, but that's not significant)

I like your thinking in terms of the subtraction and the way you logically broke the problem down. Referring to my C book it says that subtraction is the most efficient operation you can do with array access not far behind it.
 
Your question just brought back happy memories of high-performance code and Michael Abrash's articles and book.

I'd be deeply wary of guessing which approach is fastest without timing real code. I'd also expect that you'd need a really critical application before you'd have any problems with Salem's or cpjust's very logical, simple and appropriate approaches.

But so far as I know, your book is right. If your C-compiler is for standard PCs, then the quickest instructions are moves of data in and out of memory, simple arithmetic, and logic. Each is a single-cycle instruction, which the processor can handle in parallel with other instructions. Array searches are only slightly messier because the processor needs to load the address of the array (one simple instruction), add the necessary offset (one or two simple instructions) and collect the result (one simple instruction).

Incidentally, I think the 8086-style processors have a special instruction for finding the position of the first bit set in a register, but I can't remember, and it's probably one of the slower instructions.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top