How to search and extract byte patterns from byte array

muneebel · Feb 12, 2008

Hi All

I have an array of bytes containing byte values of

0 0 0 1 N N N 0 0 0 1. could anyone sugest the algorithm how to extract the N N N bytes from that byte array. Thanks

cpjust · Feb 12, 2008

When you say "extract" do you mean copy them to a new array, or also delete them from the original array?
Are the N bytes always in the same location? i.e. Is the pattern like this:

Code:

0 0 0 1 N N N 0 0 0 1 0 0 0 1 N N N 0 0 0 1 0 0 0 1 N N N 0 0 0 1 0 0 0 1 N N N 0 0 0 1...

If so, all you need is a loop like this:

Code:

size_t j;
size_t pos = 0;

for ( size_t i = 0; i < sizeof( array ); ++i )
{
   j = (i + 1) % 11;

   if ( (j > 4) && (j < 8) )
   {
      newArray[pos++] = array[i];
   }
}

muneebel · Feb 15, 2008

Hi

Thanks for the reply. Let me make it clear

I have byte sequence with byte pattern

0 0 0 1 N N N N 0 0 0 1 N N N N EOF

The N N N bytes can be any number. The 0 0 0 1 pattern is fixed. so basically I need to search for and extract these N N N bytes. By extractin I mean copying all the N N N bytes to another array. so I have an array with the sequence of N N N bytes. I know the number of these 0 0 0 1 pattern. so how would I copy all the N N N bytes from this array to another array. Any help will be higly appreciated. Thanks

cpjust · Feb 15, 2008

OK, so if I'm understanding correctly, for every 8 numbers, the last 4 are N's right? The pattern is basically:

Code:

(4 junk)(4 N's)(4 junk)(4 N's)...

Then it's simple:

Code:

size_t pos = 0;
for ( size_t i = 4; i < sizeof( array ); i += 4 )
{
    newArray[pos++] = array[i++];
    newArray[pos++] = array[i++];
    newArray[pos++] = array[i++];
    newArray[pos++] = array[i++];
}

muneebel · Feb 19, 2008

Hi

well the byte pattern is like this

0 0 0 1 N N N N 0 0 0 1 N N N N N EOF

so the number of N bytes could any . its only the fixed 0 0 0 1 that identifies what N N N N N bytes i have to copy.
so the N N N bytes are sandwiched between two 0 0 0 1 pattern and the last N N N N N N bytes are sandwiched between the last 0 0 0 1 pattern and the EOF.

so basically I need to do 2 things

1. determin how many 0 0 0 1 byte pattern are there in the file

and

2. extract only the N N N bytes and copy these bytes to memory or array where they can be processed.

I hope it makes it clear. I am still working on it and your algo has proved helpful. Any further help in this regard will be appreciated. Thanks

lionelhill · Feb 19, 2008

There will be many more efficient ways (cf Boyer-Moore string searching), but the obvious approach is:

Move forwards until you hit "0" or EOF.
If EOF, stop.
If "0", look to see if the next 3 bytes are "0 0 1"
If not, go back to step 1
If yes, copy bytes "N" until you next encounter "0", before going back to step 1.

Of course it is slightly more complicated if the series "N" can contain "0" provided it's not part of "0 0 0 1". If so, you can use a simple variant:

Move forwards until you hit "0" or EOF.
If EOF, stop.
If "0", look to see if the next 3 bytes are "0 0 1"
If not, go back to step 1
If yes, remember this place as "START"
Repeat the hunt for EOF or "0".
remember this place as "STOP"
Copy bytes from "START" to "STOP"
If EOF, all finished, else back to step 1

sedawk · Apr 5, 2008

How about this way?

For example, your unknown sequence is

0 0 0 1 N N N N N 0 0 0 1
0 0 0 1 0 0 0 0 1 0 0 0 1

Since you know the mark is "0 0 0 1" but only the length of consecutive N is unknown. Let's do an XOR with your sequence with "0 0 0 1" for the beginning to the end

Step 1. 0 0 0 1 0 0 0 0 1 0 0 0 1
XOR 0 0 0 1
------------------------------------
1 1 1 1

Step 2. We detect the first 4 bytes are "1 1 1 1", so this is a legit mark. We can safely move to the right by another 4 bytes.

Step 3. Now
0 0 0 1 0 0 0 0 1 0 0 0 1
XOR 0 0 0 1
-----------------------------------
1 1 1 0

the result is not "1 1 1 1", this is not a mark. At least the first byte must be part of N. So N = 0 (the 3nd 0 in the original sequence).

Step 4. Now we are not sure if the next 4 bytes are mark. So we move to the right by one byte only:
0 0 0 1 0 0 0 0 1 0 0 0 1
XOR 0 0 0 1
-----------------------------------
1 1 1 1

We see another mark again. So there is only one N=0 between these two detected marks.

Step 5. XOR results can tell more of the pattern depending on how many ones in the XOR results. For example after XOR, the result is "1 1 0 0", we know right away that the four bytes are all "N". Then we can move directly by 4 bytes.

xwb · Apr 6, 2008

Do these arrays contain the value 1 or the character '1'? If it is characters, just use strtok or strstr to get to the end of the header sequence if you're not too bothered about thread safety.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

How to search and extract byte patterns from byte array

muneebel

Technical User

cpjust

Programmer

muneebel

Technical User

cpjust

Programmer

muneebel

Technical User

lionelhill

Technical User

sedawk

Programmer

xwb

Programmer

Similar threads

Part and Inventory Search

Sponsor