Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How to search and extract byte patterns from byte array

Status
Not open for further replies.

muneebel

Technical User
Feb 12, 2008
3
GB
Hi All

I have an array of bytes containing byte values of

0 0 0 1 N N N 0 0 0 1. could anyone sugest the algorithm how to extract the N N N bytes from that byte array. Thanks
 
When you say "extract" do you mean copy them to a new array, or also delete them from the original array?
Are the N bytes always in the same location? i.e. Is the pattern like this:
Code:
0 0 0 1 N N N 0 0 0 1 0 0 0 1 N N N 0 0 0 1 0 0 0 1 N N N 0 0 0 1 0 0 0 1 N N N 0 0 0 1...
If so, all you need is a loop like this:
Code:
size_t j;
size_t pos = 0;

for ( size_t i = 0; i < sizeof( array ); ++i )
{
   j = (i + 1) % 11;

   if ( (j > 4) && (j < 8) )
   {
      newArray[pos++] = array[i];
   }
}
 
Hi

Thanks for the reply. Let me make it clear

I have byte sequence with byte pattern

0 0 0 1 N N N N 0 0 0 1 N N N N EOF

The N N N bytes can be any number. The 0 0 0 1 pattern is fixed. so basically I need to search for and extract these N N N bytes. By extractin I mean copying all the N N N bytes to another array. so I have an array with the sequence of N N N bytes. I know the number of these 0 0 0 1 pattern. so how would I copy all the N N N bytes from this array to another array. Any help will be higly appreciated. Thanks
 
OK, so if I'm understanding correctly, for every 8 numbers, the last 4 are N's right? The pattern is basically:
Code:
(4 junk)(4 N's)(4 junk)(4 N's)...
Then it's simple:
Code:
size_t pos = 0;
for ( size_t i = 4; i < sizeof( array ); i += 4 )
{
    newArray[pos++] = array[i++];
    newArray[pos++] = array[i++];
    newArray[pos++] = array[i++];
    newArray[pos++] = array[i++];
}
 
Hi

well the byte pattern is like this

0 0 0 1 N N N N 0 0 0 1 N N N N N EOF

so the number of N bytes could any . its only the fixed 0 0 0 1 that identifies what N N N N N bytes i have to copy.
so the N N N bytes are sandwiched between two 0 0 0 1 pattern and the last N N N N N N bytes are sandwiched between the last 0 0 0 1 pattern and the EOF.

so basically I need to do 2 things

1. determin how many 0 0 0 1 byte pattern are there in the file

and

2. extract only the N N N bytes and copy these bytes to memory or array where they can be processed.

I hope it makes it clear. I am still working on it and your algo has proved helpful. Any further help in this regard will be appreciated. Thanks
 
There will be many more efficient ways (cf Boyer-Moore string searching), but the obvious approach is:

Move forwards until you hit "0" or EOF.
If EOF, stop.
If "0", look to see if the next 3 bytes are "0 0 1"
If not, go back to step 1
If yes, copy bytes "N" until you next encounter "0", before going back to step 1.

Of course it is slightly more complicated if the series "N" can contain "0" provided it's not part of "0 0 0 1". If so, you can use a simple variant:

Move forwards until you hit "0" or EOF.
If EOF, stop.
If "0", look to see if the next 3 bytes are "0 0 1"
If not, go back to step 1
If yes, remember this place as "START"
Repeat the hunt for EOF or "0".
remember this place as "STOP"
Copy bytes from "START" to "STOP"
If EOF, all finished, else back to step 1

 
How about this way?

For example, your unknown sequence is

0 0 0 1 N N N N N 0 0 0 1
0 0 0 1 0 0 0 0 1 0 0 0 1

Since you know the mark is "0 0 0 1" but only the length of consecutive N is unknown. Let's do an XOR with your sequence with "0 0 0 1" for the beginning to the end

Step 1. 0 0 0 1 0 0 0 0 1 0 0 0 1
XOR 0 0 0 1
------------------------------------
1 1 1 1

Step 2. We detect the first 4 bytes are "1 1 1 1", so this is a legit mark. We can safely move to the right by another 4 bytes.

Step 3. Now
0 0 0 1 0 0 0 0 1 0 0 0 1
XOR 0 0 0 1
-----------------------------------
1 1 1 0

the result is not "1 1 1 1", this is not a mark. At least the first byte must be part of N. So N = 0 (the 3nd 0 in the original sequence).

Step 4. Now we are not sure if the next 4 bytes are mark. So we move to the right by one byte only:
0 0 0 1 0 0 0 0 1 0 0 0 1
XOR 0 0 0 1
-----------------------------------
1 1 1 1

We see another mark again. So there is only one N=0 between these two detected marks.

Step 5. XOR results can tell more of the pattern depending on how many ones in the XOR results. For example after XOR, the result is "1 1 0 0", we know right away that the four bytes are all "N". Then we can move directly by 4 bytes.

 
Do these arrays contain the value 1 or the character '1'? If it is characters, just use strtok or strstr to get to the end of the header sequence if you're not too bothered about thread safety.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top