Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Search Byte Array for Pattern 1

Status
Not open for further replies.

AnonGod

IS-IT--Management
Jun 5, 2000
223
0
0
I'm looking for something that I can plug 2 byte arrays into and get an index returned of the first occurrence of the first array in the second array.

Found this link that does what I want it to do, but I do not understand the use of the failure array.


I see failure[] is an array that matches the size of the pattern array and contains 0,1,2,3... if the first element in the pattern array matches the 2nd, the 2nd matches the 3rd, etc... and will output all 0's if the 1st does not match the 2nd.

It then uses this in the indexOf method to determine how far to "backup" in the pattern array when part of the pattern array is not found in the data array.

So, I guess what I'm not understanding conceptually is how the failure[] array works. I would be grateful for an explanation and sample data that illustrates how it works.

Thank you!
:)

Here is my sample data:
Code:
    public static void main(String[] args)
    {
	byte[] data = {1,2,3,4};
	byte[] pattern = {3,4};
	int result = indexOf(data,pattern);
	System.out.println("Result: " + result);
    }
    
    public static int indexOf(byte[] data, byte[] pattern)
    {
	int[] failure = computeFailure(pattern);
	int j = 0;
	if (data.length == 0) return -1;
        for (int i = 0; i < data.length; i++) {
            while (j > 0 && pattern[j] != data[i])
            {
                j = failure[j - 1];
            }
            if (pattern[j] == data[i]) { j++; }
            if (j == pattern.length)
            {
                return i - pattern.length + 1;
            }
        }
        return -1;
    }
	
	/**
* 	Computes the failure function using a boot-strapping process,
* 	where the pattern is matched against itself.
	*/
    private static int[] computeFailure(byte[] pattern)
    {
        int[] failure = new int[pattern.length];
        int j = 0;
        for (int i = 1; i < pattern.length; i++)
        {
            System.out.println("i: " + i);
            while (j > 0 && pattern[j] != pattern[i])
            {
                System.out.println("j: " + j);
                j = failure[j - 1];
            }
            if (pattern[j] == pattern[i])
            {
                j++;
            }
            failure[i] = j;
        }
    return failure;
    }

alien.gif

[Signature modified by admin request]
-TAG

'Drawing on my fine command of language, I said nothing.'
 
I'm not usually a fan of posting code solutions, but as it happens, I wrote a similar thing a couple of days ago.

It returns a boolean to see if one byte array is contained within another, but you should easily be able to modify it to return the desired index.

Here it is :

Code:
	static boolean matchData(byte[] srcData, byte[] dataToFind) {
		int iDataLen = srcData.length;
		int iDataToFindLen = dataToFind.length;
		boolean bGotData = false;
		int iMatchDataCntr = 0;
		for (int i = 0; i < iDataLen; i++) {
			if (srcData[i] == dataToFind[iMatchDataCntr]) {
				iMatchDataCntr++;
				bGotData = true;
			} else {
				if (srcData[i] == dataToFind[0]) {
					iMatchDataCntr = 1;
				} else {
					iMatchDataCntr = 0;
					bGotData = false;
				}
				
			}
	
			if (iMatchDataCntr == iDataToFindLen) {
				return true;
			} 
		}
	
		return false;
	}

	public static void main(String args[]) throws Exception {
		System.out.println(matchData(new byte[] { 1,2,3,4 }, new byte[] { 2,3,1 }));
	}

--------------------------------------------------
Free Java/J2EE Database Connection Pooling Software
 
Excellent, thank you for the sample. The failure array in my original post provided some kind of error checking, but I'm not sure if it was needed or not in any situation.

:) -Andrew

alien.gif

[Signature modified by admin request]
-TAG
[Contact information removed by another admin request]
 
Just as a matter of curiosity, transforming them into Strings and use indexOf is not an option?

Cheers,
Dian
 
Not if \0 is valid data. The string functions will stop there without checking the rest of the string.
 
I think you're confusing C/C++ with Java there Miros ...

--------------------------------------------------
Free Java/J2EE Database Connection Pooling Software
 
Ok, ascii character 0. \0 is just a handy shorthand.
 
Ermmmm, no they are not - one is a zero, and the other is a null terminator - very different indeed.

In either case, java will not stop enumerating a string just because it finds one of those characters.


--------------------------------------------------
Free Java/J2EE Database Connection Pooling Software
 
Miros, in Java Strings are objects part of whose state is responsible for tracking the number of characters in the 'string' it represents. It is not an arbitrary vector of character bytes terminated by a zero value like in c/c++.

Tim
 
Ah, I have too many languages in my head!

I'd still feel safer knowing my 0-bytes are in a byte array, not a string. That triggers the thought for me that this ain't just printable ASCII when I have to go back and look at the code months later, especially if I'm switching back and forth between languages.
 
Ah, I have too many languages in my head!

[bigsmile] Sometimes I long for the days 24 years ago when I only knew Spectrum BASIC.

Tim
 
Strings in Java do not neccessarily require printable characters, and and neither do char* in C/C++ - so there should be no confusion ! Either can contain non-printable characters, because the char value (under intel x86 GCC and Sun JDK) are 8 bytes in C, and 16 bytes in Java ...

--------------------------------------------------
Free Java/J2EE Database Connection Pooling Software
 
I've considered the String's .indexOf method, but that would require a whole seperate chunk of memory to convert the byte array to a string. Double the memory, not double the fun.

:) -Andrew

alien.gif

[Signature modified by admin request]
-TAG
[Contact information removed by another admin request]
 
Code:
I've considered the String's .indexOf method, but that would require a whole seperate chunk of memory to convert the byte array to a string.  Double the memory, not double the fun.
That's why they pay us the big bucks... we think about issues like processing time vs storage space...

Code:
Strings in Java do not neccessarily require printable characters, and and neither do char* in C/C++  - so there should be no confusion ! Either can contain non-printable characters, because the char value (under intel x86 GCC and Sun JDK) are 8 bytes in C, and 16 bytes in Java ...

However, all the C/C++ routines that work on char* parameters assume that \0 ends the string. If his data contains one, but doesn't end the string, your search is going to end prematurely! (Just to reopen a nearly closed can of worms, don't bother responding.)
 
What's the size of your byte array? Nowadays, memory is not an issue on most systems.

A String has internally an array of bytes, so you can avoid the double memory by reading the data directly into a String. And I bet the indexOf method is failry optimized, more than what you can achieve.

Another argument, and that's what engineering is about is actual cost vs achieved result: you got a piece of code that you need to test and optimize, and maybe you have the same provided with the JDK but with the effort spent on it.

Cheers,
Dian
 
But comment clearly what you're doing and put a note "Don't try this in C/C++" in case someone tries to cannibalize your code...
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top