Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

finite state machines

Status
Not open for further replies.

tokra

Technical User
Feb 20, 2003
45
GB
I have an application where I have to read in large quanties of data from a file. The records are mixed binary and ascii having no standard format save a 4 charachter delimiter at both the beggining and end of a record. The delimiter is assummed but not garunteed to only occur there and not in data. The algorithim I'm using now is slow and I'd like to improve performance. I had heard the Finite State Machines were fast and particularly well suited to pattern mathcing. Unfortunatly when I tried to implement one I came up with some of the ugliest code ever written by man.
My questions are:
1>Is there a simpler way of doing reading and splitting the data with the standard library
2>if a finite state machine is indeed the fastest way of doing this how does one go about implementing them with code that isn't an eyesore.
Code:
std::vector<std::string> ans;
//states
//0 just reading and discarding characters
//1 found 1st char
//2 found 2nd
//3 found 3rd
//4 reading and storing characters
//5 found 1st
//6 found 2nd
//7 found 3rd
int state=0;
std::string temps;
char tempc;
while (!in.eof())
{
tempc=in.getchar()
switch(state)
{
0:if (tempc=='J') state=1;break;
1:if (tempc=='R') state=2;break;
2:if (tempc=='B') state=3; break;
3:
if (tempc=='P') 
{state=4;temps="JRBP";}
break;
4:
temps+=tempc;
if (tempc=='E') state=5;
break;
5:
temps+=tempc;
if (tempc=='O') state=6;
break;
6:
temps+=tempc;
if (tempc=='W') state=7;
break;
7:
temps+=tempc;
if (tempc=='T') 
{
ans.push_back(temps);
state=0;
}
break;
default:
throw;//this should never occur
}

}

WR
 
The code above isn't compiled i forgot the cases (Shows how much i use a switch)

WR
 
The records are mixed binary and ascii having no standard format save a 4 charachter delimiter at both the beggining and end of a record. The delimiter is assummed but not garunteed to only occur there and not in data.

Well, this can be a problem unless each record is somehow uniquely identified (i.e. a single file that you read in its entirety, a return value from a function that ensures a full binary/ascii return of the entire record, etc..)

If that is the case, you have a unique situation. Because "JRBP" is 4 characters, it is also a 32 bit value.

Knowing the size of your BYTE array, you can verify that

char ptr[] = "JRBP";

*( (int*)array) == *( (int*)ptr)

that will check the beginning but there may be an endianism-ness to it... I doubt it because it is dealing with characters.

For the end of the record you can start at the end of your array and move backwards by one until you achieve the same comparison. DO NOT FORGET TO START AT array[number_of_elements - 4] when beginning your reverse search.


Matt
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top