Need to find byte position in a text file of a string

Status
Not open for further replies.

eholdo

Technical User
May 30, 2002
31
US
I need to determine the byte position at which a string starts, for use in a csh script. I'm going to use
dd skip=NNN bs=1 if=huge-file of=not-so-huge-file
to skip the first NNN one-byte blocks of a very large file, so that only the second half of the file is written out.

Any ideas on either part? csplit won't work for this: it makes the split, but the second half of the file comes out about 60% smaller than it should be.
Thanks
Erik
 
Can you show your text file and the example output?

tikual
 
<xml version=?><xpif><xpif-v2000.dtd"></xpif>%PDF-1.4
%âãÏÓ
1 0 obj
<<
/Type /Catalog
/Pages 2 0 R
/PageMode /UseNone
/ViewerPreferences << /FitWindow true /PageLayout /SinglePage /NonFullScreenPageMode /UseNone >>
/Outlines 71 0 R
/Metadata 232 3 R
/AcroForm 265 1 R
>>
endobj
2 0 obj
<<
/Type /Pages
/Kids [ 158 0 R 157 0 R 212 0 R ]
/Count 22
/MediaBox 3 0 R
/CropBox 4 0 R
>>
endobj
3 0 obj
[
0 0 612 792
]
endobj
4 0 obj
[
0 0 612 792
]
endobj
5 0 obj
<< /Length 354 /Filter [ /FlateDecode ] >>
stream
xœ]‘ÉNÃ0@ïþŠa1ip\Ï؉7$iӖ‰Kn,HÑ ¿³”òe4zóf1N1ôOâ¥NÃãŽaŸÜoÇà~Ãê†


This is an example; the actual file is 2 MB. If I use csplit, the middle part of the file is dropped from the resulting file. I want to remove everything from the beginning of the file up to the /xpif> string. Any ideas?
 
I'm not sure whether I've understood you correctly, but try this:

sed solution:

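cat test.sed
# From the first line containing /xpif> through the end of the file:
# (> is left unescaped; GNU sed treats \> as an end-of-word anchor)
/\/xpif>/,${
# delete everything up to and including /xpif>, then print the line
s/^.*\/xpif>//
p
}

Run it with either
sed -n -f test.sed yourfile
or
sed -n '/\/xpif>/,${s/^.*\/xpif>//;p;}' yourfile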

tikual
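
As a quick check of the sed approach above, here is a minimal sketch on a three-line text sample. The function name strip_header and the sample contents are made up for illustration, and the sketch assumes the file is plain text with no NUL bytes.

```shell
#!/bin/sh
# strip_header FILE  (hypothetical helper name)
# From the first line containing "/xpif>" to the end of the file, delete
# everything on a line up to and including "/xpif>" and print the line.
# ">" is left unescaped because GNU sed reads "\>" as an end-of-word anchor.
strip_header() {
    sed -n '/\/xpif>/,${s/^.*\/xpif>//;p;}' "$1"
}

# Demo on a throwaway sample file.
sample=$(mktemp)
printf '%s\n' 'junk' '<xpif>x</xpif>%PDF-1.4' '1 0 obj' > "$sample"
strip_header "$sample"
# prints:
#   %PDF-1.4
#   1 0 obj
rm -f "$sample"
```

Note that the substitution is attempted on every line from the first match onward, so a later line that also contains /xpif> would be trimmed as well.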
 
One reason csplit may truncate the data is the presence of a NUL (\000) byte. Many *nix string-oriented utilities fail on such input because they treat \000 as an end-of-string marker.
In that case, you may need to write a small C program that searches for the /xpif> string and then does binary reads and writes to achieve your goal.

Hope this helps
PH.
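
On systems with GNU grep, the byte position asked for in the original question can probably be found without a C program. The following is only a sketch under that assumption: -a, -b and -o are not POSIX options, the helper names byte_offset and copy_after_marker are invented here, and the marker is assumed to be plain ASCII.

```shell
#!/bin/sh
# byte_offset STRING FILE  (hypothetical helper name)
# Print the 0-based byte offset of the first occurrence of STRING in FILE.
# Assumes GNU grep: -a reads binary data as text, -b prints byte offsets,
# -o reports the match itself, -F takes STRING literally.
byte_offset() {
    LC_ALL=C grep -abo -F -- "$1" "$2" | head -n 1 | cut -d: -f1
}

# copy_after_marker STRING SRC DST  (hypothetical helper name)
# Write everything that follows the first occurrence of STRING to DST.
copy_after_marker() {
    marker=$1 src=$2 dst=$3
    start=$(byte_offset "$marker" "$src")
    [ -n "$start" ] || { echo "marker not found: $marker" >&2; return 1; }
    # tail -c +N begins at byte N counted from 1, hence the extra +1.
    tail -c +"$((start + ${#marker} + 1))" "$src" > "$dst"
}

# Demo on a throwaway file with a NUL byte after the marker.
demo=$(mktemp)
printf '<xpif>x</xpif>%%PDF-1.4\n\000rest\n' > "$demo"
byte_offset '/xpif>' "$demo"    # prints 8
rm -f "$demo"
```

The value to pass to dd skip=NNN bs=1 would be the printed offset plus the length of the string (8 + 6 = 14 in the demo); tail -c copies the same bytes without reading one byte at a time.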
 
In SOME cases 'tr' can be useful:

1) Replace each null character with another character that does not already occur in the file.

tr '\0' '\377' <input >temp # for example

2) Work with the temp file.

3) Restore the null characters in the result file:

tr '\377' '\0' <result >output # for example

Jean Pierre.
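
The three steps above can be combined into one pipeline. This is a rough sketch rather than a tested recipe for every system: the helper name nul_safe is invented, byte \377 is the same example placeholder used above, and LC_ALL=C is added on the assumption that the tools should treat each byte as one character.

```shell
#!/bin/sh
# nul_safe SRC DST COMMAND [ARGS...]  (hypothetical helper name)
# Run a text filter over data that contains NUL bytes by mapping NUL to
# byte 0377 before the filter and mapping it back afterwards.
nul_safe() {
    src=$1 dst=$2
    shift 2
    # Refuse to run if the placeholder byte already occurs in the input,
    # since the reverse mapping would then corrupt it.
    if [ $(LC_ALL=C tr -d -c '\377' < "$src" | wc -c) -gt 0 ]; then
        echo "input already contains byte 0377; pick another placeholder" >&2
        return 1
    fi
    LC_ALL=C tr '\000' '\377' < "$src" | LC_ALL=C "$@" | LC_ALL=C tr '\377' '\000' > "$dst"
}
```

For example, nul_safe input output sed 's/^.*\/xpif>//' would apply the substitution to every line while preserving the NUL bytes. Binary data such as compressed PDF streams may well contain \377 already, in which case the check fails and a different placeholder is needed.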
 
Thanks, everyone, for the help. I found a solution. It turns out that every line I want to remove contains certain tagged text. Running fgrep -v on that string a couple of times (to catch all of the lines) leaves me with what I need.

I really appreciate everyone's help, as each solution teaches me a bit more.

Cheers
Erik
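
For reference, the line-filtering idea could look roughly like this. The thread does not say which strings were used, so TAG-A and TAG-B below are hypothetical placeholders, as is the helper name drop_tagged. The sketch removes all tagged lines in a single pass instead of several fgrep runs, and it assumes a grep that accepts -a (a GNU/BSD extension) so that binary content is not summarized as "Binary file matches".

```shell
#!/bin/sh
# drop_tagged FILE STRING [STRING...]  (hypothetical helper name)
# Print FILE without the lines that contain any of the fixed STRINGs.
# grep -F is the POSIX spelling of fgrep; a newline-separated pattern
# list lets one invocation exclude several strings at once.
drop_tagged() {
    src=$1
    shift
    LC_ALL=C grep -a -F -v -e "$(printf '%s\n' "$@")" "$src"
}

# Demo with placeholder tags on a throwaway file.
demo=$(mktemp)
printf '%s\n' 'keep1' 'TAG-A drop me' 'keep2' 'drop TAG-B too' > "$demo"
drop_tagged "$demo" 'TAG-A' 'TAG-B'
# prints:
#   keep1
#   keep2
rm -f "$demo"
```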
 