Post script filter

mondragon2001sg

Programmer
Mar 17, 2004
5
SG
Hi,

I have a question about a PostScript filter. Given a text file, possibly a non-conforming document, is there any way to write code that determines whether the file is a PostScript file?

Thanks.
 
Yes. Well-formed PostScript programs start with the comment "%!PS".

If the document conforms to the Document Structuring Conventions (DSC), the comment takes the form "%!PS-Adobe-X.0".

Either way, the first four characters are "%!PS".

The exception is print-to-disk files from HP printers, where the PostScript itself might be bracketed by HP PJL commands. You can still scan down a few lines to find the "%!PS", though.
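
A minimal sketch of that header check in Python. The function name, the 20-line scan limit, and the stripping of a leading Ctrl-D or PJL "universal exit" escape are my own assumptions, not anything a specification requires:

```python
def find_postscript_header(data: bytes, max_lines: int = 20):
    """Return the 0-based index of the first line starting with b'%!PS', else None.

    max_lines is an arbitrary cut-off (an assumption) for how far to scan
    past any printer job-control wrapper.
    """
    uel = b"\x1b%-12345X"  # PJL universal exit language escape
    for index, line in enumerate(data.splitlines()[:max_lines]):
        # Some drivers put a Ctrl-D or the PJL escape right before the header.
        candidate = line.lstrip(b"\x04")
        if candidate.startswith(uel):
            candidate = candidate[len(uel):]
        if candidate.startswith(b"%!PS"):
            return index
    return None


wrapped = (b"\x1b%-12345X@PJL JOB\n"
           b"@PJL ENTER LANGUAGE=POSTSCRIPT\n"
           b"%!PS-Adobe-3.0\n")
print(find_postscript_header(wrapped))
print(find_postscript_header(b"just some text\n"))
```

A None result only means the header comment was not found in the first few lines; it says nothing about whether the rest of the file would run.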



Thomas D. Greer
Providing PostScript & PDF
Training, Development & Consulting
 
Thanks. However, I've come across a PostScript file that does not begin with "%!". The file is below; I believe it is used to extract some information from a PostScript printer.

/strx 255 string def
/F { findfont exch scalefont setfont } def
12 /Helvetica F
10 500 moveto
(../../passwd) (r) file /myfile exch def
myfile strx readline
pop show
showpage
 
That code snippet does contain PostScript, but it isn't well-formed. By definition, according to the PostScript Language specification, PostScript programs begin with "%!PS". Period.

That code you posted definitely looks "hand-written" to me.

I doubt it is used to extract information from a PostScript printer, unless that machine has a hard drive.

The code reads a single record from a file named "passwd", and then prints that record on a page.



Thomas D. Greer
 
Yes, the code was "hand-written", and it is used to read from a machine that has a hard drive. The code works when I send it to a PostScript printer for execution. So does that mean it is almost impossible to identify a PostScript file that is not written according to the specification, even though it is executable?

Thanks
 
I'm sorry, I didn't fully understand your question at first. You're asking whether there is something you can do, just by looking at a given file, to tell if it will "run" on a PostScript interpreter, right?

I'll have to think about that. I don't think so, but let's not be premature! I was able to tell your sample was PostScript because I recognized the PostScript operators. I suppose if you have a list of every PostScript operator, you could write a program to search your file for matches.

Or you can winnow it down to a few assumptions. We could assume that "most" PostScript programs contain at least one definition. So you could search for the "def" operator.

"showpage" would be another good candidate. It's certainly possible for PostScript programs to be fully valid and yet never output a page, but the vast majority will use "showpage".

But that's not a good test, really. Save this message to a file and run your search, and it will find both "def" and "showpage", and yet it isn't PostScript. So you have to have an exclusion test too. If you find a word that isn't an operator and isn't part of a definition, then it isn't PostScript.
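
Here is a rough sketch of that exclusion test in Python. The operator list is a tiny sample, and the tokenizer is a simplification of real PostScript syntax (no nested parentheses in strings, for instance); both are assumptions made for illustration:

```python
import re

# Small, deliberately incomplete sample of operator names (an assumption;
# a real filter would load a much fuller list).
SAMPLE_OPERATORS = frozenset("""
    def bind exch pop dup show showpage moveto lineto stroke fill
    findfont scalefont setfont string file readline gsave grestore
""".split())

# One alternative per token kind; characters matching none are skipped.
TOKEN = re.compile(r"""
    %[^\r\n]*                 # comment to end of line
  | \((?:\\.|[^()\\])*\)      # string literal (nested parens not handled)
  | [{}\[\]]                  # procedure and array delimiters
  | /[^\s/{}\[\]()<>%]*       # literal name such as /Helvetica
  | [^\s/{}\[\]()<>%]+        # number or executable name
""", re.VERBOSE)

NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?|\d+#[0-9A-Za-z]+")


def passes_exclusion_test(text):
    """True if every executable word is a known operator or a name the
    file itself introduced with /name, and at least one operator was seen."""
    defined = set()
    operator_hits = 0
    for match in TOKEN.finditer(text):
        tok = match.group(0)
        if tok[0] in "%({}[]":
            continue                      # comments, strings, delimiters
        if tok.startswith("/"):
            defined.add(tok[1:])          # remember names the file introduces
        elif NUMBER.fullmatch(tok):
            continue
        elif tok in SAMPLE_OPERATORS:
            operator_hits += 1
        elif tok not in defined:
            return False                  # unknown word: reject the file
    return operator_hits > 0


print(passes_exclusion_test("/F { findfont exch scalefont setfont } def 12 /Helvetica F showpage"))
print(passes_exclusion_test('Save this message and search for "def" and "showpage".'))
```

With such a short operator list, valid PostScript that uses any other operator gets rejected, so the quality of the list decides how useful the test is.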

The only really valid way is to try to interpret the file. Short of writing your own interpreter, you can use Ghostscript, an open source interpreter.
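
A sketch of that approach in Python, assuming the Ghostscript executable is on the PATH under the name "gs" (on Windows it is usually named differently). The helper names and the 30-second timeout are hypothetical. Since a file like the sample above reads from disk, the sketch passes -dSAFER to restrict file access and sends output to the nullpage device so nothing is rendered:

```python
import shutil
import subprocess


def ghostscript_command(path, gs="gs"):
    """Build a command line that interprets the file and discards all output."""
    return [gs, "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
            "-sDEVICE=nullpage", path]


def interprets_cleanly(path, gs="gs", timeout=30):
    """True if Ghostscript exits with status 0, False on an error or a
    timeout, None if the executable cannot be found."""
    if shutil.which(gs) is None:
        return None
    try:
        result = subprocess.run(ghostscript_command(path, gs),
                                stdout=subprocess.DEVNULL,
                                stderr=subprocess.DEVNULL,
                                timeout=timeout)
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0
```

Note that a clean exit only shows the interpreter raised no error. A file containing nothing but numbers, for example, would also pass, so this works best combined with one of the cheaper checks.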



Thomas D. Greer
 
My plan was actually to use a function to help me decide which type of interpreter to use, with the PostScript interpreter being one of the choices. I had tried looking for keywords such as '/' and 'def', which you have mentioned. However, this test wasn't ideal, since some text files may contain these words. I was planning to make the test more restrictive by searching for more than one keyword and, based on probability, deciding whether a file is PostScript. However, after searching the Net, it seems that nobody has ever done a survey of the words most commonly used in PostScript files. Does anyone have such information, or will I have to find another approach?
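
To make the idea concrete, this is roughly the kind of scoring function I have in mind, sketched in Python. The keyword list and both thresholds are guesses on my part, not measured values:

```python
import re

# Guessed keyword sample; a surveyed or fuller list would replace this.
SAMPLE_OPERATORS = frozenset({
    "def", "bind", "exch", "pop", "dup", "show", "showpage", "moveto",
    "lineto", "stroke", "findfont", "scalefont", "setfont", "string",
    "file", "readline", "gsave", "grestore", "newpath",
})

WORD = re.compile(r"[A-Za-z]+")


def postscript_score(text):
    """Return (density, distinct): the share of words that are known
    keywords, and how many different keywords appear."""
    words = WORD.findall(text)
    if not words:
        return 0.0, 0
    hits = [w for w in words if w in SAMPLE_OPERATORS]
    return len(hits) / len(words), len(set(hits))


def probably_postscript(text, min_density=0.3, min_distinct=3):
    """Both thresholds are arbitrary starting points to be tuned."""
    density, distinct = postscript_score(text)
    return density >= min_density and distinct >= min_distinct


print(probably_postscript("10 500 moveto (hello) show showpage"))
print(probably_postscript("Save this message to a file and run your search."))
```

Requiring several distinct keywords, not just a high count of one, is what keeps ordinary text that happens to mention "file" or "show" from passing.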
 
I have such a list. I put it together to allow my text editor to do syntax checking. It isn't exhaustive, but probably good enough to let you write your filter.

I'll post it on my website.


Keep in mind this is just a list. I don't think you can actually know which terms are "most used" in the universe of PostScript language files. In fact, most PostScript is application-generated, and the first thing most of those programs do is redefine all the "common" operators. So while in effect "bind" or "def" might be used hundreds of times, the code will immediately do something like this:

/bd{bind def}bind def

and thereafter use "bd". So in fact, your search for "bind def" will only return two hits.
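
If you want counts that see through such shorthand, one option is to track the aliases as you scan. The Python sketch below is only an illustration under simplifying assumptions: it recognizes the flat pattern /name{...} followed by "def", "bind def", or an earlier alias that ends in "def", and it ignores nested procedures and everything else an interpreter would do:

```python
import re
from collections import Counter

TOKEN = re.compile(r"%[^\r\n]*|\((?:\\.|[^()\\])*\)|[{}]|/?[^\s/{}\[\]()<>%]+")
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def effective_operator_counts(text):
    """Count executed names, expanding simple aliases like /bd{bind def}bind def."""
    toks = [t for t in TOKEN.findall(text) if t[0] not in "%("]
    aliases = {}          # alias name -> list of names it stands for
    counts = Counter()
    i = 0
    while i < len(toks):
        tok = toks[i]
        if tok.startswith("/") and toks[i + 1:i + 2] == ["{"] and "}" in toks[i + 2:]:
            close = toks.index("}", i + 2)
            body = toks[i + 2:close]
            definer, used = [], 0
            for t in toks[close + 1:close + 3]:
                definer.extend(aliases.get(t, [t]))
                used += 1
                if definer and definer[-1] == "def":
                    break
            if "{" not in body and definer and definer[-1] == "def":
                # Store the body already expanded, so lookups never recurse.
                aliases[tok[1:]] = [n for b in body for n in aliases.get(b, [b])]
                counts.update(definer)   # the defining operators do execute
                i = close + 1 + used
                continue
        for name in aliases.get(tok, [tok]):
            if name not in ("{", "}") and not name.startswith("/") \
                    and not NUMBER.fullmatch(name):
                counts[name] += 1
        i += 1
    return counts


job = """/bd{bind def}bind def
/m{moveto}bd
/s{show}bd
10 10 m (a) s
20 20 m (b) s
showpage
"""
print(job.count("bind def"))
print(effective_operator_counts(job))
```

In this small example a plain text search finds "moveto" once, while the alias-aware count attributes two executions to it through "m".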

Good luck!

Thomas D. Greer
 