Implementing a TGML-like markup for several languages

Status
Not open for further replies.

DrQuincy

Programmer
Jan 28, 2003
5
GB
Hi PHP gurus,

I'm developing a PHP site that will use a lot of source code excerpts, written in several different languages and stored in a MySQL database. I've been on many forums where you can use [] tags to make text bold or italicise it, etc. (Like the TGML on this site!)

How would I go about implementing this in PHP so I could use syntax colouring on the languages I am featuring?

For example I'd be able to type:

[cpp]

/****************************
* UserDefinedDataTypes.cpp *
****************************/
union mix_t{
long l;
struct {
short hi;
short lo;
} s;
char c[4];
} mix; // union variable

[/cpp]

and it would highlight the syntax as though it were a C++ program.

Thanks for your time.
 
TGML is one thing. Syntax highlighting is another.

Actually performing the highlighting is not that hard. Create a table which maps a file extension to a syntax highlight table. Each row of a syntax highlight table holds a keyword and a color. A script wraps the appropriate HTML tags around each keyword found in the appropriate table. The performance trick is to add the highlighting before the data is entered into the database, so it is done once rather than on every page view.
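
Here is a minimal sketch of that table-driven idea, assuming the tables live in a plain PHP array rather than in the database. The names $syntaxTables and highlightKeywords() and the colours are made up for illustration. It splits at spaces only, so it is just a starting point.

```php
<?php
// Hypothetical lookup: file extension => (keyword => colour).
// In the approach described above these would be database tables.
$syntaxTables = [
    'cpp' => [
        'union' => '#0000cc', 'struct' => '#0000cc',
        'long'  => '#007700', 'short'  => '#007700', 'char' => '#007700',
    ],
];

// Wrap every space-separated word found in the table in a coloured <span>.
function highlightKeywords(string $line, array $table): string
{
    $words = explode(' ', $line);
    foreach ($words as $i => $word) {
        $safe = htmlspecialchars($word);   // escape before adding our own tags
        if (isset($table[$word])) {
            $safe = "<span style=\"color:{$table[$word]}\">{$safe}</span>";
        }
        $words[$i] = $safe;
    }
    return implode(' ', $words);
}

// Run this once, before the INSERT, so pages are served pre-highlighted.
echo highlightKeywords('short hi;', $syntaxTables['cpp']), "\n";
```

A word only counts as a keyword when it is separated from its neighbours by spaces, so "short hi;" works but "char c[4];" would only colour "char".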

The "gotcha" is detecting the keywords. For example, in PHP both of the following lines are syntactically valid:
$a = 3;
$a=3;

In the first case, you can detect keywords for lookup just by splitting the line at spaces.

The second requires that you write a lexical analyzer in PHP.
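
Short of a full lexical analyzer, a single regular expression can already split both spellings into the same tokens. This is only a sketch: tokenize() is a hypothetical name, and it knows nothing about strings or comments.

```php
<?php
// Split a line into tokens: variables or words, numbers, or single symbols.
// Whitespace is skipped, so "$a = 3;" and "$a=3;" give the same token list.
function tokenize(string $line): array
{
    preg_match_all('/\$?[A-Za-z_][A-Za-z0-9_]*|\d+|\S/', $line, $matches);
    return $matches[0];
}

print_r(tokenize('$a=3;'));
```

Each token can then be looked up in the keyword table, whatever the spacing was.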
 
Not to mention it really depends just how well you want to highlight...

You'll probably want to show comments in one set color, for example...

or

Echo 'the value of $a is '.$a;

and

echo "the value is $a which is identical to saying".$a;

should highlight significantly differently.

and escape characters...

I've written lexical analyzers before. It's no small task, and you'd best really know your regexps before you undertake it.
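
To give a feel for it, here is a rough sketch that treats comments, strings and a few keywords differently. highlightPhp(), the CSS class names and the tiny keyword list are all placeholders, and it does not go as far as colouring the $a inside a double-quoted string.

```php
<?php
// Hypothetical mini-highlighter: class names and keyword list are placeholders.
function highlightPhp(string $code): string
{
    // Order matters: comments and strings are tried before keywords, so a
    // keyword inside a string or comment is never coloured on its own.
    $pattern = <<<'REGEX'
~
  (?<comment> //[^\n]* | /\*.*?\*/ )
| (?<string>  "(?:\\.|[^"\\])*" | '(?:\\.|[^'\\])*' )
| (?<keyword> \b(?:echo|if|else|return)\b )
| \$?\w+
| .
~xsi
REGEX;

    return preg_replace_callback($pattern, function (array $m): string {
        $text = htmlspecialchars($m[0], ENT_QUOTES);
        foreach (['comment', 'string', 'keyword'] as $class) {
            if (($m[$class] ?? '') !== '') {
                return "<span class=\"{$class}\">{$text}</span>";
            }
        }
        return $text;   // identifiers, operators, whitespace
    }, $code);
}

echo highlightPhp("echo 'the value of \$a is '.\$a; // single quotes\n");
```

The string alternatives allow backslash escapes, so an escaped quote does not end the string early.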

That said... the direct question was about something like TGML...

So the basic flow is: read a chunk, output a chunk, until the chunk contains
[cpp]
at which point you stop outputting. (I imagine the easiest approach is to read in a line at a time, and if you find [cpp], output the part of that line that comes before it.)

Then you set a flag of some sort, and throw away the [cpp].

Now you continue reading into some sort of buffer until you come across
[/cpp]

at which point you stop again, process your buffer, output the result, and continue on as before.

You may want to support comments as well, in which case you'll need another flag.

So that's the brute force approach.
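
A sketch of that flow, one line at a time with a single flag. processBuffer() is a hypothetical stand-in for the real highlighter and renderPost() is an invented name; comment support is left out.

```php
<?php
// Hypothetical stand-in for the real highlighter: it only escapes HTML.
function processBuffer(string $source): string
{
    return '<pre class="code">' . htmlspecialchars($source) . '</pre>';
}

// Read line by line; buffer everything between [cpp] and [/cpp].
function renderPost(string $text): string
{
    $out = '';
    $buffer = '';
    $inCode = false;                            // the flag
    // Split after each newline so every line keeps its own line ending.
    foreach (preg_split('/(?<=\n)/', $text) as $line) {
        while ($line !== '') {
            $tag = $inCode ? '[/cpp]' : '[cpp]';
            $pos = strpos($line, $tag);
            if ($pos === false) {               // no tag on the rest of this line
                if ($inCode) {
                    $buffer .= $line;
                } else {
                    $out .= htmlspecialchars($line);
                }
                break;
            }
            $before = substr($line, 0, $pos);
            if ($inCode) {                      // closing tag: process the buffer
                $out .= processBuffer($buffer . $before);
                $buffer = '';
            } else {                            // opening tag: flush text before it
                $out .= htmlspecialchars($before);
            }
            $inCode = !$inCode;                 // flip the flag, drop the tag
            $line = substr($line, $pos + strlen($tag));
        }
    }
    // An unclosed [cpp] is emitted as plain text rather than lost.
    return $inCode ? $out . htmlspecialchars('[cpp]' . $buffer) : $out;
}

echo renderPost("x [cpp]int a;[/cpp] y"), "\n";
```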

The elegant approach is regular expressions, which of course can also be used inside the loop above. If you know them well, and how to capture groups with them (in the emacs sense), all you need is a match on

(anything1)[cpp](anything2)[/cpp](anything3)

(with the square brackets escaped in a real pattern, since [ and ] are special characters)

then you process anything2, and output
anything1
anything2
anything3

or, in the more usual backreference syntax,

\1
\2
\3
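
In PHP that match could look something like the sketch below. processCode() is a hypothetical stand-in for the highlighter and renderWithRegex() is an invented name. The loop just repeats the match on whatever followed the last [/cpp].

```php
<?php
// Hypothetical stand-in for the real highlighter: it only escapes HTML.
function processCode(string $source): string
{
    return '<pre class="code">' . htmlspecialchars($source) . '</pre>';
}

function renderWithRegex(string $text): string
{
    // (anything1)[cpp](anything2)[/cpp](anything3). The brackets are escaped
    // because [ and ] are special in a pattern; /s lets . match newlines.
    $pattern = '~\A(.*?)\[cpp\](.*?)\[/cpp\](.*)\z~s';
    $out = '';
    while (preg_match($pattern, $text, $m)) {
        $out .= htmlspecialchars($m[1]) . processCode($m[2]);
        $text = $m[3];                  // keep matching in what follows
    }
    return $out . htmlspecialchars($text);
}

echo renderWithRegex("x [cpp]int a;[/cpp] y"), "\n";
```

The lazy (.*?) groups make each [cpp] pair up with the nearest [/cpp] instead of the last one in the text.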


Hope I've shed some light in that rambling.

-Rob


 
YIKES that was really confusing. Ok here is what I've done so far with this type of subject myself. I have developed my own way to identify the [] tags and then to put the information into HTML.

This code identifies my code-type tags ([PHP], for example):

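// The /e modifier was removed in PHP 7, so preg_replace_callback() does the work.
// \1 makes the closing tag match the opening one; one call replaces every match.
$articledata = preg_replace_callback('#\[([^\]/]+)\](.+?)\[/\1\]#is', function ($m) {
    return $this->BuildCodeBox(array('1' => $m[1], '2' => $m[2]));
}, $articledata);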

(by the way this code bit is located in a function that is also inside a class thus the use of $this-> to call another function)

The function that will handle placing the information into a box is this:

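function BuildCodeBox($arr){
    // Escape the user-supplied text so characters like < and & display literally.
    $tag  = htmlspecialchars($arr['1']);
    $body = htmlspecialchars($arr['2']);
    return "<div class='codehead'>{$tag} Code:</div><pre class='code'>{$body}</pre>";
}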

It is in this code that I intend to implement the syntax highlighting. I haven't exactly worked that part out but I'm working on it ;)


JRSofty
 