Tek-Tips is the largest IT community on the Internet today!


HTML to PDF, using PHP (including XML/XSLT, probably)

Status
Not open for further replies.

shadedecho

Programmer
Oct 4, 2002
336
US
(I searched and searched this site, thinking surely I'm not the first person in these forums to want to know how to convert an HTML page to a PDF file in pure PHP. However, I couldn't find any relevant FAQs or posts, so here goes.)

FYI: I am running PHP 5.0.3 on Apache2 on a Linux machine. I have complete control over the server, so I can install PHP libraries/extensions at will, but I want to keep them FREE (open-source, non-commercial) and NON-JAVA. :) I already have PHP compiled with PDF, XSL, and XSLT support.

What I want seems simple: I want to be able to provide a "printer-friendly" version of certain pages on my existing website, which is coded entirely in PHP.

The problem is, I didn't plan ahead enough to build the site on any kind of templates or on XML as the base data store. So my only real choice (aside from 1. re-coding this site, or 2. maintaining two sets of layout code, one for HTML and one for printer-friendly output) is for a PHP script to take some of my existing HTML (for instance, just the code snippets for the page content) and convert THAT, perhaps with some other HTML formatting around it, to a PDF file.

Here's what I've got for that process so far (and please enlighten me if this is not the best path, etc.):

1. Convert the HTML to XML:
I have the PHP Tidy library installed, and based on other FAQs I've found, I believe Tidy can easily convert HTML to well-formed XML/XHTML (from the command line, if nothing else).

2. Convert the XML to XSL-FO:
I believe this step can be accomplished either by applying an XSLT transformation (an XSL stylesheet) to the resulting XML, or by reading the XML into PHP with standard XML parsing and then applying PHP logic to generate and output the XSL-FO. I have to read up on XSL-FO, but I believe I should be able to accomplish this step in time.

3. Render the XSL-FO into PDF:
I understand this can be accomplished with some sort of XSL-FO rendering engine. However, this is where my knowledge and understanding break down. I have seen that there are several rendering programs out there, such as "FOP" (from Apache) and "XSL Formatter", which claim to be able to accomplish this goal.

However, it appears that they are all either Java or commercial (fee-based). I thought for sure that PHP had some way of doing XSL rendering (isn't this the same as doing an XSLT transformation, which I know PHP supports?) Does PHP support rendering of XSL-FO?

I'd really like to avoid having to use an external program like this; I'd like to keep the transformation from HTML to PDF completely inside PHP, without having to write intermediate XML/PDF files to disk.
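The second option in step 2 (read the XML and generate XSL-FO with program logic) can be sketched briefly. This is a minimal illustration, not a finished converter: it uses Python's standard `xml.etree.ElementTree` purely to show the idea (PHP's DOM/XML extensions would play the same role), it assumes step 1 already produced a well-formed XHTML fragment, and the tag-to-style tables and function names are hypothetical. Everything stays in memory as strings. It stops at XSL-FO text; turning that into a PDF still needs an FO rendering engine (step 3).

```python
import xml.etree.ElementTree as ET

# XSL-FO namespace; registering it makes ElementTree emit the usual "fo:" prefix.
FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical minimal mappings from XHTML tags to XSL-FO properties.
BLOCK_STYLES = {
    "h1": {"font-size": "18pt", "font-weight": "bold", "space-after": "6pt"},
    "h2": {"font-size": "14pt", "font-weight": "bold", "space-after": "4pt"},
    "p": {"space-after": "4pt"},
}
INLINE_STYLES = {
    "b": {"font-weight": "bold"},
    "strong": {"font-weight": "bold"},
    "i": {"font-style": "italic"},
    "em": {"font-style": "italic"},
}


def fo(tag: str) -> str:
    """Qualified name of an element in the XSL-FO namespace."""
    return f"{{{FO_NS}}}{tag}"


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix, e.g. the XHTML namespace Tidy may add."""
    return tag.rsplit("}", 1)[-1]


def copy_inline(src: ET.Element, dest: ET.Element) -> None:
    """Copy src's text and child elements into dest as nested fo:inline elements."""
    dest.text = src.text
    for child in src:
        style = dict(INLINE_STYLES.get(local(child.tag), {}))
        inline = ET.SubElement(dest, fo("inline"), style)
        copy_inline(child, inline)
        inline.tail = child.tail  # text that follows the child element


def xhtml_to_fo(xhtml: str) -> str:
    """Convert a well-formed XHTML fragment into an XSL-FO document, entirely in memory."""
    container = ET.fromstring(xhtml)
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "page", "page-width": "8.5in",
        "page-height": "11in", "margin": "1in",
    })
    ET.SubElement(page, fo("region-body"))
    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "page"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    # Each top-level child of the fragment becomes one fo:block.
    for element in container:
        style = dict(BLOCK_STYLES.get(local(element.tag), {}))
        block = ET.SubElement(flow, fo("block"), style)
        copy_inline(element, block)
    return ET.tostring(root, encoding="unicode")


print(xhtml_to_fo("<div><h1>Title</h1><p>Hello <b>world</b>!</p></div>"))
```

Tags without an entry in the tables still become plain `fo:block`/`fo:inline` elements, so nested structures such as lists or tables would need their own mappings before the output looked right.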
 
phpgramma-

Thank you for the link; it's good to know about yet another PDF generation tool in PHP.

However, I don't want to generate a PDF from scratch with a script; I want to convert/render existing HTML into a PDF file that looks roughly like it would in a browser, so I can create printer-friendly versions of certain pages on the fly.

The thing is, most of the pages I want to do this with require somewhat complicated PHP scripting with a database and all that jazz, and I don't want two versions of the script for each page, one to output the HTML and one to output the PDF.

Nor do I want to re-code all my thousands of lines of scripts to output some sort of structured version of the text in XML and then XSLT it into HTML or PDF. That would have been a good option if I had thought of this before creating the whole site. But I didn't.

So, I need to create a script which can sort of capture the HTML output and then render it as PDF.

Let me also say that the program "HTML2DOC" does exactly what I am hoping to do, EXCEPT that it only supports HTML 3.2, and I need to support HTML 4.0.
 
Ah, I see. Well, I'm good for round-about, jerry-riggin' stuff, so this might work:

Have another script, outside of the page you're trying to convert, read the output of the page and save it as a PDF.

Don't use fopen(), but I think readfile() might work...? I guess the best way to describe what I'm thinking is to do exactly what a spider does when it reads web pages and then caches them, except this time, "cache" them as a PDF.
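The spider-style idea can be sketched as a function that requests the page over HTTP and returns the HTML the server produced. This is a sketch under assumptions: Python's `urllib.request` stands in for the PHP side (where `file_get_contents()` on an `http://` URL, with `allow_url_fopen` enabled, returns a page as a string), and the function name `fetch_rendered_html` is hypothetical. Because the page is fetched over HTTP, its PHP and database code run exactly as they do for a browser.

```python
from urllib.request import urlopen


def fetch_rendered_html(url: str, timeout: float = 10.0) -> str:
    """Request a page the way a browser or spider would and return its HTML as text."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Pointing it at an existing page's own URL would return that page's markup, ready for the HTML-to-XHTML and XHTML-to-XSL-FO steps; the final PDF rendering is still a separate problem.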

Off the top of my head I can't think of exactly how to code that, but that's the idea...
 
That's my concept: have another script that "reads" an HTML file, whether a physical HTML file or just a PHP file, captures the output (using output buffering, for instance), and then renders that output into a PDF file.
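The output-buffering variant can be sketched as capturing everything a page-rendering routine prints into a string instead of sending it to the client. In PHP this would be `ob_start()` before including the existing page script and `ob_get_clean()` after it; the Python below uses `contextlib.redirect_stdout` as the analogous mechanism, and `render_page` and `capture_output` are hypothetical names standing in for an existing page and the wrapper script.

```python
import io
from contextlib import redirect_stdout
from html import escape
from typing import Iterable


def render_page(title: str, rows: Iterable[str]) -> None:
    """Hypothetical stand-in for an existing page script that prints its HTML directly."""
    print(f"<h1>{escape(title)}</h1>")
    for row in rows:
        print(f"<p>{escape(row)}</p>")


def capture_output(render, *args, **kwargs) -> str:
    """Run a rendering function and return what it printed instead of letting it reach the client."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):  # comparable to PHP's ob_start()
        render(*args, **kwargs)
    return buffer.getvalue()       # comparable to PHP's ob_get_clean()


page_html = capture_output(render_page, "Report", ["first", "second"])
```

Unlike the HTTP approach, this runs the page code in-process, so the existing scripts do not need to be reachable by URL; the captured string can then go through the same conversion steps.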

It's the "renders that output to PDF" part that I don't yet know how to do, without either buying a commercial program or using Apache's FOP, which would mean selling out to the Java world, and I don't want to do that.

HTML2DOC does almost exactly what I need, it has a free version, and it doesn't depend on Java, so I know it's possible. The only problem is that HTML2DOC doesn't support HTML 4.0 like I need. Isn't there someone out there who does what I need?
 