Creating XML and merging with PDF

1DMF

Programmer
Jan 18, 2005
8,795
GB
Hello Peeps,

Right, before I outline my question I just wanted to point out I know NOTHING about XML, its format or structure, and I don't have a clue about PDFs. That said, I'm not looking for someone to write my code and I am more than willing to learn; all I need is some help pointing me in the right direction.

I want to be able to create a PDF template from a WORD doc, and merge it with data in our MS SQL Database.

So the end result for the user is a PDF document that has some fields (Name, Address, etc.) pre-populated from the SQL database.

I am assuming I need to create an XML file with the SQL data via PERL and somehow tell the PDF template to use the XML file to populate it.

So where do I start, and what's the best approach for doing this?

Regards,
1DMF

"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you.
 
XML is a little like html, lots of tags with data between.
PDF is a binary format and therefore not very readable or processable (similar in some ways to postscript, although postscript is a more text like language).
MS Word used to be binary, but I understand the latest incarnations are compressed XML.

I'm not really sure how you can pull these things together but since I'm a linux guy, I'd be looking at processing the WORD XML directly and streaming data from your DB with DBI, merging the result in perl and creating the pdf directly (assuming that there is module support of some kind for pdf).
If I couldn't generate pdf directly, then I'd look to creating postscript and use ps2pdf to create the result.

Not sure if any of this helps at all. Maybe a windoze guy can help you more.





(very injured) Trojan.
 
The problem is we have office people who forever change the starting template which is in MS Word.

I know you can open a PDF and merge XML data to it, but I assume the PDF "Template" needs to be created using the full version of acrobat.

I guess what I'd like to do isn't possible. I'd be happy if the starting template was RTF and I was able to process it with a PERL module and merge the data that way, then output to PDF with some RTF->PDF converter module.

Is that a possibility?

"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you.
 
RTF is basically a text(ish) format and therefore very possible to process with perl.
Does the output have to be pdf or could it be rtf again?




(very injured) Trojan.
 
Well, I want it in a format people can't edit. Is it possible to make a read-only, locked-down version in RTF like you can with PDF?

"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you.
 
Have a look on CPAN, there's a plethora of PDF modules there. And XML

Another option is Win32::Printer, and print it out to CutePDFWriter as opposed to a printer per se

Just a thought
--Paul

cigless ...
 
Cheers Paul, but I'm not sure I can do what I want. The template MUST be in WORD to start with and editable by our admin bods.

then i take that and want to merge it and create PDF, i don't want to have to hold a template in code or anywhere else, either give it the WORD doc or an RTF version of it.

like usual i'm trying to automate things too much, but I like to keep busy :)

maybe what I want to do is possible, not sure where to start though.

Should I look at how I can create a Word document with merge fields in it, with the data source being CSV?

can you call a mailmerge in word via a web server, that way I could easily create a csv file from the SQL DB using PERL, then issue the mailmerge to a new document, convert the result to PDF and then reroute the browser to the PDF.

that would work, is it possible ?, what would i need on the web server to do a mailmerge in word, I know our web host is on Windows.

I could use ASP to do the mail merge call, I guess, if that enables some kind of ActiveX plugin to VBA to run the MS Word MailMerge.

Any ideas anyone, please let me know if i'm talking spherical things, I don't know much about ASP but think it has some VB capabilities, I do this mailmerge thing via VBA & MS Access inhouse so thought I could use the same approach via the net, hhmmm would it help if I found out if the server also does .NET ?

P.S.

I Don't mean to go swearing in a PERL forum with things like ASP & VB [surprise] , but i'm prepared to harness what ever I can to achieve the results. :) , and I will be using PERL to query the SQL and produce the CSV for the mailmerge, so it's not all bad!
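
For the CSV step mentioned in this post, a minimal sketch of turning a record set into mail-merge CSV might look like the following. It uses core Perl only; the column names, the sample rows and the helper names `csv_field` and `rows_to_csv` are made up for illustration, and the rows are assumed to arrive as an array of hashes. A CPAN module such as Text::CSV would be the more robust choice for real data.

```perl
use strict;
use warnings;

# Quote one CSV field: wrap in double quotes and double any embedded quotes.
# Undefined values (SQL NULLs) become empty fields.
sub csv_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Build CSV text (header line plus one line per row) from an array of hashes.
sub rows_to_csv {
    my ($columns, $rows) = @_;
    my @lines = join ',', map { csv_field($_) } @$columns;
    for my $row (@$rows) {
        push @lines, join ',', map { csv_field($row->{$_}) } @$columns;
    }
    return join("\r\n", @lines) . "\r\n";
}

# Hypothetical sample data standing in for the SQL result.
my @columns = qw(Name Address);
my @rows    = (
    { Name => 'J. Smith',     Address => '1 High St, Leeds' },
    { Name => 'A "Al" Jones', Address => undef },
);

my $csv = rows_to_csv(\@columns, \@rows);
print $csv;
```

Quoting every field keeps commas and quotes inside addresses from breaking the column layout when the merge reads the file.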

"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you.
 
I would think that you could get WORD to produce, say, Postscript using a postscript printer driver but writing to file.
Perl could then read the ps and replace the appropriate data.
Finally you could use ps2pdf to convert to pdf.
Any good?




(very injured) Trojan.
 
RTF on CPAN might be a start

Win32::OLE should be able to interact with Word as well, well worth a looksee. You have the power of Perl, and you're looking to step back, create a CSV from a database and mailmerge, that's a bit like trying to take out a sparrow with a H-Bomb ;-)

You'd have to get an additional license for Word on the server, I'd shy away from that if at all possible. (because Word is a fairly powerful application, in a GUI environment, someone might double click something they shouldn't)

It's more than possible I'd have thought, using Perl, on either Windows or *nix

--Paul

cigless ...
 
Good points here from Paul,
Perl could, I guess, open the word docs directly, read from the DB directly and write the pdf directly.




(very injured) Trojan.
 
Ok you've scared me Paul, not just the people clicking things, though how they would click anything on our webspace i'm unsure, but the licence thing throws the whole argument out the window, well , hang on , let me make a phone call.... nope no additional word licence on our host.

PHP, ColdFusion, ASP, .NET, PERL etc..

hmmm can coldfusion do it easily? it's great for creating graphs, might be worth a tinkle.

anyway whats wrong with using a h-bomb to take out a sparrow, at least i won't miss!

Hey trojan, what's with the comment
Perl could, I guess, open the word docs directly, read from the DB directly and write the pdf directly.

don't stop there tell me more like HOW :)

so should i look at Win32::OLE for reading the word doc ?

and why can't i use a CSV file for the merge data ? i'd rather that than give the word doc access to the SQL DB.

What about parsing the opened doc via win32::OLE the record set direct as an array of hashes, would that be possible.

so many questions!





"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you.
 
1) Get your admins to save the docs in RTF format
2) Have Perl parse the RTF
3) Have Perl read the database, and use the s/// substitution operator to replace the placeholders
4) Have Perl generate the PDFs from there
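
Steps 2 and 3 can be sketched roughly as below, treating the RTF as plain text. The `[fieldname]` placeholder style, the sample template and the helper names `rtf_escape` and `merge_fields` are assumptions for illustration only. The sketch escapes just the three characters RTF treats as special (backslash and both braces) and does not handle non-ASCII text. One caveat to check on real files: Word may split a placeholder across several formatting runs when it saves RTF, in which case a simple pattern like this will not find it.

```perl
use strict;
use warnings;

# Escape the characters that have special meaning in RTF: \ { }
sub rtf_escape {
    my ($text) = @_;
    $text =~ s/([\\{}])/\\$1/g;
    return $text;
}

# Replace every [fieldname] placeholder in $rtf with the value from %$row.
# Placeholders with no matching key are left untouched.
sub merge_fields {
    my ($rtf, $row) = @_;
    $rtf =~ s{\[(\w+)\]}{
        my $key = $1;
        exists $row->{$key} ? rtf_escape($row->{$key}) : "[$key]";
    }ge;
    return $rtf;
}

# Hypothetical template text and database row.
my $template = '{\rtf1\ansi Dear [Name],\par Your address: [Address]\par [Unknown]}';
my %row      = (Name => 'J. Smith', Address => '1 High St {Flat 2}');

my $merged = merge_fields($template, \%row);
print "$merged\n";
```

Working on a copy of the template text, rather than writing back into the template file, leaves the admins' original untouched for the next merge.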

PDFs can be edited using Acrobat and other tools unless the security options are enabled.

1DMF, your problem is you know too much ;-) (ColdFusion, PHP, .NET ...). Keep it simple

HTH
--Paul

BTW - I'm now changing my signature to "Spend an hour a week on CPAN"

cigless ...
 
1DMF said:
and why can't i use a CSV file for the merge data? [1] i'd rather that than give the word doc access to the SQL DB. [2]

1) It's another step you don't need. You're placing all the power in Word, while you've got Perl. H-Bomb/Sparrow thing again
2) If you're worried about security on the DB, have the admins set up a read only user for the DB, and access the data through that

--Paul

Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
Paul my saviour, thank you so very much.

Yeah, you're right, jack of all trades, master of some! (or none, I hear some of you cry :) )

ok now all i need is the module name for parsing RTF and the module name for creating PDF and i'm sorted.

unless I can do this with the RTF file direct.

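Code:
use Fcntl qw(:flock);

# $fields, $file and getSQL() are defined elsewhere in the script
my @fld  = split /,/, $fields;
my @rs   = getSQL("Table", $fields, "ARID='myID'");
my $data = "";

open(my $rtf, '+<', $file) or die "Cannot open $file: $!";
flock($rtf, LOCK_EX) or die "Cannot lock $file: $!";
binmode($rtf);
while (my $line = <$rtf>) {
    foreach my $name (@fld) {
        # replace every [fieldname] placeholder with the matching column value
        $line =~ s/\[\Q$name\E\]/$rs[1]{$name}/gi;
    }
    $data .= $line;
}
# rewind and empty the file before writing the merged text back
seek($rtf, 0, 0);
truncate($rtf, 0);
print $rtf $data;
close($rtf) or die "Cannot close $file: $!";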

any thoughts on the code above?



"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you.
 
P.S. - If I could understand a word CPAN and PerlDocs wrote, it might be worth looking, tell you what i'll do a deal with them , I'll visit CPAN when they learn plain english :p

"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you.
 

Looks the shiznit for the PDF part,

and I'd guess RTF::Parser, but it's 1999 ...

What exactly is in the Document, style, whatever?
There are document converters, rtf2html for example, and the html is shedloads easier to parse, or even rtf2text, which RTF::Parser actually uses

Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
I think it might be worth looking at PDF::Reuse since they have an example that shows re-using a pdf doc as a template and dropping in data from another source to generate an output pdf doc.
Very close to what you're trying to do methinks.
Do you have a pdf printer driver? You could create the pdf template straight from WORD if you do.


Trojan.




(very injured) Trojan.
 
I thought the code I wrote would do the RTF bit not the PDF, which one of us is confused - lol

it is meant to read the RTF file and replace [fieldname] with the corresponding data and re-save the file.

the idea was then next create a PDF from the RTF file.

Now you're offering a possible PDF template with this reuse module. I've come to a fork in the road and am not sure which way to turn.

If the code I wrote will work on an RTF file then all i need is a module that converts RTF->PDF and i'm done.

Which is probably best, because if the full version of Acrobat is needed to generate the template with the fields on the PDF, then it isn't an option.

Though yes i do have a PDF printer writer (Primo-PDF), that's how i'm creating the PDF of the word doc at the moment.

so if there is a way of putting place holders on the WORD doc , convert to PDF and then use this PDF::Reuse, would that be the right fork to take ?



"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you.
 
google doc2pdf

Spend an hour a week on CPAN, helps cure all known programming ailments ;-)
 
I was suggesting starting with pdf since you can at least have some confidence that the output format will be reliable.
If you use perl to read rtf and try to write pdf, what will happen to the layout and alignment?
Just a thought.





(very injured) Trojan.
 