
Wrapping Lines of HTML at 1024 Characters

Status
Not open for further replies.

BlueScr33n

Instructor
Feb 10, 2003
78
US
What initially sounded fairly easy has me stumped. The typesetting program I'm using is able to export HTML with in-line CSS formatting attributes.

Everything comes through fine until it's opened in the HTML editor I'm forced to use. Because of a per-line character length limit, the program forces a break in the HTML regardless of whether it's interrupting a tag or a word.

These breaks have resulted in a space appearing within words, so the HTML output c ould look a bit bro ken to those tr ying to rea d it.

My question is: using Perl, how would I break a line only if I'm not in the middle of a word or HTML tag?

If the 1024th character is in the middle of a word or tag, how do I "go back" to the previous best break location?

As of now, I'm not even sure what to google for, so any help would be greatly appreciated.

- Thomas
 
Personally I'd change the HTML editor. I'm also curious, as HTML doesn't care about spaces: you could put 1,000,000 spaces between two words and you would still only get one.

Also it sounds like your HTML code isn't formatted to be viewed in an editor, what is creating this HTML code?

"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you."
 
I don't think Text::Wrap is HTML-aware. But I also don't see much of a problem with breaking lines inside HTML tags. A tag can be broken at the whitespace between its name and attributes without affecting how it functions.
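As a rough illustration of the Text::Wrap idea, here is a minimal sketch. The helper name wrap_html_line() and its $limit argument are made up for this example; only wrap() and the $columns, $huge and $unexpand settings come from Text::Wrap itself. It breaks at whitespace only, so it knows nothing about tags and could still put a break inside a quoted attribute value that contains spaces.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Hypothetical helper: wrap one long line so that no output line is longer
# than $limit characters (1024 for the editor described in this thread).
sub wrap_html_line {
    my ($line, $limit) = @_;

    # Text::Wrap produces lines of at most $columns - 1 characters.
    local $Text::Wrap::columns  = $limit + 1;
    # 'overflow' leaves a run without whitespace intact instead of cutting it.
    local $Text::Wrap::huge     = 'overflow';
    # Do not convert leading spaces to tabs.
    local $Text::Wrap::unexpand = 0;

    return wrap('', '', $line);
}
```

Something like `print wrap_html_line($html, 1024), "\n";` would then be the last step of the export.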
 
Maybe he meant something like:

Code:
<di
v align="ce
nter">
 
Thanks for all the responses.

Because this HTML has to be filed with the U.S. Securities and Exchange Commission, we're pretty much bound to a certain type of HTML editor (the SEC only accepts a subset of HTML 3 and 4).

Viewing the HTML also has to be done through this pre-SEC HTML application, which is sensitive to character spacing, and that is where I've been running into problems.

I'll look into Text::Wrap. If that doesn't fit, at least it might point me in the right direction (although any other suggestions would be welcome).
 
Are we talking tons of HTML files here? Is that why you want a Perl script to go through all those files and make lines 1024 bytes or less in length, broken only at non-word or space characters?
 
I'm not going to be running this on pre-existing HTML files, but rather as the last step of an export/conversion process.
 
Then I would think this would not be an issue. If the files are not pre-existing, why can't you just use the editor and not break lines on words or tags?
 
If you are using Perl to dynamically generate the HTML, consider using the template module. The templates can be created in whatever editor you require and formatted correctly.

Plus you get code -> presentation separation, which is always a good thing :)

"In complete darkness we are all the same, only our knowledge and wisdom separates us, don't let your eyes deceive you."
 
KevinADC: Because all of the CSS attributes are in-line with the tag, it wouldn't be easy to find a logical breakpoint by hand in a larger document (these can run upwards of 300 pages), which is why I was hoping to find something similar to an HTML-aware Text::Wrap.

1DMF: Perl, right now, is being used to string a series of translation tables together to convert one series of document tags to SEC-compliant HTML. I'll be reading through HTML::Template to see if this could introduce better-placed line breaks at or before the 1024th character.
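For what an "HTML-aware" wrap might look like, here is a minimal sketch that "goes back" from the limit to the last safe whitespace. The name break_html_line() and its arguments are hypothetical, and it assumes simple markup: it tracks only `<`, `>` and quoted attribute values, and does not handle comments or a stray `<` in text. Whitespace inside a quoted attribute value (in-line CSS, for example) is never used as a break point.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into lines of at most $limit characters,
# breaking only at whitespace that is not inside a quoted attribute value.
sub break_html_line {
    my ($text, $limit) = @_;

    # Pass 1: record the offsets of whitespace that is safe to replace.
    my @safe;
    my ($in_tag, $quote) = (0, '');
    for my $i (0 .. length($text) - 1) {
        my $c = substr($text, $i, 1);
        if    ($quote)                              { $quote = '' if $c eq $quote }
        elsif ($in_tag && ($c eq '"' || $c eq "'")) { $quote = $c }
        elsif ($c eq '<')                           { $in_tag = 1 }
        elsif ($c eq '>')                           { $in_tag = 0 }
        elsif ($c =~ /\s/)                          { push @safe, $i }
    }

    # Pass 2: cut at the last safe offset at or before the limit.
    my @lines;
    my $start = 0;
    while (length($text) - $start > $limit) {
        my ($break) = grep { $_ > $start && $_ <= $start + $limit } reverse @safe;
        # Nothing safe within the limit: fall back to the next safe offset,
        # which leaves one overlong line rather than a broken word or value.
        ($break) = grep { $_ > $start + $limit } @safe unless defined $break;
        last unless defined $break;
        push @lines, substr($text, $start, $break - $start);
        $start = $break + 1;    # skip the whitespace that became the break
    }
    push @lines, substr($text, $start);
    return @lines;
}
```

Calling `print join("\n", break_html_line($html, 1024)), "\n";` would write the wrapped output. The repeated grep keeps the sketch short; for 300-page documents a single moving index into @safe would be faster.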
 