remove HTML formatting

Status
Not open for further replies.

vinayak

Technical User
Sep 17, 2001
50
IN
Hi,

Is there any means (third-party tools, built-in functions, etc.) by which I can remove the HTML code (i.e. the formatting) from an HTML file and retain just the clean data?


(I'm working with Servlets/JSPs.)

Thanks in advance.
- Vinayak


 
If you are referring to HTML tables, then you can open the file in Excel and save it as a comma-delimited file.
 

If you're working with JSPs, then your best bet would be to create a clean page with just the JSP data on it.

I don't know of any easy methods to strip all HTML from a page, leaving just "data" - technically, your "data" will also be part of the HTML, so you'd end up with a blank page.

Dan
 

Not the cleanest solution but:

Try pasting the HTML code into Excel, then go to Edit, Find and Replace. (If you paste the HTML code from Word, it will prevent Excel from formatting the page using the HTML tags.)

Replace <*> with nothing and that should do it - all HTML tags will be gone.

Regards
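
Since the original question is about Servlets/JSP, here is a rough Java equivalent of the "replace <*> with nothing" idea. It is only a sketch: the class and method names are made up for illustration, and the pattern assumes that no attribute value contains a literal ">" and that the page has no script, style or comment blocks whose contents should also go.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
final class TagStripper {
    // "<", then any run of characters other than ">", then ">".
    // This is the regex counterpart of the Excel wildcard <*>.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Removes everything that looks like a tag and keeps the text in between.
    static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTags("<TD><B>Cell 1</B></TD>")); // prints "Cell 1"
    }
}
```

Note that this only deletes tags. It does not add any tabs or line breaks, so table cells and paragraphs will run together.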
 
Hi,
Sorry if I have misled you; this is what I need.

1. There is an HTML file on a server (Tomcat).
2. The contents of that file need to be stripped of HTML.
3. The HTML code should be replaced with tabs, spaces, CRs, etc. where needed (a text equivalent of the HTML, though not 100% accurate).
4. The result is displayed in a TEXTAREA.

Thanks,
Vinayak
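
Steps 2 to 4 above could be sketched in plain Java roughly as follows, with no third-party library. Everything here is an assumption for illustration: the class and method names are invented, the file is assumed to have been read into a String already, the choice of which tags become a tab or a line break is arbitrary, and only a handful of entities are decoded (any others would show up literally in the TEXTAREA).

```java
import java.util.regex.Pattern;

// Hypothetical helper: regex-based and deliberately not 100% accurate.
final class HtmlToTextarea {
    // Tags that end a line in the text version (an arbitrary selection).
    private static final Pattern LINE_BREAK =
            Pattern.compile("(?i)<br\\s*/?>|</(p|tr|div|h[1-6]|li)\\s*>");
    // End of a table cell becomes a tab.
    private static final Pattern CELL_END = Pattern.compile("(?i)</t[dh]\\s*>");
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");

    // Steps 2 and 3: strip the HTML, leaving tabs and newlines behind.
    static String toPlainText(String html) {
        // Line breaks in the HTML source carry no meaning, so collapse them first.
        String text = html.replaceAll("\\s+", " ");
        text = CELL_END.matcher(text).replaceAll("\t");
        text = LINE_BREAK.matcher(text).replaceAll("\n");
        text = ANY_TAG.matcher(text).replaceAll("");
        // Decode a few common entities; "&amp;" must come last.
        return text.replace("&nbsp;", " ").replace("&lt;", "<")
                   .replace("&gt;", ">").replace("&quot;", "\"")
                   .replace("&amp;", "&");
    }

    // The text goes back into an HTML page, so it has to be escaped again.
    static String escapeForTextarea(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Step 4: wrap the escaped text in a TEXTAREA element.
    static String toTextarea(String html) {
        return "<textarea rows=\"20\" cols=\"80\">"
                + escapeForTextarea(toPlainText(html)) + "</textarea>";
    }
}
```

In a JSP, the string returned by toTextarea could be written to the page with the implicit out object. The escaping step matters: without it, a stray "<" in the data could break the page around the TEXTAREA.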


 
Maybe this forum would be more appropriate in that case:

Jakarta: Tomcat forum; forum877




Chris.

Indifference will be the downfall of mankind, but who cares?
 
Actually... it's kind of tricky finding the correct forum to post this one.

Assuming that there is no cool tool that does this already (and judging by the reaction here - there isn't one that is widely known about)... I bet someone could create a great Regular Expression that did the job (even to the point of inserting tabs, returns etc to replace certain html tags).

So how about posting the request on, say, the Perl forum as well?

Hmmm... are you a JSP developer? If so... you could code a JSP web solution that picks up a file (server-side) and parses the code for you... outputting a fresh new "stripped" version (saved to the server). It can be done in PHP and ASP too (you mentioned Tomcat though).

All the best,
Jeff
 

You need to better define what you mean by "stripped of HTML".

For example, this is all HTML:

Code:
<b>This is bold</b>

Do you just want to strip HTML *tags*, or *all* HTML?

You also haven't given us any idea what your data looks like, or which parts you want where.

Can you give us an example? Everyone here seems to be finding it hard to answer your query...

Dan
 
Hi,

Example INPUT:
<TABLE WIDTH="100%" BORDER="0" CELLSPACING="0" CELLPADDING="0" BGCOLOR="#FFFFCC">
<TR>
<TD><B>Cell 1</B></TD>
<TD>
<P ALIGN="CENTER"><I><B><FONT FACE="Arial, Helvetica, sans-serif" SIZE="2" COLOR="#808080">Cell
2</FONT></B></I></P>
</TD>
</TR>
<TR>
<TD><S>Cell 3</S></TD>
<TD><VAR>Cell 4</VAR></TD>
</TR>
</TABLE>
---------------------------------------------------------

Expected OUTPUT (TEXT ONLY):
-----------------------
|Cell1 | Cell2 |
-----------------------
|Cell3 | Cell4 |
-----------------------



Thanks,
Vinayak
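
For the example above, one possible approach is to pull out the rows and cells with regular expressions and then draw the grid by hand. This is a sketch under several assumptions: the class and method names are invented, the table is not nested, every row and cell has an explicit closing tag, and all columns are padded to the width of the longest cell. The spacing of the output will therefore not match the expected output character for character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for simple, non-nested tables only.
final class TableToText {
    // (?i) ignores case, (?s) lets "." match line breaks inside a row or cell.
    private static final Pattern ROW = Pattern.compile("(?is)<tr\\b[^>]*>(.*?)</tr\\s*>");
    private static final Pattern CELL = Pattern.compile("(?is)<t[dh]\\b[^>]*>(.*?)</t[dh]\\s*>");

    // Returns the text of every cell, row by row, with inner tags removed.
    static List<List<String>> parseRows(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher row = ROW.matcher(html);
        while (row.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cell = CELL.matcher(row.group(1));
            while (cell.find()) {
                cells.add(cell.group(1)
                        .replaceAll("<[^>]*>", "")   // drop <B>, <FONT ...>, etc.
                        .replaceAll("\\s+", " ")     // "Cell\n2" becomes "Cell 2"
                        .trim());
            }
            rows.add(cells);
        }
        return rows;
    }

    // Draws the rows as a text grid with "|" separators and "-" rules.
    static String render(List<List<String>> rows) {
        int width = 0;
        for (List<String> row : rows)
            for (String cell : row)
                width = Math.max(width, cell.length());

        List<String> lines = new ArrayList<>();
        int lineLength = 0;
        for (List<String> row : rows) {
            StringBuilder line = new StringBuilder("|");
            for (String cell : row) {
                line.append(' ').append(cell);
                for (int i = cell.length(); i < width; i++) line.append(' ');
                line.append(" |");
            }
            lines.add(line.toString());
            lineLength = Math.max(lineLength, line.length());
        }

        StringBuilder dashes = new StringBuilder();
        for (int i = 0; i < lineLength; i++) dashes.append('-');
        String rule = dashes.toString();

        StringBuilder out = new StringBuilder(rule).append('\n');
        for (String line : lines) out.append(line).append('\n').append(rule).append('\n');
        return out.toString();
    }
}
```

Calling render(parseRows(html)) on the example input should give two rows, "| Cell 1 | Cell 2 |" and "| Cell 3 | Cell 4 |", each framed by lines of dashes. The result would still need to be escaped before being placed in a TEXTAREA.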
 
I am not aware of any tool that does this task in the way you have described.

Jeff
 
You can use Python...
Check this out:

If you search around for more on this module, I'm sure you could get the results you desire. You may want to take this over to the Python forum, and see what they think.
I hope this helps. If it doesn't, please post back why this won't work for you... so we can help you more efficiently.

X
 
You don't want to pay the $20 that it costs to solve this problem that you have had open for 5 days so far?! How much money do you think you have cost your company searching for a solution so far?

Jeff
 
Hi Jeff,

The question is not one of money, but rather of dependency on third-party software.

Thanks,
Vinayak
 
Vinayak said:
The question is not one of money, but rather of dependency on third-party software.
Where does the dependency end? Are you going to write your own servlet engine, your own database manager, your own operating system? All of these are third-party software.

Just pointing out the flaw in your statement.

Jeff
 