Pull a Segment of HTML Body Text 1

Status
Not open for further replies.

SuperKoopa

Technical User
Jul 30, 2012
26
US
Hello there,

I am trying to pull specific lines of data from the body text of a secured website. I've done some tinkering and have just one issue left. Hopefully there are several ways to resolve it and someone can help me.

So, the snippet of source code looks like this:

<TD vAlign=top width="50%" ><FONT size=+0><I><B>NAME</B></I><BR><B>John Smith</B> <BR>123 N. 13th Ave <BR>Queensboro, NY<BR><BR></FONT></TD>
<TD vAlign=top width="50%" ><FONT size=+0><I><B>NAME 2</B></I><BR><B>Jane Smith</B> <BR>123 N. 13th Ave <BR>Queensboro, NY<BR><BR></FONT></TD>

For some reason the host of this site didn't feel it was necessary to give the "Name" a class, so I'm having to write something that identifies the word NAME (and, if it exists, NAME 2). Then it needs to get the actual name, which could be 20 characters or 60 characters. I don't want it to copy the "NAME" or "NAME 2" label text, just the actual names, which of course will vary.

So, what I've come up with so far will pull just the word "NAME", but since the number of characters in the name will vary, I haven't been able to account for that without putting a fixed number to it. That is, unless I use Len("<BREAK>"), and of course that only gets me as far as the 1st BREAK. I need it to START after the 1st break after the word NAME and END at the 2nd break. If anyone can help me with this or think of an alternative, I would greatly appreciate it.

Here is what I currently have (I haven't done the IF/THEN for "NAME 2" yet):

Dim CaseInf As String
Dim CResults As String
Dim dtTimer As Date
Dim lAddTime As Long
Dim FName As Integer
Const lREADYSTATE_COMPLETE As Long = 4

dtTimer = Now
lAddTime = TimeValue("00:00:20")

Do Until appIE.readystate = lREADYSTATE_COMPLETE And Not appIE.busy
DoEvents
If dtTimer + lAddTime > Now Then Exit Do
Loop

CaseInf = appIE.document.Body.innertext

FName = InStr(1, CaseInf, "NAME")
CResults = Mid(CaseInf, FName, Len("<BREAK>"))
Sheets("Sheet1").Range("A1") = CResults
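
A side note on the wait loop above: as posted, the timeout looks like it can never fire. Assigning TimeValue("00:00:20") to a Long rounds that fraction of a day to 0, and the comparison dtTimer + lAddTime > Now tests for "not yet expired" rather than "expired". One way it could be written is sketched below. HasTimedOut and WaitForPage are hypothetical names, not part of the original code, and appIE is assumed to be the InternetExplorer object used in the thread.

```vba
Const READYSTATE_COMPLETE As Long = 4

' Hypothetical helper: True once timeoutSeconds have elapsed since startTime.
Function HasTimedOut(ByVal startTime As Date, ByVal timeoutSeconds As Long, _
                     ByVal currentTime As Date) As Boolean
    HasTimedOut = (currentTime >= DateAdd("s", timeoutSeconds, startTime))
End Function

' Hypothetical wrapper: wait for the page to load, giving up after timeoutSeconds.
' appIE is assumed to be an InternetExplorer.Application object.
Sub WaitForPage(ByVal appIE As Object, ByVal timeoutSeconds As Long)
    Dim startTime As Date
    startTime = Now
    Do Until appIE.readyState = READYSTATE_COMPLETE And Not appIE.Busy
        DoEvents
        If HasTimedOut(startTime, timeoutSeconds, Now) Then Exit Do
    Loop
End Sub
```

Calling WaitForPage appIE, 20 would then stand in for the Do loop, with the 20-second limit kept as a plain number of seconds.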
 
A starting point:
FName = InStr(1, CaseInf, "NAME")
iStart = InStr(FName, CaseInf, "<B>") + 3
iLength = InStr(iStart, CaseInf, "</B>") - iStart
CResults = Mid(CaseInf, iStart, iLength)


Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
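
To make the position arithmetic in the starting point concrete, here is a small sketch that runs the same four lines against a shortened copy of the sample HTML from the question. DemoStartingPoint is a hypothetical name, and the string is hard-coded instead of being read from the browser, so the sketch only illustrates how InStr and Mid fit together.

```vba
' Hypothetical demo: applies the starting-point logic to a fixed sample string.
Function DemoStartingPoint() As String
    Dim CaseInf As String
    Dim FName As Long, iStart As Long, iLength As Long

    ' Shortened copy of the first table cell from the question.
    CaseInf = "<I><B>NAME</B></I><BR><B>John Smith</B> <BR>123 N. 13th Ave"

    FName = InStr(1, CaseInf, "NAME")                   ' 7: start of the label
    iStart = InStr(FName, CaseInf, "<B>") + 3           ' 26: first character after the next <B>
    iLength = InStr(iStart, CaseInf, "</B>") - iStart   ' 10: characters up to the closing </B>

    DemoStartingPoint = Mid(CaseInf, iStart, iLength)   ' "John Smith"
End Function
```

The search for "<B>" starts at the label itself, so the "<B>" that precedes NAME is skipped, and "<BR>" does not match because its third character is R rather than >.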
 
Thanks for your response PH,

Inevitably a silly question: What would I Dim iStart and iLength As?

I'm not as savvy with all this as I like to think. [glasses]
 
Dim FName As Integer
Dim iStart As Integer
Dim iLength As Integer
Dim CResults As String

Have fun.

---- Andy
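
One caveat, offered as a suggestion, not as a correction of the thread: InStr returns a Long, and an Integer tops out at 32,767. On a page whose HTML is longer than that, assigning a position to an Integer can raise an overflow error (run-time error 6), so declaring the position variables As Long is the safer choice. The sketch below uses a made-up string and a hypothetical function name to show a position beyond the Integer range.

```vba
' Hypothetical demo: a match position that would not fit in an Integer.
Function PositionBeyondIntegerRange() As Long
    Dim html As String
    Dim pos As Long                     ' Long, because InStr returns a Long

    ' 40,000 filler characters followed by the tag we search for.
    html = String$(40000, "x") & "<B>"

    pos = InStr(1, html, "<B>")         ' 40001, above the Integer maximum of 32,767
    PositionBeyondIntegerRange = pos
End Function
```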
 
Thank you guys for your assistance.

I've tried several variations of the starting point provided but was unable to produce any text from the source. The sample code produces a symbol that is equivalent to the website's "BREAK" (or RETURN) symbol (it's a little rectangular box with a question mark in it). I tinkered with it but had no success, other than a few variations of the code that would produce numbers. Any other ideas? Also, I had to change the code provided by PH to use Len(iLength); otherwise I received a run-time error. This is where I stand now (which is what produces that box with the ? in cell A1 of my spreadsheet)...


Dim FName As Integer
Dim iStart As Integer
Dim iLength As Integer
Dim CResults As String
Dim CaseInf As String
Dim dtTimer As Date
Dim lAddTime As Long

Const lREADYSTATE_COMPLETE As Long = 4


dtTimer = Now
lAddTime = TimeValue("00:00:20")

Do Until appIE.readystate = lREADYSTATE_COMPLETE And Not appIE.busy
DoEvents
If dtTimer + lAddTime > Now Then Exit Do
Loop

CaseInf = appIE.document.Body.innertext

FName = InStr(1, CaseInf, "NAME")
iStart = InStr(FName, CaseInf, "<B>") + 3
iLength = InStr(iStart, CaseInf, "</B>") - iStart
CResults = Mid(CaseInf, iStart, Len(iLength))
Sheets("Sheet1").Range("A1") = CResults
 
CaseInf = appIE.document.Body.innerHTML   ' changed: innerHTML instead of innertext
FName = InStr(1, CaseInf, "NAME")
iStart = InStr(FName, CaseInf, "<B>") + 3
iLength = InStr(iStart, CaseInf, "</B>") - iStart
CResults = Mid(CaseInf, iStart, iLength)   ' changed: iLength instead of Len(iLength)
Sheets("Sheet1").Range("A1") = CResults

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
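
Why the innertext version misbehaves can be sketched as follows. innerText carries no tags, so both InStr searches for "<B>" and "</B>" return 0, which makes iStart 3 and iLength -3, and Mid with a negative length raises run-time error 5. Wrapping it as Len(iLength) hides the error because Len on an Integer variable returns its storage size in bytes (2), so Mid returns the two characters at positions 3 and 4 of the page text, quite possibly line breaks that Excel shows as a box. The sample text and function names below are assumptions for illustration only.

```vba
' Hypothetical demo: what the tag searches yield on tag-free text.
Function InnerTextPositions() As String
    Dim txt As String
    Dim iStart As Integer, iLength As Integer

    ' Assumed shape of innerText: no tags, only text and line breaks.
    txt = "NAME" & vbCrLf & "John Smith" & vbCrLf & "123 N. 13th Ave"

    iStart = InStr(1, txt, "<B>") + 3                ' InStr finds nothing (0), so iStart = 3
    iLength = InStr(iStart, txt, "</B>") - iStart    ' 0 - 3 = -3

    InnerTextPositions = iStart & "," & iLength      ' "3,-3"
End Function

' Hypothetical demo: Len on an Integer variable is its size in bytes, not its value.
Function LenOfIntegerVariable() As Long
    Dim iLength As Integer
    iLength = -3
    LenOfIntegerVariable = Len(iLength)              ' 2
End Function
```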
 
Sorry for the delay, I've been super busy lately on other projects. Unfortunately this hasn't solved the issue yet, but we're getting closer. I know it can't be easy without having all the info in front of you, so I appreciate your patience.

We are now getting feedback from the site, but it includes HTML code, and it doesn't appear to start from the first HTML "<B>" after the TEXT "NAME". Instead, it finds the first HTML "<B>" on the webpage and prints the line of HTML following that <B>.

Now that I know you can search for HTML code as opposed to text, it sounds like what I need is the following:

I need the code to find the first HTML "<B>" after the TEXT "NAME", and then print the TEXT after that "<B>" up to the next HTML "</B>".

So: start getting the TEXT after the first <B> following the word NAME, and keep getting text until the HTML code </B>. I hope I explained it properly. I appreciate your help!
 
Okay I got it!

I had to adjust a few things, but I've tested it with names of different lengths and it's rendering the results perfectly based on your code. Here's the code:

CaseInf = appIE.document.Body.innerHTML
FName = InStr(1, CaseInf, "NAME</B></I><BR><B>")
iStart = InStr(FName, CaseInf, "<B>") + 3
iLength = InStr(iStart, CaseInf, "</B>") - iStart
CResults = Mid(CaseInf, iStart, iLength)
Sheets("Sheet1").Range("A1") = CResults

I see this still references HTML only, which is cool since it accurately captures the text within the HTML (which I assume is what you were going for in the first place; I was just being short-sighted about it, lol). I appreciate your help very much. Without this base it would never have been possible for me to get this done! Thank you, kudos to you sir!
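
One plausible reason the longer search string helps, though the thread does not confirm it: if the letters NAME also occur earlier in the page's HTML, a search for "NAME" alone stops at that earlier occurrence and the next "<B>" belongs to something else. The sketch below uses an invented page fragment and a hypothetical helper, BoldTextAfter, to compare the two search strings.

```vba
' Hypothetical helper: returns the text inside the first <B>...</B> found at or after marker.
Function BoldTextAfter(ByVal html As String, ByVal marker As String) As String
    Dim pos As Long, iStart As Long, iEnd As Long

    pos = InStr(1, html, marker)
    If pos = 0 Then Exit Function               ' marker not found: return ""

    iStart = InStr(pos, html, "<B>")
    If iStart = 0 Then Exit Function            ' no bold tag after the marker
    iStart = iStart + 3

    iEnd = InStr(iStart, html, "</B>")
    If iEnd = 0 Then Exit Function              ' bold tag never closed
    BoldTextAfter = Mid$(html, iStart, iEnd - iStart)
End Function

' Hypothetical demo on an invented fragment where NAME appears twice.
Function DemoMarkerChoice() As String
    Dim html As String
    html = "FILE NAME: 12 <B>Docket</B> <I><B>NAME</B></I><BR><B>John Smith</B>"

    ' "NAME" alone matches inside "FILE NAME"; the full marker matches only the label.
    DemoMarkerChoice = BoldTextAfter(html, "NAME") & "|" & _
                       BoldTextAfter(html, "NAME</B></I><BR><B>")
End Function
```

The longer marker also keeps NAME and NAME 2 apart, since in "NAME 2" the label is followed by " 2" and not by "</B>".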
 
This is precisely what my suggested code is supposed to do ...
Why is your code finding the wrong "<B>" tag?
Anyway, what is the REAL value of CaseInf ?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Would you happen to know how to account for the situation where "NAME" doesn't exist in the HTML code?

(Here's the code again):
CaseInf = appIE.document.Body.innerHTML
FName = InStr(1, CaseInf, "NAME</B></I><BR><B>")
iStart = InStr(FName, CaseInf, "<B>") + 3
iLength = InStr(iStart, CaseInf, "</B>") - iStart
CResults = Mid(CaseInf, iStart, iLength)
Sheets("Sheet1").Range("A1") = CResults


So in the FName line, if "NAME</B></I><BR><B>" doesn't exist, I would like to proceed without an error or msgbox; just carry on as normal and End Sub.
 
...
FName = InStr(1, CaseInf, "NAME</B></I><BR><B>")
If FName <= 0 Then Exit Sub
...

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
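
The guard matters because InStr raises run-time error 5 when its start argument is 0, which is what FName holds when the label is missing. If the rest of the Sub should still run (for example, to look for NAME 2 afterwards), an alternative to Exit Sub is to wrap the lookup in a function that returns an empty string when the label is absent. This is only a sketch: NameForLabel is a hypothetical helper, and it assumes the markup after each label matches the sample in the question.

```vba
' Hypothetical helper: returns the name that follows a label, or "" if the label is absent.
Function NameForLabel(ByVal html As String, ByVal label As String) As String
    Const TAIL As String = "</B></I><BR><B>"    ' markup assumed to follow every label
    Dim pos As Long, iStart As Long, iEnd As Long

    pos = InStr(1, html, label & TAIL)
    If pos = 0 Then Exit Function               ' label missing: result stays ""

    iStart = pos + Len(label & TAIL)            ' first character of the name
    iEnd = InStr(iStart, html, "</B>")
    If iEnd = 0 Then Exit Function              ' closing tag missing: result stays ""

    NameForLabel = Mid$(html, iStart, iEnd - iStart)
End Function

' Hypothetical demo: the second label is absent, so the second part is empty.
Function DemoMissingLabel() As String
    Dim html As String
    html = "<I><B>NAME</B></I><BR><B>John Smith</B> <BR>123 N. 13th Ave"
    DemoMissingLabel = NameForLabel(html, "NAME") & "|" & NameForLabel(html, "NAME 2")
End Function
```

In the original Sub, the results could then be written to the sheet only when Len(result) > 0, with no error and no message box.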
 