Problem getting value between html tags 2

Status
Not open for further replies.
Apr 28, 2006
Hi all, I have an application that loads HTML, and it needs to search through all of the HTML and list the numbers between VALUE=" and "></td>. In other words, I want to collect the bold number in VALUE="3018"></td>.

My program currently outputs this, with some extra things:

Code:
http://localhost/new/player.php?song=,"album.php?show_albums,"3018","3019","3020","3021"

but I want it to look like this:

Code:
http://localhost/new/player.php?song=3018,3019,3020,3021

I tried many things, but I could not remove the extra album.php?show_albums, and the extra " characters from the output URL. I would be happy if someone could help me fix these problems. I bolded the important part. Thanks.

The HTML code has many blocks of this type:
Code:
 <tr>
    <td align="center" scope="row">1</td>
    <td align="center"><INPUT TYPE="Checkbox" NAME="song_id" ONCLICK="reviewSelection();" [b]VALUE="[/b]3018[b]"></td>[/b]
    <td><a href="#" class="song_title" onclick="loadPlayer('3018');return false;"> my life
</a> </td>
    <td align="center">&nbsp;</td>
    <td align="center">&nbsp;</td>
  </tr>


my code:

Code:
Private Sub Command1_Click(Index As Integer)

Select Case Index
    Case 0:
        If txtURL.Text <> "" Then
            RichTextBox1.Text = Inet1.OpenURL(txtURL.Text, icString)
        End If
    
    Case 1:
        End
End Select
End Sub


Private Sub Command2_Click()
    Dim sResult() As String, n As Long

    If GetLine(RichTextBox1.Text, "VALUE=", "></td>", sResult) Then
        ' Occurrences were found and have been placed in the array
        Text1.Text = "http://localhost/new/player.php?song"

        For n = LBound(sResult) To UBound(sResult)
            List1.AddItem sResult(n)
            Text1.Text = Text1.Text & "," & Split(sResult(n), "=")(1)
        Next n

        '--------------- end of making url code
    Else
        ' No occurrences were found
    End If
End Sub




Private Function GetLine(ByVal sText As String, ByVal sStart As String, ByVal sEnd As String, ByRef sArr() As String) As Boolean
    Dim lPos As Long, lEnd As Long, lCount As Long, sTemp() As String
    
    ReDim sTemp(100)
    
    lPos = InStr(1, sText, sStart, vbTextCompare)
    Do While lPos
        lEnd = InStr(lPos, sText, sEnd, vbTextCompare)
        If lEnd Then
        'Remove & sEnd from the below line.
        'sTemp(lCount) = Mid$(sText, lPos, lEnd - lPos) & sEnd
            sTemp(lCount) = Mid$(sText, lPos, lEnd - lPos)
            lPos = InStr(lEnd, sText, sStart, vbTextCompare)
        Else
            sTemp(lCount) = Mid$(sText, lPos)
            lPos = 0
        End If
        lCount = lCount + 1
        If lCount > UBound(sTemp) Then ReDim Preserve sTemp(100 + lCount)
    Loop

    If lCount > 0 Then
        ReDim Preserve sTemp(lCount - 1)
        sArr = sTemp
    End If
    GetLine = lCount
End Function
 
Private Function rpl(ByVal sText As String) As String
'Chr(32) = space
'Chr(34) = "
'Chr(39) = '
Dim s As String
s = Replace(sText, Chr(32), "")
s = Replace(s, Chr(34), "")
s = Replace(s, Chr(39), "")
rpl = s
End Function

It would help to see where your results came from.
( ,"album.php? show_albums )
 
I do not see what a construction such as GetLine is good for. I would simply write the click handler like this.
[tt]
Private Sub Command2_Click()
    Dim rx As RegExp
    Dim cm As MatchCollection
    Dim m As Match
    Dim separator As String
    Dim i As Integer
    Dim s As String

    separator = ","
    ' Base URL taken from the desired output in the first post
    Text1.Text = "http://localhost/new/player.php"
    Set rx = New RegExp
    MsgBox TypeName(rx)    ' testing only
    s = RichTextBox1.Text
    With rx
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
        ' Capture whatever sits between VALUE=" and "></td>
        .Pattern = "value\W*=\W*""(.*?)""\W*>\W*</td>"
    End With

    If rx.Test(s) Then
        Text1.Text = Text1.Text & "?song="
    End If

    Set cm = rx.Execute(s)
    For i = 0 To cm.Count - 1
        MsgBox cm(i) & vbCrLf & cm(i).SubMatches(0)    ' testing only
        If i = cm.Count - 1 Then
            Text1.Text = Text1.Text & cm(i).SubMatches(0)
        Else
            Text1.Text = Text1.Text & cm(i).SubMatches(0) & separator
        End If
    Next
End Sub
[/tt]
You have to add a reference to Microsoft VBScript Regular Expressions 5.5 to the project. Once you are satisfied with the result, you can of course rebuild your GetLine function, if you feel like it, by separating out that part of the functionality.
 
Further note:
Those MsgBox calls are for testing. Comment them out in the real code.
 
tsuji, thank you for your reply. It worked well. Since I am doing a lot of pattern finding in HTML code, could you tell me whether, for each different pattern, I should just change this line:

.Pattern = "value\W*=\W*""(.*?)""\W*>\W*</td>"

Furthermore, could you explain a little more about whether this can be used for any type of pattern finding in HTML? I would be happy if you let me know how to construct the pattern search criteria. Thanks.
 
I can only briefly explain the main ideas in it.
[1] IgnoreCase = True
This is because pages out there are loose about the case of the value attribute and of the td tag.
[2] \W*
This allows for the equally loose use of "whitespace" before and after the "=" in the name/value pair, and likewise before </td>. (Strictly, \W matches any non-word character, not only whitespace.) What sits in front of </td> here is specific to your input. In general, I would not be surprised if the content of <td>...</td> is more than whitespace. In that case
[tt]\W*</td>[/tt]
should be replaced by something like
[tt](.|\W)*</td>[/tt]
(You see, I am not claiming full generality.)
[3] (.*?)
This captures the meat: the value of the value attribute. The ? makes the match non-greedy. The parentheses build a submatch, which here is the first one, SubMatches(0).

Those are the ideas behind the pattern I put together. It is not very ambitious. Refine it if you want to tackle the problem in full generality.
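
As a quick way to try the pattern outside the VB6 project, here is a minimal sketch in Python rather than VB6; the constructs used (\W, the lazy (.*?) group, case-insensitive matching) behave the same way in Python's re module. The sample HTML is adapted from the first post. The extra <option> line and the names html, pattern, ids and url are assumptions for illustration only.

```python
import re

# Sample HTML adapted from the first post; the <option> line is an assumed
# example of the markup that produced the stray "album.php?show_albums" text.
html = '''<option value="album.php?show_albums=oldies">Old Songs</option>
<td align="center"><INPUT TYPE="Checkbox" NAME="song_id" ONCLICK="reviewSelection();" VALUE="3018"></td>
<td align="center"><INPUT TYPE="Checkbox" NAME="song_id" ONCLICK="reviewSelection();" VALUE="3019"></td>
'''

# Same pattern as in the VB6 code. IGNORECASE covers VALUE/value, and (.*?)
# is a lazy capture of the attribute value. The match must end in </td>,
# so the <option value="..."> line is skipped.
pattern = re.compile(r'value\W*=\W*"(.*?)"\W*>\W*</td>', re.IGNORECASE)

ids = pattern.findall(html)  # with one group, findall returns the captures
url = "http://localhost/new/player.php?song=" + ",".join(ids)
print(url)
```

Because \W accepts any non-word character, it also tolerates stray punctuation between the tokens; \s* would restrict those gaps to whitespace only.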
 
Many thanks for your reply. I have difficulty setting up the search pattern. Could you tell me how to find the data shown in bold in the following pattern?

Code:
<option value="album.php?show_albums=[B]oldies&JALSA=294c58d2c91828eae51d39707dd7e793;allow=NO;mohim=Download[/B]">Old Songs</option>

I want to place each bold value at the end of a URL, one URL per bold value, and put all the findings into a listbox for further manipulation.

The bold part changes, and I want to use it to construct a full URL. Thanks.
 
Exactly identifying the surroundings (look behind and look ahead) of the targeted data involves a trade-off. The more you include, the more precisely the target is located, but the more chances there are of the surrounding pattern failing to match.

In the concrete case here, I would look behind with the signature:
[tt]="album.php?show_albums=[/tt]
and look ahead with the signature
[tt]">[/tt]
The "Old Songs" text may or may not be generic. If it is not, relying on it gets you into complications. There might be whitespace between " and >, but not likely between " and album. Hence, try this and feel free to refine it.
[tt]
rx.pattern="""album\.php\?show_albums=(.*?)""\W*>"
[/tt]
(Mind the escaped characters \. and \?; otherwise it is pretty straightforward.)

Only the > ties the pattern to the HTML tag. If you feel the album part already locates the target with enough certainty, just use this.
[tt]
rx.pattern="""album\.php\?show_albums=(.*?)"""
[/tt]
And the data is retrieved in the same way.
[tt]
i=0 'in a for loop as shown---here the first match
sdata=rx.execute(s)(i).submatches(0)
[/tt]
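
To make the difference between the two patterns visible, here is a minimal sketch in Python rather than VB6 (the regex constructs involved behave the same way in both). The first sample line is the one from the thread; the second line, with an extra attribute after the value, is a hypothetical variation, and the variable names are assumptions for illustration.

```python
import re

# First line is the sample from the thread; the second line, with an extra
# attribute after the value, is a hypothetical variation.
html = '''<option value="album.php?show_albums=oldies&JALSA=294c58d2c91828eae51d39707dd7e793;allow=NO;mohim=Download">Old Songs</option>
<option value="album.php?show_albums=rock" selected>Rock</option>
'''

# Variant 1: the closing quote must be followed (after optional non-word
# characters) by the > that ends the tag.
with_tag_end = re.compile(r'"album\.php\?show_albums=(.*?)"\W*>', re.IGNORECASE)

# Variant 2: stop at the closing quote, whatever follows it.
quote_only = re.compile(r'"album\.php\?show_albums=(.*?)"', re.IGNORECASE)

strict_hits = with_tag_end.findall(html)
loose_hits = quote_only.findall(html)
print(len(strict_hits), strict_hits)
print(len(loose_hits), loose_hits)
```

On markup where the value attribute is always the last thing before >, both variants capture the same text; they only diverge when something else sits between the closing quote and the end of the tag.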
 
Many thanks for your reply. I tried both of them, but the output is hard to read since all the URLs are put one after another. Could you tell me how to output one URL per line in the textbox? I used both

pattern="""album\.php\?show_albums=(.*?)""\W*>"
pattern="""album\.php\?show_albums=(.*?)"""

and could not find any difference in the output. Could you tell me what the difference is? I also noticed, by comparing with the actual HTML, that some of the URLs are missing, but I need to make the output readable before I can check whether it is correct. Thanks.

Code:
    With rx
        .Global = True
        .IgnoreCase = True
        .MultiLine = True
     
        .Pattern = """album\.php\?show_albums=(.*?)"""
        '.Pattern = """album\.php\?show_albums=(.*?)""\W*>"

    End With

 
> but it is kind of unreadable since they put all the urls one after another

I see what you mean. But that is the core data, which you can start splitting, using classic VB string functions to extract substrings, append them to the URL and then add them to the listbox. (I hope you know how that can be done.)

If you want to do it in one go, this is how. (But I see the whole line in bold, so I am not sure what the separator for the sub-data is. So take this as an illustration.)
[tt]
With rx
    .Global = True
    .IgnoreCase = True
    .MultiLine = True
    '.Pattern = """album\.php\?show_albums=(.*?)"""
    .Pattern = """album\.php\?show_albums=(.*?)&.*?=(.*?);(.*?);(.*?)"""
End With
If rx.Test(s) Then
    Set cm = rx.Execute(s)
    For i = 0 To cm.Count - 1
        For j = 0 To cm(i).SubMatches.Count - 1
            sdata = cm(i).SubMatches(j)
            MsgBox sdata    'just to show what you get
            'assign your base URL & sdata to some listbox control?
        Next
    Next
Else
    'no match and you have to decide what to do
End If
[/tt]
You'll probably see only one match, and its submatches will be, in order, "oldies", "294...e793", "allow=NO", "mohim=Download". I keep the last two as "name=value" because I don't know how you use "NO" and "Download"; they seem quite different in nature from "oldies" and "294c...e793". But you get the idea of how to extract each part of the data.

In any case, you have the classic VB string functions at your disposal and can make good use of them in conjunction with RegExp.
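
The submatch loop and the "one item per line" output can be sketched together. This is a minimal illustration in Python rather than VB6, using the sample line from the thread; BASE is a hypothetical URL prefix (the thread does not show the real one), and the variable names are assumptions. In VB6 the same effect would come from calling List1.AddItem once per item, or from joining the items with vbCrLf into a TextBox whose MultiLine property is True.

```python
import re

# Sample line from the thread.
html = ('<option value="album.php?show_albums=oldies&JALSA='
        '294c58d2c91828eae51d39707dd7e793;allow=NO;mohim=Download">'
        'Old Songs</option>')

# Four lazy groups split the attribute value at "&", "=" and ";".
pattern = re.compile(
    r'"album\.php\?show_albums=(.*?)&.*?=(.*?);(.*?);(.*?)"',
    re.IGNORECASE,
)

# BASE is a hypothetical prefix; substitute the URL you actually need.
BASE = "http://localhost/new/album.php?show_albums="

lines = []
for match in pattern.finditer(html):   # one match per <option> element
    for part in match.groups():        # one entry per captured submatch
        lines.append(BASE + part)

# Joining with a newline gives one URL per line instead of a single run.
print("\n".join(lines))
```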
 
Thank you for your reply. I could not get it to work; I got an empty textbox. Could you tell me how to put the output into a textbox or listbox?

My intention is to get the data between show_albums= and the closing ", no matter what it is and even if it changes, append that data to the end of a URL, and post the result back to a textbox or listbox line by line in a readable fashion. Thanks.

I want the data shown as dots in:
show_albums=.....">Old
 
>how to populate the output to textbox...?
Now you have me puzzled! How do you read the data from a textbox, say? Writing to it works the same way.

>i could not run it i got empty textbox.
As for the empty textbox, I hope you are not saying you get no matches at all! Otherwise I do not have a clue.
 
The only problem with the previous code was that the textbox output had all the matches one after another instead of one match per line, and I do not know how to make it one per line so that it is readable!
 
Hate to butt in here, but there are much easier ways of parsing this sort of thing than using a RegExp ... The HTML Object library is your friend
 
strongm, thanks for pointing me in that direction. Since I am doing lots of HTML pattern checking and data extraction, could you show me an example of using the HTML Object Library? For example, I want to extract the bold parts of the example HTML below and place them in a ListView. I would be happy if you could show me how that can be done using your method. Thanks.



Code:
<tr>
<td align="center" scope="row">1</td>
<td align="center"><INPUT TYPE="Checkbox" NAME="song_id" ONCLICK="reviewSelection();" VALUE="2206"></td>
<td><a href="#" class="song_title" onclick="loadPlayer('[b]2206[/b]');return false;">[b]song title[/b] 
</a> </td>
<td align="center">&nbsp;</td>
<td align="center">&nbsp;</td>
</tr>

code to get html in textbox:

Code:
Private Sub Command1_Click(Index As Integer)

Select Case Index
    Case 0:
        If txtURL.Text <> "" Then
        
    
            RichTextBox1.Text = Inet1.OpenURL(txtURL.Text, icString)
  
        End If
    
    Case 1:
        End
End Select
End Sub
 
What I'll do is give you some code using the HTML libraries that mostly solves your initial question, and then let you figure out the rest, having been given a great big pointer ...

You'll need a project with a command button. Add a reference to the Microsoft HTML Object Library, and then copy and paste in this code:
Code:
[blue]Option Explicit

Private Sub Command1_Click()
    MsgBox GetSongsfromURL("<your_source_url_goes_here>")
End Sub

Public Function GetSongsfromURL(strURL As String) As String
    Dim myTemp As HTMLDocument
    Dim myDoc As HTMLDocument
    Dim myElement As IHTMLElement
    Dim srcString As String
    
    GetSongsfromURL = "http://localhost/new/player.php?song"
    Set myTemp = New HTMLDocument
    Set myDoc = myTemp.createDocumentFromUrl(strURL, "")
    Do Until myDoc.readyState = "complete"
        DoEvents
    Loop
    
    For Each myElement In myDoc.getElementsByName("song_id")
        srcString = srcString & myElement.Value & " "
    Next
    If srcString <> "" Then GetSongsfromURL = GetSongsfromURL & "=" & Join(Split(Trim(srcString), " "), ", ")
End Function[/blue]
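
The same parse-the-document idea can be tried without the Microsoft HTML Object Library. The sketch below uses Python's standard html.parser module as a stand-in; it is not the MSHTML API, and the names SongIdCollector, songs_url and sample are hypothetical. It collects the value of every input named song_id, which is what the getElementsByName loop above does, and joins the values with plain commas as requested in the first post.

```python
from html.parser import HTMLParser


class SongIdCollector(HTMLParser):
    """Collects the value attribute of every <input name="song_id">."""

    def __init__(self):
        super().__init__()
        self.song_ids = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so INPUT/VALUE
        # written in upper case in the page still arrive as input/value.
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "song_id":
            value = attributes.get("value")
            if value:
                self.song_ids.append(value)


def songs_url(html, base="http://localhost/new/player.php"):
    """Return base?song=id1,id2,... or just base when no ids are found."""
    collector = SongIdCollector()
    collector.feed(html)
    collector.close()
    if not collector.song_ids:
        return base
    return base + "?song=" + ",".join(collector.song_ids)


# Rows adapted from the first post.
sample = '''
<td align="center"><INPUT TYPE="Checkbox" NAME="song_id" ONCLICK="reviewSelection();" VALUE="3018"></td>
<td align="center"><INPUT TYPE="Checkbox" NAME="song_id" ONCLICK="reviewSelection();" VALUE="3019"></td>
'''
print(songs_url(sample))
```

Because the parser works on elements and attributes rather than on raw text, other value attributes on the page (such as those on option elements) never enter the result.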
 
strongm, all your code does is put the URL in a message box!! lol.
The URL of the page is not like what you put.
I want to scan the HTML for the bold part:
Code:
<tr>
<td align="center" scope="row">1</td>
<td align="center"><INPUT TYPE="Checkbox" NAME="song_id" ONCLICK="reviewSelection();" VALUE="2206"></td>
<td><a href="#" class="song_title" onclick="loadPlayer('[b]2206[/b]');return false;">[b]song title[/b] 
</a> </td>
<td align="center">&nbsp;</td>
<td align="center">&nbsp;</td>
</tr>
 
>the url of page is not like what u put

I was hoping that "<your_source_url_goes_here>" might be enough of a clue for you to drop in the URL you are actually using since, given that I am not a mind reader, I have no idea what URL you are really using.

>your code all does tha

Well yes, if you don't feed the function a correct URL (i.e. one that points to a page that contains the info you want) then that's exactly what the example function does. A minuscule rewrite might make it return nothing at all, if that was your preference, or an error message. But these are things I thought I'd leave to you once I'd pointed you in the right direction.

>I want to scan

As I said, my example was for the question you posed in the first post in this thread. That way I get to illustrate a basic solution using the HTML Object Library without simply providing you with a full code solution to your current problem. The idea is that, rather than my writing a full code solution for you (this isn't really a code shop), you take the example, learn from what it illustrates and then apply what you learn to the actual problem.

 
tsuji, thanks for your nice method; I finally got your code working. Could you help me extract the artist name, album name, song name and artist picture from the following patterns? I have difficulty constructing the search patterns for them. I want to display all of them in a listbox. Thanks.

Note: in one page there is one album for a single artist, but multiple song names.
Note: the bold parts are dynamic and changing, and I want to extract them.


HTML part that holds each song name:


Code:
<img border="0" src="../images/download.gif" width="16" height="16" longdesc="Download [b]songname[/b]" alt="Download [b]songname[/b]"></a>
                            &nbsp;[b]songname[/b]
                               </td>
HTML part that holds the artist name and album:


Code:
::: Singer: <b>[b]artistname[/b]</b> Album <b>[b]albumename[/b]</b>::::</font></td>
HTML part that holds the artist image:

Code:
<td>
                  <br>
                  <a href="http://localhost/ShowImage.asp?img=http://localhost/[b]artistpic.jpg[/b]" target=_blank>
                  <img border="0" src="../CdImages/artistpic.jpg" width="180" height="180" longdesc="Click here to Enlarge" alt="Click here to Enlarge" >
                  </a>
                </td>
 