Data Mining - Data Scraping

Status
Not open for further replies.

LPV

Technical User
Jun 18, 2001
US
I need to write a VB.NET program that will automatically download a text product from the Internet and search for specific words in the text.

I have searched for "data mining" and "data scraping" but have not had much luck.

Can anyone point me in the right direction? It will be running on a Windows XP system.

Thanks.

J. Engel
 
This is a very simple process and is best done using the HttpWebRequest and HttpWebResponse objects. With these you can make a request and process the results. Below is part of an asynchronous request.

Code:
' These imports go at the top of the file.
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Threading

' Not shown: RequestState, allDone, TimeOutCallback, DefaultTimeOut and
' URI_BOD are defined elsewhere in the class.

    ' Requests the previous day's data for a plant and blocks until the
    ' asynchronous read has finished or failed.
    Public Function doBOD(ByVal D As Date, ByVal Plant As String) As Boolean
        Try
            ' Use the date passed in rather than reading a form control.
            Dim CurrDate As Date = D.AddDays(-1)
            Dim SDate As String = String.Format("{0:D2}-{1:D2}-{2:D2}", CurrDate.Year, CurrDate.Month, CurrDate.Day)
            ' WebRequest.Create returns a WebRequest, so cast it explicitly.
            Dim BODRequest As HttpWebRequest = CType(WebRequest.Create(String.Format(URI_BOD, Plant, SDate)), HttpWebRequest)
            Dim BODState As RequestState = New RequestState()
            BODState.Request = BODRequest
            Dim IAResult As IAsyncResult = BODRequest.BeginGetResponse(New AsyncCallback(AddressOf ResponseCallback), BODState)
            ' Run the timeout callback once only; the wait handle stays signaled
            ' after completion, so a repeating wait would keep firing.
            ThreadPool.RegisterWaitForSingleObject(IAResult.AsyncWaitHandle, New WaitOrTimerCallback(AddressOf TimeOutCallback), BODRequest, DefaultTimeOut, True)
            ' Block until one of the callbacks signals that it has finished.
            allDone.WaitOne()
            If (BODState.Response IsNot Nothing) Then BODState.Response.Close()
            Return True
        Catch ex As WebException
            MsgBox(ex.Message, MsgBoxStyle.Exclamation Or MsgBoxStyle.OkOnly, ex.Status.ToString())
        End Try
        Return False
    End Function

    ' Called each time a chunk of the response body has been read.
    Private Shared Sub BODReadCallback(ByVal IAResult As IAsyncResult)
        Try
            Dim ReqState As RequestState = CType(IAResult.AsyncState, RequestState)
            Dim ResStream As Stream = ReqState.ResponseStream
            Dim Read As Integer = ResStream.EndRead(IAResult)
            If (Read > 0) Then
                ' Append the chunk (decoded as ASCII) and queue the next read.
                ReqState.RequestData.Append(Encoding.ASCII.GetString(ReqState.BufferRead, 0, Read))
                ResStream.BeginRead(ReqState.BufferRead, 0, ReqState.BufferRead.Length, New AsyncCallback(AddressOf BODReadCallback), ReqState)
                Exit Sub
            Else
                ' A zero-byte read means the whole response has been received.
                ResStream.Close()
            End If
        Catch ex As WebException
            MsgBox(ex.Message, MsgBoxStyle.Exclamation Or MsgBoxStyle.OkOnly, ex.Status.ToString())
        End Try
        ' Release the thread waiting in doBOD.
        allDone.Set()
    End Sub

    ' Called when the response headers arrive; starts reading the body.
    Private Shared Sub ResponseCallback(ByVal IAResult As IAsyncResult)
        Try
            Dim ReqState As RequestState = CType(IAResult.AsyncState, RequestState)
            Dim Request As HttpWebRequest = ReqState.Request
            ReqState.Response = CType(Request.EndGetResponse(IAResult), HttpWebResponse)
            Dim ResStream As Stream = ReqState.Response.GetResponseStream()
            ReqState.ResponseStream = ResStream
            ResStream.BeginRead(ReqState.BufferRead, 0, ReqState.BufferRead.Length, New AsyncCallback(AddressOf BODReadCallback), ReqState)
            Exit Sub
        Catch ex As WebException
            MsgBox(ex.Message, MsgBoxStyle.Exclamation Or MsgBoxStyle.OkOnly, ex.Status.ToString())
        End Try
        ' Only reached on failure: release the thread waiting in doBOD.
        allDone.Set()
    End Sub
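
The listing above relies on several members it does not show: RequestState, allDone, TimeOutCallback, DefaultTimeOut and URI_BOD. What follows is only a guess at what they might look like, loosely modeled on the usual asynchronous HttpWebRequest pattern. The class name BodDownloader, the URL format and the two-minute timeout are all placeholders, not values from the original program.

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Threading

' Assumed shape of the state object passed between the callbacks.
Public Class RequestState
    Public Const BufferSize As Integer = 2048
    ' Scratch buffer that each BeginRead call fills.
    Public BufferRead() As Byte = New Byte(BufferSize - 1) {}
    ' Accumulates the decoded response text.
    Public RequestData As New StringBuilder()
    Public Request As HttpWebRequest
    Public Response As HttpWebResponse
    Public ResponseStream As Stream
End Class

' Hypothetical container for the remaining members.
Public Class BodDownloader
    ' Placeholder URL: {0} is the plant, {1} is the date string.
    Private Const URI_BOD As String = "http://example.com/bod?plant={0}&date={1}"
    ' Assumed timeout of two minutes, in milliseconds.
    Private Const DefaultTimeOut As Integer = 2 * 60 * 1000
    ' Auto-reset, so each request starts with the event unsignaled.
    Private Shared allDone As New AutoResetEvent(False)

    ' Aborts the request if the wait timed out before a response arrived.
    Private Shared Sub TimeOutCallback(ByVal state As Object, ByVal timedOut As Boolean)
        If timedOut Then
            Dim request As HttpWebRequest = TryCast(state, HttpWebRequest)
            If request IsNot Nothing Then request.Abort()
        End If
    End Sub
End Class
```

Once allDone has been signaled, the downloaded text is available as RequestData.ToString() on the state object, and that string is what you would search for your words.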

Also, depending on how often you need to do this, you may want to run the process from a Windows service, but remember that a service has no UI, so error reporting such as the MsgBox calls would need to go to a log instead.
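
If you go the service route, something along these lines might be a starting point. It is a rough sketch, not a finished service: it swaps the asynchronous request for the simpler synchronous WebClient.DownloadString, and the class name, URL, keywords and 15-minute interval are all made up for illustration. Installing the service (for example with an installer class) is not shown.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Net
Imports System.ServiceProcess
Imports System.Timers

Public Class TextWatchService
    Inherits ServiceBase

    ' Hypothetical values; replace with the real product URL and words.
    Private Const ProductUrl As String = "http://example.com/product.txt"
    Private Shared ReadOnly Keywords As String() = {"WARNING", "ADVISORY"}

    ' Fires every 15 minutes (interval is in milliseconds).
    Private ReadOnly pollTimer As New System.Timers.Timer(15 * 60 * 1000)

    Public Sub New()
        ServiceName = "TextWatchService"
        AddHandler pollTimer.Elapsed, AddressOf OnPoll
    End Sub

    Protected Overrides Sub OnStart(ByVal args() As String)
        pollTimer.Start()
    End Sub

    Protected Overrides Sub OnStop()
        pollTimer.Stop()
    End Sub

    ' Returns the words that occur anywhere in the text, ignoring case.
    Public Shared Function FindKeywords(ByVal text As String, ByVal words As IEnumerable(Of String)) As List(Of String)
        Dim hits As New List(Of String)
        For Each word As String In words
            If text.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0 Then
                hits.Add(word)
            End If
        Next
        Return hits
    End Function

    ' Downloads the text and logs any matches. A service has no UI,
    ' so results and errors go to the event log instead of MsgBox.
    Private Sub OnPoll(ByVal sender As Object, ByVal e As ElapsedEventArgs)
        Try
            Using client As New WebClient()
                Dim text As String = client.DownloadString(ProductUrl)
                Dim hits As List(Of String) = FindKeywords(text, Keywords)
                If hits.Count > 0 Then
                    EventLog.WriteEntry("Found: " & String.Join(", ", hits.ToArray()))
                End If
            End Using
        Catch ex As WebException
            EventLog.WriteEntry(ex.Message, System.Diagnostics.EventLogEntryType.Warning)
        End Try
    End Sub

    Public Shared Sub Main()
        ServiceBase.Run(New TextWatchService())
    End Sub
End Class
```

FindKeywords does a plain substring match, so "WARNING" would also match inside "WARNINGS". If you need whole-word matching, a regular expression with word boundaries would be the next step.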

Hope this helps.

--
Woogoo
 