Hello,
I have been successful at scraping HTTP pages that require form posts using HttpWebRequest and HttpWebResponse, but I have hit a brick wall trying to do the same against an SSL page. I have read many blogs and KB articles online, but I have not found a solution that gets me the correct response page for my form request.
Attached is my code:
Code:
' Requires at the top of the file: Imports System.Diagnostics, System.IO, System.Net
Public Function DownloadInformation(ByVal theDate As Date) As Boolean
    Dim link As String = "https://www.firstenergycorp.com/supplierservices/forms/requestedDate.do"
    'Dim cache As CredentialCache
    Dim creds As NetworkCredential
    Dim b As Byte()
    Dim s As System.IO.Stream
    Dim params As String
    'Dim params2 As String

    ' Set up the request
    Dim myWebRequest As HttpWebRequest = DirectCast(WebRequest.Create(link), HttpWebRequest)
    Debug.WriteLine(link)

    ' Post variables
    params = String.Format("reqDate={0}&submit=Go", theDate.ToString("MM/dd/yyyy"))
    b = System.Text.Encoding.ASCII.GetBytes(params)

    creds = New NetworkCredential("XXXXX", "YYYYYY")
    'cache = New CredentialCache()
    'cache.Add(New Uri("https://www.firstenergycorp.com/supplierservices/forms/login.jsp"), "Basic", creds)

    With myWebRequest
        ' Set the credential property to username/password
        .Credentials = creds
        .CookieContainer = New CookieContainer()
        ' Set the agent name to reflect the nature of this class
        '.UserAgent = "Connectiv Load Profiles"
        .UserAgent = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.1)"
        ' Set keep-alive to true to persist the connection once it is established over TCP
        .KeepAlive = True
        ' Disable caching
        .Headers.Add("Pragma", "no-cache")
        ' Set the timeout (milliseconds)
        .Timeout = 15000
        ' Set the method of the call to POST
        .Method = "POST"
        ' Set the content type
        .ContentType = "application/x-www-form-urlencoded"
        ' Length of the form post
        .ContentLength = b.Length
        .AllowWriteStreamBuffering = True
        .Accept = "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/msword, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/x-shockwave-flash, */*"
        .Referer = "https://www.firstenergycorp.com/supplierservices/forms/login.do"
        .Headers.Add("Accept-Language", "en-us")
        ' .Headers.Remove("Expect")
    End With

    ' Send the form, then close the request stream so the body is complete
    s = myWebRequest.GetRequestStream()
    s.Write(b, 0, b.Length)
    s.Close()

    ' Receive the response
    Dim myWebResponse As HttpWebResponse = DirectCast(myWebRequest.GetResponse(), HttpWebResponse)

    ' Check the return code to make sure we can continue
    If myWebResponse.StatusCode = HttpStatusCode.Redirect Then
        Debug.WriteLine("Redirect")
        Return False
    ElseIf myWebResponse.StatusCode <> HttpStatusCode.OK Then
        Debug.WriteLine("Stopped here.")
        Return False
    End If

    Dim myStringReader As New StreamReader(myWebResponse.GetResponseStream())
    Dim strResult As String = myStringReader.ReadToEnd()
    Debug.WriteLine(strResult)
    myStringReader.Close()
    myWebResponse.Close()
    Return True
End Function
No exception is thrown. My problem is that, for some reason, the website does not seem to be accepting my credentials. The HttpWebResponse only forwards me to a "System is down. Please try again later" page, even though the status is still 200 OK. The page containing the form has only one variable listed, and my code already posts it.
Does anyone know how to fix this? Or, if anyone has sample code for an SSL form-post site they have scraped before, I would greatly appreciate it.
Thank you,
Nick Ruiz