
Screen Scrape: Almost There, Need Help!

Status
Not open for further replies.

Stelring (ISP, US), Jul 19, 2007
Okay, I'm trying to scrape information from a Motorola product, and I'm having trouble POSTing to the login form to authenticate and then moving on from there.

I've modified some sample screen-scrape code and POST examples, but I just don't get it. So here is the information from the device, followed by the code I'm using.

The URL of the login page is


The URL I end up at after logging in is


I'm almost there, but I don't quite have the code working.

Here is the page source of the login page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<link rel="stylesheet" type="text/css" href="_canopy.css" media="screen" />
<link rel="stylesheet" type="text/css" href="_canopypda.css" media="handheld" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta http-equiv="Content-Style-Type" content="text/css" />
<title>Log In</title>
</head>

<body>

<!-- Insert Logo Here -->
<!-- Use Argument -->
<div class='logo'><img class='logo' src='_canopy.jpg' alt='Logo' /></div>

<table id="frame">
<tr><td class="menu">
<div id="menu">
<div id="loginmenu" >
<a class="menu" href="main.cgi?mac_esn=0a003ef22754">Back to Main Page</a>
</div>
</div>

</td>

<td id="tabandpage">
<div id="page">

<h1>Log In</h1>
<h2>5.7GHz - Subscriber Module - 0a-00-3e-f2-27-54 </h2>


<div class="section">
<h2 class="sectiontitle">Log In</h2>
<p>Please login into the system</p>
<form action="login.cgi" method="post">
<p>
Username: <input type="text" name="CanopyUsername" /><br/>
Password: <input type="password" name="CanopyPassword" /><br/>
</p>
<div class="buttons">
<input type="submit" value="Ok" name="ok" />
<input type="reset" value="Cancel" />
</div>

</form>
</div>

</div>

</td>
</tr>
</table>

</body>
</html>
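The form above posts three fields: CanopyUsername, CanopyPassword, and the submit button, which has name="ok" and value="Ok". A browser sends that button as ok=Ok, so a scraper that mimics the browser should use the same field names with the same case. A minimal sketch of building that body, assuming the device expects ordinary application/x-www-form-urlencoded data; CanopyLogin and BuildLoginBody are hypothetical names, not part of any Motorola API:

```csharp
using System;

public static class CanopyLogin
{
    // Builds the body a browser would send for the Canopy login form.
    // Field names are copied from the page source; this helper is hypothetical.
    public static string BuildLoginBody(string username, string password)
    {
        // Uri.EscapeDataString percent-encodes reserved characters such as & and =,
        // so a password containing them cannot break the field boundaries.
        return "CanopyUsername=" + Uri.EscapeDataString(username)
             + "&CanopyPassword=" + Uri.EscapeDataString(password)
             + "&ok=Ok"; // the submit button: name="ok" value="Ok"
    }
}
```

Uri.EscapeDataString writes a space as %20 rather than +; standard form decoders treat both as a space, but it is worth checking against the device if a password contains spaces.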

CODE

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>
<%@ Import Namespace="System.Net" %>
<%@ Import Namespace="System.IO" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" >
<head runat="server">
<title>Untitled Page</title>
</head>
<script language="C#" runat="server">

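void Page_Load(Object Src, EventArgs E)
{
    // The device URL was elided in the original post. Per the login form,
    // the POST target is login.cgi on the device.
    String text = readHtmlPage("...");
}

private String readHtmlPage(string url)
{
    String result = "";
    // Field names should match the form: the submit button is name="ok" value="Ok".
    String strPost = "CanopyUsername=User&CanopyPassword=password&ok=Ok";
    // ContentLength is a byte count, so encode the body before measuring it.
    byte[] postBytes = System.Text.Encoding.ASCII.GetBytes(strPost);

    HttpWebRequest objRequest = (HttpWebRequest)WebRequest.Create(url);
    objRequest.Method = "POST";
    objRequest.ContentLength = postBytes.Length;
    objRequest.ContentType = "application/x-www-form-urlencoded";
    try
    {
        // The using block closes the request stream even if Write throws, and
        // avoids calling Close() on a null writer when GetRequestStream fails.
        using (Stream requestStream = objRequest.GetRequestStream())
        {
            requestStream.Write(postBytes, 0, postBytes.Length);
        }
    }
    catch (Exception e)
    {
        return e.Message;
    }

    // Dispose the response as well as the reader so the connection is released.
    using (HttpWebResponse objResponse = (HttpWebResponse)objRequest.GetResponse())
    using (StreamReader sr = new StreamReader(objResponse.GetResponseStream()))
    {
        result = sr.ReadToEnd();
    }
    return result;
}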
</script>


<body>
<form id="form1" runat="server">
<div>

</div>
</form>
</body>
</html>
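The page above creates a single request with no cookie store, so nothing the device returns at login carries over to a second request. A sketch of the same login POST split into a configuration step and a network step, assuming one CookieContainer is kept and shared by every request in the scraping session; CanopyScraper, BuildLoginRequest, and Send are hypothetical names:

```csharp
using System;
using System.IO;
using System.Net;

public static class CanopyScraper
{
    // Configures (but does not send) the login POST. The body should already be
    // urlencoded ASCII bytes. Sharing one CookieContainer across requests lets any
    // cookie the device sets at login be sent back on later requests.
    public static HttpWebRequest BuildLoginRequest(string loginUrl, byte[] body, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(loginUrl);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length; // bytes, not characters
        request.CookieContainer = cookies;   // without this, cookies are not stored or resent
        request.AllowAutoRedirect = true;    // follow HTTP 3xx redirects (the default)
        return request;
    }

    // Sends the body and returns the page text, plus the URL the request ended
    // up at after any HTTP redirects. This is the only part that uses the network.
    public static string Send(HttpWebRequest request, byte[] body, out Uri finalUri)
    {
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            finalUri = response.ResponseUri;
            return reader.ReadToEnd();
        }
    }
}
```

Typical use: encode the form body with Encoding.ASCII.GetBytes, build the request with the session-wide CookieContainer, call Send, then compare finalUri with the post-login URL seen in the browser.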
 
The issue you're going to have is the session. I've seen quite a few requests about screen scraping on this forum with pretty much the exact same issue.

When you log in to a site, it gives you a unique session. Ordinarily this session will expire after 20 minutes of inactivity, and you will then be issued a new one.

This session is intended to prevent what you are trying to do. You can't inherently hold a session through screen scraping, as you will only see the HTML being returned.
 
Yeah, I gathered that, because the URL after logging in to the device has &Session=xxxx in it.

Isn't this handled by something like a CookieContainer?

Or is there a different Session object?
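A CookieContainer only helps if the device sets cookies. Because the post-login URL carries &Session=xxxx, the session may instead live in the query string, and nothing in HttpWebRequest tracks that: the scraper has to copy the token from one response into the next request URL. A sketch, assuming the parameter is literally named Session as in the URL above and that its value contains no delimiters; SessionToken, Find, and Append are hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SessionToken
{
    // Matches ?Session=, &Session= or the HTML-escaped &amp;Session= form found
    // inside href attributes, capturing the value up to the next delimiter.
    private static readonly Regex SessionParam =
        new Regex(@"(?:[?&]|&amp;)Session=([^&""'\s#<>]+)");

    // Returns the first session value in a URL or HTML fragment, or null.
    public static string Find(string text)
    {
        if (text == null) return null;
        Match m = SessionParam.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }

    // Appends the session to a follow-up URL so the next request carries it.
    public static string Append(string url, string session)
    {
        string separator = url.Contains("?") ? "&" : "?";
        return url + separator + "Session=" + Uri.EscapeDataString(session);
    }
}
```

The token can be taken from the final response URL or from links in the returned HTML, whichever the device provides.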
 
I am having the exact same problem as you. I have searched all over the net and could not find anything to resolve this issue. If you find a solution, could you please post it here? I will do the same.
Thanks
Matt
 
I'm not exactly familiar with what you're trying to do here. You log in using their web form by submitting a POST.

Where does the problem exist after that?

 
Well, I think the problem comes after the POST, since that seems to work OK. Then when I read the response stream, it gives me "auth denied."

But when I used Wireshark to trace the login from my browser, it also returns "auth denied," but then continues on from there.

So maybe I have to learn how to keep the session state, ignore the "auth denied," and move on to the next request...
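One possible explanation for an "auth denied" page that the browser moves past (an assumption; the thread never shows that page's source) is a client-side redirect such as <meta http-equiv="refresh" content="0; url=...">. HttpWebRequest follows HTTP 3xx redirects but never parses HTML, so a scraper has to find such a tag itself and request the target. A sketch, with MetaRefresh and FindTarget as hypothetical names:

```csharp
using System;
using System.Text.RegularExpressions;

public static class MetaRefresh
{
    // Matches <meta http-equiv="refresh" content="N; url=..."> and captures the URL.
    // Only handles tags where http-equiv appears before content.
    private static readonly Regex RefreshTag = new Regex(
        @"<meta[^>]*http-equiv\s*=\s*[""']?refresh[""']?[^>]*content\s*=\s*[""']\s*\d+\s*;\s*url\s*=\s*([^""'>]+)",
        RegexOptions.IgnoreCase);

    // Returns the refresh target resolved against the page's own URL, or null.
    public static Uri FindTarget(string html, Uri pageUri)
    {
        Match m = RefreshTag.Match(html ?? "");
        if (!m.Success) return null;
        // Attribute values in HTML escape & as &amp;; undo that before building the URL.
        string target = m.Groups[1].Value.Trim().Replace("&amp;", "&");
        return new Uri(pageUri, target);
    }
}
```

This does not handle JavaScript redirects such as window.location; if the Wireshark trace shows one of those, the target URL would have to be extracted differently.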
 