Tracing a web site

Status
Not open for further replies.

arlequin

Programmer
Sep 21, 1999
232
UY
Hi!!

Is there any way to navigate through all the pages of a web site?

Perhaps a utility or program that reports all the href tags, or something like that.

It would be wonderful if someone could give me a tip!

Arlequin
arlequin@montevideo.com.uy
 
Visual InterDev will map your site for you.

nick bulka
 
Great!

I use that Microsoft IDE, but I want to trace a site starting from a URL.

Am I clear? I think I am not being clear. :(

I want to find out the structure of a site from a URL. :)

Hope I am clear.

Arlequin
arlequin@montevideo.com.uy
 
Try using a utility called WebZip. It downloads an entire site. Hope this helps.

Regards,

Lars
 
Hi Arlequin.

If you mean what I think you mean, you can build a site structure using Visual InterDev or GoLive CyberStudio. You can also use Microsoft Site Server, which gives a cyberbolic view and checks for bad links. You can download some freeware which does the same thing from [link lost]. [link lost] also provides a link checker which maps out your site structure.
 
The methods mentioned above are probably better than anything I could suggest. However, I'm working on something like this right now in Java with a very manual approach. The steps I'm using are:

1. Open the URL as a stream and get the HTML content (as a string).
2. Loop through this string and find every instance of "href=", "action=", "location=", and "src=" after "<frame ", then take the string between the next two double quotes (this would be a link).
3. For every instance, make sure that it is a link within the site, i.e. that it does not contain [text lost], "mailto:", "javascript:", etc.
4. Append the links to the URL base and repeat the process recursively, making sure that you don't repeat a URL call.
5. After a while this will get most of the links that are in the content of the site.

The problem with this kind of "from scratch" approach is that you really have to hammer out a lot of kinks, and that's time-consuming. However, I've also found that going out and getting some software someone else wrote always seems to take more time than you thought it would (you have to find it, it never does exactly what you want, you have to install some component and it won't register properly, or whatever). If you want to have something you could really tailor to do something specific later, this could be a valid approach. If you are interested at all, email me at wduty@radicalfringe.com.

--Will Duty
wduty@radicalfringe.com
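The five steps above could be sketched in Java roughly as follows. This is an illustration only, not Will's actual code: the class name `SiteTracer`, its method names, and the `maxPages` limit are hypothetical. Step 3 only names "mailto:" and "javascript:" legibly, so treating links to a different host as off-site is an assumption of this sketch. It also uses `java.net.URI.resolve` rather than plain string appending for step 4, assumes pages are UTF-8, and shares the post's limitation that it only sees double-quoted attribute values.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical class: a rough sketch of the five steps, not the poster's code.
final class SiteTracer {
    private static final String[] MARKERS = {"href=", "action=", "location=", "src="};

    // Step 2: collect the double-quoted value that follows each marker.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        for (String marker : MARKERS) {
            int at = indexOfIgnoreCase(html, marker, 0);
            while (at >= 0) {
                int next = at + marker.length();
                // "src=" only counts when it sits inside a <frame ...> tag.
                if (!marker.equals("src=") || insideFrameTag(html, at)) {
                    int open = html.indexOf('"', next);
                    int close = open < 0 ? -1 : html.indexOf('"', open + 1);
                    if (close < 0) break; // no complete quoted value left
                    links.add(html.substring(open + 1, close));
                    next = close + 1;
                }
                at = indexOfIgnoreCase(html, marker, next);
            }
        }
        return links;
    }

    // Steps 3 and 4: drop mailto:, javascript: and off-site links (the host
    // check is an assumption), then make what is left absolute.
    static URI resolveInternal(URI page, String link) {
        String target = link.trim();
        int hash = target.indexOf('#');
        if (hash >= 0) target = target.substring(0, hash); // ignore fragments
        String lower = target.toLowerCase(Locale.ROOT);
        if (lower.isEmpty() || lower.contains("mailto:") || lower.contains("javascript:")) {
            return null;
        }
        try {
            URI resolved = page.resolve(target);
            boolean sameHost = page.getHost() != null
                    && page.getHost().equalsIgnoreCase(resolved.getHost());
            return sameHost ? resolved : null;
        } catch (IllegalArgumentException e) {
            return null; // not a parseable URI, e.g. it contains spaces
        }
    }

    // Steps 1, 4 and 5: fetch a page, then recurse into every unseen internal
    // link. The visited set prevents repeated calls; maxPages bounds the crawl.
    static void trace(URI page, Set<String> visited, int maxPages) {
        if (visited.size() >= maxPages || !visited.add(page.toString())) return;
        String html;
        try (InputStream in = page.toURL().openStream()) {
            html = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            return; // unreachable page; it stays in visited so it is not retried
        }
        for (String link : extractLinks(html)) {
            URI next = resolveInternal(page, link);
            if (next != null) trace(next, visited, maxPages);
        }
    }

    private static boolean insideFrameTag(String html, int at) {
        int tagStart = html.lastIndexOf('<', at);
        return tagStart >= 0 && html.lastIndexOf('>', at) < tagStart
                && html.regionMatches(true, tagStart, "<frame ", 0, 7);
    }

    private static int indexOfIgnoreCase(String text, String needle, int from) {
        for (int i = Math.max(from, 0); i + needle.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }
}
```

To try it, create an empty `java.util.LinkedHashSet<String>`, pass it to `SiteTracer.trace` together with a start URI of your own and a page limit such as 200, and print the set afterwards: it will hold every URL that was attempted, in the order it was first reached. Keeping the parsing (`extractLinks`) and the filtering (`resolveInternal`) separate from the network code makes it possible to test those two parts on plain strings before pointing the tracer at a live site.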
 