
creating a timeout

Status
Not open for further replies.

wduty (Programmer)
Jun 24, 2000 · 271 posts · US
This is a really basic question, so for those of you with patience, I appreciate any input.

I have written a kind of primitive search engine for URLs with three-letter domain names. It runs a loop which creates all possible alphabetic combinations of three letters, calls a URL with that name, and then parses out the meta-keywords from the HTML (if they exist). It then puts the keywords in a database along with the URL. I can then query this database from an ASP page and get all the entries containing the keywords I enter.

It works fine and is chugging along at my DOS prompt. However, although I've hammered out all the exceptions so that the program runs, some URLs are slow and it tends to get stuck, sometimes for over a minute, waiting for a response before going on to the next URL.

My question is: how do I create a timer so that the attempted read of the URL skips to the next one after, say, 15 seconds? I've used simple threading for applets, but I'm not sure how I would do it for something like this.

Here's the code where this should happen:

    ...
    try
        {
        URL u = new URL(str);
        InputStream input = u.openStream();
        InputStreamReader reader = new InputStreamReader(input);
        BufferedReader buffreader = new BufferedReader(reader);
        while ((str = buffreader.readLine()) != null)
            {
            this.HTML += str;
            if (linecounter > 5000) break;
            linecounter++;
            }
        }
    catch (UnknownHostException e)
        {String unknownhosterror = str + ": UNKNOWN HOST EXCEPTION:  * * * S K I P P I N G   U R L * * *";}
    catch (MalformedURLException e) {System.err.println(e);}
    catch (IOException e) {System.err.println(e);}
    ...

I'm not sure whether the delay occurs when it opens the input stream or when it actually reads the lines. Either way, any help is appreciated.

--Will Duty
wduty@radicalfringe.com
 
Well, you get the current date in milliseconds before going into the loop... then only continue the loop if the elapsed time (the current milliseconds on that particular iteration, minus the starting value) is less than 15 seconds * 1000 ms.

Liam Morley
lmorley@wpi.edu
"light the deep, and bring silence to the world.
light the world, and bring depth to the silence."
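A minimal sketch of that elapsed-time check (hedged example: `runWithBudget`, its parameters, and the `Thread.sleep` standing in for a per-URL fetch are all invented for illustration, not code from this thread):

```java
public class TimeBudget {
    // Loop over some work, but bail out once budgetMs of wall-clock time
    // has elapsed. Returns how many iterations actually ran.
    static int runWithBudget(int iterations, long perIterMs, long budgetMs)
            throws InterruptedException {
        long start = System.currentTimeMillis();   // snapshot before the loop
        int done = 0;
        for (int i = 0; i < iterations; i++) {
            // elapsed time on this particular iteration vs. the budget
            if (System.currentTimeMillis() - start > budgetMs)
                break;                             // over budget: stop waiting
            Thread.sleep(perIterMs);               // stand-in for reading one URL
            done++;
        }
        return done;
    }
}
```

One caveat: the check only runs between iterations, so a single read that blocks inside the loop body can still overrun the budget.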
 
That is exactly the kind of obviousness which I have to kick myself for missing. I've even used this technique (before my lobotomy) in other situations! Thank you, Mr. Morley!

--Will Duty
wduty@radicalfringe.com
 
hey, any time :o)

Liam Morley
lmorley@wpi.edu
"light the deep, and bring silence to the world.
light the world, and bring depth to the silence."
 
Dear wduty,

You might also consider going down to the Socket layer in Java and setting the timeout value, i.e.:

    Socket.setSoTimeout

    public synchronized void setSoTimeout(int timeout) throws SocketException

Enable/disable SO_TIMEOUT with the specified timeout, in milliseconds. With this option set to a non-zero timeout, a read() call on the InputStream associated with this Socket will block for only this amount of time. If the timeout expires, a java.io.InterruptedIOException is raised, though the Socket is still valid. The option must be enabled prior to entering the blocking operation to have effect. The timeout must be > 0. A timeout of zero is interpreted as an infinite timeout.

Since: JDK 1.1

Good luck
-pete
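A rough sketch of how that might look in the crawler (the `fetch` method, its `host`/`port`/`timeoutMs` parameters, and the hand-rolled HTTP/1.0 GET are my own illustration, not code from the thread):

```java
import java.io.*;
import java.net.*;

public class TimedFetch {
    // Read an HTTP response over a raw Socket, giving up on any single
    // read that blocks longer than timeoutMs. Whatever was read before
    // the timeout is returned, so one slow host can't stall the crawl.
    static String fetch(String host, int port, int timeoutMs) throws IOException {
        Socket sock = new Socket(host, port);
        try {
            sock.setSoTimeout(timeoutMs);  // SO_TIMEOUT applies to blocking reads
            Writer out = new OutputStreamWriter(sock.getOutputStream());
            out.write("GET / HTTP/1.0\r\nHost: " + host + "\r\n\r\n");
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream()));
            StringBuffer html = new StringBuffer();
            String line;
            try {
                while ((line = in.readLine()) != null)
                    html.append(line).append('\n');
            } catch (InterruptedIOException slow) {
                // a read blocked past timeoutMs: skip this URL, keep partial data
            }
            return html.toString();
        } finally {
            sock.close();
        }
    }
}
```

Unlike the elapsed-time check in the loop, SO_TIMEOUT interrupts a read that is already blocked, which is exactly where the original code was getting stuck.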
 
But you mean using a Socket object (instead of a URL object like I'm using now)? I can do this, but I'll have to wait until this thing is done reading the entire three-letter permutation set. It started this afternoon, it's currently about 3,300 sites in, and it should finish the cycle by sometime late tonight. Thing is, it's only reading the HTML, so datawise there's very little actual data being transferred. I'm assuming I could read many URLs (or sockets) simultaneously, but I'm wondering where the limit is, and how to go about it so that I can continue working on my computer without it interfering. I'm thinking it would be possible by having several concurrent threads, each with an assigned batch of URLs. Would this be possible?

--Will Duty
wduty@radicalfringe.com
 
Well, my understanding is that under HTTP 1.1 you can only open two simultaneous connections to the same domain... but I haven't heard of a limit on the number of domains. If you look back, you might be able to find Fenris' thread, which was all about using multiple threads of the same class to read files simultaneously. I'm not sure if that would help: it concerned an array of files, which you don't necessarily have here, but it might be interesting nonetheless.

Liam Morley
lmorley@wpi.edu
"light the deep, and bring silence to the world.
light the world, and bring depth to the silence."
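The batch-per-thread idea could be sketched like this (`BatchCrawler`, `Worker`, and the placeholder `fetch()` are invented names; the real socket/URL reading from earlier in the thread would go where `fetch()` is):

```java
import java.util.*;

public class BatchCrawler {
    // One worker thread per batch: a slow host only stalls its own thread,
    // while the other batches keep moving.
    static class Worker extends Thread {
        private final String[] batch;
        private final List<String> results = new ArrayList<String>();
        Worker(String[] batch) { this.batch = batch; }
        public void run() {
            for (int i = 0; i < batch.length; i++)
                results.add(fetch(batch[i]));      // real URL read goes here
        }
        String fetch(String url) { return url; }   // placeholder
        List<String> results() { return results; } // safe to read after join()
    }

    // Split urls into nThreads roughly equal slices, run a Worker on each,
    // and collect everything once all workers have finished.
    static List<String> crawl(String[] urls, int nThreads)
            throws InterruptedException {
        Worker[] workers = new Worker[nThreads];
        int per = (urls.length + nThreads - 1) / nThreads;  // ceiling division
        for (int t = 0; t < nThreads; t++) {
            int from = Math.min(t * per, urls.length);
            int to = Math.min(from + per, urls.length);
            String[] slice = new String[to - from];
            System.arraycopy(urls, from, slice, 0, slice.length);
            workers[t] = new Worker(slice);
            workers[t].start();
        }
        List<String> all = new ArrayList<String>();
        for (int t = 0; t < nThreads; t++) {
            workers[t].join();                      // wait for each batch
            all.addAll(workers[t].results());
        }
        return all;
    }
}
```

Each worker owns its slice of the URL array, so no locking is needed during the crawl; join() before reading results gives the necessary memory visibility.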
 
I'll check it out. Seeing firsthand how long it takes to read URLs in a linear way suggests that sites like AltaVista and Excite must use some crazy threading scheme, and probably multiple machines, to run their robot programs. It is interesting to see how the gathering of search-engine information works even at this ultra-simple level.
Another thing I would never have noticed is how much the tendency toward memorable naming schemes is reflected in existing URL names. As the prompt reads off URLs, I can see that there are a very large number of "UnknownHostException" returns toward the end of an alphabetic cycle (where the names are all "*zq", "*zr", "*zs", etc.) versus how responsive URLs are at the beginning, where names include more vowels.

--Will Duty
wduty@radicalfringe.com
 