
Validating the existence of a file on the web


ChrisRChamberlain

Hi all

Microsoft Knowledge Base Article 174524, "How To Retrieve and Insert HTML Into Memo Field", provides a quick and simple way of downloading any type of file, not just a web page, without the complication of FTP etc.

Should the file not exist, what is returned to the memo field is the HTML of the error page the server produces when the file is not found, instead of the expected .zip file or whatever.

What simple ways are there of validating the existence of a file on the web?

TIA

FAQ184-2483 - answering getting answered.​
Chris [pc2]
PDFcommandertm.com
PDFcommandertm.co.uk


 
Chris.

Are the files in question shown on the web page? Meaning, is there a reference to the file on the web page? I use this approach where I load the web page into a string and count the occurrences of "href" in the string. No reference, no file.
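
A minimal sketch of that idea, assuming the page has already been pulled into a string (for example the HURL.memo field filled by the KB 174524 routine mentioned above) and that test.zip is a made-up name for the file being checked for:
Code:
* Sketch only - lcHtml and lcTarget are illustrative assumptions.
lcHtml   = HURL.memo            && web page already retrieved into a string
lcTarget = [test.zip]           && the file we hope the page links to

* Count the href references, then see whether any of the text mentions the file.
lnRefs = OCCURS([href], LOWER(lcHtml))

IF lnRefs > 0 AND ATC(lcTarget, lcHtml) > 0
	? [The page appears to link to ] + lcTarget
ELSE
	? [No reference, no file]
ENDIF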

Mike Gagnon

If you want to get the best response to a question, please check out FAQ184-2483 first.
 
Mike

Thanks for responding.

Not necessarily - for instance, I could advise you by email of a .zip file that might be of interest to you at a certain web address.

Deploying the Microsoft Knowledge Base Article, you would need to use
Code:
lcExt = JUSTEXT([http://www.somesite.com/downloads/test.zip])
to get the type of file passed as a parameter to readurl.prg, and adding
Code:
STRTOFILE(HURL.memo, [C:\temp\filename.] + lcExt)
would pull the file down to your machine, always assuming that it exists at the time you want to download it.

One workaround would be
Code:
IF [</HTML>] $ UPPER(HURL.memo)
	MESSAGEBOX([Error])
ELSE
	STRTOFILE(HURL.memo, [C:\temp\filename.] + lcExt)
ENDIF
which is error trapping after the event as opposed to before. [smile]

FAQ184-2483 - answering getting answered.​
Chris [pc2]
PDFcommandertm.com
PDFcommandertm.co.uk
 
I would say that:

Add Download Ability to Your App in 2 Seconds
faq184-3838

...is a pretty easy way.
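
The FAQ has the details; purely as an illustration of a one-call download, the URLDownloadToFile() API in urlmon.dll looks like this (a sketch only - it may not match what faq184-3838 actually shows, and the URL and target path are made up):
Code:
* Sketch only - URL and target path are invented for the example.
DECLARE INTEGER URLDownloadToFile IN urlmon.dll ;
	INTEGER pCaller, STRING szURL, STRING szFileName, ;
	INTEGER dwReserved, INTEGER lpfnCB

* Returns 0 (S_OK) when something was saved. Note it can still save an
* HTML error page if the server answers a missing file that way, so it
* has the same caveat Chris describes above.
IF URLDownloadToFile(0, [http://www.somesite.com/downloads/test.zip], ;
		[C:\temp\test.zip], 0, 0) = 0
	? [Downloaded OK]
ELSE
	? [Download call failed]
ENDIF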


craig1442@mchsi.com
"Whom computers would destroy, they must first drive mad." - Anon
 
Craig

Thanks for the suggestion.

What I was looking at was adding a second parameter to readurl.prg, as per the KB example, which would be the target file on the local drive, so there would be no requirement for user intervention. [smile]
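
A rough sketch of that idea - hypothetical only, since the actual readurl.prg body is in the KB article; the parameter name tcTargetFile and the HURL.memo field follow the earlier snippets:
Code:
* Hypothetical sketch - not the actual KB 174524 readurl.prg.
LPARAMETERS tcUrl, tcTargetFile

* ...the existing KB code retrieves the URL contents into HURL.memo here...

IF NOT EMPTY(tcTargetFile)
	* Save whatever came back straight to disk - no user intervention needed.
	STRTOFILE(HURL.memo, tcTargetFile)
ENDIF

Called along the lines of:
Code:
DO readurl WITH [http://www.somesite.com/downloads/test.zip], [C:\temp\test.zip]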

FAQ184-2483 - answering getting answered.​
Chris [pc2]
PDFcommandertm.com
PDFcommandertm.co.uk


 
Getting back to this... I think that writing the contents to the memo field and THEN extracting it is an unnecessary step. Also, if I were going to do it that way, I would probably use the COPY MEMO command rather than STRTOFILE() - though I'm not sure it would be that much different, it just seems a more appropriate command (a one-line sketch of that follows the code below). Here's what I would do if I were going to forego the memo field... tested on my end, it works well. The httpget.zip file exists on my server and the code will save it to your desktop for easy retrieval. Cut-n-paste the code below into a .prg file and run it from VFP to see how it works.
Code:
LOCAL lcFilePathOnDesktop
lcFilePathOnDesktop = GetUsersDesktop() + "httpget.zip"
?HttpGetFile("http://www.sweetpotatosoftware.com/httpget.zip", lcFilePathOnDesktop)

****************************
Function HttpGetFile(tcUrlName, tcDestinationFile)
****************************

	Declare Integer InternetOpen In wininet.Dll String sAgent, ;
		INTEGER lAccessType, String sProxyName, ;
		STRING sProxyBypass, Integer lFlags

	Declare Integer InternetOpenUrl In wininet.Dll ;
		INTEGER hInternetSession, String sUrl, String sHeaders, ;
		INTEGER lHeadersLength, Integer lFlags, Integer lContext

	Declare Integer InternetReadFile In wininet.Dll Integer hfile, ;
		STRING @sBuffer, Integer lNumberofBytesToRead, Integer @lBytesRead

	Declare short InternetCloseHandle In wininet.Dll Integer hInst

	#Define INTERNET_OPEN_TYPE_PRECONFIG 0
	#Define INTERNET_OPEN_TYPE_DIRECT 1
	#Define INTERNET_OPEN_TYPE_PROXY 3
	#Define SYNCHRONOUS 0
	#Define INTERNET_FLAG_RELOAD 2147483648
	#Define CR Chr(13)


* what application is using Internet services?
	sAgent = "VPF 8.0"

	hInternetSession = InternetOpen(sAgent, INTERNET_OPEN_TYPE_PRECONFIG, ;
		'', '', SYNCHRONOUS)

* debugging line - uncomment to see session handle
* WAIT WINDOW "Internet session handle: " + LTRIM(STR(hInternetSession))

	If hInternetSession = 0
		Wait Window "Internet session cannot be established" Time 2
		Return 0
	Endif

	hUrlFile = InternetOpenUrl(hInternetSession, tcUrlName, '', ;
		0, INTERNET_FLAG_RELOAD, 0)

* debugging line - uncomment to see URL handle
* WAIT WINDOW "URL Handle: " + LTRIM(STR(hUrlFile))

	If hUrlFile = 0
		Wait Window "URL cannot be opened"
		Return 0
	Endif

	lnTotalBytesWritten = 0		&& initialised up front so the final RETURN is safe even if FCREATE() fails
	llFileExists = .T.
	lnFileHandle = Fcreate(tcDestinationFile)

	If lnFileHandle < 0 && Check for error opening file
		Wait 'Cannot open or create output file' Window Nowait
	Else  && If no error, write to file
		On Error Do HandleError With lnFileHandle, tcDestinationFile && Just in case
		Do While .T.
* set aside a big buffer
			sReadBuffer = Space(32767)
			lBytesRead = 0
			m.OK = InternetReadFile(hUrlFile, @sReadBuffer, ;
				LEN(sReadBuffer), @lBytesRead)

* debugging code - uncomment if necessary
*WAIT WINDOW "hURLFile: " + LTRIM(STR(hURLFile)) + CR + ;
*                  "lBytesRead: " + LTRIM(STR(lBytesRead)) + CR ;
*                  + "m.OK      : " + LTRIM(STR(m.OK))
			If Occurs("404 NOT FOUND", sReadBuffer) > 0
				llFileExists = .F.
				EXIT
			Endif
			
			lnTotalBytesWritten = lnTotalBytesWritten + Fwrite(lnFileHandle, sReadBuffer, lBytesRead)	&& write only the bytes actually read

* error trap - either a read failure or read past eof()
			If m.OK = 0 Or lBytesRead = 0
				Exit
			Endif
		Enddo

* close all the handles we opened
		=InternetCloseHandle(hUrlFile)
		=InternetCloseHandle(hInternetSession)

	Endif
	=Fclose(lnFileHandle) && Close file
	If !llFileExists
		Messagebox("The file " + tcUrlName + " does not exist.",64,"UNABLE TO RETRIEVE FILE")
		Erase (tcDestinationFile)
	Endif
	Clear Dlls InternetOpen, InternetOpenUrl, InternetReadFile, InternetCloseHandle
	Return (lnTotalBytesWritten) && Total Number of Bytes Written
Endfunc

****************************
Procedure HandleError(tnFileHandle, tcFileToErase)
****************************
	=Fclose(tnFileHandle) && Close file
	Erase (tcFileToErase)
	Messagebox("An Error has occured and this program will now shut down.",16,"ERROR RETRIEVING FILE")
	If _vfp.StartMode = 0 && Running in VFP IDE
		Cancel
	Else
		Quit
	Endif
Endproc

***********************************************
*!* Not necessary for the download
*!* Only for the example so file will be on your desktop
FUNCTION GetUsersDesktop()
***********************************************
DECLARE SHORT SHGetFolderPath IN SHFolder.dll ;
    INTEGER hwndOwner, INTEGER nFolder, INTEGER hToken, ;
    INTEGER dwFlags, STRING @pszPath
    
#DEFINE CSIDL_DESKTOP 0x0000

LOCAL cFolderPath, cDesktopPath
cFolderPath = space(255)

SHGetFolderPath(0, CSIDL_DESKTOP, 0, 0, @cFolderPath)

cDesktopPath = Alltrim(cFolderPath)

cDesktopPath = SubStr(cDesktopPath,1, Len(cDesktopPath)-1)

CLEAR DLLS SHGetFolderPath

RETURN (ADDBS(cDesktopPath))

ENDFUNC
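
As an aside, the COPY MEMO alternative mentioned above would be a one-liner against the KB-style cursor - a sketch, assuming the cursor and field are literally named HURL and memo as in the earlier snippets, and a made-up target path:
Code:
* Sketch only - writes the memo contents straight to disk.
SELECT HURL
COPY MEMO memo TO C:\temp\filename.zip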


I've got code I wrote a while back that will allow you to get the size of the file up on the HTTP server too (not an extremely easy task), so progress indication or verification of the downloaded file size could be done as well (though signing the file would work better if you are the one creating the files on the server and you need your app to verify file integrity after the download).
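
That code isn't posted here, but for illustration, one way to read the size from the response headers with the same wininet declarations is HttpQueryInfo() asking for Content-Length - a sketch only, not necessarily Craig's approach, and it assumes the server actually sends a Content-Length header:
Code:
* Sketch only - returns the Content-Length reported by the server, or -1.
FUNCTION HttpGetFileSize(tcUrlName)
	#Define HTTP_QUERY_CONTENT_LENGTH 5

	Declare Integer InternetOpen In wininet.Dll String sAgent, ;
		INTEGER lAccessType, String sProxyName, ;
		STRING sProxyBypass, Integer lFlags
	Declare Integer InternetOpenUrl In wininet.Dll ;
		INTEGER hInternetSession, String sUrl, String sHeaders, ;
		INTEGER lHeadersLength, Integer lFlags, Integer lContext
	Declare Integer HttpQueryInfo In wininet.Dll ;
		INTEGER hRequest, Integer lInfoLevel, ;
		STRING @cBuffer, Integer @lBufferLength, Integer @lIndex
	Declare short InternetCloseHandle In wininet.Dll Integer hInst

	Local hSession, hUrl, lcBuffer, lnLen, lnIndex, lnSize
	lnSize = -1		&& -1 means the size could not be determined

	hSession = InternetOpen("VFP 8.0", 0, '', '', 0)
	If hSession = 0
		Return lnSize
	Endif

	hUrl = InternetOpenUrl(hSession, tcUrlName, '', 0, 0, 0)
	If hUrl # 0
		lcBuffer = Space(32)
		lnLen    = Len(lcBuffer)
		lnIndex  = 0
		* Ask for the Content-Length header as text; not every server sends one.
		If HttpQueryInfo(hUrl, HTTP_QUERY_CONTENT_LENGTH, @lcBuffer, @lnLen, @lnIndex) # 0
			lnSize = Val(Left(lcBuffer, lnLen))
		Endif
		=InternetCloseHandle(hUrl)
	Endif
	=InternetCloseHandle(hSession)

	Return lnSize
Endfunc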


 
Craig

Interesting alternative
Code:
If Occurs("404 NOT FOUND", sReadBuffer) > 0
Unfortunately that will fail on both case and content - you can test it by adding '.zip' to the URL of this page in your browser and seeing what happens.

This is why I check to see if the file in the memo field is a web page - if it is, the file does not exist.

It also means there is no requirement to verify that the file exists - not the tidiest approach but it works. [wink]


FAQ184-2483 - answering getting answered.​
Chris [pc2]
PDFcommandertm.com
PDFcommandertm.co.uk


 
Case is easy enough to remedy by using UPPER(). As to content, you are right... the webmaster may even have created their own error page to handle the 404, so it could be anything. Hmmm... what to do...

Disadvantages of your approach of checking for "</HTML>" that I am trying to solve:

HTML pages, or any file containing the </HTML> closing tag, would be excluded as a possible file type to download.

The entire file (in this case the error page) must be downloaded before the fact that the file doesn't exist is caught and the DO loop is exited.

I'll keep looking and thinking; there has to be a reliable way to check if a file exists that doesn't have the drawbacks of either of these methods. At least the code we have now doesn't create a useless table and memo field (unless the memo had some advantages that I'm not aware of).


 
On further review it appears that there is no reliable way... quite simply, web servers can serve up anything when a file is requested and it doesn't exist.

Checking for the existence of a substring is flawed, but it is perhaps the best that can be done. Just hope that the webmaster wasn't given to malformed HTML such as an extra space in the tag:

</html >

I would probably check for the existence of "<html" in sReadBuffer on the first pass through the Do Loop using the ATC() function.

Code:
IF ATC("<html", sReadBuffer) > 0
     llFileExists = .F.
     Exit
ENDIF


 
Craig

For my own purposes, the option to download a web page is not required.

I agree that what I have is, as I said, "...error trapping after the event as opposed to before" - hence the question.


In real terms the error checking is quick as the error page received is small - I also started down the '404' route and quickly gave it up as impractical.

I also use a cursor as opposed to a table, for housekeeping purposes.

Your comment about the malformed tag is valid - a space after 'html' still produces a valid page.

FAQ184-2483 - answering getting answered.​
Chris [pc2]
PDFcommandertm.com
PDFcommandertm.co.uk


 
Craig... I'd be interested in the code you have to return the size of a file on an HTTP server...

Thanks in advance.

Andy Snyder
SnyAc Software Services
 
Yep... I've seen that FAQ, Chris... thanks for the suggestion there... and I have FTP routines already. I'm interested in how to get the file size without using FTP.

Andy Snyder
SnyAc Software Services
 
Andy

From memory - unable to verify just now - the size of the file is returned in the list.
Code:
INSERT INTO WEB_FILES (filename, date, time, size) ;
	VALUES ;
	(SUBSTR(lcString, lnFirst + 1, lnEnd - lnFirst - 1), ;
	ldDate, ;
	lcTime, ;
	lcSize)

FAQ184-2483 - answering getting answered.​
Chris [pc2]
PDFcommandertm.com
PDFcommandertm.co.uk


 
SnyAc,

Up to my neck in alligators for the moment... will be back and post a working example of the code regarding file size using HTTP. I will probably start a new thread in this forum for it as it doesn't directly pertain to this thread's content, so look for a new thread sometime today.


 
Craig said:
On further review it appears that there is no reliable way... quite simply, web servers can serve up anything when a file is requested and it doesn't exist.

It is true that a web server can be configured to show anything when a file does not exist... however, the status is still returned as 404, and you should be able to detect the error in the first line of the response header.

I tested these websites:
Yahoo: HTTP/1.1 404

MSN: HTTP/1.1 404 Not Found

HTTP/1.1 404 Not Found

HTTP/1.1 404 Not Found

So, it seems pretty consistent (I was trying to find one responding in HTTP/1.0 but couldn't)... you just have to see the header.



- Bill

Get the best answers to your questions -- See FAQ481-4875.
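
A minimal sketch of that header check from VFP, using the WinHttpRequest automation object and a HEAD request (one possibility only - the wininet code above could equally ask HttpQueryInfo() for the status code); the URL is made up:
Code:
* Sketch only - checks the status line without downloading the file body.
* Note: a few servers disallow HEAD; a GET would return the same status.
LOCAL loHTTP, lcUrl
lcUrl  = [http://www.somesite.com/downloads/test.zip]
loHTTP = CREATEOBJECT([WinHttp.WinHttpRequest.5.1])
loHTTP.Open([HEAD], lcUrl, .F.)
loHTTP.Send()

IF loHTTP.Status = 200
	? [File exists - safe to download]
ELSE
	* 404 (or whatever the server sends) shows up here, e.g. "404 Not Found"
	? [Server says: ] + TRANSFORM(loHTTP.Status) + [ ] + loHTTP.StatusText
ENDIF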
 