How can I tell if curl is calling my script?

Status
Not open for further replies.

southbeach

Programmer
Jan 22, 2008
879
US
OK - How can I tell if a savvy user is calling my script via curl?

For example:
Code:
user #: curl http://www.mydomain.com/index.php

Is there any way PHP can tell that the script is being invoked by something other than a browser?


--
SouthBeach
The good thing about not knowing is the opportunity to learn - Yours truly, 2008.
 
Hi

No way.

If the visitor allows his tool to introduce itself with its real name, then $_SERVER['HTTP_USER_AGENT'] can give you a hint. But if the visitor specifies a different name, then you can not do much:
Code:
curl -A 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0' http://www.mydomain.com/index.php
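
If the hint is all you want, a minimal sketch could look like the one below. The function name claimsToBeCurl is made up for illustration, and the check assumes curl's default User-Agent of the form "curl/<version>". It only tells you what the client claims to be, so the spoofed call above gets past it.

```php
<?php
// Hypothetical helper: true when the User-Agent header *claims* to be curl.
// This is only a hint: "curl -A" lets the caller send any string it likes.
function claimsToBeCurl(?string $userAgent): bool
{
    if ($userAgent === null || $userAgent === '') {
        return false; // no header sent, nothing to conclude
    }
    // By default curl introduces itself as "curl/<version>"
    return stripos($userAgent, 'curl/') === 0;
}

// Usage inside a script:
// $isCurl = claimsToBeCurl($_SERVER['HTTP_USER_AGENT'] ?? null);
```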


Feherke.
feherke.ga
 
Indeed, and that's the reason the first thing to do is to establish a session and have secure session management (preventing session hijacking), so you know who's "knocking".
If there is no session active, reroute to authentication or answer with a 401 status code.
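
A rough sketch of that gate, assuming the login code stores the account under a session key named 'user_id' (that key and the function name are made up for illustration):

```php
<?php
// Hypothetical gate: returns the HTTP status to answer with, based on
// whether a logged-in user is present in the session data.
// 'user_id' is an assumed session key, set by your own login code.
function gateDecision(array $session): int
{
    return isset($session['user_id']) ? 200 : 401;
}

// Usage at the top of a protected script, before any output:
// session_start();
// if (gateDecision($_SESSION) === 401) {
//     http_response_code(401);
//     exit; // empty body
// }
// After a successful login, session_regenerate_id(true) is a common
// measure against session fixation.
```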

And at least the first request needs authentication of a user. Many APIs work stateless and thus require authorization with every request, e.g. via the request body or as separate authentication headers. Basic Auth is okay if https is used, for example.
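
For the stateless case, a Basic Auth check could be sketched as follows. It assumes PHP exposes the credentials as PHP_AUTH_USER and PHP_AUTH_PW in $_SERVER (which depends on how PHP is hooked into the web server) and that you keep password_hash() output per user. The function name and the user list are hypothetical.

```php
<?php
// Hypothetical check for HTTP Basic credentials on a stateless endpoint.
// $server is normally $_SERVER; $users maps user name => password_hash() output.
function basicAuthOk(array $server, array $users): bool
{
    $user = $server['PHP_AUTH_USER'] ?? null;
    $pass = $server['PHP_AUTH_PW'] ?? null;
    if (!is_string($user) || !is_string($pass) || !isset($users[$user])) {
        return false; // no credentials sent, or unknown user
    }
    return password_verify($pass, $users[$user]);
}

// Usage:
// if (!basicAuthOk($_SERVER, $users)) {
//     header('WWW-Authenticate: Basic realm="api"');
//     http_response_code(401);
//     exit;
// }
```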

The typical exception would be someone merely landing on a homepage whose content needs no authentication. But if your whole site should only be used by authenticated users, then this URI, like any other request, can be answered with a 401 and no content.

Bye, Olaf.

PS: Even if you could tell which client-side software is used, a hack can also be started from within a browser. The concern should never be which software makes a request, but whether the request is legitimate or not.
 
You guys never cease to amaze me ... Great and fast answers.

I have
Code:
if ($_SERVER["SERVER_PORT"] != 443) {
    $redir = "Location: https://" . $_SERVER['HTTP_HOST'] . $_SERVER['PHP_SELF'];
    header($redir);
    exit();
}
to make sure that https is used and I do have, as suggested, session in place where as if user is not logged in, proper page is loaded (log on form).

Thanks!


--
SouthBeach
The good thing about not knowing is the opportunity to learn - Yours truly, 2008.
 
Hi

Honestly, I can not see how such a restriction would stop someone from accessing your site with cURL. Requiring HTTPS only stops telnet and netcat users, but cURL is able to handle both HTTPS and cookies.

Anyway, if somebody accesses your site with the command line curl tool, they are probably just curious, not malicious.

If more control is needed, probably a dedicated tool/library will be used. For example, PHP with the cURL library will be much harder to stop than the command line curl. (My favorite is twill; although it is old, unmaintained and buggy, it is still strong enough for many cases.)

And even if you involve AJAX to make crawling harder, a web application testing tool like Selenium will still work its way through it.


Feherke.
feherke.ga
 
What you show only redirects any http request to https. HTTPS alone won't stop anybody; it just ensures safe transfer of any data, body and headers with each request. As I already said, you have to apply authentication to any request for things that need legitimation, like a user profile. HTTPS alone doesn't do that. You say you have a "session in place where as if user is not logged in, proper page is loaded (log on form)." If I trust this is also done, then you have a rather complete setup in that regard.

HTTPS alone isn't making your site safe. HTTPS is also only essential if your authentication mechanism is basic authentication, as without HTTPS the password of a user is not encrypted in transfer. It is indeed also a very good measure to go to https even before presenting the login form; as you redirect any http request to https, this would happen. But https is not essential: you can also have a safe unencrypted authentication process with OAuth, which needs a safe transfer just once, for the first account setup or for any process yielding an access token. Since OAuth requests are signed with a secret that is only transferred once and known to both sides, signed requests can be verified. No man in the middle can derive the secret from knowing the signature, so he can't sign requests and can't get through with a request of his own.
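
The signing idea can be illustrated with a plain HMAC over the request. This is only a sketch of the principle, not an OAuth implementation: the function names and the way the request is serialized are made up, and a real scheme also adds a nonce and timestamp so that a captured request can't simply be replayed. Without https the request content itself is still readable on the wire.

```php
<?php
// Sketch of request signing with a shared secret (not a full OAuth implementation).
// Both sides know $secret; only the signature travels with the request.
function signRequest(string $method, string $uri, string $body, string $secret): string
{
    // Assumed serialization: method, URI and body joined by newlines
    $base = strtoupper($method) . "\n" . $uri . "\n" . $body;
    return hash_hmac('sha256', $base, $secret);
}

function verifyRequest(string $method, string $uri, string $body, string $signature, string $secret): bool
{
    $expected = signRequest($method, $uri, $body, $secret);
    return hash_equals($expected, $signature); // timing-safe comparison
}
```

A tampered body or a different secret produces a different signature, so the server rejects the request even though the attacker saw a valid signature go by.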

Anyway, https allows you to use the simpler basic authentication. Without authentication you still don't know who's requesting. A typical PHP login with user/password and a stored password hash is also OK when done via https. The essential part is that every request for protected resources first needs to establish which access privilege is needed, i.e. which account has the right to get the full response to the request. To determine this legitimacy, you typically need to know what resources are in the response, whether they need authentication, and which authentication. This is easier if your site is built with the MVC pattern, where every request is routed through a controller: then you have one central point of entrance, no matter what is requested, and can apply the "doorman" logic there. This http->https redirect is only a good start.
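
As a sketch of such a central "doorman", assuming a front controller that knows the requested path and the privileges of the logged-in account (the route table, privilege names and function name are all made up for illustration):

```php
<?php
// Hypothetical route table: path => privilege required (null = public).
const ROUTES = [
    '/'        => null,
    '/login'   => null,
    '/profile' => 'user',
    '/admin'   => 'admin',
];

// Returns the HTTP status the front controller should answer with.
// $privileges is the list held by the authenticated account, [] if nobody is logged in.
function doorman(string $path, array $privileges): int
{
    if (!array_key_exists($path, ROUTES)) {
        return 404; // unknown resource
    }
    $needed = ROUTES[$path];
    if ($needed === null) {
        return 200; // public page
    }
    if ($privileges === []) {
        return 401; // nobody logged in
    }
    return in_array($needed, $privileges, true) ? 200 : 403;
}
```

The point of the design is that the decision is made in one place, before any controller action runs, no matter which client sent the request.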

The better place to do the http->https redirection is a URL rewrite .htaccess file like:
Code:
RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule ^(.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

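This'll be executed by Apache (with mod_rewrite enabled) or by other web servers that understand .htaccess files. Nginx does not read .htaccess files, so there the redirect belongs in the server configuration instead.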

The "doorman logic" very much depends on your site and of course only starts with knowing who makes the request. As said, and to summarize, you next need to know what is accessed and whether the authenticated account has the privilege to see that resource. That may include any metadata about who owns which data, any other state like hitting a request limit, etc. There is no general script doing that; it very much depends on what your site is about.

And then of course it's essential that you don't only add the 401 header to the response but also leave the response body empty, or that you redirect to the login via PHP header(), but do so before any output has been sent. One simple way of handling late decisions is to buffer any output via the ob functions (ob_start and ob_end_flush or ob_end_clean). This will also redirect curl requests, or any other requests, to the login. The doorman logic may also answer with just a status code header instead of redirecting to login, especially when being flooded with requests (DDoS attack), though that is best handled by your hoster already.
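
A minimal sketch of the buffering approach, using PHP's output-control functions ob_start, ob_end_clean and ob_get_clean. The function name renderGuarded and the convention that the callback echoes the page and returns whether the visitor may see it are assumptions for illustration.

```php
<?php
// Renders a page into a buffer and discards it if the (hypothetical) $render
// callback reports that the visitor is not allowed to see the result.
function renderGuarded(callable $render): array
{
    ob_start();                  // start buffering, nothing reaches the client yet
    $allowed = $render();        // callback echoes output and returns a bool
    if (!$allowed) {
        ob_end_clean();          // throw the buffered output away
        return ['status' => 401, 'body' => ''];
    }
    $body = ob_get_clean();      // fetch the buffer contents and close the buffer
    return ['status' => 200, 'body' => $body];
}

// Usage:
// $r = renderGuarded(function () { echo 'secret'; return isset($_SESSION['user_id']); });
// http_response_code($r['status']);
// echo $r['body'];
```

Because nothing is sent until the decision is made, a 401 status or a Location header can still be set even when the decision comes late in the script.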

It's not important, as feherke said, whether access then comes from curl or a browser: no request will get anything it shouldn't be able to get or, even more important, be able to submit anything like a payment. The importance of https lies in transferring things that should remain secret from anybody on the transfer line. In itself, though, it decrypts for any client, not only for valid clients. https does no authentication; it just handles the encryption for transfer and decryption after transfer. So, e.g., you see the submitted user password in cleartext server side, and client side your html output is seen as is. https lets all others on the way between client and server see only encrypted http packets.

Bye, Olaf.
 
Thanks for the htaccess rule - I will go that route.

The reason why the site needs to have https at all time is due to a "credit card processing company" policy; they would not allow us to process credit card charges unless this was the case.

The site is not an eCommerce site but it is a B2B site and money is paid on line (Cash Receipts / Cash Disbursements).



--
SouthBeach
The good thing about not knowing is the opportunity to learn - Yours truly, 2008.
 
The reason why the site needs to have https at all time is due to a "credit card processing company" policy; they would not allow us to process credit card charges unless this was the case.

Just as they should!

They should ALSO be 'insisting' that your IT systems meet PCI DSS compliance requirements if you ARE processing and/or storing credit card data on your own web server and office systems.

Chris.

Indifference will be the downfall of mankind, but who cares?
Time flies like an arrow, however, fruit flies like a banana.

Never mind this jesus character, stars had to die for me to live.
 