How do I keep my CGI scripts safe when receiving form data?

by Kirsle
When writing CGI scripts, it's almost always necessary for the script to receive variables ("form data"). These can arrive as POSTed parameters (common with HTML forms that are submitted to CGI scripts), or as GET parameters in the query string (e.g. "index.cgi?[color blue]variable=value[/color]").

When writing CGI scripts which receive variables, however, it is important to maintain reasonable security practices when dealing with these variables. There are several common mistakes that inexperienced CGI programmers make, and they can cause a LOT of damage once a more knowledgeable hacker comes along who wishes to cause some harm.

In this FAQ, I'm going to list some of the most common mistakes that amateur CGI programmers make, and offer some solutions to increase security.

But first: [color red]Rule #1 in defensive programming: never never NEVER trust any data you got from your users![/color] The following examples all boil down to not trusting your user input. Always filter everything you get from your users before you use it for anything else.

Reading and Writing Files

The first issue I'm going to talk about is with taking parameters and using them to read and write to files. This is a common idea that many content management systems use. Instead of having multiple files on your server, you simply link to things such as "index.cgi?page=index" or "index.cgi?page=about", where there are counterpart files named index.txt and about.txt which contain that page's actual data.

Here is some simple code that a programmer might write to get a really basic content management system going:

Code:
#!/usr/bin/perl -w

use CGI;
my $q = new CGI;

my $page = $q->param ("page");

open (PAGE, "./private/pages/$page\.txt");
my @data = <PAGE>;
close (PAGE);
chomp @data;

print $q->header;
print join ("\n",@data);

Now, that code is all fine and dandy. It reads pages from the directory "./private/pages" based on the query string parameter "page". But here's the problem:

In directories, a single . means "current directory" and two dots mean "up one directory" or "parent directory". Nobody says that $page has to be JUST a file name. $page can easily be a whole file path structure.

For example, if the hacker calls on index.cgi?page=[color red]../users/admin[/color], then they are no longer reading a file from the "pages" folder, but from somewhere else on your server!

To put that example back into code, your CGI script is therefore doing this:

Code:
open (PAGE, "./private/pages/../users/admin.txt");

You can translate that ".." into "up one directory", so the file actually being accessed is [color red]./private/users/admin.txt[/color]! And if you keep your site's private user data in "./private/users", this poses a MAJOR security threat! The hacker can then read your private user data, potentially even getting your password!

Now, this example assumes you're making a more sophisticated content management system that supports users. But the hacker can do other things with it too. They can potentially use the same trick to read ANY file on your whole server!

So what can we do about this? The simplest way is to remove any characters from $page which could be abused to change directories around. So, we'll want to remove the forward slash "/" and the dots "." (you may want to keep the dots if you're going to have pages like "user.login.txt", since dots can't be abused without slashes anyway).

But we'll also want to remove the backslash "\". Why? On Windows servers the backslash is a directory separator too, so "..\users\admin" climbs out of the pages folder just as well as the forward-slash version does. (Perl does not interpret backslash escapes inside form data, so that is not the risk here; the path separator is.) If we only stripped forward slashes, they could still cheat and browse other directories, and we'd have defeated the purpose of trying to stop it.

So: remove /, \, and .

We can do that with a simple regular expression. With these revisions, our code now looks like this:

Code:
#!/usr/bin/perl -w

use CGI;
my $q = new CGI;

my $page = $q->param ("page");

[color blue]# Remove potential abuse in the page name.
$page =~ s~(/|\.|\\)~~g; # remove /, ., and \[/color]

open (PAGE, "<", "./private/pages/$page.txt") or die "Can't open page: $!";
my @data = <PAGE>;
close (PAGE);
chomp @data;

print $q->header;
print join ("\n",@data);

This way, the hacker isn't allowed to trick the script into opening files in places that it shouldn't be. Problem solved.
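
Stripping known-bad characters works, but the same idea can be turned around: accept the page name only if every character is on an allowed list. Here's a minimal sketch of that, assuming your page names only ever contain letters, digits, underscores and hyphens. The name safe_page_name and the 64-character limit are made up for this example; they aren't part of CGI.pm.

Code:
```perl
use strict;
use warnings;

# Hypothetical helper: hand back the page name only if it is 1-64 characters
# of letters, digits, "_" or "-". Anything else (slashes, dots, backslashes,
# newlines, null bytes, empty strings) gets undef.
sub safe_page_name {
    my ($page) = @_;
    return undef unless defined $page;
    return $page =~ /\A([A-Za-z0-9_-]{1,64})\z/ ? $1 : undef;
}

# Possible use in the script above:
#   my $page = safe_page_name (scalar $q->param ("page"));
#   $page = "index" unless defined $page;   # fall back to a known page
```

The \A and \z anchors matter here: a plain $ would also accept a name with a newline tacked on the end.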

Note: on the other hand, if your script is writing files based on user input, be especially careful that you're restricting what files can be opened. Not only is their ability to read files bad, but being able to write them is even worse! A good hacker could even write an entire Perl script this way and execute it to cause even more damage to your server!

Evaluating Variables

Another common mistake is throwing user-provided variables into an eval statement. This is VERY dangerous. You must make sure that the variables going in have been restricted so that they CAN'T POSSIBLY CONTAIN ANY PERL CODE!

For example, here's a (very bad and unprotected) script for a calculator CGI:

Code:
#!/usr/bin/perl -w

use CGI;
my $q = new CGI;

my $expr = $q->param ("expr");

print $q->header;

# Calculate the expression.
if ($expr) {
   my $result = eval ($expr);
   print "$expr = $result";
}

print qq~<form action="calc.cgi">
<input type="text" name="expr">
<input type="submit" value="Calculate!">
</form>~;

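Now, when used the way it was meant to be, it may be running eval("30+50/6") or any other mathematical expression. But if somebody gives it an expression that isn't mathematical at all, such as unlink (glob ("*")), then the code is now running eval (q~unlink (glob ("*"))~), or more simply, unlink (glob ("*")), which would delete every file in the script's directory that the web server is allowed to delete and really cause problems on your server. (A bare unlink ("*.*") would only look for a file literally named "*.*", since unlink doesn't expand wildcards; the glob is what makes it dangerous.)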

I have two solutions to this. The first one is the easy one, but not necessarily the recommended one.

The first solution is to do what we did above: limit what can be in $expr. In this case, we want to define what CAN be in there, rather than what CAN'T be. Here's the code to do this:

Code:
$expr =~ s/[^0-9\+\-\*\/ ]//g;

That code will remove everything that isn't a digit, +, -, *, /, or a space. In other words, it allows a simple mathematical expression to pass through without letting any Perl code in. Note that the backslash \ isn't allowed, even though to many non-programmers it also means division. It doesn't. Forward slash divides; in Perl the backslash is used for escapes and references, never for arithmetic, so there is no reason to let it through.

The other solution is to use one of the Math:: modules on CPAN, as these can run all kinds of mathematical calculations without the risk that eval inherently brings.
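
To show what "calculating without eval" can look like, here's a small hand-written evaluator. It is only a sketch standing in for a real CPAN module, not the API of any Math:: module: the name calc, the 200-character limit, and the choice to support just +, -, *, /, decimals and parentheses are all assumptions made for this example.

Code:
```perl
use strict;
use warnings;

# Hypothetical helper: work out +, -, *, / and parentheses by hand.
# Returns the number, or undef if $expr is anything but simple arithmetic.
sub calc {
    my ($expr) = @_;
    return undef unless defined $expr && length ($expr) <= 200;

    # Split into tokens, then make sure nothing was skipped over.
    my @tok = $expr =~ m{(\d+(?:\.\d+)?|[-+*/()])}g;
    (my $compact = $expr) =~ s/\s+//g;
    return undef unless @tok && join ("", @tok) eq $compact;

    my ($sum, $product, $value);

    # value: a number, a negated value, or a parenthesized sum
    $value = sub {
        my $t = shift @tok;
        die "unexpected end\n" unless defined $t;
        return -$value->() if $t eq "-";
        if ($t eq "(") {
            my $v = $sum->();
            my $close = shift @tok;
            die "missing )\n" unless defined $close && $close eq ")";
            return $v;
        }
        die "unexpected $t\n" unless $t =~ /\A\d/;
        return $t + 0;
    };

    # product: values joined by * or /
    $product = sub {
        my $v = $value->();
        while (@tok && ($tok[0] eq "*" || $tok[0] eq "/")) {
            my $op  = shift @tok;
            my $rhs = $value->();
            die "division by zero\n" if $op eq "/" && $rhs == 0;
            $v = $op eq "*" ? $v * $rhs : $v / $rhs;
        }
        return $v;
    };

    # sum: products joined by + or -
    $sum = sub {
        my $v = $product->();
        while (@tok && ($tok[0] eq "+" || $tok[0] eq "-")) {
            my $op  = shift @tok;
            my $rhs = $product->();
            $v = $op eq "+" ? $v + $rhs : $v - $rhs;
        }
        return $v;
    };

    # Block eval only traps the die calls above; no user text is run as code.
    my $result = eval {
        my $v = $sum->();
        die "leftover input\n" if @tok;
        $v;
    };
    ($sum, $product, $value) = ();   # break the closures' reference cycle
    return $result;
}
```

With this, calc ("30+50/6") gives a number back, while calc ('unlink (glob ("*"))') just gives undef, because the text never reaches the Perl interpreter as code.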

SQL Injection

This is similar to the "evaluating variables" section above, but tends to be more common in code written by newbies. Take for example a simple CGI script that lets a user look up the name of a city by giving the zip code:

Code:
#!/usr/bin/perl -w

use CGI;
use DBI;
my $q = new CGI;
my $zip = $q->param("zipcode");

# connect to the DB
my $dbh = DBI->connect(...);

# run a query
my $query = $dbh->prepare (
  "SELECT name FROM cities WHERE zip=$zip"
);
$query->execute();

# give results
print $q->header();
if (my $row = $query->fetchrow_hashref) {
   print "The city's name is $row->{name}.";
}
else {
   print "That wasn't in the database.";
}

This script looks innocent enough. You might expect that your user isn't going to give your script anything other than a zip code. You might even have JavaScript on your form page that won't let the form submit if the zip code isn't numeric.

First of all, don't ever trust JavaScript to keep your CGI scripts out of harm's way. JavaScript is run client-side and all it can do is guide your end users, but it by no means prevents them from misusing your form. Users can disable JavaScript, edit your script "live" with some Firefox plugins, or skip the web browser altogether and submit data over telnet directly.

So, if your user is playing by the rules, the following SQL command gets executed:

Code:
SELECT name FROM cities WHERE zip=[color blue]90230[/color]

But what if your user doesn't give it a zip code? What if they submit the form and send "[color red]90230; DROP TABLE cities[/color]"? Or any other valid MySQL commands?

Now your Perl script executes this:

Code:
SELECT name FROM cities WHERE zip=[color red]90230; DROP TABLE cities[/color]

If your database driver lets a single query carry more than one statement, you've now lost your entire `cities` table. And even if it doesn't, a savvier user could give your script "[color red]0 UNION SELECT password FROM users WHERE username='admin'[/color]", which would make the page print the admin's password where the city name belongs. The door is open for malicious users to start poking and prodding at your database, and you don't want that.

What's the solution? Should you filter all your inputs? Make sure that the zip code is all numeric by deleting everything that isn't a number? This would be a good solution, but it's still not the best. It takes a lot of extra effort to filter all these inputs yourself and you're bound to make mistakes here and there. Just let DBI handle it for you with placeholders:

Code:
my $query = $dbh->prepare (
   "SELECT name FROM cities WHERE zip=?"
);
$query->execute($zip);

Now we don't insert $zip directly into the query. We use a question mark as a placeholder, and then send $zip in the execute() statement. If you need to use multiple placeholders, you can; just send your parameters into execute() in the same order as you want them to fill in your placeholders. Here's an example of that:

Code:
my $query = $dbh->prepare (q~
   UPDATE users SET password=? WHERE username=?
~);
$query->execute($password, $username);

Handling your SQL inputs this way will save you a lot of stress in the event that somebody tries to mess up your site (and somebody will definitely try, so it's better to take precautionary steps ahead of time).
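
Placeholders don't stop you from also checking that the input makes sense before it ever reaches the database. As a small sketch of that, assuming a zip code is always exactly five digits (the helper name valid_zip is made up for this example):

Code:
```perl
use strict;
use warnings;

# Hypothetical helper: return the zip code if it is exactly five ASCII
# digits, otherwise undef.
sub valid_zip {
    my ($zip) = @_;
    return undef unless defined $zip;
    return $zip =~ /\A([0-9]{5})\z/ ? $1 : undef;
}

# Possible use before running the query:
#   my $zip = valid_zip (scalar $q->param ("zipcode"));
#   ...show an error page and stop unless defined $zip...
#   $query->execute ($zip);
```

The check lets you give the user a sensible error message; the placeholder is still what keeps the query safe.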

Denial of Service Attacks

The last mistake I'm going to cover here is something that almost all CGI scripts can fall victim to. CGI has the ability to upload files.

A hacker can upload a file to any CGI script, even if the CGI script isn't prepared to receive them. Any CGI script can fall victim to this.

If the hacker uploads a VERY large file (e.g. several gigabytes), it puts a heavy load on the server. The same can also happen with normal form submissions, if the POSTed data is too big.

Using the CGI module, there is a simple solution to avoid this: limit how much data the CGI will accept, and whether it accepts uploads at all. Here are some variables you should set (if you use CGI), before you create your CGI object:

(straight from the documentation)
Code:
$CGI::POST_MAX

    If set to a non-negative integer, this variable puts a ceiling on the size of POSTings, in bytes. If CGI.pm detects a POST that is greater than the ceiling, it will immediately exit with an error message. This value will affect both ordinary POSTs and multipart POSTs, meaning that it limits the maximum size of file uploads as well. You should set this to a reasonably high value, such as 1 megabyte.

$CGI::DISABLE_UPLOADS

    If set to a non-zero value, this will disable file uploads completely. Other fill-out form values will work as usual.
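
Put together, the top of a script might look something like this. It's only a sketch: the 1 MB ceiling is an arbitrary choice, and checking cgi_error() is how current versions of CGI.pm report an oversized POST, so an older install may behave as the documentation quoted above describes instead.

Code:
```perl
use strict;
use warnings;
use CGI;

# Set the limits before creating the CGI object.
$CGI::POST_MAX        = 1024 * 1024;   # refuse POSTs bigger than 1 MB
$CGI::DISABLE_UPLOADS = 1;             # this script never takes file uploads

my $q = CGI->new;

# If the request broke a limit, report it and stop.
if (my $error = $q->cgi_error) {
    print $q->header (-status => $error);
    print "Request rejected: $error";
    exit;
}
```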

Be Mindful of Your Environment

To wrap up this FAQ, there's one more important aspect that often gets overlooked: when programming defensively, be mindful of what environment your program is running in.

Usually one of the first CGI applications a new programmer will write is a guestbook script. You offer your users a form where they can fill out their name and a comment, save each comment to the server's hard drive, and then display all the saved comments.

Here is how a beginner's guestbook might look:

Code:
#!/usr/bin/perl -w

use CGI;
my $q = new CGI;

# allow updates
if ($q->param) {
   my $name = $q->param("name");
   my $comment = $q->param("comment");

   open (GUESTBOOK, ">>guestbook.txt"); # ">>" appends; ">" would wipe out earlier entries
   print GUESTBOOK "<b>From:</b> $name<br>\n"
      . "<b>Comment:</b> $comment<p>\n\n";
   close (GUESTBOOK);
}

# read the guestbook
open (READ, "guestbook.txt");
my @gb = <READ>;
close (READ);
chomp @gb;
my $guestbook_html = join("\n",@gb);

# print the page
print $q->header();
print <<EOF;
<html>
<body>

<h1>My Guestbook</h1>

$guestbook_html

<h1>Add an Entry</h1>

<form action="guestbook.pl" method="post">
Name: <input type="text" name="name"><br>
Message: <textarea cols="50" rows="10" name="comment"></textarea><br>
<input type="submit" value="Post!">
</form>

</body>
</html>
EOF

This is a really simple guestbook, and it has a problem. While it might work fine most of the time, as long as your users are behaving themselves, what happens if a user inserts some HTML code into their comment?

You might not immediately see the problem with this. If your users want to insert <b> for bold text, or <font color="red"> to style up their post, what's wrong with that? Well, the <script> tag is what's wrong with that.

One black-hat-wearing user of your site might write out a <script> tag, and then be able to include JavaScript, and all your other users who view this page will execute this script, which might send them a redirect to a malicious website, or send their HTTP cookies away to the hacker's server, or worse.

So, you should be mindful of the environment here. Your CGI-based guestbook is in the world of HTML, so you should keep in mind that your users can also write their own HTML if you don't filter their inputs.

We can get around this with code like this:

Code:
# allow updates
if ($q->param) {
   my $name = $q->param("name");
   my $comment = $q->param("comment");

   [color blue]# don't let 'em use HTML
   $name =~ s/</&lt;/g;
   $name =~ s/>/&gt;/g;
   $comment =~ s/</&lt;/g;
   $comment =~ s/>/&gt;/g;[/color]

   open (GUESTBOOK, ">>guestbook.txt"); # ">>" appends; ">" would wipe out earlier entries
   print GUESTBOOK "<b>From:</b> $name<br>\n"
      . "<b>Comment:</b> $comment<p>\n\n";
   close (GUESTBOOK);
}

Now we replace < with &lt; and > with &gt; in all their inputs, so that if they write "<script>" in their comment, it will be displayed in your guestbook as the literal text "<script>", instead of being processed by the browser as HTML code and potentially causing problems.
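
If the same value might also end up inside an HTML attribute, or you want a literal & to display correctly, it helps to escape a few more characters. Here's a sketch of a reusable helper; the name escape_html is made up for this example (CGI.pm also ships its own escapeHTML function, which you may prefer).

Code:
```perl
use strict;
use warnings;

# Hypothetical helper: make a string safe to print inside an HTML page.
sub escape_html {
    my ($text) = @_;
    $text = "" unless defined $text;
    $text =~ s/&/&amp;/g;    # must come first, or it would re-escape the rest
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;   # matters inside attribute values
    $text =~ s/'/&#39;/g;
    return $text;
}

# Possible use in the guestbook:
#   my $name    = escape_html (scalar $q->param ("name"));
#   my $comment = escape_html (scalar $q->param ("comment"));
```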

If you want users to be able to use a small subset of tags, like <b>, <i>, and <u>, you'll have to do a couple of extra regular expressions for these (e.g. turn <b> into a placeholder such as [b], then do your HTML filter so everything else gets neutralized, then turn [b] back into <b>).
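
The placeholder round trip described above could look something like this. It's a sketch under a few assumptions: the helper name filter_comment and the [b]-style placeholders are made up for this example, only bare <b>, <i> and <u> tags (no attributes) are let through, and it doesn't check that every opening tag gets closed.

Code:
```perl
use strict;
use warnings;

# Hypothetical helper: allow <b>, <i>, <u> and neutralize all other HTML.
sub filter_comment {
    my ($text) = @_;
    $text = "" unless defined $text;

    # 1. Swap the allowed tags for placeholders: <b> -> [b], </B> -> [/b]
    $text =~ s{<(/?)([biu])>}{"[" . $1 . lc ($2) . "]"}gie;

    # 2. Neutralize whatever HTML is left.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    # 3. Turn the placeholders back into real tags.
    $text =~ s{\[(/?)([biu])\]}{<$1$2>}g;

    return $text;
}
```

One side effect of this approach: a user who types [b] directly also gets bold text, which is harmless here because only the three allowed tags can ever come out of step 3.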

So, these are some important tips to keep in mind while developing web applications that interact with your users.