Sanitizing your input considered harmful.

Standard

Yeah, you heard me. I love those sensationalist headlines.

I’ve been frustrated lately trying to bring some associates along in their web development practices. For those getting started now in web development, please don’t try and climb the long road of discovery to good web development practices along the path that I and so many others have taken.

Take the new road.

You see, the technique of sanitizing your input assumes that it must be sanitized. It assumes that you are constructing code (be it a SQL statement or other) from input provided by an end user, who may or may not be trying to hack your application. In this post, I’m focusing on database interfacing.

Modern application frameworks have taken this burden away from you, so please use them.

Hopefully, you’re using a web framework that does this for you internally. Some programming environments (ie. PHP) are stuck in the land of eternal santization because they don’t support the more sophisticated database binding technologies. If you’re using a framework in an environment like PHP, you must take care – do you have projects out there using old exploitable versions of frameworks?

Let’s dig further.
Modern database interfaces typically have the concept of bound variables. Bound variables are 100% safe, because the executed code is STATICALLY DEFINED in your app, and the variables are passed in separately.

So you can issue statements like this: “select * from users where id=:id”, where “:id” is a place-holder to be defined separately.

I’ll use discreet PHP for the first example. Consider the following three pieces of code:

// BAD!
$v = $REQUEST['myparam'];
$explode_me = mysql_query("select * from users where id='".$v."'");
// Dangerous
$v = $REQUEST['myparam'];
$v = my_magic_sanitizer($v); // hope you've thought of everything!
$please_dont_explode = mysql_query("select * from users where id='".$v."'");
// To my knowledge, the best you can do in PHP
$v = $REQUEST['myparam'];
$v = mysql_real_escape_string($v);
$probably_safe = mysql_query("select * from users where id='".$v."'");

Notice that the best you can accomplish in PHP is a dynamic SQL statement using the MySQL sanitizer, mysql_real_escape_string().

Here’s the absolute best you can do (I’ll use Java JPA as an example, but many environments provide this ability)

// Java
List users = pm.createQuery("SELECT u FROM Users u WHERE name=:name")
                 .setParameter("name", request.getParameter("name"))
                 .getResultList();

Notice that we can simply pass the request parameter directly into the database call. It’s 100% guaranteed safe. Here’s a similar example in Ruby with SQLLite:

// Ruby
db.execute("SELECT * FROM users WHERE name=:name", "name" => param['name'])

There’s another side-benefit of this technique, and that’s a feature few people pay attention to until they realize their application doesn’t scale well. Statement Caching is an automated caching technique used by your database to save it having to parse through the text of your queries every single time you execute them.

If you always use dynamic statements in your application, you rarely benefit from Statement Caching.

However if you use other environments and the techniques I’ve shown you above, you can gain significant performance (depending on the rest of your application) through your databases Statement Cache. Prepared statements are closely related to statement caching, except you are in control over what is cached.

If you are using a framework such as CakePHP or Ruby On Rails the task of looking up items based on their id or primary key is taken care of by the framework, so you don’t typically build statements anyways.

Happy coding…

Advertisements

2 thoughts on “Sanitizing your input considered harmful.

  1. Parameterized SQL is absolutely a must. Sneaky testers will even have scripts at their disposal which will find instances in the code they are working on where ‘Thou shalt not build queries dynamically’ is violated.

    There are other reasons to ‘sanitize’ all data from the client though than just (first order) SQL Injection.
    – second order sql injection (where nasty things get put into the database to be run later (usually in a c/blob)
    – XSS
    – business rule validation
    – general data quality (according to cnn a couple years back, this is one ‘area of specialization that will become of greater importance over the next 10 years’)
    – preventing unnecessary database calls

    And of course, you have to do these checks in multiple places. Doing them in the browser prevents the system from doing things that could not possibly have a chance of working but those are all html or javascript based and easily avoided through tools like firebug and tamper data. So you have to do them at the app layer as well where (in theory) they cannot be avoided. And if that physical machine is compromised (to allow them to change the rules) then you’ve lost the game anyways.

    It’s not paranoia if they really are after you.

    -adam

  2. For sure – I tried to keep things focused on the database realm but you’re right there are many ways in which they are out to get you.

    My next blog post is going to cover data binding on the application presentation tier, which touches a couple things including XSS.

    Poor data quality drives me insane.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s