Don't trust user input

While working on pyup.io's database of known security vulnerabilities, I manually reviewed thousands of changelogs and commits over a couple of weeks.

A lot of the security issues I found are related to user input in one way or another. Even experienced developers with a long track record of outstanding code make mistakes here, so I thought it might be a good idea to give a quick overview of common pitfalls.

Now, let's take a look at why you should treat user input with caution and check out a couple of examples.

Why is user input bad?

Before going into details, let's take a look at one of the surest ways to shoot yourself in the foot: evaluating or directly executing user input.

data = '"hi"'
eval(data)

>> "hi"

It should be pretty clear why this is problematic. It allows your users to execute arbitrary Python code on behalf of your program.

data = "open('secret.txt').read()"
eval(data)

>> "whatever is in secret.txt"

Granted, nobody ever writes code like that. So, why is it a good example anyway? Even if you don't use eval, you are doing something with the input. Evaluating it would be the absolute worst thing to do.

But maybe you update a database record, load a file here and there, set a cookie, somehow display it to other users or request an external resource.
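If the input really is supposed to be a plain Python literal (a string, a number, a list), the standard library's ast.literal_eval is a much narrower tool than eval. Here is a minimal sketch; parse_literal is a hypothetical helper name I made up for illustration, not part of any library.

```python
import ast


def parse_literal(data):
    """Parse data as a Python literal and refuse everything else."""
    try:
        # literal_eval only accepts literals such as strings, numbers,
        # tuples, lists, dicts, sets, booleans and None.
        return ast.literal_eval(data)
    except (ValueError, SyntaxError) as exc:
        # Expressions like open(...).read() are rejected, not executed.
        raise ValueError(f"not a plain literal: {data!r}") from exc


greeting = parse_literal('"hi"')

try:
    parse_literal("open('secret.txt').read()")
    rejected = False
except ValueError:
    rejected = True
```

This only covers the narrow case of parsing literals. It does not make the resulting value trustworthy, so whatever you do with it afterwards still needs the same care.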

What is user input?

The first thing that comes to mind is a web form of some sort. But it goes way beyond that.

Do you allow custom usernames? External resources like an image? That's user input. What about an API call you make on behalf of your users? Yeah, user input.

Everything you haven't explicitly written yourself is user input.

Shooting yourself in the foot

Now that we know what user input is and why it might be bad, let's take a look at a couple of examples.

SQL Injection

If you are doing some kind of database operation based on user input, make sure to validate and escape it properly.

If you are using Django, use forms and the ORM. If you are using Flask, Bottle or Pyramid, check out SQLAlchemy and WTForms. Don't try to write this yourself.

Something like

data = 'foo'
f'INSERT INTO users VALUES ("{data}");'

>> 'INSERT INTO users VALUES ("foo");'

quickly becomes

data = 'foo"); DROP TABLE "users" --'
f'INSERT INTO users VALUES ("{data}");'

>> 'INSERT INTO users VALUES ("foo"); DROP TABLE "users" --");'

and your users database is gone.

Make sure to escape the whole query and not just whatever is in the form. Maybe you have a user with the username ; DROP TABLE "COMPANIES";-- LTD, just like the British Companies House does.
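To make the point concrete, here is a small sketch that uses the sqlite3 module from the standard library with a placeholder instead of string formatting. The in-memory database and the single-column users table are assumptions for the sake of the example, and other database drivers use different placeholder styles.

```python
import sqlite3

# Assumption: a throwaway in-memory database with a one-column users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

data = 'foo"); DROP TABLE "users" --'

# The ? placeholder hands the value to the driver separately from the SQL
# text, so it is stored as data and never parsed as part of the statement.
conn.execute("INSERT INTO users VALUES (?)", (data,))

rows = conn.execute("SELECT name FROM users").fetchall()
```

The table survives, and the malicious string simply ends up as a rather odd username in rows.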

Path Traversal

If you are doing any work where the file system is involved, like storing files for your users, be careful.

You might think that using something like

username = 'johnny'
image = 'image.png'
f'/data/uploads/{username}/{image}'

>> '/data/uploads/johnny/image.png'

is a good idea, but it isn't.

It looks okay-ish at first glance. Everything that johnny uploads lands in /data/uploads/johnny. Great!

Except when johnny changes his name to ../../etc/ssh and uploads a file named ssh_config.

username = '../../etc/ssh'
image = 'ssh_config'
f'/data/uploads/{username}/{image}'

>> '/data/uploads/../../etc/ssh/ssh_config'

You might say that /etc/ssh/ssh_config is only writable by root and your webserver is run by a user with fewer privileges. That's good, but only as a last resort.

You shouldn't have to rely on the permissions of whatever user your webserver runs as to prevent things like that.

You can use UUIDs and hardcoded paths to store user uploads on your file system.

import uuid

user_id = uuid.uuid4()
image_id = uuid.uuid4()
f'/data/uploads/{user_id}/{image_id}.png'

>> '/data/uploads/8a901dc8-97ee-4a76-bccc-693d67886de1/310769c9-f337-402b-bc7c-729f7cd27dcb.png'
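If you can't avoid building a path from user-supplied parts, you can at least check that the result stays inside the upload directory. The following is a rough sketch, assuming a POSIX file system; safe_upload_path and UPLOAD_ROOT are hypothetical names I picked for illustration. Note that os.path.normpath does not resolve symlinks, so this is not a complete defense on its own.

```python
import os.path

UPLOAD_ROOT = "/data/uploads"  # assumed upload directory, as in the examples above


def safe_upload_path(username, image):
    """Build an upload path and refuse anything that escapes UPLOAD_ROOT."""
    # normpath collapses ".." segments without touching the file system.
    candidate = os.path.normpath(os.path.join(UPLOAD_ROOT, username, image))
    # commonpath returns the longest shared parent directory of both paths.
    if os.path.commonpath([UPLOAD_ROOT, candidate]) != UPLOAD_ROOT:
        raise ValueError(f"path escapes upload directory: {candidate}")
    return candidate


ok = safe_upload_path("johnny", "image.png")

try:
    safe_upload_path("../../etc/ssh", "ssh_config")
    blocked = False
except ValueError:
    blocked = True
```

Here johnny's regular upload goes through, while the ../../etc/ssh variant raises a ValueError before anything gets written.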

Cross-site Scripting (XSS)

Cross-site Scripting is one of the most common vulnerabilities in web apps. It allows attackers to inject client-side JavaScript into your page.

If you are displaying user input, you need to escape it properly.

A great comment like this one

comment = "great site!"

f'<div class="comment">{comment}</div>'

>> '<div class="comment">great site!</div>'

can easily turn into something more malicious like this one

comment = '<script>alert("this site sucks!")</script>'

f'<div class="comment">{comment}</div>'

>> '<div class="comment"><script>alert("this site sucks!")</script></div>'

which directly executes JavaScript in your visitors' browsers.

Django does the escaping automatically for you. If you are using a template engine like Jinja2, load it with autoescape=True. If you are working with raw strings, you can escape them with html.escape.
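For the raw string case, here is a quick sketch of what html.escape from the standard library does to the malicious comment from above. The variable name safe is just an illustrative choice.

```python
import html

comment = '<script>alert("this site sucks!")</script>'

# html.escape replaces &, < and > and, by default, both quote characters
# with their HTML entities, so the browser renders text instead of markup.
safe = f'<div class="comment">{html.escape(comment)}</div>'
```

The result is '<div class="comment">&lt;script&gt;alert(&quot;this site sucks!&quot;)&lt;/script&gt;</div>', which shows up as harmless text in the browser.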

Even if you don't work with user input directly, escape everything.

But wait, maybe you are just displaying DNS records fetched from public nameservers? This should be relatively safe, right?

No. Something like what is shown in the video below is just one carefully crafted DNS record away.