AoC 2022 Day 16: Secure Coding II
The standard language for querying databases is called Structured Query Language (SQL). Any application you construct that uses a database must be able to instantly generate SQL statements and send them to the database engine in order to retrieve the data it needs to function. Fortunately for you, SQL was designed with simplicity in mind, and to make it simpler for programmers to learn, its syntax is meant to mirror simple English statements.
Let’s discuss the specific database engine utilized in our project, MySQL, before going over the SQL syntax. In MySQL, data is kept in structures known as tables. Consider them to be any table with columns and rows in a spreadsheet document. For clarity, let’s check one of the tables in use in the current application where the toys are stored:
As you can see, each row of the table corresponds to a different toy, and each column is a field of data that describes the toy. Looking at the third row, we can see that our toys table contains a toy named “Car” and that there are 12 units available for it. Pretty easy.
As mentioned before, when an app needs to retrieve information from the database, it will need to build an SQL query. Queries are simple instructions that ask for specific data in a structured way that the database can understand. To query information, we will use SELECT statements, indicating which rows of which table we want to retrieve. If we wanted to get all of the columns from the “toys” table in our database, we could use the following statement:
SELECT * FROM toys;
Notice the asterisk(*), which indicates that you want to retrieve all columns from the table. If you need to ask for specific columns, you can replace the asterisk with a comma-separated list of columns. For example, to retrieve only the name and quantity columns, you can issue the following SQL query:
SELECT name, quantity FROM toys;
All table rows are returned in both cases before, but you can filter those if needed using a WHERE clause. Suppose you want to filter the results of the last query so that you only get the toys for which there is at least a quantity of 20. You could do so with the following statement:
SELECT name, quantity FROM toys WHERE quantity >= 20;
In real-world apps, you are likely to find much more complex queries in some cases, but what we’ve covered so far should be enough for the rest of the room.
Sending SQL Queries from PHP
Now that we understand how a SELECT statement works, let’s see how a PHP application builds and sends such a query to MySQL. Although we are focusing on PHP and MySQL, the same idea generally applies to other programming languages.
The first step is always to get a connection to the database from our code. To do so, PHP includes the mysqli extension, which provides the mysqli_connect()
function. The function receives the IP or name of the database server as a first parameter ($server
), followed by the username ($user
) and password ($pwd
), and finally, the name of the schema to use($schema
), which is just an identifier of the database to which we are connecting. As a result, the function returns a connection handler, which can be used to send further SQL Queries. Think of the connection handler as a variable that holds the connection information to the database, so that queries can be sent to it:
$server="db";
$user="logistics_user";
$pwd="somePass123";
$schema="logistics";
$db=mysqli_connect($server,$user,$pwd,$schema);
Once the connection is made, we can issue SQL queries using the mysqli_query()
function. The first parameter we pass to the function is the connection handler we got before, and the second parameter is a string with our SQL query:
$query="select * from users where id=1";
$elves_rs=mysqli_query($db,$query);
As a result of executing the query, we obtain an SQL result set and store it in the $elves_rs
variable in our example. A result set is nothing more than an object that contains the results of our query, which can be used in the rest of our program to access the resulting data.
Building Dynamic Websites
Now here’s where things get interesting.
Depending on the number you put on the id
parameter of the URL, you get served the profile of a different elf. Behind the curtains, this works by creating an SQL query that embeds the id
parameter value and returns the information on the corresponding elf.
In code, it would look like this:
$query="select * from users where id=".$_GET['id'];
$elves_rs=mysqli_query($db,$query);
The first line builds an SQL query by concatenating the $_GET['id']
variable as part of the where clause. Note that in PHP, you can access any parameter in the URL as $_GET['name_of_parameter']
. This query will ask the database for all columns of the table users that correspond to the elf with a specific id. The second line sends the query and returns the information of one particular elf as a result set that we store in the $elves_rs
variable. The rest of the website then parses the result set and renders the page accordingly.
If you test the website, you can see that it works as expected. You have, however, introduced an SQL injection vulnerability in your code that could allow an attacker to dump your whole database!
SQL Injection (SQLi)
The approach previously demonstrated has a flaw in that it automatically concatenates untrusted user input into a SQL query without asking any questions. Our app should properly evaluate every input the user submits before using it, as was demonstrated in the work from the previous day. If not, unforeseen events might occur.
With regard to SQL and our example, a hacker may submit SQL syntax through one of the app’s URL parameters and have it concatenated with a SQL query in the app’s code, possibly affecting its intended use.
Let’s get back to the elf’s profile page to understand this better. Remember the application is creating a query by concatenating whatever is sent in the id
parameter as part of the WHERE clause:
$query="select * from users where id=".$_GET['id'];
If the attacker sends the following through the id parameter of the URL:
http://LAB_WEB_URL.p.thmlabs.com/webapp/elf.php?id=-1 OR id = 4
When PHP concatenates “-1 OR id = 4” to our SQL statement, it will end up with this query:
select * from users where id=-1 OR id = 4
Suddenly, the attacker has injected some SQL syntax that, when concatenated to our original query, ends up serving the data of the elf with id=4
for some weird reason.
If we read the resulting query string, we can see that our WHERE clause was modified to filter only the elves that either have id=-1
or id=4
. Since the id values used by the database are likely all positive numbers, no elf will match id=-1
. Therefore, the database will only return the elf with id=4
.
While this example shows a harmless injection, a skilled attacker can try to get your server to run much more complex SQL queries and potentially force the database to return any data on any table they want. Just as an example, look at what happens when you put the following in the id
parameter:
The SQL injected will make the database return all of the users and passwords of the application.