Presentation:

A database is a set of tables. A table is set of columns.
In this tutorial, we will assume that you already know how to create databases and tables with a dbms like phpMyAdmin.
I will write all the SQL keyword in uppercase. You can download the tables here.

Select statement:

The SELECT statement is explicit enough to understand what it does. It selects column, string, and expression and retrieves them.

Select everything by using the joker *:

    SELECT *

Select a string:

    SELECT 'My string'

Select a column:

    SELECT product.name
    or
    SELECT name

Select rows:

The clause FROM means that we will select data FROM somewhere.

    SELECT * FROM product

We can filter by using the clause WHERE:

    SELECT * FROM product WHERE id='1'

We want to group the result by the "manufacturer" column:

    SELECT manufacturer FROM product GROUP BY manufacturer

Counting things:

We want to count the number of rows:

    SELECT COUNT(*) FROM product
    SELECT COUNT(name) FROM product
    SELECT COUNT(name) FROM product WHERE manufacturer ='apple'
    SELECT COUNT(name), manufacturer FROM product GROUP BY manufacturer

Insert data:

Insert many rows:

    INSERT INTO product 
        (name, manufacturer, price, category)
            VALUES
        ('Macbook pro','Apple','1200','1),
        ('Dummy 1,D&D,'9200','1),
    ….

Insert one row:

    INSERT INTO product SET name='Macbook pro', category='1

Update data:

To update data we have to tell WHERE we want to update a table. It means which row or rows.

    UPDATE FROM product SET name='MacBook Pro Unibody' WHERE id='1'

If we don't specify where we want to update the table, it will update all the rows of your table.

    UPDATE FROM product SET name='MacBook Pro Unibody'

Now all of our products will have 'MacBook Pro Unibody' as name. So before to proceed to an UPDATE, a good practice is to select the rows that we want to update in order to see what it will return.

Be careful of SQL injection, because adding just "OR 1" at the end of the query will UPDATE all the rows of you table.

    UPDATE FROM product SET name='MacBook Pro Unibody' WHERE id='1' OR 1

Insert/Update data:

When you insert data by specifying ON DUPLICATE KEY, if the unique key already exists, it will update your row. In our case, the unique key is the column ID.

    INSERT INTO product
    	(id, name, manufacturer, price, category)
            VALUES
        ('Macbook pro','Apple','1200','computer')
        ON DUPLICATE KEY UPDATE
    	    name='Macbook pro', manufacturer='Apple',
    	    price='1200', category='1'

Type, collation, extra & index:

When we create a table (like in phpMyAdmin), we have to choose some options for each column. Basically, there's: field, type, collation, and attributes...

FIELD is the name of your column.

TYPE is the type of data. The most use is:
- INT is a numeric type, it's use for ID column
- TINYINT is for small numeric type, it's use for a Boolean column
- VARCHAR is string type, it's use for a name or a title
- TEXT is string type too; it's for a description.
- DATE is a date formatted column, (YYYY-mm-dd)

So when we create a table we must think of what kind of data we want to put in, and look for the length of our field type. At the beginning, for example with a VARCHAR, if we don't know exactly the length we need, it's better to put 255 in order to prevent mistake.
We can have more specifications of data type here:
http://dev.mysql.com/doc/refman/5.0/en/storage-requirements.html

COLLATION is the encoding of our data, for new project it's better to use UTF-8, which is the universal encoding. For example, AJAX always use UFT8 encoding to pass the data, if our table is in UTF-8 too, we don't have to use Php functions like utf8_decode, utf8_encode. It can be really boring sometime to work with non-UTF8 encoding.

To work with UFT8 encoding, don't forget to set after the connexion the character type:

	SET CHARACTER SET uft8

For more informations: http://dev.mysql.com/doc/refman/5.0/en/charset-connection.html

You will tell me, there's a lot of utf8 collation with Mysql. Which one I have to choose.

Let's see UTF8_BIN:
BIN is for binary, it will compare string by the binary value of each character.
«é» is not equal to «e»

UTF8_GENERAL_CI:
CI is for case insensitive. It will compare a string using case insensitive comparisons, and using general language rules:
«é» is equal to «e»
«Ê» is equal to «e»
It's really good to use this encoding for search, especially if you write in French or other language that need accent.

UTF8_GENARAL_CS:
CS is for case sensitive. Guess what it does....
«é» is equal to «e»
«Ê» is not equal to «e»
«Ê» is not equal to «ê»

ATTRIBUTES We can specify an attribute to our field. For example, if we use a TINYINT type, we can set the value of our column with a negative number or positive number: -128 to 127

If we use the attribute UNSIGNED, we will not be able to put a negative value anymore. But by this way we can increase the max value of our column: 0 to 255
Most of time we use UNSIGNED for ID column.

The attribute UNSIGNED ZEROFILL fixe the width of the number. For example with a INT(3) and a value of 7, you will retrieve 007.

NULL is allowing the column to be null!

DEFAULT if we want to set a default value on a new record without specifying the value for the column. It can be a number, string, TIMESTAMP (Unix date format)

AUTO_INCREMENT. It automatically the value of the column in order to have a unique value for this field. Again, in 99% of case it's use for ID column.

INDEXES Indexes are the most important things in sql. Index creates relation between tables. Index is use to get data from two tables using JOIN clauses. Index is fast, and useful for search rows, and link table between them.

As you know, your database is stored on your hard drive. But all the data of your database is not on the same place on it. Except for SSD, your hard drive is mechanic and to read data the read head must move to the right position. If you have 200 000 rows on your table product and you just want to get the category «computer». Schematically, the hard drive will check each row sequentially, and if it find the category «computer» it will return it. But it can take a long time by checking each row. It's the reason why INDEXES exist. If you specify the column category as INDEX, it will create an index file by category with the place on the hard drive of each row. It's like the index of a book with the page number. So, now, if you want the category «computer», it will check the INDEX file, place the read head on the right position and return directly the rows that you need. It's very useful with large table. If you have 2 rows on your table, you will see no difference. But it's important to think about your indexes when you create a table. More, it will help, or guide you during your queries development.

We can create INDEX by two columns or three. For example INDEX by column «category» and «published».

Alias:

    SELECT p.name FROM product AS p
    SELECT p.name AS product_name FROM product AS p
    SELECT p.name AS 'Product name' FROM product AS p

Join clause:

The join clause is used for getting data from two or more tables. Let's see what is happen with some examples. To test we can use the tab SQL on your phpMyAdmin interface.
Note: It's important to indent your query to read them quickly.

    SELECT p.name AS product_name, c.name AS category
        FROM product AS p
        JOIN category AS c
            ON c.id = p.category

As you can see, the table product is linked with the table category by the column category. We used this column to do the JOIN between the two tables. It will return only the products that have a match with the category table.

What's happening if I want to get all the products even if they don't have a category? We can use the LEFT JOIN.

    SELECT p.name AS product_name, c.name AS category
        FROM product AS p
        LEFT JOIN category AS c
            ON c.id=p.category

Now the other products appear but it tells you that the category is NULL. This is interesting. And now if I want to get only these uncategorized products, we can filter with the WHERE clause:

    SELECT p.name AS product_name, c.name AS category
        FROM product AS p
        LEFT JOIN category AS c
            ON c.id=p.category
        WHERE c.id is NULL

If I want to get all the categories even if they don't have product, I can use the RIGHT JOIN.

    SELECT p.name AS product_name, c.name AS category
        FROM product AS p
        RIGHT JOIN category AS c
            ON c.id = p.category

This time it tells that the product_name is NULL.

Most of time you will use LEFT JOIN and JOIN. RIGHT JOIN is more rare. Because LEFT JOIN and RIGHT JOIN are the same things. It's just a different point of view.

This query:

    SELECT p.name AS product_name, c.name AS category
        FROM product AS p
        RIGHT JOIN category AS c
            ON c.id = p.category

Is equivalent to that query:

    SELECT p.name AS product_name, c.name AS category
        FROM category AS c
        LEFT JOIN product AS p
            ON c.id = p.category

What's happening if we don't specify a link between two tables?

    SELECT p.name AS product_name, c.name AS category
        FROM product AS p
        JOIN category AS c

Wow, it multiplies the number of product rows by the number of category rows! So, don't do that on large table J and don't forget to link your tables ;)

Order by:

To sort the result by the name of products:

    SELECT p.name
        FROM product AS p
        ORDER BY name

Multi column sorting, by category and name:

    SELECT p.name
        FROM product AS p
        ORDER BY category, name

DESC and ASC options. By default ASC sorting is use.

    SELECT p.name
        FROM product AS p
        ORDER BY name DESC

It sorts the result by alphabetic descendant order.

    SELECT p.name
        FROM product AS p
        ORDER BY category ASC, name DESC

It sorts the category by numeric ascendant order, and the name by alphabetic descendant order.

On date column.

    SELECT c.category
        FROM category AS c
        ORDER BY c.created_at

Random order.

    SELECT c.category
        FROM category AS c
        ORDER BY RAND()

String functions:

We are going to see some basic string function. I invite you to discover more functions on the mysql website: http://dev.mysql.com/doc/refman/5.0/en/string-functions.html

Concat and Concat_ws

    SELECT CONCAT(p.name, '=>', c.name) AS MyConcat
        FROM product AS p
        JOIN category AS c
            ON c.id=p.category
        ORDER BY c.name ASC, p.name ASC

But what's happen if an argument of my CONCAT function is null:

    SELECT CONCAT(p.name, NULL, c.name) AS MyConcat
        FROM product AS p
        JOIN category AS c
            ON c.id=p.category
        ORDER BY c.name ASC, p.name ASC

It will return NULL for all the rows!! It can be good for some reason, but sometime we just want to ignore if an argument is NULL.

    SELECT CONCAT_WS(' => ',p.name, NULL, c.name) AS MyConcat
        FROM product AS p
        JOIN category AS c
            ON c.id=p.category
        ORDER BY c.name ASC, p.name ASC

Note that the first argument of CONCAT_WS is the separator! Of course you can put nothing in.

    SELECT CONCAT_WS(' ',p.name, NULL, c.name) AS MyConcat
        FROM product AS p
        JOIN category AS c
            ON c.id=p.category
        ORDER BY c.name ASC, p.name ASC

Trim

TRIM removes space before and after a string.

SELECT TRIM(' HI GUY ') AS MyTrim

It's good to use it on UPDATE OR INSERT statement.

A lot of string functions exist. Just have a look. Even if you won't use it, it's good keep in mind that these functions exist.

We can combine functions.

SELECT LOWER(
    CONCAT(
        TRIM(' HI GUYS '), ' WHAT\'S UP ')
        )
    ) AS MyTrim

On small table we can use these functions on the ORDER BY clause. But on big table it will alter the performance.

    SELECT *, SUBSTRING(MD5(name),1,1) AS FirstMD5
        FROM product
        ORDER BY SUBSTRING(MD5(name),1,1)

This is just for fun; it takes the first character of the MD5 of name and it sort the result by this one. This query on a table with around 160 000 rows takes 0,27 sec to sort. It' is not so bad actually.

Other functions and operators:

You can find the list of mysql functions here: http://dev.mysql.com/doc/refman/5.0/en/functions.html

I will not explain the numeric functions cause it's the same use than the string functions.

Just have a look on the manual for functions, operators, conversion, etc… It's really easy to understand.

By the way, on the SELECT statement you can use comparison to create Boolean operation.

SELECT 1>0 AS myBoolean

It will return 1

SELECT 1>2 AS myBoolean

It will return 0

SELECT 2=2 AS myBoolean

It will return 1

So of course, we can use columns instead numbers.

Just some words about date functions. The default SQL dates time format is: YYYY-mm-dd HH:mm:ss You can easily isolate the day, month or year.

SELECT MONTH(created_at) FROM category

SELECT created_at FROM category WHERE MONTH(created_at) = '04'

Aggregating results, grouping results:

Aggregate functions work with GROUP BY clause. The GROUP BY clause is used to group a list of rows. It expects a set of conditions.

Count()

It counts the number of product by categories

    SELECT COUNT(p.id) AS nb, c.name
        FROM category AS c
        JOIN product AS p
            ON p.category=c.id
        GROUP BY c.id

Sum()

It sums the price of each product by categories

    SELECT SUM(p.price) AS nb, c.name
        FROM category AS c
        JOIN product AS p
            ON p.category=c.id
        GROUP BY c.id

Avg()

It calculates the average of the products price by categories

    SELECT AVG(p.price) AS nb, c.name
	FROM category AS c
	JOIN product AS p
		ON p.category=c.id
	GROUP BY c.id

Min()

It retrieves the products that have the lower price by categories

    SELECT MIN(p.price) AS nb, c.name
        FROM category AS c
        JOIN product AS p
            ON p.category=c.id
        GROUP BY c.id

Max()

It retrieve the product who have the highest price by categories

    SELECT MAX(p.price) AS nb, c.name
        FROM category AS c
        JOIN product AS p
            ON p.category=c.id
        GROUP BY c.id

Group_concat

It concatenates the entire products name by categories

    SELECT GROUP_CONCAT(p.name ORDER BY p.name ASC SEPARATOR ', ') AS nb, c.name
        FROM category AS c
        JOIN product AS p
            ON p.category=c.id
        GROUP BY c.id

Distinct()

It removes the doubles values. It retrieves each order value, and in the second query it counts the number of orders.

SELECT DISTINCT s.order AS num_order
    FROM sales AS s

SELECT COUNT(DISTINCT s.order) AS num_order
    FROM sales AS s

We can use DINTINCT on several columns like:

    SELECT DISTINCT a, b, c

And with COUNT()

    SELECT COUNT(DISTINCT a, b, c)

We can say the same thing with the GROUP BY clause:

    SELECT … FROM … GROUP BY a, b, c

Temp table:

Sometime it can be good to create a temp table to speed up your query. The following is just an example, and it's not effective to process like that in this case.

    SELECT p.name, c.name AS category
        FROM category AS c
        JOIN (SELECT id, name, category
    		FROM product
    		WHERE price BETWEEN 500 AND 900
    	) AS p
    	    ON p.category=c.id

You will use it in action in the next tuto. (Nested set).

Searching rows:

Often on a website, you allow your visitor to search articles or products on your website.

LIKE: with this keyword you can find a word in a string. The sign % is used to search the word at the beginning, middle or end of the string.

    //Start with book
    SELECT p.name
        FROM product AS p
        WHERE p.name LIKE 'book%'
    
    //Contain book
    SELECT p.name
        FROM product AS p
        WHERE p.name LIKE '%book%'
    
    //End with book
    SELECT p.name
        FROM product AS p
        WHERE p.name LIKE 'book%'

We can use regular expression too

    //Searching with a REGEX (which row contain macbook, or macbok, maco, etc...)
    SELECT p.name
        FROM product AS p
        WHERE p.name REGEXP '^mac[bok]'
    
    //Test a field with regext and return 1 or 0 (which row is ending by 'pro')
    SELECT p.name REGEXP '^.*(pro)$'
        FROM product AS p

Conclusion

First of all, I want to say, sorry for my english, don't hesitate to send me a little word to advise me about my mistakes. I will try to update this article if I find something to add :) I hope you found some interesting things but for sure nothing is better than the manual :)

Add comment