SQL Query Manipulation Language with SQLite

<figure class="highlight"><pre><code class="language-text" data-lang="text">name: inverse
layout: true
class: center, middle, inverse
.indexlink[[](#) [](#index) <a href="#" class="color"></a>]

---

name: top
layout: true
class: top, left
.indexlink[[](#) [](#index) <a href="#" class="color"></a>]

---

name: normal
layout: true
class: left, middle
.indexlink[[](#) [](#index) <a href="#" class="color"></a>]

---

# SQL

## Data Query Language

Gonçalo Gonçalves  
[gflcg@fe.up.pt](mailto:gflcg@fe.up.pt)

.footnote[-- original slides by <a target=_blank href="http://www.fe.up.pt/~arestivo">André Restivo</a>]

---
name: index
template: inverse

# Index
.index[

]

---

---

# SELECT and FROM

* **SELECT** and **FROM** are the most basic SQL query operators.
* They allow us to specify which tables (FROM) and columns (SELECT) we want to retrieve from the database.
* The result of an SQL query is also a table.

---

# Selecting all columns

To select all columns from a table we can use an *

```sql
SELECT * FROM Employee;
```

.sqltable[
|id|name|salary|taxes|dep_num
|-|-|
|1 | John Doe    | 1000 | 200 | 1
|2 | Jane Doe    | 800  | 100 | 2
|3 | John Smith  | 1200 | 350 | 2
|4 | Jane Roe    | 1000 | 200 | 3
|5 | Richard Roe | 900  | 0   | NULL
]

---

---

# Choosing columns

We can select only some columns

```sql
SELECT id, name FROM Employee;
```

.sqltable[
|id|name
|-|-|
|1 | John Doe    |
|2 | Jane Doe    |
|3 | John Smith  |
|4 | Jane Roe    |
|5 | Richard Roe |
]

---

# Column operations

We can also perform any operations between columns

```sql
SELECT id, name, salary - taxes FROM Employee;
```

.sqltable[
|id|name|salary - taxes
|-|-|
|1 | John Doe    | 800
|2 | Jane Doe    | 700
|3 | John Smith  | 850
|4 | Jane Roe    | 800
|5 | Richard Roe | 900
]

---

# Renaming columns

Any column can be renamed using the **AS** operator

```sql
SELECT id AS num, name, salary - taxes AS net_salary FROM Employee;
```

.sqltable[
|num|name|net_salary
|-|-|
|1 | John Doe    | 800
|2 | Jane Doe    | 700
|3 | John Smith  | 850
|4 | Jane Roe    | 800
|5 | Richard Roe | 900
]

---

---

# WHERE

* The **WHERE** command allows us to filter which rows we want in our result table according to a condition.
* The condition can use any comparison operator (<, >, <=, <>, ...) and can be composed using AND, OR and NOT.

---
# Example

```sql
SELECT * FROM Employee WHERE dep_num = 2 OR salary <= 900;
```

.sqltable[
|id|name|salary|taxes|dep_num
|-|-|
|2 | Jane Doe    | 800  | 100 | 2
|3 | John Smith  | 1200 | 350 | 2
|5 | Richard Roe | 900  | 0   | NULL
]

---

# Example

To test if a value is null, we have to use the special **IS NULL** operator.

```sql
SELECT * FROM Employee WHERE dep_num IS NULL;
```

Use **IS NOT NULL** to select rows where a certain attribute is not null.

---

# Removing duplicates

We can remove duplicates from the final result by using the DISTINCT operator

```sql
SELECT DISTINCT salary FROM Employee;
```

---
template: inverse
name: set
# Set operators

---

# Set operators

Two tables are compatible for being used in a set operation if they have the same **number of columns** and the **type** of each column is **compatible**:

* The **UNION** between two tables (R1 and R2) is a table that includes all lines that are present in R1 or R2.
* The **INTERSECT**ion between two tables (R1 and R2) is a table that includes all lines that are present in R1 and R2.
* The difference (**EXCEPT**) between two tables (R1 and R2) is a table that includes all lines that are present in R1 but not in R2.

With all these operators, **duplicate** rows are **eliminated** automatically.

---

# Union, Intersection and Difference

![](../img/sql/setoperators.png)

---

# Example

```sql
SELECT * FROM Employee WHERE salary >= 1000;
```

.sqltable[
|id|name|salary|taxes|dep_num
|-|-|
|1 | John Doe    | 1000 | 200 | 1
|3 | John Smith  | 1200 | 350 | 2
|4 | Jane Roe    | 1000 | 200 | 3
]

```sql
SELECT * FROM Employee WHERE dep_num = 2;
```

.sqltable[
|id|name|salary|taxes|dep_num
|-|-|
|2 | Jane Doe    | 800  | 100 | 2
|3 | John Smith  | 1200 | 350 | 2
]

---

# Union

```sql
SELECT * FROM Employee WHERE salary >= 1000;
UNION
SELECT * FROM Employee WHERE dep_num = 2;
```

.sqltable[
|id|name|salary|taxes|dep_num
|-|-|
|1 | John Doe    | 1000 | 200 | 1
|2 | Jane Doe    | 800  | 100 | 2
|3 | John Smith  | 1200 | 350 | 2
|4 | Jane Roe    | 1000 | 200 | 3
]

---

# Intersection

```sql
SELECT * FROM Employee WHERE salary >= 1000;
INTERSECT
SELECT * FROM Employee WHERE dep_num = 2;
```

---

# Difference

```sql
SELECT * FROM Employee WHERE salary >= 1000;
EXCEPT
SELECT * FROM Employee WHERE dep_num = 2;
```

.sqltable[
|id|name|salary|taxes|dep_num
|-|-|
|1 | John Doe    | 1000 | 200 | 1
|4 | Jane Roe    | 1000 | 200 | 3
]

---

---

# Cartesian product

* The cartesian product allows us to combine rows from **different tables**.
* To use it, we just have to indicate which tables we want to combine using commas to separate them.
* The result is a table containing the columns of all tables and **all possible combinations** of rows.
* Also known as **CROSS JOIN**.

---

# Example

```sql
SELECT * FROM Employee;
```

.smaller.sqltable[
|id|name|salary|taxes|dep_num
|-|-|
|1 | John Doe    | 1000 | 200 | 1
|2 | Jane Doe    | 800  | 100 | 2
|3 | John Smith  | 1200 | 350 | 2
|4 | Jane Roe    | 1000 | 200 | 3
|5 | Richard Roe | 900  | 0   | NULL
]

```sql
SELECT * FROM Department;
```

.smaller.sqltable[
|num|name
|-|-|
|1 | Marketing
|2 | Sales
|3 | Production
]

---

.small-code[
```sql
SELECT * FROM Employee, Department;
-- or
SELECT * FROM Employee CROSS JOIN Department;
```
]

.smaller.sqltable[
| id |     name    | salary | taxes | dep_num | num | name       |
|----|-------------|--------|-------|---------|-----|------------|
| 1  | John Doe    | 1000   | 200   | 1       | 1   | Marketing  |
| 2  | Jane Doe    | 800    | 100   | 2       | 1   | Marketing  |
| 3  | John Smith  | 1200   | 350   | 2       | 1   | Marketing  |
| 4  | Jane Roe    | 1000   | 200   | 3       | 1   | Marketing  |
| 5  | Richard Roe | 900    | 0     | NULL    | 1   | Marketing  |
| 1  | John Doe    | 1000   | 200   | 1       | 2   | Sales      |
| 2  | Jane Doe    | 800    | 100   | 2       | 2   | Sales      |
| 3  | John Smith  | 1200   | 350   | 2       | 2   | Sales      |
| 4  | Jane Roe    | 1000   | 200   | 3       | 2   | Sales      |
| 5  | Richard Roe | 900    | 0     | NULL    | 2   | Sales      |
| 1  | John Doe    | 1000   | 200   | 1       | 3   | Production |
| 2  | Jane Doe    | 800    | 100   | 2       | 3   | Production |
| 3  | John Smith  | 1200   | 350   | 2       | 3   | Production |
| 4  | Jane Roe    | 1000   | 200   | 3       | 3   | Production |
| 5  | Richard Roe | 900    | 0     | NULL    | 3   | Production |
]

---
template: top

# Joining using WHERE

```sql
SELECT * FROM Employee, Department WHERE dep_num = num;
```

.smaller.sqltable[
|  id   |      name       |  salary  |  taxes  | dep_num  | num  |      name
|-------|-----------------|----------|---------|----------|------|---------------
|   1   |   John Doe      |   1000   |   200   |   1      |  1   |   Marketing
| ~~2~~ | ~~Jane Doe~~    | ~~800~~  | ~~100~~ | ~~2~~    |~~1~~ | ~~Marketing~~
| ~~3~~ | ~~John Smith~~  | ~~1200~~ | ~~350~~ | ~~2~~    |~~1~~ | ~~Marketing~~
| ~~4~~ | ~~Jane Roe~~    | ~~1000~~ | ~~200~~ | ~~3~~    |~~1~~ | ~~Marketing~~
| ~~5~~ | ~~Richard Roe~~ | ~~900~~  | ~~0~~   | ~~NULL~~ |~~1~~ | ~~Marketing~~
| ~~1~~ | ~~John Doe~~    | ~~1000~~ | ~~200~~ | ~~1~~    |~~2~~ | ~~Sales~~
|   2   |   Jane Doe      |   800    |   100   |   2      |  2   |   Sales
|   3   |   John Smith    |   1200   |   350   |   2      |  2   |   Sales
| ~~4~~ | ~~Jane Roe~~    | ~~1000~~ | ~~200~~ | ~~3~~    |~~2~~ | ~~Sales~~
| ~~5~~ | ~~Richard Roe~~ | ~~900~~  | ~~0~~   | ~~NULL~~ |~~2~~ | ~~Sales~~
| ~~1~~ | ~~John Doe~~    | ~~1000~~ | ~~200~~ | ~~1~~    |~~3~~ | ~~Production~~
| ~~2~~ | ~~Jane Doe~~    | ~~800~~  | ~~100~~ | ~~2~~    |~~3~~ | ~~Production~~
| ~~3~~ | ~~John Smith~~  | ~~1200~~ | ~~350~~ | ~~2~~    |~~3~~ | ~~Production~~
|   4   |   Jane Roe      |   1000   |   200   |   3      |  3   |   Production
| ~~5~~ | ~~Richard Roe~~ | ~~900~~  | ~~0~~   | ~~NULL~~ |~~3~~ | ~~Production~~
]

---
template: top

# Joining using WHERE

```sql
SELECT * FROM Employee, Department WHERE dep_num = num;
```

.smaller.sqltable[
|id|name|salary|taxes|dep_num|num|name
|-|-|
|1 | John Doe    | 1000 | 200 | 1    |1 | Marketing
|2 | Jane Doe    | 800  | 100 | 2    |2 | Sales
|3 | John Smith  | 1200 | 350 | 2    |2 | Sales
|4 | Jane Roe    | 1000 | 200 | 3    |3 | Production
]

---
template: top
name: ambiguities

```sql
SELECT * FROM Employee, Department WHERE dep_num = num;
```

When selecting from more than one table, columns with the same name might lead to ambiguities.
.flex[
```sql
SELECT id, name, name FROM Employee, Department WHERE dep_num = num;
```
&#x274C;
]

---
template: ambiguities

To solve them we must use the table name before the column name:

.flex[
```sql
SELECT id, Employee.name, Department.name 
FROM Employee, Department
WHERE dep_num = num;
```

---

# Join using JOIN ... ON

* Instead of using a cartesian product followed by the WHERE keyword, we can use the more specific keywords: **JOIN ON**.
* These keywords allow us to specify simultaneously **which tables** to join and with which joining **condition**.
* Separates regular row filtering from joining conditions.
* Makes joining lots of tables **easier** to understand.
* These are also called *inner joins*.

```sql
SELECT * FROM Employee, Department WHERE dep_num = num;
```

Same as:

```sql
SELECT * FROM Employee JOIN Department ON dep_num = num;
```

---

# Join using JOIN ... ON

```sql
SELECT * FROM Employee JOIN Department ON dep_num = num;
```

---

# Join using JOIN ... USING

If the columns used in the join operation have the same name, we can join them using in a simpler way with JOIN USING.

```sql
SELECT * FROM Employee;
```

.smaller.sqltable[
|id|name|salary|taxes|num
|-|-|
|1 | John Doe | 1000 | 200 | 1
|2 | Jane Doe | 800 | 100 | 2
|3 | John Smith | 1200 | 350 | 2
|4 | Jane Roe | 1000 | 200 | 3
|5 | Richard Roe | 900 | 0 | NULL
]

```sql
SELECT * FROM Department;
```

.smaller.sqltable[
|num|name
|-|-|
|1 | Marketing
|2 | Sales
|3 | Production
]

---

# Join using JOIN ... USING

```sql
SELECT * FROM Employee JOIN Department USING(num);
```

.smaller.sqltable[
|id|name|salary|taxes|num|name
|-|-|
|1 | John Doe | 1000 | 200 | 1 | Marketing
|2 | Jane Doe | 800 | 100 | 2 | Sales
|3 | John Smith | 1200 | 350 | 2 | Sales
|4 | Jane Roe | 1000 | 200 | 3 | Production
]

---

# Join LEFT, RIGHT and FULL

* Sometimes, when joining tables, some of the rows are left out as they do not match any rows on the other table.
* If we want these rows to be present in the result, we must use what is called and *outer join*.
* There are three types of *outer joins*:
  * LEFT - Rows on the **left** table that do not match any row in the right table are kept.
  * RIGHT - Rows on the **right** table that do not match any row in the left table are kept.
  * FULL - Rows on any of the tables that do not match any row in the other table are kept.

---

# JOIN vs LEFT JOIN

```sql
SELECT * FROM Employee JOIN Department USING(num);
```

.smaller.sqltable[
|id|name|salary|taxes|num|name
|-|-|
|1 | John Doe    | 1000 | 200 | 1 | Marketing
|2 | Jane Doe    | 800  | 100 | 2 | Sales
|3 | John Smith  | 1200 | 350 | 2 | Sales
|4 | Jane Roe    | 1000 | 200 | 3 | Production
]

```sql
SELECT * FROM Employee LEFT JOIN Department USING(num);
```

.smaller.sqltable[
|id|name|salary|taxes|num|name
|-|-|
|1 | John Doe    | 1000 | 200 | 1    | Marketing
|2 | Jane Doe    | 800  | 100 | 2    | Sales
|3 | John Smith  | 1200 | 350 | 2    | Sales
|4 | Jane Roe    | 1000 | 200 | 3    | Production
|**5** | **Richard Roe** | **900**  | **0**   | **NULL** | **NULL**
]

---

# Renaming tables

* We can also rename tables.
* This might be useful if we need to use the **same table twice** in the same query.
* Or if we want to make the table names **simpler** in a complicated query.

&nbsp;

For example, say we have the following table of employees:

.smaller.sqltable[
|id|name|sup_id
|-|-|
|1 | John Doe    | NULL
|2 | Jane Doe    | 1
|3 | John Smith  | 1
|4 | Jane Roe    | NULL
|5 | Richard Roe | 4
]

And we want to know the name of the supervisor of each employee.

---

# Renaming tables

* We can join the table with itself in order to obtain the names of the supervisors.
* But, when using a table twice in the same query, we must **use a different name for each instance of the table**:

```sql
  SELECT *
  FROM Employee JOIN Employee AS Supervisor 
    ON Employee.sup_id = Supervisor.id;
```

.smaller.sqltable[
|id|name|sup_id|id|name|sup_id
|-|-|
|2 | Jane Doe    | 1 |1 | John Doe    | NULL
|3 | John Smith  | 1 |1 | John Doe    | NULL
|5 | Richard Roe | 4 |4 | Jane Roe    | NULL
]

---

---

# Aggregate Functions

There are **five** special aggregate functions defined in the SQL language:

```sql
MIN, MAX, SUM, AVG and COUNT
```

.small-code[
```sql
SELECT MIN(val) FROM MyTable;            -- 1
SELECT MAX(val) FROM MyTable;            -- 3
SELECT SUM(val) FROM MyTable;            -- 8
SELECT AVG(val) FROM MyTable;            -- 2
SELECT COUNT(val) FROM MyTable;          -- 4 (counts non null values)
SELECT COUNT(DISTINCT val) FROM MyTable; -- 3 (counts distinct non null values)
SELECT COUNT(*) FROM MyTable;            -- 5 (counts lines)
```
]

---

# Aggregate Functions

When using aggregate functions, all rows are grouped into a single row.

.sqltable[
|id|name|salary|taxes|num|d_name
|-|-|
|1 | John Doe    | 700 | 200 | 1 | Marketing
|2 | Jane Doe    | 800  | 100 | 2 | Sales
|3 | John Smith  | 1500 | 350 | 2 | Sales
|4 | Jane Roe    | 1000 | 200 | 3 | Production
]

```sql
SELECT AVG(salary) FROM Employee;       -- 1000
SELECT name, AVG(salary) FROM Employee; -- Does not work
```

---

# GROUP BY

Groups the rows into sets based on the value of a specific column or columns.

.smaller.sqltable[
|id|name|salary|taxes|num|d_name
|-|-|
|1 | John Doe    | 700  | 200 | 1 | Marketing
|2 | Jane Doe    | 800  | 100 | 2 | Sales
|3 | John Smith  | 1500 | 350 | 2 | Sales
|4 | Jane Roe    | 1000 | 200 | 3 | Production
]

```sql
SELECT AVG(salary) FROM Employee GROUP BY num;
```

* All rows with same value in the *num* column get grouped together.
* The aggregate functions are used inside each group.

---

# GROUP BY

Only columns used in **GROUP BY** expressions can be selected without using an aggregate function.

```sql
SELECT num, AVG(salary) FROM Employee GROUP BY num;
```

---

# GROUP BY

Only columns used in **GROUP BY** expressions can be selected without using an aggregate function.

```sql
SELECT num, d_name, AVG(salary) FROM Employee GROUP BY num, d_name;
```

---

# HAVING

Grouped rows can be filtered using the **HAVING** clause.

```sql
SELECT num, d_name, AVG(salary)
FROM Employee
GROUP BY num, d_name
HAVING AVG(salary) >= 1000;
```

---

---

# ORDER BY

* The order in which rows are sorted in a query result is unpredictable.
* You can sort the end result using the **ORDER BY** clause.

```sql
SELECT * FROM Employee ORDER BY salary;
```

.sqltable[
|id|name|salary|taxes|num|d_name
|-|-|
|1 | John Doe    | **700**  | 200 | 1 | Marketing
|2 | Jane Doe    | **800**  | 100 | 2 | Sales
|4 | Jane Roe    | **1000** | 200 | 3 | Production
|3 | John Smith  | **1500** | 350 | 2 | Sales
]

---

# ORDER BY

* By default values are sorted in **ascending** order.
* We can change the default order using the **ASC** and **DESC** clauses.

```sql
SELECT * FROM Employee ORDER BY salary ASC; -- default
```

.smaller.sqltable[
|id|name|salary|taxes|num|d_name
|-|-|
|1 | John Doe    | **700**  | 200 | 1 | Marketing
|2 | Jane Doe    | **800**  | 100 | 2 | Sales
|4 | Jane Roe    | **1000** | 200 | 3 | Production
|3 | John Smith  | **1500** | 350 | 2 | Sales
]

```sql
SELECT * FROM Employee ORDER BY salary DESC;
```

.smaller.sqltable[
|id|name|salary|taxes|num|d_name
|-|-|
|3 | John Smith  | **1500** | 350 | 2 | Sales
|4 | Jane Roe    | **1000** | 200 | 3 | Production
|2 | Jane Doe    | **800**  | 100 | 2 | Sales
|1 | John Doe    | **700**  | 200 | 1 | Marketing
]

---

# ORDER BY

* We can also sort by more than one column.
* When this happens, we first sort using the first column and if there is a tie we use the next column.

```sql
SELECT * FROM Employee ORDER BY taxes DESC, salary ASC;
SELECT * FROM Employee ORDER BY taxes DESC, salary;     -- equivalent
```

.sqltable[
|id|name|salary|taxes|num|d_name
|-|-|
|3 | John Smith  | **1500** | 350 | 2 | Sales
|1 | John Doe    | **700**  | 200 | 1 | Marketing
|4 | Jane Roe    | **1000** | 200 | 3 | Production
|2 | Jane Doe    | **800**  | 100 | 2 | Sales
]

---

---

# LIMIT

The number of rows returned by a query can be limited using the **LIMIT** clause.

```sql
SELECT * FROM Employee ORDER BY salary DESC LIMIT 2;
```

.sqltable[
|id|name|salary|taxes|num|d_name
|-|-|
|3 | John Smith  | 1500 | 350 | 2 | Sales
|4 | Jane Roe    | 1000 | 200 | 3 | Production
]

---

# OFFSET

When using the **LIMIT** clause, we can also indicate how many rows to skip using the **OFFSET** clause.

```sql
SELECT * FROM Employee ORDER BY salary DESC LIMIT 2 OFFSET 1;
```

.sqltable[
|id|name|salary|taxes|num|d_name
|-|-|
|4 | Jane Roe    | 1000 | 200 | 3 | Production
|2 | Jane Doe    | 800  | 100 | 2 | Sales
]

---

# Pagination

**LIMIT** and **OFFSET** can be used to paginate our results.

```sql
SELECT * FROM Employee ORDER BY id LIMIT 10 OFFSET 20;
```

```sql
SELECT * FROM Employee ORDER BY id LIMIT R OFFSET R * (N - 1);
```

---

---

# A Complete Query

* A query will always start with the **SELECT** clause.
* All the other clauses are optional.
* But they always follow the same order.

```sql
SELECT c1, c2, SUM(c3), AVG(c4) AS c5
FROM T1
  JOIN T2 ON T1.foreign_key = T2.primary_key
  JOIN T3 USING (join_column)
WHERE condition_1 AND (condition_2 OR condition_3)
GROUP BY c1, c2
HAVING c5 > 10
ORDER BY c5
LIMIT 5 OFFSET 10;
```

---

---

# LIKE

The **LIKE** operator can be used to compare strings using simple patterns.

* The **%** particle means zero or more characters
* The **_** character means exactly one character.

.smaller.sqltable[
|id|name
|-|-|
|1 | John Doe
|2 | Jane Doe
|3 | Jean Doe
|4 | Jennifer Doe
|5 | William Doe
]

.small-code[
```sql
SELECT * FROM People WHERE name LIKE '% Doe'    -- All five rows
SELECT * FROM People WHERE name LIKE 'J% Doe'   -- John, Jane, Jean and Jennifer
SELECT * FROM People WHERE name LIKE 'J___ Doe' -- John, Jane and Jean
SELECT * FROM People WHERE name LIKE '_e%'      -- Jean and Jennifer
```
]

.box_info[In SQLite, LIKE is case-insensitive. To make it case-sensitive, use: &nbsp; `PRAGMA case_sensitive_like = ON;`]

---

---

# Queries as Tables

**The result of a query is a table**. This allows us to use queries in the same
places we use tables.

```sql
  SELECT *
  FROM (
      SELECT *
      FROM Employee JOIN Department USING(d_num)
      WHERE Department.name = 'Logistics'
  ) AS logistic_employees JOIN Works ON logistic_employees.id = Works.id
  WHERE Works.hours > 10;
```

.box_info[In the SQL standard, *subqueries* must be named using the **AS** particle. In SQLite this rule is not enforced.]

---

# Subquery Expressions

Subquery expressions can be used to combine two queries in the **WHERE** or
**HAVING** clauses.

There are several subquery expressions:

```sql
expression IN subquery
expression NOT IN subquery
```

---

# IN and NOT IN

```sql
expression IN subquery
expression NOT IN subquery
```

These expressions test if a value (or values) exist (or not) in the subquery.

```sql
 SELECT * -- Employees whose id exists in the subquery
 FROM Employee
 WHERE Employee.id IN (
 SELECT emp_id -- The ids of employees that have worked
 FROM Works -- more than 10 hours in projects
 GROUP BY emp_id
 HAVING SUM(hours) > 10
 );
```</code></pre></figure>