I’d like to work an example of using SQL WITH Common Table Expressions to produce more legible

SQL.

A major complaint with SQL is that it composes statements by right-ward nesting.

That is: a sequence of operations `A -> B -> C`

is represented as `SELECT C FROM SELECT B FROM SELECT A`

. However, the SQL 99 standard introduced the `WITH`

statement and common table expressions (ref). `WITH`

statements allow forward composition.

Let’s take a look at asking the `data_algebra`

(ref) to emit SQL with and without common table expressions.

First we set up some example data.

```
import sqlite3
from data_algebra.data_ops import *
import data_algebra.test_util
import data_algebra.SQLite
d = data_algebra.default_data_model.pd.DataFrame({
'x': [1, 2, 3]
})
d
```

x | |
---|---|

0 | 1 |

1 | 2 |

2 | 3 |

Next we set up our calculations. Please note the order they are performed.

```
ops = describe_table(d, table_name='d') .\
extend({'z': 'x + 1'}) .\
extend({'q': 'z + 2'}) .\
extend({'h': 'q + 3'})
ops
```

TableDescription( table_name='d', column_names=[ 'x']) .\ extend({ 'z': 'x + 1'}) .\ extend({ 'q': 'z + 2'}) .\ extend({ 'h': 'q + 3'})

```
res_pandas = ops.transform(d)
res_pandas
```

x | z | q | h | |
---|---|---|---|---|

0 | 1 | 2 | 4 | 7 |

1 | 2 | 3 | 5 | 8 |

2 | 3 | 4 | 6 | 9 |

```
expect = data_algebra.default_data_model.pd.DataFrame({
'x': [1, 2, 3],
'z': [2, 3, 4],
'q': [4, 5, 6],
'h': [7, 8, 9]
})
assert data_algebra.test_util.equivalent_frames(res_pandas, expect)
```

```
db_model = data_algebra.SQLite.SQLiteModel()
with sqlite3.connect(":memory:") as conn:
db_model.prepare_connection(conn)
db_handle = db_model.db_handle(conn)
db_handle.insert_table(d, table_name='d')
sql_regular = db_handle.to_sql(ops, pretty=True, use_with=False, annotate=True)
res_regular = db_handle.read_query(sql_regular)
sql_with = db_handle.to_sql(ops, pretty=True, use_with=True, annotate=True)
res_with = db_handle.read_query(sql_with)
assert data_algebra.test_util.equivalent_frames(res_regular, expect)
assert data_algebra.test_util.equivalent_frames(res_with, expect)
```

The standard nested SQL for these operations looks like the following.

```
print(sql_regular)
```

SELECT -- extend({ 'h': 'q + 3'}) "x", "z", "q", "q" + 3 AS "h" FROM (SELECT -- extend({ 'q': 'z + 2'}) "x", "z", "z" + 2 AS "q" FROM (SELECT -- extend({ 'z': 'x + 1'}) "x", "x" + 1 AS "z" FROM "d") "extend_0") "extend_1"

Notice the variables are generated in reverse order (h, q, z instead lf z, q, h).

The common table expression version looks like this, which involves less nesting and values move forward notation.

```
print(sql_with)
```

WITH "extend_0" AS (SELECT -- extend({ 'z': 'x + 1'}) "x", "x" + 1 AS "z" FROM "d"), "extend_1" AS (SELECT -- extend({ 'q': 'z + 2'}) "x", "z", "z" + 2 AS "q" FROM "extend_0") SELECT -- extend({ 'h': 'q + 3'}) "x", "z", "q", "q" + 3 AS "h" FROM "extend_1"

It is interesting to note when `WITH`

or common table expressions became widely available. The Wikipedia has the versions (and hence dates) (ref) showing when common table expressions are supported in the following

databases.

- Teradata (starting with version 14) (2012)
- Microsoft SQL Server (starting with version 2005)
- Oracle (with recursion since 11g release 2) (2009)
- PostgreSQL (since 8.4) (2009)
- MariaDB (since 10.2) (2017)
- MySQL (since 8.0) (2016)
- SQLite (since 3.8.3) (2014)
- DB2 (starting with version 11.5 Mod Pack 2 (ref) (2019)

Some of the cost of implementing common table expressions, is they are where databases allow recursive or fixed-point semantic extensions. From the database point of view these are major semantic changes, not mere notational conveniences.

Categories: data science Opinion Programming Tutorials