mirror of https://github.com/treffynnon/sqlstyle.guide.git synced 2025-03-09 12:49:51 -05:00

First draft of query style

DE-10835
apurvis@lumoslabs.com 2016-04-07 10:39:04 -04:00
parent a67f5fe8f1
commit 0495c287af
2 changed files with 84 additions and 82 deletions


@ -1,6 +1,6 @@
# SQL style guide
**[Read the guide](http://www.sqlstyle.guide)**
**[Read the guide](includes/sqlstyle.guide.md)**
---


@ -2,23 +2,14 @@
## Overview
You can use this set of guidelines, [fork them][fork] or make your own - the
key here is that you pick a style and stick to it. To suggest changes
or fix bugs please open an [issue][] or [pull request][pull] on GitHub.
These are guidelines to help you write SQL queries that will be easier to read.
These guidelines are designed to be compatible with Joe Celko's [SQL Programming
Style][celko] book to make adoption for teams who have already read that book
easier. This guide is a little more opinionated in some areas and in others a
little more relaxed. It is certainly more succinct where [Celko's book][celko]
contains anecdotes and reasoning behind each rule as thoughtful prose.
Remember that even if you hate a given style at first, generally speaking it is
far more important that we have _any_ agreed upon style than that we all like it.
It is easy to include this guide in [Markdown format][dl-md] as a part of a
project's code base or reference it here for anyone on the project to freely
read—much harder with a physical book.
SQL style guide by [Simon Holywell][simon] is licensed under a [Creative Commons
Attribution-ShareAlike 4.0 International License][licence].
Based on a work at [http://www.sqlstyle.guide][self].
**Queries submitted to the Data Engineering or Data Science teams _must_ follow
the style guide.** Reading queries is tough enough already without figuring out
that you prefer a different indentation style.
## General
@ -45,21 +36,6 @@ Based on a work at [http://www.sqlstyle.guide][self].
* Quoted identifiers—if you must use them then stick to SQL92 double quotes for
portability (you may need to configure your SQL server to support this depending
on vendor).
* Object oriented design principles should not be applied to SQL or database
structures.
```sql
SELECT file_hash -- stored ssdeep hash
FROM file_system
WHERE file_name = '.vimrc';
```
```sql
/* Updating the file record after writing to the file */
UPDATE file_system
SET file_modified_date = '1980-02-22 13:19:01.00000',
file_size = 209732
WHERE file_name = '.vimrc';
```
## Naming conventions
@ -67,6 +43,8 @@ UPDATE file_system
* Ensure the name is unique and does not exist as a
[reserved keyword][reserved-keywords].
* Avoid abbreviations and if you have to use them make sure they are commonly
understood.
* Keep the length to a maximum of 30 bytes—in practice this is 30 characters
unless you are using a multi-byte character set.
* Names must begin with a letter and may not end with an underscore.
@ -74,12 +52,10 @@ UPDATE file_system
* Avoid the use of multiple consecutive underscores—these can be hard to read.
* Use underscores where you would naturally include a space in the name (first
name becomes `first_name`).
* Avoid abbreviations and if you have to use them make sure they are commonly
understood.
```sql
SELECT first_name
FROM staff;
FROM staff;
```
### Tables
@ -111,13 +87,13 @@ SELECT first_name
```sql
SELECT first_name AS fn
FROM staff AS s1
JOIN students AS s2
ON s2.mentor_id = s1.staff_num;
FROM staff AS s1
JOIN students AS s2
ON s2.mentor_id = s1.staff_num;
```
```sql
SELECT SUM(s.monitor_tally) AS monitor_total
FROM staff AS s;
FROM staff AS s;
```
### Stored procedures
@ -132,13 +108,14 @@ The following suffixes have a universal meaning ensuring the columns can be read
and understood easily from SQL code. Use the correct suffix where appropriate.
* `_id`—a unique identifier such as a column that is a primary key.
* `_at`-denotes a column that contains the time of something.
* `_date`—denotes a column that contains the date of something.
* `_status`—flag value or some other status of any type such as
`publication_status`.
* `_total`—the total or sum of a collection of values.
* `_num`—denotes the field contains any kind of number.
* `_name`—signifies a name such as `first_name`.
* `_seq`—contains a contiguous sequence of values.
* `_date`—denotes a column that contains the date of something.
* `_tally`—a count.
* `_size`—the size of something such as a file size or clothing.
* `_addr`—an address for the record could be physical or intangible such as
@ -159,8 +136,8 @@ exists performing the same function. This helps to make code more portable.
```sql
SELECT model_num
FROM phones AS p
WHERE p.release_date > '2014-09-30';
FROM phones AS p
WHERE p.release_date > '2014-09-30';
```
### White space
@ -170,22 +147,6 @@ spacing is used. Do not crowd code or remove natural language spaces.
#### Spaces
Spaces should be used to line up the code so that the root keywords all end on
the same character boundary. This forms a river down the middle making it easy for
the readers eye to scan over the code and separate the keywords from the
implementation detail. Rivers are [bad in typography][rivers], but helpful here.
```sql
SELECT f.average_height, f.average_diameter
FROM flora AS f
WHERE f.species_name = 'Banksia'
OR f.species_name = 'Sheoak'
OR f.species_name = 'Wattle';
```
Notice that `SELECT`, `FROM`, etc. are all right aligned while the actual column
names and implementation specific details are left aligned.
Although not exhaustive always include spaces:
* before and after equals (`=`)
@ -195,9 +156,9 @@ Although not exhaustive always include spaces:
```sql
SELECT a.title, a.release_date, a.recording_date
FROM albums AS a
WHERE a.title = 'Charcoal Lane'
OR a.title = 'The New Danger';
FROM albums AS a
WHERE a.title = 'Charcoal Lane'
OR a.title = 'The New Danger';
```
#### Line spacing
@ -205,6 +166,7 @@ SELECT a.title, a.release_date, a.recording_date
Always include newlines/vertical space:
* before `AND` or `OR`
* after `WITH` subqueries
* after semicolons to separate queries for easier reading
* after each keyword definition
* after a comma when separating multiple columns into logical groups
@ -230,9 +192,9 @@ UPDATE albums
```sql
SELECT a.title,
a.release_date, a.recording_date, a.production_date -- grouped dates together
FROM albums AS a
WHERE a.title = 'Charcoal Lane'
OR a.title = 'The New Danger';
FROM albums AS a
WHERE a.title = 'Charcoal Lane'
OR a.title = 'The New Danger';
```
### Indentation
@ -240,21 +202,61 @@ SELECT a.title,
To ensure that SQL is readable it is important that standards of indentation
are followed.
**ONLY** the fundamental keywords - `SELECT`, `FROM`, `WHERE`, `GROUP BY`, `HAVING`, `LIMIT`,
and `ORDER BY` - should be fully left justified. Everything else should be indented to the end of
the keyword that starts its clause.
```sql
SELECT first_name,
last_name,
is_still_tippin_on_four_fours,
is_still_wrapped_in_four_vogues
FROM rappers
WHERE first_name = 'Mike'
AND last_name = 'Jones'
```
This allows the reader to quickly scan for the important building blocks of the query.
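
The example above only exercises `SELECT`, `FROM` and `WHERE`. As a rough sketch of how the same rule might extend to the remaining fundamental keywords, the query below keeps `GROUP BY`, `HAVING`, `ORDER BY` and `LIMIT` flush left as well. The table and column names are hypothetical, and the exact column at which continuation lines start is an assumption read from "indented to the end of that keyword".

```sql
-- Hypothetical table and columns, for illustration only.
-- Each fundamental keyword starts at the left margin; continuation
-- lines are assumed to start one space past the end of that keyword.
SELECT crew_name,
       COUNT(*) AS rider_tally
FROM riders
WHERE bike_lane = 2
      AND is_active = 'Y'
GROUP BY crew_name
HAVING COUNT(*) > 1
ORDER BY rider_tally DESC
LIMIT 10;
```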
#### Joins
Joins should be indented to the other side of the river and grouped with a new
line where necessary.
Single line `JOIN`s are fine for simple situations:
```sql
SELECT r.last_name
FROM riders AS r
INNER JOIN bikes AS b
ON r.bike_vin_num = b.vin_num
AND b.engines > 2
FROM riders AS r
INNER JOIN bikes b ON r.bike_vin_num = b.vin_num
INNER JOIN crew c ON r.crew_chief_last_name = c.last_name
```
INNER JOIN crew AS c
ON r.crew_chief_last_name = c.last_name
AND c.chief = 'Y';
Multi-line `JOIN`s should be indented the same as base keywords:
```sql
SELECT r.last_name
FROM riders AS r
INNER JOIN bikes b
ON r.bike_vin_num = b.vin_num
AND r.bike_lane = r.lane
```
#### WITH statements (postgres only)
Indent the body of each `WITH` subquery until its closing parenthesis.
```sql
WITH my_tmp_table AS (
SELECT r.last_name
FROM riders AS r
INNER JOIN bikes b
ON r.bike_vin_num = b.vin_num
AND r.bike_lane = r.lane
)
SELECT *
FROM my_tmp_table
```
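
When a query needs more than one `WITH` subquery, the same indentation can be combined with the line spacing rule that asks for vertical space after `WITH` subqueries. The following is only a sketch: the subquery names and the `lap_time_num` column are hypothetical, and the four-space indent inside the parentheses is an assumption, since the guide does not fix a width.

```sql
-- Hypothetical names; the four-space inner indent is an assumption.
WITH fast_riders AS (
    SELECT r.last_name,
           r.bike_vin_num
    FROM riders AS r
    WHERE r.lap_time_num < 90
),

big_bikes AS (
    SELECT b.vin_num
    FROM bikes AS b
    WHERE b.engines > 2
)

SELECT f.last_name
FROM fast_riders AS f
INNER JOIN big_bikes AS bb ON f.bike_vin_num = bb.vin_num;
```

The blank line after each closing parenthesis keeps every subquery visually separate from the final `SELECT`.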
#### Sub-queries
@ -270,12 +272,12 @@ SELECT r.last_name,
FROM champions AS c
WHERE c.last_name = r.last_name
AND c.confirmed = 'Y') AS last_championship_year
FROM riders AS r
WHERE r.last_name IN
(SELECT c.last_name
FROM champions AS c
WHERE YEAR(championship_date) > '2008'
AND c.confirmed = 'Y');
FROM riders AS r
WHERE r.last_name IN
(SELECT c.last_name
FROM champions AS c
WHERE YEAR(championship_date) > '2008'
AND c.confirmed = 'Y');
```
### Preferred formalisms
@ -294,10 +296,10 @@ SELECT CASE postcode
WHEN 'BN1' THEN 'Brighton'
WHEN 'EH1' THEN 'Edinburgh'
END AS city
FROM office_locations
WHERE country = 'United Kingdom'
AND opening_time BETWEEN 8 AND 9
AND postcode IN ('EH1', 'BN1', 'NN1', 'KW1')
FROM office_locations
WHERE country = 'United Kingdom'
AND opening_time BETWEEN 8 AND 9
AND postcode IN ('EH1', 'BN1', 'NN1', 'KW1')
```
## Create syntax
@ -331,7 +333,7 @@ about though so it is important that a standard set of guidelines are followed.
#### Choosing keys
Deciding the column(s) that will form the keys in the definition should be a
Deciding the column(s) that will form the keys in the definition should be a
carefully considered activity as it will affect performance and data integrity.
1. The key should be unique to some degree.
@ -1264,7 +1266,7 @@ ZONE
[rivers]: http://practicaltypography.com/one-space-between-sentences.html
"Practical Typography: one space between sentences"
[reserved-keywords]: #reserved-keyword-reference
"Reserved keyword reference"
"Reserved keyword reference"
[eav]: https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
    "Wikipedia: Entity–attribute–value model"
[self]: http://www.sqlstyle.guide