First draft of query style

DE-10835
2025-03-09 12:49:51 -05:00 · 2016-04-07 10:39:04 -04:00 · 2016-04-07 10:39:04 -04:00 · 0495c287af
commit 0495c287af
parent a67f5fe8f1
2 changed files with 84 additions and 82 deletions
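
Reviewer sketch (not part of the commit): the Indentation hunk below requires that the fundamental keywords `SELECT`, `FROM`, `WHERE`, `GROUP BY`, `HAVING`, `LIMIT`, and `ORDER BY` be fully left justified. A rough way to check that rule mechanically might look like the following. The function name `indented_fundamental_keywords` is hypothetical, and the check assumes that anything inside parentheses (a `WITH` body or a sub-query) may be indented, which is how the guide's own examples read. It does not understand string literals or comments.

```python
import re

# Keywords the guide calls "fundamental"; at the top level they should start in column 0.
FUNDAMENTAL = ("SELECT", "FROM", "WHERE", "GROUP BY", "HAVING", "LIMIT", "ORDER BY")
_LEADING_KEYWORD = re.compile(r"^(\s*)(%s)\b" % "|".join(FUNDAMENTAL), re.IGNORECASE)


def indented_fundamental_keywords(query):
    """Return (line_number, keyword) for each top-level fundamental keyword
    that begins a line but is not flush with the left margin."""
    problems = []
    depth = 0  # parenthesis nesting; > 0 means inside a WITH body or sub-query
    for number, line in enumerate(query.splitlines(), start=1):
        match = _LEADING_KEYWORD.match(line)
        if match and depth == 0 and match.group(1):
            problems.append((number, match.group(2).upper()))
        # Naive assumption: parentheses inside string literals or comments are not handled.
        depth += line.count("(") - line.count(")")
    return problems


# The old "river" layout removed by this commit is flagged ...
old_style = "SELECT model_num\n  FROM phones AS p\n WHERE p.release_date > '2014-09-30';"
print(indented_fundamental_keywords(old_style))  # [(2, 'FROM'), (3, 'WHERE')]

# ... while the new left-justified layout passes.
new_style = "SELECT model_num\nFROM phones AS p\nWHERE p.release_date > '2014-09-30';"
print(indented_fundamental_keywords(new_style))  # []
```

Tracking parenthesis depth keeps the sketch from flagging the indented `SELECT` and `FROM` inside a `WITH` block; a real linter would need a proper SQL tokenizer.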
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # SQL style guide

-**[Read the guide](http://www.sqlstyle.guide)**
+**[Read the guide](_includes/sqlstyle.guide.md)**

 ---

--- a/_includes/sqlstyle.guide.md
+++ b/_includes/sqlstyle.guide.md
@@ -2,23 +2,14 @@

 ## Overview

-You can use this set of guidelines, [fork them][fork] or make your own - the
-key here is that you pick a style and stick to it. To suggest changes
-or fix bugs please open an [issue][] or [pull request][pull] on GitHub.
+These are guidelines to help you write SQL queries that will be easier to read.

-These guidelines are designed to be compatible with Joe Celko's [SQL Programming
-Style][celko] book to make adoption for teams who have already read that book
-easier. This guide is a little more opinionated in some areas and in others a
-little more relaxed. It is certainly more succinct where [Celko's book][celko]
-contains anecdotes and reasoning behind each rule as thoughtful prose.
+Remember that even if you hate a given style at first, generally speaking it is
+far more important that we have _any_ agreed-upon style than that we all like it.

-It is easy to include this guide in [Markdown format][dl-md] as a part of a
-project's code base or reference it here for anyone on the project to freely
-read—much harder with a physical book.
-
-SQL style guide by [Simon Holywell][simon] is licensed under a [Creative Commons
-Attribution-ShareAlike 4.0 International License][licence].
-Based on a work at [http://www.sqlstyle.guide][self].
+**Queries submitted to the Data Engineering or Data Science teams _must_ follow
+the style guide.**  Reading queries is tough enough already without figuring out
+that you prefer a different indentation style.

 ## General

@@ -45,21 +36,6 @@ Based on a work at [http://www.sqlstyle.guide][self].
 * Quoted identifiers—if you must use them then stick to SQL92 double quotes for
  portability (you may need to configure your SQL server to support this depending
  on vendor).
-* Object oriented design principles should not be applied to SQL or database
-  structures.
-
-```sql
-SELECT file_hash  -- stored ssdeep hash
-  FROM file_system
- WHERE file_name = '.vimrc';
-```
-```sql
-/* Updating the file record after writing to the file */
-UPDATE file_system
-   SET file_modified_date = '1980-02-22 13:19:01.00000',
-       file_size = 209732
- WHERE file_name = '.vimrc';
-```

 ## Naming conventions

@@ -67,6 +43,8 @@ UPDATE file_system

 * Ensure the name is unique and does not exist as a
  [reserved keyword][reserved-keywords].
+* Avoid abbreviations and if you have to use them make sure they are commonly
+  understood.
 * Keep the length to a maximum of 30 bytes—in practice this is 30 characters
  unless you are using multi-byte character set.
 * Names must begin with a letter and may not end with an underscore.
@@ -74,12 +52,10 @@ UPDATE file_system
 * Avoid the use of multiple consecutive underscores—these can be hard to read.
 * Use underscores where you would naturally include a space in the name (first
  name becomes `first_name`).
-* Avoid abbreviations and if you have to use them make sure they are commonly
-  understood.

 ```sql
 SELECT first_name
-  FROM staff;
+FROM staff;
 ```

 ### Tables
@@ -111,13 +87,13 @@ SELECT first_name

 ```sql
 SELECT first_name AS fn
-  FROM staff AS s1
-  JOIN students AS s2
-    ON s2.mentor_id = s1.staff_num;
+FROM staff AS s1
+JOIN students AS s2
+  ON s2.mentor_id = s1.staff_num;
 ```
 ```sql
 SELECT SUM(s.monitor_tally) AS monitor_total
-  FROM staff AS s;
+FROM staff AS s;
 ```

 ### Stored procedures
@@ -132,13 +108,14 @@ The following suffixes have a universal meaning ensuring the columns can be read
 and understood easily from SQL code. Use the correct suffix where appropriate.

 * `_id`—a unique identifier such as a column that is a primary key.
+* `_at`-denotes a column that contains the time of something.
+* `_date`—denotes a column that contains the date of something.
 * `_status`—flag value or some other status of any type such as
  `publication_status`.
 * `_total`—the total or sum of a collection of values.
 * `_num`—denotes the field contains any kind of number.
 * `_name`—signifies a name such as `first_name`.
 * `_seq`—contains a contiguous sequence of values.
-* `_date`—denotes a column that contains the date of something.
 * `_tally`—a count.
 * `_size`—the size of something such as a file size or clothing.
 * `_addr`—an address for the record could be physical or intangible such as
@@ -159,8 +136,8 @@ exists performing the same function. This helps to make code more portable.

 ```sql
 SELECT model_num
-  FROM phones AS p
- WHERE p.release_date > '2014-09-30';
+FROM phones AS p
+WHERE p.release_date > '2014-09-30';
 ```

 ### White space
@@ -170,22 +147,6 @@ spacing is used. Do not crowd code or remove natural language spaces.

 #### Spaces

-Spaces should be used to line up the code so that the root keywords all end on
-the same character boundary. This forms a river down the middle making it easy for
-the readers eye to scan over the code and separate the keywords from the
-implementation detail. Rivers are [bad in typography][rivers], but helpful here.
-
-```sql
-SELECT f.average_height, f.average_diameter
-  FROM flora AS f
- WHERE f.species_name = 'Banksia'
-    OR f.species_name = 'Sheoak'
-    OR f.species_name = 'Wattle';
-```
-
-Notice that `SELECT`, `FROM`, etc. are all right aligned while the actual column
-names and implementation specific details are left aligned.
-
 Although not exhaustive always include spaces:

 * before and after equals (`=`)
@@ -195,9 +156,9 @@ Although not exhaustive always include spaces:

 ```sql
 SELECT a.title, a.release_date, a.recording_date
-  FROM albums AS a
- WHERE a.title = 'Charcoal Lane'
-    OR a.title = 'The New Danger';
+FROM albums AS a
+WHERE a.title = 'Charcoal Lane'
+   OR a.title = 'The New Danger';
 ```

 #### Line spacing
@@ -205,6 +166,7 @@ SELECT a.title, a.release_date, a.recording_date
 Always include newlines/vertical space:

 * before `AND` or `OR`
+* after `WITH` subqueries
 * after semicolons to separate queries for easier reading
 * after each keyword definition
 * after a comma when separating multiple columns into logical groups
@@ -230,9 +192,9 @@ UPDATE albums
 ```sql
 SELECT a.title,
       a.release_date, a.recording_date, a.production_date -- grouped dates together
-  FROM albums AS a
- WHERE a.title = 'Charcoal Lane'
-    OR a.title = 'The New Danger';
+FROM albums AS a
+WHERE a.title = 'Charcoal Lane'
+   OR a.title = 'The New Danger';
 ```

 ### Indentation
@@ -240,21 +202,61 @@ SELECT a.title,
 To ensure that SQL is readable it is important that standards of indentation
 are followed.

+**ONLY** the fundamental keywords - `SELECT`, `FROM`, `WHERE`, `GROUP BY`, `HAVING`, `LIMIT`,
+and `ORDER BY` - should be fully left justified. Other clauses should be indented to the end of
+that keyword.
+
+```sql
+SELECT first_name,
+       last_name,
+       is_still_tippin_on_four_fours,
+       is_still_wrapped_in_four_vogues
+FROM rappers
+WHERE first_name = 'Mike'
+  AND last_name = 'Jones'
+```
+
+This allows the reader to quickly scan for the important building blocks of the query.
+
 #### Joins

 Joins should be indented to the other side of the river and grouped with a new
 line where necessary.

+Single-line `JOIN`s are fine for simple situations:
+
 ```sql
 SELECT r.last_name
-  FROM riders AS r
-       INNER JOIN bikes AS b
-       ON r.bike_vin_num = b.vin_num
-          AND b.engines > 2
+FROM riders AS r
+     INNER JOIN bikes b ON r.bike_vin_num = b.vin_num
+     INNER JOIN crew c ON r.crew_chief_last_name = c.last_name
+```

-       INNER JOIN crew AS c
-       ON r.crew_chief_last_name = c.last_name
-          AND c.chief = 'Y';
+Multi-line `JOIN`s should be indented the same as base keywords:
+
+```sql
+SELECT r.last_name
+FROM riders AS r
+     INNER JOIN bikes b
+             ON r.bike_vin_num = b.vin_num
+            AND r.bike_lane = r.lane
+```
+
+#### WITH statements (postgres only)
+
+Indent the body until the closing parenthesis.
+
+```sql
+WITH my_tmp_table AS (
+  SELECT r.last_name
+  FROM riders AS r
+       INNER JOIN bikes b
+               ON r.bike_vin_num = b.vin_num
+              AND r.bike_lane = r.lane
+)
+
+SELECT *
+FROM my_tmp_table
 ```

 #### Sub-queries
@@ -270,12 +272,12 @@ SELECT r.last_name,
          FROM champions AS c
         WHERE c.last_name = r.last_name
           AND c.confirmed = 'Y') AS last_championship_year
-  FROM riders AS r
- WHERE r.last_name IN
-       (SELECT c.last_name
-          FROM champions AS c
-         WHERE YEAR(championship_date) > '2008'
-           AND c.confirmed = 'Y');
+FROM riders AS r
+WHERE r.last_name IN
+      (SELECT c.last_name
+         FROM champions AS c
+        WHERE YEAR(championship_date) > '2008'
+          AND c.confirmed = 'Y');
 ```

 ### Preferred formalisms
@@ -294,10 +296,10 @@ SELECT CASE postcode
       WHEN 'BN1' THEN 'Brighton'
       WHEN 'EH1' THEN 'Edinburgh'
       END AS city
-  FROM office_locations
- WHERE country = 'United Kingdom'
-   AND opening_time BETWEEN 8 AND 9
-   AND postcode IN ('EH1', 'BN1', 'NN1', 'KW1')
+FROM office_locations
+WHERE country = 'United Kingdom'
+  AND opening_time BETWEEN 8 AND 9
+  AND postcode IN ('EH1', 'BN1', 'NN1', 'KW1')
 ```

 ## Create syntax
@@ -331,7 +333,7 @@ about though so it is important that a standard set of guidelines are followed.

 #### Choosing keys

-Deciding the column(s) that will form the keys in the definition should be a 
+Deciding the column(s) that will form the keys in the definition should be a
 carefully considered activity as it will affect performance and data integrity.

 1. The key should be unique to some degree.
@@ -1264,7 +1266,7 @@ ZONE
 [rivers]: http://practicaltypography.com/one-space-between-sentences.html
    "Practical Typography: one space between sentences"
 [reserved-keywords]: #reserved-keyword-reference
-    "Reserved keyword reference" 
+    "Reserved keyword reference"
 [eav]: https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
    "Wikipedia: Entity–attribute–value model"
 [self]: http://www.sqlstyle.guide