Merge pull request #1 from lumoslabs/lumos_style

SQL style guide
2025-03-09 12:49:51 -05:00 · 2016-04-15 14:05:10 -04:00 · ea4d592ed8
commit ea4d592ed8
parent a67f5fe8f1 b2550fbf18
2 changed files with 146 additions and 92 deletions
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # SQL style guide
-**[Read the guide](http://www.sqlstyle.guide)**
+**[Read the guide](_includes/sqlstyle.guide.md)**
 ---
--- a/_includes/sqlstyle.guide.md
+++ b/_includes/sqlstyle.guide.md
@@ -2,21 +2,17 @@
 ## Overview
-You can use this set of guidelines, [fork them][fork] or make your own - the
+These are guidelines to help you write SQL queries that will be easier to read. The
 key here is that you pick a style and stick to it. To suggest changes
 or fix bugs please open an [issue][] or [pull request][pull] on GitHub.
-These guidelines are designed to be compatible with Joe Celko's [SQL Programming
+Remember that even if you hate a given style at first, generally speaking it is
-Style][celko] book to make adoption for teams who have already read that book
+far more important that we have _any_ agreed-upon style than that we all like it.
 This guide is a little more opinionated in some areas and in others a
 little more relaxed. It is certainly more succinct where [Celko's book][celko]
 contains anecdotes and reasoning behind each rule as thoughtful prose.
-It is easy to include this guide in [Markdown format][dl-md] as a part of a
+Queries submitted to the Data teams should follow the style guide. You may be asked to
-project's code base or reference it here for anyone on the project to freely
+reformat and resubmit queries that starkly contrast with, or ignore, these guidelines.
-read—much harder with a physical book.
+Feel free to ask about the rationale for a style, or how to make your query (or, taking a
 step back, the underlying question) adhere.
-SQL style guide by [Simon Holywell][simon] is licensed under a [Creative Commons
+Original SQL style guide by [Simon Holywell][simon] is licensed under a [Creative Commons
 Attribution-ShareAlike 4.0 International License][licence].
 Based on a work at [http://www.sqlstyle.guide][self].
@@ -45,28 +41,15 @@ Based on a work at [http://www.sqlstyle.guide][self].
 * Quoted identifiers—if you must use them then stick to SQL92 double quotes for
  portability (you may need to configure your SQL server to support this depending
  on vendor).
 * Object oriented design principles should not be applied to SQL or database
  structures.
 ```sql
 SELECT file_hash  -- stored ssdeep hash
  FROM file_system
 WHERE file_name = '.vimrc';
 ```
 ```sql
 /* Updating the file record after writing to the file */
 UPDATE file_system
   SET file_modified_date = '1980-02-22 13:19:01.00000',
       file_size = 209732
 WHERE file_name = '.vimrc';
 ```
 ## Naming conventions
 ### General
-* Ensure the name is unique and does not exist as a
+* Ensure the name is unique and does not exist as a [MySQL reserved keyword][reserved-keywords]
-  [reserved keyword][reserved-keywords].
+  or [Redshift reserved keyword](http://docs.aws.amazon.com/redshift/latest/dg/r_pg_keywords.html).
 * Avoid abbreviations and if you have to use them make sure they are commonly
  understood.
 * Keep the length to a maximum of 30 bytes—in practice this is 30 characters
  unless you are using a multi-byte character set.
 * Names must begin with a letter and may not end with an underscore.
@@ -74,8 +57,6 @@ UPDATE file_system
 * Avoid the use of multiple consecutive underscores—these can be hard to read.
 * Use underscores where you would naturally include a space in the name (first
  name becomes `first_name`).
-* Avoid abbreviations and if you have to use them make sure they are commonly
-  understood.
 ```sql
 SELECT first_name
@@ -84,13 +65,9 @@ SELECT first_name
 ### Tables
 * Use a collective name or, less ideally, a plural form. For example (in order of
  preference) `staff` and `employees`.
 * Do not prefix with `tbl` or any other such descriptive prefix or Hungarian
  notation.
 * Never give a table the same name as one of its columns and vice versa.
 * Avoid, where possible, concatenating two table names together to create the name
  of a relationship table. Rather than `cars_mechanics` prefer `services`.
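 As a rough sketch of the last rule, a relationship between hypothetical `cars` and `mechanics` tables could be named for what it represents. The column names and types below are assumptions for illustration and do not come from the guide:
 ```sql
 -- Hypothetical relationship table: named for the concept (a service),
 -- not for the two tables it links (cars_mechanics).
 CREATE TABLE services (
   car_id       INTEGER NOT NULL,
   mechanic_id  INTEGER NOT NULL,
   service_date DATE    NOT NULL,
   PRIMARY KEY (car_id, mechanic_id, service_date)
 );
 ```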
 ### Columns
@@ -126,19 +103,24 @@ SELECT SUM(s.monitor_tally) AS monitor_total
 * Do not prefix with `sp_` or any other such descriptive prefix or Hungarian
  notation.
 ### Uniform prefixes
 * `is_` - denotes a boolean
 ### Uniform suffixes
 The following suffixes have a universal meaning ensuring the columns can be read
 and understood easily from SQL code. Use the correct suffix where appropriate.
 * `_id`—a unique identifier such as a column that is a primary key.
 * `_at` - denotes a column that contains the time of something.
 * `_date`—denotes a column that contains the date of something.
 * `_status`—flag value or some other status of any type such as
  `publication_status`.
 * `_total`—the total or sum of a collection of values.
 * `_num`—denotes the field contains any kind of number.
 * `_name`—signifies a name such as `first_name`.
 * `_seq`—contains a contiguous sequence of values.
-* `_date`—denotes a column that contains the date of something.
 * `_tally`—a count.
 * `_size`—the size of something such as a file size or clothing.
 * `_addr`—an address for the record could be physical or intangible such as
@@ -170,22 +152,6 @@ spacing is used. Do not crowd code or remove natural language spaces.
 #### Spaces
 Spaces should be used to line up the code so that the root keywords all end on
 the same character boundary. This forms a river down the middle making it easy for
 the reader's eye to scan over the code and separate the keywords from the
 implementation detail. Rivers are [bad in typography][rivers], but helpful here.
 ```sql
 SELECT f.average_height, f.average_diameter
  FROM flora AS f
 WHERE f.species_name = 'Banksia'
    OR f.species_name = 'Sheoak'
    OR f.species_name = 'Wattle';
 ```
 Notice that `SELECT`, `FROM`, etc. are all right aligned while the actual column
 names and implementation specific details are left aligned.
 Although not exhaustive always include spaces:
 * before and after equals (`=`)
@@ -205,6 +171,7 @@ SELECT a.title, a.release_date, a.recording_date
 Always include newlines/vertical space:
 * before `AND` or `OR`
 * after WITH subqueries
 * after semicolons to separate queries for easier reading
 * after each keyword definition
 * after a comma when separating multiple columns into logical groups
@@ -240,42 +207,127 @@ SELECT a.title,
 To ensure that SQL is readable it is important that standards of indentation
 are followed.
 **ONLY** the fundamental keywords (`SELECT`, `FROM`, `WHERE`, `GROUP BY`, `HAVING`, `LIMIT`,
 and `ORDER BY`) should be fully left justified. Other clauses should be indented to the end of
 that keyword.
 ```sql
 SELECT first_name,
       last_name,
       is_still_tippin_on_four_fours,
       is_still_wrapped_in_four_vogues
 FROM rappers
 WHERE first_name = 'Mike'
  AND last_name = 'Jones'
 ```
 This allows the reader to quickly scan for the important building blocks of the query.
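 The example above covers only `SELECT`, `FROM`, and `WHERE`. A sketch of the same rule applied to the remaining fundamental keywords might look like the following. The `riders` table and its columns are assumed for illustration, and `rider_tally` is a made-up alias:
 ```sql
 SELECT crew_chief_last_name,
        COUNT(*) AS rider_tally
 FROM riders
 WHERE bike_vin_num IS NOT NULL
   AND last_name <> 'Jones'
 GROUP BY crew_chief_last_name
 HAVING COUNT(*) > 1
 ORDER BY rider_tally DESC
 LIMIT 10
 ```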
 #### Joins
-Joins should be indented to the other side of the river and grouped with a new
+Joins should be indented 2 spaces right from the `FROM` keyword.
-line where necessary.
+
 Single-line `JOIN`s are fine for simple situations:
 ```sql
 SELECT r.last_name
 FROM riders AS r
  INNER JOIN bikes AS b ON r.bike_vin_num = b.vin_num
  INNER JOIN crew AS c ON r.crew_chief_last_name = c.last_name
 ```
 Multi-line `JOIN`s should be indented the same as base keywords:
 ```sql
 SELECT r.last_name
 FROM riders AS r
  INNER JOIN bikes AS b
          ON r.bike_vin_num = b.vin_num
-          AND b.engines > 2
+         AND r.bike_lane = r.lane
  INNER JOIN crew AS c ON r.crew_chief_last_name = c.last_name
 WHERE id = 5
 ```
-       INNER JOIN crew AS c
+#### WITH statements (PostgreSQL only)
-       ON r.crew_chief_last_name = c.last_name
+
-          AND c.chief = 'Y';
+Indent them until the closing parenthesis.
 ```sql
 WITH my_tmp_table AS (
  SELECT r.last_name
  FROM riders AS r
    INNER JOIN bikes AS b ON r.bike_vin_num = b.vin_num
  WHERE id = 10
 ),
 my_other_tmp_table AS (
  SELECT last_name
  FROM staff
 )
 SELECT *
 FROM my_tmp_table
  JOIN my_other_tmp_table ON my_other_tmp_table.last_name = my_tmp_table.last_name
 ```
 #### Sub-queries
 In PostgreSQL, prefer `WITH` clauses over sub-queries.
 Sub-queries should also be aligned to the right side of the river and then laid
-out using the same style as any other query. Sometimes it will make sense to have
+out using the same style as a `WITH` statement with respect to parentheses. Put
 the closing parenthesis on a new line at the same character position as its
 opening partner, especially where you have nested sub-queries.
 ```sql
 SELECT r.last_name,
-       (SELECT MAX(YEAR(championship_date))
+       (
         SELECT MAX(YEAR(championship_date))
           FROM champions AS c
         WHERE c.last_name = r.last_name
-           AND c.confirmed = 'Y') AS last_championship_year
+           AND c.confirmed = 'Y'
       ) AS last_championship_year
 FROM riders AS r
 WHERE r.last_name IN
-       (SELECT c.last_name
+      (
        SELECT c.last_name
          FROM champions AS c
        WHERE YEAR(championship_date) > '2008'
-           AND c.confirmed = 'Y');
+          AND c.confirmed = 'Y'
      )
 ```
 #### Case statements (PostgreSQL)
 `CASE` and `END` can either be inline:
 ```sql
 SELECT CASE WHEN x > y THEN 1 ELSE 0 END
 FROM table
 ```
 or on separate lines with the same left justification, with `WHEN`/`THEN` indented the same as `ELSE` and its value:
 ```sql
 SELECT CASE
         WHEN x > y AND x < z
           THEN 'x more than y but less than z'
         WHEN x > y AND x > z
           THEN 'x more than y and more than z'
         ELSE
           'x and y not related'
       END AS city
 FROM office_locations
 ```
 #### Case statements (MySQL)
 ```sql
 SELECT CASE postcode
         WHEN 'BN1'
           THEN 'Brighton'
         WHEN 'EH1'
           THEN 'Edinburgh'
       END AS city
 FROM office_locations
 ```
 ### Preferred formalisms
@@ -291,8 +343,10 @@ SELECT r.last_name,
 ```sql
 SELECT CASE postcode
-       WHEN 'BN1' THEN 'Brighton'
+         WHEN 'BN1'
-       WHEN 'EH1' THEN 'Edinburgh'
+           THEN 'Brighton'
         WHEN 'EH1'
           THEN 'Edinburgh'
       END AS city
 FROM office_locations
 WHERE country = 'United Kingdom'