* Use consistent and descriptive identifiers and names.
* Make judicious use of white space and indentation to make code easier to read.
* Store [ISO-8601][iso-8601] compliant time and date information
(`YYYY-MM-DD HH:MM:SS.SSSSS`).
* Try to use only standard SQL functions instead of vendor specific functions for
reasons of portability.
* Keep code succinct and devoid of redundant SQL—such as unnecessary quoting or
parentheses or `WHERE` clauses that can otherwise be derived.
* Include comments in SQL code where necessary. Use the C style opening `/*` and
closing `*/` where possible otherwise precede comments with `--` and finish
them with a new line.
```sql
SELECT file_hash -- stored ssdeep hash
FROM file_system
WHERE file_name = '.vimrc';
```
```sql
/* Updating the file record after writing to the file */
UPDATE file_system
SET file_modified_date = '1980-02-22 13:19:01.00000',
file_size = 209732
WHERE file_name = '.vimrc';
```
### Avoid
* Plurals—use the more natural collective term where possible instead. For example
`staff` instead of `employees` or `people` instead of `individuals`.
* Object oriented design principles should not be applied to SQL or database
structures.
## Naming conventions
### All Naming Conventions
* See Confluence documentation for [Ives Database Design Guide][design-guide]
### Aliasing or correlations
#### Aliasing Column Names
* Avoid unnessisary aliases at all times
* Must always alias `COUNT(*)` columns
* Must always alias computed data (`SUM()` or `AVG()` or `IF()`) use the name you would give it were it a column defined in the schema.
* Must always include the `AS` keyword, which makes it easier to read as it is explicit.
* Should relate in some way to the object or expression they are aliasing.
#### Aliasing Table Names
* All Tables must be aliased when using more than one in a JOIN
* Table aliases will be made up of the first letter of every word in the table name unless
* unless the alias is a reseverd word ie. `FROM INTERNATIONAL_FILINGS AS IF` will cause an error in SQL
* in this case us an abbreviated name for the table ie. `FROM INTERNATIONAL_FILINGS AS IFILINGS`
* if the aliases for two table will be the same, or the same table is used more then once, append a number in order of apperance in the query
* When a query contains multiple databases the first letter of the database, in lower case will be prepended to the table alias ie. `FROM international.ENTITY_MAP AS iEM INNER JOIN secdocs.COMPANY AS sC`
```sql
SELECT
COUNT(*) as student_staff_count
FROM
STAFF AS S1
INNER JOIN
STUDENTS AS S2
ON S2.mentor_id = S1.staff_num;
```
```sql
SELECT
SUM(s.monitor_tally) AS monitor_total
FROM
STAFF AS S
GROUP BY
S.staff_department_fkey;
```
### Uniform suffixes
The following suffixes have a universal meaning ensuring the columns can be read
and understood easily from SQL code. Use the correct suffix where appropriate.
*`_key`—a unique identifier such as a column that is a primary key.
*`_status`—flag value or some other status of any type such as `publication_status`.
*`_total`—the total or sum of a collection of values.
*`_num`—denotes the field contains any kind of number.
*`_name`—signifies a name such as `first_name`.
*`_date`—denotes a column that contains the date of something.
*`_count`—a count.
*`_size`—the size of something such as a file size or clothing.
*`_addr`—an address for the record could be physical or intangible such as `ip_addr`.
## Query syntax
### Reserved words
Always use uppercase for the [reserved keywords][reserved-keywords] like `SELECT`, `WHERE` or `IF`.
Data manipulation statements should have every clause keyword on a line of its own unless performing extremely simple statements. Examples of the clause keywords are `SELECT`, `DELETE`, `UPDATE`, `FROM`, `WHERE`, `HAVING`, `GROUP BY`, `ORDER BY`, `LIMIT`. An example of a simple single line statements `SELECT COUNT(*) as student_count FROM STUDENTS WHERE graduated = 0;`
### White space
To make the code easier to read it is important that the correct compliment of
spacing is used. Do not crowd code or remove natural language spaces.
#### Spaces
Spaces should never be used to line up the code so that the root keywords all end on the same character boundary.
* Indentations of 4 spaces are the standard that is utilized throughout the codebase.
* All `(` and `)` must be placed on a line of there own unless only operating on two or less items
```sql
(
SELECT
species_name,
AVG(height) AS average_height,
AVG(diameter) AS average_diameter
FROM
FLORA
WHERE
species_name = 'Banksia'
OR
species_name = 'Sheoak'
OR
species_name = 'Wattle'
GROUP BY
species_name,
observation_date
)
UNION ALL
(
SELECT
species_name,
AVG(height) AS average_height,
AVG(diameter) AS average_diameter
FROM
BOTANIC_GARDEN_FLORA
WHERE
species_name = 'Banksia'
OR
species_name = 'Sheoak'
OR
species_name = 'Wattle'
GROUP BY
species_name,
observation_date
)
```
Although not exhaustive always include spaces:
* before and after equals (`=`)
* after commas (`,`)
* surrounding apostrophes (`'`) where not within parentheses or with a trailing
comma or semicolon.
```sql
SELECT a.title, a.release_date, a.recording_date
FROM albums AS a
WHERE a.title = 'Charcoal Lane'
OR a.title = 'The New Danger';
```
#### Line spacing
Always include newlines/vertical space:
* after semicolons to separate queries for easier reading
* after each `VALUE` group in an `INSERT` statement
* to separate code into related sections, which helps to ease the readability of large chunks of code.
Always on their own line:
* Data manipulation statements should have every clause keyword on a line of its own unless performing extremely simple statements. Examples of the clause keywords are `SELECT`, `DELETE`, `UPDATE`, `FROM`, `WHERE`, `HAVING`, `GROUP BY`, `ORDER BY`, `LIMIT`. An example of a simple single line statements `SELECT COUNT(*) as student_count FROM STUDENTS WHERE graduated = 0;`
* Every field being selected, updated, grouped on or limted by in the query should be on their own line. Unless involved in a functional operation such as an `IF()`, `CASE`, `COALESCE()` ... etc. which require additional fields to function
*`AND` and `OR` should appear on their own lines
```sql
INSERT INTO albums (title, release_date, recording_date)
('The New Danger', '2008-01-01 01:01:01.00000', '1990-01-01 01:01:01.00000');
```
```sql
UPDATE
ALBUMS
SET
release_date = '1990-01-01 01:01:01.00000',
producer_name = NULL
WHERE
title = 'The New Danger';
```
```sql
SELECT
title,
release_date,
recording_date,
production_date
FROM
ALBUMS
WHERE
title = 'Charcoal Lane'
OR
title = 'The New Danger';
```
### Indentation
To ensure that SQL is readable it is important that standards of indentation
are followed.
#### Clause Keywords
* Should be at the top level with the least indentation of anything else contained in their statement.
* These words should be on a line alone
#### Joins
* Natural Joins are not allowed ... ever
* A Join type must be indicated `LEFT OUTER`, `RIGHT OUTER`, `INNER`
* Joins should be indented one indent under their tables or sub-queries
* ON clauses should be indented to be left justified with the JOINs
* Multiple ON clauses should be indented to be indented benieth the ON and JOIN keywords
```sql
SELECT
R.last_name
FROM
RIDERS AS R
INNER JOIN
BIKES AS B
ON R.bike_vin_num = B.vin_num
AND
B.engines > 2
INNER JOIN
CREW AS C
ON R.crew_chief_last_name = C.last_name
AND
C.chief = 'Y';
```
#### Subqueries
Subqueries should be aligned with the indentation level that their non-subquery counterpart would reside. Subqueries should begin with a line containing only an opening `(` then the next line being indented 1 indent deeper. The subquery should end with a closing `)` and the alias for that subquery if appropriate. Try and include a commend line to describe the subquery
```sql
SELECT
r.last_name,
(
SELECT
MAX(YEAR(championship_date))
FROM
champions AS c
WHERE
c.last_name = r.last_name
AND
c.confirmed = 'Y'
) AS last_championship_year
FROM
riders AS r
WHERE
r.last_name IN
(
SELECT
c.last_name
FROM
champions AS c
WHERE
YEAR(championship_date) > '2008'
AND
c.confirmed = 'Y'
);
```
### Preferred formalisms
* Make use of `BETWEEN` where possible instead of combining multiple statements
with `AND`.
* Similarly use `IN()` instead of multiple `OR` clauses.
* Where a value needs to be interpreted before leaving the database use the `CASE`
expression. `CASE` statements can be nested to form more complex logical structures.
* Avoid the use of `UNION` clauses and temporary tables where possible. If the
schema can be optimised to remove the reliance on these features then it most
likely should be.
```sql
SELECT
CASE postcode
WHEN 'BN1' THEN 'Brighton'
WHEN 'EH1' THEN 'Edinburgh'
END AS city
FROM
OFFICE_LOCATIONS
WHERE
country = 'United Kingdom'
AND
opening_time BETWEEN 8 AND 9
AND
postcode IN ('EH1', 'BN1', 'NN1', 'KW1')
```
## Create syntax
```sql
/* I do not think that this is a nessisary part of a style guide since we generally should not be creating tables in production */
```
When declaring schema information it is also important to maintain human
readable code. To facilitate this ensure the column definitions are ordered and
grouped where it makes sense to do so.
Indent column definitions by four (4) spaces within the `CREATE` definition.
### Choosing data types
* Where possible do not use vendor specific data types—these are not portable and
may not be available in older versions of the same vendor's software.
* Only use `REAL` or `FLOAT` types where it is strictly necessary for floating
point mathematics otherwise prefer `NUMERIC` and `DECIMAL` at all times. Floating
point rounding errors are a nuisance!
### Specifying default values
* The default value must be the same type as the column—if a column is declared
a `DECIMAL` do not provide an `INTEGER` default value.
* Default values must follow the data type declaration and come before any
`NOT NULL` statement.
### Constraints and keys
Constraints and their subset, keys, are a very important component of any
database definition. They can quickly become very difficult to read and reason
about though so it is important that a standard set of guidelines are followed.
#### Choosing keys
Deciding the column(s) that will form the keys in the definition should be a
carefully considered activity as it will effect performance and data integrity.
1. The key should be unique to some degree.
2. Consistency in terms of data type for the value across the schema and a lower
likelihood of this changing in the future.
3. Can the value be validated against a standard format (such as one published by
ISO)? Encouraging conformity to point 2.
4. Keeping the key as simple as possible whilst not being scared to use compound
keys where necessary.
It is a reasoned and considered balancing act to be performed at the definition
of a database. Should requirements evolve in the future it is possible to make
changes to the definitions to keep them up to date.