Suggestions for Greenfield PostgreSQL Applications
Are you building a new project that uses Postgres? How exciting! Here are some suggestions.
Actual results may vary. If you don’t like these, make your own!
Extensions
Install the pg_uuidv7 extension. Itis unfortunately not supported by AWS RDS. uuid_generate_v7() will apparently be in Postgres 17 without needing an extension.
To compile it from source (these instructions are not for AWS RDS!):
Linux:
# export PG_HOME=?? # If necessary
# export PATH="$PATH:$PG_HOME/bin" # If necessary
git clone https://github.com/fboulnois/pg_uuidv7.git
cd pg_uuidv7
make
cp *.so $PG_HOME/lib/postgresql
cp *.control sql/*.sql $PG_HOME/share/postgresql/extension
# At the psql prompt, run: CREATE EXTENSION IF NOT EXISTS pg_uuidv7;
MacOS:
# Change version if necessary
export PG_HOME=/Applications/Postgres.app/Contents/Versions/16
export PATH="$PATH:$PG_HOME/bin"
git clone https://github.com/fboulnois/pg_uuidv7.git
cd pg_uuidv7
make
sudo cp *.dylib *.so $PG_HOME/lib/postgresql
sudo cp *.control sql/*.sql $PG_HOME/share/postgresql/extension
# At the psql prompt, run: CREATE EXTENSION IF NOT EXISTS pg_uuidv7;
UUIDv7s vs Integers and Regular UUIDs
“We use sequential primary keys for efficient indexing, and UUID secondary keys for external use. The upcoming UUIDv7 standard offers the best of both worlds; its time-ordered UUID primary keys can be utilized for indexing and external use.” Source:
API resource ids should be virtually unguessable (non-sequential) UUIDs, in any readable format (e.g. BASE62). This is a necessary security requirement. Database primary keys could use a different scheme, such as sequential numbers.
However, one id is better than two. Using one id scheme for both external resources and internal database rows simplifies the code and eliminates a class of bugs. It also removes the database overhead caused by additional lookups.
It’s virtually impossible to accidentally reuse the same UUID across environments (dev/stage/production/etc.). This also avoids a class of bugs.
One downside to using UUIDs as database keys is they are less efficient compared to numeric keys in terms of both storage and CPU. Another disadvantage is that UUIDs for new rows must be generated via a UUIDv7 library. For example, JavaScript can use uuidv7obj.
Case Sensitivity
Postgres is case sensitive. Use the following case_insensitive collation to assist with case-insensitive operations.
To perform case-insensitive uniqueness enforcement for a particular column, specify COLLATE case_insensitive after the column name.
/* See https://www.postgresql.org/docs/current/collation.html#COLLATION-NONDETERMINISTIC */
CREATE COLLATION case_insensitive
(provider = icu, locale='und-u-ks-level2', deterministic=false);
CREATE UNIQUE INDEX users_email_unique
ON users(email COLLATE case_insensitive);
To perform case-insensitive queries, use ILIKE, regular expressions, or the lower() function on both sides of the comparison operator. If lower() is used frequently, the column should be indexed using the case_insensitive collation since otherwise, the index will not be used. Verify the explain plan indicates the index is used:
EXPLAIN
SELECT * FROM users
WHERE lower(email) = lower('Seed-system-user')
AND deleted_at IS NULL;
Outputs:
-> Bitmap Index Scan on users_email_unique
Hard and Soft Deletes
Rows are not typically deleted because:
- Deletes may require cascading deletes due to foreign keys, resulting in extra code and unpredictable performance
- Data retention policies may require records to be retained even when an audit trail is in place
Rows may be deleted in response to a data takedown request. Deletes should be expected to take much longer compared to other database operations.
Row Tombstones
Tombstones implement soft deletes.
- Reference
- All tables contain a deleted_at column which contains a timestamp
- deleted_at is NULL when the row is active
Excluding Soft-Deleted Rows
Always add deleted_at IS NULL to WHERE and JOIN clauses!
Joins and Soft-Deleted Rows
A row can refer to a row that has been soft-deleted. This can cause unexpected results when ‘deleted_at IS NULL’ appears in JOIN clauses (recommended in the previous section). Although the foreign key’s value is not NULL, the columns selected from the referenced table will be NULL. SQL statements should never use the idiom foreign_key_id IS NULL to check whether a foreign row exists and should instead inspect whether a (hopefully NOT NULL) column from the referenced table.
Partial Unique Indexes
Since rows aren’t physically deleted, they can interfere with unique indexes. Whether a row should be created that has the same natural key as a ‘deleted’ row depends on the table. If this is desirable, unique indexes should be written with a WHERE clause, for example:
CREATE UNIQUE INDEX users_email_unique
ON users(email, deleted_at)
WHERE deleted_at IS NULL;
Example Users Table
CREATE TABLE users (
user_id UUID PRIMARY KEY DEFAULT uuid_generate_v7(),
email TEXT NOT NULL,
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
deleted_at TIMESTAMP WITH TIME ZONE,
created_by UUID NOT NULL,
updated_by UUID NOT NULL,
FOREIGN KEY (created_by) REFERENCES users(user_id),
FOREIGN KEY (updated_by) REFERENCES users(user_id),
is_system SMALLINT DEFAULT 0 NOT NULL CHECK (is_system BETWEEN 0 AND 1)
);
CREATE UNIQUE INDEX users_email_unique
ON users(email COLLATE case_insensitive)
WHERE deleted_at IS NULL;
Deleted users (email addresses) can not be undeleted, so the case_insentive collation is used.
Seed User
The seed user avoids the chicken and egg problem caused by created_by and updated_by not allowing NULL values. It’s easy to locate because its user_id is the Nil UUID.
INSERT INTO users(user_id, email, is_system, created_by, updated_by)
VALUES(
'00000000–0000–0000–0000–000000000000',
'seed-system-user',
1,
'00000000–0000–0000–0000–000000000000',
'00000000–0000–0000–0000–000000000000'
);
System Users
System users are for background services that perform database operations that are not related to a user-initiated request, allowing troubleshooters and auditors to be able to discern, via audit reports, which service made which database change. Thus, each service should use its own representative user.
- System users are not associated with users in any identity system and can not be used to log into the platform
- created_by and updated_by are initially set to the user_id of the seed user (Nil UUID)
Example Table
CREATE EXTENSION IF NOT EXISTS pg_uuidv7;
CREATE TABLE examples (
example_id UUID PRIMARY KEY DEFAULT uuid_generate_v7(),
created_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
updated_at TIMESTAMP WITH TIME ZONE DEFAULT CURRENT_TIMESTAMP NOT NULL,
deleted_at TIMESTAMP WITH TIME ZONE,
created_by UUID NOT NULL,
updated_by UUID NOT NULL,
FOREIGN KEY (created_by) REFERENCES users(user_id),
FOREIGN KEY (updated_by) REFERENCES users(user_id)
);
Updates
Always set the last_updated_at column to now().