Sitemap

Locating gaps in enumerated rows using the SQL lag window function (MySQL edition)

2 min readJun 12, 2022

The Set-Up

Consider the scenario in which rows in a table are ordered relative to each other. Each row in the table is assigned a unique number so they can be displayed in a particular order, like a spreadsheet:

1. Doors
2. Appliances
3. Plumbing
4. Windows

Note that the items are not in alphabetical order.

As a second requirement, the table can contain rows belonging to multiple “spreadsheets.” The spreadsheet identifier is a key that is located in one or more columns in the table: col1, col2, and so on. Each key is numbered via a column in the the table named, for example, ‘position.’

In a simple example, the key could be stored in the ‘name’ column but for the purposes of this example, the key may consist of multiple columns (you’ll see why in the solution below).

Finally, let’s define some rules for the position values of any particular key. They must start with 1 and increment by 1 (no gaps!). For example:

key (col1+col2+…) position assessment
================= ======== ==========
A 1 good
A 2 good
A 3 good
B 1 good
B 2 good
C 5 bad — Doesn’t start with 1
D 1 good
D 5 bad —2, 3, and 4 are missing

The Challenge

What is an easy way to locate the ‘bad’ rows as demonstrated above using SQL?

A Solution

Use the LAG window function to find key+position tuples that violate the rules:

SELECT DISTINCT col1, col2, …
FROM (
SELECT
col1, col2, … /* The key */
LAG(position, 1) OVER (
PARTITION BY col1, col2, …
ORDER BY position
) prev,
position next
FROM (
SELECT position, col1, col2, …
FROM …
) a /* MySQL wants subselects to be aliased - 'a' is not used */
) b /* MySQL wants subselects to be aliased - 'b' is not used */
WHERE (next — IFNULL(prev, 0)) > 1

The above SQL finds rows such that (current position) - (previous position) is greater than 1. The meaning of “previous” is defined by the OVER clause which specifies how to order the rows. LAG(position, 1) essentially means “return the value of the position column from the previous row” — you could instead process the previous row’s previous row via LAG(position, 2).

--

--

Terris Linenbach
Terris Linenbach

Written by Terris Linenbach

Coder since 1980. Always seeking the Best Way. CV: https://terris.com/cv

No responses yet