Show Me the Numbers (in PostgreSQL)

As a SQL Server DBA I was fortunate enough to encounter the idea of a numbers table — principally due to the work of Itzik Ben Gan (e.g. in books like ‘Inside Microsoft SQL Server 2005: T-SQL Querying’)… but I’ve not found many internet resources for a numbers tables in PostgreSQL. I’m working with Postgres at the moment, so I thought it would be good to provide a little introduction.

What is a Numbers Table?

Quite simply; a table with sequential integer numbers in; typically starting from 0 or 1 (or perhaps, both, as we shall see)… we can then use this table to join to other tables to provide useful functionality.

Show me the Numbers

Let’s create a numbers table in Postgres:

CREATE TABLE public.numbers (
     numbers_id         INTEGER         NOT NULL,
     CONSTRAINT pk__numbers PRIMARY KEY (numbers_id)
);

And now let’s populate it:

INSERT INTO public.numbers(numbers_id)
 SELECT      i
 FROM        generate_series(0, 999999) as i
 ;

In certain circles there are all sorts of discussions about how to make this process fast and efficient; but at the moment I would like to concentrate on simplicity.

Just to confirm how this looks, here’s a little screenshot from the excellent DBeaver:

The first few rows of the numbers table

Let’s Use Those Numbers!

Recent Calendar Months

My first proposed use-case is to get a list of recent calendar months.

select      date_trunc('month',
                       (NOW() AT TIME ZONE 'UTC'))
                        - n.numbers_id * INTERVAL '1 month' as "from"
            , date_trunc('month',
                         (NOW() AT TIME ZONE 'UTC'))
                         - (n.numbers_id-1) * INTERVAL '1 month' as "to"
            , -1 * n.numbers_id as "relative_month"
 from        public.numbers as n
 where       n.numbers_id between 1 and 12
 order by    1

This sql could produce results like this:

Relative Month Query Results

You might gather that this result set has particular signifigance to the work I am doing at the moment, where much of what I report on is in full-calendar month ranges, with the ‘exclusive’ end date.

Minutes in the Day

Another use-case is to list every minute in a day:

select      date_trunc('day', (NOW() AT TIME ZONE 'UTC'))
              + n.numbers_id * INTERVAL '1 minute' as "minute"
 from        numbers as n 
 where       n.numbers_id < (24*60)  -- Minutes in the day

This SQL can produce results like this:

The first few results from the ‘minutes of the day’ query

You might wonder what the value of such a query is!

I have frequently used such queries to group reporting results (e.g. ‘messages processed per minute’) with an outer-join such that I can easily see rows with ‘0’ results… without this approach if you summarise some dataset and group by time truncated to the minute, and then group by that, it can be tough to notice the ‘missing’ rows for minutes that had no data for some reason. Another reason for such an approach is that I have found some charting tools are not good at plotting time series well, and they sometimes seem to plot time values in sequence regardless of the gaps in time between rows of data. By ensuring we provide data for e.g. every minute of a day, we can make it easier for the graphing tools to provide good-looking results:

An example of a chart plotting ‘every minute of a day’

This particular chart seems to have values for most minutes of the day; if not all, but this chart was also used on a much quieter test system that did not have such data volumes.

Notes and Other Areas of Improvement

My personal preference is to use ‘singular’ table names, but as ‘number’ seems to be a reserved word in postgres I did decide to use the plural instead. Similarly for the column name I chose (‘numbers_id’) because I tend to avoid the use of column name of ‘val’ or ‘value’ because value is a keyword in C# which I often use for Data Access Layers (that doesn’t mean you can’t have DALs that refer to a column called ‘value’; rather sometimes it is just best to avoid such confusion!

A possible area for improvement could be adding another column which has a 1-based sequence in addition to the 0-based sequence shown above; of course you can get the same impact by subtracting 1 from your number value in sql — but in some circumstances it might be clearer to have the additional column; the recent calendar months query above used similar maths to get a similar result.

Summary

A numbers table can be a really useful tool to generate reliable and complete data for several scenarios; with my own experience probably focusing on date and time ranges.