As a SQL Server DBA I was fortunate enough to encounter the idea of a numbers table — principally through the work of Itzik Ben-Gan (e.g. in books like ‘Inside Microsoft SQL Server 2005: T-SQL Querying’)… but I’ve not found many internet resources for a numbers table in PostgreSQL. I’m working with Postgres at the moment, so I thought it would be good to provide a little introduction.
What is a Numbers Table?
Quite simply: a table containing sequential integers, typically starting from 0 or 1 (or perhaps both, as we shall see)… we can then join this table to other tables to provide useful functionality.
Show me the Numbers
Let’s create a numbers table in Postgres:
CREATE TABLE public.numbers (
    numbers_id INTEGER NOT NULL,
    CONSTRAINT pk__numbers PRIMARY KEY (numbers_id)
);
And now let’s populate it:
INSERT INTO public.numbers (numbers_id)
SELECT i
FROM generate_series(0, 999999) AS i;
In certain circles there are all sorts of discussions about how to make this process fast and efficient; but at the moment I would like to concentrate on simplicity.
Just to confirm how this looks, here’s a little screenshot from the excellent DBeaver:
Let’s Use Those Numbers!
Recent Calendar Months
My first proposed use-case is to get a list of recent calendar months.
SELECT
    date_trunc('month', (NOW() AT TIME ZONE 'UTC')) - n.numbers_id * INTERVAL '1 month' AS "from",
    date_trunc('month', (NOW() AT TIME ZONE 'UTC')) - (n.numbers_id - 1) * INTERVAL '1 month' AS "to",
    -1 * n.numbers_id AS "relative_month"
FROM public.numbers AS n
WHERE n.numbers_id BETWEEN 1 AND 12
ORDER BY 1;
This SQL could produce results like this:
You might gather that this result set has particular significance to the work I am doing at the moment, where much of what I report on is in full-calendar-month ranges, with the ‘exclusive’ end date.
Minutes in the Day
Another use-case is to list every minute in a day:
SELECT
    date_trunc('day', (NOW() AT TIME ZONE 'UTC')) + n.numbers_id * INTERVAL '1 minute' AS "minute"
FROM public.numbers AS n
WHERE n.numbers_id < (24 * 60) -- minutes in the day
This SQL can produce results like this:
You might wonder what the value of such a query is!
I have frequently used such queries to group reporting results (e.g. ‘messages processed per minute’) with an outer join, so that I can easily see rows with ‘0’ results. Without this approach, if you summarise some dataset grouped by time truncated to the minute, it can be tough to notice the ‘missing’ rows for minutes that had no data for some reason. Another reason for this approach is that I have found some charting tools are not good at plotting time series: they sometimes plot time values in sequence regardless of the gaps in time between rows of data. By ensuring we provide data for, say, every minute of a day, we make it easier for the graphing tools to produce good-looking results:
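As a sketch of the outer-join idea, here is roughly how I might count messages per minute while still emitting a row (with a count of 0) for quiet minutes. The `message` table and its `processed_at` timestamp column are hypothetical stand-ins for whatever you are reporting on:

```sql
-- Messages processed per minute for the current UTC day,
-- with a 0 row for every minute that had no messages.
SELECT
    m."minute",
    COUNT(msg.processed_at) AS messages_processed
FROM (
    -- One row per minute of the day, built from the numbers table.
    SELECT date_trunc('day', (NOW() AT TIME ZONE 'UTC')) + n.numbers_id * INTERVAL '1 minute' AS "minute"
    FROM public.numbers AS n
    WHERE n.numbers_id < (24 * 60)
) AS m
LEFT JOIN message AS msg
    ON date_trunc('minute', msg.processed_at) = m."minute"
GROUP BY m."minute"
ORDER BY m."minute";
```

The LEFT JOIN is what preserves the ‘empty’ minutes; counting `msg.processed_at` (rather than `COUNT(*)`) ensures those minutes report 0 instead of 1.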
This particular chart happens to have values for most, if not all, minutes of the day; but the same chart was also used on a much quieter test system that did not have such data volumes, where the gap-filling mattered far more.
Notes and Other Areas of Improvement
My personal preference is to use ‘singular’ table names, but as ‘number’ seems to be a reserved word in Postgres I decided to use the plural instead. Similarly, for the column name I chose ‘numbers_id’ because I tend to avoid column names like ‘val’ or ‘value’: ‘value’ is a keyword in C#, which I often use for data access layers. (That doesn’t mean you can’t have DALs that refer to a column called ‘value’; sometimes it is just best to avoid such confusion!)
A possible area for improvement could be adding another column holding a 1-based sequence alongside the 0-based sequence shown above. Of course you can get the same effect by subtracting 1 from (or adding 1 to) the number value in SQL — the recent-calendar-months query above used similar maths — but in some circumstances it might be clearer to have the additional column.
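One way to sketch that extra column (assuming PostgreSQL 12 or later) is a generated column; the column name `numbers_id_1` here is just my invention for illustration:

```sql
-- Add a 1-based counterpart to the 0-based numbers_id,
-- kept in sync automatically as a stored generated column.
ALTER TABLE public.numbers
    ADD COLUMN numbers_id_1 INTEGER NOT NULL
    GENERATED ALWAYS AS (numbers_id + 1) STORED;
```

On older Postgres versions you could achieve the same with an ordinary column populated at insert time.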
A numbers table can be a really useful tool for generating reliable and complete data in several scenarios; in my own experience, mostly around date and time ranges.