Show Me the Numbers (in PostgreSQL)

As a SQL Server DBA I was fortunate enough to encounter the idea of a numbers table — principally due to the work of Itzik Ben Gan (e.g. in books like ‘Inside Microsoft SQL Server 2005: T-SQL Querying’)… but I’ve not found many internet resources for a numbers tables in PostgreSQL. I’m working with Postgres at the moment, so I thought it would be good to provide a little introduction.

What is a Numbers Table?

Quite simply; a table with sequential integer numbers in; typically starting from 0 or 1 (or perhaps, both, as we shall see)… we can then use this table to join to other tables to provide useful functionality.

Show me the Numbers

Let’s create a numbers table in Postgres:

CREATE TABLE public.numbers (
     numbers_id         INTEGER         NOT NULL,
     CONSTRAINT pk__numbers PRIMARY KEY (numbers_id)
);

And now let’s populate it:

INSERT INTO public.numbers(numbers_id)
 SELECT      i
 FROM        generate_series(0, 999999) as i
 ;

In certain circles there are all sorts of discussions about how to make this process fast and efficient; but at the moment I would like to concentrate on simplicity.

Just to confirm how this looks, here’s a little screenshot from the excellent DBeaver:

The first few rows of the numbers table

Let’s Use Those Numbers!

Recent Calendar Months

My first proposed use-case is to get a list of recent calendar months.

select      date_trunc('month',
                       (NOW() AT TIME ZONE 'UTC'))
                        - n.numbers_id * INTERVAL '1 month' as "from"
            , date_trunc('month',
                         (NOW() AT TIME ZONE 'UTC'))
                         - (n.numbers_id-1) * INTERVAL '1 month' as "to"
            , -1 * n.numbers_id as "relative_month"
 from        public.numbers as n
 where       n.numbers_id between 1 and 12
 order by    1

This sql could produce results like this:

Relative Month Query Results

You might gather that this result set has particular signifigance to the work I am doing at the moment, where much of what I report on is in full-calendar month ranges, with the ‘exclusive’ end date.

Minutes in the Day

Another use-case is to list every minute in a day:

select      date_trunc('day', (NOW() AT TIME ZONE 'UTC'))
              + n.numbers_id * INTERVAL '1 minute' as "minute"
 from        numbers as n 
 where       n.numbers_id < (24*60)  -- Minutes in the day

This SQL can produce results like this:

The first few results from the ‘minutes of the day’ query

You might wonder what the value of such a query is!

I have frequently used such queries to group reporting results (e.g. ‘messages processed per minute’) with an outer-join such that I can easily see rows with ‘0’ results… without this approach if you summarise some dataset and group by time truncated to the minute, and then group by that, it can be tough to notice the ‘missing’ rows for minutes that had no data for some reason. Another reason for such an approach is that I have found some charting tools are not good at plotting time series well, and they sometimes seem to plot time values in sequence regardless of the gaps in time between rows of data. By ensuring we provide data for e.g. every minute of a day, we can make it easier for the graphing tools to provide good-looking results:

An example of a chart plotting ‘every minute of a day’

This particular chart seems to have values for most minutes of the day; if not all, but this chart was also used on a much quieter test system that did not have such data volumes.

Notes and Other Areas of Improvement

My personal preference is to use ‘singular’ table names, but as ‘number’ seems to be a reserved word in postgres I did decide to use the plural instead. Similarly for the column name I chose (‘numbers_id’) because I tend to avoid the use of column name of ‘val’ or ‘value’ because value is a keyword in C# which I often use for Data Access Layers (that doesn’t mean you can’t have DALs that refer to a column called ‘value’; rather sometimes it is just best to avoid such confusion!

A possible area for improvement could be adding another column which has a 1-based sequence in addition to the 0-based sequence shown above; of course you can get the same impact by subtracting 1 from your number value in sql — but in some circumstances it might be clearer to have the additional column; the recent calendar months query above used similar maths to get a similar result.

Summary

A numbers table can be a really useful tool to generate reliable and complete data for several scenarios; with my own experience probably focusing on date and time ranges.

There are Known Knowns

I was reminded earlier today of the famous Donald Rumsfeld press conference in which he said:

“Reports that say that something hasn’t happened are always interesting to me, because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns – the ones we don’t know we don’t know. And if one looks throughout the history of our country and other free countries, it is the latter category that tend to be the difficult ones.”

There was a considerable uproar in the press at the time, which mostly seemed to focus on these words as if they were a radical new idea… but I think I just considered his words to represent a broad concept that seemed very familiar… probably so familiar in fact that no-one ever says them!

Continue reading

Who Are We… I Mean… Who Am I Kidding?

Some of the earlier posts on this blog use ‘we’ and ‘us’ to hint or suggest that 42 IT Solutions has a team of like-minded people working for a better future in IT.  I’m sure there were any number of clues that this was ‘enterprise-speak’; but as things stand 42 IT Solutions is effectively only one person: me, Nigel Rheam.

Hello!

Continue reading

The Dangers of Duplicate Data

Received an email from Amazon today, this email explaining that delivery of my parcel was delayed demonstrates the dangers of duplicate data, I think:

29/09/2016 is actually a Thursday!

29 September 2016 is actually a Thursday!

The duplication I am thinking of is that in addition to providing the date; 29/09/2016, they have also chosen to provide a refinement; ‘Friday’.

Continue reading

Rejuvenating a Stalled Project

The client had a stalled project with many changes to an existing system; there were several areas of new functionality, but they were not completely trusted. There was considerable concern regarding the reliability and performance of the database, and in some senses the database was seen as the whole problem.  Not only had this set of changes stalled, but minor fixes had also stacked up waiting to be released and all were dependent upon the main project.
Continue reading