This post is about some top tips and tricks to use when writing and modifying SQL. At first sight they may seem a little bit strange, but hopefully after explanation they will make sense. Let’s take a look at the first bit of SQL:
SELECT SomeTableId , TeamName -- (2) --, EmployeeId , EmployeeName --, BigXmlCol -- (3) FROM dbo.SomeTable -- (4) WHERE 1 = 1 -- (5) AND TeamName LIKE 'Blah%' --AND EmployeeName LIKE 'Nigel%' -- (6) ORDER BY TeamName --ORDER BY EmployeeName
1. Save Your Work!
My first tip is to save your work. This is not a comment about losing your work due to power-outage or anything like that… it is a comment on how many times I have watched smart and experienced people type the same stuff again, and again, and again. e.g. Select data from a common table; because they need to see it to investigate a live issue… then select from a related table… and then another, and so on. In practice, even systems with hundreds of tables use some tables more often than others, and storing SQL to access these tables in a consistent way is incredibly useful, and has saved me hundreds of hours or typing; and I have been more effective at support because I have been able to see related data much more quickly.
Save your standard queries, and improve them as you need to!
2. One Column-name per Line, Prefixed with a Comma
Although queries with lots of outputs can make this strategy undesirable, the reasons for this are related to the addition and commenting-out of columns – sometimes you purposefully want to exclude a column from a query because it is large and slows the query down (as in 2) and in those cases you want it to be as easy as possible to comment out. Imagine; to stop TeamName from being included in the output with this format you can simply comment out this one line. If the commas had been at the end of the line, the row could still be commented out – so what’s the point?
There are two reasons:
- In version control systems, if you happen to add a column to the query output, the change will generally be the simple addition of one line, not ‘add a line and add a comma at the end of the prior line’. It’s not a huge difference, but can be useful sometime;
- This ‘comma first’ approach does have problems if you want to comment out the first column; think what we would have to do to this query to stop returning the RowNumberId – we would have to comment-out the RowNumberId and the following comma on the next line which is a bit of a hassle. So why prefer this strategy over the more common ‘comma-last’ strategy? The answer is because that approach has more problems if you are removing the last column in a list, and I find that occurs far more often.
3. Line Commented Out
Sometimes particular columns are very large and slow down the query (e.g. Xml data). Sometimes you literally need different data to be displayed, depending on circumstances… so you want the quickest way to facilitate the addition or removal of an output from the query. Another case might be that you have two sources of information, and you just want to compare them side-by-side as an experiment, and then comment out one.
As noted above, the comma-first strategy works very well, and the way I prefer to work is to comment-out lines with the double-hyphen approach. This is especially aided by SQL Server Management Studio’s ability to comment lines for you automatically (Ctrl+K, Ctrl+C) to comment the lines in a selection, or (Ctrl+K, Ctrl+U) to uncomment a selection.
4. Specifying the Schema name dbo
Why bother to specify the schema if it is dbo, which is the default in Sql Server? There are two reasons:
- There is a slight performance improvement because it saves a lookup; however this is a small (probably unnoticeable) benefit;
- You are reminding your colleagues which schema a table is in when they (or you!) come to look at the query later.
Obviously, the latter point is substantially less relevant if you only use the dbo schema… but the more you plan to use various schema it can really help understand a query where you are not the author.
5. Where 1 = 1
Just as we discussed the ability to comment out different columns from the output, we sometimes need to be able to comment out and uncomment (or comment-in) various filters. If you have an important filter after the
WHERE clause that you always use, then you’ll be fine, and you will be able to comment-out other clauses as shown in 6 easily. If, however, there is no constant filter it is much easier to have this
WHERE 1 = 1 clause. It is like saying
WHERE TRUE in some languages. It essentially has no impact on the filtering, and thus now you can place all filter clauses on a new line.
I should add that I don’t like to see this in ‘live’ SQL; I think its great for manual updates and trial-and-error development of a query… but when I look at live code the last thing I want to think is that it wasn’t finished. For me, commented-out lines and this sort of construct should not make it live.
6. Commented-Out Clause
As noted above, you may need to comment-out filtering clauses. There are several times when you really can not commit to a specified set of filters. For example, in a recent role we often needed to query a logging database; but sometimes we’d need to see the most recent errors, sometimes errors in a specific time range, and sometimes by logging level, application source, the server source, and many other criteria. We generally had tens of possible filters commented-out, and each time we ran the query we’d comment-in different parts to suit the current need.
7. Alias All Tables, Alias All Columns
Well, we didn’t show this above, but if you have two or more tables in a query, it is substantially easier to specify which table a column belongs to using an alias and not the full table name. For example:
SELECT c.CustomerId , c.FirstName , c.Surname , c.DateOfBirth , AddressId -- Unaliased column (see below) , o.OrderId , o.CreatedOn , o.ShippedOn FROM dbo.Customer AS c JOIN dbo.[Order] AS o ON c.CustomerId = o.CustomerId WHERE 1 = 1 --AND ShippedOn IS NULL
Although this example is not terribly complex, the alias on every column would help you in the future when you are trying to understand this query; it really doesn’t take long before a query becomes a sprawling mess, and aliases at least help you understand where particular values are coming from.
By the way, some people suggest using ‘meaningful’ aliases; I assume that ‘cust’ would suit those people more than ‘c’ does. My preference is for using short aliases; but we would be happy to set conventions for systems to disambiguate tables with names that start with the same letter. For example, we worked on a system which had person, provider and payment tables all of which were significantly important tables. They could have been consistently abbreviated per, proc and pay system-wide. Or, in many cases queries including payment did not also include provider, so to be honest it was often safe to abbreviate to p for either of them. Whichever way, there is a reason to alias all columns, and that is because SQL can be broken by the addition of columns to a table. Looking at the example above, we can see that the AddressId column has not been aliased. Now, while that AddressId column only exists on the Customer table there will be no problems; but what if you decided to add an AddressId column to the Order table¹ (to indicate the delivery address for the specific order, perhaps) — this SQL would then break with an ambiguous column name!
Of course, if you did want to set up system-wide aliases, you could consider the use of synonyms; but this is about helping identify the source of data, and synonyms might be used to hide distinctions on schema, and I am not sure that is a very modern approach to take.
8. Layout SQL to Confirm Meaning
As a starting point, type your
JOINs on a new line… but in addition, make sure the indentation of the queries and sub-queries makes it obvious which part of the query applies to which. For example, I see this query layout repeated far too many times:
SELECT * FROM Customer AS c JOIN (SELECT something FROM somewhere WHERE field = value) ON ...
With layout like this, it is natural to assume that the
WHERE clause relates to the main (outer) select, but in fact it associates with the derived table. The detail is only understood after fairly careful reading of the query, which demonstrates that the layout is misleading. Indenting sub-queries is a common way to visually show the relationship:
SELECT * FROM Customer AS c JOIN (SELECT s.something FROM somewhere AS s WHERE field = value) ON ...
There are lots of tools out there to help in formatting SQL, but when I find unformatted SQL, I find it quite useful to format it manually; it generally doesn’t take too long, and it can help you absorb the meaning of the SQL.
9. Write SQL that Works With and Without a Parameter Value
Sometimes you want to write SQL that will sometimes have a value to filter on, but only sometimes. When you are personally changing the code and present, it’s simple to comment out lines as we’ve seen with the above examples. But sometimes you want to make it easier to modify the SQL for someone less skilled than yourself, or perhaps for code that has to be fixed because it’s embedded in a web page or similar.
DECLARE @TeamIdFilter INT = NULL -- < -- Set to the team you want to see or leave null for all SELECT * FROM Employee AS e JOIN Team AS t ON e.TeamId = t.TeamId WHERE t.TeamId = CASE WHEN @TeamIdFilter IS NULL THEN t.TeamId ELSE @TeamIdFilter END
It’s a bit of a mouthful at first; but what that CASE statement does is say ‘if the filter is null, allow all values, but if it has a value, only show employees in the specific team’. Note that when the filter is null, and only then, the where clause becomes
WHERE t.TeamId = t.TeamId which is essentially the same as
WHERE 1 = 1 we saw earlier. But then consider the alternative when the filter is not null; we get
WHERE t.TeamId = @TeamIdFilter. This is just the behaviour we need, and we have achieved this with a single bit of fixed SQL without having to build it in code dynamically. I have used this very successfully in several situations, including ASP.Net web pages.
10. Make Utility-SQL Educational
Sometimes people share SQL that has been built up to help them undertake support or similar – before you do that you have to be saving it as per point 1. But, in addition you can sometimes help the user by compressing the display of data in a small way to show lookup values as text alongside their identifier:
SELECT c.CustomerId , c.FirstName + ' ' + c.Surname AS 'CustomerFullName' , ' ... ... ... ' AS ' ' -- Either show the Identifier and the lookup value as separate columns: , c.CustomerTypeId , ct.Name AS 'CustomerTypeName' , '... ... ... ... ' AS ' ' -- Or show them compressed into one column value: , ct.Name + ' (' + CAST(c.CustomerTypeId AS VARCHAR(12)) + ')' AS 'Customer Type' FROM dbo.Customer AS c LEFT JOIN dbo.CustomerType AS ct ON c.CustomerTypeId = ct.CustomerTypeId
Which outputs the data like this:
Note how the
LEFT (OUTER) JOIN includes rows in the output that do not have the column value set. It is very important that you do not inadvertently remove results from the output when you add in lookups like this – but in this case the outer join is only necessary because the
CustomerTypeId column is nullable.
By the way; this strategy is very helpful to people new to your system (and many find it difficult to remember even the most-common lookup values in a system); but never use this concatenated-value approach on anything other than in SQL that will be run manually or the ‘presentation layer’. Imagine you created a view with the concatenated
Customer Type column shown above; unless you also included the
CustomerTypeId and the
CustomerTypeName columns you would force them to parse the column to find out the Id if needed. In a sense, you are denormalising the data, so try to avoid embedding this sort of thing in Views that could be reused by others (unless it is really important to share such sql)
Finally, the end-users of a system should never have to know the real identifying value of a lookup value! This strategy is only for use with other database and programming professionals who may need to know the identifier to help them search for related code.
In some respects, this is one piece of advice repeated many times; primarily that we should code with the conjunction that connects elements of the query on the next line, not at the end of the line; and to do that we occasionally have to write odd always-true
WHERE clauses. This also applies to the commas between column selections in addition to the way we join our tables together.
1. As is the case with examples so often, I really would not recommend adding a column called AddressId to an Orders table if what I meant was DeliveryAddressId. There are too many types of Address that might be associated with an Order not to disambiguate them from the earliest point. And, realistically, you probably would want to keep an address history for Customers so this model is clearly wrong for that, but…