Today I am continuing my reminiscence over system design problems I have addressed in the past, this time looking specifically at a data-structure a colleague of mine used in a particular situation, and the consequences of that decision. Let’s begin!
The scenario was that of running expensive credit-checks for loan applicants. The ‘Risk’ team wanted easy access to the results, both for processing on-the-fly as part of the loan application decision process, and also for future analysis to see if they could identify markers for successful repayment. Credit-checks like the ones my client used are expensive, and can return hundreds of varying fields in complex data structures and the general thinking about the process was something like this:
- There are about 30 key fields we regularly retrieve (but not quite always) and we would like access to those fields to be easy;
- We may need to add more important fields to this list of easy-access fields in future, so we don’t want to commit to a fixed structure – ‘that would be wasteful’;
- We have to store the full result set for later analysis as XML, as we paid a lot for it and it may prove to have useful fields deep in the belly which we may need to access one day.
While I am writing this in 2017, the environment I am referring to was at a client from about 2008 – 2010, at a time when the client was using SQL Server 2005. While later versions of SQL Server have introduced functionality that might have helped us here (such as Sparse Columns) they were not available to us then.
I was reminiscing over my last two main roles (which together account for 10 years!) and I reminded myself of some of the challenges that I faced in that time. In two separate businesses I found coding styles and system-design scenarios that seemed to have been borne from some nugget of wisdom that seemed to me to have been miunderstood, and had led to code that had huge problems. As ever, I do not want to pretend that the lessons I learnt from these scenarios can always be applied; I imagine that in-general my conclusion will be something like ‘your mileage may vary’… but I hope that I can at least demonstrate some harm came about from these misapprehension in the given scenario.
Today I want to discuss the idea that ‘SELECT *’ is bad; and what I really mean in this context is that ‘selecting all the data from a record in a database table is wasteful‘… with a possible additional proviso added ‘…if you don’t need it all‘. To try and put that into a wider context; the ideal some people have in mind is that we should only retrieve data from a database if we really need to use it. It is probably also appropriate to mention that the environment I am referring to was one in which only very simple Object Relational Mapping (ORM) was in-use, meaning that selection of records was generally hand-written and mostly referred to a single table at a time, and not companies with huge volumes of database use.
I believe that this view leaves out several key factors; which will hopefully be obvious once described: Continue reading
Sometimes the success or failure of a new system function can end up depending on one feature; and if that feature or function is missing the whole reason for the system existing can be undermined.
This system we are looking at today is one which can track vehicles, and one feature is the ability to plot journeys on a map in a web-browser; it’s actually very cool, but the plotting of the route was a little slow. Continue reading
We made a mistake recently, breaking one of our own rules; Be Consistent. Now, of course it is not always possible to ‘be consistent’, sometimes because you are doing something truly new; but often because one incorrectly sees differences – when you may be better off seeing patterns and similarities (and thus implementing something to fit an existing pattern)!
What is Premature Simplification – other than ‘Allowing UI Display Formats to Drive Data Storage Formats’? It will be easiest to start with an example: A company receives records from many devices, and decides that the end-user of the system web site will never want to view detail at finer grain than 1 second… so they decide that all time formats should be stored without milliseconds (or ticks) – that is to say, timing data is rounded or truncated to seconds. Continue reading
Many modern development tools are providing ways to create databases and populate them with test data; often with the idea that unit tests can then be run against them. But there is an alternative approach available to some people; which is to use live data as a source for our test environments. Now, there may be reasons why this is not possible (not the least of which is ‘compliance’), and there are certainly issues of practicality that will need to be considered, but if you are allowed to do this there can be huge benefits.
Last time we discussed some general ways to write SQL that made it easier to use it interactively. Today I want to talk about an error I often see with general SQL, especially the use of commenting the
NOT EXISTS clause.
This post is about some top tips and tricks to use when writing and modifying SQL. At first sight they may seem a little bit strange, but hopefully after explanation they will make sense. Let’s take a look at the first bit of SQL:
, TeamName -- (2)
--, BigXmlCol -- (3)
FROM dbo.SomeTable -- (4)
WHERE 1 = 1 -- (5)
AND TeamName LIKE 'Blah%'
--AND EmployeeName LIKE 'Nigel%' -- (6)
ORDER BY TeamName
--ORDER BY EmployeeName