Today I am continuing my reminiscence over system design problems I have addressed in the past, this time looking specifically at a data-structure a colleague of mine used in a particular situation, and the consequences of that decision. Let’s begin!
The scenario was that of running expensive credit-checks for loan applicants. The ‘Risk’ team wanted easy access to the results, both for processing on-the-fly as part of the loan application decision process, and also for future analysis to see if they could identify markers for successful repayment. Credit-checks like the ones my client used are expensive, and can return hundreds of varying fields in complex data structures and the general thinking about the process was something like this:
- There are about 30 key fields we regularly retrieve (but not quite always) and we would like access to those fields to be easy;
- We may need to add more important fields to this list of easy-access fields in future, so we don’t want to commit to a fixed structure – ‘that would be wasteful’;
- We have to store the full result set for later analysis as XML, as we paid a lot for it and it may prove to have useful fields deep in the belly which we may need to access one day.
While I am writing this in 2017, the environment I am referring to was at a client from about 2008 – 2010, at a time when the client was using SQL Server 2005. While later versions of SQL Server have introduced functionality that might have helped us here (such as Sparse Columns) they were not available to us then.
I was reminiscing over my last two main roles (which together account for 10 years!) and I reminded myself of some of the challenges that I faced in that time. In two separate businesses I found coding styles and system-design scenarios that seemed to have been borne from some nugget of wisdom that seemed to me to have been miunderstood, and had led to code that had huge problems. As ever, I do not want to pretend that the lessons I learnt from these scenarios can always be applied; I imagine that in-general my conclusion will be something like ‘your mileage may vary’… but I hope that I can at least demonstrate some harm came about from these misapprehension in the given scenario.
Today I want to discuss the idea that ‘SELECT *’ is bad; and what I really mean in this context is that ‘selecting all the data from a record in a database table is wasteful‘… with a possible additional proviso added ‘…if you don’t need it all‘. To try and put that into a wider context; the ideal some people have in mind is that we should only retrieve data from a database if we really need to use it. It is probably also appropriate to mention that the environment I am referring to was one in which only very simple Object Relational Mapping (ORM) was in-use, meaning that selection of records was generally hand-written and mostly referred to a single table at a time, and not companies with huge volumes of database use.
I believe that this view leaves out several key factors; which will hopefully be obvious once described: Continue reading
Sometimes the success or failure of a new system function can end up depending on one feature; and if that feature or function is missing the whole reason for the system existing can be undermined.
This system we are looking at today is one which can track vehicles, and one feature is the ability to plot journeys on a map in a web-browser; it’s actually very cool, but the plotting of the route was a little slow. Continue reading
What is Premature Simplification – other than ‘Allowing UI Display Formats to Drive Data Storage Formats’? It will be easiest to start with an example: A company receives records from many devices, and decides that the end-user of the system web site will never want to view detail at finer grain than 1 second… so they decide that all time formats should be stored without milliseconds (or ticks) – that is to say, timing data is rounded or truncated to seconds. Continue reading
This post by Martin Fowler quotes Phil Karlton:
There are only two hard things in Computer Science: cache invalidation and naming things
(and Martin adds the derivative quote: ‘there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors’) which is quite nice. Today’s post was originally about naming things being hard… but I think I can extend it to something close to the two topics in the quote. Continue reading
“Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?”
Kernighan and Plauger
There’s supposedly a very similar quote attributed just to Kernighan, but I chose this one mentioned in this blog which also gave me a date of 1974.
There are two assertions in this statement, and I am pretty sure that the first is not true any more given that we have modern visual debuggers provided by tools such as Visual Studio. Debugging can be even be fun! But I think it is certainly fair to say that advanced debugging on a platform like Windows can easily extend complexity greatly. Continue reading