What is Premature Simplification – other than ‘Allowing UI Display Formats to Drive Data Storage Formats’? It will be easiest to start with an example: A company receives records from many devices, and decides that the end-user of the system web site will never want to view detail at finer grain than 1 second… so they decide that all time formats should be stored without milliseconds (or ticks) – that is to say, timing data is rounded or truncated to seconds.
Now, the way that dates and times are stored on most systems does not save space if you only want to store down to seconds; so there is no space saving. And when considering the display of dates and times the majority of formats hide milliseconds, the developers will not have any extra effort in displaying the data in the desired format, so there is no development-time saving. So what is the cost of a decision to only stores times down to a second then? Simply that the system will accrete code that assumes that is true, and that in turn will have costs down-the-line. Little things like assumptions that there will never be more than one message in any given second start to appear – generally invisibly – until the day comes when some device is required to send in more than one message; and then the bizarre consequences of that come in to play, where part of a system might be changed to add time to a message time a message with that time has already been ‘used’ and so on.
Another example might be a ‘status’ display. Imagine an HR system that wishes to conveniently display employee availability; perhaps team leads will most want to know staff that are available to work (perhaps on a phone-call right now) but they may also want to know about staff who are expected to be on-shift but have not logged-in yet. One can imagine an ‘Employee Status’ icon that might have graphical representations of the following states:
- Expected to be at work but not logged in;
- Unavailable (off-shift)
- On Holiday
- On a Phone Call
This status display attempts to simplify several employee and system attributes: the time now, employee shifts, employee leave, phone call status of employee, etc. In the UI this may make lots of sense, because designers do not want to bombard the user with useless information… and let’s face it; it is hardly relevant if the employee’s shift patterns indicate that they should be at work, if their leave records indicate that they should be on holiday, and you wouldn’t particularly want to indicate that they were ‘unable to take a call’ if they were not even on-shift!
The insidious problem, though, is that it is fairly common to see these types of xxxxxStatus fields stored in databases, and often as the only record of some particular set of attributes! That’s a problem because then you start to see code that tries to answer some question about one of the underlying attributes based on the current xxxxxStatus… or some new value for an attribute in its native form comes into the system, and suddenly the system has to try and decide what the correct new status should be based on the current status plus the new attribute value. e.g. Available + ‘Shift Ended message’ changes status to Unavailable (off shift)… or Expected to be at work but not logged in + ‘Log in message received’ changes status to Available. What we do not see here is that in many cases, lower priority statuses end up invisible, and therefore the data is hidden from view, and therefore it can not be reverse engineered from the status, but has to be recalculated, or frankly guessed at.
Another issue is that the display priorities of these statuses is sometimes open to interpretation and preference, but when the actual data is stored like that you bake-in a given priority at the time you stored the record. For our contrived example, there may be arguments over if it was more important to show that an employee was on leave, or if they were off-shift. With one priority you show them on-leave if they are on-leave (at all times of day) but then someone will complain that it isn’t even relevant to show them on-leave if their shift-pattern did not need them to be working. If you ever change those priorities, you will change what historic data in the database appears to indicate. You can easily imagine that someone else may argue the other way once it gets changed (they are never going to work at any time of day if they are on leave, so you may as well show the leave at the higher priority)!
Our simple advice is to try to avoid storing data that appears to coalesce two or more different attributes, and instead store the actual attributes! Computers work so incredibly fast that the simple decision-tree to decide what status to display on a UI could probably take microseconds once the data is loaded… and loading several (reasonably sized) columns really need not add much overhead to a database call compared to the network overhead of making the database call in the first place.
Once you have all these attributes to you available separately it becomes trivial to figure out what status to display because it just involves running the same code every time – some sort of GetStatusFromAttributes(attr1, attr2…) function. Of course some attributes may only be available from wrapper functions – like perhaps a method to decide if an employee’s shift patterns indicate that they should be working at this time.