This post by Martin Fowler quotes Phil Karlton:
There are only two hard things in Computer Science: cache invalidation and naming things
(and Martin adds the derivative quote: ‘there are two hard things in computer science: cache invalidation, naming things, and off-by-one errors’) which is quite nice. Today’s post was originally about naming things being hard… but I think I can extend it to something close to the two topics in the quote.
Naming Things is Hard
Even beyond all the usual conversations about bad naming styles – like Hungarian notation (the bad kind), including the data type in the name, and probably fifty other things we could all argue about – I think one of the toughest problems coders have is succinctly describing what a method or function does in its name. I have certainly had the same problem, and I believe I used to overcome it through forgetfulness and tenacity: forgetfulness in that I worked long hours in two roles at once, so I would forget the details of something I had coded yesterday… and tenacity in that I would recognise this problem and rename things when I realised I was having a hard time recalling (or finding) them.
The idea I would like to convey about naming today is that of naming things appropriately for the layer of code you are working at (even if it is only a metaphorical layer that no-one has actually implemented separately in the code):
- At the coal-face of the data-access layer (if it exists), or simply close to the database, I find it very helpful to give methods names that describe the data action: CustomerInsert for a static method, or perhaps Customer.Insert if you are using the Active Record pattern or similar. Of course, more and more people are using ORMs like Gentle and Entity Framework, where one does not ‘persist’ an individual record (you tell your session or context to save itself) – but if you have to write your own database methods, I’d suggest naming them in the language of the database;
- In the ‘business layer’, if there is one (or closer to the UI, if you prefer to put it that way), methods should be named in terms of the business rules or processes that the code carries out. I suspect people find this surprisingly hard (or they simply do not try), and looking at code retrospectively, as a new team member does, may not be the fairest filter, since other developers add and change things as time passes.
In one recent example we found a set of routines called FinishUnfinishedProcess (or something like that), which in the client made perfect sense from a business perspective – but unfortunately some of those methods had accrued (or simply been written with) code that was not about finishing an unfinished / unterminated process, and instead related to other things that might be done around that time. This naming approach filtered down to the functions that inserted records into the database… and they were still named ‘blahUnterminatedProcess’. The issue (from a naming perspective) was that this method was fixing the business problem ‘we have things that are incomplete’ by inserting the resolving ‘complete process’ record into the database! So it appeared from its name to be about some sort of Unfinished Process, when in fact all it did was insert a Finished Process into the database. Perhaps this is an indication of top-down development driving naming… but if you go too far down this path you will probably not reuse code as much as you could, because your general methods (to construct an object from x, or to save an object to the database) appear to be more specialised than they really are.
Perhaps another way of saying this is that methods in libraries should be given names having thought about what function or ‘service’ the method is offering to the caller.
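As a rough illustration of naming by layer, here is a minimal Python sketch. Everything in it is hypothetical: the process_record table, the function names and the 'COMPLETED' status value are invented for the example and are not taken from the system described above. The data-access function is named for what it does to the database, and the business function is named for the rule it fulfils.

```python
import sqlite3


# Data-access layer: the name speaks the language of the database.
def process_record_insert(conn, process_id, status):
    """Insert one row into the (hypothetical) process_record table."""
    conn.execute(
        "INSERT INTO process_record (process_id, status) VALUES (?, ?)",
        (process_id, status),
    )


# Business layer: the name speaks the language of the business rule.
def finish_unfinished_process(conn, process_id):
    """Resolve an incomplete process by recording its completion."""
    process_record_insert(conn, process_id, "COMPLETED")


# Small demonstration against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE process_record (process_id INTEGER, status TEXT)")
finish_unfinished_process(conn, 42)
rows = conn.execute("SELECT process_id, status FROM process_record").fetchall()
```

Because the insert function is named for the data action alone, any other business routine that needs to write a process record can reuse it without its name being misleading.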
Caches Are Hard
The original quote referred to cache invalidation, but we have now seen at two clients in a row that caches are themselves hard – if you create ones that you have to maintain yourself. In both cases people attempted to optimise some aspect of the code, and the fundamental reason was that they did not know how to make the database run fast… or how to design their original data structures so that they could be used fast. The result has been people designing tables that they will ‘simply’ keep updated as things change; in one example that was a table that marked progress through an application process, and now we work with an example whereby the travel / work status of an employee is tracked and maintained on a single table (which translates into how that employee’s icon will appear in a website). One puzzling aspect of both these examples is that the cache has – in a sense – been designed uniquely for this application, and yet the developers have placed the cache at the slowest point in the system (which is to say, on a transaction-aware database a network hop away). Why did neither develop an in-memory cache of objects that could prime itself from the database?
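To make that last question concrete, here is a minimal sketch of an in-memory cache that primes itself from the database. The employee_status table, the EmployeeStatusCache class and the status values are all assumptions made for illustration. It also assumes a single process owns every update; with several writers, invalidation becomes a real problem again.

```python
import sqlite3


class EmployeeStatusCache:
    """In-memory copy of employee status, loaded from the database."""

    def __init__(self, conn):
        self._conn = conn
        self._status = {}

    def prime(self):
        """Load every status row into memory in a single query."""
        rows = self._conn.execute(
            "SELECT employee_id, status FROM employee_status"
        )
        self._status = {employee_id: status for employee_id, status in rows}

    def get(self, employee_id):
        """Read from memory only; returns None for an unknown employee."""
        return self._status.get(employee_id)

    def update(self, employee_id, status):
        """Write through: update the source table, then the in-memory copy."""
        self._conn.execute(
            "UPDATE employee_status SET status = ? WHERE employee_id = ?",
            (status, employee_id),
        )
        self._status[employee_id] = status


# Small demonstration against an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_status (employee_id INTEGER PRIMARY KEY, status TEXT)"
)
conn.executemany(
    "INSERT INTO employee_status (employee_id, status) VALUES (?, ?)",
    [(1, "TRAVELLING"), (2, "AT_WORK")],
)
cache = EmployeeStatusCache(conn)
cache.prime()
cache.update(2, "ON_LEAVE")
```

Reads never touch the database after priming, and the only table being written is the one that already held the status, so there are no extra ‘cache’ tables to keep in step.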
In both cases, (relatively) large amounts of database time are spent updating these cache tables – and in some cases a single inbound message that requires a single update to some status field or similar is then followed by several updates to the various ‘cache’ tables. And that’s to say nothing of the coding time that will be absorbed managing these extra copies of duplicate data. In the former case, a couple of indexes and some time spent on the queries made the relevant queries fast enough (but by then the ‘cache’ code was baked into the system and close to impossible to remove). In the latter case, the development team did not even wait to have a performance problem – they assumed they would have one.
It seems that developers find it relatively easy to optimise prematurely! And when I say optimise I mean ‘do things that they think will make something faster’ without taking the time to verify that those things actually do make it faster.
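As a sketch of the ‘couple of indexes’ route, the following uses SQLite to check whether a query scans the whole table or uses an index. The application_event table, its columns and the index name are invented for the example, and the exact wording of the plan text varies between SQLite versions, so treat the printed output as indicative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE application_event ("
    "id INTEGER PRIMARY KEY, application_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO application_event (application_id, status) VALUES (?, ?)",
    [(i % 100, "STEP") for i in range(1000)],
)

QUERY = "SELECT status FROM application_event WHERE application_id = ?"


def query_plan(conn, sql, params):
    """Return SQLite's plan for a query as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Without an index the plan is a full table scan.
plan_before = query_plan(conn, QUERY, (7,))

# After adding an index on the filter column, the plan searches the index.
conn.execute(
    "CREATE INDEX idx_event_application ON application_event (application_id)"
)
plan_after = query_plan(conn, QUERY, (7,))

print(plan_before)
print(plan_after)
```

Checking the plan (and then timing the real query on realistic data) is a cheap way to find out whether the database is already fast enough before building anything that has to be kept in sync.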