Programming: Subtleties and Complexities
Why do applications become unmaintainable juggernauts? How does a simple one-line hello-world program become a million-line monster? What makes code complex?
As with many human endeavours, it’s a glacial phenomenon. Complexity creeps into a codebase gradually and without warning; it doesn’t suddenly happen overnight. There are many contributing factors, and none of them is individually complex. It’s the sum of these factors that makes code complex.
One source of this complexity is requirements. Requirements come and go, but the code is always there. As requirements change, the codebase is remodelled to do things that it wasn’t originally intended to do. Code develops a history that never goes away and will always have an influence on the present.
On a fundamental level, it comes as no surprise that coders have developed certain mantras to stave off complexity. Don’t Repeat Yourself (DRY) means reusing code instead of copying and pasting it. Keep It Simple, Stupid (KISS) points out the need to deal with today’s problem and not to make assumptions about what tomorrow will bring.
However, what I want to talk about is a far more cunning form of complexity. It is more cunning in the sense that it builds on individuals’ knowledge that is hard to document or codify. This knowledge tends to be based on subconscious assumptions about the problem domain and feature requirements.
What follows are five examples of what I call “code subtleties”. These are things I experienced myself, so your mileage might well vary! I’m going to use Ruby, Python, Elixir and Node.js to illustrate what I have experienced. I hope the examples are clear enough that the code is illustrative and not fundamental to the explanation.
Code Flow
Code flow is the order in which things happen. This can be very obvious, especially when the code runs serially, but it can become very complex when code is executed in parallel or triggered by specific events.
It is easy to make subconscious assumptions about the order in which things happen. Take the following Python code, where I’m initialising a cfg property from data stored in a Redis database. (The self prefix in Python just refers to the current object.) The starting point is the following:
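    from collections import defaultdict

    class Topic(object):
        def __init__(self, name, path, hlpr, klag):
            ...
            self.redis = hlpr.redis_for_topic(name)
            # Reads the configuration from Redis immediately, at construction time.
            self.cfg = defaultdict(lambda: None, self.redis.get_cfg())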
The assumption here is that Redis will be available when a Topic object is initialised, and this is indeed the case ninety percent of the time. Unfortunately, this implicit assumption is false whenever Redis is unavailable.
So the code was changed to lazily initialise the cfg property, meaning that the property is only initialised the first time it’s required. This introduces a new object property, _cfg, for storing the configuration once it has been loaded:
    class Topic(object):
        def __init__(self, name, path, hlpr, klag):
            ...
            self.redisdb = hlpr.redis_for_topic(name)
            self._cfg = None

        @property
        def cfg(self):
            if self._cfg is None:
                self._cfg = defaultdict(lambda: None, self.redisdb.get_cfg())
            return self._cfg
OK, this now works for the initialisation of the Topic object. However, it still assumes that by the time the cfg property is accessed, the Redis instance is initialised and running.
How to document this subtlety? Sure, I could write a comment in the code. I could also add a try/except around the initialisation of the cfg property, but the caller wants a dictionary. What should I return if there is no Redis? Returning an empty dictionary would imply that there is no configuration, which certainly isn’t correct.
Should the entire code fail or can it somehow do without Redis for a moment? Should Redis be started here in the code?
Here the subtlety is related to the flow of the code. Why Redis becomes available between the initialisation of the Topic class and the first use of cfg is explained by the code flow.
The codebase is such that between the initialisation of the Topic object and the first access to the cfg property, there is in fact a parallel step that starts a Redis Docker container. In addition, the get_cfg() method on the redis property actually does a backoff retry, so it can handle a bad network state or a Redis instance that hasn’t completely started.
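To make the idea of a backoff retry concrete, here is a minimal sketch in Python. It is not the helper from the codebase described here: the names get_with_backoff and flaky_get_cfg, the number of attempts and the delays are all assumptions made for illustration.

```python
import time


def get_with_backoff(fetch, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return fetch()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)


# Hypothetical stand-in for a Redis instance that is still starting up:
# it fails twice and then returns a configuration.
calls = {"n": 0}


def flaky_get_cfg():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("redis not ready")
    return {"retention": "7d"}


# Record the delays instead of really sleeping, so the example runs instantly.
delays = []
cfg = get_with_backoff(flaky_get_cfg, sleep=delays.append)
```

Passing the sleep function in as a parameter is a design choice that keeps the retry behaviour testable without waiting for real time to pass.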
When reading this code without knowing the code flow, I would assume that the lazy initialisation has something to do with speed and storage requirements. So without knowing the real reason for this, a refactoring might remove the lazy initialisation and with it, introduce a bug.
This code fragility needs to be documented or clarified by tests. Unfortunately external factors led to this situation. If the initialisation of the Redis instance should change, then the documentation might well be out-of-date.
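One way such a test could look is sketched below. The Topic class is a trimmed-down copy of the lazy version (only name and hlpr are kept), and FakeRedis and FakeHelper are hypothetical test doubles invented for this sketch, not part of the real codebase.

```python
from collections import defaultdict


class Topic:
    """Trimmed-down copy of the lazy version, for illustration only."""

    def __init__(self, name, hlpr):
        self.redisdb = hlpr.redis_for_topic(name)
        self._cfg = None

    @property
    def cfg(self):
        if self._cfg is None:
            self._cfg = defaultdict(lambda: None, self.redisdb.get_cfg())
        return self._cfg


class FakeRedis:
    """Hypothetical test double that counts how often get_cfg() is called."""

    def __init__(self):
        self.get_cfg_calls = 0

    def get_cfg(self):
        self.get_cfg_calls += 1
        return {"partitions": 3}


class FakeHelper:
    """Hypothetical helper that always hands out the same FakeRedis."""

    def __init__(self):
        self.redis = FakeRedis()

    def redis_for_topic(self, name):
        return self.redis


def test_constructing_a_topic_does_not_touch_redis():
    # Redis may still be starting when Topic objects are created.
    hlpr = FakeHelper()
    Topic("orders", hlpr)
    assert hlpr.redis.get_cfg_calls == 0


def test_cfg_is_fetched_once_on_first_access():
    hlpr = FakeHelper()
    topic = Topic("orders", hlpr)
    assert topic.cfg["partitions"] == 3
    assert topic.cfg["missing"] is None  # defaultdict fallback
    assert hlpr.redis.get_cfg_calls == 1
```

The first test is the one that protects against the refactoring described above: if someone moves the Redis call back into the constructor, it fails and its name explains why.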
External dependencies often cause code subtleties and are often the source of error handling in the codebase. How external dependencies are handled depends on the context of the code: it might well be fine to ignore errors in accessing a dependency (as with the statsd dependency in the next example), or such errors might be the cause of a complete failure.
Edge Cases
Edge cases, or boundary conditions, are situations that occur rarely but still need to be dealt with. Examples of edge cases can be found in every try/catch block: usually the catch handles something that only occurs rarely but, when it occurs, needs to be dealt with. (Of course, this is only true if Exception Driven Development isn’t being practised.)
This time a Node.js example, incrementing the request.count.stored statsd counter by one (statsd being a statistics daemon):
    try {
      statsd.increment('request.count.stored', 1)
    } catch (e) {}
The catch block is empty, which means the overall application doesn’t care whether the increment fails. The only thing the code prevents is the failure of the entire application because of a statsd failure.
That’s the edge-case subtlety: we need to know that statsd is not vital for the workings of the codebase, so if it fails, we assume that our code will continue to work.
Of course, it could fail because there is no network, so no connection can be established to the statsd daemon. But does that need to be handled here? Should we be retrying this request on failure? Should we be checking for a bad network here?
The code explicitly answers all these questions with ‘no’. The assumption here is that if statsd fails, then either it’s a statsd-specific issue, in which case we can ignore the failure, or the root cause is so momentous that the codebase will handle it independently.
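The same assumption can be given a name so that a reader doesn’t have to infer it from an empty catch block. The following is a rough Python sketch of that idea; best_effort, BrokenStatsd and store_request are hypothetical names made up for illustration and do not come from the application above.

```python
import logging
from contextlib import contextmanager

log = logging.getLogger("metrics")


@contextmanager
def best_effort(what):
    """Run a non-vital step; log and swallow any ordinary exception."""
    try:
        yield
    except Exception as exc:
        log.debug("ignoring failure in %s: %s", what, exc)


class BrokenStatsd:
    """Hypothetical client whose daemon is unreachable."""

    def increment(self, name, value=1):
        raise ConnectionError("statsd unreachable")


def store_request(statsd):
    with best_effort("statsd increment"):
        statsd.increment("request.count.stored", 1)
    return "stored"  # the real work carries on regardless
```

The behaviour is the same as the empty catch, but the decision that metrics are not vital is now written down once, in one place.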
There are many other types of edge cases; sometimes the subtlety lies in just recognising an edge case as such.
The next example involves having a mental model of how data is represented in code. Simple types, such as integers and floats, are relatively easy to cope with; however, complex types, even strings, can become harder to think about.
Data Representation Knowledge
Data representation knowledge is knowing the contents or format of a specific piece of data. As an example, take the representation of a request within a string and the assumptions made based on that. This time it is Elixir code that takes a string and splits it into whitespace-separated parts. But the string is only valid if it has four parts:
    @spec msgs_to_commands([KafkaEx.Protocol.Fetch.Message.t()],
                           [Redix.command()]) :: [Redix.command()]
    def msgs_to_commands([msg | msgs], cmds) do
      case String.split(to_string(msg.value), ~r/\s+/) do
        [_, _, md, py] -> msgs_to_commands(msgs, [md <> " " <> py | cmds])
        _ -> msgs_to_commands(msgs, cmds)
      end
    end
(Elixir does variable assignment via pattern matching, so the case statement is actually assigning the md and py variables when matching on [_, _, md, py].)
The subtlety here is the format of msg.value. In this case, value is a space-separated string of the form path retry_count metadata payload. It makes sense that the variables are named md and py, representing the third and fourth parts of the string respectively. Since the overall string represents an HTTP request, metadata is the header information and payload is the data being sent by the request.
Content knowledge can, as in this case, be made clear by adding some unit tests that demonstrate what is valid input for this function. However, the deeper (or finer) subtlety is that somewhere, somehow, a new developer needs to know that messages that go into the overall system have to have this form, i.e., whitespace-separated strings with four elements. (The above code is only a small part of a larger system for handling HTTP requests.)
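As a sketch of how the format could be pinned down in code and tests, here is a small Python version of the same idea. It is not a translation of the Elixir function: Request and parse_message are hypothetical names, and the field names are simply taken from the path retry_count metadata payload description.

```python
from typing import NamedTuple, Optional


class Request(NamedTuple):
    """The four whitespace-separated parts of a message, by name."""

    path: str
    retry_count: str
    metadata: str
    payload: str


def parse_message(value: str) -> Optional[Request]:
    """Return the four named parts, or None if the message is malformed."""
    parts = value.split()  # splits on runs of whitespace
    if len(parts) != 4:
        return None
    return Request(*parts)


def test_valid_message_has_exactly_four_parts():
    parsed = parse_message("/orders 0 meta body")
    assert parsed == Request("/orders", "0", "meta", "body")


def test_messages_with_other_lengths_are_rejected():
    assert parse_message("/orders 0 meta") is None
    assert parse_message("/orders 0 meta body extra") is None
```

Naming the fields moves the knowledge about the format out of the developer’s head and into one definition that tests and callers can share.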
Of course, this becomes part of the documentation of the system, but what happens when the format changes? In how many places is this documented? And where else is this assumption made?
A second subtlety here is the type of msg and the fact that it has a value attribute. This is clarified by having a @spec that defines the required type to be a KafkaEx.Protocol.Fetch.Message. In a statically typed language, the compiler would enforce this. Elixir is dynamically typed, but typespecs such as this one can be checked by static analysis tools such as Dialyzer.
The next example involves the subtle differences between programming languages, which can be a pitfall when mixing languages within a project.
Language Subtleties
Not all programming languages work the same. How strings are concatenated differs, how operators are overloaded differs, and so on. Generally these differences are obvious because they show up in the syntax. The subtleties start when different behaviour hides behind the same syntax.
For example, new_value = other_value or 1, the if/else shortcut. In Ruby, if other_value is zero, new_value would be 0. In Python, if other_value is zero, new_value would be 1. The statement is valid in both languages; they just have differing assumptions about what counts as false. Zero is truthy in Ruby and falsy in Python. (In Ruby, or also binds more loosely than =, so the usual spelling there is other_value || 1; the result for zero is the same.)
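Other languages split along the same line: JavaScript and C treat zero as false, whereas Elixir and Lua, like Ruby, treat it as true.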
So there is no clear right or wrong here, since in isolation it makes sense for zero to be false or for zero to be true.
To remove this subtlety, the code needs to be made more verbose. For example, simulating Ruby behaviour in Python:
new_value = other_value if (other_value == 0 or other_value) else 1
Simulating Python behaviour in Ruby:
new_value = other_value == 0 ? 1 : (other_value ? other_value : 1)
Of course, these aren’t the only ways to do this; however, all solutions trade the elegance of x || y for a gain in clarity. Unfortunately, less elegance means more verbosity, which means a larger codebase. C’est la vie, as they say.
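Another way to spend that verbosity is a small helper that states which values count as “missing”. This is only a sketch; or_default and its treat_zero_as_missing flag are hypothetical names chosen for illustration.

```python
def or_default(value, default, treat_zero_as_missing=False):
    """Pick a default explicitly instead of relying on truthiness."""
    if value is None:
        return default
    if treat_zero_as_missing and value == 0:
        return default
    return value


# Plain Python truthiness: zero is replaced by the default.
python_style = 0 or 1

# Explicit helper: zero is kept unless the caller asks otherwise.
kept_zero = or_default(0, 1)
replaced_zero = or_default(0, 1, treat_zero_as_missing=True)
filled_none = or_default(None, 1)
```

The call site is longer than x or y, but it no longer depends on the reader knowing which language’s rules apply.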
The last example involves magic numbers and is another example of how removing subtleties leads to more code.
Magic Numbers
Magic numbers are a classic example of subtlety in code. Using them instead of named constants can make code unnecessarily hard to understand, with little benefit. One argument made in their favour is that interpreted programming languages are spared an extra lookup.
A contrived example is the following pseudo code:
circumference = radius * 2 * 3.14
So without knowledge of the formula for the circumference, this code tells us that twice the radius multiplied by 3.14 is the circumference. But what do the magic numbers represent? (This could be made even more opaque by using 6.28 instead of 2 * 3.14.)
    PI = 3.14

    funct circumference(radius) {
      return 2 * radius * PI
    }
Ok, now we know that the constant Pi has the value 3.14 and it is used to compute the circumference. But what about the numeral two? Without domain knowledge, it’s a magic number that seems vital to the solution.
So let’s make it clear what the two means:
    PI = 3.14
    RADIUS_TO_DIAMETER_FACTOR = 2

    funct to_diameter(radius) {
      return RADIUS_TO_DIAMETER_FACTOR * radius
    }

    funct circumference(radius) {
      return to_diameter(radius) * PI
    }
Now it’s clear that the diameter of a circle is twice the radius and that the circumference is the diameter multiplied by Pi. It’s just unfortunate that our codebase has swollen from basically one line of code, to two functions and two constants, all of which need maintenance and testing.
So removing subtleties from code can mean that an elegant solution becomes more verbose, introducing more code. On the other hand, removing subtleties clarifies the codebase, making assumptions obvious and transparent. Again, as with most human endeavours, it’s a trade-off between two opposing positions.
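One possible middle ground, sketched here in Python, is to take the constant from the standard library and put the domain knowledge into a docstring instead of into extra constants and functions. This is an illustration of the trade-off, not a recommendation for every case.

```python
import math


def circumference(radius: float) -> float:
    """Circumference of a circle: 2 * pi * r (the diameter times pi)."""
    return 2 * math.pi * radius
```

The numeral two is still there, but the docstring says what it means, and math.pi replaces the hand-typed 3.14.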
Dealing with Code Subtleties
There are plenty more subtleties in code; be aware of them. An important part of dealing with subtleties is realising that they exist. Even if they can’t easily be removed, be aware that they may well be a source of errors that come back to bite you.
One common thread through all subtleties is knowledge. Usually having enough knowledge about the problem domain, the system and the context of how the system was built helps to recognise subtleties. We all make subconscious assumptions based on our knowledge and mental models. When this is expressed in code, it leads to subtleties that are, for other developers, hard to understand since mental models differ.
So, one way around this is knowledge sharing. Open communication and a willingness to explain decision making are vital for clarifying code subtleties. However, it’s important to remember that the developer who wrote the code cannot know what other developers don’t understand. So the burden of knowledge sharing rests largely on the shoulders of the reader, not the writer. It’s up to the reader to ask questions and the writer to share their knowledge.
Code is not unlike a well written book: it has its plots, its characters and its mysteries. And like a good book, it shouldn’t have too many complex plot twists nor too many loose ends. Sometimes to understand a good book, we read between the lines. This is fine for a book but very bad when trying to understand code.
Avoid making wrong assumptions about what code does; they can cost a lot of wasted time. As always in life, if unsure, ask someone!