Reflections About the Reality of a Codebase

It seems to me that the multicultural, multi-belief global society we find ourselves in has much more in common with the polytheistic societies of old than with a purely secular, scientific framework.

Even if the upper strata of society operate in this manner, it is a worldview restricted to a certain intellectual elite, and even with all their frameworks, models and techniques, they too fall into belief and dogmatic decision making.

Dogmatism occurs when we base our understanding of reality on positions outside of the immediate data we receive to build up that reality. We somewhat fail objective truth when we decide to delegate our thinking to a book, be it the Bible, Das Kapital, a primer on physics or Wikipedia.

Sometimes we have no other choice. The hard truth is that most of the assumptions and information we need to function in the modern world are backed by the authority of the small set of people who control the main narrative: a large apparatus composed of media, academics and the so-called experts.

The propagation of so-called fake news and conspiracy theories has its source in this fact. Independent researchers seldom have access to the raw data of a research project, just the conclusions written up in a white paper.

Do not get me wrong: some white papers are really well written and documented, and the private nature of raw data is mostly due to the legal restrictions of the research institution and the impracticality of sharing large datasets.

We see this issue in software engineering when we take documentation as the ultimate source of truth. We work under the assumption that there is a direct mapping between the requirements and specifications of the project and the underlying reality of whatever our compilers and interpreters produce. Most of us put our trust in our compilers, as Ken Thompson pointed out in Reflections on Trusting Trust.

In reality there is only one artifact from which all narratives of inner workings and mechanics derive, which is the compiled code itself. If your code is deterministic and has a consistent state, it will only map to a single narrative of execution.

However, most documentation is written according to what the original designers wanted the system to do, or how they intended it to be used.

Over the years I have worked on several legacy projects that required a certain level of analysis due to an absolute lack of documentation. On these kinds of projects the documentation was the code itself, and as an engineer you were expected to extract enough of its meaning to make the right extensions.

Documentation is messy; in a fast-paced organization it needs to be kept general, or to simply be an explanation of the architecture.

If you refactor your codebase, which you should in order to manage complexity as it evolves, the narrative of your code might change and produce a mismatch between documentation and code.

Documentation becomes an artifact to be maintained in the same way that code is. It becomes a liability. Some documentation tools, such as Doxygen, understand this necessary coupling between code and documentation, especially when documenting APIs, and render the documentation from comments in the codebase.
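As a minimal sketch of what that coupling looks like, here is a hypothetical C++ function annotated with Doxygen comment tags; the function and its parameters are invented for illustration:

```cpp
#include <cstddef>
#include <vector>

/**
 * @brief Compute the moving average over the most recent samples.
 *
 * @param samples Buffer of raw readings, oldest first.
 * @param window  Number of trailing samples to average over.
 * @return Arithmetic mean of the last `window` samples,
 *         or 0.0 if fewer than `window` samples are available.
 */
double movingAverage(const std::vector<double>& samples, std::size_t window) {
    if (window == 0 || samples.size() < window) {
        return 0.0;
    }
    double sum = 0.0;
    for (std::size_t i = samples.size() - window; i < samples.size(); ++i) {
        sum += samples[i];
    }
    return sum / static_cast<double>(window);
}
```

Running the `doxygen` tool over a codebase annotated like this regenerates the reference documentation straight from the source, so the two cannot silently drift apart the way a separate wiki page can.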

This is why I am a supporter of the idea of writing code as documentation, which expects a codebase to be easily readable. Another conclusion that can be drawn here is that comments should be minimized to be as useful and essential as possible.
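To make the idea concrete, here is a minimal sketch, with entirely hypothetical names, of the same check written twice: once leaning on a comment, once letting the code document itself:

```cpp
// Comment-dependent version: the reader needs the comment to recover the intent.
// Checks whether the user is old enough and has accepted the terms of service.
bool check(int a, bool t) {
    return a >= 18 && t;
}

// Self-documenting version: the signature itself carries the narrative,
// and no comment is needed to explain what is going on.
bool canCreateAccount(int ageInYears, bool hasAcceptedTermsOfService) {
    constexpr int kMinimumAgeInYears = 18;
    return ageInYears >= kMinimumAgeInYears && hasAcceptedTermsOfService;
}
```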

Probably with the use of tools such as ChatGPT we will find new ways to manage knowledge about the inner workings of our code without falling into dogmatic assumptions, although the cynic in me says that such a model just inherits the assumptions of the data it has been trained on, so they will simply be propagated into its output and internal logic.

Aligning documentation and code is good, but aligning documentation, code and binaries is better. Although most widespread tools are fairly reliable, some of them have behaviour that is simply not documented nor easily inferred by just analyzing the codebase.

This is where knowing how to use a debugger comes in handy, even if that debugger is the one integrated into your IDE. The specific tools will mostly depend on the domain and technologies you are working with, but a good baseline is to at least be aware of this chart of Linux Performance Tools.

(Figure: Linux Observability Chart)
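As a minimal sketch of that habit, assuming gdb and a hypothetical inspect.cpp, this is all it takes to start interrogating the binary itself rather than its documentation:

```cpp
// Hypothetical file inspect.cpp; build with debug symbols and no optimization:
//   g++ -g -O0 inspect.cpp -o inspect
#include <iostream>
#include <vector>

int main() {
    std::vector<int> readings = {3, 1, 4, 1, 5};
    int total = 0;
    for (int r : readings) {
        total += r;
    }
    // In gdb: `break main`, `run`, then `next` through the loop and
    // `print total` to see the value the binary actually computed,
    // independently of what any document claims it should be.
    std::cout << total << '\n';
    return 0;
}
```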

After these reflections I hope to have lit some bulbs out there and provided some clarity to someone.

These days I am taking it somewhat easy and have spent some time away from the computer, or at least more than I used to, but I want to publish articles about more concrete technical work and experiments.

Stay tuned for it.

whoami

Jaime Romero is a software developer and cybersecurity expert operating in Western Europe.