Understanding the fundamentals of high-quality code
Code that is modular and flexible, and whose dependencies are well-managed, is said to be highly-cohesive and loosely-coupled. These terms are mostly used in object-oriented environments (more about these in Chapter 8, Object-Oriented System to Track Cryptocurrencies), but they apply to any system. Highly-cohesive means that things that are supposed to be together are together. Loosely-coupled means that things that are not supposed to be together are kept apart. The following image shows these characteristics, where each circle can be a function or, more generally, an object. These are the basics of dependency management. Many books on these topics have been, and continue to be, published. For the interested reader, Steve McConnell's Code Complete (Microsoft Press, 2004) and Robert Martin's Clean Code (Prentice Hall, 2009) are excellent references. In this book, you'll see some of these techniques applied.
High cohesion and low coupling (left) vs Low cohesion and high coupling (right)
The most important principles for high-quality code are:
- Make things small and focused on a single responsibility.
- Make the concrete depend on the abstract (not vice versa).
- Make things that are highly-cohesive and loosely-coupled.
By things, I mean functions, methods, classes, and objects in general. We'll touch more on what these are in Chapter 8, Object-Oriented System to Track Cryptocurrencies.
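To make the idea of coupling concrete, here is a minimal sketch. The function names, the `tax_rate` variable, and the numbers are hypothetical and used only for illustration. The first function is coupled to a global variable it never receives explicitly, while the second gets everything it needs through its arguments.

```r
# Tightly coupled (hypothetical example): the function silently depends on
# a global variable, so it only works if `tax_rate` exists somewhere else.
tax_rate <- 0.16
add_tax_coupled <- function(prices) {
    return(prices * (1 + tax_rate))
}

# Loosely coupled: everything the function needs arrives as an argument.
add_tax <- function(prices, rate) {
    return(prices * (1 + rate))
}

prices <- c(100, 200)
coupled_before <- add_tax_coupled(prices)

# Changing the global changes the coupled function's result ...
tax_rate <- 0.10
coupled_after <- add_tax_coupled(prices)

# ... while the decoupled function only depends on what we pass to it.
decoupled <- add_tax(prices, rate = 0.16)
```

The coupled version returns different results for the same input depending on code that lives elsewhere, which is the kind of hidden dependency we want to avoid.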
We start by creating two files: functions.R and main.R. The functions.R file contains high-level functions (mainly called from the main.R file) as well as low-level functions (used within other functions). By reading the main.R file, we should have a clear idea of what the analysis does (this is the purpose of the high-level functions), and executing it should re-create our analysis for any data that fits our base assumptions (for this example, these are mainly data structures).
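As a rough sketch of this layout, the two files could look something like the following. Both are shown in a single block so that it runs as is. The function names and the tiny data frame are assumptions made for illustration, not the actual code developed in this chapter.

```r
# ---- functions.R (hypothetical contents) ----

# Low-level function: only used within other functions.
clean_column_names <- function(data) {
    names(data) <- tolower(gsub("[^A-Za-z0-9]+", "_", names(data)))
    return(data)
}

# High-level function: called from main.R.
prepare_data <- function(data) {
    data <- clean_column_names(data)
    data <- data[complete.cases(data), , drop = FALSE]
    return(data)
}

# High-level function: called from main.R.
summarize_numeric <- function(data) {
    numeric_columns <- vapply(data, is.numeric, logical(1))
    return(colMeans(data[, numeric_columns, drop = FALSE]))
}

# ---- main.R (hypothetical contents) ----

# With two separate files, main.R would start with: source("functions.R")
raw_data <- data.frame(
    "Price USD" = c(10, NA, 30),
    "Volume" = c(1, 2, 3),
    check.names = FALSE
)
data <- prepare_data(raw_data)
averages <- summarize_numeric(data)
```

Reading only the main.R part tells us what the analysis does (prepare the data, then summarize it) without showing how column names are cleaned or how missing values are handled.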
We should always keep related code at the same level of abstraction. This means that we don't want big-picture code mixed with implementation details, and separating our code into main.R and functions.R is a first step in this direction. Furthermore, none of the code in the main.R file should depend on details of the implementation. This makes it modular in the sense that if we want to change the way something is implemented, we can do so without having to change the high-level code. However, the way we implement things depends on what we want the analysis to ultimately do, which means that concrete implementations should depend on the abstract functions, which in turn depend on our analysis's purpose (stated as code in the main.R file).
When we bring knowledge from one set of code to another, we're generating a dependency, because the code that knows about other code depends on it to function properly. We want to avoid these dependencies as much as possible, and most importantly, we want to manage their direction. As stated before, the abstract should not depend on the concrete, or put another way, the concrete should depend on the abstract. Since the analysis (main.R) is on the abstract side, it should not depend on the implementation details of the concrete functions. But how can our analysis be performed without knowledge of the functions that implement it? Well, it can't. That's why we need an intermediary: the abstract functions. These functions are there to provide stable knowledge to main.R and guarantee that the analysis it's looking for will be performed, and they remove the dependency of main.R on the implementation details by managing that knowledge themselves. This may seem a convoluted way of working and a tricky concept to grasp, but once you grasp it, you'll find that it's very simple, and you'll be able to create code that is pluggable, which is a big efficiency boost. You may want to take a look at the books referenced previously to get a deeper sense of these concepts.
General code structure
The previous diagram shows that our analysis depends on the abstract functions (interfaces), and so does the concrete code that implements those interfaces. These abstract functions let us invert the dependency between the concrete functions and the analysis. We'll go deeper into these concepts in Chapter 8, Object-Oriented System to Track Cryptocurrencies.
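One possible way to picture the role of an abstract function in plain R is sketched below. All names here are hypothetical, and passing the concrete function as an argument is just one of several ways to achieve this (S3 generics are another). The analysis only calls `measure_center()`, which decides what counts as a valid result and delegates the actual work to whichever concrete function is plugged in.

```r
# Concrete functions (hypothetical): implementation details live here.
center_with_mean <- function(values) {
    return(mean(values))
}

center_with_median <- function(values) {
    return(median(values))
}

# Abstract function (hypothetical): the stable piece the analysis relies on.
# It states what any implementation must deliver (a single number) and
# delegates the work to the concrete function it is given.
measure_center <- function(values, implementation = center_with_mean) {
    result <- implementation(values)
    stopifnot(is.numeric(result), length(result) == 1)
    return(result)
}

# Analysis code (the main.R side): it never looks inside the concrete
# functions, so they can be swapped without changing this code's structure.
values <- c(1, 2, 3, 100)
default_center <- measure_center(values)
robust_center <- measure_center(values, center_with_median)
```

Because both the analysis and the concrete functions agree only on what `measure_center()` expects, a new implementation can be plugged in as long as it returns a single number.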