Photo by Victória Kubiaki on Unsplash
Design Pattern: Software abstractions
The importance of software indirection
Tom is a very hardworking delivery man working for Ryder, a very large delivery company in Uganda. He was posted in the eastern part of the country and has been making deliveries there for 7 years now. By now, he knows the entire region and its neighborhoods like the back of his hand. The residents are very friendly, and they give him sizeable tips whenever he delivers to them. Tom has made lots of friends in the region, but his closest friends are the Mulindwa family, to whom he makes the most deliveries. On a busy week, he can make over 100 deliveries to the Mulindwas.
One day, he was given a package to deliver to the Mulindwas. He got into his delivery van and drove to their house. To his shock, he found the house empty. It looked like they had moved. "To where?" Tom wondered to himself. "And what should I do with the package now?" He tried asking around to find out where the Mulindwas had moved to, but it was all in vain. Not having any other choice, he took the package back to the company. More and more packages were given to him to deliver to the Mulindwas, but he could not deliver them. Whenever he returned them, the packages were simply placed in storage. With no one to claim them, the company later sold them.
There is a lesson to learn from Tom's relationship with the Mulindwa family, or with the Mulindwa residence in particular. Tom was so accustomed to the Mulindwa residence that he never thought they would ever move from that place. The Mulindwas also did not bother to inform Tom or the delivery company of their move and their new address, so the company could not locate them. The attachment between Tom and the Mulindwas is what we call, in software terms, "tight coupling". When systems are tightly coupled, they are interdependent, and a change in one system may lead to the failure of the other.
Tightly coupled systems
Tightly coupled systems are characterized by programs that rely too much on each other to perform a given task. The programs expose a set of API functions to each other for running operations, but they also assume a lot of internal detail about how the other runs. In Tom's case, he knew quite a lot about the Mulindwas: the time they woke up, when they had breakfast, the schools that John Mulindwa and Emily Mulindwa went to and the time they got back from school. If you are worried about the security of the Mulindwa family in case Tom were a threat actor, then you absolutely should be. If one program knows a lot about how another program runs internally, there's a high chance it will depend on those internal steps happening before it can run itself. And if program A exposes internal details to program B that can be modified to alter the running of A, then the system is almost beyond saving, as malicious code can easily corrupt A and, with it, the entire system.
Tightly coupled systems are usually developed at the earlier stages of a project, and in most cases they arise from poor architectural design. Tightly coupled systems are not scalable, because they prevent the interdependent programs from evolving independently of each other according to their own needs. If a change has to be made in program A, then program B has to be changed as well to reconcile with the new change in A and avoid failure. If over 100 programs or services depended on each other, it would be hard to reconcile a change in A across all the connected programs, which hurts the scalability of the entire system.
Tight coupling occurs in various parts of a system, from variables inside a program to massive backend service clusters. So to deal with tightly coupled systems, we need to know at what level the coupling is happening. The general and most effective solution to coupling is introducing layers of abstraction into the system. These layers of abstraction should then only communicate with each other via message passing. The surface layer receives the message from the external program and passes it to the internal layers for processing. The internal layers then pass a response back to the surface API, which communicates it back to the listening program. If the internal state of the processing program changes, the listening program is not affected.
So to fix the problem with Tom and the Mulindwas, Tom is now supposed to deliver mail to the local regional mail receiver, which will transport it to the mailboxes of the residents in that neighborhood. This limits Tom's direct interaction with residents. When the Mulindwas move, they only leave their new delivery address with the regional mail receiver, who can still route their mail to them, and Tom can keep delivering mail without knowing whether the Mulindwas moved or not. It also spares everyone the heartbreaking moment where the Mulindwas have to say goodbye to Tom.
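To make the regional mail receiver idea concrete, here is a minimal sketch in Python. The class name, method names and addresses are all made up for illustration; the point is only that the courier talks to one surface method and never sees the address table behind it.

```python
class RegionalMailReceiver:
    """Hypothetical indirection layer between couriers and residents."""

    def __init__(self) -> None:
        # Internal state: maps a family name to its current address.
        self._addresses: dict[str, str] = {}

    def register(self, family: str, address: str) -> None:
        """Residents call this when they move in or change address."""
        self._addresses[family] = address

    def deliver(self, family: str, package: str) -> str:
        """Couriers call this; they never see the address table."""
        address = self._addresses.get(family)
        if address is None:
            return f"{package}: held at the receiver, no address for {family}"
        return f"{package}: routed to {address}"


receiver = RegionalMailReceiver()
receiver.register("Mulindwa", "Plot 12, Eastern Region")  # invented address
first = receiver.deliver("Mulindwa", "parcel-1")

# The family moves; only the receiver's internal state changes.
receiver.register("Mulindwa", "Plot 7, Central Region")  # invented address
second = receiver.deliver("Mulindwa", "parcel-2")

print(first)
print(second)
```

Tom's side of the code, the call to deliver, is identical before and after the move, which is exactly the property we want from the abstraction.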
So to put some of the concepts discussed above into practice, here's how we would implement some software abstractions.
At the language level, if one is using the object-oriented paradigm, the objects should not expose their internal state. And if they should, external programs should not modify that state directly.
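    class Book:
        def __init__(self, name: str, author: str, price: int) -> None:
            # Internal state; the leading underscore marks it as private by convention.
            self._name = name
            self._author = author
            self._price = price

        # Read-only properties: external code can read these but not assign to them.
        @property
        def name(self) -> str:
            return self._name

        @property
        def author(self) -> str:
            return self._author

        @property
        def price(self) -> int:
            return self._price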
For example, in the code snippet above, we have a Book class with three internal attributes: _name, _author and _price. These attributes are only accessed (or should only be accessed) by the internal methods of the class, and external programs only interact with them via the property descriptors. This means that external programs cannot modify these attributes, and if they should, we can extend the property descriptors to allow controlled modifications. So even when we change the internal attributes to something else, the external programs can still interact with our class as if nothing has changed. For example, I'll change the _author attribute to retrieve its value from an external database:
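    import sqlite3

    con = sqlite3.connect("bookstore.db")
    cur = con.cursor()


    class Book:
        def __init__(self, name: str, author: str, price: int) -> None:
            self._name = name
            # Look the author up in the Books table; parameters must be passed as a sequence.
            row = cur.execute(
                "SELECT authors FROM Books WHERE authors = ?", (author,)
            ).fetchone()
            # fetchone() returns None when no row matches.
            self._author = row[0] if row is not None else None
            self._price = price

        @property
        def name(self) -> str:
            return self._name

        @property
        def author(self):
            return self._author

        @property
        def price(self) -> int:
            return self._price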
The above code is not tested, and it should not be used as is unless you place a try/except block around the _author lookup to protect you from database errors and from the case where the author is not found in the database. Anyway, with this level of detail added, our external program does not know that we've included a database at all. It will happily interact with our API without fear.
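The "controlled modifications" mentioned earlier can be sketched with a property setter. This is a minimal example, assuming we want to allow price changes but reject negative or non-integer values; that validation rule and the book title are my own assumptions, not requirements from anywhere else in this post.

```python
class Book:
    def __init__(self, name: str, price: int) -> None:
        self._name = name
        self.price = price  # goes through the setter below, so it is validated

    @property
    def price(self) -> int:
        return self._price

    @price.setter
    def price(self, value: int) -> None:
        # Controlled modification: reject values the class cannot accept.
        # (The rule itself is an assumption made for this sketch.)
        if not isinstance(value, int) or value < 0:
            raise ValueError("price must be a non-negative integer")
        self._price = value


book = Book("Example Title", 20)
book.price = 25  # allowed

try:
    book.price = -5  # rejected; the stored price is unchanged
except ValueError as err:
    print(err)

print(book.price)
```

External code still writes book.price as if it were a plain attribute, while the class keeps full control over what ends up in _price.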
Bigger systems combat coupling by introducing message queues and relying on the event-driven design pattern. Message queues help your system scale and allow you to plug in new services with little to no reconfiguration of the rest of the system. This helps you deploy products much faster and shortens your time to market. It also contributes to the high availability of the system.
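Here is a rough, in-process sketch of the idea, using Python's standard queue and threading modules as a stand-in for a real message broker. The event names and the notification_service function are hypothetical; a production system would use a dedicated queueing service rather than an in-memory queue.

```python
import queue
import threading

# The queue is the only thing the producer and the consumer share.
events: queue.Queue = queue.Queue()
delivered: list[str] = []


def notification_service() -> None:
    """Hypothetical consumer: reacts to events without knowing who sent them."""
    while True:
        event = events.get()
        if event is None:  # sentinel: no more events
            break
        delivered.append(f"notify: {event['type']} for {event['package']}")


worker = threading.Thread(target=notification_service)
worker.start()

# The producer only publishes messages; it never calls the consumer directly.
events.put({"type": "package_received", "package": "parcel-1"})
events.put({"type": "package_routed", "package": "parcel-1"})
events.put(None)
worker.join()

print(delivered)
```

Replacing or extending the consumer does not require touching the producer, because neither side holds a reference to the other, only to the queue.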
CPython case study
The Python programming language is among the most popular, if not the most popular, programming languages in the world currently. It has many implementations, with the most popular being the reference implementation in C, called CPython. Other implementations include PyPy, GraalPython, IronPython, Jython, etc. (the last two receive far less active development and trail well behind current Python versions).
We turn our focus to CPython, which was first published approximately 30 years ago by Guido van Rossum. Guido designed the language to be as simple as possible and easy to learn, which is why it's popular today. But he made some design decisions back then that are raising many concerns today. One of the famous internal features is the Global Interpreter Lock, often called the GIL. The GIL is supposed to simplify the way multiple Python threads interact with Python state such as dictionaries, lists and tuples, to avoid corruption caused by many threads trying to modify them at the same time. However, this exacts a heavy toll on multithreaded programs that would like to run multiple Python threads in parallel, not just concurrently. Due to this very real need, many prominent libraries such as NumPy dug down to the C-API layer of CPython and side-stepped the effects of the GIL by releasing it around areas that require heavy computation and reacquiring it when it's time to interact with Python.
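As a small illustration of the kind of shared state being protected, here is a sketch, assuming CPython, in which several threads append to one list. A single list.append call is atomic in CPython, so the list is not corrupted; the thread count and range sizes are arbitrary choices for the example.

```python
import threading

shared: list[int] = []


def worker(start: int, count: int) -> None:
    for i in range(start, start + count):
        shared.append(i)  # a single append is atomic in CPython


threads = [
    threading.Thread(target=worker, args=(n * 1000, 1000)) for n in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every element arrived exactly once, although the order may be interleaved.
print(len(shared))
```

Note that this safety covers single operations only. Compound operations such as counter += 1 are not atomic and still need an explicit lock, and on a GIL build these threads take turns executing Python code rather than running it in parallel.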
Everything seemed calm until the push to remove the GIL became significant. There have been several serious efforts to remove the GIL; however, all of them introduced breaking changes to CPython internals that the external libraries rely on. That means introducing these changes would render thousands of packages and libraries relying on these features useless, something that the CPython community cannot let happen.
So there's an apparent need to evolve the internals of CPython to make it fit for the modern era of computation and data processing, but introducing those changes comes at a high cost, one which was felt before when the community decided to move from version 2 to version 3. Some libraries never even recovered from the toll of rewriting hundreds of thousands of lines of Python code. I've only highlighted the GIL, but many other areas have caused libraries to couple tightly with the Python C-API. The problem comes from the issue I stated earlier: Python exposes its internal state and internal API to external code, which causes tight coupling.
There are significant efforts currently underway to solve this problem. One of them is the HPy project being worked on by the PyPy team. HPy is supposed to be a better C-API for the Python language. It's designed to be easily adopted by other implementations, and it hides the internal state and data of Python internals behind a handle (hence the name HPy). The handle is our abstraction between the external code and the Python internals, much as the regional mail receiver is an abstraction between Tom and the Mulindwas.
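To show the general idea of a handle, here is a toy sketch in Python. It is not HPy's actual API (HPy is a C API); the HandleTable class and its open, get_item and close methods are hypothetical names chosen only to illustrate how an opaque handle keeps external code away from the runtime's internal objects.

```python
class HandleTable:
    """Hypothetical runtime-side table; not HPy's actual API."""

    def __init__(self) -> None:
        # Internal objects live here and are never handed out directly.
        self._objects: dict[int, object] = {}
        self._next_handle = 1

    def open(self, obj: object) -> int:
        """Store an object and return an opaque integer handle for it."""
        handle = self._next_handle
        self._next_handle += 1
        self._objects[handle] = obj
        return handle

    def get_item(self, handle: int, index: int) -> object:
        """Operate on the object on behalf of the holder of the handle."""
        return self._objects[handle][index]

    def close(self, handle: int) -> None:
        """Release the handle; it must not be used afterwards."""
        del self._objects[handle]


table = HandleTable()
h = table.open(["a", "b", "c"])

# External code only ever holds the integer h, never the list itself.
first_item = table.get_item(h, 0)
table.close(h)

print(first_item)
```

Because the external code never touches the stored object, the runtime is free to change how it represents or manages that object internally, as long as the operations offered through the handle keep behaving the same.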
Abstractions are a very wide and very important concept in software development. I've not been able to mention a lot of concepts because this is meant to be a short blog post and not a book. I'll hopefully share more information on how to implement abstractions and indirections more efficiently in a future post (only if there's interest in it), but alas, this is the end of this blog post. Catch you in the next post.