Over the years, the cat jump reflex was a phenomenon that intrigued people: cats always fell on their feet.
This is the oldest recording we have of the phenomenon, captured in 1894.
In this project, we are going to make the equivalent of a cat jump.
That is, we are going to implement an architecture using OOP, and then pivot to FP.
Let’s hope that we land on our feet.
This is a CQRS diagram, summarized by text:
Now this is the same, but as a painting.
This architecture can be implemented in many ways; there is no need to couple to any paradigm or framework to do so. After all, it was invented a while ago. Not in 1894 as the photo of the jumping cat, mind you, but old and well known in terms of CS history.
CQRS was invented when performance was an issue (it still is). To avoid overloading the system, the whole thing was split in two: the writeside, where the most transactional business logic was placed, and then the conclusions, which could take their time to be computed, as they usually did at the time. (And for some reason, as of 2020, still do.)
It is true that even though computers have gotten stronger with the passage of time, our hunger for conclusions has anything but decreased.
Software, someone wiser once said, expands like gas to fill any container, no matter the size. And it is because of this principle that we are still talking about CQRS in 2020. Because of performance.
CQRS means not blocking the input of data: letting it come in without throttling, because only later will we aggregate its values to reach a conclusion.
Until then, fulfill those transactions as fast as you can!
Don’t let anyone block you!
That is, if I can be concise about it, my definition for CQRS.
Now, let’s cut the talk and talk business.
Our country_gdp_ranking.entities.domain is going to be about the GDP of countries: how much they earn per year.
To do this we are going to have:
We are going to rank them; this is an aggregation.
The top ten countries is an aggregation where we sort the countries by GDP and take the first ten.
It is a conclusion; as such, it will be decoupled from the ingestion, which is the writeside. We will compute it later, on the readside.
It will be eventually consistent with the input, meaning that it may take a while to achieve full consistency with the input from the writeside, but it will get there. Eventually.
We can make a drawing out of this. Let’s see.
2.1.1 Building blocks
We can start by modeling Domain Driven Design itself!
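A minimal sketch of those building blocks, in Scala (the names here are illustrative, not necessarily the ones in the final repository):

// Hypothetical DDD building blocks as plain traits.
trait ValueObject
trait Entity { def id: String }

// Our domain in those terms: GDP is a value, a Country is an entity.
final case class GDP(value: BigDecimal) extends ValueObject
final case class Country(id: String, name: String, gdp: GDP) extends Entity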
Now that we have these useful guidelines, we can continue to make a little proof of concept on the test folder:
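For instance, the proof of concept could be as small as this (ScalaTest assumed; the spec name is hypothetical):

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// Exercises the building blocks sketched above.
class DomainSpec extends AnyFlatSpec with Matchers {
  "A Country" should "carry its GDP as a value object" in {
    val argentina = Country("AR", "Argentina", GDP(BigDecimal(450)))
    argentina.gdp shouldBe GDP(BigDecimal(450))
  }
}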
2.1.2 Domain services
We just went through the DomainDrivenDesign building blocks; now we can use another abstraction proposed by DDD: services.
Our country_gdp_ranking.entities.domain needs a service that, given N countries with their GDP, ranks them and takes the first ten.
Let’s apply some TDD.
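Sketched test-first, the service could look like this (reusing the Country and GDP types from above; names are illustrative):

import org.scalatest.flatspec.AnyFlatSpec
import org.scalatest.matchers.should.Matchers

// The domain service: pure, stateless, trivially testable.
object GdpRanking {
  def topTen(countries: Seq[Country]): Seq[Country] =
    countries.sortBy(-_.gdp.value).take(10)
}

class GdpRankingSpec extends AnyFlatSpec with Matchers {
  "GdpRanking.topTen" should "sort countries by GDP and take the first ten" in {
    val countries =
      (1 to 20).map(i => Country(i.toString, s"Country $i", GDP(BigDecimal(i))))
    GdpRanking.topTen(countries).map(_.gdp.value.toInt) shouldBe (20 to 11 by -1)
  }
}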
2.1.3 State, Commands, and Events
A Command comes to our system; it wants to have consequences.
It wants to leave a fingerprint on your codebase and your database.
But a Command is, in fact, a suggestion. We could start calling them Suggestions, because what is most important about them is that they can be rejected.
An Event is a Command we did not reject; it is its consequence.
We will persist events and make them the cornerstone of our resilience: with EventSourcing we can recreate the entire state of our country_gdp_ranking.application in case of failure. Our system crashes? No problem. Let it crash. We will recover our state, and we will do so by iterating over every event we stored.
The foundation of our model: the Event.
And a State, given an Event, will change. Simple as that.
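In code, under the same illustrative names as before, that idea could be sketched as:

// Commands are suggestions; Events are accepted facts.
sealed trait Command
final case class AddCountry(id: String, name: String, gdp: BigDecimal) extends Command

sealed trait Event
final case class CountryAdded(id: String, name: String, gdp: BigDecimal) extends Event

// A State, given an Event, changes. Replaying every stored event rebuilds it.
final case class State(countries: Map[String, Country]) {
  def applied(event: Event): State = event match {
    case CountryAdded(id, name, gdp) =>
      copy(countries = countries + (id -> Country(id, name, GDP(gdp))))
  }
}

object State {
  val empty: State = State(Map.empty)
  // Recovery after a crash: fold every persisted event over the empty state.
  def recover(events: Seq[Event]): State = events.foldLeft(empty)(_ applied _)
}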
I got a small citation from a great author on this topic.
Citing Greg Young.
One-way commands don’t exist.
Commands can be rejected, or they are valid and become Events.
… If one-way commands existed, you could extract money from an ATM just by asking for it. And only after the fact would the ATM know that your bank account had no funds, and then, what then! It would grow a pair of legs and chase you down the street.
Our country_gdp_ranking.application, implemented:
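Here is a sketch of how those pieces wire together with Akka Persistence Typed (the original may well use classic PersistentActor; the shape is the same: validate the Command, persist the Event, apply it to the State):

import akka.persistence.typed.PersistenceId
import akka.persistence.typed.scaladsl.{Effect, EventSourcedBehavior}

def behavior(entityId: String): EventSourcedBehavior[Command, Event, State] =
  EventSourcedBehavior[Command, Event, State](
    persistenceId = PersistenceId.ofUniqueId(entityId),
    emptyState = State.empty,
    commandHandler = (_, command) =>
      command match {
        case AddCountry(id, name, gdp) if gdp >= 0 =>
          Effect.persist(CountryAdded(id, name, gdp)) // accepted: it becomes an Event
        case _ =>
          Effect.none // rejected: one-way commands don't exist
      },
    eventHandler = (state, event) => state.applied(event)
  )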
3.1.1 Actors
DomainDrivenDesign and the Actor Model just click.
See the State we just built there? It’s going to become the immutable data structure that our Actor is going to be in charge of. The actor is going to mutate constantly, but it will do so responsibly, by doing one assignment.
The state assignment.
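In classic-actor terms, a sketch of that single, responsible assignment (the actor name is hypothetical):

import akka.actor.Actor

class CountryRankingActor extends Actor {
  // The immutable State the actor is in charge of.
  private var state: State = State.empty

  def receive: Receive = {
    case event: Event =>
      state = state.applied(event) // the state assignment: the only mutation
  }
}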
3.1.2 Sharding
Now we can introduce the concept of sharding.
The Actor Model is about distributed computing, and as such the actors can live on different computers but still be able to communicate among each other.
Groups of actors we call shards, and sharding is the act of distributing those shards evenly across the computers of the cluster, in such a way that you, the developer, can work as if all the actors were located on the same computer. There is no need for you to worry about networking; you can just send the message and Akka will handle the rest.
3.1.3 Sharded Actors
Let’s test our luck implementing a trait called ShardedEntity, which is going to be responsible for the boilerplate it takes to create a sharded actor.
Note that we had to make a compromise:
in order to get a sharded actor, our messages now have to carry an entityId and a shardId.
However, with the same inheritance-based boilerplate remover we built before, we can sort this out and forget about it, as the sketch below shows.
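A sketch of what ShardedEntity could boil down to, assuming classic Cluster Sharding (names are illustrative):

import akka.actor.{ActorRef, ActorSystem, Props}
import akka.cluster.sharding.{ClusterSharding, ClusterShardingSettings, ShardRegion}

// The compromise: messages now carry their own routing information.
trait ShardedMessage { def entityId: String; def shardId: String }

trait ShardedEntity {
  def typeName: String
  def props: Props

  private val extractEntityId: ShardRegion.ExtractEntityId = {
    case m: ShardedMessage => (m.entityId, m)
  }
  private val extractShardId: ShardRegion.ExtractShardId = {
    case m: ShardedMessage => m.shardId
  }

  // All the Cluster Sharding boilerplate, hidden behind one method.
  def start(system: ActorSystem): ActorRef =
    ClusterSharding(system).start(
      typeName = typeName,
      entityProps = props,
      settings = ClusterShardingSettings(system),
      extractEntityId = extractEntityId,
      extractShardId = extractShardId
    )
}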
All we have to do now is write some tests, see them run green, and start working on the country_gdp_ranking.infrastructure components, such as the Kafka consumer we are going to use.
Okay, now we can venture into one of the last key pieces of an Akka project, the communication layer against the outside world.
Kafka Transactions provide exactly-once delivery guarantees, which means that not only will Kafka try over and over again to send us the messages if we fail to acknowledge them, but there is also the guarantee that we are not going to process the same message twice.
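With Alpakka Kafka, that guarantee materializes as a consume-transform-produce pipeline where the offsets are committed inside the same Kafka transaction as the produced records. A sketch (topic names and settings are illustrative):

import akka.actor.ActorSystem
import akka.kafka.scaladsl.Transactional
import akka.kafka.{ConsumerSettings, ProducerMessage, ProducerSettings, Subscriptions}
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}

object TransactionalPoC extends App {
  implicit val system: ActorSystem = ActorSystem("kafka-poc")

  val consumerSettings =
    ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("country-gdp-ranking")

  val producerSettings =
    ProducerSettings(system, new StringSerializer, new StringSerializer)
      .withBootstrapServers("localhost:9092")

  Transactional
    .source(consumerSettings, Subscriptions.topics("countries-in"))
    .map { msg =>
      ProducerMessage.single(
        new ProducerRecord[String, String]("countries-out", msg.record.key, msg.record.value),
        msg.partitionOffset // committed atomically with the produced record
      )
    }
    .runWith(Transactional.sink(producerSettings, "country-gdp-transactional-id"))
}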
3.2.1 The algebra
So we are going to start by making a few reasonable abstractions, such as MessageProducer and MessageProcessor, which we are going to implement in both Production and Mock environments.
The idea is to avoid giving too much responsibility to what should be a simple country_gdp_ranking.infrastructure module.
Thus, we are going to delegate serialization/deserialization to the final user, and offer to publish and read the messages as they come and go over the network. And while it is true that over the network the messages are just arrays of bytes, there is something familiar about Strings that adds no overhead but makes our code a little more practical to use.
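An illustrative take on that algebra (the real signatures may differ slightly):

import akka.Done
import scala.concurrent.Future

object Messaging {
  type Topic = String

  // Publishing: fire a String at a topic, get an async acknowledgement back.
  trait MessageProducer {
    def produce(topic: Topic)(message: String): Future[Done]
  }

  // Processing: subscribe an algorithm to a topic. Serialization of the
  // String payload is deliberately the final user's problem.
  trait MessageProcessor {
    def process(topic: Topic)(algorithm: String => Future[Done]): Future[Done]
  }
}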
3.2.2 The interpreter
For production we mentioned we are going to use Kafka Transactions.
When publishing, we are going to take care that the messages are partitioned so that there is a 1-to-1 relationship between the Kafka partitions and the Akka shards.
This means that, if we do everything right, every message will land on the node where it will be processed.
We do this by literally sharing the hashing function between Kafka and Akka, for partitioning and sharding, respectively.
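A sketch of that sharing: one function decides placement on both sides (the partition count is illustrative, and ShardedMessage is the trait from the sharding sketch above):

import akka.cluster.sharding.ShardRegion
import org.apache.kafka.clients.producer.ProducerRecord

object Placement {
  // Must equal both the Kafka topic's partition count and the Akka shard count.
  val partitions = 32

  def hash(entityId: String): Int = math.abs(entityId.hashCode % partitions)
}

// Kafka side: choose the partition explicitly when producing.
def record(topic: String, key: String, value: String): ProducerRecord[String, String] =
  new ProducerRecord[String, String](topic, Placement.hash(key), key, value)

// Akka side: derive the shard from the very same function.
val extractShardId: ShardRegion.ExtractShardId = {
  case m: ShardedMessage => Placement.hash(m.entityId).toString
}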
I am going to make a diagram to explain the situation we are solving:
“Use Cassandra materialized views to sort your events by tags, and Akka Persistence Query to stream them securely to your projections/HTTP servers.”
Here is my response to Lightbend.
Not only have materialized views been pushed back to experimental status by Cassandra for performance reasons, but Scylla, the trending Cassandra rewrite in C++, which runs up to 10x faster thanks to a non-blocking architecture and the absence of garbage collection pauses, does not support them either. And for a reason, which is, once more, performance.
You heard that right. Before you think about tuning your app, or even entertain the idea that Kafka may be the problem, know that unless you run about 200% as many EventSourcing database nodes as processing nodes, you are missing out. That is, if you have 3 Akka nodes you ought to have 6+ Cassandra nodes. Maybe with Scylla the ratio would decrease, but the idea remains: ES is the main bottleneck of any reactive app.
Generalized, this becomes a known truth about microservices in general: transactions against the databases are usually the biggest impediment to performance.
Generalized twofold, this becomes the best-known truth in software.
That is that IO is always the bottleneck. File IO? Bad. Network IO? Worse.
So as a general rule of thumb, embrace your processor as much as you can and don’t let go, if you care about optimization.
So: Lightbend proposes to use Cassandra for communication between the writeside and the readside.
They are wrong. Plain and simple.
I will play devil’s advocate for a moment here.
Let’s say you care more about the consistency of your data than about performance.
Then it becomes apparent that using a DB is much safer than using, say, a message queue like Kafka. Kafka will destroy your messages in three hours without notice! What if Kafka fails? Your messages will get lost in flight!
Well, to that I have two answers, and will leave the reader to ponder if my answer suffices.
Kafka will destroy the messages in three hours without notice.
Kafka can fail! Your messages… they will be lost in flight!
Look, if Kafka fails, the last thing you are going to worry about is a few lost messages, because your whole system will be down. You will be getting phone calls that a major earthquake has awoken Godzilla, who is right now eating voltage out of the AWS us-east-2 servers located in Ohio. Believe me, if Kafka fails, not only will you have a bigger problem on your hands, but I also have two branches for you to explore, and I presume them to be mutually exclusive and exhaustive. I do not think there are any other situations you could go through in this scenario.
First branch, first situation you can be in:
Second branch, second situation you can be in:
See? In conclusion, there is never an excuse for using the ES database as a communication tool. What to do?
Avoid Lightbend’s advice. Use a battle-tested proposal:
Use a purposely made message broker for inter process communication.
Message brokers, by the way, have gotten better over the years, and now offer exactly-once delivery guarantees, unless you pull the plug. In which case, sure, problems happen, but as we discussed just before, we can confront them and say: “You are not going to make my country_gdp_ranking.application run like a snail!”
This is the veteran solution, the better solution. It has aged well, LinkedIn be thanked, for it is now not what it was before, as I hope is true for all of us. Let time improve our faults. Cheers.
* Who knew Die Verwandlung would turn out to be not about a man made cockroach but, in time, about its writer’s name made into a technological marvel: an octopus of information whose influence would rival in pervasive potential that of the author’s own works.
https://soundcloud.com/migue-lemos/kafka-load-test-movement-i
By this point, I have an implementation of the Akka connector for Kafka, Alpakka, which is very OOP.
As said, we had traits to obey, which were tightly coupled to Akka: a MessageProcessor ought to return a Future[akka.Done].
To make matters worse, they were tightly coupled to the Future monad.
In short: very standard, by-the-book OOP. And while it sounds like I am throwing an entire paradigm under the bus here, bear with me; I have worked with OOP engineers who did not pause for a moment to ask for a more general signature. Are there OOP engineers who would have made such a claim? Do they exist? My personal experience is that they are much more focused on making DI work for unit and spec.acceptance tests than on stopping for a minute to think in more general, broader terms.
Count me in on this bunch, still. There is a reason why this is a cat jump. I had not stopped to think, either.
Now, let’s see what the FP community has to say about this.
The following code is the norm among FP engineers.
type Topic = String
type MessageProcessor[O, AlgorithmOutput] =
  Topic => Algorithm[AlgorithmOutput] => O

where AlgorithmOutput may be Future[Done] or Unit.
Type parametrization at this level is considered extreme by most OOP devs, and the norm by most FP devs.
We are going to write a signature like the one proposed above, and we will simply have functions that conform to it for either library.
Let’s start by taking KafkaMessageProcessor, a class that extends the MessageProcessor trait, and change it into a function that, given the necessary requirements, namely the ActorSystem, gets us a MessageProcessor.
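A sketch of that function, assuming Algorithm[Out] is just a function from the message payload to Out (settings hardcoded for brevity; names are illustrative):

import akka.Done
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer
import scala.concurrent.Future

object FpMessaging {
  type Topic = String
  type Algorithm[Out] = String => Out // an assumption about the original's shape
  type MessageProcessor[O, AlgorithmOutput] = Topic => Algorithm[AlgorithmOutput] => O

  // The Alpakka interpreter: unapologetically coupled to Future[akka.Done].
  def alpakkaMessageProcessor(
      implicit system: ActorSystem
  ): MessageProcessor[Future[Done], Future[Done]] = {
    val settings =
      ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
        .withBootstrapServers("localhost:9092")
        .withGroupId("country-gdp-ranking")

    topic => algorithm =>
      Consumer
        .plainSource(settings, Subscriptions.topics(topic))
        .mapAsync(parallelism = 1)(record => algorithm(record.value))
        .runWith(Sink.ignore)
  }
}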
The signature alone tells us that it is indeed going to be coupled to Futures, and Futures of akka.Done specifically, but that it’s okay for this to be the case, because, after all, what’s to expect? This is the alpakkaMessageProcessor!
The good thing is that now we don’t have to make a unit test version of the alpakkaMessageProcessor. There is no need! As long as we have a function that satisfies the MessageProcessor signature, we are good.
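For instance, a mock that satisfies the signature with zero country_gdp_ranking.infrastructure behind it (payload hypothetical):

// Same shape, no Kafka: feed the algorithm a canned message and be done.
val mockMessageProcessor: FpMessaging.MessageProcessor[Unit, Unit] =
  topic => algorithm => algorithm("""{"country":"AR","gdp":450}""")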
It’s the Liskov substitution and single responsibility principles taken to the max: we can replace this function with any other implementation of the MessageProcessor signature, and it would work!
So much boilerplate and naming avoided in one swift brush of paradigm change.