Building Scalable Systems
TL;DR: Reactive Architecture helps you build scalable systems. It's all about Consistency, Availability, and Scalability.
Consistency, Availability, and Scalability
A system is considered scalable if it can meet increases in demand while remaining responsive.
A system is consistent if all members of the system have the same view or state.
A system is considered available if it remains responsive despite any failures.
Performance vs Scalability
Performance and Scalability are related, but different concepts.
- Performance optimizes response time.
- Scalability optimizes ability to handle load.
If we have a system that takes one second to process a single request and can only process one request at a time, our throughput is one request per second (RPS). If we improve the response time from one second to half a second, we can now handle two requests per second instead of one: improving performance has improved our requests-per-second. On the other hand, if we take the same system, keep the one-second response time, and allow it to handle two requests in parallel, we again double the requests-per-second: from one to two.
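To make the arithmetic concrete, here is a minimal sketch in Scala (the names are ours; the numbers are from the example above) of how response time and parallelism each determine throughput:

```scala
object ThroughputSketch extends App {
  // Throughput (requests per second) = parallelism / response time.
  def rps(responseTimeSeconds: Double, parallelism: Int): Double =
    parallelism / responseTimeSeconds

  println(rps(1.0, 1)) // baseline: 1.0 RPS
  println(rps(0.5, 1)) // faster responses (performance): 2.0 RPS
  println(rps(1.0, 2)) // more parallelism (scalability): 2.0 RPS
}
```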
Performance
- Improving performance speeds up the response time.
- Total number of requests (e.g. load) may not change.
- Improving performance has a limit.
Scalability
- Improving scalability increases the ability to handle load.
- Performance of each request may not change.
- In theory, improving scalability has no limit; but it can become really expensive.
Because scalability has no theoretical limit, when we build Reactive Microservices we tend to focus on improving scalability, since we can keep pushing it further. That doesn't mean we ignore performance, but performance tends to be a secondary focus.
Consistency in Distributed Systems
A Tree Falls
We have two people standing in a forest. We've named them Martin and Grace, and there's a pine tree and a spruce tree. Both trees fall at the same time. Martin is 340 meters away from the pine, Grace is 340 meters away from the spruce, and there's 340 meters between the pine and the spruce. We've chosen that number because sound travels at about 340 meters per second, which means Martin hears the pine fall about one second after it falls, and Grace hears the spruce fall about one second after it falls. Now, let's assume for the moment that Martin and Grace have some magical power to distinguish the sound of a falling pine from the sound of a falling spruce. If we ask Martin and Grace which tree fell first, they will each give us a different answer. Martin will say the pine fell first, followed by the spruce about one second later. Grace will say the spruce fell first, followed by the pine about one second later. Who is correct in this scenario?
Spoiler alert: Neither of them!
The problem with Distributed Systems:
- Distributed Systems are systems that are separated by space.
- Physics puts an upper limit on information speed (the speed of light).
- When two systems are separated by space, there is always time required to reach consensus.
- In the time it takes to transfer the information, the state of the original sender may have changed.
Everything’s Eventual
- This means that the receiver of that information is always dealing with stale data.
- This applies whether we are talking about Computer Systems, people, or even just the physical world.
- Reality is therefore eventually consistent.
Eventual Consistency
Eventual consistency guarantees that, in the absence of new updates, all accesses to a specific piece of data will eventually return the most recent value.
In order to reach a state of consistency you have to stop all updates, at least for some period of time.
Other Types of Consistency
- Eventually consistent models break down into many different forms, each with different tradeoffs.
- Eventual Consistency.
- Causal Consistency.
- Sequential Consistency.
- etc.
- Traditional monolithic architectures are usually based on Strong Consistency.
Causal consistency means that causally related items will be processed in a common order: if A causes B, then A will always be processed before B. So when one thing causes another, they always have to be processed in that order. There is also a stricter form called sequential consistency, in which all items have to be processed in a sequential order, regardless of whether they are causally related.
Strong Consistency
Strong consistency means that an update to a piece of data needs agreement from all nodes before it becomes visible.
Strong Consistency in an Eventually Consistent World
- Physics prevents Strong consistency, so mechanisms are introduced to simulate it.
- Locks allow systems to simulate strong consistency.
- The distributed system problem is isolated to a single, non-distributed lock.
- Locks introduce overhead in the form of contention. That overhead has consequences for our elasticity, our resilience, and more.
The distributed system problem only exists when you have multiple things that are responsible for the same piece of data. As long as only one thing is responsible for that data, as long as we only have one instance of the lock, it's not a distributed system problem anymore.
One way we can achieve strong consistency is by using a technique like locking. As mentioned, locking introduces overhead in the form of contention, and that overhead has a cost associated with it.
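As a minimal, framework-free sketch of this idea (the class and names are hypothetical), a lock-based counter simulates strong consistency: every update is serialized through a single monitor, so all callers agree on the value, but they must wait their turn:

```scala
// All updates are serialized through one lock, so every reader sees a
// single agreed-upon value -- but writers must wait for each other.
class ConsistentCounter {
  private var value: Long = 0L

  def increment(): Long = synchronized {
    value += 1
    value
  }

  def current: Long = synchronized(value)
}
```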
Effect of Contention
Contention
- Any two things that contend for a single limited resource are in competition.
- This competition can only have one winner.
- Others are forced to wait for the winner to complete.
- As the number of things competing increases, the time to free up the resource increases.
- As load increases, we will eventually exceed acceptable time limits.
Amdahl’s Law
- Amdahl’s law defines the maximum improvement gained by parallel processing.
- Improvements from parallelization are limited to the code that can be parallelized.
- Contention limits the parallelization and therefore reduces improvements.
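Amdahl's Law is easy to state in code. A sketch, where p is the fraction of the work that can be parallelized and n is the number of processors:

```scala
// Maximum speedup from n processors when only a fraction p of the
// work can run in parallel; the serial fraction (1 - p) dominates.
def amdahlSpeedup(p: Double, n: Int): Double =
  1.0 / ((1.0 - p) + p / n)

// Even with 95% parallelizable work, 1000 processors give only ~19.6x,
// and no amount of hardware pushes past 1 / (1 - p) = 20x.
```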
Effect of Coherency Delay
Coherency Delay
- In distributed systems, synchronizing the state of multiple nodes is done using crosstalk or gossip.
- Each node in the system will send messages to each other node informing them of any state changes.
- The time it takes for this synchronization to complete is called the Coherency Delay.
- Increasing the number of nodes increases the Coherency Delay.
Gunther’s Universal Scalability Law
- Gunther’s Universal Scalability Law builds on Amdahl’s Law.
- In addition to contention, it accounts for Coherency Delay.
- Coherency Delay eventually results in negative returns.
- As the system scales up, the cost to coordinate between nodes exceeds any benefits.
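A sketch of the law itself, where alpha models contention and beta models coherency delay (both coefficients would have to be measured for a real system; the function name is ours):

```scala
// Relative throughput of n nodes. With beta > 0, the n * (n - 1)
// coherency term eventually dominates and throughput declines.
def uslThroughput(n: Int, alpha: Double, beta: Double): Double =
  n / (1.0 + alpha * (n - 1) + beta * n.toDouble * (n - 1))
```

Setting beta to zero recovers Amdahl's Law; any positive beta produces the negative returns described above.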
Laws Of Scalability
- Both Amdahl’s and Gunther’s laws demonstrate that linear scalability is almost always unachievable.
- Linear scalability requires total isolation: essentially, statelessness.
- Reactive Systems understand the limitations imposed by these laws.
- They try to exploit the limitations, rather than avoiding them.
Scalability in Reactive Microservices
- Reactive Systems reduce Contention by:
- Isolating locks.
- Eliminating transactions.
- Avoiding blocking operations.
- They mitigate coherency delay by:
- Embracing eventual consistency.
- Building in autonomy.
- This allows for higher scalability.
- The goal is to reduce or eliminate the things that prevent scalability.
CAP Theorem
- Distributed Systems must account for the CAP theorem.
- The CAP theorem states that a distributed system cannot provide more than two of the following:
- Consistency
- Availability
- Partition Tolerance
Partition Tolerance
The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network.
- No distributed system is safe from partitions.
- They can occur due to problems in the network.
- They also occur when a node goes down, whether due to failure or routine maintenance.
- They can be short or long lived.
- The CAP Theorem is really about what happens when a partition occurs.
Dealing with partitions
When a partition occurs, a distributed system has two options:
- (AP) Sacrifice Consistency, allowing writes to both sides of the partition. When the partition is resolved you will need a way to merge the data in order to restore consistency.
- (CP) Sacrifice Availability, disabling, or terminating one side of the partition. During the partition, some or all of your system will be unavailable.
CAP Theorem Complexities
- The realities of the CAP Theorem are often more subtle than they seem.
- Systems that claim to be CP have areas where they are not always consistent.
- Systems that claim to be AP have areas where they are not always available.
- Most systems balance the two concerns, usually favoring one side or the other.
Interestingly, when we build a system, we often look to the database to provide the guarantee. We choose an available database, or a consistent database, and then we use it everywhere. In reality, that's not always appropriate. There may be areas of your system where you want availability and other areas where you want consistency, and if you rely strictly on your database, that can be very hard to achieve.
Sharding
Consistency and Scalability
- Sometimes we need to be consistent, but also allow our system to scale.
- Consistency creates contention.
- Contention means scalability will have diminishing, or even negative, returns.
- How can we balance the need for strong consistency with the need for scalability?
Isolating Contention
- To improve scalability we seek to eliminate contention, and reduce crosstalk.
- When they can’t be eliminated, we try to isolate them instead.
- Locks with a broad scope (e.g. Table Locks) create a lot of contention.
- Locks with a smaller scope (e.g. Row/Record Locks) create less contention.
- Reactive Systems can use a technique called Sharding to limit the scope of contention, and reduce crosstalk.
- Sharding is applied within the application (rather than the database).
Sharding for Consistency
If the goal in our application is to provide consistency, a technique we can use to achieve it is called sharding. Sharding provides strong consistency, not eventual consistency. Note that we are not talking about sharding at the database level: many databases support a sharding technique, and the basic idea is the same, but here it is applied at the application level rather than the database level.
Doing this at the application level has two advantages. First, we are not limited by the features of our database. We may have chosen a database that provides high availability, for example, rather than strong consistency; application-level sharding lets us bring strong consistency into the application even though the database doesn't support it. Second, there are techniques we can use to minimize the communication between the application and the database. Less communication means the database is less of a bottleneck, and it can provide a nice performance boost as well.
Sharding
- Sharding partitions Entities (or Actors) in the domain according to their Id.
- Groups of Entities are called a shard.
- Each entity exists only in one shard.
- Each shard exists only in one location.
- Because each entity lives in only one location, we eliminate the distributed system problem.
- The entity acts as a consistency boundary.
- In order for this to work, we need to know where all of these entities live.
- A coordinator ensures traffic for a particular entity is routed to the correct location.
- The Id of the entity is used to calculate the appropriate shard (see the sketch after this list).
- All traffic for a particular Id goes through the same entity.
- Aggregate roots are a good candidate for sharding.
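As a minimal sketch of the routing idea (all names here are hypothetical; a real implementation such as Akka Cluster Sharding is considerably more involved), the entity Id deterministically selects a shard, and the shard maps to a single node:

```scala
final case class EntityId(value: String)

// Hypothetical coordinator: hash the Id to pick a shard, then map the
// shard to the node that currently owns it. Every message for a given
// Id therefore arrives at the same entity in the same place.
class ShardCoordinator(numberOfShards: Int, nodes: Vector[String]) {
  // floorMod keeps the shard number non-negative for negative hashes.
  def shardFor(id: EntityId): Int =
    Math.floorMod(id.value.hashCode, numberOfShards)

  def nodeFor(id: EntityId): String =
    nodes(shardFor(id) % nodes.size)
}
```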
Balancing Shards
- Id or Shard key is important to ensure shards are balanced.
- A good shard key provides an even, randomized distribution.
- Common sharding strategies include using UUIDs or hash codes to ensure a randomized distribution.
- Poor key selection results in hotspots.
- You should aim for approximately 10x as many shards as you have nodes, so you have the flexibility to scale up and down.
Effects of Sharding
Contention in a Sharded System
- Sharding does not eliminate contention, it isolates it.
- Within a single entity there is contention.
- The router/coordinator represents a source of contention as well.
- A sharded system minimizes contention by:
- Limiting the amount of work the router/coordinator performs.
- Isolating contention to individual entities.
Sharding, Consistency, and Scalability
- Scalability is achieved by distributing the shards over more machines.
- Strong consistency is achieved by isolating operations to a specific entity.
- Careful choice of shard keys is important to maintain scalability.
Sharding and the CAP Theorem
- Sharding is primarily a Consistency (CP) solution, therefore it sacrifices availability.
- If a shard goes down, then there is a period of time where it is unavailable.
- The shard will migrate to another node eventually.
Caching with Shards
- Caching is problematic in a distributed system.
- How do you maintain cache consistency with multiple writers?
- Sharding allows each entity to maintain cache consistency.
- The cache is unique to that entity.
CRDTs
Availability and Scalability
- We established sharding as a technique we can use when we want Consistency.
- It gives us a good balance between Consistency and Scalability.
- The CAP Theorem forces us to choose between Consistency or Availability.
- What if we would prefer to have Availability rather than Consistency?
- CRDTs provide a highly available solution based on asynchronous replication.
CRDTs = Conflict-Free Replicated Data Types
CRDTs for Availability
- CRDTs are applied at the application level.
- Conflict-Free Replicated Data Types (CRDTs) are specially designed data types.
- They are Highly available and eventually consistent.
- Data is stored on multiple replicas for availability.
- Updates are applied on one replica and then copied asynchronously.
- Updates are merged to determine final state.
- CRDTs are a solution for Availability (AP).
CvRDTs
- There are two types of CRDTs: CvRDTs and CmRDTs
- Convergent Replicated Data Types (CvRDTs) copy state between replicas.
- Requires a merge operation that understands how to combine two states.
- Merge operations must be:
- Commutative - Can be applied in any order.
- Associative - Can be grouped in any order.
- Idempotent - Duplicate operations don’t change the result.
- Commutative Replicated Data Types (CmRDTs) copy operations between replicas.
Example: G-Set (Grow-Only Set)
- G-Set is a Grow Only Set.
- Sets that only grow are simple to merge.
- The merge operation combines all values from two sets into a new set, removing duplicates.
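A minimal G-Set sketch (names are ours): merge is simply set union, which is commutative, associative, and idempotent, so replicas converge regardless of merge order:

```scala
final case class GSet[A](elements: Set[A]) {
  def add(a: A): GSet[A] = GSet(elements + a)

  // Merge by union: duplicates disappear, order doesn't matter.
  def merge(other: GSet[A]): GSet[A] = GSet(elements ++ other.elements)
}

object GSet {
  def empty[A]: GSet[A] = GSet(Set.empty)

  def demo(): Unit = {
    val replicaA = empty[String].add("pine")
    val replicaB = empty[String].add("spruce")
    assert(replicaA.merge(replicaB) == replicaB.merge(replicaA)) // commutative
    assert(replicaA.merge(replicaA) == replicaA)                 // idempotent
  }
}
```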
Other Types of CRDTs
- In addition to Sets, there are several other existing CRDT types, including Registers, Counters, Maps, etc.
- CRDT data types can be nested to create complex structures.
- You can create new data types if you can define an appropriate merge function.
Effects of CRDTs
- CRDTs in Distributed Data are stored in memory.
- Requires that the entire structure fit into available memory.
- They can optionally be copied to disk as well.
- This speeds up recovery if replica fails.
- During normal operations, all data is still in memory.
- Best used for small data sets, with infrequent updates, that require high availability.
Limitations of CRDTs
- They don’t work with every Data Type.
- You must be able to define an appropriate merge function.
- Some data types require complex merge functions that require the use of tombstones.
- A tombstone is a marker that shows something was deleted.
- Tombstones can result in data types that only get larger, never smaller. Depending on the data type, we therefore have to be wary of tombstones. This accumulation is also called CRDT garbage; different data structures accumulate different kinds of it, so when we use CRDTs we need to be aware that these objects carry additional overhead, which may cause problems.
Distributed Data and the CAP Theorem
- Distributed Data is primarily intended to be an Availability Solution. It is Eventually Consistent.
- Depending on the write consistency you can choose to push it towards a more Consistent approach, at the cost of Availability.
Consistency or Availability
The CAP Theorem forces us into a situation where we have to make a choice between strong consistency, or high availability.
This is a challenge because we want both, right? Ideally, we would have both all the time, but the CAP Theorem tells us we can't. So how do we decide whether to aim for consistency or for availability? Which one is more important? The choice between consistency and availability isn't really a technical decision; it's a business decision, one that should be made at the business level rather than by the developers alone.
- It is also rarely an either/or situation. It's usually about balance.
- The decision of when and where to sacrifice Consistency or Availability should be discussed with the domain experts and product owners.
- We need to factor in the impact on revenue if the system is unavailable vs eventually consistent.
Consider a business that sells products, for example an e-commerce site. Is it more impactful on the business to potentially show out-of-date information, or is the impact larger if customers simply can't buy anything? What will have a bigger impact on the business? What will have a bigger impact on the customer? How will it affect them in the long run?