An Introduction to Reactive Systems

TL;DR: Reactive Architecture seeks to provide an experience that is responsive under all conditions.

Why Reactive

If you look back maybe 10 or 15 years, it wouldn't have been uncommon to see software installed on just two or three nodes, or maybe tens of nodes for a large installation. The amount of data we were dealing with was also very different. Going back in time, we see systems that were built to handle gigabytes of data, and most of that data was at rest. By "at rest" we mean that the data would typically not be actively changing.

What would happen is you'd have maybe a batch job or something that would run once a day, pull in a bunch of data, and then that data would remain fairly static for a period of time. We also had long maintenance windows. With older systems, it wouldn't have been uncommon to try to access the system and see a message like "This is down for maintenance, please check back in a few hours." Those scenarios were fairly frequent. You still see them from time to time, but much less often.

If you look at newer systems, we might still see small installations of two or three nodes, but we also see some that go into the tens, the hundreds, or even the thousands of nodes, depending on the type of service. Instead of gigabytes of data, we're now dealing with petabytes in some cases. And that data isn't at rest anymore; it's changing constantly. We now have massive fire hoses of data that we're trying to consume. If we fall behind while consuming that data, it's very hard to catch up. It can take days or weeks.

It's no longer acceptable to go to a service or a website and get a message that says it's down for maintenance. Those messages often came as a result of upgrading the software, or maybe running a database script or something like that. That's not acceptable anymore. We still need to run those scripts and we still need to do those deploys; we just have to figure out different ways to do it. That's the nature of the change that has happened, but so far we have really only been talking about technical problems.

The primary goal of reactive architecture is to provide an experience that is responsive under all conditions.

The Goal

We want to avoid reaching a point where the application can't scale to meet our demands because it's simply not capable. At the same time, we need to make sure that as we're doing that, we only consume the resources that are necessary to support whatever our current load is. We don't want to be in a situation where "yes we can support 10 million users but no matter how many users we have we are always using enough hardware to support that full 10 million users."

There are cases where we have a certain type of failure that is going to affect the user no matter what we do about it. When that happens, we want that effect on the user to be as small as possible. Having the whole application go down completely is a large effect on the user. On the other hand, restricting access to certain parts of the application for a period of time, that's a much smaller effect on the user.
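The idea of keeping a failure's effect small can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the `render_page` function, the section names, and the failing `broken_recommendations` loader are all hypothetical. Each section of a page is loaded independently, so a failure in one section removes only that section instead of taking the whole page down.

```python
from typing import Callable, Dict, Optional


def render_page(sections: Dict[str, Callable[[], str]]) -> Dict[str, Optional[str]]:
    """Load every section independently; a failed section becomes None."""
    page: Dict[str, Optional[str]] = {}
    for name, load in sections.items():
        try:
            page[name] = load()
        except Exception:
            # The failure stays inside this section. The rest of the page
            # is still served, so the effect on the user is small.
            page[name] = None
    return page


def broken_recommendations() -> str:
    """Hypothetical loader that simulates an unavailable backing service."""
    raise ConnectionError("recommendation service unavailable")
```

Calling `render_page({"catalog": lambda: "catalog ok", "recommendations": broken_recommendations})` still returns the catalog, with the recommendations section marked as unavailable.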

We need to make sure that no matter whether we have 10 million users or 10 users the user experience remains roughly the same. There might be small variations but it should be roughly the same no matter how many users we have and no matter whether there's a current failure or if everything is working just fine. If we can do that, if we can maintain a consistent level of quality and responsiveness despite all of these things, then we have built a reactive system.

The Four Principles of a Reactive System

The Reactive Manifesto: http://www.reactivemanifesto.org/

There are four basic principles of a reactive system. First and foremost, the absolute most important thing is that it needs to be responsive. A responsive system consistently responds in a timely fashion.

In order to be responsive though, you have to be resilient. This means that a reactive system needs to remain responsive even when a failure occurs.

A reactive system will also remain responsive despite changes to system load. We call this Elasticity.

Elasticity implies that we can not only scale up when needed but then when the load decreases we can scale back down in order to conserve resources.
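As a rough illustration of scaling up and back down, the sketch below computes how many nodes to run for a given load. The function name `desired_nodes`, the per-node capacity, and the minimum and maximum bounds are assumptions made for this example; real platforms use richer signals than a single load number.

```python
import math


def desired_nodes(current_load: int, capacity_per_node: int,
                  min_nodes: int = 1, max_nodes: int = 1000) -> int:
    """Return the node count needed for the current load (hypothetical policy).

    The count grows when load grows and shrinks when load drops, so the
    resources in use track demand rather than the worst case.
    """
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    needed = math.ceil(current_load / capacity_per_node)
    # Clamp to the allowed range so we never run fewer than min_nodes
    # or more than max_nodes.
    return max(min_nodes, min(max_nodes, needed))
```

With an assumed capacity of 10,000 users per node, 10 users need one node and 10 million users need 1,000 nodes. The same policy serves both cases without permanently paying for the larger one.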

And finally all of this is built on a foundation of asynchronous non-blocking messages. It is Message Driven.

Resilience provides responsiveness despite any failures. This is achieved through a number of techniques including replication, isolation, containment, and delegation. If you're not familiar with those terms, replication basically means we have multiple copies. Isolation means that services can function on their own, without external dependencies. Containment is a consequence of isolation: if there is a failure, it doesn't propagate to another service, because the failing service is isolated. And delegation means that recovery is managed by an external component.
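Delegation and containment can be sketched with a small supervisor. This is only an illustration under stated assumptions: the `Supervisor` and `Counter` classes, the restart limit, and the choice to drop the message that caused the failure are all hypothetical. The worker does not try to repair itself; an external component replaces it with a fresh instance.

```python
from typing import Callable, Optional


class Counter:
    """Hypothetical stateful worker that fails on negative input."""

    def __init__(self) -> None:
        self.total = 0

    def __call__(self, message: int) -> int:
        if message < 0:
            raise ValueError("negative input is not supported")
        self.total += message
        return self.total


class Supervisor:
    """External component that owns recovery for one worker (delegation)."""

    def __init__(self, worker_factory: Callable[[], Callable[[int], int]],
                 max_restarts: int = 3) -> None:
        self._factory = worker_factory
        self._worker = worker_factory()
        self._max_restarts = max_restarts
        self.restarts = 0

    def handle(self, message: int) -> Optional[int]:
        try:
            return self._worker(message)
        except Exception:
            if self.restarts >= self._max_restarts:
                raise  # give up and escalate once the restart budget is spent
            self.restarts += 1
            # Containment: the failure stops here. The failed worker is
            # replaced with a fresh one and the offending message is dropped.
            self._worker = self._factory()
            return None
```

After a failure the caller keeps talking to the same supervisor, while the worker behind it has been restarted with clean state.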

Reactive Programming

It's important to understand the difference between Reactive Systems (or Reactive Architecture) and Reactive Programming. The two are often conflated, but they are not equivalent. Reactive Systems apply the Reactive Principles at the architectural level.

Reactive Programming can be used to build Reactive Systems, and quite frequently is used to build Reactive Systems, but just because you use Reactive Programming doesn't mean you have created a Reactive System.

Reactive Programming works at a lower level. When you look at futures, streams, and similar tools, what it typically does is take a problem and break it up into small discrete steps. Those individual steps are then executed in an asynchronous, non-blocking fashion, usually through some sort of callback mechanism.
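A minimal sketch of that style, using only Python's standard `concurrent.futures` module: a problem is split into small steps, and each step runs as a callback when the previous one completes. The helper `then` and the example steps `parse` and `total` are hypothetical names for this illustration; real reactive libraries provide much richer combinators.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Callable, List


def then(future: Future, step: Callable[[Any], Any]) -> Future:
    """Return a future that completes with step(result) once `future` is done."""
    # Creating a Future by hand keeps the sketch short; libraries normally
    # hide this detail behind their own combinators.
    next_future: Future = Future()

    def on_done(done: Future) -> None:
        try:
            next_future.set_result(step(done.result()))
        except Exception as exc:
            # A failure in this step or any earlier one flows down the chain.
            next_future.set_exception(exc)

    future.add_done_callback(on_done)  # registers the callback without blocking
    return next_future


def parse(text: str) -> List[int]:
    return [int(part) for part in text.split(",")]


def total(numbers: List[int]) -> int:
    return sum(numbers)


def run_pipeline(raw: str) -> Future:
    """Start the pipeline and return immediately with a future for the result."""
    executor = ThreadPoolExecutor(max_workers=2)
    first = executor.submit(parse, raw)
    result = then(then(first, total), lambda n: f"total={n}")
    executor.shutdown(wait=False)  # already submitted work still runs
    return result
```

The caller of `run_pipeline` is never blocked while the steps run. It only waits if and when it asks the returned future for its value, for example with `run_pipeline("1,2,3").result(timeout=5)`.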

The Actor Model

The actor model is a programming paradigm that supports the construction of reactive systems. Again, as with any other reactive programming tool, just because you use the actor model doesn't necessarily mean you have built a reactive system.

First off, the actor model by its very nature is message driven. When you build systems using the actor model, all communication between actors is done using asynchronous non-blocking messages. It also provides abstractions that give us elasticity and resilience. Because it is message driven, it becomes much easier to make a system elastic and resilient.

So what are some of the fundamental concepts of the actor model? First, all computation occurs inside of an actor. At some point in your application you are going to have one or more actors (ideally more than one, since a single actor is not actually useful), so you will have some combination of actors. All your computation will occur inside one of those actors or across many of them. Each actor is addressable: it has a unique address. And actors communicate only through asynchronous messages. That is the only way they can talk to each other.
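These concepts can be sketched with Python's `asyncio`. This is a toy illustration, not how any particular actor framework works: the `Actor` base class, the `Counter` and `Recorder` actors, the `STOP` sentinel, and the address strings are all hypothetical. Each actor has an address and a mailbox, keeps its state private, and interacts with other actors only by sending messages.

```python
import asyncio
from typing import Any, List

STOP = object()  # hypothetical sentinel message that ends an actor's loop


class Actor:
    def __init__(self, address: str) -> None:
        self.address = address  # unique address of this actor
        self._mailbox: asyncio.Queue = asyncio.Queue()

    def tell(self, message: Any) -> None:
        """Fire-and-forget send: enqueue the message and return at once."""
        self._mailbox.put_nowait(message)

    async def run(self) -> None:
        # Messages are processed one at a time, so the actor's state needs no locks.
        while True:
            message = await self._mailbox.get()
            if message is STOP:
                return
            self.receive(message)

    def receive(self, message: Any) -> None:
        raise NotImplementedError


class Counter(Actor):
    def __init__(self, address: str) -> None:
        super().__init__(address)
        self._count = 0  # private state, changed only by handling messages

    def receive(self, message: Any) -> None:
        kind, payload = message
        if kind == "add":
            self._count += payload
        elif kind == "report":
            payload.tell(("count", self._count))  # the reply is also a message


class Recorder(Actor):
    def __init__(self, address: str) -> None:
        super().__init__(address)
        self.received: List[Any] = []

    def receive(self, message: Any) -> None:
        self.received.append(message)


async def demo() -> List[Any]:
    counter = Counter("actor://counter")
    recorder = Recorder("actor://recorder")
    counter_task = asyncio.create_task(counter.run())
    recorder_task = asyncio.create_task(recorder.run())
    for message in (("add", 2), ("add", 3), ("report", recorder), STOP):
        counter.tell(message)
    await counter_task
    recorder.tell(STOP)
    await recorder_task
    return recorder.received
```

Running `asyncio.run(demo())` sends two increments and a report request to the counter, and the recorder ends up holding the counter's reply. Nothing outside `Counter` ever reads its count directly.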

The message driven nature of actors provides us something that we call location transparency. Our actors communicate with the same technique regardless of location. This means that local versus remote is mostly about configuration.

We call it location transparency when the original actor sends a message with no knowledge of where that message is going to go. It doesn't use a specific technique to send to a remote actor, and there isn't a different API for sending to a remote actor. The API is identical whether it's sending to a remote actor or a local actor.

This is not the same thing as transparent remoting. Transparent remoting tries to take remote calls and make them look like local calls. Everything looks like a local call even though it may be remote. This hides the fact that you're making remote calls and, as a result, it can hide potential failure scenarios.

Location transparency, on the other hand, takes the opposite approach. It makes local calls look like remote calls. This basically means you're always assuming that you're making remote calls.
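A small sketch can make the idea concrete. Everything here is hypothetical and simplified: `ActorRef`, `LocalTransport`, `FakeRemoteTransport`, and `make_ref` are invented for this example, delivery is synchronous for brevity, and the "remote" hop is simulated by a JSON round trip instead of a real network. The point is that the sender always uses the same `tell` call, gets no return value, and sends only serializable messages, while configuration decides where the target lives.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], None]


class LocalTransport:
    """Delivers the message to a handler in the same process."""

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self._handlers = handlers

    def deliver(self, address: str, message: dict) -> None:
        self._handlers[address](message)


class FakeRemoteTransport:
    """Stand-in for a network hop: the message crosses a JSON byte boundary."""

    def __init__(self, handlers: Dict[str, Handler]) -> None:
        self._handlers = handlers

    def deliver(self, address: str, message: dict) -> None:
        wire = json.dumps(message).encode("utf-8")
        self._handlers[address](json.loads(wire.decode("utf-8")))


class ActorRef:
    """The only API a sender sees, identical for local and remote targets."""

    def __init__(self, address: str, transport) -> None:
        self.address = address
        self._transport = transport

    def tell(self, message: dict) -> None:
        # No return value: the sender never assumes an immediate local answer.
        self._transport.deliver(self.address, message)


def make_ref(address: str, config: Dict[str, str],
             handlers: Dict[str, Handler]) -> ActorRef:
    """Pick the transport from configuration, not from the sender's code."""
    if config.get(address) == "remote":
        return ActorRef(address, FakeRemoteTransport(handlers))
    return ActorRef(address, LocalTransport(handlers))
```

Switching the `"greeter"` entry in `config` between `"local"` and `"remote"` changes how the message travels, but the sending code, `make_ref("greeter", config, handlers).tell({"text": "hello"})`, stays exactly the same.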

You can use a load balancer, a message bus, and a service registry to accomplish the same things that the actor model provides.

The actor model can be reactive at the level of actors, and actors live within a microservice. You can have many actors within a single microservice. With these tools, on the other hand, you have built something that is reactive at the level of microservices as opposed to internally.
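To give a feel for the tooling-based approach, here is a minimal in-memory sketch that combines a service registry with round-robin load balancing. It covers only those two pieces and leaves the message bus out. The `ServiceRegistry` class, the service name, and the addresses are hypothetical; production registries add health checks, leases, and replication.

```python
from typing import Dict, List


class ServiceRegistry:
    """Maps a service name to its live instances and balances across them."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = {}
        self._cursors: Dict[str, int] = {}

    def register(self, service: str, address: str) -> None:
        self._instances.setdefault(service, []).append(address)

    def deregister(self, service: str, address: str) -> None:
        # Removing a failed or retired instance keeps callers away from it.
        self._instances[service].remove(address)

    def pick(self, service: str) -> str:
        """Return the next instance in round-robin order."""
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no instances registered for {service!r}")
        index = self._cursors.get(service, 0) % len(instances)
        self._cursors[service] = index + 1
        return instances[index]
```

Callers ask the registry for `"orders"` rather than for a fixed host, so instances of a microservice can be added or removed without changing the callers. That is a coarser-grained version of what addressable actors give you inside a service.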