Building Centrifuge: Part 1 — Theory

Gerrit Riessen
7 min read · Jan 23, 2019


Imagine building a system capable of sending many thousands of requests to different websites every second. Sounds easy, I hear you shrug; this is 2019. But how is it really done?

The following is the first of two articles describing my experience building such a system, inspired by Segment’s Centrifuge. Part one below deals with theory while, in classic style, part two will deal with praxis and subsequent learnings.

Centrifuge and systems like it have the goal of making many HTTP requests without the propagation of delays. Of course, I could simply do one request after another, but then if I encounter a delay in one request, that delay will be propagated to all subsequent requests. So the intention is to isolate delays by parallelising the execution of requests, avoiding the propagation of delays to other requests.
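To make that isolation concrete, here is a minimal sketch in Go (my choice for illustration here, not something the design mandates): each request runs in its own goroutine, so a slow endpoint blocks only itself. The URLs and the five-second timeout are placeholder assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

// fetch performs a single request; a slow endpoint only blocks
// its own goroutine, not the others.
func fetch(client *http.Client, url string, wg *sync.WaitGroup) {
	defer wg.Done()
	resp, err := client.Get(url)
	if err != nil {
		fmt.Println(url, "failed:", err)
		return
	}
	resp.Body.Close()
	fmt.Println(url, "->", resp.StatusCode)
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	urls := []string{"https://example.com", "https://example.org"}
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go fetch(client, u, &wg) // each request runs in isolation
	}
	wg.Wait()
}
```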

So, for now, I want to start with some of the assumptions we need to make. Since we always make assumptions implicitly, be they based on requirements or the technology used, it’s important to clarify them from the start. As such, the technical detail can be ignored to a certain extent; what I would like to demonstrate is that when it comes to building software systems, it’s important to acknowledge the assumptions we make.

Requirements and assumptions aren’t always the same; in fact, what I would like to point out is that sometimes our assumptions limit the requirements we can fulfill.

Assumptions

With every line of code we make assumptions about how the system works, what it supports and what its limits are. We even make assumptions about the end-user of that system. Clarifying our suppositions right from the start makes identifying the solution that much simpler. To wit, what am I trying to achieve?

Answer: handling as many HTTP requests as possible in the shortest time frame possible.

Sounds simple, but what does that even mean?

For example:

  • What type of requests? E.g. GET, POST, API calls or website scraping?
  • Is response content relevant or just the response code?
  • Are there interdependencies between the requests? E.g. only do request B if request A was successful.
  • Should requests be retried on failure?
  • What is success or failure of a request? E.g. what does a redirect mean in this context?
  • What about session cookies or authentication?
  • Is this a kind of HTTP proxy or are requests asynchronous?

There are many more such questions, most of which are use-case dependent. For my part, I’m going to make the following assumptions: all HTTP requests are supported; responses are relevant and will be sent back to a user-specified endpoint; there are no interdependencies between requests; success and failure are defined by the classic HTTP response codes; and requests will be retried with exponential back-off. Any authentication will be limited to API tokens or Basic authentication, but no cookies or OAuth, so no, this isn’t a proxy solution.
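To illustrate the exponential back-off assumption, a small sketch: the delay doubles with each attempt up to a cap. The base and cap values are arbitrary choices of mine, not part of the requirements.

```go
package main

import (
	"fmt"
	"time"
)

// retryDelay returns the wait before retry number `attempt`:
// base * 2^attempt, capped at max.
func retryDelay(attempt int, base, max time.Duration) time.Duration {
	d := base << uint(attempt) // doubles with each attempt
	if d > max || d <= 0 {     // the second check guards against overflow
		return max
	}
	return d
}

func main() {
	for i := 0; i < 6; i++ {
		fmt.Println(i, retryDelay(i, 500*time.Millisecond, 30*time.Second))
	}
}
```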

That was easy! So now that I have made my assumptions clear, what are the consequences? Or, better said, how do those assumptions affect the architecture of the system?

One consequence of this methodology is that requests are stateless and self-contained when they are passed into the system, i.e., everything needed to make the request is known. As the request passes through the system, it is assigned a system state, e.g., arrived, ready to be tried, failed, succeeded, needs to be retried. What is unknown about the request is how long it will take. Simply put, if we knew the duration, we wouldn’t need systems like these.
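One possible shape for such a self-contained request and its system state, again sketched in Go; all field and state names are my own invention, chosen to mirror the lifecycle just described.

```go
package main

import "fmt"

// RequestState tracks where a request sits in its lifecycle;
// the state names mirror the ones listed above.
type RequestState int

const (
	Arrived RequestState = iota
	Ready
	Succeeded
	Failed
	Retry
)

// Request is self-contained: everything needed to execute it
// travels with it through the system.
type Request struct {
	ID      string
	Method  string
	URL     string
	Headers map[string]string
	Body    []byte
	State   RequestState
	Attempt int
}

func main() {
	r := Request{ID: "req-1", Method: "GET", URL: "https://example.org", State: Arrived}
	fmt.Printf("%s is in state %d\n", r.ID, r.State)
}
```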

Why don’t we know how long a request will take? Because the endpoint of the request has an indeterminate state: is it up, is it busy, does it handle the request, does it respond in a timely fashion? Which throws up the next question: how long do we wait for the server to respond? Again, there is no clear answer, but I will go into more detail later.
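To make the waiting question concrete, here is one way to impose a deadline on a single request; the ten-second value is a placeholder precisely because there is no clear answer.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// How long do we wait? There is no universally right answer;
	// ten seconds here is purely an illustrative choice.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://example.com", nil)
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		fmt.Println("gave up waiting:", err) // a blown deadline surfaces here
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.StatusCode)
}
```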

Having a stateless system means that data is ephemeral and once a request has been handled and exited the system, no data needs to be retained within the system about that request. Historical data on requests will be maintained outside of the system.

What I would also like to do is send the response back to a user-specified endpoint. But this endpoint won’t be part of the initial request; it is something the user would configure separately. Here I make another assumption: a group of responses will all be sent to the same user endpoint, instead of each response having a different endpoint.

To avoid coupling the components together, the system is going to annotate the request with the response endpoint information upon arrival into the system. In this way I maintain the goals of having self-contained requests and minimal coupling between components. Why this is minimal coupling will become clear in the second part of this article.
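A minimal sketch of that annotation step, assuming a simple per-user endpoint lookup; the names are invented for illustration.

```go
package main

import "fmt"

// Annotation at ingress: the response endpoint is stamped onto the
// request once, on arrival, so downstream components never need to
// look it up.
type inbound struct {
	ID               string
	URL              string
	ResponseEndpoint string // filled in by the system, not the user
}

func main() {
	userEndpoints := map[string]string{"acme": "https://hooks.example.com/acme"}
	req := inbound{ID: "req-1", URL: "https://example.org/api"}
	req.ResponseEndpoint = userEndpoints["acme"] // annotate on arrival
	fmt.Printf("%+v\n", req)
}
```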

A particular bonus of having self-contained requests is that sharding can be done along any dimension. I can shard across the type of request, the target server, or the system state of requests. I can do multi-level sharding. Keep this in mind when developing the system.
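As a sketch of what dimension-agnostic sharding can look like: hash any key derived from the request, whether that’s the target host, the system state, or a combination of both.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shard picks a partition from any dimension of the request; because
// requests are self-contained, the key can be built from whatever
// fields suit the use case. Purely illustrative.
func shard(key string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(key))
	return h.Sum32() % n
}

func main() {
	fmt.Println(shard("example.com", 8))       // shard by target host
	fmt.Println(shard("retry|example.com", 8)) // multi-level: state + host
}
```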

Since there is no fixed upper limit to the amount of traffic that will be handled, we need a strategy for self-scaling the system to the traffic requirements. I’ve left this intentionally vague since I don’t yet know which components need scaling, i.e., where the bottlenecks are, nor when the system needs to scale and by how much. But I do know the system will have to scale. The alternative would be to build an extremely large system and have horrendous server costs.

Something to watch out for here is implicit coupling between components, that is, coupling that only becomes clear when we scale. Of course, in part, this is unavoidable since we have a system of interdependent components! More subtly, the point is that all coupling should be explicit so it can be dealt with programmatically when it comes to scaling.

Another assumption we make here is that we aren’t building a synchronous system, i.e., requests don’t get handled immediately. Requests are sent to the system, and the system replies with an identifier. Some time later, the final response containing the actual result of the request is sent to the initial requestor, the identifier being the matching component between the two datasets. So these are asynchronous requests but without maintaining a connection to the server, i.e., this isn’t Ajax but more comparable to sending emails and getting replies.
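The two halves of this handshake might look as follows, with invented field names: the immediate acknowledgement carries only an identifier, and the result delivered later carries the same identifier so the requestor can match the two.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Ack is what the system returns immediately: just an identifier.
type Ack struct {
	RequestID string `json:"request_id"`
}

// Result is delivered later to the user's endpoint; the identifier
// lets the user match it to the original submission.
type Result struct {
	RequestID  string `json:"request_id"`
	StatusCode int    `json:"status_code"`
	Body       string `json:"body,omitempty"`
}

func main() {
	ack, _ := json.Marshal(Ack{RequestID: "req-1"})
	res, _ := json.Marshal(Result{RequestID: "req-1", StatusCode: 200})
	fmt.Println(string(ack)) // returned synchronously
	fmt.Println(string(res)) // pushed asynchronously, later
}
```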

The implication of asynchronicity is that I can be a little more relaxed on the system side. Of course, users want a response as fast as possible, however, in a world where servers have delays (or are even down), things need to be retried. The extra overhead the system creates shouldn’t be overly large but needn’t be zero.

My particular take on this is that we can have loose coupling at the cost of taking a little longer to handle requests. One way of getting this loose coupling is to use data stores to buffer data between services, meaning that components aren’t dependent on other components, but rather on the data stores. Another form of decoupling is data streams, where data flows through the system, similar to a river. Any component can grab data from the stream, just like taking a fish from a river. However, in this river everyone can take out the exact same fish! Fish are automagically copied for everyone. This makes sharing data among components straightforward and supremely easy.
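Here is a toy model of the everyone-gets-the-same-fish property, using Go channels to stand in for a real stream (in practice something like Kafka or Kinesis provides this): every consumer receives its own copy of each item.

```go
package main

import (
	"fmt"
	"sync"
)

// broadcast copies each item from the source to every consumer
// channel: the same "fish" arrives in every stream.
func broadcast(src <-chan string, outs []chan string) {
	for item := range src {
		for _, out := range outs {
			out <- item // every consumer gets its own copy
		}
	}
	for _, out := range outs {
		close(out)
	}
}

func main() {
	src := make(chan string)
	outs := []chan string{make(chan string), make(chan string)}

	var wg sync.WaitGroup
	for i, out := range outs {
		wg.Add(1)
		go func(id int, c <-chan string) {
			defer wg.Done()
			for item := range c {
				fmt.Printf("consumer %d got %s\n", id, item)
			}
		}(i, out)
	}

	go broadcast(src, outs)
	src <- "request-1"
	src <- "request-2"
	close(src)
	wg.Wait()
}
```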

How should stateless requests be represented? Especially since these requests will be annotated with other details as they pass through the system. Requests should also be human-readable, even if this has an overhead compared to binary representations, since it makes debugging far simpler. Since I would also like the freedom to use whatever programming language is best for a specific component, an encoding that is widely supported and implemented should be chosen. Additionally, using a binary format might be a case of redundant (preemptive) optimisation, since I don’t yet know where the bottlenecks are.
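A widely supported, human-readable encoding such as JSON would satisfy these criteria. Purely as an illustration, an annotated request might look like this on the wire; every field name is an assumption of mine.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// wireRequest is one possible wire form of a self-contained,
// annotated request.
type wireRequest struct {
	ID               string            `json:"id"`
	Method           string            `json:"method"`
	URL              string            `json:"url"`
	Headers          map[string]string `json:"headers,omitempty"`
	State            string            `json:"state"`
	ResponseEndpoint string            `json:"response_endpoint"`
}

func main() {
	r := wireRequest{
		ID: "req-1", Method: "POST", URL: "https://example.org/api",
		State: "ready", ResponseEndpoint: "https://hooks.example.com/acme",
	}
	b, _ := json.MarshalIndent(r, "", "  ")
	fmt.Println(string(b)) // easy to read while debugging
}
```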

I will also make the assumption that no request is left behind and that each request has exactly one final response. After all, the system should not lose requests and definitely not duplicate the same request. Both of these are key success indicators, and the system should do its best to guarantee both. But nothing is perfect, so besides good telemetrics, we also need good forensic tools to diagnose issues.

The doubling of requests can be caused by external servers that take a long time (this is what I meant about defining how long we wait for a response from the external server). If the server is taking too long to reply and we break off too soon, we end up retrying the same request even though the server eventually acknowledged the first. An alternative approach here is to wait a user-specified time, then break off and not retry the request. This allows the user to decide what to do.
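One way to enforce the exactly-one-response rule is to remember which request identifiers have already been answered. A deliberately naive, in-memory sketch; a real system would need to persist this state:

```go
package main

import (
	"fmt"
	"sync"
)

// dedup remembers which request IDs have already produced a final
// response, so a late reply from a slow server can't trigger a
// second delivery.
type dedup struct {
	mu   sync.Mutex
	seen map[string]bool
}

func (d *dedup) firstResponse(id string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.seen[id] {
		return false
	}
	d.seen[id] = true
	return true
}

func main() {
	d := &dedup{seen: map[string]bool{}}
	fmt.Println(d.firstResponse("req-1")) // true: deliver
	fmt.Println(d.firstResponse("req-1")) // false: duplicate, drop
}
```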

Providing users with actionable result information is important. For each response sent to the user, it should be clear what needs to happen. On success, ideally nothing much needs to be done. On failure, it must be clear how to act upon it and who the actor is: the user providing the request or the end server accepting the request. Clear error codes and actionable error messages are two ingredients for a good solution.
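What an actionable error payload might look like; these fields don’t come from any defined spec, they simply encode the questions above: what happened, who should act, and whether a retry makes sense.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ErrorInfo sketches an actionable error: a stable code, a plain
// message, the responsible actor, and a retry hint.
type ErrorInfo struct {
	Code      string `json:"code"`      // e.g. "ENDPOINT_TIMEOUT"
	Message   string `json:"message"`   // what happened, in plain words
	Actor     string `json:"actor"`     // "user" or "target_server"
	Retryable bool   `json:"retryable"` // is another attempt worthwhile?
}

func main() {
	e := ErrorInfo{
		Code:      "ENDPOINT_TIMEOUT",
		Message:   "target server did not respond within 10s",
		Actor:     "target_server",
		Retryable: true,
	}
	b, _ := json.MarshalIndent(e, "", "  ")
	fmt.Println(string(b))
}
```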

Pulling all the threads together, what features does the system now have? Based on the assumptions I made, the following system characteristics present themselves:

  • Stateless and self-contained requests
  • Limited storage within the system since data is ephemeral
  • Sharding can be multidimensional and multileveled
  • Self-healing and self-scaling components
  • Loose coupling using data streams and data stores
  • Requests are never left behind and exactly one response per request
  • Actionable error codes and messages

One general aspect of development that I always watch out for is flexibility: the freedom to change our assumptions without having to rebuild everything. This is very hard to achieve, and it comes back to the idea that each line of code cements an assumption into the system.

That was my take on how a single requirement can lead to several assumptions which then determine the characteristics of the system being built. There are a number of other assumptions that have been made (e.g., I plan on using a cloud provider where scaling can save money) but I’ve tried to focus on those assumptions that mainly affect the code to be written.

In the next part, I will discuss the code that was and is being written in terms of components built.
