Event-Driven Architecture: Basics, advantages, and challenges
Software architectures often resemble communication structures that we also find outside of software, in physical systems. Systems have states that can change. We can treat such state changes as events and send them as messages to other parts of the system. These other parts can react by updating their own knowledge or by triggering processes. Business processes in organizations can be modeled and automated efficiently with this principle, which is one reason for the popularity of event-driven architecture.
Advantageous basics

Let us imagine a customer who wants to order a pizza from their favorite pizzeria. The customer calls, places the order with the waiter, and after a while receives the pizza. The initial event here is the customer's phone order. Looking closer at the process this triggers in the pizzeria, we notice further events. The waiter writes an order ticket and pins it on a board in the kitchen. The chef reacts to this event, bakes the pizza, and places it, boxed, at the pickup counter. The delivery driver reacts to the pizza appearing at the pickup counter and delivers the order, thus completing the business process.
In this example, we can recognize the fundamental properties of event-driven architecture. The business process is controlled by events and executed by various event processors. These handle sub-tasks and trigger new events, which further event processors may react to. The communication between event processors runs asynchronously.
Asynchronous here means that the waiter does not wait for the chef to finish baking the pizza. The chef may also not react to a ticket immediately if there is a backlog of tickets on the board and other orders come first. Likewise, the chef does not wait for the driver to come by the pickup counter and take the pizza, but moves on to the next ticket on the board.
This decouples the participants in time, so that they avoid idle waiting and keep working productively. Each of them keeps producing events, trusting that whoever is responsible for them will handle them when the time comes.
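The temporal decoupling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: two in-memory queues stand in for the ticket board and the pickup counter, and the names `chef`, `driver`, `run_pizzeria`, and `STOP` are invented for this example. A real system would typically use a message broker instead of in-process queues.

```python
import queue
import threading

STOP = object()  # sentinel event that tells a worker to shut down


def chef(board: queue.Queue, counter: queue.Queue) -> None:
    """React to tickets on the board and place baked pizzas on the counter."""
    while (ticket := board.get()) is not STOP:
        counter.put(f"pizza for {ticket}")
    counter.put(STOP)  # pass the shutdown signal downstream


def driver(counter: queue.Queue, delivered: list) -> None:
    """React to pizzas on the pickup counter and deliver them."""
    while (pizza := counter.get()) is not STOP:
        delivered.append(pizza)


def run_pizzeria(orders: list[str]) -> list[str]:
    board: queue.Queue = queue.Queue()    # ticket board in the kitchen
    counter: queue.Queue = queue.Queue()  # pickup counter
    delivered: list[str] = []
    workers = [
        threading.Thread(target=chef, args=(board, counter)),
        threading.Thread(target=driver, args=(counter, delivered)),
    ]
    for worker in workers:
        worker.start()
    # The waiter only pins tickets; he never waits for the chef or the driver.
    for order in orders:
        board.put(order)
    board.put(STOP)
    for worker in workers:
        worker.join()
    return delivered
```

Calling `run_pizzeria(["order-1", "order-2"])` returns the pizzas in the order they were requested, because this sketch uses exactly one chef and one driver. Note that none of the three roles calls another directly; they only share the two queues.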
For the pizzeria to operate efficiently, the workers should be kept busy. The workload is determined by the rate and size of the orders, which can change over time: it fluctuates throughout the day and week and may grow in the long term as the customer base expands.
Spikes in demand can push the delivery driver to his limits, when many orders come in within a short period and he cannot keep up with the deliveries. What would happen? The chef would not wait for the driver and would keep baking pizzas, which would accumulate at the pickup counter between the driver's trips. This would increase the time between baking and delivery and lengthen the customer's wait. After a short spike in demand, however, the backlog at the pickup counter would quickly clear. The driver, though overloaded, would reliably deliver as many orders as he could, while some orders would remain at the pickup counter slightly longer than usual.
This makes the system resilient to demand spikes, which is a benefit of event-driven architectures. If communication were synchronous instead of asynchronous, responses could take so long under load that requests would time out and fail. With asynchronous events, the pending work is buffered and processed later instead. In computer systems, this property depends on a stable network and a sufficiently sized message broker. The broker manages the events of the distributed event processors and facilitates their communication.
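How a buffer absorbs a short spike can be shown with a small, purely illustrative simulation. The function name `simulate_backlog`, the time steps, and the numbers are assumptions made for this sketch; they are not taken from a real system.

```python
def simulate_backlog(arrivals: list[int], capacity: int) -> list[int]:
    """Return the backlog at the pickup counter after each time step.

    arrivals[t] is the number of pizzas placed on the counter in step t,
    capacity is the number of pizzas the driver can deliver per step.
    """
    backlog = 0
    history = []
    for arrived in arrivals:
        backlog += arrived
        backlog -= min(backlog, capacity)  # the driver delivers what he can
        history.append(backlog)
    return history


# Two quiet steps, a spike of two busy steps, then quiet again.
print(simulate_backlog([1, 1, 5, 5, 1, 1, 0, 0], capacity=2))
# prints [0, 0, 3, 6, 5, 4, 2, 0]
```

The backlog grows during the spike and drains afterwards without any order being lost. The sketch also hints at the limit of this resilience: if arrivals stayed above the capacity, the backlog would keep growing.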
Problematic delays can arise if an event processor is overloaded for a sustained period. If, for instance, the order rate on Friday evening is much higher than on other weekdays, the backlog at the pickup counter could become so large that customers would face unacceptably long waiting times. This would be bad for customer satisfaction. What can be done?
The obvious solution: If one driver cannot handle it, two can! So, an additional driver is assigned for Friday evening, doubling the delivery rate and keeping both the backlog at the pickup counter and customer wait times low. Customers do not care who delivers their pizza, and the content of the delivery does not change. Nothing changes for the chef either, who can keep working as usual. Similarly, one could increase the number of chefs if the kitchen became the bottleneck, that is, if more and more tickets were piling up on the board.
Here, we see two more advantageous properties of event-driven architecture. The decoupled event processors can be scaled independently to adapt to varying demand. And by processing events in parallel, high performance can be achieved.
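Adding a second driver corresponds to what is often called competing consumers: several identical event processors take events from the same queue. The following Python sketch illustrates the idea with threads and an in-memory queue. The function `deliver_all` and the driver names are hypothetical, and the counter is filled up front only to keep the example short.

```python
import queue
import threading


def deliver_all(pizzas: list[str], n_drivers: int) -> dict[str, list[str]]:
    """Let n_drivers competing consumers empty one shared pickup counter."""
    counter: queue.Queue = queue.Queue()
    for pizza in pizzas:
        counter.put(pizza)

    # Each driver records only its own deliveries, so no extra lock is needed.
    deliveries: dict[str, list[str]] = {
        f"driver-{i + 1}": [] for i in range(n_drivers)
    }

    def driver(name: str) -> None:
        while True:
            try:
                # Queue access is thread-safe: each pizza is handed out once.
                pizza = counter.get_nowait()
            except queue.Empty:
                return  # the counter is empty, the shift is over
            deliveries[name].append(pizza)

    threads = [threading.Thread(target=driver, args=(name,)) for name in deliveries]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return deliveries
```

The producer side does not change when `n_drivers` goes from 1 to 2, which mirrors the chef who keeps working as usual. How the pizzas are split between the drivers is not deterministic; the sketch only guarantees that every pizza is delivered exactly once.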
Complex error handling

Unfortunately, errors can never be completely ruled out. There are many sources of error and many techniques to deal with them. Let us look at an example in the pizzeria. The waiter has illegible handwriting, so now and then the chef bakes a pizza that does not quite match the customer's order. To fix this erroneous event, the driver should check each pizza against the order and, in case of a mistake, not deliver it but place a corrective order with the kitchen. The chef reacts to the corrective order as to any other and bakes the correct pizza. In this way, the error is removed from the system through event-driven error handling.
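The corrective order can be modeled as just another event. The following Python sketch is a simplified, single-threaded illustration: the event names `TicketPinned`, `PizzaReady`, and `CorrectionRequested`, the function `process`, and the `misread` mapping that simulates the handwriting problem are all assumptions made for this example.

```python
from collections import deque


def process(orders: dict[int, str], misread: dict[int, str]) -> tuple[dict[int, str], list[tuple]]:
    """Run the order workflow and return the delivered pizzas and the event log.

    orders maps an order id to the pizza the customer wants.
    misread maps an order id to what the chef reads on an illegible ticket.
    """
    delivered: dict[int, str] = {}
    log: list[tuple] = []
    events: deque = deque(
        ("TicketPinned", order_id, misread.get(order_id, wanted))
        for order_id, wanted in orders.items()
    )
    while events:
        event = events.popleft()
        log.append(event)
        kind, order_id, pizza = event
        if kind in ("TicketPinned", "CorrectionRequested"):
            # Chef: bakes whatever the ticket says.
            events.append(("PizzaReady", order_id, pizza))
        elif kind == "PizzaReady":
            # Driver: checks the pizza against the order before delivering.
            if pizza == orders[order_id]:
                delivered[order_id] = pizza
            else:
                # Assumption: the corrective order is written legibly.
                events.append(("CorrectionRequested", order_id, orders[order_id]))
    return delivered, log
```

For `orders = {1: "margherita", 2: "salami"}` and `misread = {2: "salmon"}`, both customers end up with the pizza they ordered, and the log contains one `CorrectionRequested` event and two `PizzaReady` events for order 2. The chef needs no special error-handling code, since the correction reuses the normal baking path.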
In this case, error handling was easily implemented with a small extension. In larger systems with many event processors and workflows, however, it can become quite complex, more so than in other architectural styles. Sometimes many events are required to fix errors and to undo wrongly changed states. This is a disadvantage of this architectural style. It is related to an advantage: event-driven architectures can easily be extended by adding more event processors. This leads to a tendency towards complexity, which increases the more tightly coupled the event processors become. As mentioned earlier, the processors are decoupled in time when processing events. Logically, however, they are coupled: an event processor depends on the event producers whose events it consumes.
Domains and mediators
To mitigate complexity, one can try to implement logically separable domains as isolated workflows whose events are not consumed by event processors of other domains. Then at least the event processors of different domains are decoupled from each other. This reduces complexity at the cost of extensibility. Within a domain, a mediator can be used to control the workflow and coordinate the event processors involved. Unlike regular event processors, a mediator knows the workflow of its domain, can control the events within it, and can trigger corrections when necessary. This simplifies error handling but costs performance, because the coordination takes extra time that a workflow without a mediator would not need. A mediator can also become a temporary bottleneck for the entire process during peak times until it is scaled accordingly.

In the pizzeria, the waiter is well suited for the role of the mediator. The process could be changed so that the chef leaves the finished pizzas in the kitchen, where the waiter fetches them and puts them on the pickup counter. This offers several advantages. The waiter can inspect the pizzas himself and correct any errors caused by his handwriting. Moreover, he can influence when the drivers deliver the orders. Since he knows all the orders, he could group orders for nearby addresses to optimize the drivers' delivery routes. And if error handling is ever required during delivery, he could take care of that as well. This variant offers more ways to control the process. However, the waiter as the central control authority becomes a weak point if he needs to take a break or is overwhelmed by tasks. In that case, the delivery of pizza orders that have already been taken would be delayed. This disadvantage did not exist in the process without a mediator.
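The waiter as mediator can be sketched as a single component that knows the whole workflow. The Python example below is a minimal illustration, not a definitive design: the classes `Order` and `WaiterMediator`, the helper `make_sloppy_chef`, and the retry limit are all hypothetical. For brevity the mediator calls the chef directly, whereas a real mediator would usually coordinate the processors through events.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Order:
    order_id: int
    pizza: str
    area: str


class WaiterMediator:
    """Knows the whole workflow: bake, inspect, correct, group by delivery area."""

    def __init__(self, bake: Callable[[str], str], max_attempts: int = 2) -> None:
        self._bake = bake
        self._max_attempts = max_attempts

    def run(self, orders: list[Order]) -> tuple[dict[str, list[int]], list[int]]:
        batches: dict[str, list[int]] = defaultdict(list)
        failed: list[int] = []
        for order in orders:
            for _ in range(self._max_attempts):
                if self._bake(order.pizza) == order.pizza:  # inspection step
                    batches[order.area].append(order.order_id)
                    break
            else:
                failed.append(order.order_id)  # give up after max_attempts
        return dict(batches), failed


def make_sloppy_chef() -> Callable[[str], str]:
    """Hypothetical chef who misreads the first 'salami' ticket as 'salmon'."""
    misread_once = {"salami": "salmon"}

    def bake(ticket: str) -> str:
        return misread_once.pop(ticket, ticket)

    return bake
```

With orders 1 and 3 going to the area `"north"` and a salami order 2 going to `"south"`, `WaiterMediator(make_sloppy_chef()).run(orders)` returns the batches `{"north": [1, 3], "south": [2]}` and an empty failure list. Every order passes through `run`, which makes inspection and route grouping easy and also shows why the mediator can become the bottleneck.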
Summary
This concludes our introduction to event-driven architecture. In the non-software example of the pizzeria, we recognized parallels to event-driven architecture and discussed the fundamental properties of this architectural style. Along with the advantages of performance, scalability, resilience to demand spikes, and extensibility, we also highlighted the challenges, such as increased complexity and more difficult error handling. These pros and cons must be weighed against the specific requirements of the application to find a suitable architecture. If the application involves many data sources that regularly produce events, an event-driven architecture is worth considering.
Source
[1] Mark Richards, Neal Ford (2020): Fundamentals of Software Architecture, O’Reilly Media
Tags
#Architecture, #Broker, #EventDriven, #Events, #EventBased, #EventBridge, #IoT, #Kafka, #Lambda, #MQTT, #MSK