The introduction of SOA (Service Oriented Architecture) created a lot of buzz a while back, but then picked up negative connotations as it became associated with big, heavy enterprise processes. Microservices are the current hot trend. I will not delve into definitions and differences here. Both architectures espouse breaking the system into a collection of services, spread across multiple machines, that interact through well-defined APIs over a network. The APIs form a hard boundary that completely hides the internals of a service from the outside world. In this article I'll explore this boundary and discuss various concepts, as well as the nuances that might make or break your system architecture.
External Services vs. Internal Services
Modern distributed systems run behind a firewall that separates them from the outside world (users, partners, integrators) and provides a protected zone where trusted components can execute and interact safely. But, of course, you want to serve external users too. Dedicated public/external endpoints (often behind a load balancer) allow external users, whether humans or applications, to access your system. Many services serve only internal users, some serve only external users, and some serve both. External APIs require much more control, such as authentication, authorization and throttling. These days external APIs are often REST/HTTP, because the ubiquity of REST/HTTP clients and tools allows universal access. More on REST/HTTP APIs below.
There are anonymous public APIs, but access to most external APIs requires authentication. There are many reasons for this, such as keeping user data separate, providing a user-specific experience, and monitoring and tracking user activity. There are many authentication schemes. HTTP Basic Auth provides a good starting point, but requires managing users and passwords. Social logins via OAuth/OAuth2 have become very popular. There is no single solution that fits all use cases.
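To make the Basic Auth starting point concrete, here is a minimal sketch of validating an `Authorization: Basic ...` header. The `USERS` dictionary is an invented stand-in; a real service would check hashed passwords in a database or delegate to an identity provider.

```python
import base64

# Hypothetical in-memory credential store, for illustration only.
# Real systems store password hashes, never plaintext.
USERS = {"alice": "s3cret"}

def check_basic_auth(header_value):
    """Validate the value of an HTTP 'Authorization: Basic ...' header."""
    if not header_value.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header_value[len("Basic "):]).decode("utf-8")
    except Exception:
        return False  # malformed base64 or encoding
    username, sep, password = decoded.partition(":")
    if not sep:
        return False  # no 'user:password' separator
    return USERS.get(username) == password

# Usage: the header a client would send for alice/s3cret
token = base64.b64encode(b"alice:s3cret").decode("ascii")
ok = check_basic_auth("Basic " + token)
```

Note that Basic Auth sends credentials on every request, so it is only acceptable over TLS.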
The Layered API
A very good architectural approach is to layer APIs. Have an internal service with an unprotected internal API that does the heavy lifting and provides access to its state. Then, have an external service with an external API whose job is to deal with all the messy details of external access control. When an incoming request is properly authenticated and authorized, the external service forwards it to the internal service and returns the result.
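A minimal sketch of the layering, with plain functions standing in for the two services. The token store and the function names are invented for illustration; in practice the layers would be separate processes behind a real framework.

```python
# Hypothetical token store used by the external layer.
VALID_TOKENS = {"token-123": "alice"}

def internal_get_orders(user_id):
    # Internal service: unprotected, does the heavy lifting.
    # Canned data here; a real service would hit its own storage.
    return [{"user": user_id, "order": 42}]

def external_get_orders(auth_token):
    # External service: handles access control, then forwards the
    # request to the internal service and returns the result.
    user = VALID_TOKENS.get(auth_token)
    if user is None:
        return {"status": 401, "body": "unauthorized"}
    return {"status": 200, "body": internal_get_orders(user)}
```

The internal service stays simple and fast because it never needs to know about tokens, throttling or user identity formats.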
Avoid Chatty Interfaces
Chatty interfaces require you to perform multiple calls to accomplish a task. They are very common in component-based, in-process systems, where the overhead of calling across components is tiny, so very fine-grained interfaces are often used. In a distributed system, where each request has to hop between multiple servers to reach the target service and back to the caller, this can have detrimental consequences for your service's performance and availability, and also makes it harder to use. Imagine that a caller, in order to perform some action, needs to make 10 different API calls. What happens if request #7 fails? The caller is now left with incomplete state and has to employ complex retry logic. It is much better to have coarse-grained interfaces oriented around user actions. This is a good example of where a layered API can be useful: even if your internal services use chatty interfaces, put an external service with a non-chatty API in front of them.
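The difference is easy to see by counting round trips. In this sketch a fake network object tallies calls; the order-placement endpoints are invented for illustration.

```python
class FakeNetwork:
    """Stand-in for a remote service; just counts round trips."""
    def __init__(self):
        self.calls = 0

    def call(self, endpoint, **kwargs):
        self.calls += 1
        return {"ok": True}

def chatty_place_order(net, items):
    # Fine-grained: one round trip per step. A failure at any
    # step leaves the caller with partial state to clean up.
    net.call("create_order")
    for item in items:
        net.call("add_item", item=item)
    net.call("submit_order")

def coarse_place_order(net, items):
    # Coarse-grained: one user-action-oriented round trip carrying
    # the whole order; the service completes or rejects it atomically.
    net.call("place_order", items=items)
```

For a three-item order, the chatty version makes five network hops where the coarse version makes one, and only the coarse version is trivially all-or-nothing.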
REST/HTTP vs. Native Protocols Over TCP/UDP
As discussed earlier, REST/HTTP APIs have many benefits. But they also suffer from some downsides, such as performance. In particular, for internal services, straight TCP or even UDP may give you much better performance, both in terms of latency and throughput and in resource utilization. If your internal services all use the same technology stack, you may even be able to communicate at a higher level of abstraction and pass language-specific data structures. Note that these data structures will still be serialized across the wire, but the job will be done by a framework or library that can take advantage of the fact that both ends of the wire share the same foundation.
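As a rough sketch of what such a native protocol looks like underneath, here is length-prefixed JSON framing over a raw socket; a `socketpair` stands in for a real TCP connection between two internal services. Frameworks such as gRPC or Thrift do this framing (and far more) for you, so treat this purely as an illustration of the idea.

```python
import json
import socket
import struct

def send_msg(sock, obj):
    """Serialize obj and send it as one length-prefixed frame."""
    payload = json.dumps(obj).encode("utf-8")
    # 4-byte big-endian length header, then the payload.
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed connection")
        buf += chunk
    return buf

def recv_msg(sock):
    """Read one frame and deserialize it back into a Python object."""
    (length,) = struct.unpack("!I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, length).decode("utf-8"))

# Usage: two endpoints of one in-process "connection".
a, b = socket.socketpair()
send_msg(a, {"op": "get_user", "id": 7})
request = recv_msg(b)
a.close(); b.close()
```

Because both ends agree on the framing and serialization, neither side pays HTTP's header-parsing and text-protocol overhead per message.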
Blocking vs. Non-Blocking APIs
One of the most important aspects of API design is whether to use blocking or non-blocking calls. A truly blocking API that can hang forever is almost never appropriate. The more common approach is a blocking API with timeouts: if a response is not received within the timeout period, the call fails. A non-blocking API returns immediately after a call, and there has to be some mechanism to query later for the result, and sometimes to get progress information. Blocking APIs (even with timeouts) are typically appropriate for small-scale systems only. If a service is unavailable and multiple clients all call it and hang until the timeout expires (and sometimes retry multiple times), it can very easily saturate other parts of the system and bring the whole system to its knees. Non-blocking APIs scale much better, but are more complicated to design and use.
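The two styles can be sketched with Python's standard thread pool, where `slow_service` is a stand-in for a remote call:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def slow_service(delay):
    # Stand-in for a remote call that takes a while.
    time.sleep(delay)
    return "done"

with ThreadPoolExecutor(max_workers=2) as pool:
    # Blocking with a timeout: the caller waits, but never forever.
    # result() raises concurrent.futures.TimeoutError if the call
    # does not finish within 1 second.
    future = pool.submit(slow_service, 0.01)
    result = future.result(timeout=1.0)

    # Non-blocking: submit() returns a future immediately; the caller
    # polls (or attaches a callback) and collects the result later.
    future2 = pool.submit(slow_service, 0.01)
    while not future2.done():
        time.sleep(0.005)  # the caller is free to do other work here
    result2 = future2.result()
```

In a real non-blocking API the "query later" step is usually a callback, a status endpoint, or a message on a queue rather than a polling loop, but the shape is the same.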
Error Handling and Reporting
Error handling and reporting are critical. Every system will fail. Distributed, long-running systems with lots of users, which are often modeled using a service-oriented architecture, have to handle failures with grace. When using REST/HTTP, it is a best practice to use HTTP status codes to report errors. With custom interfaces, the error reporting mechanism is often tied to the framework used. Regardless of the reporting mechanism, a lot more happens in the background: all errors must be handled to keep the system running; errors should be logged for later analysis; and resource usage such as CPU, memory, disk space and network must be monitored to avoid crossing a threshold beyond which the system becomes unresponsive or simply runs too slowly. Often, proper error handling and recovery are more complicated at scale than the actual processing that takes place.
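A common pattern at the API boundary is to translate internal failures into the appropriate status codes in one place. This sketch uses invented exception types and a generic handler to show the shape of the mapping:

```python
# Illustrative internal error types; real services would define their own.
class NotFound(Exception):
    pass

class Unauthorized(Exception):
    pass

def handle_request(action):
    """Run an internal action and map its outcome to an HTTP-style response."""
    try:
        return {"status": 200, "body": action()}
    except NotFound:
        return {"status": 404, "body": "resource not found"}
    except Unauthorized:
        return {"status": 401, "body": "authentication required"}
    except Exception:
        # Never leak internal details to the caller; in a real system
        # this branch would also log the exception for later analysis.
        return {"status": 500, "body": "internal error"}
```

Centralizing the mapping keeps the internal services free to raise meaningful exceptions while the boundary guarantees callers always get a well-formed status code.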
When you develop a distributed system based on microservices or SOA, the design of the APIs between components, and between the system and the outside world, has a significant impact on the success of the system. There are many factors to consider, and the decisions must be based on a clear understanding of the various trade-offs and requirements.