Designing scalable and resilient systems

One of the main features of Microsoft Azure is scalability. By designing our system carefully, we can scale it manually or automatically (elastic scaling) to meet our requirements as the business grows, or to cope with peaks in system load. Databases and storage can also be designed so that data is distributed across multiple databases and storage partitions, allowing large volumes of data to scale while maintaining performance.

Note

Scale out means increasing computing capacity by increasing the number of compute instances in a system (for websites and cloud services, this would mean increasing the number of virtual machines). Scale up means increasing the computational resources of a compute instance (for websites and cloud services, this would mean more CPU/memory/disk allocation for a virtual machine instance).
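
To make the scale-out definition concrete, the following is a minimal sizing sketch. The function name, the load figures, and the 20 percent headroom are illustrative assumptions, not Azure values; real capacity planning would be based on measured throughput.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float, headroom: float = 0.2) -> int:
    """Scale out: number of identical instances needed to cover a peak load.

    peak_rps         -- expected peak requests per second (assumed, measured elsewhere)
    rps_per_instance -- requests per second one instance can sustain (assumed)
    headroom         -- spare capacity to keep, as a fraction of the peak
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    required = peak_rps * (1 + headroom) / rps_per_instance
    # Always keep at least one instance running.
    return max(1, math.ceil(required))
```

Scaling up would instead raise `rps_per_instance` by giving each instance more CPU or memory, which lowers the instance count needed for the same peak.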

By breaking down large systems into smaller, decoupled subsystems that interact with each other in an asynchronous, fault-tolerant manner, we can make the system as a whole more resilient. The following are examples of services we can use to achieve this:

  • Websites: When we build websites, if we're careful to make them stateless, we can create a website that can be scaled out across multiple shared or dedicated web servers as demand changes. This increases the number of requests the site can process. If session state or caching is absolutely necessary, we can use Azure Cache to host the session state in a scalable, resilient manner, without depending on a single web server instance (tying a user's session to one server is known as persistence, and you may hear the term sticky session).
  • Cloud services: Worker roles can be used to perform long-running processing tasks on dedicated VMs, which can be scaled out and back in to meet processing requirements. Worker roles can be designed to scale by being careful about how they receive work, so that there is no contention between instances and work is not duplicated; we can use storage queues, Service Bus queues, and Service Bus topics for this.
  • Mobile services: Mobile services provide a number of great features for integrating mobile applications on all major mobile platforms. Table and custom APIs, a flexible authentication model, and the Notification Hub allow us to easily build new standalone backend services, or backend services that integrate into a larger enterprise system. Mobile services scale in the same way as websites, so the same design principles apply. The Notification Hub allows push notifications to be scaled out effectively using the Azure Service Bus, which would otherwise be difficult to achieve if we wrote our own push notification services.
  • Decoupling applications: Storage queues, Service Bus queues, and Service Bus topics help us pass data between scaled-out system tiers in a robust, reliable way. Using these services, we can decouple services and allow systems that may not be online at the same time, or may not be able to keep up with each other, to message each other and process messages in their own time. Message-locking mechanisms within these systems allow multiple applications to safely work in parallel without the need to design complicated custom voting and locking systems on shared data sources.
  • SQL databases: Scaling out databases to maintain performance over large volumes of data in Microsoft Azure SQL databases can be achieved by implementing a federated database, where data is split horizontally (in rows rather than columns) across multiple databases (this is also known as sharding). Implementing a federated database requires careful design to include a federation distribution key, which allows data to be split across federation members (individual databases within the federation), with new records being distributed evenly and not just added to one member.
  • Table storage: As with Azure SQL databases, table storage requires a partition key, which must be carefully designed to allow data to be scaled and load balanced across partitions (depending on the chosen redundancy tier).
  • Azure Active Directory: Using Azure Active Directory, we can provide a consistent, scalable authentication and authorization mechanism for our systems. Large multi-tier systems that span websites, mobile services, and on-premises client applications can all be integrated into an Azure AD tenant, allowing users to use any of these systems with the same credentials and still have granular authorization via roles and groups.
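
The message-locking behavior mentioned under decoupling applications can be sketched with a small in-memory queue. This is a hypothetical stand-in written for illustration, not the storage queue or Service Bus API: the class `LockingQueue`, its methods, and the explicit `now` argument (used so the example is deterministic) are all assumptions.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class _Entry:
    body: str
    locked_until: float = 0.0  # hidden from other receivers until this time
    lock_token: int = 0        # changes every time the message is handed out


class LockingQueue:
    """In-memory queue with lock-then-complete delivery (illustrative only)."""

    def __init__(self, lock_seconds: float = 30.0) -> None:
        self._lock_seconds = lock_seconds
        self._entries: Dict[int, _Entry] = {}
        self._ids = itertools.count(1)
        self._tokens = itertools.count(1)

    def send(self, body: str) -> int:
        msg_id = next(self._ids)
        self._entries[msg_id] = _Entry(body)
        return msg_id

    def receive(self, now: float) -> Optional[Tuple[int, int, str]]:
        """Lock and return the oldest visible message, or None if none is visible."""
        for msg_id, entry in self._entries.items():
            if entry.locked_until <= now:
                entry.locked_until = now + self._lock_seconds
                entry.lock_token = next(self._tokens)
                return msg_id, entry.lock_token, entry.body
        return None

    def complete(self, msg_id: int, lock_token: int, now: float) -> bool:
        """Delete the message only if the caller still holds an unexpired lock."""
        entry = self._entries.get(msg_id)
        if entry is None or entry.lock_token != lock_token or entry.locked_until <= now:
            return False
        del self._entries[msg_id]
        return True
```

Two workers calling `receive` at the same time get different messages, because the first call hides its message for `lock_seconds`. If a worker crashes before calling `complete`, its lock expires and the message becomes visible again for another worker, so work is neither duplicated while the lock is held nor lost on failure.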
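
The key-design concern raised for SQL databases and table storage, that new records should spread across members rather than pile onto one, can be illustrated with a small sketch. It compares range-based placement on a sequential ID with hash-based placement. Both functions and the boundary values are hypothetical; hashing is used here only as one common way to spread keys and is not a description of how Azure places data internally.

```python
import bisect
import hashlib
from collections import Counter
from typing import List


def range_shard_for(record_id: int, upper_bounds: List[int]) -> int:
    """Range placement: shard i holds ids below upper_bounds[i]; the last shard takes the rest."""
    return bisect.bisect_right(upper_bounds, record_id)


def hash_shard_for(key: str, shard_count: int) -> int:
    """Hash placement: a stable hash of the key picks the shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical boundaries giving four shards: <1000, <2000, <3000, and everything else.
bounds = [1000, 2000, 3000]

# New records with ever-increasing ids all land on the last shard (a hot member).
new_ids = range(3000, 4000)
range_load = Counter(range_shard_for(i, bounds) for i in new_ids)

# The same records keyed through a hash are spread over all four shards.
hash_load = Counter(hash_shard_for(str(i), 4) for i in new_ids)
```

Here `range_load` shows every new record on a single shard, while `hash_load` has entries for all four. The trade-off is that hash placement gives up efficient range scans over the key, so the right choice depends on the queries the system must serve.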