Introduction to Scalability: Load Balancing and Database Replication

Discover what scalability means in computer science. Learn about vertical vs. horizontal scaling, load balancers, and database replication. Perfect for beginner developers looking to understand scalability.

Traffic: the amount of data or requests sent to a server at a given time.

Overhead: any extra processing, time, memory, or resources required to manage or support the primary operations of a system.

Introduction

When you build a web application for a small user base and moderate traffic, the architecture is quite simple. It typically consists of three main components:

  1. Client (Presentation Layer): This is the front-end interface where users interact with your application, usually a web browser.

  2. Server (Application Layer): The server processes client requests, runs the application logic, and serves content.

  3. Database (Data Layer): This is where data is stored, retrieved, and managed.
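
To make these layers concrete, here is a minimal sketch in Python. It uses Flask for the application layer and SQLite for the data layer; both are purely illustrative choices, and any web framework and database would do. The client layer is simply a browser sending HTTP requests to this server.

```python
# A minimal single-server setup: Flask as the application layer,
# SQLite as the data layer (both chosen purely for illustration).
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)

def get_user_row(user_id):
    # Data layer: a single database on the same machine.
    with sqlite3.connect("app.db") as db:
        return db.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)
        ).fetchone()

@app.route("/users/<int:user_id>")
def get_user(user_id):
    # Application layer: process the client's request and serve content.
    row = get_user_row(user_id)
    if row is None:
        return jsonify(error="user not found"), 404
    return jsonify(id=row[0], name=row[1])

if __name__ == "__main__":
    app.run()
```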

This basic architecture is suitable for a small user base but struggles with high traffic because a single server is limited by its CPU, RAM, and bandwidth. For large-scale applications, a more scalable architecture is required, as exceeding these limits can lead to performance issues and system crashes.

To overcome these limitations, there are two primary methods of scaling: vertical scaling and horizontal scaling.

Vertical Scaling:

When we talk about vertical scaling, we usually mean adding more RAM and CPU power to a single server. However, there is a limit to how far you can upgrade one machine: eventually, no more resources can be added.

Additionally, a significant drawback of vertical scaling is that it creates a single point of failure: the entire system relies on one server, so if that server crashes, there is no backup to keep the system running.

Horizontal Scaling:

Horizontal scaling, or scaling out, means increasing your system's capacity by adding more servers and distributing the workload across them. It is often considered the most effective way to scale large systems.

Imagine this: You run a publicity campaign for your platform, and a famous artist shares it with their millions of followers. In just a few days, your site’s traffic skyrockets from a steady 5,000 requests per second to a staggering 200,000 requests per second. Your single server, which was handling those 5,000 requests with ease, suddenly finds itself overwhelmed as it can only manage 50,000 requests per second. The result? Your server crashes, and users are greeted with errors instead of accessing your platform.

To prevent this catastrophe and keep your site running smoothly, you need to scale your system. In this case, you decide to deploy five additional servers, each configured to handle a portion of the requests. By distributing the 200,000 incoming requests among these six servers, you can effectively manage the traffic and ensure that no single server gets overwhelmed.
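
A quick back-of-the-envelope check with the numbers from this scenario shows why six servers are enough:

```python
peak_traffic = 200_000        # requests per second at peak
capacity_per_server = 50_000  # what one server can handle
servers = 6                   # the original server plus five new ones

load_per_server = peak_traffic / servers
print(f"Load per server: {load_per_server:,.0f} requests/s")  # ~33,333
print(f"Headroom: {capacity_per_server - load_per_server:,.0f} requests/s")  # ~16,667
```

Each server handles roughly 33,000 requests per second, comfortably below its 50,000-request limit.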

But how do you ensure that each server gets its fair share of the traffic? This is where load balancers come into play!

Load Balancers:

A load balancer is a tool that distributes incoming network traffic across multiple servers. Its main job is to ensure that no single server becomes overwhelmed by evenly distributing requests.

Now that we have a load balancer sitting between the clients and the servers, we can handle more traffic and accommodate more users.

Implementing this approach not only helps manage increased traffic but also ensures that if one server fails, the remaining five servers can still handle the requests. This keeps your system operational until the failed server is restored, achieving high availability and minimizing downtime.

How Load Balancers Work:

Load balancers use various algorithms to manage the distribution of incoming requests from clients across multiple servers. Some common ones are:

  • Round Robin: This algorithm sends requests to servers one by one in a set order. For example, it might send the first request to Server A, the second to Server B, the third to Server C, and then start over at Server A. It’s a simple way to make sure each server gets an equal number of requests.
  • Least Connections: This algorithm sends requests to the server that currently has the fewest active connections. It’s like picking the least busy server to keep things balanced and avoid overloading any one server.
  • Least Response Time: This algorithm directs requests to the server that can respond the fastest. It helps users get quicker responses by choosing servers that are already running efficiently.
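
Here is a toy sketch of the first two strategies in Python. The server names and connection counts are made up, and a real load balancer would track active connections itself as requests start and finish.

```python
import itertools

servers = ["server-a", "server-b", "server-c"]

# Round Robin: hand out servers in a fixed, repeating order.
rotation = itertools.cycle(servers)

def pick_round_robin():
    return next(rotation)

# Least Connections: pick whichever server is currently least busy.
active_connections = {"server-a": 12, "server-b": 3, "server-c": 7}

def pick_least_connections():
    return min(active_connections, key=active_connections.get)

for _ in range(4):
    print(pick_round_robin())    # server-a, server-b, server-c, server-a
print(pick_least_connections())  # server-b (fewest active connections)
```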

Database Replication:

After scaling the application layer to handle more client requests, the database now faces a higher workload. To ensure the database can manage this increased demand without crashing, we also need to scale the data layer of our application.

A useful technique for scaling the data layer is database replication. It involves creating and maintaining copies of a database across multiple servers. This process ensures that data remains synchronized between a primary (master) database and one or more secondary (replica) databases. Replication serves several purposes, including data backup, scalability, and failover support.

Here’s how database replication works:

  1. Write Operations: All data changes, such as updates, inserts, or deletions, are performed on the master database. This database is the source of truth for all data modifications.

  2. Replication Process: These changes are then propagated from the master database to one or more replica databases. This process ensures that the replicas have up-to-date and consistent data.

  3. Read Operations: Read requests, such as queries and data retrieval operations, are distributed among the replica databases. By offloading read queries to replicas, the master database is relieved from handling read operations, which helps balance the load and prevents it from becoming a performance bottleneck.
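
In application code, this read/write split can be sketched roughly as below. Here `primary` and `replicas` are assumed to be ordinary database connection objects with an `execute` method; in practice, a proxy or the database driver often handles this routing for you.

```python
import random

class ReplicatedDB:
    """Hypothetical router: writes go to the master, reads go to replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary    # master: the source of truth for all writes
        self.replicas = replicas  # copies kept in sync by replication

    def write(self, sql, params=()):
        # Step 1: every insert/update/delete hits the master only;
        # replication (step 2) then propagates the change to the replicas.
        return self.primary.execute(sql, params)

    def read(self, sql, params=()):
        # Step 3: spread read queries across the replicas to offload the master.
        return random.choice(self.replicas).execute(sql, params)
```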

In this setup, another load balancer sits between the application servers and the replica (slave) database servers. It distributes read queries across multiple replicas to balance the workload.

Load balancers can be strategically positioned wherever there is a need to distribute the workload across multiple servers, ensuring efficient resource use and maintaining high availability.

Benefits of Database Replication:

  • Data Backup: Replicas can serve as backups in case of data loss or corruption on the master database.

  • Read Scalability: Distributing read queries across replicas helps manage high traffic volumes and improves application performance.

  • Failover Support: In the event of a master database failure, one of the replicas can be promoted to become the new master, ensuring system continuity.

Two-Master Setup:

If the master database fails, write operations (such as creating, deleting, or updating records) will experience downtime until the master is restored.

For example, in an app like Facebook, users might still be able to view profiles and browse the feed, but they won’t be able to post new updates or create new content.

To address this, we can use a more advanced setup with two master databases that both handle write operations. Here's how it works:

  • Two Masters: Both master databases can accept write operations, and each updates the other and the slave databases. If one master fails, the other can continue to handle write operations, reducing downtime.
  • Automatic Failover: If one master database fails, the system can switch all write operations to the other master, ensuring continuous availability for both reads and writes.
  • Load Balancing: Both master databases can handle write queries, which balances the load.
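
The failover logic can be sketched as follows. This assumes `masters` is a list of connection objects, and `ConnectionError` stands in for whatever exception your database driver actually raises when a server is unreachable.

```python
def execute_write(masters, sql, params=()):
    """Try each master in turn, failing over if the preferred one is down."""
    last_error = None
    for master in masters:
        try:
            # Success on the first healthy master.
            return master.execute(sql, params)
        except ConnectionError as exc:
            last_error = exc  # this master is down; fail over to the next one
    raise RuntimeError("all masters are unavailable") from last_error
```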

Thanks for reading, and take care!