What exactly does a system design interview involve?
Will you be designing databases? Figuring out the server layout? Coming up with business logic? Thinking through edge cases?
All of the above, potentially. It depends on the system you're asked to design and what your interviewer wants to focus on.
So, how do you prepare?
Two ways:
- Learn a process for approaching open-ended design questions, so you know how to get started and what to cover.
- Learn some patterns and tricks that are common in system design, so you're ready to pull them out quickly when they'd help with your system.
In this article, we'll do both.
Here's the process. Just remember, "FAME":
- Functionality: Establish what our system needs to do.
- Analysis: Crunch some numbers. Come up with rough estimates of scale: how large is our user base, how much data will we store, and how many requests will we be handling.
- Minimum Viable Product (MVP): Sketch out the system at a small scale. Figure out the big pieces of the system—data models, machines, and business logic. Build something that'll work for a small number of users.
- Expansion: Scale up the MVP to handle the numbers from our earlier analysis. Grow the system from a few users to a global scale, one step at a time.
Let's apply this process to a classic system design example: design Reddit.
Step 1: Functionality
System design questions can start off feeling open-ended and ambiguous. Your interviewer wants to see you take on the abstract product and break it down into specific functionality.
So, right off the bat, focus on answering these two questions:
- What does this system need to do? Suggest a lot of options, and let the interviewer clarify what features they're most interested in.
- How will users be using this system? Will they be using a smartphone app? Logging in online? Or are you just providing an API that others can use?
As the system's features start to take form, it's okay to make simplifying assumptions. But make sure to state what assumptions you're making, and check that your interviewer is okay with them.
At the end of this step, you should be able to clearly say exactly what functionality you're building.
What does our Reddit system need to do?
Step 2: Analysis
At this point, we know what we're trying to build. The next step is to crunch some numbers and figure out how big our system needs to be.
The goal in this step is to come up with rough estimates of the scale we'll need to reach in Step 4 ("Expansion").
Here are some questions you might discuss with your interviewer:
- How many users will be using this system? Try to come up with a ballpark estimate, and check that your interviewer is satisfied.
Facebook (in the top 10 most popular sites) has about 2 billion users. Netflix (in the top 50) has about 100 million users. GitHub (in the top 100) has about 4 million users.
- How fast does our site need to be? For systems that interact with users, faster is almost certainly better. But for systems that aren't interactive, we might have more time to handle a request.
When loading a website, 100ms feels instantaneous to the user. Google aims to produce a response in 500ms. 1 second is a noticeable delay, and 10 seconds loses people's attention.
It takes about 100ms to send a network packet from New York to California. And it's roughly 200ms to send a packet one-way from the U.S. to Australia or Europe.
- How large is our data? If our data are small and fit in RAM, we'll be able to look them up faster than if we need to access persistent storage.
A tweet is at most 280 characters (originally 140), an HTML page is roughly 100 kilobytes, and a YouTube video could be multiple gigabytes.
It takes less than one nanosecond to access data in an L1 processor cache. It takes 100 times as long (100 ns) to access data in RAM, 100,000 times as long (100 μs) to access data on a solid state drive, and 10,000,000 times as long (10 ms) to access data on a rotating hard drive.
To put that in perspective, if it took one second to access data in an L1 cache, it would take about two minutes to access data in RAM, about one day to access data on an SSD, and about four months to access data on an HDD.
- Are there patterns in system loads? Is our usage even over time or bursty?
Interview Cake has a spike in traffic in the middle of the week when we blast out a weekly practice question to over 100,000 subscribers.
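If you want to sanity-check this kind of arithmetic, here's a rough sketch in Python. It rescales the access times quoted above so that an L1 cache hit takes one second, then turns a user count into a requests-per-second estimate. The function names, the 1 million daily users, the 20 requests per user, and the burst factor of 3 are all made-up inputs for illustration, not numbers from any real system.

```python
# Order-of-magnitude access times quoted above, in nanoseconds.
ACCESS_TIME_NS = {
    "L1 cache": 1,
    "RAM": 100,
    "SSD": 100_000,
    "HDD": 10_000_000,
}

SECONDS_PER_DAY = 24 * 60 * 60


def scaled_seconds(access_ns: int) -> float:
    """Seconds an access would take if an L1 cache hit took one second."""
    return access_ns / ACCESS_TIME_NS["L1 cache"]


def average_requests_per_second(daily_users: int, requests_per_user: float) -> float:
    """Average load, assuming requests are spread evenly over the day."""
    return daily_users * requests_per_user / SECONDS_PER_DAY


def peak_requests_per_second(average_rps: float, burst_factor: float) -> float:
    """Bursty traffic: scale the average by an assumed peak-to-average ratio."""
    return average_rps * burst_factor


for name, ns in ACCESS_TIME_NS.items():
    print(f"{name}: {scaled_seconds(ns):,.0f} scaled seconds")

# Hypothetical inputs, not figures from the article.
average = average_requests_per_second(daily_users=1_000_000, requests_per_user=20)
peak = peak_requests_per_second(average, burst_factor=3)
print(f"average: {average:.0f} requests/s, peak: {peak:.0f} requests/s")
```

On the scaled clock, RAM comes out at 100 seconds (a bit under two minutes), an SSD at 100,000 seconds (a bit over a day), and an HDD at 10 million seconds (roughly 116 days). In an interview you'd do this math out loud and round aggressively; the point is the order of magnitude, not the exact figure.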
What do our numbers look like for Reddit?
Step 3: Minimum Viable Product (MVP)
We've nailed down what our system has to do and what loads it needs to handle. Let's start building the actual system!
As a starting point, focus on building a minimum viable product (MVP): a basic system with the required functionality that works for a small number of users. (Put aside the large estimates for a second ... we'll come back to them soon.)
While building an MVP, you'll need to consider:
- Users: Who interacts with your system, and how?
- Machines: What hardware is used in your system? Do you need web servers? Database servers? Are your users on desktops, laptops, phones, or tablets?
- Data: What data are you storing? How is it organized? Is there a complex data model?
- Business Logic: This is the software "glue" tying everything together. How do your machines and data interact? What is involved in processing a user request?
You'll emphasize different parts of this in different questions. For some systems, the tricky part is getting the data model just right; for others, the business logic may take the most time.
Go ahead and draw out the big components that make up your system. Add in arrows to show how data and requests move from one component to another. Then, walk through your core functionality and spell out how the parts of your system work together to fulfill it.
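To make the data and business logic pieces concrete, here's a minimal sketch of an MVP in Python. It assumes, purely for illustration, that our Reddit-like system only needs three features: submitting links, voting, and viewing a front page sorted by score. The names `Post` and `LinkBoard` are hypothetical, and a plain dictionary stands in for the database.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Post:
    """Data model: one submitted link plus the votes cast on it."""
    post_id: int
    author: str
    title: str
    url: str
    votes: dict[str, int] = field(default_factory=dict)  # username -> +1 or -1

    @property
    def score(self) -> int:
        return sum(self.votes.values())


class LinkBoard:
    """Business logic for the MVP. A dict stands in for the database."""

    def __init__(self) -> None:
        self._posts: dict[int, Post] = {}
        self._ids = count(1)

    def submit(self, author: str, title: str, url: str) -> Post:
        post = Post(next(self._ids), author, title, url)
        self._posts[post.post_id] = post
        return post

    def vote(self, username: str, post_id: int, value: int) -> None:
        if value not in (1, -1):
            raise ValueError("a vote must be +1 or -1")
        # Keyed by username, so voting again replaces the earlier vote.
        self._posts[post_id].votes[username] = value

    def front_page(self, limit: int = 25) -> list[Post]:
        ranked = sorted(self._posts.values(), key=lambda p: p.score, reverse=True)
        return ranked[:limit]


board = LinkBoard()
first = board.submit("ann", "First link", "https://example.com/1")
second = board.submit("bob", "Second link", "https://example.com/2")
board.vote("ann", second.post_id, 1)
board.vote("bob", second.post_id, 1)
board.vote("bob", first.post_id, -1)
print([post.title for post in board.front_page()])
```

Sorting every post on each request is fine for a handful of users. It's also exactly the kind of bottleneck you'd come back to in the next step.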
What are the key parts of our system architecture?
Step 4: Expansion
By this point, we've got a minimum viable product. Now for the fun stuff: how can we transform those basic building blocks into a massive, scalable, and flexible system?
Look back at the numbers that came out of the analysis step. Chances are, your MVP has some bottlenecks that mean it won't be able to handle the expected load. Let's fix that.
You'll want to scale out your architecture to eliminate these bottlenecks as you find them.
As a starting point for scaling, figure out the type of bottleneck you're trying to address. It's usually one of these:
- Bottlenecks with servers.
- Bottlenecks with databases.
- Bottlenecks with business logic / code.
You probably won't be asked about all three in a single interview—it would just be too much to cover. That said, you should have some strategies for each of them at the ready, so you'll be prepared for whatever part of the system your interviewer wants to focus on.
Step 4A: Removing Server Bottlenecks
As our system grows, one server might not provide enough processing power to meet our needs, and we'll want to add in more.
The most common way to do this is to add many identical machines, dividing up the work in between them. By keeping these machines interchangeable, we can easily add more as needed, and we can quickly replace one if it breaks.
How should we split up the load between machines? For incoming requests, one common option is to use a load balancer: a piece of networking equipment that takes care of spreading incoming requests across the servers that are online.
Picture clients sending their requests to the load balancer, which forwards each one to a server in the pool. Have this picture in mind when you're interviewing; chances are, you'll end up drawing something similar on a whiteboard if you're asked to sketch out the system architecture.
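One simple way a load balancer can spread requests is round-robin: hand each new request to the next server in the list, skipping any that are offline. Here's a small sketch of that idea. The class name and the server names are hypothetical, and real load balancers offer other strategies too (least connections, for example).

```python
class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked offline."""

    def __init__(self, servers: list[str]) -> None:
        if not servers:
            raise ValueError("need at least one server")
        self._servers = list(servers)
        self._online = set(servers)
        self._cursor = 0

    def mark_offline(self, server: str) -> None:
        self._online.discard(server)

    def mark_online(self, server: str) -> None:
        if server in self._servers:
            self._online.add(server)

    def pick(self) -> str:
        # Try each server at most once, starting where we left off.
        for _ in range(len(self._servers)):
            server = self._servers[self._cursor]
            self._cursor = (self._cursor + 1) % len(self._servers)
            if server in self._online:
                return server
        raise RuntimeError("no servers online")


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
print([balancer.pick() for _ in range(4)])
```

Because the servers are interchangeable, taking one out of rotation is just a call to `mark_offline`; the remaining machines absorb its share of the requests.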
See this in action in our TapChat messaging app design question.
How can we scale Reddit's servers?
Step 4B: Removing Database Bottlenecks
Slow databases can lead to slow systems; here are some ways to make database reads faster.
- Add an index on common lookup fields. Chances are you'll have a few fields in your database that you use in most of your queries. Consider adding an index on those fields. This is super simple to do—it's just a database command—and it can drastically speed up your accesses.
- Add in caches. In lots of systems, a small amount of content can create a large share of the traffic. (Think about that viral YouTube video that everyone's watching, or the latest tweet from a celebrity.) Consider adding a query cache, which stores the results from earlier database lookups to avoid performing the same query twice. For something more generic, take a look at Memcached, which can cache any key/value pair. (If enough of our content is served from a cache, we'll get performance that's close to the speed of storing everything in RAM.) Careful though: you'll need to augment your business logic to make sure cached data stays in sync with the underlying database.
- Add in read-only replicas. If lots of reads are overwhelming your database, make more copies. Adding multiple replicas is an easy way to reduce the load on all of the servers. Be careful though: whenever you write to your database, you'll need to update all of the replicas too, or accept that users might see stale data.
- Use faster storage. If your database is pretty small, can you fit the whole thing in memory? If not, can you at least use flash SSDs? Both of these will be faster than rotating HDDs.
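To see what the indexing trick from the list above looks like in practice, here's a sketch using Python's built-in `sqlite3` module. The `posts` table, its columns, and the index name are made up for this example. We ask SQLite for its query plan before and after adding the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author TEXT, title TEXT)")
conn.executemany(
    "INSERT INTO posts (author, title) VALUES (?, ?)",
    [(f"user{i % 100}", f"post {i}") for i in range(10_000)],
)

QUERY = "SELECT title FROM posts WHERE author = ?"


def query_plan(connection: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return SQLite's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, ("user7",))

# The "database command" that adds an index on our common lookup field.
conn.execute("CREATE INDEX idx_posts_author ON posts (author)")

plan_after = query_plan(conn, QUERY, ("user7",))

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index, the plan is a scan of the whole table. Afterward, it's a search that uses `idx_posts_author`. The exact wording of the plan varies between SQLite versions, so treat the printed text as a hint rather than a stable format.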
Want to see an example of optimizing database reads? Head over to our Ticket Sales Site system design question!
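The caching tip comes with a warning: cached data has to stay in sync with the database. One common way to handle that is to read through the cache and throw away the cached copy on every write. Here's a minimal sketch of that approach, with plain dictionaries standing in for both the database and a cache like Memcached. The class name and keys are hypothetical.

```python
class CachedStore:
    """Reads go through a cache; writes invalidate the cached copy."""

    def __init__(self, database: dict) -> None:
        self._db = database      # stand-in for the real database
        self._cache: dict = {}   # stand-in for a cache server
        self.db_reads = 0        # counts how often we hit the database

    def get(self, key):
        if key in self._cache:
            return self._cache[key]      # cache hit: no database work
        self.db_reads += 1
        value = self._db[key]            # cache miss: go to the database
        self._cache[key] = value
        return value

    def put(self, key, value) -> None:
        self._db[key] = value
        # Drop the stale copy so the next read fetches the new value.
        self._cache.pop(key, None)


store = CachedStore({"post:1": "hello"})
store.get("post:1")
store.get("post:1")
print(store.db_reads)
store.put("post:1", "edited")
print(store.get("post:1"), store.db_reads)
```

Invalidating on write keeps things simple, but it's only one option. Some systems instead let cached entries expire after a short time and accept slightly stale reads.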
All those tricks definitely help us. But what about if we're storing a huge amount of data? Like, too much data to fit on a single machine?
Here are a few options:
- Shard your databases. If your database is too big, then break it into smaller pieces (called "shards"). For example, instead of one humongous database for all user information, have two databases: one for users A-M and another for users N-Z. Or, use consistent hashing to divide data between machines.
- Use NoSQL databases. SQL databases are popular because they have lots of support for atomic operations, strong guarantees about consistency (everyone seeing the same database data), and rich query support. But they can be hard to scale. A NoSQL database, like MongoDB, Redis, or Bigtable, offers fewer features and query options but can scale to be massive.
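Here's a rough sketch of the two sharding approaches mentioned above: splitting users by first letter, and consistent hashing. The shard names, the choice of MD5 as the hash, and the 100 points per shard are all assumptions made for this example.

```python
import bisect
import hashlib


def range_shard(username: str) -> str:
    """A-M goes to one database, N-Z to the other.
    (Names starting with digits or symbols land wherever they happen to sort.)"""
    return "users_a_m" if username[:1].upper() <= "M" else "users_n_z"


def _hash(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing: each shard owns many points on a ring, and a key
    belongs to the shard that owns the first point at or after the key's hash."""

    def __init__(self, shards: list[str], points_per_shard: int = 100) -> None:
        self._points: list[int] = []       # sorted hash positions
        self._owner: dict[int, str] = {}   # position -> shard name
        self._points_per_shard = points_per_shard
        for shard in shards:
            self.add_shard(shard)

    def add_shard(self, shard: str) -> None:
        for i in range(self._points_per_shard):
            point = _hash(f"{shard}#{i}")
            if point not in self._owner:   # ignore the (unlikely) collision
                bisect.insort(self._points, point)
                self._owner[point] = shard

    def shard_for(self, key: str) -> str:
        if not self._points:
            raise RuntimeError("ring has no shards")
        index = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[index]]


ring = HashRing(["db-1", "db-2", "db-3"])
keys = [f"user{i}" for i in range(1000)]
before = {key: ring.shard_for(key) for key in keys}
ring.add_shard("db-4")
moved = [key for key in keys if ring.shard_for(key) != before[key]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a shard")
```

The letter-range split is easy to explain, but it can leave the shards badly unbalanced. With consistent hashing, adding a shard only moves the keys that the new shard takes over; everything else stays where it was.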
Want to see an example? Check out our Crawler system design question!
How can we scale Reddit's databases?
Step 4C: Removing Code Bottlenecks
Even with ample hardware and fast databases, you can still speed up the user experience by optimizing your business logic.
- Shrink It Down. When you're sending data to users, make it as small as possible. If users are downloading content, compress it to make the download smaller. If they're loading your website, minify or compress any embedded code or scripts. Overall, reduce what you're sending to users so they get the full response faster.
- Send Content As It's Requested. When you load Facebook, only a few top stories are sent back. More content is sent back to you in chunks as you scroll down. This shrinks the amount of data Facebook has to send to you and makes that first load really fast.
- Send Responses Before You're Done. When processing a user request, immediately send back a response. Even if it's an in-progress spinning logo or a brief message saying that you're "working on it"—it'll make your service feel more responsive. (The fancy term for this is asynchronous design.)
- Do Work Ahead of Time. Computations that you do after a user has requested something delay your response. Can you generate static HTML ahead of time and serve that? Or, can you precompute answers to the most common queries and have them ready to go when a user asks for them? Think carefully about what needs to be done in real-time and what can be done ahead of time, and minimize the first one.
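The first two tricks in the list above combine nicely: send one small page at a time, and compress it before it goes out. Here's a minimal sketch using Python's standard `json` and `gzip` modules. The function names, the page size of 10, and the fake stories are assumptions for illustration.

```python
import gzip
import json


def get_page(items: list, cursor: int = 0, page_size: int = 10) -> dict:
    """Return one chunk of items plus the cursor for the next chunk."""
    end = cursor + page_size
    next_cursor = end if end < len(items) else None   # None means no more pages
    return {"items": items[cursor:end], "next_cursor": next_cursor}


def encode_response(payload: dict) -> bytes:
    """Serialize compactly, then gzip the result."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


# Fake content standing in for a feed of stories.
stories = [
    {"id": i, "title": f"Story number {i}", "body": "lorem ipsum " * 20}
    for i in range(100)
]

everything = json.dumps(stories).encode("utf-8")
first_page = get_page(stories)
compressed = encode_response(first_page)

print(f"full feed, uncompressed: {len(everything)} bytes")
print(f"first page, compressed:  {len(compressed)} bytes")
```

The client asks for the next chunk by sending back `next_cursor` as the user scrolls. How much gzip saves depends on the content; repetitive text like this fake feed compresses unusually well.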
How can we scale Reddit's business logic?
Want another example? Head over to our Ticket Sales Site question for a deeper look.
Finishing Up: Repeat!
At this point, we've taken our MVP and scaled it up to handle our expected loads. Nice!
The rest of the interview can be pretty open-ended.
Some interviewers will ask you to think about additional ways to improve your system. One common refinement: make the system more reliable by removing single points of failure. (Usually, this involves adding backup hardware.)
Other interviewers will want to see you think on the fly, so they'll introduce new constraints or ask for additional features. When this happens, repeat the four steps of the system design process—Functionality, Analysis, MVP, and Expansion—with your system as a starting point. Don't panic if you have to change things around to incorporate new features; that's a normal part of the design process.
Anything else to add to our Reddit?
Wrapping It Up
We've run through the four steps that usually guide system design interviews. Just remember "FAME":
- Functionality. (Brainstorming. What does my system need to do?)
- Analysis. (Number crunching. How big will my system be?)
- Minimum Viable Product. (Building blocks. What are the key components of my system?)
- Expansion. (Growth. How can my system scale to lots of users?)
Depending on your interviewer and the system you're building, you might work through each step methodically, or you might jump around and skip steps. Your interviewer will probably have some ideas of what they'd like to cover, so play it by ear and let them guide the conversation.
Have fun!