But it doesn't have to end here! Sign up for the 7-day coding interview crash course and you'll get a free Interview Cake problem every week.
You're in!
Find a duplicate, Space Edition™ BEAST MODE
In Find a duplicate, Space Edition™, we were given a vector of integers where:
These properties mean the vector must have at least 1 duplicate. Our challenge was to find a duplicate number without modifying the input and optimizing for space. We used a divide and conquer approach, iteratively cutting the vector in half to find a duplicate integer in time and space (sort of a modified binary search).
But we can actually do better. We can find a duplicate integer in time while keeping our space cost at .
This is a tricky one to derive (unless you have a strong background in graph theory), so we'll get you started:
Imagine each item in the vector as a node in a linked list. In any linked list, each node has a value and a "next" pointer. In this case:
Here’s a full example:
Notice we're using "positions" and not "indices." For this problem, we'll use the word "position" to mean something like "index," but different: indices start at 0, while positions start at 1. More rigorously: position = index + 1.
Using this, find a duplicate integer in time while keeping our space cost at . Just like before, don't modify the input.
Drawing pictures will help a lot with this one. Grab some paper and pencil (or a whiteboard, if you have one).
We don’t need any new data structures. Your final space cost must be .
We can do this without destroying the input.
Here are a few sample vectors. Try drawing them out as linked lists:
Look for patterns. Then think about how we might use those patterns to find a duplicate number in our vector.
When a value is repeated, how will that affect the structure of our linked list?
If two nodes have the same value, their next pointers will point to the same node!
So if we can find a node with two incoming next pointers, we know the position of that node is a duplicate integer in our vector.
For example, if there are two 2s in our vector, the node in the 2nd position will have two incoming pointers.
Alright, we’re on to something. But hold on—creating a linked list would take space, and we don’t want to change our space cost from .
No problem—turns out we can just think of the vector as a linked list, and traverse it without actually creating a new data structure.
If you're stuck on figuring out how to traverse the vector like a linked list, don't sweat it too much. Just use a real linked list for now, while we finish deriving the algorithm.
Ok, so we figured out that the position of a node with multiple incoming pointers must be a duplicate. If we can find a node with multiple incoming pointers in a constant number of walks through our vector, we can find a duplicate value in time.
How can we find a node with multiple incoming pointers?
Let's look back at those sample vectors and their corresponding linked lists, which we drew out to look for patterns:
Are there any patterns that might help us find a node with two incoming pointers?
Here’s a pattern: the last node never has any incoming pointers. This makes sense—since the vector has a length n + 1 and all the values are n or less, there can never be a pointer to the last position. If n is 5, the length of the vector is 6 but there can’t be a value 6 so no pointer will ever point to the 6th node. Since it has no incoming pointers, we should treat the last position in our vector as the "head" (starting point) of our linked list.
Here's another pattern: there’s never an end to our list. No pointer ever points to nullptr. Every node has a value in the range 1..n, so every node points to another node (or to itself). Since the list goes on forever, it must have a cycle (a loop). Here are the cycles in our example lists:
Can we use these cycles to find a duplicate value?
If we walk through our linked list, starting from the head, at some point we will enter our cycle. Try tracing that path on the example lists above. Notice anything special about the first node we hit when we enter the cycle?
The first node in the cycle always has at least two incoming pointers!
We're getting close to an algorithm for finding a duplicate value. How can we find the beginning of a cycle?
Again, drawing an example is helpful here:
If we were traversing this list and wanted to know if we were inside a cycle, that would be pretty easy—we could just remember our current position and keep stepping ahead to see if we get to that position again.
But our problem is a little trickier—we need to know the first node in the cycle.
What if we knew the length of the cycle?
If we knew the length of the cycle, we could use the “stick approach” to start at the head of the list and find the first node. We use two pointers. One pointer starts at the head of the list:
Then we lay down a “stick” with the same length as the cycle, by starting the second pointer at the end. So here, for example, the second pointer is starting 4 steps ahead because the cycle has 4 steps:
Then we move the stick along the list by advancing the two pointers at the same speed (one node at a time).
When the first pointer reaches the first node in the cycle, the second pointer will have circled around exactly once. The stick wraps around the cycle, and the two ends are in the same place: the start of the cycle.
We already know where the head of our list is (the last position in the list) so we just need the length of the cycle. How can we find the length of a cycle?
If we can get inside the cycle, we can just remember a position and count how many steps it takes to get back to that position.
How can we make sure we’ve gotten inside a cycle?
Well, there has to be a cycle in our list, and at the latest, the cycle is just the last node we hit as we traverse the list from the head:
So we can just start at the head and walk n steps. By then we'll have to be inside a cycle.
Alright, we’ve pieced together an entire strategy to find a duplicate integer! Working backward:
Can you implement this? And remember—we won't want to actually create a linked list. Here's how we can traverse our vector as if it were a linked list.
To get inside a cycle (step E above), we identify n, start at the head (the node in position n + 1), and walk n steps.
Now we’re guaranteed to be inside a cycle. To find the cycle’s length (D), we remember the current position and step ahead until we come back to that same position, counting the number of steps.
Now we have the head and the length of the cycle. We need to find the first node in the cycle (C). We set up 2 pointers: 1 at the head, and 1 ahead as many steps as there are nodes in the cycle. These two pointers form our "stick."
Alright, we just need to find to the first node in the cycle (B), and return a duplicate value (A)!
We treat the input vector as a linked list like we described at the top in the problem.
To find a duplicate integer:
We want to think of our vector as a linked list but we don't want to actually use up all that space, so we traverse our vector as if it were a linked list by converting positions to indices.
time and space.
Our space cost is because all of our additional variables are integers, which each take constant space.
For our runtime, we iterate over the array a constant number of times, and each iteration takes time in its worst case. So we traverse the linked list more than once, but it's still a constant number of times—about 3.
There another approach using randomized algorithms that is time and space. Can you come up with that one? (Hint: You'll want to focus on the median.)
This one's pretty crazy. It's hard to imagine an interviewer expecting you to get all the way through this question without help.
But just because it takes a few hints to get to the answer doesn't mean a question is "too hard." Some interviewers expect they'll have to offer a few hints.
So if you get a hint in an interview, just relax and listen. The most impressive thing you can do is drop what you're doing, fully understand the hint, and then run with it.
Reset editor
Powered by qualified.io