Dijkstra's algorithm finds the shortest path from one node to all other nodes in a weighted graph.
Say we had the following graph, which represents the travel cost between different cities in the southeast US:
Traveling from Memphis to Nashville? The cheapest route isn't to go straight from one to the other! (You'll actually want to go via New Orleans, Mobile, and Atlanta for the cheapest path.)
This sort of thing can happen in real life. Lots of airlines like to route through large hubs and might not have much service directly between two smaller cities.
Dijkstra's algorithm lets us find the cheapest route from one city to all other cities. So for Memphis:
What if we only want the cheapest path between one pair of nodes? Just pick one of them as the starting point and run Dijkstra's algorithm. Finding the cheapest path to all nodes includes finding the cheapest path to the other node in the pair.
But isn't Dijkstra's algorithm overkill if we only care about one pair of nodes? Actually no, because we'll still need to consider other nodes in the graph to make sure we've found the lowest-cost weighted path. (Take a look at going from Memphis to Nashville, where the cheapest path includes over half the nodes in the graph.)
Dijkstra's algorithm is like breadth-first search (BFS), except we use a priority queue instead of a normal first-in-first-out queue. Each item's priority is the cost of reaching it.
Let's work through an example before coding it up. We'll use our graph of cities from before, starting at Memphis.
Since we're already at Memphis, the cost to get there is 0. We don't know how to get anywhere else though, so those costs are \infty. We'll store all these costs in a table and initialize our priority queue.
First up in our queue, we've got Memphis. As with BFS, we'll start by looking at its neighbors.
Looking at its neighbors, we can get to Atlanta (at a cost of 10), Mobile (cost is 7), Nashville (cost is 15) and New Orleans (cost is 3). So we'll update those travel costs in our table and our priority queue.
Here's what our table and priority queue look like now. To help keep track of things, we'll put a check mark next to nodes we've dequeued. So far, it's just Memphis.
What next? Repeat!
Next in our priority queue we've got New Orleans, at a cost of 3. Let's remove it from the priority queue and examine its neighbors:
- We've already dequeued Memphis, so we can skip it.
- From New Orleans, we can get to Mobile at a cost of 3. Since it takes 3 to get from Memphis to New Orleans, the total cost of this path is 6. That's cheaper than the path we found earlier (where we went straight from Memphis to Mobile at a cost of 7), so we need to update its cost.
We'll also record the fact that we got to Mobile "via" New Orleans. This'll be important for figuring out which cities our cheapest route passes through at the end. (We could skip this step if we just wanted to calculate the lowest price, not the actual path.)
Now we'll repeat the process again. The next city in our priority queue is Mobile at a cost of 6.
Looking at its neighbors:
- Memphis and New Orleans have already been dequeued, so we can skip them.
- We can get to Atlanta (2) via Mobile (6), for a total cost of 8. That's cheaper than our current path to Atlanta (10, via Memphis).
- We can get to Savannah (6) via Mobile (6), for a total cost of 12. That's cheaper than our current cost (\infty, since we don't have a path yet).
Updating our priority queue and table:
Next up, Atlanta, at a cost of 8:
- We can get to Savannah (1) via Atlanta (8), for a total cost of 9. That's cheaper than our current route via Mobile, which cost 12.
- We can get to Nashville (2) via Atlanta (8), for a total cost of 10. That's cheaper than our current route via Memphis, which cost 15.
- Memphis and Mobile have already been dequeued, so we can skip them.
Once we dequeue Atlanta, we know that we won't find a cheaper path later on. So, even though there are a few different ways to get to Atlanta, we don't have to consider the cost of all those paths; we just care about the cost of the shortest one.
Here's how things look now:
Almost there. Next city is Savannah, at a cost of 9.
We've already dequeued both of Savannah's neighbors, so we don't need to do anything.
Finally, our last city is Nashville, at a cost of 10.
Again, both its neighbors have already been dequeued. We're done!
Here's what our final table looks like:
Our table above stores the cost of getting to any city from Memphis, but it doesn't store the actual path. To find it, we'd start at that city and use the Via column to backtrack through the graph until we get to Memphis. This is the same way you recover paths after doing a breadth-first search.
Here's an implementation in code. We'll split our table of data across a few different hash maps and a hash set.
Our implementation doesn't keep track of how we reached each node, and we don't recover the cheapest paths. Take a stab at adding that feature to the implementation!
Time and Space Complexity
What's the time complexity?
- Our initialization involves a constant amount of work per node, plus inserting nodes into priorityQueue will take time. (Priority queues are built on heaps, which have insertions) That's time overall for all the initialization work.
- We'll go through the loop times, and each time we call removeMin, taking time (assuming a heap-based priority queue). Over the entire algorithm, that's time.
- We'll update costInQueue once per edge, or times. Each priority queue update costs time. That's time overall.
Putting all the steps together, the time complexity for Dijkstra's algorithm is . Sometimes, this complexity is written .
What about space complexity? All our data structures hold a constant amount of data for each node in the graph. That's space in total.