Topological Sort Algorithm

The topological sort algorithm takes a directed graph and returns an array of the nodes where each node appears before all the nodes it points to.

The ordering of the nodes in the array is called a topological ordering.

Here's an example:

The messy directed graph where node 1 points to nodes 2 and 3. Node 2 points to node 4 and 5, and node 4 points to node 5.

Since node 1 points to nodes 2 and 3, node 1 appears before them in the ordering. And, since nodes 2 and 3 both point to node 4, they appear before it in the ordering.

The topologically sorted directed graph where node 1 points to nodes 2 and 3. Node 2 points to node 4 and 5, and node 4 points to node 5.

So [1, 2, 3, 4, 5] would be a topological ordering of the graph.

Can a graph have more than one valid topological ordering? Yep! In the example above, [1, 3, 2, 4, 5] works too.

Cyclic Graphs

Look at this directed graph with a cycle:

A points to B, B points to C and C points to D which points to B.

The cycle creates an impossible set of constraints—B has to be before and after D in the ordering.

As a rule, cyclic graphs don't have valid topological orderings.

The Algorithm

How can we produce a topological ordering for this directed graph?

An example of the directed graph with five nodes linked between themselves.

Well, let's focus on the first node in the topological ordering. That node can't have any incoming directed edges; it must have an indegree of zero.

Why?

Because if it had incoming directed edges, then the nodes pointing to it would have to come first.

The same example of the directed graph with bolded first node which has two possible incoming edges pointed to him.

So, we'll find a node with an indegree of zero and add it to the topological ordering.

The same example of the directed graph with bolded first node.

That covers the first node in our topological ordering. What about the next one?

Once a node is added to the topological ordering, we can take the node, and its outgoing edges, out of the graph.

An example of the directed graph with removed first node.

Then, we can repeat our earlier approach: look for any node with an indegree of zero and add it to the ordering.

This is a common algorithm design pattern:

Figure out how to get the first thing.
Remove the first thing from the problem.
Repeat.

Here's what this looks like on our graph. We'll grab a node with an indegree of 0, add it to our topological ordering and remove it from the graph:

An example of the directed graph with removed first node and bolded the second element.

and repeat

An example of the directed graph with removed two nodes.

and repeat

An example of the directed graph with removed three nodes.

until we're

An example of the directed graph with removed four nodes.

out of nodes.

Note: this isn't the only way to produce a topological ordering.

Implementation

We'll use the strategy we outlined above:

Identify a node with no incoming edges.
Add that node to the ordering.
Remove it from the graph.
Repeat.

We'll keep looping until there aren't any more nodes with indegree zero. This could happen for two reasons:

There are no nodes left. We've taken all of them out of the graph and added them to the topological ordering.
There are some nodes left, but they all have incoming edges. This means the graph has a cycle, and no topological ordering exists.

One small tweak. Instead of actually removing the nodes from the graph (and destroying our input!), we'll use a dictionary to track each node's indegree. When we add a node to the topological ordering, we'll decrement the indegree of that node's neighbors, representing that those nodes have one fewer incoming edges.

Let's code it up!

enum TopologicalSortError: String, Error, CustomStringConvertible { case cycleInGraph = "Graph has a cycle! No topological ordering exists." var description: String { return self.rawValue } } func topologicalSort(digraph: [String : [String]]) throws -> [String] { // digraph is a dictionary: // key: a node // value: an array of adjacent neighboring nodes // construct a dictionary mapping nodes to their // indegrees var indegrees: [String: Int] = [:] for (node, _) in digraph { indegrees[node] = 0 } for (_, neighbors) in digraph { for neighbor in neighbors { indegrees[neighbor]! += 1 } } // track nodes with no incoming edges var nodesWithNoIncomingEdges: [String] = [] for (node, _) in digraph { if indegrees[node] == 0 { nodesWithNoIncomingEdges.append(node) } } // initially, no nodes in our ordering var topologicalOrdering: [String] = [] // as long as there are nodes with no incoming edges // that can be added to the ordering while nodesWithNoIncomingEdges.count > 0 { // add one of those nodes to the ordering let node = nodesWithNoIncomingEdges.removeLast() topologicalOrdering.append(node) // decrement the indegree of that node's neighbors for neighbor in digraph[node]! { indegrees[neighbor]! -= 1 if indegrees[neighbor]! == 0 { nodesWithNoIncomingEdges.append(neighbor) } } } // we've run out of nodes with no incoming edges // did we add all the nodes or find a cycle? guard topologicalOrdering.count == digraph.count else { throw TopologicalSortError.cycleInGraph } return topologicalOrdering }

Time and Space Complexity

Breaking the algorithm into chunks, we:

Determine the indegree for each node. This is time (where M is the number of edges), since this involves looking at each directed edge in the graph once.
Find nodes with no incoming edges. This is a simple loop through all the nodes with some number of constant-time appends. time (where N is the number of nodes).
Add nodes until we run out of nodes with no incoming edges. This loop could run once for every node— times. In the body, we:

Do two constant-time operations on an array to add a node to the topological ordering.
Decrement the indegree for each neighbor of the node we added. Over the entire algorithm, we'll end up doing exactly one decrement for each edge, making this step time overall.

Check if we included all nodes or found a cycle. This is a fast comparison.

All together, the time complexity is .

That's the fastest time we can expect, since we'll have to look at all the nodes and edges at least once.

What about space complexity? Here are the data structures we created:

indegrees—this has one entry for each node in the graph, so it's space.
nodesWithNoIncomingEdges—in a graph with no edges, this would start out containing every node, so it's space in the worst case.
topologicalOrdering—in a graph with no cycles, this will eventually have every node. space.

All in all, we have three structures and they're all space. Overall space complexity: .

This is the best space complexity we can expect, since we must allocate a return array which costs space itself.

Uses

The most common use for topological sort is ordering steps of a process where some the steps depend on each other.

As an example, when making chocolate bundt cake,

The ingredients have to be mixed before going in the bundt pan.
The bundt pan has to be greased and floured before the batter can be poured in.
The oven has to be preheated before the cake can bake.
The cake has to be baked before it cools.
The cake has to cool before it can be iced.

While we've chosen a fun example here, the same logic applies to any set of tasks with dependencies, like building components in a large software project, performing data analysis in Map-Reduce job, or bringing up hardware components at boot time. (For example, the mother board has to initialize the hard drive before the BIOS tries to load the bootloader from disk.)

This process can be represented as a directed graph. Each step is a node, and we'll add directed edges between nodes to represent that one step has to be done before another.