Quick reference
Complexiy | |
---|---|
Time—worst case | |
Time—best case | |
Time—average case | |
Space |
Strengths:
- Linear time. Counting sort runs in time, making it asymptotically faster than comparison-based sorting algorithms like quicksort or merge sort.
Weaknesses:
- Restricted inputs. Counting sort only works when the range of potential items in the input is known ahead of time.
- Space cost. If the range of potential values is big, then counting sort requires a lot of space (perhaps more than ).
How It Works
Counting sort works by iterating through the input, counting the number of times each item occurs, and using those counts to compute an item's index in the final, sorted list.
Counting the frequency of each item
Say we have this list:
[4, 8, 4, 2, 9, 9, 6, 2, 9]
And say we know all the numbers in our list will be whole numbers between 0 and 10 (inclusive).
The idea is to count how many 0's we see, how many 1's we see, and so on. Since there are 11 possible values, we'll use a list with 11 counters, all initialized to 0.
Couldn't we use a dictionary instead? We could, but since we're working with items that can easily be mapped to list indices, using a list is a bit more lightweight. Remember: dictionaries are built on top of lists.
Counts: [ 0, 0, 0, . . . 0, 0 ]
0's 1's 2's 9's 10's
So, we'll iterate through the input once. The first item is a 4, so we'll add one to counts[4]. The next item is an 8, so we'll add one to counts[8]. When we reach the end, we'll have the total counts for each number:
Input: [4, 8, 4, 2, 9, 9, 6, 2, 9]
Counts: [ 0, 0, 2, 0, 2, 0, 1, 0, 1, 3, 0]
0's 1's 2's 3's 4's 5's 6's 7's 8's 9's 10's
Build the sorted list
Now that we know how many times each item appears, we can fill in our sorted list.
Sorted: [_, _, _, _, _, _, _, _, _]
Looking at counts, we don't have any 0's or 1's, but we've got two 2's. So, those go at the start of our sorted list.
Sorted: [2, 2, _, _, _, _, _, _, _]
No 3's, but there are two 4's that come next.
Sorted: [2, 2, 4, 4, _, _, _, _, _]
After that, we have one 6,
Sorted: [2, 2, 4, 4, 6, _, _, _, _]
one 8,
Sorted: [2, 2, 4, 4, 6, 8, _, _, _]
and three 9's
Sorted: [2, 2, 4, 4, 6, 8, 9, 9, 9]
And, with that, we're done!
Handling Non-Numeric Items
What if the items in our input aren't simple numbers that we can extract from our counts list?
As an example, what if we had a list of dessert objects, and we wanted to sort them by price?
Input: [(type: macadamia nut cookie, price: 4), (type: double chocolate cake, price: 8),
(type: chocolate chip cookie, price: 4), (type: sugar cookie, price: 2),
(type: creme brulee, price: 9), (type: chocolate souflee, price: 9),
(type: fruit tart, price: 6), (type: brownie, price: 2), (type: eclair, price: 9)]
If we went through and counted up all the prices, we'd end up with the same counts list.
Counts: [ 0, 0, 2, 0, 2, 0, 1, 0, 1, 3, 0]
0's 1's 2's 3's 4's 5's 6's 7's 8's 9's 10's
But when we go to add the 2's into our sorted list, we need to get the actual dessert objects.
There are a few ways we could do this:
- Iterate through the input to find both of the desserts that cost $2. That takes time for each cost, so time overall.
- Build a dictionary mapping prices to desserts. (Maybe this could even replace counts.) But this would basically be making an extra copy of the input, and it'd be space.
We can do this in time without copying the input.
Building a Next Index list
Using our counts list, we can pre-compute where each item from the input should go. Then we'll just iterate through the input, using the pre-computed indices to place each item in the right spot.
Take macadamia nut cookie for example: its price is $4, so it has to come after everthing with a lower price. Looking at counts, there are two cheaper desserts.
Counts: [ 0, 0, 2, 0, 2, 0, 1, 0, 1, 3, 0]
2 <===============|
How did we get that? We added up all the entries that appeared to the left of counts[4].
So, we know that (type: macadamia nut cookie, price: 4) has to come at index 2 in our sorted list:
Sorted: [_, _, (type: macadamia nut cookie, price: 4), _, _, _, _, _, _]
What about the next item: (type: double chocolate cake, price: 8)? With a price of $8, it'll come at index 5.
Counts: [ 0, 0, 2, 0, 2, 0, 1, 0, 1, 3, 0]
5 <==============================|
Sorted: [_, _, (type: macadamia nut cookie, price: 4), _, _,
(type: double chocolate cake, price: 8), _, _, _]
Let's do one more item: (type: chocolate chip cookie, price: 4).
Like the first item, its price is $4. We can't put it at index 2 though, since we've already put our first dessert there. So, instead, we'll put it in the next free index, which is 3:
Sorted: [_, _, (type: macadamia nut cookie, price: 4), (type: chocolate chip cookie, price: 4), _,
(type: double chocolate cake, price: 8), _, _, _]
Taking this idea, we'll use our counts list to build up another list, next_index, that will track where the next occurrance of a price goes in our sorted output. For instance, next_index[4] would hold the index for the next item with a price of $4.
We can initialize next_index from our counts list.
Nothing costs $0 or $1, so we'll set those to 0. The lowest price is $2, so the $2 items will start at index 0.
Counts: [0, 0, 2, 0, 2, 0, 1, 0, 1, 3, 0]
Next Index: [0, 0, 0, _, _, _, _, _, _, _, _]
Items that cost $3 go after all the items that cost $2. Two items in the list cost $2, so they'll take up indices 0 and 1. That means the first $3 item would go at index 2.
Counts: [0, 0, 2, 0, 2, 0, 1, 0, 1, 3, 0]
Next Index: [0, 0, 0, 2, _, _, _, _, _, _, _]
Notice how next_index[3] = next_index[2] + counts[2]. That makes sense, because we're taking the starting point for the $2 items and moving forward to make room for each of them. In general:
We can keep iterating through counts using this formula to fill in next_index.
Counts: [0, 0, 2, 0, 2, 0, 1, 0, 1, 3, 0]
Next Index: [0, 0, 0, 2, 2, 4, 4, 5, 5, 6, 9]
Building Our Sorted list With next_index
Our first item (type: macadamia nut cookie, price: 4) has a price of $4. Since next_index[4] is 2, we know it goes at index 2.
Sorted: [_, _, (type: macadamia nut cookie, price: 4), _, _, _, _, _, _]
And, since we've placed something at index 2, we now know that the next item that costs $4 goes after it, at index 3. So we'll increment next_index[4].
Next Index: [0, 0, 0, 2, 3, 4, 4, 5, 5, 6, 9]
Next up we've got (type: double chocolate cake, price: 8). Since next_index[8] is 5, it goes at index 5, and we increment next_index[8].
Sorted: [_, _, (type: macadamia nut cookie, price: 4), _, _,
(type: double chocolate cake, price: 8), _, _, _]
Next Index: [0, 0, 0, 2, 3, 4, 4, 5, 6, 6, 9]
We can keep iterating through the input:
Sorted: [_, _, (type: macadamia nut cookie, price: 4), (type: chocolate chip cookie, price: 4), _,
(type: double chocolate cake, price: 8), _, _, _]
Next Index: [0, 0, 0, 2, 4, 4, 4, 5, 6, 6, 9]
Sorted: [(type: sugar cookie, price: 2), _, (type: macadamia nut cookie, price: 4),
(type: chocolate chip cookie, price: 4), _,
(type: double chocolate cake, price: 8), _, _, _]
Next Index: [0, 0, 1, 2, 4, 4, 4, 5, 6, 6, 9]
Sorted: [(type: sugar cookie, price: 2), _, (type: macadamia nut cookie, price: 4),
(type: chocolate chip cookie, price: 4), _,
(type: double chocolate cake, price: 8), (type: creme brulee, price: 9), _, _]
Next Index: [0, 0, 1, 2, 4, 4, 4, 5, 6, 7, 9]
And so on, until we're done!
Implementation
Here's how we'd code it up:
What if the values could be negative? Or the smallest value was 50?
Our implementation assumes that all of the items are between 0 and some maximum. But it's pretty simple to extend the algorithm to handle any sort of range of integers. Give it a try. :)
Complexity
Counting sort takes time and space, where n is the number of items we're sorting and k is the number of possible values.
We iterate through the input items twice—once to populate counts and once to fill in the output list. Both iterations are time. Additionally, we iterate through counts once to fill in next_index, which is time.
The algorithm allocates three additional lists: one for counts, one for next_index, and one for the output. The first two are space and the final one is space.
You can actually combine counts and next_index into one list. No asymptotic changes, but it does save space.
In many cases cases, k is (i.e.: the number of items to be sorted is not asympototically different than the number of values those items can take on. Because of this, counting sort is often said to be time and space.
Interview coming up?
Get the free 7-day email crash course. You'll learn how to think algorithmically, so you can break down tricky coding interview questions.
No prior computer science training necessary—we'll get you up to speed quickly, skipping all the overly academic stuff.
No spam. One-click unsubscribe whenever.
You're in! Head over to your email inbox right now to read day one!