When should you not use a list comprehension?
Two common cases where you shouldn't use a list comprehension are:
- You don't actually want a list
- The logic is too long
Case 1: You don't actually want a list
List comprehensions build lists, but that's not the only reason we use for loops. Sometimes we have functions or methods whose main purpose is their side effect, and they don't return anything meaningful.
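As a minimal sketch of this case (the function and data names are illustrative assumptions, not taken from the original example), compare a comprehension and a plain loop around a function that exists only for its side effect:

```python
# Hypothetical example: `record` exists only for its side effect
# (appending to a log) and returns None.
log = []

def record(message):
    log.append(message.upper())

messages = ["start", "stop"]

# Comprehension: works, but builds a throwaway list of None values.
wasted = [record(m) for m in messages]

# Conventional loop: same side effects, no throwaway list.
for m in messages:
    record(m)
```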
Not only does this look more natural as a conventional for loop, it doesn't waste space creating a list that it just ignores.
Case 2: The logic is too long
One of the main benefits of list comprehensions is that they make your code shorter and clearer. Once you start packing too much into a single statement, it becomes harder to follow than a regular for loop.
While descriptive variable names go a long way toward making this piece of code somewhat readable, it's still hard to understand.
This is clearer because we can:
- Include comments to explain the code.
- Use control flow keywords like continue.
- Debug this section of code more easily using logging statements or asserts.
- Easily see the complexity of the code by scanning down the lines, picking out for loops.
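To make the comparison concrete, here is a small sketch with made-up data (the records and labels are assumptions for illustration only). Both versions compute the same result, but the loop leaves room for comments and continue:

```python
# Hypothetical data: (name, score) records; None marks a missing score.
records = [("ann", 91), ("bob", None), ("cy", 58), ("dee", 77)]

# Packed into one comprehension: legal, but dense.
dense = [name.title() + (" (honors)" if score >= 90 else "")
         for name, score in records
         if score is not None and score >= 60]

# The same logic as a loop, with room for comments and `continue`.
passing = []
for name, score in records:
    if score is None:
        continue  # skip students with no recorded score
    if score < 60:
        continue  # skip failing scores
    label = name.title()
    if score >= 90:
        label += " (honors)"
    passing.append(label)
```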
Slicing lists from the end
We want to run some analytics on our investments. To start, we're given a list containing the balance at the end of the day for some number of days. The last item in the list represents yesterday's closing balance and each previous item refers to the day before.
The first step in the process is to grab adjacent items in the list. To get started, we want a function that takes in our list of daily_balances and prints pairs of adjacent balances for the last 3 days:
We just hired a new intern, Dan, to help us with this but something doesn't seem to be working quite right. Here's his function:
What is his code printing, and how can we fix it?
If we run his code:
this is what we see:
Everything worked fine on the first slice, but the second one is empty. What's going on?
The issue is that list slicing with negative indices can get tricky if we aren't careful. The first time through the loop, we take the slice daily_balances[-3:-1] and everything works as expected. However, the second time through the loop, we take the slice daily_balances[-2:0].
Our stop index, 0, isn't the end of our list, it's the first item! So, we just asked for a slice from the next-to-last item to the very first item, which is definitely not what we meant to do.
So, why didn't Python return daily_balances in reverse order, from the next-to-last item up through the first item? This isn't what we originally wanted, but wouldn't it make more sense than the empty list we got?
Python is happy to slice lists in reverse order, but wants you to be explicit so it knows unambiguously that you want a reverse slice. If we wanted a reverse slice, we would need to use daily_balances[-2:0:-1], where the third parameter, -1, is the step argument, telling Python to walk backwards through the list as it slices (the stop index is still excluded, so this slice ends just before the first item).
Essentially, when we asked for the slice daily_balances[-2:0], we asked for all the elements starting 2 from the end, moving forward, and stopping before index 0. No element satisfies that, but it isn't an error, so Python returns an empty slice.
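The three slices discussed above can be checked directly. This is a small sketch using made-up balances (the numbers are an assumption; only the slice expressions come from the text):

```python
# Hypothetical balances; the last item is yesterday's close.
daily_balances = [107.92, 108.67, 109.86, 110.15]

first = daily_balances[-3:-1]      # start 3 from the end, stop before the last item
second = daily_balances[-2:0]      # stop index 0 is the FIRST item, so nothing qualifies
reverse = daily_balances[-2:0:-1]  # explicit step of -1 walks backwards

print(first)    # [108.67, 109.86]
print(second)   # []
print(reverse)  # [109.86, 108.67]
```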
So how do we fix this code?
Since daily_balances is just a regular list, the fix is simple—use positive indices instead:
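One way the positive-index fix might look is sketched below. The function name show_balances and the returned list are assumptions added so the result can be checked, and the sketch assumes the list holds at least 3 balances:

```python
def show_balances(daily_balances):
    """Print and return adjacent pairs covering the last 3 days."""
    num_balances = len(daily_balances)
    pairs = []
    # Positive indices: start 3 from the end, stop before the final item.
    for day in range(num_balances - 3, num_balances - 1):
        pair = daily_balances[day:day + 2]
        print(pair)
        pairs.append(pair)
    return pairs

result = show_balances([107.92, 108.67, 109.86, 110.15])
```

Because day + 2 never drops to 0 here, the stop index always points where we expect.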
Count Capital Letters
Write a one-liner that will count the number of capital letters in a file. Your code should work even if the file is too big to fit in memory.
Assume you have an open file handle object, such as:
Rest assured—there is a clean, readable answer!
Trying to pull a one-liner out of thin air can be daunting and error-prone. Instead of trying to tackle that head-on, let's work on understanding the framework of our answer, and only afterwards try to convert it into a one-liner. Our first thought might be to keep a running count as we look through the file:
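A first draft along those lines might look like the sketch below. Here io.StringIO stands in for the open file handle, and its contents are made up for illustration:

```python
import io

# Stand-in for an open file handle (hypothetical contents).
fh = io.StringIO("Hello World\nfoo BAR\n")

count = 0
text = fh.read()  # reads the ENTIRE file into memory
for character in text:
    if character.isupper():
        count += 1
print(count)  # 5
```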
This is a great start—the code isn't very long, but it is clear. That makes it easier to iterate over as we build up our solution. There are two main issues with what we have so far if we want to turn it into a one-liner:
- Variable initialization: In a one-liner, we won't be able to use something like count = 0
- Memory: The question says this needs to work even on files that won't fit into memory, so we can't just read the whole file into a string.
Let's try to deal with the memory issue first: we can't use the read method since that reads the whole file at once. There are some other common file methods, readlines() and readline(), that might help, so let's look at them.
- readlines is a method that reads a file into a list—each line is a different item in the list. That doesn't really help us—it's still reading the entire file at once, so we still won't have room.
- readline only reads a single line at a time—it seems more promising. This is great from a memory perspective (let's assume each line fits in memory, at least for now).
But, we can't just replace read with readline because that only gives us the first line. We need to call readline over and over until we read the entire file.
The idea of repeatedly calling a function, such as readline, until we hit some value (the end of the file) is so common, there's a standard library function for it: iter. We'll need the two-argument form of iter, where the first argument is our function to call repeatedly, and the second argument is the value that tells us when to stop (also called the sentinel).
What value do we need as our sentinel? Looking at the documentation for readline, it includes the newline character so even blank lines will have at least one character. It returns an empty string only when it hits the end of the file, so our sentinel is ''.

    count = 0
    for line in iter(fh.readline, ''):
        for character in line:
            if character.isupper():
                count += 1

And this works! But... it's not as clear as it could be. Understanding this code requires knowing about the two-argument iter and that readline returns '' at the end of the file. Trying to condense all this into a one-liner seems like it might be confusing to follow.
Is there a simpler way to iterate over the lines in the file? If you're using Python 3, there aren't any methods for that on your file handle. If you're using Python 2.7, there is something that sounds interesting—xreadlines. It iterates over the lines in a file, yielding each one to let us process it before reading the next line. In our code, it might be used like:
It's exactly like our code with readline and iter but even clearer! It's a shame we can't use this in Python 3.x though; it seems like it would be great. Let's look at the documentation for this function to see if we can learn what alternatives Python 3.x might have:
Huh? "returns self"—how does that even do anything?
What's happening here is that iterating over the lines of a file is so common that they built it right into the object itself. If we use our file object as an iterator, it starts yielding us lines, just like xreadlines! So we can clean up our code, and make it Python 3.x compatible, by just removing xreadlines.
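Iterating over the file object directly might look like this sketch (io.StringIO again stands in for a real file handle, with made-up contents):

```python
import io

fh = io.StringIO("Hello World\nfoo BAR\n")  # hypothetical stand-in for a file

count = 0
for line in fh:  # the file object yields one line at a time
    for character in line:
        if character.isupper():
            count += 1
print(count)  # 5
```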
Alright, we've finally solved the issue of efficiently reading the file and iterating over it, but we haven't made any progress on making it a one-liner. As we said in the beginning, we can't initialize variables, so what we need is a function that will just return the count of all capitalized letters. There isn't a count() function in Python (at least, not one that would help us here), but we can rephrase the question just enough to find a function that gets the job done.
Instead of thinking about a "count of capitalized letters", let's think about mapping every letter (every character, even) to a number, since our answer is a number. All we care about are capital letters, and each one adds exactly 1 to our final count. Every other character should be ignored, or add 0 to our final count. We can get this mapping into a single line using Python's inline if-else:
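A sketch of that mapping, still in loop form (the file contents are an assumption for illustration):

```python
import io

fh = io.StringIO("Hello World\nfoo BAR\n")  # hypothetical stand-in for a file

count = 0
for line in fh:
    for character in line:
        # Each capital letter maps to 1; everything else maps to 0.
        count += 1 if character.isupper() else 0
print(count)  # 5
```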
What did this mapping get us? Well, Python didn't have a function to count capital letters, but it does have a function to add up a bunch of 1s and 0s: sum.
sum takes any iterable, such as a generator expression, and our latest solution—nested for loops and a single if-else—can easily be rewritten as a generator expression:
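Rewritten as a generator expression passed to sum, the nested loops and the inline if-else might collapse to something like this (sample contents are made up):

```python
import io

fh = io.StringIO("Hello World\nfoo BAR\n")  # hypothetical stand-in for a file

# Outer loop first, inner loop second, same order as the nested for loops.
count = sum(1 if character.isupper() else 0 for line in fh for character in line)
print(count)  # 5
```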
and now we've got a one-liner! It's not quite as clear as it could be—seems unnecessary to explicitly sum 0 whenever we have a character that isn't a capital letter. We can filter those out:
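A sketch of the filtered version, where the trailing if drops non-capitals before they reach sum (sample contents are made up):

```python
import io

fh = io.StringIO("Hello World\nfoo BAR\n")  # hypothetical stand-in for a file

# Only capital letters produce a 1; other characters are skipped entirely.
count = sum(1 for line in fh for character in line if character.isupper())
print(count)  # 5
```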
or we can even take advantage of the fact that Python will coerce True to 1 (and False to 0):
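Since bool is a subclass of int in Python, summing the results of isupper directly works too. A sketch with made-up contents:

```python
import io

fh = io.StringIO("Hello World\nfoo BAR\n")  # hypothetical stand-in for a file

# isupper() returns True or False, which sum treats as 1 or 0.
count = sum(character.isupper() for line in fh for character in line)
print(count)  # 5
```

Whether this reads better than the filtered version is a matter of taste; the filter states the intent more explicitly.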
Best in Subclass
When I was younger, my parents always said I could have pets if I promised to take care of them. Now that I'm an adult, I decided the best way to keep track of them is with some Python classes!
Since these pets won't sit still long enough to be put into a list, I need to keep track with the class attribute num_pets.
That should be enough to get me started. Let's create a few pets:
and see what they have to say:
Hmm... I'm not getting the output I expect. What did these two lines print, and how do we fix it?
When we run these lines we get:
Something isn't right—it's not counting the number of pets properly. Don't worry Rover, I didn't replace you with Spot! What's happening here?
It turns out there's a difference between class attributes and instance attributes. When we created rover and added to num_pets, we accidentally shadowed Pet.num_pets with rover.num_pets, and they're two completely different variables now! We can see this even more clearly if we ask our Pet class how many pets it knows about:
Our Pet class still thinks there are 0 pets, because each new pet adds 1 and shadows the class attribute num_pets with its own instance attribute.
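The shadowing can be reproduced with a small sketch. The class body below is an assumption about what the Pet class looks like, reconstructed only from the description in the text:

```python
class Pet:
    num_pets = 0  # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name
        # Reads Pet.num_pets (0), adds 1, then ASSIGNS to the instance,
        # creating a new instance attribute that shadows the class one.
        self.num_pets += 1

rover = Pet("Rover")
spot = Pet("Spot")
print(rover.num_pets, spot.num_pets, Pet.num_pets)  # 1 1 0
```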
So how can we fix this? We just need to make sure we refer to, and increment, the class attribute:
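A minimal sketch of that fix, again assuming a Pet class shaped like the one described in the text:

```python
class Pet:
    num_pets = 0  # class attribute, shared by all instances

    def __init__(self, name):
        self.name = name
        # Increment the class attribute itself, not an instance copy.
        Pet.num_pets += 1

rover = Pet("Rover")
spot = Pet("Spot")
print(Pet.num_pets)    # 2
print(rover.num_pets)  # 2 (no instance attribute, so lookup falls back to the class)
```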
and now, if we run our updated code:
we get what we wanted:
When is a number not itself?
Given some simple variables:
What's the output we get from running the following?
The Python keyword is tests whether two variables refer to the exact same object, not just if they are equal. For example:
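A short sketch of identity versus equality, using lists so the result does not depend on any integer caching (the variable names are illustrative):

```python
a = [1, 2, 3]
b = a          # b is another name for the same list object
c = [1, 2, 3]  # c is a new list that happens to hold equal values

print(a is b)  # True: same object
print(a is c)  # False: different objects
print(a == c)  # True: equal contents
```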
Now let's look at our original questions:
This should make sense—we created two different objects that each hold a number, so while they happen to hold the same value, they aren't referring to the same object in memory.
And for small numbers?
What's happening here? Python (more precisely CPython, the reference implementation) creates singletons for the most commonly used integers. One good reason to do this is that small numbers get used so frequently that if Python had to create a brand new object every time it needed a number, and then free the object when it goes out of scope, it would start to actually take a noticeable amount of time.
The idea behind singletons is that there can only ever be one instance of a particular object, and whenever someone tries to use or create a new one, they get the original.
So, which numbers count as "small numbers"? We can write a quick bit of code to test this out for us:
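One way to probe this is sketched below. The helper names are hypothetical, and the values are built with int(str(n)) at run time so the compiler cannot merge two equal literals into one constant. The result is an implementation detail: on a CPython build with the range this section describes, the two calls would report 257 and -6, but other versions or implementations may differ.

```python
def shares_object(n):
    """Return True if two separately built ints equal to n are the same object."""
    first = int(str(n))
    second = int(str(n))
    return first is second

def first_unshared(start, step, limit=1000):
    """Walk from `start` in direction `step`; return the first unshared value."""
    n = start
    for _ in range(limit):
        if not shares_object(n):
            return n
        n += step
    return None  # every value checked was shared

print(first_unshared(0, 1))   # first non-singleton going up from 0
print(first_unshared(0, -1))  # first non-singleton going down from 0
```

The limit argument only guards against looping forever on an implementation where every int compares identical.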
Python makes singletons for the numbers 0 through 256. But are there other numbers made into singletons? It would make sense that some negative numbers might be worth making only once—it's pretty common to look at, say, the last few characters in a string, or the last few elements in a list. We can run the same code but check negative numbers instead:
This shows that numbers from -5 up to and including 256 have singleton instances, so they could be tested against each other with is. But we shouldn't do that, since the next version of Python might change the range of singleton numbers. And any code that took advantage of these singletons won't be able to check for equality for numbers outside this range. A simple equality check with numbers is always safest.
Here's a hint
Here at Interview Cake, we've decided to keep all our interview questions inside Python dictionaries. We have a default template to get us started:
and a function to help us populate new questions:
Then we added a few questions (abbreviated for simplicity):
What did we do wrong? How can we fix it?
Things start going wrong after question 1. Let's look at question 2 and see what we got:
Question 2 wasn't supposed to have any hints! And question 3 is even more jumbled:
It has hints from question 1 and its own hints! What's going on here?
The problem all stems from how we used our question_template—there's nothing wrong with the template itself, but when we call question_template.copy(), we're making a shallow copy of our dictionary. A shallow copy basically does this:
All of the items, keys and values, refer to the exact same objects after making a shallow copy. The issue arises with our hints because it's a list, which is mutable. The hints entry in our copy of question_template points to the exact same list object as the hints in our template! We can see this if we just print out our template:
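The sharing can be seen in a small sketch. The shallow_copy helper is a hypothetical, rough equivalent of dict.copy(), and the template fields are assumptions based on the description above:

```python
def shallow_copy(original):
    """Roughly what dict.copy() does: a new dict holding the same value objects."""
    duplicate = {}
    for key, value in original.items():
        duplicate[key] = value  # copies the reference, not the object
    return duplicate

# Hypothetical template, shaped like the one described above.
question_template = {"title": "default title", "hints": []}

question_1 = question_template.copy()
question_1["hints"].append("first hint")  # mutates the shared list

print(question_template["hints"])                         # ['first hint']
print(question_1["hints"] is question_template["hints"])  # True
```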
One way around this problem would be to overwrite the list of hints every time:
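A sketch of the overwrite approach follows. The function name make_new_question, its parameters, and the template fields are all assumptions for illustration:

```python
# Hypothetical template and helper, shaped like the ones described above.
question_template = {"title": "default title", "hints": []}

def make_new_question(title, hints=None):
    new_q = question_template.copy()
    new_q["title"] = title
    # Always replace the shared list with a fresh one for this question.
    new_q["hints"] = list(hints) if hints is not None else []
    return new_q

q1 = make_new_question("q1", ["hint a"])
q2 = make_new_question("q2")
q1["hints"].append("hint b")
print(q2["hints"])                 # []
print(question_template["hints"])  # []
```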
This works for our simple dictionary here, since we know the only mutable element is the hints list. However, it gets more confusing if we have a deeper structure to copy over, such as a list of dicts of lists (of lists of...). That's actually pretty common when working with REST APIs that return giant nested JSON dictionaries.
A more general solution is to ensure that any mutable objects have true copies created, rather than just passing along a "reference" to the original object. And if that mutable object is a container, then any of its mutable elements need to make true copies, and so on, recursively. Rolling this code by hand would be error-prone and tedious—luckily, Python has a standard library function that can help: deepcopy.
We can just drop that in where our shallow copy was:
Now, the list of hints in each new question will be a brand new list, so changes to it won't affect other questions or the template question.
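A sketch of the deepcopy version, using copy.deepcopy from the standard library. The make_new_question name, its parameters, and the template fields are assumptions for illustration:

```python
import copy

# Hypothetical template and helper, shaped like the ones described above.
question_template = {"title": "default title", "hints": []}

def make_new_question(title, hints=None):
    # deepcopy recursively copies nested mutable objects such as the hints list.
    new_q = copy.deepcopy(question_template)
    new_q["title"] = title
    if hints is not None:
        new_q["hints"].extend(hints)
    return new_q

q1 = make_new_question("q1", ["hint a"])
q2 = make_new_question("q2")
q1["hints"].append("hint b")
print(q2["hints"])                 # []
print(question_template["hints"])  # []
```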
Generating the Matrix
We want to build a matrix of values, like a multiplication table. The output we want in this case should be a list of lists, like:
Trying to keep our code clean and concise, we've come up with a matrix generator:
But our output isn't what we expected. Can you figure out what we got instead, and how to fix it?
Instead of 3 lists with 3 elements, if we run the code above we get:
The reason we didn't get what we expected is that our iterator is a generator. Generators in Python have an interesting property: they create values lazily, which allows them to save space. For this same reason, though, they only create each value once. After an element has been yielded by a generator, there's no way to go back and get that value again.
It's easiest to see what happens when we walk through this code step by step. Inside our list comprehension, we have nested for loops:
- Start with the outer for loop, which is for x in iterator. This is the first time we've called the iterator, so we get x = 1
- Go inside the inner list comprehension to reach our inner for loop, for y in iterator. Since we already called our iterator once before, we get y = 2
- Do our computation, x * y = 2, which is the first value in our first list
- Keep working through the inner for loop, yielding the next value so that y = 3
- Do our computation, where our x is still 1 so x * y = 3, which is the second value in our first list
- Try to keep working through the inner for loop, and that loop ends, since there are no more values in our iterator
- Pop back up to the outer for loop, but the iterator is still empty, so we finish our list comprehension
and that's how we end up with only 2 values in our matrix.
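The walk-through above can be reproduced with a short sketch. The exact generator is an assumption, chosen to match the values 1, 2, 3 used in the steps:

```python
# A generator expression: each value can be consumed only once.
iterator = (i for i in range(1, 4))

# Both loops pull from the SAME generator, so the inner loop drains it.
matrix = [[x * y for y in iterator] for x in iterator]
print(matrix)  # [[2, 3]]
```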
The simplest way to fix our code in this case is to not use a generator. We just have to make our iterator into a list:
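A sketch of that fix, assuming the same 1 through 3 range as in the walk-through:

```python
# A list can be iterated any number of times, so each inner loop starts fresh.
iterator = list(range(1, 4))

matrix = [[x * y for y in iterator] for x in iterator]
print(matrix)  # [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
```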