Word Cloud Data (Practice Interview Question)

You only have free questions left (including this one).

But it doesn't have to end here! Sign up for the 7-day coding interview crash course and you'll get a free Interview Cake problem every week.

You're in!

You want to build a word cloud, an infographic where the size of a word corresponds to how often it appears in the body of text.

To do this, you'll need data. Write code that takes a long string and builds its word cloud data in a dictionary, where the keys are words and the values are the number of times the words occurred.

Think about capitalized words. For example, look at these sentences:

'After beating the eggs, Dana read the next step:' 'Add milk and eggs, then add flour and sugar.'

What do we want to do with "After", "Dana", and "add"? In this example, your final dictionary should include one "Add" or "add" with a value of 2. Make reasonable (not necessarily perfect) decisions about cases like "After" and "Dana".

Assume the input will only contain words and standard punctuation.

You could make a reasonable argument to use regex in your solution. We won't, mainly because performance is difficult to measure and can get pretty bad.

Are you sure your code handles hyphenated words and standard punctuation?

Are you sure your code reasonably handles the same word with different capitalization?

Try these sentences:

"We came, we saw, we conquered...then we ate Bill's (Mille-Feuille) cake." 'The bill came to five dollars.'

We can do this in runtime and space.

The final dictionary we return should be the only data structure whose length is tied to n.

We should only iterate through our input string once.

Start your free trial!

Where do I enter my password?

Actually, we don't support password-based login. Never have. Just the OAuth methods above. Why?

It's easy and quick. No "reset password" flow. No password to forget.
It lets us avoid storing passwords that hackers could access and use to try to log into our users' email or bank accounts.
It makes it harder for one person to share a paid Interview Cake account with multiple people.

Start your free trial!

Where do I enter my password?

Actually, we don't support password-based login. Never have. Just the OAuth methods above. Why?

It's easy and quick. No "reset password" flow. No password to forget.
It lets us avoid storing passwords that hackers could access and use to try to log into our users' email or bank accounts.
It makes it harder for one person to share a paid Interview Cake account with multiple people.

Runtime and memory cost are both . This is the best we can do because we have to look at every character in the input string and we have to return a dictionary of every unique word. We optimized to only make one pass over our input and have only one data structure.

We haven't explicitly talked about how to handle more complicated character sets. How would you make your solution work with more unicode characters? What changes need to be made to handle silly sentences like these:

I'm singing ♬ on a ☔ day.

☹ + ☕ = ☺.
We limited our input to letters, hyphenated words and punctuation. How would you expand your functionality to include numbers, email addresses, twitter handles, etc.?
How would you add functionality to identify phrases or words that belong together but aren't hyphenated? ("Fire truck" or "Interview Cake")
How could you improve your capitalization algorithm?
How would you avoid having duplicate words that are just plural or singular possessives?

Start your free trial!

Where do I enter my password?

Actually, we don't support password-based login. Never have. Just the OAuth methods above. Why?

It's easy and quick. No "reset password" flow. No password to forget.
It lets us avoid storing passwords that hackers could access and use to try to log into our users' email or bank accounts.
It makes it harder for one person to share a paid Interview Cake account with multiple people.

Ready for more?

Check out our full course

You only have free questions left (including this one).

Start your free trial!

Start your free trial!

Start your free trial!

Ready for more?

Programming interview questions by company:

Programming interview questions by topic: