You want to build a word cloud, an infographic where the size of a word corresponds to how often it appears in the body of text.

To do this, you'll need data. Write code that takes a long string and builds its word cloud data in a dictionary, where the keys are words and the values are the number of times the words occurred.
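The output is just a word-to-count mapping. Here is a minimal sketch of that mapping. It assumes the text has already been split into clean words, since splitting is the harder part of this problem. `count_words` is a hypothetical name:

```python
from typing import Dict, Iterable


def count_words(words: Iterable[str]) -> Dict[str, int]:
    """Map each word to the number of times it occurs.

    Assumes `words` is already tokenized (no punctuation attached).
    """
    counts: Dict[str, int] = {}
    for word in words:
        # dict.get supplies 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words(["add", "milk", "and", "eggs", "then", "add", "flour", "and", "sugar"]))
# {'add': 2, 'milk': 1, 'and': 2, 'eggs': 1, 'then': 1, 'flour': 1, 'sugar': 1}
```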

Think about capitalized words. For example, look at these sentences:

'After beating the eggs, Dana read the next step:'
'Add milk and eggs, then add flour and sugar.'

What do we want to do with "After", "Dana", and "add"? In this example, your final dictionary should include one "Add" or "add" with a value of 2. Make reasonable (not necessarily perfect) decisions about cases like "After" and "Dana".
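Here is one reasonable policy, though it is only one of several. A capitalized form such as "Dana" or "After" keeps its own entry until the same word shows up in lowercase. At that point the counts are folded together under the lowercase key. The helper `add_word` is a hypothetical name, and the sketch assumes words arrive already split, without surrounding punctuation:

```python
from typing import Dict, List


def add_word(counts: Dict[str, int], word: str) -> None:
    """Record one occurrence of `word`, merging capitalized and lowercase forms.

    Assumed policy: keep a capitalized form only while we have never
    seen the word in lowercase.
    """
    lower = word.lower()
    capitalized = word.capitalize()
    if word in counts:
        counts[word] += 1
    elif lower in counts:
        # The word is already known in lowercase, so count it there
        counts[lower] += 1
    elif word == lower and capitalized in counts:
        # First lowercase sighting: move the capitalized count to the lowercase key
        counts[lower] = counts.pop(capitalized) + 1
    else:
        counts[word] = 1


# Tokens from the two example sentences, punctuation already stripped
recipe_words: List[str] = [
    "After", "beating", "the", "eggs", "Dana", "read", "the", "next", "step",
    "Add", "milk", "and", "eggs", "then", "add", "flour", "and", "sugar",
]
recipe_counts: Dict[str, int] = {}
for w in recipe_words:
    add_word(recipe_counts, w)
```

With this policy, "Add" and "add" collapse into a single "add" entry with a count of 2. "Dana" and "After" keep their capitals because no lowercase form ever appears.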

Assume the input will only contain words and standard punctuation.

You could make a reasonable argument for using regex in your solution. We won't, mainly because regex performance is difficult to predict and can get pretty bad.

Are you sure your code handles hyphenated words and standard punctuation?

Are you sure your code reasonably handles the same word with different capitalization?

Try these sentences:

'We came, we saw, we conquered...then we ate Bill's (Mille-Feuille) cake.'
'The bill came to five dollars.'

We can do this in O(n) runtime and O(n) space.

The final dictionary we return should be the only data structure whose length is tied to n.

We should only iterate through our input string once.
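Here is one way to meet these constraints. It is a sketch, not the course's reference solution, and it rests on two assumptions. First, a word is a run of letters, and a hyphen or apostrophe counts as part of a word only when it sits between two letters. Second, a capitalized form is kept only until the same word appears in lowercase. `build_word_cloud_data`, `is_word_char`, and `add_word` are hypothetical names:

```python
from typing import Dict, Optional


def is_word_char(text: str, i: int) -> bool:
    """Letters are word characters; a hyphen or apostrophe counts only
    between two letters (so "Mille-Feuille" and "Bill's" stay whole)."""
    ch = text[i]
    if ch.isalpha():
        return True
    if ch in "-'":
        return (0 < i < len(text) - 1
                and text[i - 1].isalpha()
                and text[i + 1].isalpha())
    return False


def add_word(counts: Dict[str, int], word: str) -> None:
    """Count one occurrence, keeping a capitalized form only until
    the same word shows up in lowercase."""
    lower = word.lower()
    capitalized = word.capitalize()
    if word in counts:
        counts[word] += 1
    elif lower in counts:
        # Already seen in lowercase: fold this occurrence into that key
        counts[lower] += 1
    elif word == lower and capitalized in counts:
        # First lowercase sighting: move the capitalized count over
        counts[lower] = counts.pop(capitalized) + 1
    else:
        counts[word] = 1


def build_word_cloud_data(text: str) -> Dict[str, int]:
    """Build word counts in a single left-to-right pass over `text`."""
    counts: Dict[str, int] = {}
    start: Optional[int] = None  # index where the current word began
    for i in range(len(text)):
        if is_word_char(text, i):
            if start is None:
                start = i
        elif start is not None:
            # A non-word character ends the current word
            add_word(counts, text[start:i])
            start = None
    if start is not None:
        # The text ended in the middle of a word
        add_word(counts, text[start:])
    return counts


cloud = build_word_cloud_data(
    "We came, we saw, we conquered...then we ate Bill's (Mille-Feuille) cake. "
    "The bill came to five dollars."
)
```

On the test sentences, "we" ends up with a count of 4. "conquered" and "then" are split at the ellipsis, and "Bill's" and "bill" stay separate entries. "The" keeps its capital because no lowercase "the" appears, which is a reasonable but not perfect outcome. Each character is examined once, with constant-time peeks at its neighbors, and the returned dictionary is the only structure that persists.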

Runtime and memory cost are both O(n). This is the best we can do because we have to look at every character in the input string and we have to return a dictionary of every unique word. We optimized to make only one pass over our input and to use only one data structure.

  1. We haven't explicitly talked about how to handle more complicated character sets. How would you make your solution work with more Unicode characters? What changes need to be made to handle silly sentences like these:

    I'm singing ♬ on a ☔ day.

    ☹ + ☕ = ☺.

  2. We limited our input to letters, hyphenated words, and punctuation. How would you expand your functionality to include numbers, email addresses, Twitter handles, etc.?
  3. How would you add functionality to identify phrases or words that belong together but aren't hyphenated? ("Fire truck" or "Interview Cake")
  4. How could you improve your capitalization algorithm?
  5. How would you avoid duplicate entries for words that are just the plural or singular possessive form of another word?
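For the first bonus question, here is one possible direction, assuming we want pictographic symbols to count as words. Python's `str.isalpha` already accepts non-ASCII letters, so the main gap is symbols like ♬ and ☕. The hypothetical `tokenize` below uses `unicodedata.category` to treat each character in the "So" (Symbol, other) category as a one-character word. Emoji built from several code points, such as skin-tone or ZWJ sequences, would need grapheme-aware splitting, which this sketch does not attempt:

```python
import unicodedata
from typing import List, Optional


def tokenize(text: str) -> List[str]:
    """Split text into words; each 'other symbol' character (Unicode
    category "So", e.g. ♬ or ☕) becomes a one-character word."""
    words: List[str] = []
    start: Optional[int] = None  # index where the current word began
    for i, ch in enumerate(text):
        # Hyphens and apostrophes only count when surrounded by letters
        inner_mark = (ch in "-'" and 0 < i < len(text) - 1
                      and text[i - 1].isalpha() and text[i + 1].isalpha())
        if ch.isalpha() or inner_mark:  # isalpha() also accepts non-ASCII letters
            if start is None:
                start = i
            continue
        if start is not None:
            # Any other character ends the current word
            words.append(text[start:i])
            start = None
        if unicodedata.category(ch) == "So":
            words.append(ch)
    if start is not None:
        words.append(text[start:])
    return words
```

Operators such as "+" and "=" fall into the math-symbol category ("Sm"), so this sketch drops them along with ordinary punctuation.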
