Google Books American English Corpus

I want to share a fascinating resource I found: the Google Books corpus of American English, 155 billion words in size. It’s obviously too big to download, so for analysis you’re limited to what you can do through the website at Brigham Young University. The easiest thing to do is type in a word or phrase and see its frequency by decade, going back to the 1810s. Want to know when the phrase “Think Outside the Box” first appeared in a book? This resource will give you a pretty good idea. Would you have guessed that the word “originary” goes back to the 1820s, if not further? I wouldn’t have.

The interface lets you look for collocates (words that tend to appear near other words), view charts of a word’s relative frequency in the corpus by decade, search by part of speech, and choose from various limits and display options. Some other kinds of analysis you might run on a text corpus aren’t possible through the interface, though. BYU’s Corpus of Contemporary American English (COCA) supports more sophisticated queries, and BYU has several other language corpora available online in the same format. Fun with words!!
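If you’re curious what “collocates” means in practice, here is a minimal sketch of the basic idea, run on a bit of your own text rather than the Google Books corpus. The BYU interface computes collocates for you, and this post doesn’t describe its actual method (window size, tokenization, or any scoring statistic), so treat everything here as an assumption: the function name `collocates`, the whitespace tokenization, and the default window of 4 words on either side are all illustrative choices of mine. The sketch just counts raw co-occurrences.

```python
from collections import Counter

def collocates(text, node, window=4):
    """Hypothetical helper: count words appearing within `window`
    tokens on either side of each occurrence of `node`.
    Uses naive lowercase whitespace tokenization (an assumption)."""
    tokens = text.lower().split()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Clamp the window to the bounds of the token list.
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the node word itself
                    counts[tokens[j]] += 1
    return counts

sample = "think outside the box and then think inside the box"
print(collocates(sample, "box", window=2).most_common(3))
```

A real corpus tool would also normalize for how common each word is overall, so that words like “the” don’t dominate every list; raw counts like these are only the starting point.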