Architecture, interface, and searches

The six main corpora now have exactly the same architecture and interface.

Using our unique corpus architecture, users can:

  • Search by word, phrase, substring, part of speech (e.g. nouns or verbs), lemma (e.g. all forms of go: goes, went, etc.), synonyms, customized wordlists, or any combination of these

  • See the frequency of each matching form, both overall and in each section of the corpus, or the combined frequency in each genre and time period

  • Find the collocates (nearby words) of a given word or phrase, which provide insight into its meaning

  • Compare the collocates of two words to see differences in meaning or usage (e.g. collocates of rob vs. steal, warm vs. hot, men vs. women, or Democrats vs. Republicans)

  • Compare the collocates across time periods to see changes in meaning (e.g. new uses of green)

  • Compare the collocates across genres to show differences in 'word sense', e.g. chair = 'committee leader' (academic) vs. 'piece of furniture' (fiction)

  • Order results by Mutual Information (MI) score, which shows how strongly a collocate is associated with the search word rather than just how often the two co-occur

  • With integrated thesauruses, find the frequency and distribution of synonyms of a given word (to see which synonyms are most frequent, in which genres they are used most, which are increasing or decreasing in use, etc.)

  • Create personalized lists of words and phrases (e.g. for a particular semantic field) and then re-use them as part of subsequent queries

  • Use complete, context-sensitive help files and "guided tours" for each corpus

  • Save and re-use queries, as well as annotate them and share them with others
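To show what a lemma and part-of-speech search with per-section frequencies involves, here is a minimal sketch. It assumes the corpus is a flat list of tagged tokens, each carrying its surface form, lemma, tag, and the section (genre or time period) it comes from. The `Token` type, the `search` function, and the tag names are hypothetical illustrations, not the corpus interface's own data model or tagset.

```python
from collections import Counter, defaultdict
from typing import NamedTuple, Optional


class Token(NamedTuple):
    word: str     # surface form as it appears in the text, e.g. "went"
    lemma: str    # dictionary form, e.g. "go"
    pos: str      # part-of-speech tag (tag names here are illustrative only)
    section: str  # genre or time period the text belongs to


def search(tokens, lemma: Optional[str] = None, pos_prefix: Optional[str] = None):
    """Count matching word forms overall and per section.

    A token matches if it has the given lemma (when one is given) and its
    tag starts with pos_prefix (when one is given), so "V" matches any verb tag.
    """
    by_form = Counter()
    by_section = defaultdict(Counter)
    for t in tokens:
        if lemma is not None and t.lemma != lemma:
            continue
        if pos_prefix is not None and not t.pos.startswith(pos_prefix):
            continue
        by_form[t.word] += 1
        by_section[t.section][t.word] += 1
    return by_form, by_section
```

For example, `search(tokens, lemma="go", pos_prefix="V")` would return the counts of goes, went, and so on, both corpus-wide and broken down by section. Combining filters this way is one simple reading of "any combination of these".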
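Finding collocates and ordering them by Mutual Information can be sketched as follows. This assumes a plain list of word tokens and a symmetric window of a few words on either side of the node word. It uses plain pointwise MI, log2(f(node, coll) · N / (f(node) · f(coll))). The exact formula used by the corpus interface may differ, for instance by adjusting for window size. The function name and the `min_freq` cut-off are hypothetical.

```python
import math
from collections import Counter


def collocates(tokens, node, window=4, min_freq=2):
    """Return (collocate, co-occurrence count, MI) tuples, highest MI first.

    A co-occurrence is counted whenever a word appears within `window`
    tokens to the left or right of an occurrence of `node`.
    MI is plain pointwise mutual information (log base 2), a simplification.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    co_freq = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        # Scan the window around this occurrence, skipping the node itself.
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                co_freq[tokens[j]] += 1

    results = []
    for coll, f_nc in co_freq.items():
        if f_nc < min_freq:  # MI overrates very rare words, so drop them
            continue
        mi = math.log2(f_nc * n / (word_freq[node] * word_freq[coll]))
        results.append((coll, f_nc, mi))
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

Sorting by raw `f_nc` instead of `mi` would put frequent function words such as "the" at the top. MI instead favors words that occur with the node more often than chance would predict, which is why a minimum frequency is usually applied alongside it.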
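Comparing the collocates of two words (such as rob vs. steal) can be approximated by counting each word's collocates in the same window and taking the ratio of their per-occurrence rates. This is a minimal sketch under stated assumptions, not the interface's documented method. The function names are hypothetical, and the 0.5 added to each count is an arbitrary smoothing choice that avoids division by zero.

```python
from collections import Counter


def window_counts(tokens, node, window=4):
    """Count words within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def compare_collocates(tokens, w1, w2, window=4, min_freq=2):
    """Return (collocate, count with w1, count with w2, ratio) rows.

    Rows are sorted so collocates most typical of w1 come first and those
    most typical of w2 come last.
    """
    n1, n2 = tokens.count(w1), tokens.count(w2)
    if n1 == 0 or n2 == 0:
        raise ValueError("both words must occur in the corpus")
    c1 = window_counts(tokens, w1, window)
    c2 = window_counts(tokens, w2, window)
    rows = []
    for coll in set(c1) | set(c2):
        if c1[coll] + c2[coll] < min_freq:
            continue
        # Per-occurrence rates, smoothed by 0.5 (hypothetical choice).
        r1 = (c1[coll] + 0.5) / n1
        r2 = (c2[coll] + 0.5) / n2
        rows.append((coll, c1[coll], c2[coll], r1 / r2))
    rows.sort(key=lambda r: r[3], reverse=True)
    return rows
```

Collocates with a ratio well above 1 lean toward the first word, and those well below 1 lean toward the second. The same comparison could be run between two sections of the corpus (e.g. genres or time periods) by passing in each section's tokens separately.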