CORPUS.BYU.EDU

seven online corpora | 45 - 425 million words each

These corpora were created by Mark Davies, Professor of Linguistics at Brigham Young University. They have many different uses, including: finding out how native speakers actually speak and write; looking at language variation and change; finding the frequency of words, phrases, and collocates; and designing authentic language teaching materials and resources.

The corpora are used by more than 100,000 people each month (more than 200,000 visits), which makes them perhaps the most widely used corpora currently available. They also serve as the basis for a growing number of publications by researchers throughout the world.

English                                        | # words     | language / dialect | time period | compare to
Corpus of Contemporary American English (COCA) | 425 million | American English   | 1990-2011   | Google, BNC, ANC, BoE
     Note that there are many new resources that are based on COCA, in addition to the regular COCA interface.
Corpus of Historical American English (COHA)   | 400 million | American English   | 1810-2009   | Google Books, small corpora
TIME Magazine Corpus of American English       | 100 million | American English   | 1923-2006   |
BYU-BNC: British National Corpus*              | 100 million | British English    | 1980s-1993  | COCA

N-grams
Google Book (American English) Corpus          | 155 billion | American English   | 1810-2009   | Google Books (Standard)

Other languages
Corpus del Español                             | 100 million | Spanish            | 1200s-1900s | CORDE and CREA
Corpus do Português                            | 45 million  | Portuguese         | 1300s-1900s |

* Our architecture and interface to the BNC from OUP