THE SALAMANCA CORPUS: DIGITAL ARCHIVE OF ENGLISH DIALECT TEXTS

Copyright © 2022, The Salamanca Corpus, Universidad de Salamanca

Last Updated: 3rd September 2022

568 Texts & Counting

Word Count: 14,237,559 Words

The Project

The linguistic history of English dialects still suffers from a considerable lack of diachronic data representative of the period extending from the Early Modern period up to modern times (c.1500-c.1950). Whilst the increasing availability of textual corpora has enabled successful diachronic research into the history of standard English, variation in regional varieties of English remains virtually unexplored, and no diachronic compilations have hitherto been available to fill the lacunae in the field. For this reason, a group of researchers from the University of Salamanca (initially led by Dr. Gudelia Rodríguez Sánchez) has been working over the past few years on a long-term project whose primary aim is to remedy this scarcity of data, so that linguists may be able to sketch the regional setting from a diachronic perspective.

Consisting of documents representative of literary dialects and dialect literature, the Salamanca Corpus has been conceived as an electronic repository of diachronic dialect material which might bridge some of the gaps still existing in the field. It aims to cover a time span of no fewer than four centuries (c.1500-c.1950), presenting documents that record dialect traits from the pre-1974 English counties. Some of the texts supplement the monumental primary sources of the English Dialect Dictionary (1898-1905), thus adding to our understanding of old regional speech. The compilation follows the pluralistic stance recently adopted in diachronic linguistics, seeking to provide a democratic account of non-canonical literatures as well.

The Salamanca Corpus has been made possible thanks to the generous financial support of the Spanish Ministry of Education and Science. Two research grants have so far funded our investigation:

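1. “Variación lingüística en el Inglés Moderno Temprano: Dialectos y sociolectos marginados en el proceso de estandardización” [Linguistic variation in Early Modern English: dialects and sociolects marginalised in the standardisation process] (PB98-0258).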

Period: 30/12/1999-30/12/2002

Main researcher: Dr. Gudelia Rodríguez Sánchez.

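2. “Idiolectos y sociolectos ingleses marginados en el proceso de estandardización desde fines del siglo XVI hasta mediados del siglo XX” [English idiolects and sociolects marginalised in the standardisation process from the late sixteenth century to the mid-twentieth century] (BFF 2003-09376).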

Period: 10/12/2003-09/12/2006

Main researcher: Dr. María F. García-Bermejo Giner.

We are also grateful to the University of Salamanca for permanently hosting this electronic corpus at the University Digital Archive, GREDOS.