David M. Rojas

Indiana University


Considering the Geographical Delineation of Cajun English

String edit distance is a metric based on the number of insertions, deletions, and substitutions required to convert one string into another. The idea of using string edit distance to determine the degree of similarity between two or more linguistic varieties dates from at least as early as the 1970s (Séguy, 1971). Operating on either phone sequences or the phonological features of phone sequences, the techniques have been applied to lexical data from a range of varieties, including Irish dialects (Kessler, 1995) and Dutch dialects (Nerbonne et al., 1996), and the methodology of dialectometry, that is, quantifying the similarity among dialects, has since been extensively refined (e.g. Nerbonne et al., 1999; Nerbonne and Heeringa, 2001). Recently, the procedure has also been carried out on lexical data from the Linguistic Atlas of Middle and South Atlantic States (LAMSAS) (Kleiweg and Nerbonne, 2001). The approach consists of calculating the string edit distances between pairs of phonetically transcribed lexical items, as typically found in linguistic atlases, and then populating a square matrix with the distances derived from all pair-wise item comparisons for all of the localities being considered. The distance matrix is subsequently evaluated with clustering algorithms and visualization tools and compared with traditional accounts by dialectologists and sociolinguists.
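The procedure just described can be illustrated with a minimal sketch, given here only as an illustration and not as the implementation used in the studies cited. It assumes unit costs for insertion, deletion, and substitution, treats each character as one phone, and averages the distances over items that are aligned across localities. The locality labels and transcriptions are invented toy values, and the function names are hypothetical.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences (unit costs assumed)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def distance_matrix(data: Dict[str, List[Sequence[str]]]) -> List[List[float]]:
    """Square matrix of mean per-item edit distances between localities.

    Assumes every locality lists the same items in the same order and
    that there is at least one item. Rows and columns follow the sorted
    locality names.
    """
    names = sorted(data)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            pairs = zip(data[names[i]], data[names[j]])
            dists = [edit_distance(p, q) for p, q in pairs]
            matrix[i][j] = matrix[j][i] = sum(dists) / len(dists)
    return matrix


# Invented toy transcriptions of two items at three hypothetical localities.
toy = {
    "A": ["dis", "tri"],
    "B": ["dIs", "Tri"],
    "C": ["dis", "tree"],
}
toy_matrix = distance_matrix(toy)
# toy_matrix == [[0.0, 1.0, 1.0], [1.0, 0.0, 2.0], [1.0, 2.0, 0.0]]
```

Feature-based variants would replace the 0/1 substitution cost with a graded cost derived from phonological features; that refinement is left out of the sketch.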

In any recent description of language varieties in the U.S. South, Cajun English (CE) is likely to be mentioned. CE has often been associated with Cajun French (CF), in that features representative of the English variety have been seen as reflexes of interference from the French variety. Nevertheless, the phenomenon cannot be considered a simple case of second language interference, since most present speakers of CE do not speak CF at all. Responding to the call for further publicly accessible research on CE using previously collected materials (Eble, 2003), the current paper does not attempt to trace the possible origins of CE directly, or even to examine the features that characterize it, but rather tests hypotheses regarding the extent to which the area where CE is spoken coincides with the borders of cultural, political, or linguistic Acadiana.

Using data from the Dictionary of American Regional English (DARE) and from the Linguistic Atlas of the Gulf States (LAGS), localities will be clustered as described above. Though running sequence comparisons on lexical item transcriptions may alone be insufficient to fully distinguish a CE area from a non-CE area, the readily available and as yet untapped material provides a unique resource awaiting combination with an approach that has been successfully employed in categorizing other dialects. The analyses are expected to yield a distinct and coherent region whose variety differs significantly from the neighboring varieties. The interesting research question, however, is to what extent this region corresponds to the historically French-speaking region. If it is smaller, the reason may be related to leveling pressures that have whittled away the peripheries of the area. If, on the other hand, the distinctive region is larger than Acadiana proper, then an argument could be made supporting the socio-economic importance of the linguistic reinforcement of Cajun cultural identity in the ongoing Cajun Renaissance of southern Louisiana.
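The clustering step can be sketched as follows. The abstract does not name a particular clustering algorithm, so average-linkage agglomerative clustering is assumed here purely for illustration; the function name, locality labels, and distances are hypothetical toy values, not DARE or LAGS data.

```python
from typing import List, Sequence, Tuple

Merge = Tuple[Tuple[str, ...], Tuple[str, ...], float]


def average_linkage(names: Sequence[str],
                    dist: Sequence[Sequence[float]]) -> List[Merge]:
    """Agglomerative clustering with average linkage (an assumed choice).

    Returns the merge history as (cluster_a, cluster_b, distance) triples,
    from the closest pair of localities up to the final merge.
    """
    clusters = [(i,) for i in range(len(names))]
    merges: List[Merge] = []

    def linkage(c1, c2):
        # Mean of all pairwise locality distances between the two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        d, x, y = min((linkage(a, b), x, y)
                      for x, a in enumerate(clusters)
                      for y, b in enumerate(clusters) if x < y)
        a, b = clusters[x], clusters[y]
        merges.append((tuple(names[i] for i in a),
                       tuple(names[i] for i in b), d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(a + b)
    return merges


# Invented distances among four hypothetical localities.
toy_names = ["P", "Q", "R", "S"]
toy_dist = [
    [0, 1, 4, 5],
    [1, 0, 4, 5],
    [4, 4, 0, 2],
    [5, 5, 2, 0],
]
history = average_linkage(toy_names, toy_dist)
# history == [(("P",), ("Q",), 1.0),
#             (("R",), ("S",), 2.0),
#             (("P", "Q"), ("R", "S"), 4.5)]
```

In the toy example the last merge occurs at a much larger distance than the first two, which is the kind of gap that would suggest two distinct groups of localities.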

DARE. Dictionary of American Regional English. (1985– ). Vol. 1 (A-C), Cassidy, Frederic G. (ed.). Vols. 2 (D-H) and 3 (I-O),
    Cassidy, Frederic G., and Joan Houston Hall (eds.). 3 vols. to date. Cambridge: Belknap Press of Harvard UP.
Eble, Connie. (2003). The Englishes of southern Louisiana. In Nagle, Stephen J., and Sara L. Sanders (eds.), English in the
    Southern United States. Cambridge: Cambridge UP.
An Index by region, usage, and etymology to the Dictionary of American Regional English, Volumes I and II. (1993). Publication of
    the American Dialect Society. No. 77. Tuscaloosa: U of Alabama P.
LAGS. Linguistic Atlas of the Gulf States. (1986–92). Pederson, Lee (ed.). 7 vols. Athens: U of Georgia P.
Kessler, Brett. (1995). Computational dialectology in Irish Gaelic. In Proceedings of the European Association for Computational
    Linguistics. 60–67.
Kleiweg, Peter, and John Nerbonne. (2001). Analysis and visualisation of LAMSAS dialects. Manuscript, November 2000–August
    2001. http://odur.let.rug.nl/~kleiweg/indexr.html
Nerbonne, John, and Wilbert Heeringa. (2001). Computational comparison and classification of dialects. Dialectologia et
    Geolinguistica, 9:69–83.
Nerbonne, John, Wilbert Heeringa, Eric van den Hout, Peter van de Kooi, Simone Otten, and Willem van de Vis. (1996). Phonetic
    Distance between Dutch Dialects. In Proceedings of the Sixth Computational Linguistics in the Netherlands (CLIN) Meeting.
Nerbonne, John, Wilbert Heeringa, and Peter Kleiweg. (1999). Edit distance and dialect proximity. In Sankoff, David, and Joseph
    Kruskal (eds.), Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Stanford:
    CSLI. v–xv.
Séguy, Jean. (1971). La relation entre la distance spatiale et la distance lexicale. Revue de Linguistique Romane 35:335–357.