Erik Thomas and Jeffrey Reaser

North Carolina State University

An Experiment on Cues Used for Identification of Voices as African American or European American

Over the past 52 years, at least thirty studies have experimentally investigated the identification of voices as African American or European American. A general finding of these studies is that, in most cases, listeners can identify the ethnicity of speakers, though accuracy rates range from near chance to over 90%, depending on the type of stimuli used and on which listeners serve as subjects. It can thus be said that most listeners can identify the ethnicity of most speakers most of the time. The next step is to determine how listeners make their identifications. Many past investigations of ethnic identification have attempted to address this issue; doing so, however, involves more experimental difficulties than simply determining whether ethnic identification is possible. For the most part, past research on the cues listeners use to make identifications has focused on single cues, such as the quality of certain vowels, fundamental frequency (F0), or intonational patterns, and has been limited to determining whether listeners could access a particular cue rather than comparing the relative importance of various cues.
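
In a two-way identification task (African American vs. European American), "chance" corresponds to 50% correct. One standard way to check whether a listener group's score is reliably above that baseline is an exact one-sided binomial test, sketched below in Python. This is an illustration, not the analysis used in any of the studies summarized here: it assumes independent trials and a 50% guessing rate, and the function name `p_above_chance` is hypothetical.

```python
from math import comb


def p_above_chance(correct, trials, chance=0.5):
    """One-sided exact binomial p-value: the probability of `correct` or more
    hits out of `trials` if every response were a guess that succeeds with
    probability `chance` (0.5 assumed for a two-way forced choice)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    # Sum P(X = k) for k = correct .. trials under the guessing hypothesis
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


if __name__ == "__main__":
    # Hypothetical scores out of 100 trials, not data from any cited study
    for correct in (55, 70, 90):
        print(f"{correct} of 100 correct: p = {p_above_chance(correct, 100):.3g}")
```

A score of 55 out of 100 is not distinguishable from guessing at conventional thresholds, whereas 90 out of 100 is overwhelmingly unlikely under pure guessing; this is the sense in which some reported accuracy rates count as "near chance".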

We designed an experiment to compare the relative usefulness of different cues. The stimuli were samples of running speech, including both read and spontaneous utterances, spoken by African American and European American college students. The read utterances were designed to highlight either particular vowels or particular intonation patterns, so that the results could be compared with the findings of speech production studies. We avoided including ethnically diagnostic lexical, morphosyntactic, and, as far as possible, consonantal variants in the stimuli: middle-class African Americans often avoid these variants, yet such speakers are still usually identifiable. In production, African Americans show, on average, less fronting than European Americans of /o/ (as in coat), /u/ (as in who), and the nucleus of /au/ (as in how), as well as higher /æ/ (as in hat) and /ɛ/ (as in set); they also produce more intonational pitch accents than European Americans. We wanted to determine whether these production trends are reflected in perception. In addition, there may be some differences in voice quality that listeners can access. African American males are reported to have lower average F0 values than European American males (it is unclear whether the same relationship holds for females), and there may be differences in spectral tilt as well.

We presented subjects with the same utterances treated in three different ways: unmodified (after the initial digitization), monotonized, and lowpass filtered at 500 Hz. Monotonization makes F0 constant, eliminating F0-dependent voice quality variation and reducing the amount of intonational information available to listeners. Lowpass filtering at 500 Hz eliminates F2 and higher formants, as well as a good deal of F1 information, making differences in vowel quality virtually unrecognizable and rendering variations in spectral tilt essentially indistinguishable. By comparing responses to different utterances and different treatments, we were able to compare the degree to which listeners relied on different cues.
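
The effect of the 500 Hz lowpass treatment on formant information can be illustrated numerically. The Python sketch below is not the authors' procedure, since the section does not say what software or filter design was used. It assumes a Hamming-windowed-sinc FIR filter with 201 taps, a 16 kHz sampling rate, and pure tones at 300 Hz and 1500 Hz standing in for roughly F1-range and F2-range energy; all function names are hypothetical.

```python
import math


def lowpass_fir(cutoff_hz, sample_rate, num_taps=201):
    """Hamming-windowed-sinc lowpass FIR coefficients, normalized to unity gain at 0 Hz."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the filter has a center tap")
    fc = cutoff_hz / sample_rate  # cutoff in cycles per sample
    mid = num_taps // 2
    taps = []
    for n in range(num_taps):
        k = n - mid
        # Ideal (brick-wall) lowpass impulse response sampled at offset k
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        # Hamming window tapers the truncated response to limit stopband leakage
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def apply_fir(signal, taps):
    """Filter `signal` with the kernel centered, so the output stays time-aligned."""
    mid = len(taps) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, t in enumerate(taps):
            idx = i + mid - j
            if 0 <= idx < n:  # samples outside the signal are treated as zero
                acc += t * signal[idx]
        out.append(acc)
    return out


def rms(values):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def tone_gain(freq_hz, taps, sample_rate, duration_s=0.1):
    """Output/input RMS ratio for a pure tone, ignoring edge transients."""
    count = int(sample_rate * duration_s)
    x = [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(count)]
    y = apply_fir(x, taps)
    edge = len(taps)  # skip samples where the kernel overhangs the signal edges
    return rms(y[edge:-edge]) / rms(x[edge:-edge])


if __name__ == "__main__":
    SR = 16000  # assumed sampling rate in Hz
    taps = lowpass_fir(500, SR)
    for f in (300, 1500):  # stand-ins for F1-range and F2-range energy
        print(f"{f} Hz tone: gain {tone_gain(f, taps, SR):.4f}")
```

The 300 Hz tone passes with a gain close to 1, while the 1500 Hz tone is reduced to a small fraction of a percent of its input level, which is why F2-based vowel distinctions disappear. With this many taps the transition band is a few hundred hertz wide, so energy much above about 650 Hz, such as the high F1 of low vowels, is also strongly attenuated.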