speech intelligibility. Taken together, there is a growing body of research showing that the linguistic content of an utterance and the indexical, paralinguistic information, such as talker- and instance-specific characteristics, are not only simultaneously conveyed by the acoustic signal, but also are not dissociated, or normalized away, during speech perception (Ladefoged and Broadbent, 1957; Laver and Trudgill, 1979). Similarly, studies of within-talker variability in speech production have shown that talkers systematically alter their speech patterns in response to particular communicative requirements in ways that have substantial effects on the overall intelligibility of an utterance. For example, in a series of studies on speech directed towards the hard of hearing, Picheny et al. (1985, 1986, 1989) and Uchanski et al. (1996) found systematic, acoustic-phonetic differences between clear and conversational speech within individual talkers. Clear speech had consistently higher intelligibility scores, and was found to be slower and to exhibit fewer phonological reduction
A.R. Bradlow et al. / Speech Communication 20 (1996) 255-272
phenomena than conversational speech. Lindblom (1990) and Moon and Lindblom (1994) showed that talkers adapt their speech patterns to both production-oriented and listener-oriented factors as demanded by the specific communicative situation. For example, formant frequencies of vowels embedded in words spoken in clear speech exhibited less contextually conditioned undershoot than those embedded in words spoken in citation form. Recently, Bond and Moore (1994) investigated whether the acoustic-phonetic characteristics that apparently distinguish clear versus conversational speaking styles within a talker also distinguish the speech across talkers who differ in overall intelligibility. Indeed, in a comparison of the acoustic-phonetic characteristics of the speech of a relatively high intelligibility talker and two talkers with relatively low intelligibility, Bond and Moore found that inadvertently clear speech shared many of the acoustic-phonetic characteristics of intentionally clear speech. Finally, Keating et al. (1994) and Byrd (1994) investigated inter-talker variability in pronunciation of American English from tokens in the TIMIT database of American English dialects (Lamel et al., 1986; Pallett, 1990; Zue et al., 1990). Both of these studies revealed the broad range of pronunciation characteristics in American English, and pointed out how paralinguistic factors, such as the talker's gender, dialect and age, in addition to linguistic factors, such as phonetic context, contribute to the observed pronunciation variability. However, since the TIMIT database does not include perceptual data, neither of these studies could make any inferences regarding the effects of these inter-talker differences on overall speech intelligibility.
The goal of the present study was to extend our understanding of the talker-specific characteristics that lead to variability in speech intelligibility by investigating the acoustic correlates of different talkers' productions in a large database that includes both sentence productions from multiple talkers and intelligibility data from multiple listeners per talker (Karl and Pisoni, 1994). The basic question we asked was: What acoustic characteristics make some talkers more intelligible than others? By directly assessing talker-specific correlates of speech intelligibility at the acoustic-phonetic level, this investigation aimed to extend our understanding of the relationship between the indexical and linguistic aspects of speech communication: we hoped to identify some of the aspects of talker variability that might, on the one hand, be expected to help identify a particular talker, and on the other hand, have a direct effect on overall speech intelligibility. We acknowledge that it is misleading to ascribe all of the variability in sentence intelligibility to acoustic-phonetic characteristics of the talker. Such an approach incorrectly disregards any listener-talker-sentence interactions that affect the resultant intelligibility scores. Nevertheless, while keeping in mind the contribution of listener- and sentence-related factors to overall intelligibility, we were interested in investigating what talker-related characteristics, independently of the listener- and sentence-related characteristics, might correlate with overall intelligibility, and therefore might account for some portion of the observed variability in overall intelligibility. We hoped that the results of this investigation, combining acoustic-phonetic measurements with perceptual data, might lead to a better understanding of the salient acoustic-phonetic characteristics that listeners respond to during speech perception, and would therefore help to differentiate highly intelligible speech from less intelligible speech. We adopted an approach that focused on two aspects of talker-specific characteristics. First, we focused on global talker characteristics, such as gender, fundamental frequency and rate of speech. These characteristics are global because they extend over the entire set of utterances from a given talker, rather than being confined to local aspects of the speech signal that are related to the articulation of individual segments. Second, we focused on specific pronunciation characteristics, such as vowel category realization and segmental timing relations, which are fine-grained, acoustic-phonetic indicators of instance-specific variability.
Whereas the global characteristics provide information about some of the invariant speech attributes of the individual talkers, the fine-grained acoustic-phonetic details at the local, segmental level provide information about the instance-specific pronunciation characteristics of particular utterances. We expected that a wide range of these talker-related characteristics would contribute to variability in overall intelligibility, and we hoped that this approach would provide a better understanding of some of the talker- and instance-specific factors that are associated with highly intelligible normal speech.
2. The Indiana Multi-Talker Sentence Database
The materials for this study came from the Indiana Multi-Talker Sentence Database (Karl and Pisoni, 1994). This database consists of 100 Harvard sentences (IEEE, 1969) produced by 20 talkers (10 males and 10 females) of General American English ³. The sentences are all mono-clausal and contain 5 keywords plus any number of additional function words. None of the talkers had any known speech or hearing impairments at the time of recording, and all recordings were live-monitored for gross misarticulations, hesitations, and other disfluencies. (See Table 2 for examples of the sentences.) The sentences were presented to the subjects on a CRT monitor in a sound-attenuated booth (IAC 401A). The stimuli were transduced with a Shure (SM98) microphone, and digitized on-line (16-bit analog-to-digital converter (DSC Model 240) at a 20 kHz sampling rate). The average root mean square amplitude of each of the digital speech files was then equated with a signal processing software package (Luce and Carrell, 1981), and the files were converted to 12-bit resolution for later presentation to listeners in a transcription task using a PDP-11/34 computer. Along with the audio recordings, this database also includes speech intelligibility data in the form of sentence transcriptions by 10 listeners per talker, for a total of 200 listeners. In collecting these transcriptions, each group of 10 listeners heard the full set of 100 sentences produced by a single talker. The sentence stimuli were low-pass filtered at 10 kHz, and presented binaurally over matched and calibrated TDH-39 headphones using a 12-bit digital-to-analog converter. The listeners heard each sentence in the clear (no noise was added) at a comfortable listening
³ Copies of the Indiana Multi-Talker Sentence Database can be obtained in CD-ROM form for a nominal cost for media and postage. Please write to the authors at Speech Research Laboratory, Department of Psychology, Indiana University, Bloomington, IN 47405, USA, or e-mail, firstname.lastname@example.org.
level (75 dB SPL), and then typed what they heard at a computer keyboard. A PDP-11/34 computer was used to control the entire experimental procedure in real-time. The listeners were all native speakers of American English, who were students at Indiana University. They had no speech or hearing impairments at the time of testing. The sentence transcriptions were scored by a keyword criterion that counted a sentence as correctly transcribed if, and only if, all 5 keywords were correctly transcribed. Any error on a keyword resulted in the sentence being counted as mistranscribed. With this strict scoring method, each sentence for each talker received an intelligibility score out of a possible 10. Each talker's overall intelligibility score was then calculated as the average score across all 100 sentences. As shown in Table 1, the overall sentence intelligibility scores ranged from 81.1% to 93.4% correct transcription, with a mean and standard deviation of 87.8% and 3.1%, respectively. Thus, the materials in this large multi-talker sentence database showed considerable variation and covered a range of talker intelligibility that could be used as the basis for an investigation of the effects of global and fine-grained acoustic-phonetic talker characteristics on overall speech intelligibility. It is important to note here that intelligibility scores must be interpreted in a relative sense. For example, Hirsh et al. (1954) observed that authors on this subject almost always caution readers "to regard such scores as specific to a given crew of talkers and a given crew of listeners". In the present study, we were specifically interested in exploring the individual characteristics of our crew of talkers; however, our database was constructed in such a way that it did not provide the means of systematically investigating the contribution of the crew of talkers independently of the crew of listeners.
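The strict keyword-based scoring just described can be sketched in code. This is our own illustrative reconstruction, not the original scoring software; the transcription matching below is a simplification (lowercased whole-word comparison, ignoring punctuation and any scoring details the paper does not specify):

```python
def sentence_correct(transcription, keywords):
    """A sentence counts as correctly transcribed only if every one of
    the 5 keywords appears in the listener's transcription (strict
    criterion); any keyword error fails the whole sentence."""
    words = transcription.lower().split()
    return all(kw.lower() in words for kw in keywords)

def sentence_score(transcriptions, keywords):
    """Score for one sentence from one talker: the number of the 10
    listeners whose transcription contained all keywords (out of 10)."""
    return sum(sentence_correct(t, keywords) for t in transcriptions)

def overall_intelligibility(per_sentence_scores, listeners_per_talker=10):
    """A talker's overall score: the average across sentences,
    expressed as percent correct transcription."""
    n = len(per_sentence_scores)
    return 100.0 * sum(per_sentence_scores) / (n * listeners_per_talker)

# Hypothetical example: 6 of 10 listeners preserve the word-final /d/.
score = sentence_score(
    ["the walled town was seized without a fight"] * 6
    + ["the wall town was seized without a fight"] * 4,
    ["walled", "town", "seized", "without", "fight"])
```

With this sketch, a talker whose two sentences scored 6 and 10 would receive an overall intelligibility of 80.0%.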
This is because for each talker, a different group of 10 listeners, drawn from the same population, transcribed the recordings of the full set of 100 sentences. Therefore, the intelligibility scores for the 20 talkers shown in Table 1, as well as the talker-related correlates of intelligibility that we discuss below, should, strictly speaking, be regarded as reflecting characteristics of the particular talker-listener situation, rather than of the talker independently of the listeners.
The final global talker characteristic that we investigated was overall speaking rate. Although speaking rate is not a source-related, voice-quality characteristic, it is one of the most salient global talker-specific characteristics, and one that is known to distinguish clear versus conversational speech within individuals (Picheny et al., 1989; Krause and Braida, 1995; Uchanski et al., 1996). Additionally, many phonological reduction phenomena are directly related to changes in speaking rate. In Byrd's analyses of the TIMIT database, which included sentences from 630 talkers, she found that across all dialects, the males had significantly faster speaking rates than the females on the two calibration sentences that were read by all talkers. However, Byrd's study also found an interaction of gender and dialect region such that the slowest speaking region for the male speakers (the South Midland)
was only the fourth slowest for the female speakers. Bond and Moore (1994) found no word duration differences in their analyses of two talkers that differed in overall intelligibility when the words were embedded in sentences, although for isolated words the less intelligible talker had shorter durations than the more intelligible talker. Furthermore, in a recent study of the effects of speaking rate on the intelligibility of clear and conversational speaking modes, Krause and Braida (1995) reported that trained talkers were able to achieve an intelligibility advantage for the clear speech mode even at faster speaking rates. In other words, it is possible to produce fast clear speech. Thus, although there is some evidence that overall speaking rate varies with paralinguistic (indexical) characteristics such as the speaker's gender and dialect, and that speaking rate can be associated with a reduced, conversational speaking style, a direct link between speaking rate and intelligibility remains unclear. In our database, we measured overall speaking rate for each of the 20 talkers as the mean sentence duration across all 100 sentences. All duration measurements were made using the Entropic WAVES+ software on a SUN workstation. The questions we asked here were: (1) Does overall speaking rate correlate with overall speech intelligibility across all 20 talkers? and (2) Can the gender-based intelligibility difference be traced to a gender-based difference in overall speaking rate? As shown in Table 1, we observed considerable variability across all 20 talkers in mean sentence duration (mean sentence duration = 2.115 seconds, with a standard deviation of 0.276 seconds).
However, we failed to find a clear relationship between mean sentence duration and overall speech intelligibility scores: there was no correlation between speaking rate and speech intelligibility across all 20 talkers, and there was no significant difference in the means between the male and female speaking rates. Thus, in our multi-talker sentence database, overall speaking rate as measured by mean sentence duration did not appear to be a talker-related correlate of variability in speech intelligibility. This result is consistent with the recent finding of Krause and Braida (1995) that fast speech can also be clear speech, and with Bond and Moore (1994) who found no difference in duration for words in sentences
spoken by a high and a low intelligibility talker. Furthermore, even though the present data do not show a difference between male and female speaking rates as reported by Byrd (1994), it is likely that these data reflect the interaction between speaker gender and dialect region that she found in the TIMIT database: most of our speakers and listeners were from the South Midland region, which Byrd found to have the slowest speaking rate for males but an average speaking rate for females. Nevertheless, it remains possible that a measure of speaking rate that took into account the number and duration of any pauses that the talker may have inserted into the sentence, rather than simply averaging the pauses into the overall sentence durations, would correlate better with overall intelligibility. This possibility is supported by the findings of Picheny et al. (1986, 1989) and Uchanski et al. (1996), who reported that, within individual talkers, clear speech contains more numerous and longer pauses than conversational speech.
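The rank order correlations reported throughout this paper (Spearman ρ) can be computed from first principles as Pearson's r on the ranks of the two variables. The sketch below is a minimal pure-Python implementation; any duration or intelligibility values used with it are invented for illustration, not data from the database:

```python
def ranks(values):
    """Rank values from 1..n, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0        # average rank for a tied block
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rho(x, y):
    """Spearman rank-order correlation: Pearson's r computed on ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

For example, mean sentence durations that increase exactly as intelligibility scores decrease yield ρ = -1.0, whereas the paper's null result for speaking rate corresponds to ρ near zero.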
Table 2
Subset of 18 sentences containing the words with the target vowels from which the vowel space measurements were taken, with the IPA phonemic transcription for the target word. All 5 keywords are italicized, with the word with the target vowel in boldface. Asterisks mark the three sentences whose productions by Talkers F7 and M2 can be heard from the Elsevier web site (http://www.elsevier.nl/locate/specom)

/i/:
1. It's easy to tell the depth of a well.
2. The fruit peel was cut in thick slices.
3. Adding fast leads to wrong sums.
4. This is a grand season for hikes on the road.
5. The walled town was seized without a fight.
6. The meal was cooked before the bell rang.
/a/:
7.* A pot of tea helps to pass the evening.
8. A rod is used to catch pink salmon.
9. The wide road shimmered in the hot sun.
10. The show was a flop from the very start.
11. The hogs were fed chopped corn and garbage.
12. A large size in stockings is hard to sell.
[The column of IPA phonemic transcriptions for the target words was garbled in extraction and is not reliably recoverable.]
/o/:
13.* The horn of the car woke the sleeping cop.
14. Bail the boat to stop it from sinking.
15. Mend the coat before you go out.
16. Hoist the load to your left shoulder.
17. The dune rose from the edge of the water.
18. The young girl gave no clear response.

4. Fine-grained acoustic-phonetic talker characteristics
4.1. Vowel space characteristics

We began our investigation of fine-grained acoustic-phonetic talker characteristics with an examination of vowel spaces. Vowel centralization is a typical feature of casual, or reduced, speech (Picheny et al., 1986; Lindblom, 1990; Moon and Lindblom, 1994; Byrd, 1994). Additionally, vowel space expansion has been shown to correlate with speech intelligibility. For example, Bond and Moore (1994) found more peripheral vowel category locations in an F1 by F2 space for a higher-intelligibility talker relative to a lower-intelligibility talker. In a study of vowel production by deaf adolescents, Monsen (1976) found a significant positive correlation between range in F2 and intelligibility. Both of these studies lead us to hypothesize that in our multi-talker sentence database we would find a positive correlation between overall intelligibility and measures of vowel space expansion. Specifically, we predicted that relatively expanded vowel spaces would be associated with enhanced speech intelligibility scores. In order to measure each talker's vowel space, we selected six occurrences of the three peripheral vowels, /i,a,o/, from the sentence materials in the database. (The point vowel /u/ was avoided due to excessive allophonic variation for this vowel in General American English.) All of the words containing the target vowels were content words, and none was the final keyword in the sentence. Table 2 lists the subset of 18 sentences containing the words with the target vowels from which the vowel space measurements were taken. The first and second formants were measured from each of the 18 target vowels as produced by each of the 20 talkers. All formant measurements were made using the Entropic WAVES+ software package on a SUN workstation. Both LPC spectra (calculated from a 25 ms Hanning window) and spectrograms were used to determine the location of the first two formant frequencies at the vowel steady-state.
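The formant-measurement step can be illustrated with a minimal autocorrelation-method LPC, sketched below. This is our own simplified stand-in for the Entropic WAVES+ analysis (no Hanning windowing or spectrogram cross-checking), and the two-resonance synthetic signal is purely illustrative:

```python
import numpy as np

def lpc_coefficients(x, order):
    """Autocorrelation-method LPC: solve the Yule-Walker equations for
    the prediction polynomial A(z) = 1 - sum_k a_k z^{-k}."""
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:order + 1])
    return np.concatenate(([1.0], -a))

def lpc_formants(x, order, fs):
    """Estimate resonance (formant) frequencies as the angles of the
    complex roots of the LPC polynomial, converted to Hz."""
    roots = np.roots(lpc_coefficients(x, order))
    roots = roots[np.imag(roots) > 1e-9]   # keep one root per conjugate pair
    freqs = np.angle(roots) * fs / (2.0 * np.pi)
    return np.sort(freqs)

# Synthetic check: an all-pole filter with resonances at 500 and 1500 Hz
# (arbitrary illustrative frequencies, not measurements from the paper).
fs = 10000
poles = [0.97 * np.exp(2j * np.pi * f / fs) for f in (500.0, 1500.0)]
a_true = np.real(np.poly(poles + [p.conjugate() for p in poles]))
x = np.zeros(2048)
x[0] = 1.0                                  # impulse response of the filter
for n in range(1, len(x)):
    for k in range(1, min(n, 4) + 1):
        x[n] -= a_true[k] * x[n - k]
f1, f2 = lpc_formants(x, 4, fs)             # close to 500 and 1500 Hz
```

On real speech one would apply this to a short windowed frame at the vowel steady-state, with a higher LPC order and checks against the spectrogram, as the paper describes.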
These F1 and F2 measurements were then converted to the perceptually motivated mel
Table 3
Intelligibility scores across the 18 sentences used for the vowel space measurements, vowel space area, vowel space dispersion, F1 range, F2 range, within-category clustering, vowel space dispersion/within-category clustering, and F2-F1 distance for /i/ and /a/ for each individual talker. Asterisks mark the two talkers (F7 and M2) whose vowel space measurements are shown in Fig. 2, and whose speech samples can be heard from the Elsevier web site (http://www.elsevier.nl/locate/specom)

Intelligibility (18 sentences): F1 93.3, F2 92.8, F3 91.1, F4 85.0, F5 91.1, F6 91.7, F7* 92.8, F8 92.8, F9 88.3, F10 90.6; M1 91.1, M2* 78.3, M3 82.2, M4 81.5, M5 85.0, M6 86.7, M7 88.3, M8 90.6, M9 86.7, M10 85.0.
[The remaining columns of per-talker values were garbled in extraction and are not reliably recoverable.]
scale (Fant, 1973). (The exact equation for converting frequencies from Hertz to mels is M = (1000/log 2) log((F/1000) + 1), where M and F are the frequencies in mels and Hertz, respectively.) Each talker's vowel space was then represented by the locations of the 18 individual vowel tokens in an F1 by F2 space. In all of the following analyses of the relations between vowel space characteristics and speech intelligibility, we used each talker's average intelligibility score across the 18 sentences (given in Table 2) that formed the subset of sentences with the words that contained the target vowels (see Table 3). Across all 20 talkers, the overall intelligibility scores for the total set of 100 sentences and for the subset of 18 sentences were significantly correlated (Spearman ρ = +0.629, p = 0.006); thus this subset of 18 sentences was a good indicator of the talkers' overall intelligibility scores. The first measure that we used to assess the
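The Hertz-to-mel conversion can be written directly from the equation above; the function name is ours:

```python
import math

def hz_to_mel(f_hz):
    """Fant (1973) mel conversion: M = (1000 / log 2) * log(1 + F/1000).
    The base of the logarithm cancels, so natural log is used here."""
    return (1000.0 / math.log(2.0)) * math.log(1.0 + f_hz / 1000.0)
```

A convenient sanity check is that 1000 Hz maps to exactly 1000 mels (the log-2 factors cancel), while 500 Hz maps to roughly 585 mels.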
relationship between vowel space and overall speech intelligibility was the Euclidean area covered by the triangle defined by the mean of each vowel category. Here we hypothesized that the greater the triangular area, the higher the overall intelligibility. Fig. 2(a) shows the vowel triangles for the highest intelligibility talker (Talker F7) and the lowest intelligibility talker (Talker M2). Sample sentences that provide an indication of these two talkers' vowel spaces can be heard in Signals A-F ⁵ (three sentences for each talker). It is clear from Fig. 2(a) that the vowel triangle for Talker F7 covers a greater area within this space than the vowel triangle for Talker M2. However, across all 20 talkers we failed to find a positive correlation between triangular vowel space
⁵ The texts corresponding to these signals are given in Appendix A.
area and speech intelligibility scores (see Table 3 for each individual talker's vowel space area). One problem with triangular vowel space area as a measure of
vowel category differentiation is that the points used to calculate this measure are the category averages, and these may not be representative of the individual
Fig. 2. Vowel space characteristics for a high-intelligibility talker (Talker F7) and a low-intelligibility talker (Talker M2): (a) vowel space area, (b) vowel space dispersion, (c) range in F1 and F2.
vowel tokens actually produced by the talker. For this reason, we devised a different measure of vowel space expansion that took into account the specific location of each individual vowel token, and then reanalyzed the data. Fig. 2(b) shows each vowel token's distance from a central point in the talker's vowel space for the highest intelligibility talker (Talker F7) and the lowest intelligibility talker (Talker M2). A measure of each talker's vowel space dispersion was calculated as the mean of these distances for each talker. This measure thus provided an indication of the overall expansion, or compactness, of the set of individual vowel tokens from each talker (see Table 3 for each individual talker's vowel space dispersion measure). The measures of vowel space area and vowel space dispersion were highly correlated (Spearman ρ = +0.782, p < 0.001); however, the correlation was not perfect, indicating that each measure captures a slightly different aspect of the talker's vowel production characteristics. With respect to the correlation between vowel space dispersion and intelligibility, we found a moderate, positive rank order correlation (Spearman ρ = +0.431, p = 0.060) across all 20 talkers, and this correlation increased when only the 10 highest intelligibility talkers were included in the analysis (Spearman ρ = +0.698, p = 0.036). Thus, using a measure of vowel space dispersion, the data showed that higher overall speech intelligibility is associated with a more expanded vowel space, particularly for the talkers in the top half of the distribution of intelligibility scores. Based on the finding that overall vowel space dispersion and speech intelligibility were correlated, we then investigated which of the two dimensions, F1 or F2, in the vowel space representations was more responsible for this correlation.
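The two vowel-space measures used so far, triangular area from the category means and dispersion as the mean token distance from a central point, can be sketched as follows. Any (F1, F2) coordinates used with these functions are invented values in mels, not measurements from the database:

```python
def centroid(points):
    """Mean (F1, F2) point of a set of vowel tokens."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def triangle_area(p_i, p_a, p_o):
    """Euclidean area of the vowel triangle defined by the /i/, /a/ and
    /o/ category means in the F1 by F2 plane (shoelace formula)."""
    (x1, y1), (x2, y2), (x3, y3) = p_i, p_a, p_o
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0

def dispersion(tokens):
    """Vowel space dispersion: mean Euclidean distance of every vowel
    token from the centre of the talker's vowel space."""
    cx, cy = centroid(tokens)
    return sum(((x - cx) ** 2 + (y - cy) ** 2) ** 0.5
               for x, y in tokens) / len(tokens)
```

Unlike the triangle area, which sees only the three category means, the dispersion measure is sensitive to where each of the 18 individual tokens falls, which is exactly the distinction the text draws.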
In his study of the vowel productions of deaf adolescents, Monsen (1976) found a stronger positive correlation between range in F2 and intelligibility (r = +0.74) than he did for range in F1 and intelligibility (r = +0.45). As Monsen notes, these correlations do not suggest that range in F2 is more important for normal speech intelligibility than range in F1; rather, these correlations arise from the fact that the vowels of these deaf subjects occupy a more normal range in F1 than in F2. For the purposes of our investigation of variability in normal speech, Monsen's finding simply indicated the usefulness of investigating range in F1 and F2 as separate dimensions that might correlate with overall intelligibility. Accordingly, we measured each talker's range in F1 and F2 as the difference between the maximum and minimum values on each of these dimensions. Fig. 2(c) shows the F1 and F2 range measurements for the highest intelligibility talker (Talker F7) and the lowest intelligibility talker (Talker M2). (See Table 3 for each individual talker's range in F1 and F2.) Across all 20 talkers, we found a significant positive rank order correlation between range in F1 and intelligibility (Spearman ρ = +0.531, p = 0.020), but we failed to find a significant rank order correlation between range in F2 and intelligibility (Spearman ρ = +0.239, p = 0.300). This correlation of F1 range and intelligibility was strengthened when only the top 10 talkers were included in the analysis (Spearman ρ = +0.817, p = 0.014). Thus, it appears that the area covered in F1 was a better correlate of overall intelligibility than the area covered in F2. This finding is not surprising in view of the fact that the English vowel system has several vowel height distinctions (of which F1 frequency is an important acoustic correlate), whereas there are many fewer distinctions along the front-back dimension (of which F2 frequency is the primary acoustic correlate). It may be that in order for the numerous English vowels to be well distinguished, a wide range in F1 (vowel height) is advantageous, whereas less precision can be more easily tolerated in the F2 (front-back) dimension. The vowel space measures that we have reported so far have established relations between relative vowel space expansion and overall speech intelligibility, particularly for talkers in the top half of the intelligibility score distribution. An additional measure of vowel articulation that might be expected to correlate with intelligibility is the relative compactness of individual vowel categories.
We might expect that more tightly clustered categories enhance intelligibility, since they are less likely to lead to inter-category confusion. As a measure of tightness of within-category clustering, we first calculated the mean of the distances of each individual token from the category mean, as we did for our measure of overall vowel space dispersion. Then a single measure for each talker was calculated as the mean
within-category dispersion across all three vowel categories (see Table 3 for these values for each talker). However, analysis of the results showed that across all 20 talkers, as well as for only the 10 highest intelligibility talkers, there was no correlation between within-category dispersion and intelligibility. Thus, tightness of within-category clustering per se was not a good correlate of overall intelligibility. We then explored the possibility that a combined measure of within- and between-category dispersion might correlate with intelligibility better than each measure independently. We hypothesized that talkers can compensate for a less dispersed overall vowel space by having more tightly clustered individual vowel categories. In order to test this hypothesis we calculated a dispersion index from each talker's overall vowel space dispersion divided by the mean within-category clustering (see Table 3). We expected that a greater dispersion index would indicate better differentiated vowel categories relative to the overall vowel space area, and would therefore correlate positively with overall intelligibility. Across all 20 talkers, the dispersion index did not correlate with intelligibility; but, for the 10 highest intelligibility talkers there was a significant positive correlation (Spearman ρ = +0.654, p = 0.049). However, this correlation is comparable to the correlation between overall vowel space dispersion and intelligibility independently of within-category clustering (Spearman ρ = +0.698, p = 0.036), suggesting that overall vowel space expansion on its own, rather than relative to within-category compactness, is associated with increased speech intelligibility. The final measure of vowel space that we examined as a possible correlate of speech intelligibility was the acoustic-phonetic implementation of the point vowels /i/ and /a/. Each of these two vowel categories defines an extreme point in the general American English vowel space.
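The within-category clustering measure and the dispersion index described above can be sketched in the same way as the earlier dispersion measure; any coordinates used with these functions are invented illustrative values:

```python
def mean_distance_from_mean(points):
    """Mean Euclidean distance of a set of (F1, F2) points from their
    own mean point."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    return sum(((x - mx) ** 2 + (y - my) ** 2) ** 0.5 for x, y in points) / n

def within_category_clustering(categories):
    """Mean within-category dispersion across the vowel categories:
    smaller values mean more tightly clustered categories."""
    vals = [mean_distance_from_mean(tokens) for tokens in categories.values()]
    return sum(vals) / len(vals)

def dispersion_index(all_tokens, categories):
    """Overall vowel space dispersion divided by within-category
    clustering: larger values suggest better-separated categories."""
    return mean_distance_from_mean(all_tokens) / within_category_clustering(categories)
```

In this sketch, two tight but widely separated categories yield a large dispersion index, which is the compensation pattern the hypothesis describes.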
In the acoustic domain, they each display extreme F2-F1 distances: /i/ is characterized by a wide separation between the first two formant frequencies, whereas /a/ is characterized by very close F1 and F2 frequencies. Thus, the F2-F1 distance for these point vowels provided an indication of the extreme locations in the F1 by F2 space for these vowels (Gerstman, 1968). Accordingly, we hypothesized that the F2-F1 distance for /i/ would be positively correlated with overall intelligibility, and that the F2-F1 distance for /a/ would be negatively correlated with overall intelligibility. Indeed, across all 20 talkers, we found a positive rank order correlation between F2-F1 distance for /i/ and overall intelligibility (Spearman ρ = +0.601, p = 0.009), and a negative rank order correlation between F2-F1 distance for /a/ and overall intelligibility (Spearman ρ = -0.509, p = 0.027). (See Table 3 for these values for each talker.) When only the 10 highest intelligibility talkers were included in the analysis, these correlations were strengthened further (Spearman ρ = +0.866, p = 0.009 and Spearman ρ = -0.673, p = 0.043, for /i/ and /a/, respectively). Thus, relatively high overall speech intelligibility is associated with more extreme vowels as measured by the precision of individual vowel category realization, as well as by overall vowel space expansion for a given talker. In summary, the general pattern that emerged from these measures of the acoustic-phonetic vowel characteristics as correlates of overall intelligibility was that talkers with more reduced vowel spaces tended to have lower overall speech intelligibility scores. The measures of vowel space reduction that were shown to correlate with overall speech intelligibility were overall vowel space dispersion, particularly the range covered in the F1 dimension, and the extreme locations in the F1 by F2 space of the point vowels /i/ and /a/ as measured by F2-F1 distance. The analyses also showed that the correlations between vowel space reduction and overall intelligibility were stronger for talkers in the top half of the distribution of intelligibility scores, suggesting a greater degree of variability for talkers with lower intelligibility scores that is not accounted for by these measures of a talker's vowel space.

4.2. Acoustic-phonetic correlates of consistent listener errors
Another strategy we used for investigating the correlation between fine-grained acoustic-phonetic characteristics of a talker's speech and overall intelligibility involved analyses of the specific portions of sentences that showed consistent listener transcription errors. With this approach we hoped to identify specific pronunciation patterns that resulted in the observed listener errors. These analyses differed from
Fig. 3. Waveforms of the sentence portion, walled town, as produced by Talker M1, who had a relatively long duration of voicing during the stop closure, and Talker M9, who had a very short duration of voicing during the stop closure.
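The timing analysis that follows rests on decomposing the interval shown in Fig. 3 into subintervals taken from hand-labeled boundary times. The sketch below illustrates that decomposition; the function name, field names, and boundary times are all hypothetical, not measurements from this study.

```python
# Decompose the vowel-to-vowel /d/-/t/ interval of "walled town" into the
# subintervals analyzed in the text. All boundary times (in seconds) and
# names are hypothetical illustrations, not the paper's measurements.

def dt_durations(al_offset, voicing_end, closure_end, au_onset):
    """Split the interval between the /al/ of "wall" and the /au/ of "town".

    al_offset   : end of the preceding vowel-sonorant sequence (amplitude drop)
    voicing_end : end of low-frequency voicing during the stop closure
    closure_end : stop release (end of the silent closure portion)
    au_onset    : onset of periodicity for the following vowel
    """
    return {
        "vowel_to_vowel": au_onset - al_offset,
        "closure": closure_end - al_offset,
        "voicing_in_closure": voicing_end - al_offset,
        "silent_closure": closure_end - voicing_end,
        "release": au_onset - closure_end,
    }

d = dt_durations(al_offset=0.412, voicing_end=0.455,
                 closure_end=0.520, au_onset=0.548)
```

Correlating a subinterval such as `voicing_in_closure` across talkers with their /d/ detection rates gives the shape of the analysis described in the text.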
the methods used in the analysis of vowel spaces because here we focused on specific cases where there were known listener errors, rather than on more general statistical indicators of overall phonetic reduction. In particular, in our database we found two specific cases of consistent listener error that revealed the importance of highly precise inter-segmental timing for speech intelligibility (see also Neel, 1995).

4.2.1. Segment deletion

The first case of consistent listener error occurred in the sentence The walled town was seized without a fight. The overall intelligibility of this sentence across all 20 talkers was 60% correct, with 94% of the listener transcription errors occurring for the phrase walled town. Of the listener errors on this portion of the sentence, 82% involved omitting the word-final /d/ in walled. (None of the remaining 18% of the errors involved omitting the word-initial /t/ in town.) In order to determine what specific talker-related acoustic characteristics might lead to this common listener error, we measured the durations of various portions of the acoustic waveform from this phrase, and then correlated these measurements with the rate of /d/ detection for each talker. We began by measuring the total vowel-to-vowel
duration, that is, the portion of the waveform that corresponds to the talker's /dt/ articulation between the /al/ of wall and the /au/ of town. This portion of the acoustic signal was measured from the point at which there was a marked decrease in amplitude and change in waveform shape as the preceding vowel-sonorant sequence (the /al/ from wall) ended, until the onset of periodicity for the following vowel (the /au/ from town). In almost all cases, this portion consisted of a single closure portion and a single release portion: most talkers (18/20) did not release the /d/ and then form a second closure for the /t/. Fig. 3 shows waveforms of this portion of the sentence for two talkers, with vertical cursors demarcating the salient acoustic boundaries. These sentences can be heard in Signal 3a (Talker M1) and Signal 3b (Talker M9). Across the group of 20 talkers, we found a significant positive rank order correlation between the vowel-to-vowel duration and the rate of /d/ detection (Spearman ρ = +0.713, p = 0.002).6 Based on this
6. Note that the correlations reported here differ slightly from those reported in (Bradlow et al., 1995). This minor difference is due to the addition of one more listener's data into the present analysis: the earlier report was based on only 199 (instead of 200) listeners' data.
finding, we then looked at the rate of /d/ detection in relation to the separate durations of the closure portion and of the release portion, which together comprised the vowel-to-vowel portion. Here we found a significant positive correlation with the closure duration (Spearman ρ = +0.641, p = 0.005) but no correlation with the release duration. The closure portion generally consisted of a period of very low-amplitude, low-frequency vibration, followed by a silent portion. Accordingly, we then examined the correlation between the rate of /d/ detection and the separate durations of each of these portions of the total closure duration. A highly significant positive correlation was found between the rate of /d/ detection and the duration of voicing during the closure (Spearman ρ = +0.755, p < 0.001), whereas no correlation was found between the duration of the silent portion of the closure and the rate of /d/ detection. This correlation suggests that the duration of voicing during the closure, in an absolute sense, is a reliable acoustic cue to the presence of a voiced consonant in this phonetic environment. However, an extremely strong (and highly significant) rank order correlation was found between the rate of /d/ detection and the duration of voicing during the closure relative to the duration of the preceding vowel-sonorant sequence, /wal/ (Spearman ρ = +0.810, p = 0.0004). In other words, in detecting the presence or absence of a segment, listeners appeared to rely heavily on the relative timing between the duration of voicing during the closure and the overall rate of speech, as determined by the duration of the preceding syllable portion. This finding is consistent with studies of rate-dependent processing in phonetic perception, which have shown that listeners adjust to the overall rate of speech when identifying phonetic segments (e.g., 
Miller, 1981) and that relative timing between segments can play a crucial role in segment identification (Port, 1981; Port and Dalby, 1982; Parker et al., 1986; Kluender et al., 1988). Fig. 3 contrasts two talkers with varying amounts of this low-frequency voicing during the closure relative to the preceding /wal/ portion of the waveform. Talker M1 had a considerably longer relative duration of voicing during the closure than Talker M9, and consequently all of the listeners for Talker M1 detected the presence of the /d/, whereas only
ing syllables, the more likely it was to be correctly syllabified by the listener as the onset of the following syllable, rather than as both the coda of the preceding word and the onset of the following word. In Fig. 4, this may be seen in the shorter relative durations of the /s/ for Talker F6, whose /s/ was correctly syllabified by all 10 listeners, as opposed to the relatively longer /s/ for Talker F1, whose /s/ was correctly syllabified by only 3 of the 10 listeners. Thus, in this case, as in the case of segment deletion discussed above, the listeners drew on global information about the speaking rate of the talker in perceiving the placement of the word boundary. The talker's precision in inter-segmental timing had a direct effect on the listener's interpretation of the speech signal. Furthermore, in this case, there was a gender-related factor in the timing relationship between the medial /s/ and the surrounding syllables. In general, the duration of the /s/ relative to the preceding and following syllables was shorter for the female talkers than for the male talkers. Consequently, the female talkers' renditions of this phrase were more often correctly transcribed: 7 of the 10 female talkers had no errors of this type, whereas 6 of the 10 males had this error for at least 30% of the listeners. Thus, in this case, the female talkers as a group were
apparently more precise with respect to controlling this timing relationship than the group of male talkers. Although this case is not a matter of phonological reduction (in fact, the correct form is shorter in duration), this example does demonstrate that the gender-based difference in overall speech intelligibility that we observed in our database may be due to the use of more precise articulations by our female talkers. Moreover, both this case of syllable affiliation and the previous case of segment deletion indicate why global talker-related characteristics, such as overall speech rate, may not be good candidates for the primary determiners of talker intelligibility: apparently, finer acoustic-phonetic details of speech timing and the precision of specific articulatory events propagate up to higher levels of processing during speech perception to modulate and control overall speech intelligibility in sentences.
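Both error cases above hinge on the same kind of rate-normalized measure: a segment's duration expressed relative to neighboring material rather than in absolute terms. A minimal sketch with hypothetical durations (not the paper's measurements) shows how the two cases pattern:

```python
# Sketch of the two rate-normalized timing measures discussed above.
# All durations (in seconds) are hypothetical illustrations.

def relative_duration(segment_dur, reference_dur):
    """Duration of a segment expressed relative to neighboring material,
    a rough normalization for overall speaking rate."""
    return segment_dur / reference_dur

# Case 1: voicing during the /d/ closure relative to the preceding /wal/.
# A larger value patterned with higher /d/ detection rates.
m1_like = relative_duration(0.048, 0.240)   # long relative voicing
m9_like = relative_duration(0.012, 0.200)   # short relative voicing

# Case 2: medial /s/ relative to the mean of the surrounding syllables.
# A smaller value patterned with correct syllabification.
f6_like = relative_duration(0.110, (0.240 + 0.260) / 2)
f1_like = relative_duration(0.180, (0.230 + 0.250) / 2)
```

The point of the normalization is that two talkers at different speaking rates can have very different absolute durations yet the same relative value, which is what listeners appear to track.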
5. General summary
The overall goal of this investigation was to identify some of the talker-related acoustic-phonetic correlates of speech intelligibility. Specifically, we asked: What makes one talker more intelligible than another? The results of this study showed that
each sentence. Thus, detailed acoustic-phonetic measures of the speech signal can be related directly to listeners' perceptual responses.
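The rank-order correlations used throughout this study to relate such measures to responses can be sketched in a few lines. Below, a minimal tie-free Spearman implementation is applied to hypothetical per-talker F2-F1 distances for /i/ and intelligibility scores; the values are illustrative only and do not reproduce the paper's data.

```python
# Sketch: Spearman rank correlation between a per-talker acoustic measure
# (F2-F1 distance for /i/) and overall intelligibility, as in Section 4.1.
# Assumes no tied values; all data below are hypothetical illustrations.

def spearman(x, y):
    """Spearman rho for tie-free data: 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    def rank(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for pos, i in enumerate(order):
            r[i] = pos + 1          # 1-based rank of each observation
        return r
    rx, ry = rank(x), rank(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical talkers: wider F2-F1 for /i/ goes with higher intelligibility.
f2_f1_i = [2100, 1900, 2300, 1700, 2200]   # Hz
intel   = [0.88, 0.80, 0.93, 0.72, 0.90]   # proportion of words correct
rho = spearman(f2_f1_i, intel)
```

With these illustrative values the ranks agree perfectly, so rho comes out at 1.0; real talker data would of course yield intermediate values like those reported above.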
Acknowledgements

We are grateful to Luis Hernandez for technical support, to John Karl for compiling the Indiana Multi-talker Sentence Database, and to Christian Benoit for many useful comments. This research was supported by NIDCD Training Grant DC-00012 and by NIDCD Research Grant DC- to Indiana University.
A. Signal captions
The following signals can be heard at the web site http://www.elsevier.nl/locate/specom.
Signal 1a. Audio file of Talker F7's production of It's easy to tell the depth of a well.
Signal 1b. Audio file of Talker F7's production of A pot of tea helps to pass the evening.
Signal 1c. Audio file of Talker F7's production of The horn of the car woke the sleeping cop.
Signal 2a. Audio file of Talker M2's production of It's easy to tell the depth of a well.
Signal 2b. Audio file of Talker M2's production of A pot of tea helps to pass the evening.
Signal 2c. Audio file of Talker M2's production of The horn of the car woke the sleeping cop.
Signal 3a. Audio file of Talker M1's production of The walled town was seized without a fight.
Signal 3b. Audio file of Talker M9's production of The walled town was seized without a fight.
Signal 4a. Audio file of Talker F6's production of The play seems dull and quite stupid.
Signal 4b. Audio file of Talker F1's production of The play seems dull and quite stupid.
J.W. Black (1957), Multiple-choice intelligibility tests, J. Speech and Hearing Disorders, Vol. 22, pp. 213-235.
Z.S. Bond and T.J. Moore (1994), A note on the acoustic-phonetic characteristics of inadvertently clear speech, Speech Communication, Vol. 14, No. 4, pp. 325-337.
A.R. Bradlow, L.C. Nygaard and D.B. Pisoni (1995), On the contribution of instance-specific characteristics to speech perception, in: C. Sorin, J. Mariani, H. Meloni and J. Schoentgen, Eds., Levels in Speech Communication: Relations and Interactions (Elsevier, Amsterdam), pp. 13-25.
D. Byrd (1994), Relations of sex and dialect to reduction, Speech Communication, Vol. 15, Nos. 1-2, pp. 39-54.
G. Fant (1973), Speech Sounds and Features (MIT Press, Cambridge, MA).
L.J. Gerstman (1968), Classification of self-normalized vowels, IEEE Trans. Audio Electroacoust., Vol. AU-16, pp. 78-80.
H.M. Hanson (1995), Glottal characteristics of female speakers: Acoustic, physiological, and perceptual correlates, J. Acoust. Soc. Amer., Vol. 97, No. 2, p. 3422.
I.J. Hirsh, E.G. Reynolds and M. Joseph (1954), Intelligibility of different speech materials, J. Acoust. Soc. Amer., Vol. 26, pp. 530-538.
J.D. Hood and J.P. Poole (1980), Influence of the speaker and other factors affecting speech intelligibility, Audiology, Vol. 19, pp. 434-455.
IEEE (1969), IEEE recommended practice for speech quality measurements, IEEE Report No. 297.
J. Karl and D. Pisoni (1994), The role of talker-specific information in memory for spoken sentences, J. Acoust. Soc. Amer., Vol. 95, p. 2873.
P.A. Keating, D. Byrd, E. Flemming and Y. Todaka (1994), Phonetic analyses of word and segment variation using the TIMIT corpus of American English, Speech Communication, Vol. 14, No. 2, pp. 131-142.
D. Klatt and L. Klatt (1990), Analysis, synthesis, and perception of voice quality variations among female and male talkers, J. Acoust. Soc. Amer., Vol. 87, pp. 820-857.
K.R. Kluender, R.L. Diehl and B.A. Wright (1988), Vowel-length difference before voiced and voiceless consonants: An auditory explanation, J. Phonetics, Vol. 16, pp. 153-169.
J.C. Krause and L.D. Braida (1995), The effects of speaking rate on the intelligibility of speech for various speaking modes, J. Acoust. Soc. Amer., Vol. 98, No. 2, p. 2982.
P. Ladefoged and D.E. Broadbent (1957), Information conveyed by vowels, J. Acoust. Soc. Amer., Vol. 29, pp. 98-104.
L. Lamel, R. Kassel and S. Seneff (1986), Speech database development: Design and analysis of the acoustic-phonetic corpus, Proc. DARPA Speech Recognition Workshop, February 1986, pp. 100-109.
J. Laver and P. Trudgill (1979), Phonetic and linguistic markers in speech, in: K.R. Scherer and H. Giles, Eds., Social Markers in Speech (Cambridge University Press, Cambridge), pp. 1-32.
B. Lindblom (1990), Explaining phonetic variation: A sketch of the H & H theory, in: W.J. Hardcastle and A. Marchal, Eds., Speech Production and Speech Modeling (Kluwer Academic Publishers, Dordrecht), pp. 403-439.
P.A. Luce and T.D. Carrell (1981), Creating and editing waveforms using WAVES, Research in Speech Perception Progress Report No. 7 (Indiana University Speech Research Laboratory, Bloomington).
J.L. Miller (1981), Effects of speaking rate on segmental distinctions, in: P.D. Eimas and J.L. Miller, Eds., Perspectives on the Study of Speech (Lawrence Erlbaum, Hillsdale, NJ), pp. 39-74.
R.B. Monsen (1976), Normal and reduced phonological space: The productions of English vowels by deaf adolescents, J. Phonetics, Vol. 4, pp. 189-198.
S.-J. Moon and B. Lindblom (1994), Interaction between duration, context and speaking style in English stressed vowels, J. Acoust. Soc. Amer., Vol. 96, pp. 40-55.
J.W. Mullennix, D.B. Pisoni and C.S. Martin (1989), Some effects of talker variability on spoken word recognition, J. Acoust. Soc. Amer., Vol. 85, pp. 365-378.
A.T. Neel (1995), Intelligibility of normal speakers: Error analysis, J. Acoust. Soc. Amer., Vol. 98, p. 2982.
L.C. Nygaard, M.S. Sommers and D.B. Pisoni (1994), Speech perception as a talker-contingent process, Psychological Sci., Vol. 5, pp. 42-46.
L.C. Nygaard, M.S. Sommers and D.B. Pisoni (1995), Effects of stimulus variability on perception and representation of spoken words in memory, Perception and Psychophysics, Vol. 57, pp. 989-1001.
D. Pallett (1990), Speech corpora and performance assessment in the DARPA SLS program, Proc. Internat. Conf. on Spoken Language Processing 1990, pp. 24.3.1-24.3.4.
T.J. Palmeri, S.D. Goldinger and D.B. Pisoni (1993), Episodic encoding of voice attributes and recognition memory for spoken words, J. Experimental Psychology: Learning, Memory and Cognition, Vol. 19, pp. 1-20.
E.M. Parker, R.L. Diehl and K.R. Kluender (1986), Trading relations in speech and nonspeech, Perception and Psychophysics, Vol. 34, pp. 314-322.
M.A. Picheny, N.I. Durlach and L.D. Braida (1985), Speaking clearly for the hard of hearing I: Intelligibility differences between clear and conversational speech, J. Speech and Hearing Research, Vol. 28, pp. 96-103.
M.A. Picheny, N.I. Durlach and L.D. Braida (1986), Speaking clearly for the hard of hearing II: Acoustic characteristics of clear and conversational speech, J. Speech and Hearing Research, Vol. 29, pp. 434-446.
M.A. Picheny, N.I. Durlach and L.D. Braida (1989), Speaking clearly for the hard of hearing III: An attempt to determine the contribution of speaking rate to differences in intelligibility between clear and conversational speech, J. Speech and Hearing Research, Vol. 32, pp. 600-603.
D.B. Pisoni (1993), Long-term memory in speech perception: Some new findings on talker variability, speaking rate and perceptual learning, Speech Communication, Vol. 13, Nos. 1-2, pp. 109-125.
R.F. Port (1981), Linguistic timing factors in combination, J. Acoust. Soc. Amer., Vol. 69, pp. 262-274.
R.F. Port and J. Dalby (1982), Consonant/vowel ratio as a cue for voicing in English, Perception and Psychophysics, Vol. 32, pp. 141-152.
R.P. Runyon and A. Haber (1991), Fundamentals of Behavioral Statistics (McGraw-Hill, New York), pp. 201-205.
M.S. Sommers, L.C. Nygaard and D.B. Pisoni (1994), Stimulus variability and spoken word recognition: I. Effects of variability in speaking rate and overall amplitude, J. Acoust. Soc. Amer., Vol. 96, pp. 1314-1324.
M.T.J. Tielen (1992), Male and Female Speech: An experimental study of sex-related voice and pronunciation characteristics, Doctoral dissertation, University of Amsterdam.
R.M. Uchanski, S. Choi, L.D. Braida, C.M. Reed and N.I. Durlach (1996), Speaking clearly for the hard of hearing IV: Further studies of the role of speaking rate, J. Speech and Hearing Research, Vol. 39, pp. 494-509.
G. Weismer and R.E. Martin (1992), Acoustic and perceptual approaches to the study of intelligibility, in: R.D. Kent, Ed.,