Reviews & Opinions
Independent and trusted. Read before you buy the Sony DTC-ZA5ES!

Sony DTC-ZA5ES

Manual

Download (English)

About

About Sony DTC-ZA5ES
Here you can find everything about the Sony DTC-ZA5ES, such as the manual and other information. For example: a review.

The Sony DTC-ZA5ES manual (user guide) is ready to download for free.

At the bottom of the page users can write a review. If you own a Sony DTC-ZA5ES, please write about it to help other people.

 


Video review

Sony DTC-ZA5ES recording a MiniDisc played by an MDS-JA555ES

 

User reviews and opinions


No opinions have been provided yet. Be the first to add a new opinion or review.

 

Documents

doc0

Eurospeech 2001 - Scandinavia
EVALUATION OF CROSS-LANGUAGE VOICE CONVERSION BASED ON GMM AND STRAIGHT
Mikiko Mashimo†, Tomoki Toda†, Kiyohiro Shikano† and Nick Campbell‡

† Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara 630-0101, Japan
[mikiko-m, tomoki-t, shikano]@is.aist-nara.ac.jp

‡ ATR Information Sciences Division, Kyoto 619-0288, Japan
nick@slt.atr.co.jp

Abstract
Voice conversion is a technique for producing utterances in any target speaker's voice from a source speaker's utterance. In this paper, we apply cross-language voice conversion between Japanese and English to a system based on a Gaussian Mixture Model (GMM) method and STRAIGHT, a high quality vocoder. To investigate the effects of this conversion system across different languages, we recorded two sets of bilingual utterances and performed voice conversion experiments using a mapping function which converts parameters of acoustic features of a source speaker to those of a target speaker. The mapping functions were trained on bilingual databases of both Japanese and English speech. An objective evaluation using Mel cepstrum distortion (Mel CD) confirmed that the system can perform cross-language voice conversion with the same performance as conversion within a single language.

1. INTRODUCTION

Since voice conversion allows mapping to any target speaker's voice after training on a small number of source-speaker utterances (roughly 50-60 sentences), it has potential for various applications. Our goal is to capture not only the target speaker's voice and speaking style, but also to convert across language pairs that the original speaker may not be capable of speaking. This method would have potential applications in, e.g., computer-aided language learning systems and interpretation systems, although cross-language voice conversion has not yet been well researched. For cross-language conversion, the converted speech should sound as if the target speaker had spoken the other language, and the speaker individuality should be preserved across the different languages. An attempt was made by Abe et al. [1] in the late 1980s between Japanese and English using a codebook mapping method [2], which became the typical voice conversion algorithm. More recently, an algorithm based on the Gaussian Mixture Model (GMM) was proposed by Stylianou et al. [3],[4]. The advantage of this method is that the acoustic space of a speaker is modeled by the GMM, so that acoustic features are converted from a source speaker to a target speaker continuously, whereas the codebook mapping method uses a discrete representation through vector quantization. Voice conversion algorithms based on the GMM method were applied to the high quality vocoder STRAIGHT (Speech Transformation and Representation using Adaptive Interpolation of weiGHTed spectrum; Kawahara et al. [5],[6]) by Toda et al. [7],[8]. According to their objective and subjective evaluation report, this system succeeded in producing high quality converted voices within a single language.

The purpose of the present paper is to apply cross-language voice conversion using the GMM and STRAIGHT-based system and to evaluate its effect as a first step towards producing high quality voice conversion between different languages. In the present study, we work on the assumption that this voice conversion system will be applied to a practical pronunciation learning system. In most language learning systems, there is only a single-language dataset for the learner (i.e., that of the target speaker). However, we collected bilingual databases from both the source and target female speakers to investigate whether the differences between the two languages have an effect on the converted voice.

2. VOICE CONVERSION ALGORITHM
In our method, $p$-dimensional time-aligned acoustic features of a source speaker and a target speaker, determined by Dynamic Time Warping (DTW), are assumed as below, where $T$ denotes transposition:

source speaker: $x = [x_0, x_1, \ldots, x_{p-1}]^T$, target speaker: $y = [y_0, y_1, \ldots, y_{p-1}]^T$.

2.1. GMM-based voice conversion algorithms
In the GMM algorithm, the training data size and the number of trainable parameters are variable [3],[4]. The probability distribution of acoustic features $x$ can be described as

$$p(x) = \sum_{i=1}^{m} \alpha_i\, N(x; \mu_i, \Sigma_i), \qquad \sum_{i=1}^{m} \alpha_i = 1, \quad \alpha_i \ge 0, \qquad (1)$$

where $N(x; \mu, \Sigma)$ denotes the normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, $\alpha_i$ denotes the weight of class $i$, and $m$ denotes the total number of Gaussian mixtures.
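To make Eq. (1) concrete, here is a minimal NumPy sketch (not from the paper) that evaluates the GMM density with the diagonal covariances the paper assumes; all names are illustrative:

```python
import numpy as np

def gmm_density(x, weights, means, diag_covs):
    """Evaluate the GMM density of Eq. (1) at a feature vector x.

    weights:   (m,) mixture weights summing to 1
    means:     (m, p) class mean vectors
    diag_covs: (m, p) diagonal covariance entries
    """
    p = x.shape[0]
    # Per-class diagonal Gaussian densities N(x; mu_i, Sigma_i)
    diff2 = (x - means) ** 2 / diag_covs                     # (m, p)
    log_norm = -0.5 * (p * np.log(2 * np.pi)
                       + np.log(diag_covs).sum(axis=1))      # (m,)
    class_densities = np.exp(log_norm - 0.5 * diff2.sum(axis=1))
    return float(np.dot(weights, class_densities))
```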
2.2. Conversion of acoustic features
Conversion of the acoustic features of the source speaker to those of the target speaker is performed by a mapping function, defined as follows:

$$F(x) = E[y \mid x] = \sum_{i=1}^{m} h_i(x)\left[\mu_i^{y} + \Sigma_i^{yx}\left(\Sigma_i^{xx}\right)^{-1}\left(x - \mu_i^{x}\right)\right], \qquad (2)$$

$$h_i(x) = \frac{\alpha_i\, N(x; \mu_i^{x}, \Sigma_i^{xx})}{\sum_{j=1}^{m} \alpha_j\, N(x; \mu_j^{x}, \Sigma_j^{xx})},$$

where $\mu_i^{x}$ and $\mu_i^{y}$ denote the mean vectors of class $i$ for the source and target speakers, $\Sigma_i^{xx}$ is the covariance matrix of class $i$ for the source speaker, and $\Sigma_i^{yx}$ is the cross-covariance matrix of class $i$ for the source and target speakers. These matrices are diagonal.
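A minimal sketch of the mapping function of Eq. (2) under the paper's diagonal-covariance assumption; the array shapes and function name are this document's illustration, not the authors' code:

```python
import numpy as np

def convert(x, weights, mu_x, mu_y, sigma_xx, sigma_yx):
    """Apply the mapping function of Eq. (2) to one source feature vector x.

    All covariances are diagonal, stored as (m, p) arrays, as in the paper.
    mu_x, mu_y: (m, p) class means; weights: (m,) mixture weights.
    """
    # Posterior probability h_i(x) of each Gaussian class given x
    diff2 = (x - mu_x) ** 2 / sigma_xx
    log_n = -0.5 * (np.log(2 * np.pi * sigma_xx).sum(axis=1)
                    + diff2.sum(axis=1))
    post = weights * np.exp(log_n)
    post /= post.sum()
    # Class-wise regression toward the target speaker's acoustic space
    shifted = mu_y + sigma_yx / sigma_xx * (x - mu_x)        # (m, p)
    return post @ shifted                                    # (p,)
```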
2.3. Training of the Mapping Function
In order to estimate the parameters $\alpha_i$, $\mu_i^{x}$, $\mu_i^{y}$, $\Sigma_i^{xx}$ and $\Sigma_i^{yx}$, the probability distribution of the joint vectors $z = [x^T\, y^T]^T$ for the source and target speakers is represented by a GMM whose parameters are trained on the joint density distribution [9]. The covariance matrix $\Sigma_i^{z}$ and mean vector $\mu_i^{z}$ of class $i$ for the joint vectors can be written as

$$\Sigma_i^{z} = \begin{bmatrix} \Sigma_i^{xx} & \Sigma_i^{xy} \\ \Sigma_i^{yx} & \Sigma_i^{yy} \end{bmatrix}, \qquad \mu_i^{z} = \begin{bmatrix} \mu_i^{x} \\ \mu_i^{y} \end{bmatrix}. \qquad (3)$$

Expectation maximization (EM) is used for estimating these parameters.
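As one way to realize the joint-density training of Eq. (3), the sketch below fits a GMM to stacked source-target vectors with scikit-learn's EM implementation (an assumption; the paper names no toolkit). scikit-learn fits unconstrained full covariances, so the diagonal blocks used by the paper are obtained here by keeping only each block's diagonal:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# X, Y: DTW-aligned source/target Mel cepstra, shape (frames, p)
# (hypothetical file names)
X = np.load("source_mcep.npy")
Y = np.load("target_mcep.npy")
Z = np.hstack([X, Y])  # joint vectors z = [x^T y^T]^T

# EM training of the joint GMM; 64 classes as in Section 5.2
gmm = GaussianMixture(n_components=64, covariance_type="full", max_iter=200)
gmm.fit(Z)

p = X.shape[1]
weights = gmm.weights_
mu_x, mu_y = gmm.means_[:, :p], gmm.means_[:, p:]
# Diagonals of the Sigma^xx and Sigma^yx blocks of Eq. (3), one row per class
sigma_xx = np.stack([np.diag(c[:p, :p]) for c in gmm.covariances_])
sigma_yx = np.stack([np.diag(c[p:, :p]) for c in gmm.covariances_])
```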
3. ANALYSIS-SYNTHESIS METHOD

In a voice conversion system, not only the voice conversion algorithm but also the quality of the analysis-synthesis method determines the quality of the synthesized voice. Therefore, choosing a reliable analysis-synthesis method is important. In our work, STRAIGHT was employed as the analysis-synthesis method. STRAIGHT is a very high quality vocoder developed to meet the need for flexible, high quality analysis-synthesis [5],[6]. It consists of pitch-adaptive spectrogram smoothing and fundamental frequency extraction (TEMPO), and allows manipulation of speech parameters such as vocal tract length, pitch, and speaking rate.

4. IMPLEMENTATION OF THE CONVERSION ALGORITHMS

The GMM-based voice conversion algorithm has been implemented in STRAIGHT by Toda et al. [7],[8]. In their system, acoustic features are described by the cepstrum of the smoothed spectrum analyzed by STRAIGHT. In our work, however, we used the Mel cepstrum because of its closeness to human auditory perception. Prosodic characteristics have not been considered yet, but the fundamental frequency (F0) of the source speaker is adjusted to match the target speaker's F0 on average in log scale. The adjusting function is described as follows:

$$f_0' = \frac{\mu^{y}}{\mu^{x}}\, f_0, \qquad (4)$$

where $f_0$ and $f_0'$ denote the log-scale F0 of the source speaker and of the converted speech, and $\mu^{x}$ and $\mu^{y}$ denote the mean log-scale F0 of the source and target speakers.
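A sketch of the F0 adjustment, reading Eq. (4) as a ratio of mean log-F0 values applied in the log domain (one interpretation of the reconstructed formula; unvoiced frames are assumed to be marked 0):

```python
import numpy as np

def adjust_f0(f0_src, mean_logf0_src, mean_logf0_tgt):
    """Scale the source log-F0 so its mean matches the target's (Eq. 4)."""
    f0_out = np.zeros_like(f0_src, dtype=float)
    voiced = f0_src > 0                       # unvoiced frames stay 0
    log_f0 = np.log(f0_src[voiced])
    f0_out[voiced] = np.exp(log_f0 * mean_logf0_tgt / mean_logf0_src)
    return f0_out
```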

5. EXPERIMENT

5.1. Speech Databases
Bilingual (Japanese and English) speech utterances of two Japanese female speakers were recorded, sampled at 48000 Hz. Each speaker either has long experience living abroad or has learned English from a native speaker since before the age of seven. The speakers read 60 bilingual sentences selected from the ATR phonetically balanced sentences [10]. After downsampling to 16000 Hz, 50 sentences were used as the training data set and the remaining 10 as evaluation sentences for the converted utterances. Table 1 shows the other recording conditions.

Table 1: Recording conditions.
  Recording place       Sound-treated room
  Microphone            SONY C355
  Recording equipment   DAT SONY DTC-ZA5ES
  Sampling frequency    48000 Hz
  Number of sentences   60

Figure 1: Diagram of the cross-language conversion procedure (English converted voice trained by Japanese): (1) the acoustic-feature mapping from source to target is trained on the Japanese training sentences of both speakers; (2) the GMM & STRAIGHT conversion system, with the learned conversion rules, converts the source speaker's English evaluation sentences; (3) the converted voice is evaluated against the target speaker's English evaluation sentences.

Figure 2: Diagram of the single-language conversion procedure (Japanese converted voice trained by Japanese), evaluated against the target speaker's Japanese evaluation sentences.
5.2. Voice Conversion
In order to investigate the differences in voice conversion across languages, mapping functions trained on both Japanese and English were used for learning the source and target speakers' parameters for the conversion of acoustic features. The system was therefore tested on four types of female-to-female converted voice: (1) English (Eng) converted voice trained by Japanese (Jpn), (2) Jpn converted voice trained by Jpn, (3) Eng converted voice trained by Eng, and (4) Jpn converted voice trained by Eng. The procedures for producing the utterances of types (1) and (2) are depicted in Figure 1 and Figure 2; the procedures for voice conversion trained by English are analogous. According to the previous work of Toda et al. [7],[8], the relation between the number of GMM classes and cepstrum distortion (CD) saturates at a certain number of classes, approximately 64, so we used 64 GMM classes. The other analysis parameters are shown in Table 2. Note that the mean F0 differs between the speakers.
Table 2: Analysis parameters.
  Analysis window            Gaussian
  Sampling frequency         16000 Hz
  Shift length               5 ms
  Number of FFT points       (value illegible in source)
  Number of GMM classes      64
  Training sentences         50
  Evaluation sentences       10
  Mean F0 (source speaker)   Jpn: 270.0 Hz, Eng: 248.6 Hz
  Mean F0 (target speaker)   Jpn: 227.6 Hz, Eng: 233.9 Hz
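Tying Eqs. (1)-(3) together, the self-contained toy below trains a joint GMM on synthetic stand-in features and converts one frame. It is a sketch of the pipeline's shape, not the paper's experiment: the data, class count, and full-covariance choice are all stand-ins.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
p, frames = 24, 5000
X = rng.normal(size=(frames, p))                 # stand-in "source" features
Y = X @ rng.normal(scale=0.1, size=(p, p))       # correlated "target" features

# Joint-density training (Eq. 3); 8 classes keep the toy fast
gmm = GaussianMixture(n_components=8, covariance_type="full", random_state=0)
gmm.fit(np.hstack([X, Y]))
w, mu, cov = gmm.weights_, gmm.means_, gmm.covariances_

def convert_frame(x):
    """Map one source frame with Eq. (2): posterior-weighted regressions."""
    dens = np.array([multivariate_normal.pdf(x, mu[i, :p], cov[i, :p, :p])
                     for i in range(len(w))])
    h = w * dens / np.dot(w, dens)               # h_i(x)
    preds = np.array([mu[i, p:] + cov[i, p:, :p]
                      @ np.linalg.solve(cov[i, :p, :p], x - mu[i, :p])
                      for i in range(len(w))])
    return h @ preds

converted = convert_frame(X[0])                  # one converted feature frame
```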

6. EVALUATION

To evaluate speaker individuality objectively, the Mel cepstrum distortion (Mel CD) between the converted speech and the target speech was calculated. Mel CD is computed as below, where $mc_i^{(conv)}$ and $mc_i^{(tar)}$ denote the Mel cepstral coefficients of the converted voice and the target voice, respectively:

$$\mathrm{MelCD} = \frac{10}{\ln 10}\sqrt{2\sum_{i=1}^{p-1}\left(mc_i^{(conv)} - mc_i^{(tar)}\right)^{2}}.$$
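A sketch of the measure, assuming frame-aligned Mel cepstra and the usual exclusion of the 0th (energy) coefficient; averaging over frames is this document's choice, not stated in the paper:

```python
import numpy as np

def mel_cd(mc_conv, mc_tar):
    """Mean Mel cepstrum distortion in dB between converted and target.

    mc_conv, mc_tar: (frames, p) Mel cepstra; coefficient 0 is excluded.
    """
    diff = mc_conv[:, 1:] - mc_tar[:, 1:]
    per_frame = (10.0 / np.log(10)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())
```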

The smaller the Mel CD value, the closer the speaker individuality of the converted voice is to that of the target speaker. Figure 3 shows the Mel CD results for conversion rules trained on 50 English and 50 Japanese sentences; Table 3 gives the values numerically. We can see from the table that the characteristics of the converted speech were improved even in cross-language voice conversion. However, since there are normally large spectral differences between Japanese and English speech sounds, converted speech trained on the same language shows closer values (i.e., an English converted voice trained on English, or a Japanese converted voice trained on Japanese, is still preferable). In addition, to produce a higher quality converted voice, we must consider the voice quality differences between the Japanese and English of the same speaker.
Figure 3: Result of the objective evaluation experiment of speaker individuality (Mel CD [dB], roughly 1.0-4.0, for English and Japanese converted voices; conditions: original source voice vs. target voice, converted voice trained by Jpn vs. target voice, and converted voice trained by Eng vs. target voice).

Table 3: Values of Mel cepstrum distortion [dB].
                                                Jpn     Eng
  Original source - target                      3.33    3.53
  Converted voice - target (trained by Jpn)     2.32    2.75
  Converted voice - target (trained by Eng)     2.59    2.52

7. CONCLUSION

In this paper, we evaluated the effect of applying cross-language voice conversion to a system based on the GMM (Gaussian Mixture Model) and STRAIGHT. From the results of the objective evaluation using Mel cepstrum distortion, it was found that the system performs cross-language voice conversion nearly equivalently to single-language conversion. This indicates that it could be employed in a language learning system. As a next step, the problem of mean fundamental frequency (F0) differences between the Japanese and English utterances of the same speaker, and of variation in voice quality, must be considered, as these cause inconsistency in the perception of converted voice sounds. Future work will include developing a method of perceptual evaluation which takes such differences into account.

8. ACKNOWLEDGMENT

This work was partly supported by JST/CREST (Core Research for Evolutional Science and Technology) in Japan.

9. REFERENCES

[1] M. Abe, K. Shikano and H. Kuwabara, "Statistical analysis of bilingual speakers' speech for cross-language voice conversion," J. Acoust. Soc. Am., 90(1), pp. 76-82, July 1991.
[2] M. Abe, S. Nakamura, K. Shikano and H. Kuwabara, "Voice conversion through vector quantization," J. Acoust. Soc. Jpn. (E), vol. 11, no. 2, pp. 71-76, 1990.
[3] Y. Stylianou, O. Cappé and E. Moulines, "Statistical methods for voice quality transformation," Proc. EUROSPEECH, Madrid, Spain, pp. 447-450, Sept. 1995.
[4] Y. Stylianou and O. Cappé, "A system for voice conversion based on probabilistic classification and a harmonic plus noise model," Proc. ICASSP, Seattle, U.S.A., pp. 281-284, May 1998.
[5] H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited," Proc. ICASSP, Munich, Germany, pp. 1303-1306, Apr. 1997.
[6] H. Kawahara, I. Masuda-Katsuse and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3-4, pp. 187-207, 1999.
[7] T. Toda, J. Lu, H. Saruwatari and K. Shikano, "STRAIGHT-based voice conversion algorithm based on Gaussian mixture model," Proc. ICSLP, PAe(09-10)-K-05, pp. 279-282, Beijing, China, Oct. 2000.
[8] T. Toda, H. Saruwatari and K. Shikano, "Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum," Proc. ICASSP, Salt Lake City, U.S.A., May 2001.
[9] A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," Proc. ICASSP, Seattle, U.S.A., May 1998.
[10] M. Abe, Y. Sagisaka, T. Umeda and H. Kuwabara, "Speech Database Users Manual," ATR Technical Report (in Japanese).

doc1

Acoust. Sci. & Tech. 26, 4 (2005)
Auditory feedback control during a sentence-reading task: Effect of others' voice
Akira Toyomura1,* and Takashi Omori2,†
1 Graduate School of Engineering, Hokkaido University, Kita 13 Nishi 8, Sapporo, 060-8628 Japan
2 Graduate School of Information Science and Technology, Hokkaido University, Kita 14 Nishi 9, Sapporo, 060-0814 Japan

(Received 21 December 2004, Accepted for publication 7 February 2005)

Keywords: Delayed auditory feedback, Transformed auditory feedback, Speech motor control, Stuttering
PACS number: 43.70.h [DOI: 10.1250/ast.26.358]

* e-mail: toyo@complex.eng.hokudai.ac.jp
† e-mail: omori@complex.eng.hokudai.ac.jp

1. Introduction
Speech perception plays an important role in speech production. When we speak, we constantly control our voice through speech perception in order to adapt our speaking to unpredictable environmental changes. For example, according to the Lombard effect [1], pitch or loudness increases under noisy conditions. Artificial changes, such as delayed auditory feedback (DAF) or transformed auditory feedback (TAF), also give rise to various behaviors in subjects. It is known that in the DAF experiment, where speech is delayed for a short time (50-300 ms) before it is replayed to the speaker, normal subjects fall into a stuttering-like condition [2]. In the case of stutterers, however, stuttering is suppressed with DAF. In the TAF experiment, where the pitch of the feedback voice is shifted, the pitch frequency is feedback-controlled with fast (about 150 ms) and slow (300 ms or longer) responses [3-5]. Stutterers also demonstrate different performance from normal subjects [6]. These findings indicate that complex feedback controls, including multiple processing paths, play a part in speech motor control through speech perception.

A phenomenon called the chorus effect, which is related to stuttering, is also known [7]. It is a phenomenon in which stuttering is suppressed when the stutterer speaks the same sentences in synchrony with others. The mechanism of this effect is not known; however, there are some possible explanations. One is that the brain of stutterers forms a specific structure in the feedback control of articulation that does not exist in normal subjects, and that it is effective only under the condition described above. A second, simpler explanation, which is not limited to stutterers, is that, in general, the feedback control of articulation is activated only on one's own voice; it does not function when others' voice is input to the auditory system. As a result, stutterers, who have some problem in their articulation feedback control system, do not fall into stuttering when others' voice is given. In the latter case, we can predict that when a parameter of the feedback voice is artificially shifted sufficiently to change one's perception of the voice, the feedback control of articulation will be suppressed. One such parameter is pitch. It is known that frequency-altered feedback (FAF) medical treatment,
where the pitch of the feedback voice is shifted for stutterers, has an inhibitory effect on stuttering [8]. Up to now, control models of pitch [4,5] and brain processing models of speech [9] have been proposed from the viewpoint of speech motor control. However, brain processing that would explain the chorus effect, such as the two possibilities above, has not been sufficiently discussed or examined. The assumption that feedback control of articulation acts only on one's own voice suggests the existence of another process of speech motor control that has not been considered until now. Consequently, we report the results of an experiment in which DAF and TAF were combined to examine the nature of the interaction between articulation control and pitch-shift perception. Following that, we discuss the mechanism of feedback control of articulation.

2. Experiment
It has been reported from conventional DAF experiments that DAF is most effective at delays under 200 ms. However, if the feedback control of articulation is suppressed in proportion to the pitch shift, the DAF effect will be reduced in proportion to the pitch shift, independently of the delay time.

2.1. Experimental method
Eight healthy, male native Japanese speakers (age range 22-26 years, mean ± SD 23.5 ± 1.9) participated as subjects. They had no speech or voice disorders and were not trained as singers. Each subject's voice was recorded with a microphone (SHURE SM58). The delay and pitch were altered through an effector (ZOOM RFX-2000) and fed back to the subjects through headphones (AKG K271S). The delay and pitch parameters were manipulated using a PC (SONY VAIO PCV-RX72K) through MIDI. The voices were recorded on a DAT (SONY DTC-ZA5ES). Pink noise at 75 dB SPL was generated by an analyzer (PHONIC PAA-2) and mixed with the feedback voice by a mixer (PHONIC PM602FX). Each subject was instructed to read a Japanese junior high school language textbook. Two conditions were prepared. Under the normal condition, subjects read the textbook aloud while listening to an unaltered feedback voice through headphones. Under the altered condition, they read the same text aloud while listening to an altered voice; the sentences were divided into blocks, and each block was allocated different delay-time and pitch-shift parameters.
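For illustration only (not the authors' hardware chain): a rough software analogue of the altered feedback path, with NumPy standing in for the delay line and librosa's pitch shifter for the effector:

```python
import numpy as np
import librosa

def alter_feedback(voice, sr, delay_ms, pitch_halftones):
    """Apply a DAF delay and a TAF pitch shift to a mono voice signal."""
    # Pitch shift by the given number of halftones (semitones)
    shifted = librosa.effects.pitch_shift(voice, sr=sr,
                                          n_steps=pitch_halftones)
    # Delay by prepending silence, keeping the original length
    pad = int(sr * delay_ms / 1000)
    return np.concatenate([np.zeros(pad), shifted])[: len(voice)]

# One of the 35 altered-condition settings, e.g. 200 ms delay, +4 halftones:
# y, sr = librosa.load("voice.wav", sr=None, mono=True)  # hypothetical file
# feedback = alter_feedback(y, sr, delay_ms=200, pitch_halftones=4)
```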

Table 1: Mean ± SD of mora ratio for 15 s (rows: pitch shift in halftones; columns: delay in ms).

  Pitch   0 ms          50 ms         100 ms        200 ms        300 ms        400 ms        700 ms
  +0      1.00 ± 0.00   1.18 ± 0.14   1.41 ± 0.12   1.47 ± 0.20   1.31 ± 0.17   1.44 ± 0.28   1.44 ± 0.1x
  +2      1.28 ± 0.20   1.20 ± 0.07   1.34 ± 0.13   1.37 ± 0.18   1.27 ± 0.23   1.30 ± 0.11   1.26 ± 0.1x
  +4      1.09 ± 0.10   1.22 ± 0.19   1.27 ± 0.13   1.28 ± 0.11   1.24 ± 0.14   1.15 ± 0.11   1.32 ± 0.1x
  +6      1.16 ± 0.12   1.16 ± 0.09   1.27 ± 0.19   1.19 ± 0.17   1.14 ± 0.09   1.13 ± 0.13   1.16 ± 0.1x
  +12     1.10 ± 0.09   1.15 ± 0.12   1.20 ± 0.20   1.15 ± 0.09   1.25 ± 0.14   1.10 ± 0.09   1.19 ± 0.14
  (0.1x: final digit of the SD illegible in the source.)

There were 35 combinations of altered-condition parameter settings: delay times of 0, 50, 100, 200, 300, 400 and 700 ms, and pitch shifts of 0, +2, +4, +6 and +12 halftones. Although the delay time was chosen randomly, the pitch parameter was shifted upward progressively from 0 halftones in order to avoid a change in the subjects' threshold for pitch-shift perception. The subjects were instructed to read the text at their preferred pace, clearly and with as few reading errors as possible. During the experiment, they were instructed to take a break and drink water when necessary. The experiment was carried out over two days, the normal-condition experiment being conducted on the first day and the altered-condition experiment on the second day. Over those two days, the subjects read about 12,000 morae.

To analyze the subjects' performance, we extracted the recorded voices for the same sentences under both conditions from the DAT. In order to quantify the effect of the parameter settings, we aligned the beginnings of the same sentences under the two conditions and counted the number of morae read in 15 s. As a performance index of a parameter setting, we calculated the mora ratio, in which the number of morae under the normal condition is divided by the number of morae under the altered condition. When a subject repeated the same mora, it was counted as one mora. As a result, when the subject read a word repeatedly or slowly because of the delay and pitch-shift setting, the mora ratio becomes larger than 1.
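The mora-ratio index defined above is a single division; a toy example with hypothetical counts (not the study's data):

```python
# Toy illustration of the mora-ratio index
morae_normal = 120   # morae read in 15 s under the normal condition
morae_altered = 85   # morae read in 15 s under an altered condition

mora_ratio = morae_normal / morae_altered
print(f"mora ratio = {mora_ratio:.2f}")  # > 1 means slowed/disfluent reading
```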

2.2. Results
Table 1 shows the measured mora ratio for each parameter setting. In the 0 halftone line, which is the conventional DAF experimental setting, the 200 ms delay shows the largest value. The mora ratios for 400 ms and 700 ms also show large values. At these settings, however, subjects tended to intentionally stop reading the text in order to avoid overlap between their immediate voice and the feedback voice. This tendency led to larger mora ratio values, so the relationship between DAF and the pitch-shift setting cannot be discussed at these parameter settings.

The mora ratio at each delay time approached 1 in proportion to the amount of pitch shift. At the 0 halftone line, the value clearly increased from that at 0 ms delay and reached a peak at 200 ms, while at or above the 6 halftone line the value remained nearly constant. In order to evaluate the effect of each delay-time level on each pitch-shift line, two-way ANOVA and one-way ANOVA were applied (Table 2). In the two-way ANOVA, there was a significant difference in the interaction between delay and pitch, and the main effects of pitch (p < 0.005) and delay (p < 0.005) were also significant.

Table 2: ANOVA results for various delay times or pitch shifts.
                       F        P        Judge
  Delay               2.02     0.004     **
  Pitch               6.93    <0.001     **
  0 halftone line     (values illegible in source)
  2 halftone line     0.81     0.569
  4 halftone line     2.60     0.029     *
  6 halftone line     0.89     0.506
  12 halftone line    1.21     0.316
  200 ms delay line   4.88     0.003     **
  ** p < 0.005, * p < 0.05.
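A sketch of the one-way ANOVA over delay levels on one halftone line, using SciPy's f_oneway on hypothetical per-subject mora ratios (stand-in data, not the study's):

```python
import numpy as np
from scipy.stats import f_oneway

# Eight subjects' mora ratios at each of the seven delay levels (stand-ins)
rng = np.random.default_rng(0)
delays = [0, 50, 100, 200, 300, 400, 700]
groups = [rng.normal(1.2, 0.15, size=8) for _ in delays]

f_stat, p_value = f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")
```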

Fig. 1: 95% confidence intervals of the mora ratio along the 200-ms-delay line (y-axis: mora ratio, about 1.0-1.7; x-axis: pitch shift of +0, +2, +4, +6 and +12 halftones).
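A sketch of the 95% confidence interval computation behind Fig. 1, assuming a t-interval over the eight subjects' mora ratios (made-up sample values):

```python
import numpy as np
from scipy import stats

def ci95(samples):
    """95% confidence interval of the mean, t-distribution, as in Fig. 1."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    sem = samples.std(ddof=1) / np.sqrt(n)
    half = stats.t.ppf(0.975, df=n - 1) * sem
    m = samples.mean()
    return m - half, m + half

# e.g. eight subjects' mora ratios at 200 ms delay, +12 halftones (made up)
print(ci95([1.05, 1.10, 0.98, 1.22, 1.15, 1.08, 1.02, 1.19]))
```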
The one-way ANOVA was applied over the delay-time range from 0 to 700 ms at each halftone line. According to the results, at the 0 halftone line there was a significant difference in delay time. At the 2, 6, and 12 halftone lines, however, there was no significant difference in delay time, and the mora ratio approached 1. These results suggest that, for normal subjects, the DAF effect decreases as the pitch shift increases.

Figure 1 shows the effect of the pitch shift on the 200-ms-delay line. The DAF effect is not immediately suppressed when the pitch shift increases; rather, it is gradually suppressed in proportion to the amount of pitch shift. This result suggests that the feedback controller of articulation is suppressed in proportion to the pitch shift. Also, the confidence interval at 12 halftones is narrower than that at 0 or 2 halftones, indicating that normal DAF with a 200 ms delay does not have a uniform effect on all subjects, whereas DAF with a sufficient pitch shift has only a minor effect on most subjects. On the 200-ms-delay line, ANOVA also revealed significant differences (Table 2).

Although it was not a primary observation in this experiment, most of the subjects displayed the same phenomenon as the Lombard effect [1]. Interestingly, on the 200-ms-delay line, the voice pitch under a 2 halftone shift was higher than that under 12 halftones in many cases. This outcome suggests that under the 2 halftone condition subjects did not interpret the feedback voice as their own voice or as that of others, although they did make an effort to monitor and control their own voice. More detailed investigation of this phenomenon is necessary.

3. Discussion
The experimental results showed that the DAF effect is suppressed in proportion to the pitch shift in normal subjects, suggesting a process wherein the feedback control of articulation is suppressed when the pitch of the feedback voice is changed. This study focused on normal subjects; however, there is a possibility that the same mechanism also underlies the chorus effect or FAF for stutterers. That is, stuttering is suppressed when a voice that is different from one's own voice is fed back to the auditory system. This phenomenon can be called the "others' voice effect." The effect implies the critical concept that stuttering is related to the feedback control system of speech. In this study we only measured the effect of an upward pitch shift, though other methods, such as recording the effect of a decrease in pitch, or a technique such as a formant change, might be adopted for the same purpose. Further study is necessary on this point.

We can assume some possibilities for a concrete neural circuit suppressing the feedback control of articulation. The first is a filtering-like mechanism that can distinguish one's own voice from others' voices within the articulation control loop and then suppresses the feedback controller when it judges the voice to be that of another. The second is that the feedback loop of pitch exists independently of the feedback loop of articulation, and that there is an inhibition circuit from the former loop to the latter. When the pitch shift increases, the feedback loop of articulation is suppressed by the pitch loop, resulting in fluent speech even under the DAF condition.

Howell and Sackin suggested that subjects performing the DAF task recognize the feedback voice as noise, because speakers increase their vocal level in a way that resembles the Lombard effect when DAF sounds are amplified [10]. However, the feedback voice during the DAF task has a specific vocal spectral structure, and that spectral structure differs greatly from that of real noise. In addition, the voice has enough unique characteristics for subjects to distinguish their own voice from those of others. Here, we introduce the notion of voice levels to characterize the process of articulation feedback control between one's own voice and noise. Indeed, according to our results, there is a possibility that the subject changes the interpretation of the feedback voice on the basis of the amount of pitch shift. First, under the parameter setting of 12 halftones, it is highly plausible that subjects identify the feedback voice as others' voice or noise, because the mora ratio is close to 1 (Fig. 1). In this setting, others' voice and noise seem to have the same effect. However, under normal DAF conditions, the feedback voice always interferes with speech production. We can interpret this as subjects recognizing their own voices more strongly than in the case of the 12 halftone shift. Under settings between 0 and 12 halftone shifts, the recognition of the voice changed from their own voice to that of another in proportion to the pitch shift.

Unfortunately, we cannot yet identify the neural circuit that explains the chorus effect or FAF on the basis of our results alone. Future work will involve identifying the mechanism through various means. One is to conduct DAF and TAF tasks in combination with noninvasive brain-measuring techniques such as fMRI, EEG or MEG. If the brain region for each function is identified, we may get closer to resolving the above issues.

4. Conclusion
In this study, we first highlighted the possibility of brain neural circuits that distinguish one's own voice from those of others. Second, we reported experimental results based on a combination of DAF and TAF settings for normal subjects. The results showed that the DAF effect decreased in proportion to the pitch shift. Finally, we discussed some possibilities that can explain this result and the chorus effect.

Acknowledgement
We thank Dr. Koichi Mori of the Research Institute of the National Rehabilitation Center for the Disabled, Japan, for many helpful suggestions. This work was supported by a Sasakawa Scientific Research Grant from The Japan Science Society.

References

[1] H. L. Lane and B. Tranel, "The Lombard sign and the role of hearing in speech," J. Speech Lang. Hear. Res., 14, 677-709 (1971).
[2] B. S. Lee, "Effects of delayed speech feedback," J. Acoust. Soc. Am., 22, 824-826 (1950).
[3] H. Kawahara, "Interaction between speech production and perturbations on fundamental frequencies," J. Acoust. Soc. Jpn. (E), 15, 201-202 (1994).
[4] H. Kawahara, H. Kato and J. C. Williams, "Effects of auditory feedback on F0 trajectory generation," Proc. 4th Int. Conf. Spoken Language Processing, pp. 287-290 (1996).
[5] T. C. Hain, T. A. Burnett, S. Kiran, C. R. Larson, S. Singh and M. K. Kenney, "Instructing subjects to make a voluntary response reveals the presence of two components to the audio-vocal reflex," Exp. Brain Res., 130, 133-141 (2000).
[6] Y. Sato, K. Mori and Y. Fukushima, "Temporal characteristics of fundamental frequency control by auditory feedback and its application to stuttering evaluation," Tech. Rep. Psychol. Acoust., Acoust. Soc. Jpn., SP2001-148, pp. 25-30 (2002).
[7] W. Johnson and L. Rosen, "Effects of certain changes in speech patterns upon the frequency of stuttering," J. Speech Disord., 2, 101-104 (1937).
[8] R. J. Ingham, R. J. Moglia, P. Frank, J. C. Ingham and A. K. Cordes, "Experimental investigation of the effects of frequency-altered auditory feedback on the speech of adults who stutter," J. Speech Lang. Hear. Res., 40, 361-372 (1997).
[9] F. H. Guenther, "A neural network model of speech acquisition and motor equivalent speech production," Biol. Cybern., 72, 43-53 (1994).
[10] P. Howell and S. Sackin, "Timing interference to speech in altered listening conditions," J. Acoust. Soc. Am., 111, 2842-2852 (2002).

 


 


 
