Synthesis of the laryngeal source of throat singing using a 2×2-mass model
Ken-Ichi Sakakibara, Hiroshi Imagawa, Seiji Niimi, Naotoshi Osaka
Singing voices have various timbres. Throat singing and some other Asian traditional singing voices have a pressed timbre that differs significantly from that of the European classical singing voice. In our previous study on throat singing, vibration of the false vocal folds, in addition to that of the vocal folds, was observed and found to be essential to the pressed timbre. This paper describes a 2×2-mass model as a physical model, defines an adduction parameterization of its parameters, and presents a simulation of vocal fold and false vocal fold vibrations in the larynx. Furthermore, a visual simulator of the laryngeal movements is demonstrated. Using this model, the vibration patterns of the two laryngeal voices in throat singing (the squeezed and kargyraa voices) and of the normal pressed voice have been simulated.
The results suggest that various singing timbres can be synthesized with this model.
The singing voice has numerous variations of timbre. There are considerable differences, for instance, between European classical singing voices, such as bel canto and German lied, and Asian traditional pressed singing voices, such as throat singing, Japanese Youkyoku, and Korean Pansori.
The laryngeal source is an essential factor in determining the timbre of the singing voice, especially for pressed quality. In general, the pressed quality is obtained by excessive adduction of the supraglottal structure. The laryngeal adjustments in Asian traditional pressed singing differ considerably from those in European classical singing [5, 6, 9].
Synthesizing such varying timbres in singing voices requires a flexible laryngeal source model. A glottal waveform model allows us to control its parameters to approximate the perception of voice [8, 10]. On the other hand, a physical model allows us to control its parameters according to the physical and physiological mechanism of laryngeal adjustment. Based on physiological observations, we have constructed a 2×2-mass model, a physical model devised by attaching a two-mass model for the false vocal folds to the ordinary two-mass model for the vocal folds [3, 10].
In this paper, after summarizing the physiological observations in throat singing, we describe the mechanism of a 2×2-mass model and its adduction parameterization. We also present a visual simulation tool for the model. Finally, using the model, we simulate the laryngeal sources of throat singing and the normal pressed voice.
2 Laryngeal Source in throat singing
2.1 Throat singing
Throat singing is a traditional singing style of people who live around the
The production of the high-pitched overtone is mainly due to the pipe resonance of the cavity from the larynx to the point of articulation in the vocal tract. On the other hand, the laryngeal voice of throat singing has a special pressed timbre and supports the generation of the overtone.
The laryngeal voices of throat singing can be classified as squeezed and kargyraa based on the listener’s impression, acoustical characteristics, and the singers’ own observations of voice production. The squeezed voice is the basic laryngeal voice in throat singing and is used as a drone. The kargyraa voice is a very low-pitched voice whose range lies outside the modal register.
2.2 False vocal folds
The false vocal folds (ventricular folds) are a pair of soft and flaccid folds which attach to the anterolateral surface of the arytenoid cartilages (Fig. 1). While the vocal folds (VFs) have a mechanism that changes their stiffness, thickness, and length by the muscles (mainly by the action of the thyroarytenoid muscle), the false vocal folds (FVFs) are incapable of becoming tense, since they contain very few muscle fibres. The FVFs are capable of moving with the arytenoid cartilages. They are also abducted and adducted by the action of certain laryngeal muscles. In normal phonation, they do not vibrate.
2.3 Physiological observation of laryngeal movements
Here, we summarize the results of the physiological observation of laryngeal movements using simultaneous recording of high-speed digital images, EGG, and sound waveforms in [9, 10].
The common features of the squeezed and kargyraa voices are an overall constriction of the suprastructures of the glottis and vibration of the FVFs. The differences lie in the narrowness of the constriction and the manner of FVF vibration. In the squeezed voice, the FVFs vibrate at the same frequency as the VFs, and the two vibrate in opposite phase. In the kargyraa voice, the FVFs can be assumed to close once for every two periods of closure of the VFs, and contribute to the generation of the subharmonic tone of kargyraa [2, 6, 7, 9, 10].
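The closure relationship described above can be checked numerically from closure instants, such as those read off high-speed images or EGG. The following is a minimal sketch: the function names and the closure times are hypothetical illustrations, not data from the recordings.

```python
from statistics import median


def median_period(closure_times):
    """Median interval (s) between successive closure instants."""
    intervals = [b - a for a, b in zip(closure_times, closure_times[1:])]
    return median(intervals)


def closure_ratio(vf_closures, fvf_closures):
    """Number of VF periods per FVF period, rounded to the nearest integer."""
    return round(median_period(fvf_closures) / median_period(vf_closures))


# Hypothetical closure instants: VFs close every 10 ms, FVFs every 20 ms.
vf = [0.010 * i for i in range(20)]
fvf = [0.020 * i for i in range(10)]

ratio = closure_ratio(vf, fvf)              # VF periods per FVF closure
subharmonic_hz = 1.0 / median_period(fvf)   # frequency of the FVF closures
```

With these assumed instants the FVFs close once per two VF periods, so the FVF closure rate is half the VF rate, which corresponds to the subharmonic tone described for kargyraa.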
3 Physical model
3.1 Two-mass model
The VF vibrations are modelled via the two-mass model, which makes it possible to simulate the movements of the upper and lower portions of the VFs with a phase difference. The model parameters are defined as follows. m1, m2: paired masses of the upper and lower portions of the VF; d1, d2: thicknesses; k1, k2: stiffnesses; r1, r2: viscous resistances; ζ1, ζ2: damping ratios, which satisfy $r_i = 2\zeta_i\sqrt{m_i k_i}$ ($i = 1, 2$); kc: stiffness of the linear coupling spring for the upper and lower portions; lg: the length of the glottis; Ag1, Ag2: the cross-sectional areas between masses; Ag01, Ag02: the cross-sectional areas between masses at rest.
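The viscous resistances follow from the damping ratios. The sketch below assumes the usual damped-oscillator relation r = 2ζ√(mk). The numerical values are illustrative CGS values of the order often quoted for two-mass models. They are assumptions, not the constants used in this paper.

```python
import math


def viscous_resistance(zeta, mass, stiffness):
    """Viscous resistance r = 2 * zeta * sqrt(m * k) of a mass-spring pair."""
    return 2.0 * zeta * math.sqrt(mass * stiffness)


# Illustrative values only (assumed): mass in g, stiffness in dyn/cm.
m1, k1, zeta1 = 0.125, 80000.0, 0.1
m2, k2, zeta2 = 0.025, 8000.0, 0.6

r1 = viscous_resistance(zeta1, m1, k1)  # dyn*s/cm
r2 = viscous_resistance(zeta2, m2, k2)  # dyn*s/cm
```

Defining the resistance through a damping ratio keeps the relative damping of each mass unchanged when its mass and stiffness are rescaled together.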
A tension parameter Q, which controls the pitch of the synthesized sound, is used to parameterize several model parameters related to the physical properties of the VF as follows:
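The scaling relations themselves are not reproduced in this text. As a hedged illustration, the sketch below assumes the convention commonly used with two-mass models, in which masses and thicknesses are divided by Q and stiffnesses are multiplied by Q. Both this convention and the rest values are assumptions, not taken from this paper.

```python
import math


def apply_tension(q, m0, k0, d0):
    """Scale rest values by tension Q (assumed convention: m0/Q, k0*Q, d0/Q)."""
    return m0 / q, k0 * q, d0 / q


def natural_frequency_hz(mass, stiffness):
    """Undamped natural frequency of a single mass-spring pair."""
    return math.sqrt(stiffness / mass) / (2.0 * math.pi)


# Illustrative rest values (assumed): g, dyn/cm, cm.
m0, k0, d0 = 0.125, 80000.0, 0.25

f_q1 = natural_frequency_hz(*apply_tension(1.0, m0, k0, d0)[:2])
f_q2 = natural_frequency_hz(*apply_tension(2.0, m0, k0, d0)[:2])
```

Under this assumed convention, the natural frequency of each mass-spring pair is proportional to Q, which is one way a single parameter can control pitch.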
3.2 2×2-mass model
For a physical simulation of the VF and FVF vibrations, we have proposed a 2×2-mass model as a self-oscillating model of VF and FVF vibrations. The model (Fig. 2) was devised by attaching a two-mass model for the FVFs to the ordinary two-mass model for the VFs, with a laryngeal ventricle space between the two.
The laryngeal ventricle is assumed to be a cylinder that does not deform. Mechanical transmission of vibrations between the VFs and FVFs is not considered. The area of the vocal tract section that interacts acoustically with the VF vibration varies in time with the FVF vibrations.
Control parameters for
We adopt a two-mass model rather than a one-mass model for the FVFs because the FVFs are as thick as the VFs, and because a two-mass model reproduces the movement of a one-mass model if kc is set sufficiently large.
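The effect of a large coupling stiffness can be illustrated with a toy calculation. The sketch below releases two spring-coupled, damped masses from the same displacement and records how far apart they drift. It leaves out the aerodynamic driving force entirely, and all constants are illustrative assumptions rather than the FVF parameters of this paper.

```python
def max_separation(kc, duration=0.02, dt=1e-6):
    """Largest |x1 - x2| (cm) for two coupled masses released from rest."""
    # Illustrative CGS constants (assumed): g, dyn/cm, dyn*s/cm.
    m1, k1, r1 = 0.125, 80000.0, 20.0
    m2, k2, r2 = 0.025, 8000.0, 17.0
    x1 = x2 = 0.1   # both masses start at the same displacement
    v1 = v2 = 0.0
    worst = 0.0
    for _ in range(int(duration / dt)):
        # Restoring spring, viscous loss, and linear coupling spring.
        f1 = -k1 * x1 - r1 * v1 - kc * (x1 - x2)
        f2 = -k2 * x2 - r2 * v2 - kc * (x2 - x1)
        # Semi-implicit Euler: update velocities first, then positions.
        v1 += dt * f1 / m1
        v2 += dt * f2 / m2
        x1 += dt * v1
        x2 += dt * v2
        worst = max(worst, abs(x1 - x2))
    return worst


loose = max_separation(kc=25000.0)   # moderate coupling
stiff = max_separation(kc=1.0e7)     # very stiff coupling
```

With the stiff coupling the two masses stay much closer together than with the moderate coupling, so the pair moves almost as a single mass.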
3.3 Adduction parameter for the false vocal folds
As stated above, the FVFs contain few muscle fibres and, unlike the VFs, their physical properties essentially do not change. Therefore, it is meaningless to define a tension parameter for FVFs. Hence some other parameterization is necessary.
It is a physiological fact that the FVFs are adducted by the action of certain laryngeal muscles, but it is unclear whether their physiological properties, such as mass and stiffness, are changed by the adduction. We take into account the changing shape of the FVFs and, as one possible parameterization of the model parameters, introduce an adduction parameter Q’ for
the validity of this parameterization, we must wait for detailed measurements of the physical properties of the FVFs using fresh excised human larynges.
4 Visual Environment
A visual simulation tool called VibLaVie (vibrated larynx viewer) is implemented on a Windows PC. Fig. 3 shows its main panel.
Default initial values are provided, but users can set arbitrary initial parameters in the initial parameter setting panel. After setting the initial parameters, users can also define piecewise-linear envelopes that describe the time variation of the parameters. Fig. 4 shows the displacements of the masses, the laryngeal airflow, and a synthesized mouth-output sound obtained by convolving the laryngeal airflow with a vocal tract resonator whose formant parameters can also be set by users. Fig. 5 shows the VF and FVF vibration visualization panel, in which users can watch the vibrations in the larynx. This visual environment is very useful for simulating the model, which has many interdependent parameters and behaves as a chaotic complex system.
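A segmentally linear (piecewise-linear) parameter envelope of the kind described above can be sketched as follows. This is not the VibLaVie implementation: the function name, the breakpoint format, and the example values are hypothetical.

```python
from bisect import bisect_right


def piecewise_linear(breakpoints, t):
    """Evaluate an envelope given as time-sorted (time, value) breakpoints."""
    times = [time for time, _ in breakpoints]
    if t <= times[0]:
        return breakpoints[0][1]    # hold the first value before the start
    if t >= times[-1]:
        return breakpoints[-1][1]   # hold the last value after the end
    i = bisect_right(times, t)      # index of the first breakpoint after t
    (t0, v0), (t1, v1) = breakpoints[i - 1], breakpoints[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


# Hypothetical envelope for a control parameter over 2 s of simulated time.
envelope = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
value_at_half = piecewise_linear(envelope, 0.5)
value_at_one_and_half = piecewise_linear(envelope, 1.5)
```

Evaluating such an envelope at every simulation step gives a time-varying value for any control parameter.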
5.1 Basic parameter setting
We set the initial values of VF parameters as follows:
These constants are the same as the ones in , which was deduced from physiological measurements. We also set the initial values of the parameters for the FVFs and laryngeal ventricles as follows:
These constants are not precisely based on physiological measurements. However, the length and width of the false glottis and the thickness of the FVFs were estimated from images and are not far from the real values. It was verified using MRI that the laryngeal ventricle space exists in throat singing phonation. The vocal tract is assumed to be a uniform pipe, 16 cm long and 5 cm² in cross-section.
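For orientation, the resonances of a uniform 16 cm pipe can be estimated with the quarter-wavelength formula F_n = (2n − 1)c / (4L). The sketch below assumes a lossless tube closed at the glottis and open at the lips and a sound speed of 350 m/s. Both are assumptions, not values stated in this paper.

```python
def quarter_wave_resonances(length_cm, count=3, sound_speed_cm_s=35000.0):
    """Resonances (Hz) of a lossless uniform tube, closed at one end."""
    return [
        (2 * n - 1) * sound_speed_cm_s / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# 16 cm uniform vocal tract; the sound speed is an assumed value.
formants = quarter_wave_resonances(16.0)
```

In this idealization the cross-sectional area does not affect the resonance frequencies. Only the length and the sound speed enter the formula.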
5.2 Results and discussions
We chose several values from 0 to 1.0 cm for the adduction parameter Q’. The results are shown in Fig. 6; for each Q’, the horizontal displacements of m1, m2, m’1, and m’2 are shown at the top and the laryngeal airflow (volume velocity) Ug is shown at the bottom. In the top panel, the solid line, dashed line, dotted line, and dashed-dotted line show the displacements of m1, m2, m’1, and m’2, respectively.
The normal pressed voice, without vibration of the FVFs, is observed when Q’ = 0.05 and 0.1. In general, this type of phonation is observed in normal phonation and in some Japanese traditional singing voices. The false glottis is somewhat wider than that in throat singing. The simulation based on the 2×2-mass model is in good agreement with the observations. A period-triple kargyraa, in which the FVFs vibrate once every three periods of VF vibration, is observed when Q’ = 0.35. In this pattern, the pitch of the subharmonic tone should be perceived an octave and a perfect fifth lower than that of the basic phonation. Some throat singers are known to be able to sing the period-triple kargyraa. The normal kargyraa vibration occurs when Q’ = 0.5. For this vibration, the shape of the laryngeal airflow also agrees with that estimated by inverse filtering. When Q’ = 0.6, the vibration is not periodic or may have a very long period (> 1 s). When Q’ = 0.7, the period-triple kargyraa is observed again. The squeezed voice of throat singing is observed when Q’ = 0.85 and 1.0.
The phase difference between the VF and FVF vibrations at Q’ = 1.0 differs from that at Q’ = 0.85. The shape of the simulated laryngeal airflow agrees with the laryngeal airflow estimated by inverse filtering.
According to the physiological observations [9, 10], the vibration patterns depend on how closely the FVFs are approximated. The squeezed voice vibration was observed with close approximation, and the kargyraa voice vibration with intermediate approximation. The simulation results agree with these physiological observations.
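The pitch intervals implied by the period-double and period-triple patterns can be verified with a short calculation. When the FVFs close once every n periods of VF vibration, the subharmonic lies at 1/n of the VF frequency, that is, 12·log2(n) equal-tempered semitones lower. The function name below is a hypothetical illustration.

```python
import math


def semitones_below(period_multiple):
    """Interval (semitones) between the VF pitch and its 1/n subharmonic."""
    return 12.0 * math.log2(period_multiple)


normal_kargyraa = semitones_below(2)         # period doubling
period_triple_kargyraa = semitones_below(3)  # period tripling
```

Period doubling gives exactly 12 semitones (one octave), and period tripling gives about 19 semitones, which is an octave plus a perfect fifth.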
We simulated laryngeal movements for throat singing using a 2×2-mass model. The results were in good agreement with physiological observations. By using the model, it is possible to synthesize various laryngeal voices. As future work, we should measure the realistic physical properties of the FVF and improve the model, and investigate details of this model from the viewpoint of physics and chaos.
We would like to thank Seiji Adachi, Takafumi Hikichi, Kiyoshi Honda, Emi Z. Murano, Johan Sundberg, Sayoko Takano, Niro Tayama, and Masahiko Todoriki for their helpful discussions. We also would like to thank the reviewers for their useful comments.
[1] S. Adachi and M. Yamada. An acoustical study of sound production in biphonic singing xöömij. J. Acoust. Soc. Am., Vol. 105, No. 5, pp. 2920–2932, 1999.
[2] L. Fuks, B. Hammarberg, and J. Sundberg. A self-sustained vocal-ventricular phonation mode: acoustical, aerodynamic and glottographic evidences. KTH TMH-QPSR, Vol. 3/1998, pp. 49–59, 1998.
[3] H. Imagawa, K.-I. Sakakibara, T. Konishi, E. Z. Murano, and S. Niimi. Throat singing synthesis by a laryngeal voice model based on vocal fold and false vocal fold vibrations. Proc. of Study Group on Musical Info. of IPSJ, Vol. 01-MUS-39, pp. 71–78, 2001. In Japanese.
[4] K. Ishizaka and J. L. Flanagan. Synthesis of voiced sounds from a two-mass model of the vocal cords.
[5] N. Kobayashi, Y. Tohkura, S. Tenpaku, and S. Niimi. Acoustic and physiological characteristics of traditional singing in
[6] T. C. Levin and M. E. Edgerton. The throat singers of Tuva. Scientific
[7] P.-A. Lindestad, M. Sodersten, B. Merker, and S. Granqvist. Voice source characteristics in Mongolian “throat singing” studied with high-speed imaging technique, acoustic spectra, and inverse filtering. J. Voice, Vol. 15, No. 1, pp. 78–85, 2001.
[8] H.-L. Liu and J. O. Smith III. Glottal source modelling for singing voice synthesis. In Proc. ICMC 2000, pp. 90–97. ICMA, 2000.
[9] M. Kumada, M. Todoriki, H. Imagawa, and S. Niimi. Analysis of vocal fold vibrations in throat singing. Tech. Rep. Musical Acoust. of Acoust. Soc. Jpn., Vol. 19, No. 4, pp. 41–48, 2000. In Japanese.
[10] K.-I. Sakakibara, T. Konishi, K. Kondo, E. Z. Murano, M. Kumada, H. Imagawa, and S. Niimi. Vocal fold and false vocal fold vibrations and synthesis of khöömei. In Proc. ICMC 2001, pp. 135–138. ICMA, 2001.
[11] W. R. Zemlin. Speech and hearing science: anatomy and physiology. Allyn and Bacon, 4th edition, 1998.