**The Laryngeal Flow Model for Pressed-Type Singing Voices**

**Ken-Ichi Sakakibara, Hiroshi Imagawa, Seiji Niimi, Naotoshi Osaka 2006**

**Abstract**

Asian traditional
pressed-type singing voices are different from the European traditional singing
voice in their timbre and voice production mechanism. In throat singing, the
ventricular folds and true vocal folds vibrate, resulting in the generation of
the special laryngeal voice. On the other hand, in some other pressed-type
singing voices, such as Japanese Min-yoh, the ventricular folds only
approximate but do not vibrate.

We propose a new
laryngeal flow model incorporating the effect of the ventricular fold vibration
and laryngeal ventricle resonance. The model is a combination of the known
glottal airflow model (R-model), the laryngeal ventricle resonance (Helmholtz
resonator), and the modulation of ventricular fold vibration. We will also
demonstrate the relation between model parameters and voice quality. The
results show that the proposed model is effective for synthesizing the
pressed-type singing voices.

**1. Introduction**

Non-interactive parametric
glottal models assume that there are no interactions between the glottal source
and vocal tract [2]. The glottal source is described by using mathematical
equations. Such models are very effective in speech synthesis and coding and
therefore have been used in many studies. The R-model [8] and LF-model [3] have
become reference models for this type of model.

All of these models
assume that the laryngeal voice source is determined by vocal fold vibratory
patterns and intend to control voice quality by changing vocal fold vibratory
parameters, such as the open quotient (OQ), speed quotient (SQ), closing
quotient (CQ), and amplitude quotient (AQ) [1]. However, in throat singing, the
vibration of the ventricular folds (VTFs), also referred to as the false vocal
folds, and strong constriction of the supraglottic structure are observed [11],
and in some Asian traditional pressed-type singing, such as Japanese Min-yoh,
the constriction of the supraglottic structure is also observed, though the
VTFs do not vibrate [5]. Therefore, for synthesis of various styles of singing
voices, besides the vocal fold vibration, the effects of the VTF vibration and
resonance of the laryngeal ventricle must be considered.

In this paper, we
propose a new laryngeal model based on glottal flow, laryngeal ventricle
resonance, and the modulation of the VTF vibration. As the
laryngeal source for the source-filter synthesis, the proposed model is able to
control various timbres of singing voices.

**2. The Laryngeal Flow Model With Ventricular-Fold Vibratory Modulation**

__2.1. VTF-modulation
model__

VTF vibration is observed
in various types of phonation. In throat singing, both drone and kargyraa voice
phonations are always accompanied by VTF vibrations, as well as vocal fold (VF)
vibrations. In the drone voice, the VTFs vibrate with the same
period as the VFs, and in the kargyraa voice, the VTFs vibrate with a period that is an integer
multiple (usually double or triple) of the VF period [4, 6, 11]. The results
of a simulation using a 2x2-mass model suggest the possible vibratory patterns
of the VFs and VTFs [9, 11].

Here, we use
“laryngeal flow (source)” to mean the airflow through the VTF slit, and
“glottal airflow (source)” to mean the airflow through the slit of the VFs. The
laryngeal flows of drone and kargyraa for two different singers are shown in
Fig. 1. These flows were obtained from recorded sounds by inverse-filter
analysis. We marked five poles on the spectrum in the range from 0 to 5 kHz,
constructed the inverse filter, and manually adjusted it to make the result
smooth. By combining the results of high-speed images, EGG waveforms, and these
inverse-filtered laryngeal sources, we concluded that, in throat singing, the
VTF vibration is indispensable for the generation of the laryngeal flow.
Therefore, modelling the laryngeal flow in throat singing requires a new
laryngeal model that includes the effect of VTF vibration.

The VTF-modulation
model ũ (t) is simply defined as follows:
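Consistent with the definitions of the glottal flow u(t) and the VTF-modulation function M(t) given in this section, and stated here as an assumption, the model can be read as the product of the two:

$$
\tilde{u}(t) = M(t)\,u(t)
$$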

A block diagram of the
model is shown in Fig. 2. In this paper, we choose a simple R-model
[8] for the glottal flow. The R-model is described as follows:
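A trigonometric pulse with exactly these four parameters, commonly known as the Rosenberg model, is given here as an assumption about the intended form rather than as a quotation of [8]:

$$
u(t) =
\begin{cases}
\dfrac{\alpha}{2}\left(1 - \cos\dfrac{\pi t}{T_p}\right), & 0 \le t \le T_p \\[2ex]
\alpha \cos\dfrac{\pi\,(t - T_p)}{2\,T_n}, & T_p < t \le T_p + T_n \\[2ex]
0, & T_p + T_n < t \le T_0
\end{cases}
$$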

where α is the amplitude,
Tp the opening time, Tn the closing time, and T0 the period. All of these variables are in
R>0. The open quotient (OQ) is written as (Tp + Tn) / T0.
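The pulse parameters and the open quotient described above can be sketched in Python. The raised-cosine rise and quarter-cosine fall used here are the commonly used Rosenberg-type shape and are an assumption about the R-model intended in [8]; the function names are illustrative only.

```python
import math


def r_model(t, alpha, Tp, Tn, T0):
    """Glottal flow u(t) for an assumed Rosenberg-type trigonometric pulse."""
    if min(alpha, Tp, Tn, T0) <= 0 or Tp + Tn > T0:
        raise ValueError("need positive parameters with Tp + Tn <= T0")
    tau = t % T0  # time since the start of the current period
    if tau <= Tp:
        # opening phase: raised cosine rising from 0 to alpha
        return 0.5 * alpha * (1.0 - math.cos(math.pi * tau / Tp))
    if tau <= Tp + Tn:
        # closing phase: quarter cosine falling from alpha to 0
        # (clamped so rounding never produces a tiny negative flow)
        return max(0.0, alpha * math.cos(math.pi * (tau - Tp) / (2.0 * Tn)))
    return 0.0  # closed phase


def open_quotient(Tp, Tn, T0):
    """OQ = (Tp + Tn) / T0, the fraction of the period with non-zero flow."""
    return (Tp + Tn) / T0
```

With T0 = 8 ms, Tp/T0 = 0.42 and Tn/T0 = 0.18, `open_quotient` evaluates to 0.6 up to rounding.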

The vibratory patterns
of the VTFs were observed using
high-speed images and do not seem
to be exactly sinusoidal [7, 10, 11]. However, here we
define the VTF-modulation function M (t) as the false glottal area function
A’g multiplied by a constant M. We also define A’g as a sine function:
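A sinusoidal area function with a rest area, an amplitude, a frequency, and a phase can be written as below. This is an assumed form chosen to match the symbols used in this section:

$$
A'_g(t) = A'_{g0} + \alpha' \sin(\omega' t + \theta'), \qquad M(t) = M\,A'_g(t)
$$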

where α’ represents
the amplitude of the VTF vibration, A’g0 the area between the VTFs at rest,
ω’ the frequency of the VTF vibration, and θ’ the phase difference of
the VTF vibration from the VF vibration. All of these are in R>0. Physiological observations and the simulation
using the 2x2-mass model suggest that the periods of the VF and VTF vibrations
satisfy 2π/ω’ = nT0, where n ∈ Z>0.
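A minimal sketch of the VTF-modulated flow follows. It assumes the product form M · A’g(t) · u(t), a sinusoidal A’g(t), and a Rosenberg-type trigonometric pulse for u(t); all function and argument names are hypothetical.

```python
import math


def glottal_flow(t, alpha, Tp, Tn, T0):
    """Assumed Rosenberg-type pulse standing in for the R-model u(t)."""
    tau = t % T0
    if tau <= Tp:
        return 0.5 * alpha * (1.0 - math.cos(math.pi * tau / Tp))
    if tau <= Tp + Tn:
        return max(0.0, alpha * math.cos(math.pi * (tau - Tp) / (2.0 * Tn)))
    return 0.0


def false_glottal_area(t, area_rest, amp, omega_vtf, theta):
    """Assumed sinusoid: A'g(t) = A'g0 + alpha' * sin(omega' * t + theta')."""
    return area_rest + amp * math.sin(omega_vtf * t + theta)


def laryngeal_flow(t, M, n, theta, area_rest, amp, alpha, Tp, Tn, T0):
    """Assumed product form M * A'g(t) * u(t), with 2*pi/omega' = n*T0."""
    omega_vtf = 2.0 * math.pi / (n * T0)  # VTF period is n times the VF period
    area = false_glottal_area(t, area_rest, amp, omega_vtf, theta)
    return M * area * glottal_flow(t, alpha, Tp, Tn, T0)
```

In this sketch, n = 1 gives a flow that repeats every T0, as in the drone voice, while n = 2 gives a flow that repeats only every 2T0.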

**2.2. VTF-modulation and
LVT-resonance model**

The laryngeal
ventricle is the space between the VFs and VTFs. When the VTFs are strongly constricted,
it seems that the effect of this small space on the laryngeal voice cannot be
ignored. The physical model simulation suggests that some acoustic effects
occur around 2000 Hz [9, 11]. The inverse-filtered laryngeal voices of throat
singing have some ripples (Fig. 1), which largely agree with the physical model
simulation results [7]; hence, an appropriate model with laryngeal ventricle
resonance is required. Fig. 3 shows spectra of the drone voices of two
different singers.

A block diagram of our
proposed model (VTF-modulation and LVT-resonance model) is shown in Fig. 4.

The model was obtained
as follows: the glottal airflow is convolved with the time-variant laryngeal
ventricle resonator, which depends on the VTF vibration, and is then modulated by the vibration
of the VTFs.

We denote the resonator of
the laryngeal ventricle by h [t] (z). Then, the laryngeal voice with the
laryngeal ventricle and VTF modulation is described as:

We realize h [t] (z) as a time-varying
one-pole filter. We calculate the resonance frequency of the laryngeal
ventricle, i.e. the frequency of the pole of h [t], by means of a Helmholtz
resonator. Let Fv (t) be the resonance frequency, d’ be the thickness of the VTFs,
and Vv be the volume of the laryngeal ventricle. Then,

where c is the sound
velocity, 3.53 × 10⁴ cm/s. To permit flexible control, we define the
bandwidth of the resonance as the product of a variable K, which changes
depending on the phonation type, and the bandwidth of the Helmholtz resonator.
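For reference, the textbook Helmholtz relation is given below, with the opening between the VTFs as the neck area, d’ as the neck length, and Vv as the cavity volume. Identifying the neck area with A’g(t) is an assumption made here:

$$
F_v(t) = \frac{c}{2\pi}\sqrt{\frac{A'_g(t)}{d'\,V_v}}
$$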

The resistance Rv (t),
inductance Lv (t), conductance G, and capacitance C satisfy the following
equations.

where ω := 2π/T0
is the frequency of the VF vibration, and dv the thickness of the laryngeal
ventricle. The constants are set as follows: the density of air ρ = 1.14 × 10⁻³
g/cm³; the viscosity μ = 1.86 × 10⁻⁴ dyn·s/cm²; the adiabatic gas constant
η = 1.4; and the specific heat ξ = 0.24 cal/(g·degree).
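The time-varying resonator of this section can be sketched as a complex one-pole filter whose pole frequency follows the textbook Helmholtz formula. This is only one possible reading of "one-pole filter". The fixed `bandwidth_hz`, the value of `C_SOUND`, and all names are assumptions; the K-scaled Helmholtz bandwidth is not modelled.

```python
import cmath
import math

C_SOUND = 3.53e4  # cm/s; assumed speed of sound in the larynx


def helmholtz_frequency(area, neck_length, volume, c=C_SOUND):
    """Textbook Helmholtz resonance in Hz (CGS units)."""
    return c / (2.0 * math.pi) * math.sqrt(area / (neck_length * volume))


def ventricle_resonator(x, areas, fs, neck_length, volume, bandwidth_hz):
    """Filter x with a complex one-pole resonator whose pole tracks areas[k].

    bandwidth_hz is a hypothetical fixed bandwidth used for illustration.
    """
    radius = math.exp(-math.pi * bandwidth_hz / fs)  # pole radius, below 1
    y_prev = 0j
    out = []
    for xk, area in zip(x, areas):
        freq = helmholtz_frequency(area, neck_length, volume)
        pole = radius * cmath.exp(2j * math.pi * freq / fs)
        y_prev = xk + pole * y_prev  # y[k] = x[k] + p[k] * y[k-1]
        out.append(y_prev.real)      # keep the real part as the output
    return out
```

Feeding the glottal flow samples as `x` and the sampled false glottal area as `areas`, then multiplying the output sample by sample with the modulation function, would give one discrete-time reading of the model described in this section.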

**3. Acoustical
Characteristics**

__3.1. VTF-modulation
model__

We study the effect of the
phase difference between the VF and VTF vibrations. In the VTF-modulation model,
we fix A’g0 = 0.5 max u (t) and α’ = 0.35 max u (t). For u (t), we also set T0 = 8 ms,
Tp/T0 = 0.42, Tn/T0 = 0.18, and hence OQ = 0.6.
With these settings, the spectral tilt of u (t) is close to 12 dB/octave. We also
set ω’ = ω, i.e. we study laryngeal flows such as the drone voice of
throat singing. The laryngeal flows for various θ’ are shown in Fig. 5.
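The settings above can be explored numerically with a short sketch that locates the flow maximum for several phase values. It assumes the product form A’g(t) · u(t) with M = 1, a sinusoidal A’g(t), a Rosenberg-type trigonometric pulse, and max u(t) normalised to 1; the names are illustrative only.

```python
import math

T0 = 8e-3                      # period (s)
TP, TN = 0.42 * T0, 0.18 * T0  # opening and closing times
U_MAX = 1.0                    # normalised max u(t); assumption
AREA_REST = 0.5 * U_MAX        # A'g0
AREA_AMP = 0.35 * U_MAX        # alpha'
OMEGA = 2.0 * math.pi / T0     # omega' = omega (drone case)


def u(t):
    """Assumed Rosenberg-type pulse with the parameters above."""
    tau = t % T0
    if tau <= TP:
        return 0.5 * U_MAX * (1.0 - math.cos(math.pi * tau / TP))
    if tau <= TP + TN:
        return max(0.0, U_MAX * math.cos(math.pi * (tau - TP) / (2.0 * TN)))
    return 0.0


def modulated(t, theta):
    """Assumed VTF-modulated flow with M = 1."""
    return (AREA_REST + AREA_AMP * math.sin(OMEGA * t + theta)) * u(t)


def peak_time(theta, steps=8000):
    """Instant in [0, T0) at which the modulated flow is largest."""
    times = [k * T0 / steps for k in range(steps)]
    return max(times, key=lambda t: modulated(t, theta))


if __name__ == "__main__":
    for theta in (0.0, -math.pi / 4, -math.pi / 2, -3 * math.pi / 4):
        print(f"theta' = {theta:+.3f} rad: peak at {peak_time(theta) / T0:.3f} T0")
```

In this sketch, θ’ shifts the instant of maximum flow within the open phase, which changes the balance between the rising and falling parts of the pulse.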

The EGG waveforms and
high-speed images of the same subjects as in Fig. 1 suggested that the phase delay
of the VTF vibration relative to the VF vibration should be around π/4, i.e. θ’ = −π/4 [11].
This value is also supported by its frequent appearance in the
physical model simulation [9]. In Fig. 5, the laryngeal flow for this θ’ shows
characteristics similar to those of the drone in Fig. 1, and the opening duration is
shorter than the closing duration.

Fig. 6 shows the spectral
envelopes of the laryngeal flows for different θ’. Among the values of θ’ in Fig.
6, the spectral tilt is largest when θ’ = −π/4 (−16 dB/octave) and smallest when
θ’ = −π/2 (−12 dB/octave).

__3.2. VTF-modulation
and LVT-resonance model__

We set A’g0 =
0.10 cm² and α’ = 0.05 cm². We normalize u(t) by multiplying it by a positive
real value and treat u(t) as the glottal area function. We set the maximal
glottal area max u(t) to 0.2 cm². We set the thickness of the VTFs d’ to 1.0 cm,
the cross-sectional area of the laryngeal ventricle to 1.5 cm², the depth
of the laryngeal ventricle to 0.5 cm, and K = 20 in Eq. (7). We used these values
to calculate h[t](z). The other values are the same as above. The
laryngeal flows with LVT resonance for various θ’ are shown in Fig. 7.
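As a rough check of these values, the textbook Helmholtz formula can be evaluated at the rest area. The sketch assumes that the ventricle volume is the cross-sectional area times the depth, that A’g0 is an area in cm², and that c = 3.53 × 10⁴ cm/s; the constant names are illustrative.

```python
import math

C_SOUND = 3.53e4       # cm/s; assumed speed of sound
AREA_REST = 0.10       # cm^2; A'g0, opening between the VTFs at rest
VTF_THICKNESS = 1.0    # cm; d', taken as the neck length
VENTRICLE_AREA = 1.5   # cm^2; cross-sectional area of the ventricle
VENTRICLE_DEPTH = 0.5  # cm; depth of the ventricle
VOLUME = VENTRICLE_AREA * VENTRICLE_DEPTH  # cm^3; assumed Vv = area * depth


def helmholtz_hz(area):
    """Textbook Helmholtz resonance frequency for the given neck area."""
    return C_SOUND / (2.0 * math.pi) * math.sqrt(area / (VTF_THICKNESS * VOLUME))


if __name__ == "__main__":
    print(f"resonance at rest: {helmholtz_hz(AREA_REST):.0f} Hz")
```

Under these assumptions the rest-area resonance comes out at about 2.05 kHz, in line with the 2000 Hz region discussed in the text.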

In all cases, ripples
are observed after the closure of the glottis. Fig. 8 shows spectra of two
flows. The effect of the VTF resonance is observed around 2000 Hz. This feature
is observed in all the synthesized sources.

__3.3. False glottal
area at rest__

When A’g0 is decreased, Eq.
(6) implies that the resonance frequency is pushed higher.

Fig. 9 shows
the spectra for different A’g0. Other conditions are the same as above.

__3.4. Modulation
amplitude__

We synthesized
laryngeal flows by changing the amplitude of VTF vibrations α’. No significant
trends are observed in the behaviours of the synthesized flows.

__3.5. Laryngeal source
for kargyraa__

If ω’ = ω/2, i.e. n = 2 in 2π/ω’ = nT0,
then ũ(t) has twice the period of u(t) and shows behaviour similar to
kargyraa phonation. From the characteristics of u(t), in the middle of each
period, the laryngeal flow reaches 0. However, the inverse-filtered kargyraa
voice maintains flow throughout each period. Incomplete closure of the VFs is also
observed in the physical model simulation [9]. To obtain a similar
laryngeal flow shape, the second u(t) pulse must start before Tn + Tp, or u(t)
needs a sufficiently large OQ.

**4. Conclusions**

A new laryngeal flow model
was proposed. We studied the acoustic characteristics of the model by changing
its parameters. To obtain the laryngeal voice shape of the drone voice, VTF
modulation is indispensable. In addition, to obtain ripples after the closure
of the vocal folds, laryngeal ventricle resonance is effective. These results
show the proposed model is effective for synthesizing pressed-type singing
voices, such as throat singing. Parameter fitting in terms of
analysis-by-synthesis and perceptual evaluation will be addressed in future
work. In addition, an effective inverse filtering method for cases in which the
source has poles and the filter has zeros must be studied.

**Acknowledgments**

We thank Seiji Adachi,
Parham Mokhtari, Yoshinao Shiraki, Niro Tayama, and Masahiko Todoriki for their
helpful discussions.

**5. References**

[1] P. Alku, T.
Bäckström, and E. Vilkman. Normalized amplitude quotient for parametrization
of the glottal flow. *J. Acoust. Soc. Am.*, 112(2):701–710, 2002.

[2] K. E. Cummings and M.
A. Clements. Glottal models for digital speech processing: A historical survey
and new results. *Digital Signal Processing*, 5:21–42, 1995.

[3] G. Fant, J.
Liljencrants, and Q.-A. Lin. A four-parameter model of glottal flow. *KTH STL
QPSR*, pages 1–14, 1985.

[4] L. Fuks, B. Hammarberg,
and J. Sundberg. A self-sustained vocal-ventricular phonation mode: acoustical,
aerodynamic and glottographic evidences. *KTH TMH-QPSR*, 3/1998:49–59,
1998.

[5] N. Kobayashi, Y.
Tohkura, S. Tenpaku, and S. Niimi. Acoustic and physiological characteristics
of traditional singing in *Tech. Rep. IEICE*, SP89-147:39–45, 1990.

[6] T. C. Levin and M. E.
Edgerton. The throat singers of Tuva. *Scientific *, Sep-1999:80–87, 1999.

[7] P.-Å. Lindestad, M. Södersten,
B. Merker, and S. Granqvist. Voice source
characteristics in Mongolian “throat singing” studied with high-speed imaging
technique, acoustic spectra, and inverse filtering. *J. Voice*,
15(1):78–85, 2001.

[8] A. *J. Acoust. Soc. Am.*, 49(2):583–590, 1970.

[9] K.-I. Sakakibara, H.
Imagawa, S. Niimi, and *Proc.
ICMC 2002*, pages 5–8, 2002.

[10] K.-I. Sakakibara, T.
Konishi, H. Imagawa, E. Z. Murano, K. Kondo, M. Kumada, and S. Niimi.
Observation of the laryngeal movements for throat singing — vibration of two
pairs of the folds in human larynx. *Acoust. Soc. Am. World Wide Press Room*,
144th meeting of the ASA, 2002. http://www.acoustics.org/press/.

[11] K.-I. Sakakibara, T.
Konishi, K. Kondo, E. Z. Murano, M. Kumada, H. Imagawa, and S. Niimi. Vocal
fold and false vocal fold vibrations and synthesis of khöömei. In *Proc.
ICMC 2001*, pages 135–138. ICMA, 2001.
