Yamaha Pocket Miku User Manual page 4

The singing keyboard

with the same fundamental idea. This is the idea
that by analyzing the structure and tone quality of
the human voice, we can then attempt to simulate
it. As a representative example, let's look at
"formant synthesis." Formants are the spectral
peaks of the sound spectrum (the distribution of
the volume of each frequency band) of the voice.
The idea is that you can simulate human
pronunciation (the vocal cords and the movement
of the mouth) by supplying these peak
movements to a basic sound source.
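
The idea of supplying formant peaks to a basic sound source can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular product works: the sample rate, the pulse-train source, the resonator design, and the two formant values for an "a"-like vowel are all hypothetical choices made for this example.

```python
import math

SAMPLE_RATE = 16000  # Hz, assumed for this sketch

def pulse_train(f0, n_samples, sample_rate=SAMPLE_RATE):
    """Basic sound source: one impulse per pitch period (a crude vocal-cord model)."""
    period = int(round(sample_rate / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]

def resonator(signal, center_hz, bandwidth_hz, sample_rate=SAMPLE_RATE):
    """Two-pole resonator that boosts frequencies near center_hz (one formant peak)."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius sets the peak width
    theta = 2.0 * math.pi * center_hz / sample_rate      # pole angle sets the peak position
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def synthesize_vowel(f0, formants, n_samples):
    """Filter the source with one resonator per formant and sum the results."""
    source = pulse_train(f0, n_samples)
    mix = [0.0] * n_samples
    for center_hz, bandwidth_hz in formants:
        band = resonator(source, center_hz, bandwidth_hz)
        mix = [m + b for m, b in zip(mix, band)]
    peak = max(abs(s) for s in mix)
    return [s / peak for s in mix]  # normalize to the range -1.0 .. 1.0

# Hypothetical (center, bandwidth) pairs in Hz for an "a"-like vowel
A_FORMANTS = [(700.0, 100.0), (1200.0, 120.0)]
samples = synthesize_vowel(f0=125.0, formants=A_FORMANTS, n_samples=4000)
```

Changing the values in `A_FORMANTS` moves the spectral peaks, which is what changes the perceived vowel, while `f0` controls the pitch of the source.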
"Concatenative synthesis" is another method
that spread quickly due to the shrinking costs of
digital technology. This method involves linking
fragments of recorded (sampled) voices to
synthesize vocals. Vocaloid's system is basically a
type of concatenative synthesis that produces
more musical results. It achieves this by
connecting vocal fragments in the same way while
also making adjustments in each frequency band.
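
The linking step of concatenative synthesis can be sketched as follows. This is a simplified illustration, assuming each fragment is just a list of samples and that a short linear crossfade is enough to hide the seam. The function name and the stand-in fragments are hypothetical, and the per-band adjustments that Vocaloid is described as making are not shown.

```python
def crossfade_concat(fragments, overlap):
    """Join fragments end to end, blending `overlap` samples at each seam."""
    if not fragments:
        return []
    out = list(fragments[0])
    for frag in fragments[1:]:
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(frag))
        for i in range(n):
            w = (i + 1) / (n + 1)          # fade-in weight of the incoming fragment
            j = len(out) - n + i           # matching position at the tail of the output
            out[j] = (1.0 - w) * out[j] + w * frag[i]
        out.extend(frag[n:])               # append the part that was not blended
    return out

# Stand-in "recordings": real fragments would be sampled syllables.
fragment_a = [0.0] * 4
fragment_b = [1.0] * 4
joined = crossfade_concat([fragment_a, fragment_b], overlap=2)
```

The overlapped region moves gradually from the first fragment to the second, so the joined result is shorter than the two fragments placed back to back.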
Formant synthesis and the robot voice
An example of a device that is closer to the concept
of "formant synthesis" is the "Vocoder," which is
familiar to many in the music world. The idea for
this device originated in the late 1920s at Bell
Labs. At the time, it was a voice compression
technology for sending intelligible speech through
the narrow bandwidth of a telegraph cable. The
technology was used mainly for military
communication, both because it was still too costly
for wider use and because this period encompassed
World War II. Production costs fell as
semiconductor technology advanced in the late
1960s, and instruments and effects processors that
gave the human singing voice a robot-like effect
grew popular. Vocoder technology as a means of
voice compression was later used to improve voice
clarity in cellphones, and it is still being
developed today.
Similarly, a type of effects processor called a talk
box, which uses the structure of the human mouth
itself as a physical filter, has become very popular
in musical genres such as rock and funk. These
devices, however, are simply effects processors
that process sound by using the movements of the
human mouth. They don't quite belong in the
same category of "vocal synthesizers" as Vocaloid,
because they do not generate singing voices
on their own.
WAHHA GO GO
"WAHHA GO GO," a machine that laughs like a human, was developed in 2009 by Maywa
Denki. Powered by a flywheel and bellows, the device imitates the movements of human
vocal cords and the opening and closing of the mouth, resulting in changes in the
formant (voice quality) and the amount of air (voice volume).
© Yoshimoto Kogyo co.,ltd. / Maywa Denki
Figure: The "A" formant. Multiple peaks can be confirmed.
Formants and the Vocoder
The peak movements of a formant are closely related to the vocal cords and the
movements of the mouth when a person uses their voice. When similar sounds are
produced, their formants peak near the same frequencies. The Vocoder is a development
of audio compression technology that reproduces formants by generating them at the
receiving end. It uses multiple bandpass filters to detect the level of the peak in each
frequency band.
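
The analysis side, a bank of bandpass filters that each measure how strong one frequency band is, can be sketched like this. It is a minimal illustration under assumptions, not the design of any actual Vocoder: the band centers, the Q value, the sample rate, and the use of a standard second-order (biquad) bandpass are all choices made for this example.

```python
import math

def bandpass(signal, center_hz, q, sample_rate):
    """Second-order bandpass filter with unity gain at center_hz."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b0, b2 = alpha / a0, -alpha / a0                       # feed-forward terms (b1 is zero)
    a1, a2 = -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0  # feedback terms
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def band_levels(signal, centers_hz, sample_rate, q=4.0):
    """Return the RMS level of the signal in each band of the filter bank."""
    levels = []
    for center_hz in centers_hz:
        band = bandpass(signal, center_hz, q, sample_rate)
        levels.append(math.sqrt(sum(v * v for v in band) / len(band)))
    return levels

SAMPLE_RATE = 16000                          # Hz, assumed
CENTERS_HZ = [250, 500, 1000, 2000, 4000]    # assumed band centers

# Test input: a pure 1000 Hz tone standing in for a voice with one strong peak.
tone = [math.sin(2.0 * math.pi * 1000.0 * i / SAMPLE_RATE) for i in range(4000)]
levels = band_levels(tone, CENTERS_HZ, SAMPLE_RATE)
loudest_band = CENTERS_HZ[levels.index(max(levels))]
```

In a complete vocoder, levels like these would be measured continuously and sent instead of the waveform itself, and the receiving end would use them to shape its own sound source.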
Figure labels: First Formant, Second Formant, Vocal cords.
