SPIRE VCV CORPUS

About

  • SPIRE VCV is a database of speech production, which includes simultaneous acoustic and electromagnetic articulography data collected from speakers of non-native/Indian English.
  • The stimuli comprise nonsense symmetrical VCV (Vowel-Consonant-Vowel) utterances embedded in the carrier sentence "speak VCV today", spoken at three speaking rates (slow, normal, and fast) with 3 repetitions each.
  • The VCV utterances combine 17 consonant sounds:
    C = { /b/, /ch/, /d/, /f/, /g/, /jh/, /k/, /l/, /m/, /n/, /ng/, /p/, /r/, /s/, /t/, /v/, /z/ }
  • And 5 vowel sounds:
    V = { /a/, /e/, /i/, /o/, /u/ }
  • The speakers are ten non-native English speakers (5 female and 5 male) aged 18 to 27 years, with no speech-related disorders.
  • Recordings were made in the sound damped studio at the SPIRE Labs speech recording facility. Acoustic and articulatory data were recorded directly to the computer and carefully synchronized.
  • VCV boundaries were manually annotated.
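
The stimulus design described above can be sketched in a few lines of Python. This is only an illustration, assuming that every consonant is paired with every vowel and that every token is recorded at all three rates with all three repetitions. The written form of each token (for example "aba") and the names `vcv_tokens` and `recording_plan` are hypothetical and not part of the corpus.

```python
from itertools import product

# Consonant and vowel inventories as listed above.
CONSONANTS = ["b", "ch", "d", "f", "g", "jh", "k", "l", "m", "n",
              "ng", "p", "r", "s", "t", "v", "z"]
VOWELS = ["a", "e", "i", "o", "u"]
RATES = ["slow", "normal", "fast"]
REPETITIONS = 3


def vcv_tokens():
    """Symmetrical VCV tokens: the same vowel on both sides of the consonant."""
    return [f"{v}{c}{v}" for v, c in product(VOWELS, CONSONANTS)]


def recording_plan():
    """One (sentence, rate, repetition) entry per expected utterance of one speaker."""
    return [
        (f"speak {token} today", rate, rep)
        for token in vcv_tokens()
        for rate in RATES
        for rep in range(1, REPETITIONS + 1)
    ]


tokens = vcv_tokens()
plan = recording_plan()
print(len(tokens), len(plan))  # prints: 85 765
```

Under these assumptions the design gives 85 distinct VCV tokens and 765 utterances per speaker. The actual number of recordings in the corpus may differ.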

Recording and Setup

  • Articulatory movements were recorded using a 3D electromagnetic articulograph (EMA), the AG501.
  • A t.bone EM9600 shotgun unidirectional electret condenser microphone was placed near the subject to record the audio data synchronously with the articulatory data.
  • Audio:
    • Originally recorded at 48 kHz, then downsampled to 16 kHz.
  • Articulatory data:
    • Sampled at 250 Hz.
    • A 10th-order Chebyshev Type II low-pass filter with a 40 Hz cut-off frequency and 40 dB of stopband attenuation was applied to the articulatory movement recordings to remove high-frequency noise resulting from EMA measurement error.
    • Sensor placement
      • Six sensors were placed on the speech articulators:
        • Upper Lip
        • Lower Lip
        • Jaw
        • Tongue Tip
        • Tongue Body
        • Tongue Dorsum
    • Sensors were also placed behind the left and right ears for head movement correction.
    • Each of the 6 articulator sensors captures the movement of its articulator in 3D space, resulting in 18 articulatory features.
  • Instructions to speaker:
    • All speakers were college students fluent in reading, writing, and speaking English, from different regions of India with different native language backgrounds.
    • Speakers were trained beforehand to increase their speaking rate gradually during the main recording.
    • A GUI displayed each stimulus on screen, and the speaker uttered it at each of the three speaking rates (slow, normal/habitual, and fast), with three repetitions each.
  • VCV boundary annotation:
    • The VCV boundaries were manually annotated by a team of four members.
    • The boundaries were marked with an in-house MATLAB annotation tool by inspecting the wideband spectrogram, the raw waveform, and the glottal pulses obtained using Praat.
      • For unvoiced consonants: the last glottal pulse in the V1 region was taken as the onset of the C region, and the first glottal pulse at the start of the V2 region was taken as the end of the C region, in tandem with the spectrogram.
      • For voiced consonants: the spectrogram with the formants and the time-domain waveform were used to mark the consonant start and end boundaries.
    • In ambiguous cases, the boundary was marked by unanimous decision after an internal discussion among the annotators.
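
The low-pass filtering of the articulatory data described above can be approximated with SciPy. This is a minimal sketch, not the processing code used for the corpus. The filter order, attenuation, cut-off and sampling rate come from the text. The channel names, the array layout (frames by 18 features), and the zero-phase application via `sosfiltfilt` are assumptions. Note that `scipy.signal.cheby2` interprets its critical frequency as the stopband edge, which is how the 40 Hz cut-off is read here.

```python
import numpy as np
from scipy import signal

FS_EMA = 250.0        # articulatory sampling rate in Hz (from the text)
ORDER = 10            # filter order (from the text)
STOP_ATTEN_DB = 40.0  # stopband attenuation in dB (from the text)
CUTOFF_HZ = 40.0      # cut-off in Hz (from the text), used as the stopband edge

# Hypothetical channel labels: 6 sensors x 3 spatial axes = 18 features.
SENSORS = ["UL", "LL", "Jaw", "TT", "TB", "TD"]
AXES = ["x", "y", "z"]
CHANNELS = [f"{s}_{a}" for s in SENSORS for a in AXES]


def lowpass_ema(traj, fs=FS_EMA):
    """Low-pass filter trajectories of shape (n_frames, n_channels)."""
    sos = signal.cheby2(ORDER, STOP_ATTEN_DB, CUTOFF_HZ,
                        btype="lowpass", output="sos", fs=fs)
    # Assumption: forward-backward (zero-phase) filtering. The source does not
    # say how the filter was applied; this doubles the effective attenuation.
    return signal.sosfiltfilt(sos, traj, axis=0)


# Synthetic demo: a slow 5 Hz movement plus 100 Hz noise on every channel.
t = np.arange(1000) / FS_EMA
movement = np.sin(2 * np.pi * 5.0 * t)
noisy = movement + 0.5 * np.sin(2 * np.pi * 100.0 * t)
trajectories = np.tile(noisy[:, None], (1, len(CHANNELS)))
filtered = lowpass_ema(trajectories)
print(filtered.shape)  # prints: (1000, 18)
```

Second-order sections are used instead of transfer-function coefficients because a 10th-order filter can be numerically fragile in `(b, a)` form.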

Speaker Information

    #   Subject  Age  Gender  Native Language
    1   F1       22   Female  Bengali
    2   M1       21   Male    Tulu
    3   F2       27   Female  Bengali
    4   M2       20   Male    Bengali
    5   F3       23   Female  Tamil
    6   M3       21   Male    Tamil
    7   F4       20   Female  Kannada
    8   M4       23   Male    Tamil
    9   F5       21   Female  Malayalam
    10  M5       20   Male    Hindi

Data

Recordings can be selected by subject, vowel, consonant, and speed.


People

Publications

Conferences (Accepted and/or Published):

  1. Tilak Purohit, Achuth Rao M V, P. K. Ghosh, "Impact of speaking rate on the source filter interaction in speech: a study", ICASSP 2021.
  2. Tilak Purohit, P. K. Ghosh, "An investigation of the virtual lip trajectories during the production of bilabial stops and nasal at different speaking rates", Interspeech 2020, Shanghai, China.
  3. Anusuya P, Aravind Illa, P. K. Ghosh, "A Data Driven Phoneme-Specific Analysis of Articulatory Importance", International Seminar on Speech Production 2020.