Wspire Corpus



About

The Wspire corpus is a multi-device speech database designed to capture parallel recordings in neutral and whispered speech modes. It contains recordings from 88 speakers, collected with 5 different devices to cover diverse audio capture conditions. The parallel neutral and whispered recordings allow detailed comparative analysis between the two modes. All recordings were made in a soundproof recording room at the Electrical Engineering Department of the Indian Institute of Science (IISc), Bangalore. The participants are primarily graduate students, interns, or employees affiliated with IISc. The corpus provides data across varying speech modes and recording conditions for research in speech processing and acoustic analysis.


Recording and Setup

Audio: originally recorded at 44.1 kHz then downsampled to 16 kHz.
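The 44.1 kHz to 16 kHz conversion can be reproduced with a polyphase resampler; this is a sketch of the standard approach, not necessarily the exact pipeline used for the corpus. The rational factor 160/441 follows from 16000/44100.

```python
import numpy as np
from scipy.signal import resample_poly

def downsample_to_16k(audio_44k1: np.ndarray) -> np.ndarray:
    """Resample a 44.1 kHz signal to 16 kHz.

    16000 / 44100 reduces to 160 / 441, so resample_poly
    upsamples by 160 and decimates by 441, applying an
    anti-aliasing low-pass filter in between.
    """
    return resample_poly(audio_44k1, up=160, down=441)

# One second of audio: 44100 samples in, 16000 samples out.
one_second = np.zeros(44100)
print(len(downsample_to_16k(one_second)))  # 16000
```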

Five devices were used to record each speaker: Headset, Zoom, iPhone, Moto, and Nokia (these names also appear in the DEVICE field of the file names).

Recording Procedure:

  • First, each speaker read all the sentences in the assigned set in neutral mode, and then in whispered mode.
  • Three different tones, each 1 second in duration, were used during recording.
  • A constant-frequency 1 kHz tone was played to indicate the beginning of each sentence.
  • An up-tone of 1-second duration, with frequency increasing from 1 kHz to 2 kHz, was played when the speaker pronounced the sentence correctly. Similarly, a down-tone of 1-second duration, with frequency decreasing from 2 kHz to 1 kHz, was played if the sentence was mispronounced.
  • In case of a mispronunciation, the speaker was asked to repeat the sentence.
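The three cue tones described above can be synthesized as follows. The frequencies and 1-second durations come from the protocol; the sample rate and amplitudes are assumptions for illustration.

```python
import numpy as np

FS = 16000  # assumed sample rate (Hz); any playback rate works


def constant_tone(freq=1000.0, dur=1.0, fs=FS):
    """Start cue: a constant 1 kHz tone marking the beginning of a sentence."""
    t = np.arange(int(dur * fs)) / fs
    return np.sin(2 * np.pi * freq * t)


def chirp_tone(f0, f1, dur=1.0, fs=FS):
    """Linear chirp from f0 to f1 Hz over `dur` seconds.

    The phase is the integral of the instantaneous frequency
    f(t) = f0 + (f1 - f0) * t / dur.
    """
    t = np.arange(int(dur * fs)) / fs
    phase = 2 * np.pi * (f0 * t + (f1 - f0) * t**2 / (2 * dur))
    return np.sin(phase)


start_cue = constant_tone()           # sentence start: constant 1 kHz
correct_cue = chirp_tone(1000, 2000)  # up-tone: correct pronunciation
repeat_cue = chirp_tone(2000, 1000)   # down-tone: repeat the sentence
```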
Recording Setup

Speaker Information

ID   AGE  GENDER  SET
4    24   Male    4
5    24   Male    5
6    21   Female  6
7    24   Male    7
8    22   Male    8
12   30   Female  2
13   24   Male    3
14   26   Male    4
16   23   Male    6
17   24   Female  7
18   28   Female  8
19   23   Female  9
21   20   Female  3
22   25   Female  4
23   27   Male    5
24   24   Male    6
25   22   Male    7
26   23   Male    8
27   22   Male    9
28   31   Male    1
29   22   Female  2
30   22   Female  3
31   22   Female  4
32   22   Female  5
33   23   Male    6
34   28   Female  7
35   23   Male    8
36   26   Female  9
37   21   Male    1
38   23   Male    2
39   24   Male    3
40   25   Female  4
41   28   Male    5
42   21   Female  6
43   30   Male    7
44   23   Female  9
45   24   Male    1
46   37   Female  2
47   23   Female  3
48   30   Male    4
49   23   Male    5
50   31   Female  6
51   22   Female  7
52   24   Male    8
53   21   Male    9
54   25   Male    1
55   24   Male    2
56   28   Female  3
57   25   Female  4
58   32   Male    5
59   25   Male    6
60   23   Male    7
62   22   Male    9
63   22   Male    1
64   24   Male    2
65   20   Male    3
66   23   Male    4
67   24   Female  5
68   23   Female  6
69   21   Female  7
70   22   Male    8
71   27   Female  9
72   24   Male    1
73   24   Male    2
74   22   Female  3
75   29   Male    4
76   26   Female  5
77   29   Male    6
79   23   Male    8
80   27   Female  9
83   23   Female  3
84   30   Female  4
85   23   Male    5
86   26   Female  6
87   23   Male    7
88   25   Male    8
89   23   Male    9
90   23   Female  1
92   27   Male    2
93   28   Male    3
95   24   Male    5
96   23   Male    6
97   22   Male    7
98   22   Male    8
99   23   Male    9
100  22   Female  1
101  25   Male    2
102  24   Male    3

Data

We chose the first 450 of the 460 sentences used in the MOCHA-TIMIT corpus [link], as that corpus contains phonetically balanced utterances well suited to speech recording.

These 450 sentences were then divided into nine sets of 50 sentences each.
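The split into nine sets is a simple consecutive chunking of the 450 sentences; the sketch below uses placeholder sentence numbers, since the MOCHA-TIMIT text itself is not reproduced here.

```python
# Partition sentence numbers 1..450 into nine consecutive sets of 50.
sentences = list(range(1, 451))  # placeholder IDs for the 450 sentences
sets = {s: sentences[(s - 1) * 50 : s * 50] for s in range(1, 10)}

print(len(sets))      # 9 sets
print(len(sets[1]))   # 50 sentences per set
print(sets[9][-1])    # last sentence number: 450
```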

SET  LINK
1    link
2    link
3    link
4    link
5    link
6    link
7    link
8    link
9    link

Each audio file and its corresponding timestamp file has a unique name:

  • ID - A unique ID assigned to each speaker.
  • SET - One of the nine sets, numbered 1 to 9.
  • GENDER - Male or Female ("M" or "F").
  • MODE - Whisper or Neutral.
  • DEVICE - Headset, Zoom, iPhone, Moto, or Nokia.
  • SENTNUM - The sentence number within the respective set.

Timestamp file name: "ID0[ID]_00[SET]_[GENDER]_[MODE]_[DEVICE]_[SENTNUM].lab"

Example: “ID06_006_F_Neutral_Headset_11.lab”

Audio file name: “ID0[ID]_00[SET]_[GENDER]_[MODE]_[DEVICE]_[SENTNUM].wav”

Example: “ID06_006_F_Neutral_Headset_11.wav”
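A filename following this scheme can be parsed back into its fields. The regular expression below is an assumption inferred from the pattern and examples above, not an official parser shipped with the corpus.

```python
import re

# Fields per the naming scheme: ID, SET, GENDER, MODE, DEVICE, SENTNUM.
NAME_RE = re.compile(
    r"ID(?P<id>\d+)_(?P<set>\d+)_(?P<gender>[MF])_"
    r"(?P<mode>Whisper|Neutral)_"
    r"(?P<device>Headset|Zoom|iPhone|Moto|Nokia)_"
    r"(?P<sentnum>\d+)\.(?P<ext>wav|lab)"
)

def parse_name(fname: str) -> dict:
    """Split a Wspire file name into its labeled fields."""
    m = NAME_RE.fullmatch(fname)
    if m is None:
        raise ValueError(f"not a Wspire file name: {fname}")
    fields = m.groupdict()
    # Drop the zero padding on the numeric fields.
    for key in ("id", "set", "sentnum"):
        fields[key] = int(fields[key])
    return fields

info = parse_name("ID06_006_F_Neutral_Headset_11.wav")
print(info["id"], info["set"], info["device"])  # 6 6 Headset
```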

People

Bhavuk Singhal
Abinay Reddy