SPIRE: Signal Processing Interpretation and Representation Lab


Wspire Corpus

About

The Wspire corpus is a multi-device speech database of parallel recordings in neutral and whispered modes. It contains recordings from 88 speakers, made using 5 different devices, so the two speech modes can be compared directly across recording conditions. All recordings were made in a soundproof recording room at the Electrical Engineering Department of the Indian Institute of Science (IISc), Bangalore. The speakers are primarily graduate students, interns, or employees affiliated with IISc. The dataset is intended for research in speech processing and acoustic analysis.


Recording and Setup

Audio: originally recorded at 44.1 kHz, then downsampled to 16 kHz.
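
The 44.1 kHz to 16 kHz conversion corresponds to a rational resampling factor. The sketch below only works out that factor and the resulting sample count; it does not reproduce the resampler actually used for the corpus, which is not stated here. The names `ORIG_RATE_HZ`, `TARGET_RATE_HZ`, and `resampled_length` are illustrative assumptions.

```python
from fractions import Fraction

# Sampling rates stated for the corpus.
ORIG_RATE_HZ = 44_100
TARGET_RATE_HZ = 16_000

# Reduce 16000/44100 to lowest terms: upsample by `up`, downsample by `down`.
ratio = Fraction(TARGET_RATE_HZ, ORIG_RATE_HZ)
up, down = ratio.numerator, ratio.denominator


def resampled_length(n_samples: int) -> int:
    """Number of samples after resampling by up/down, rounded up."""
    return -(-n_samples * up // down)


if __name__ == "__main__":
    print(f"up={up}, down={down}")
    print(resampled_length(ORIG_RATE_HZ))  # samples in one second at the target rate
```

The reduced pair (`up`, `down`) is the form that polyphase resamplers such as `scipy.signal.resample_poly(x, up, down)` expect, should one want to repeat the conversion on other audio.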

5 devices were used:

Recording Procedure:

Every speaker was first asked to speak all the sentences in the assigned set in neutral mode, and then in whispered mode.
Three different tones, each 1 second long, were used during recording.
A constant-frequency tone of 1 kHz was played to indicate the beginning of each sentence.
An up-tone, with frequency increasing from 1 kHz to 2 kHz, was played when the speaker pronounced the sentence correctly. Similarly, a down-tone, with frequency decreasing from 2 kHz to 1 kHz, was played if the sentence was mispronounced.
If a sentence was mispronounced, the speaker was asked to repeat it.
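
The three cue tones can be synthesized along the following lines. This is a minimal sketch, not the lab's actual tone generator: the 16 kHz playback rate, the linear shape of the frequency sweep, and the function names `constant_tone` and `linear_sweep` are all assumptions, since the page specifies only the durations and the start and end frequencies.

```python
import math

SAMPLE_RATE_HZ = 16_000   # assumption: playback rate is not stated in the source
TONE_DURATION_S = 1.0     # each tone lasts 1 second


def constant_tone(freq_hz, duration_s=TONE_DURATION_S, rate=SAMPLE_RATE_HZ):
    """Sine tone at a fixed frequency, as a list of floats in [-1, 1]."""
    n = int(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def linear_sweep(f_start_hz, f_end_hz, duration_s=TONE_DURATION_S, rate=SAMPLE_RATE_HZ):
    """Sine sweep whose frequency moves linearly from f_start_hz to f_end_hz (assumed shape)."""
    n = int(duration_s * rate)
    slope = (f_end_hz - f_start_hz) / duration_s  # Hz per second
    samples = []
    for i in range(n):
        t = i / rate
        # Phase is the integral of the instantaneous frequency f_start + slope * t.
        phase = 2 * math.pi * (f_start_hz * t + 0.5 * slope * t * t)
        samples.append(math.sin(phase))
    return samples


start_tone = constant_tone(1000.0)        # marks the beginning of a sentence
up_tone = linear_sweep(1000.0, 2000.0)    # sentence pronounced correctly
down_tone = linear_sweep(2000.0, 1000.0)  # sentence mispronounced, repeat it
```

Each list can be scaled to 16-bit integers and written out with the standard `wave` module if audible files are needed.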


Speaker Information

ID	AGE	GENDER	SET
4	24	Male	4
5	24	Male	5
6	21	Female	6
7	24	Male	7
8	22	Male	8
12	30	Female	2
13	24	Male	3
14	26	Male	4
16	23	Male	6
17	24	Female	7
18	28	Female	8
19	23	Female	9
21	20	Female	3
22	25	Female	4
23	27	Male	5
24	24	Male	6
25	22	Male	7
26	23	Male	8
27	22	Male	9
28	31	Male	1
29	22	Female	2
30	22	Female	3
31	22	Female	4
32	22	Female	5
33	23	Male	6
34	28	Female	7
35	23	Male	8
36	26	Female	9
37	21	Male	1
38	23	Male	2
39	24	Male	3
40	25	Female	4
41	28	Male	5
42	21	Female	6
43	30	Male	7
44	23	Female	9
45	24	Male	1
46	37	Female	2
47	23	Female	3
48	30	Male	4
49	23	Male	5
50	31	Female	6
51	22	Female	7
52	24	Male	8
53	21	Male	9
54	25	Male	1
55	24	Male	2
56	28	Female	3
57	25	Female	4
58	32	Male	5
59	25	Male	6
60	23	Male	7
62	22	Male	9
63	22	Male	1
64	24	Male	2
65	20	Male	3
66	23	Male	4
67	24	Female	5
68	23	Female	6
69	21	Female	7
70	22	Male	8
71	27	Female	9
72	24	Male	1
73	24	Male	2
74	22	Female	3
75	29	Male	4
76	26	Female	5
77	29	Male	6
79	23	Male	8
80	27	Female	9
83	23	Female	3
84	30	Female	4
85	23	Male	5
86	26	Female	6
87	23	Male	7
88	25	Male	8
89	23	Male	9
90	23	Female	1
92	27	Male	2
93	28	Male	3
95	24	Male	5
96	23	Male	6
97	22	Male	7
98	22	Male	8
99	23	Male	9
100	22	Female	1
101	25	Male	2
102	24	Male	3
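
The speaker table is tab-separated, so it can be loaded with standard tools. The sketch below parses a short excerpt (the first four rows above) and tallies speakers by gender and by sentence set; the helper name `load_speakers` and the dictionary keys are illustrative assumptions, not part of the corpus distribution.

```python
import csv
import io
from collections import Counter

# Excerpt of the table above (first four rows), tab-separated.
SPEAKER_TABLE = (
    "ID\tAGE\tGENDER\tSET\n"
    "4\t24\tMale\t4\n"
    "5\t24\tMale\t5\n"
    "6\t21\tFemale\t6\n"
    "7\t24\tMale\t7\n"
)


def load_speakers(text):
    """Parse the tab-separated speaker table into a list of typed records."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [
        {
            "id": int(row["ID"]),
            "age": int(row["AGE"]),
            "gender": row["GENDER"],
            "set": int(row["SET"]),
        }
        for row in reader
    ]


speakers = load_speakers(SPEAKER_TABLE)
by_gender = Counter(s["gender"] for s in speakers)  # speakers per gender
by_set = Counter(s["set"] for s in speakers)        # speakers per sentence set

if __name__ == "__main__":
    print(by_gender)
    print(by_set)
```

Passing the full table text to `load_speakers` gives the same tallies for all 88 speakers.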

Data

We chose the first 450 of the 460 sentences used in the MOCHA-TIMIT corpus [link] for the speech recording, as the corpus contains phonetically balanced utterances.

The 450 sentences were then divided into nine sets of 50 sentences each.
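
As a rough illustration of the partition, the sketch below splits 450 sentence indices into nine sets of 50. It assumes the sets are contiguous blocks in corpus order (set 1 holds sentences 1 to 50, and so on); the page does not say how sentences were assigned to sets, and `split_into_sets` is a hypothetical helper.

```python
SENTENCES_PER_SET = 50
NUM_SENTENCES = 450


def split_into_sets(sentences, set_size=SENTENCES_PER_SET):
    """Split a list into consecutive blocks of set_size, keyed by 1-based set number.

    Assumption: contiguous blocks in original order; any trailing remainder is dropped.
    """
    n_sets = len(sentences) // set_size
    return {
        k + 1: sentences[k * set_size:(k + 1) * set_size]
        for k in range(n_sets)
    }


# Stand-in for the first 450 sentences: 1-based sentence indices.
sentence_ids = list(range(1, NUM_SENTENCES + 1))
sets = split_into_sets(sentence_ids)

if __name__ == "__main__":
    for set_no, ids in sets.items():
        print(set_no, ids[0], ids[-1])
```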
