Mid-sagittal video from Electromagnetic Articulography (EMA)

(Five illustrative videos from male and female subjects of the MOCHA-TIMIT corpus)

Maximum a-posteriori estimation of missing samples with continuity constraint in electromagnetic articulography data (ICASSP 2014)

The electromagnetic articulography (EMA) technique is used to record the kinematics of different articulators while a person speaks. EMA data often contain missing segments due to sensor failure. In this work, we propose maximum a-posteriori (MAP) estimation with a continuity constraint to recover the missing samples in articulatory trajectories recorded using EMA. This approach combines the benefits of statistical MAP estimation with the temporal continuity of the articulatory trajectories.

Click to download Matlab code

Multi-pitch tracking using Gaussian mixture model with time varying parameters and grating compression transform (ICASSP 2014)

Multi-pitch tracks of speech mixtures are estimated using the time-varying means of a Gaussian mixture model (GMM), referred to as TVGMM. The TVGMM parameters are estimated from multiple pitch values at each frame of a given utterance, obtained from different patches of the spectrogram using the grating compression transform.

Click to download Matlab code

Improved subject-independent acoustic-to-articulatory inversion (Speech Communication, Elsevier)

In subject-independent acoustic-to-articulatory inversion (SII), articulatory movements are estimated from the speech acoustics of a test speaker even though data from the test speaker may not have been used for training. Several schemes for SII have been proposed. Below are the links for downloading the MATLAB code for the four best-performing inversion schemes (IS), namely,

Missing samples estimation in electromagnetic articulography data using equality constrained Kalman smoother (Interspeech 2014)

In this work, we propose an equality constrained Kalman smoother (ECKS) to estimate the missing samples in EMA data. We incorporate the dynamics of the articulatory movement into missing-sample estimation by treating the EMA data vectors as observations from a linear dynamical system. The proposed approach gives a 41% reduction in the root mean square error of the estimates compared to the minimum mean square error estimator, which does not utilize the dynamics of the articulatory movement.
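The idea of filling gaps with a state-space model can be illustrated with a minimal Python sketch. This is not the released MATLAB code and not the ECKS formulation itself: it is a plain scalar Kalman filter with a Rauch-Tung-Striebel backward pass, and the random-walk model, the noise variances `q` and `r`, and the function name `kalman_smooth_missing` are all assumptions made for illustration. Missing samples are marked with `None`, and the measurement update is simply skipped for them.

```python
def kalman_smooth_missing(y, q=1.0, r=1e-4):
    """Estimate missing samples (None) in a 1-D trajectory.

    Assumed model (illustrative only):
        x[t] = x[t-1] + w,  w ~ N(0, q)   (random-walk dynamics)
        y[t] = x[t] + v,    v ~ N(0, r)   (noisy observation)
    """
    n = len(y)
    x = next(v for v in y if v is not None)  # start from the first observed value
    p = 1.0                                  # assumed initial state variance
    x_pred, p_pred = [0.0] * n, [0.0] * n
    x_filt, p_filt = [0.0] * n, [0.0] * n

    # Forward pass: predict at every step, update only where a sample exists.
    for t in range(n):
        if t > 0:
            p += q
        x_pred[t], p_pred[t] = x, p
        if y[t] is not None:
            k = p / (p + r)          # Kalman gain
            x += k * (y[t] - x)
            p *= (1.0 - k)
        x_filt[t], p_filt[t] = x, p

    # Backward (RTS) pass: refine each estimate using future samples.
    x_smooth = x_filt[:]
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + c * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth


trajectory = [0.0, None, None, 3.0]
estimate = kalman_smooth_missing(trajectory)
```

With a very small `r`, the smoothed values stay close to the observed samples, which loosely mimics an equality constraint at the observed positions; a true equality constrained smoother would enforce this exactly rather than approximately.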

Click to download Matlab code

Robust whisper activity detection using long-term log energy variation of sub-band signal (SPL 2015)

The goal of whisper activity detection (WAD) is to find the whispered speech segments in a given noisy recording of whispered speech. Since whispering lacks periodic glottal excitation, it resembles unvoiced speech. This noise-like nature of whispered speech makes WAD a more challenging task than a typical voice activity detection (VAD) problem. In this paper, we propose a feature for WAD based on the long-term variation of the logarithm of the short-time sub-band signal energy. We also propose an automatic sub-band selection algorithm to maximally discriminate noisy whisper from noise. Experiments with eight noise types in four different signal-to-noise ratio (SNR) conditions show that, for most of the noises, the performance of the proposed WAD scheme is significantly better than that of existing VAD schemes and whisper detection schemes when used for WAD.
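A feature of this general kind can be sketched in a few lines of Python. This is only an illustration, not the released MATLAB code: the input is assumed to be an already band-pass filtered (sub-band) signal, the "variation" is taken to be the variance of the log energy over the most recent frames, and the function name and all parameter values (`frame_len`, `hop`, `context`) are hypothetical.

```python
import math


def log_energy_variation(subband, frame_len=4, hop=2, context=3, eps=1e-12):
    """Long-term variation of short-time log energy (illustrative definition).

    subband : samples of one sub-band signal (assumed already filtered)
    context : number of most recent frames used for the long-term statistic
    """
    # Short-time log energy of each frame.
    log_e = []
    for start in range(0, len(subband) - frame_len + 1, hop):
        frame = subband[start:start + frame_len]
        log_e.append(math.log(sum(s * s for s in frame) + eps))

    # Variance of the log energy over the last `context` frames (assumption:
    # variance is used as the measure of variation).
    feature = []
    for m in range(len(log_e)):
        window = log_e[max(0, m - context + 1):m + 1]
        mean = sum(window) / len(window)
        feature.append(sum((v - mean) ** 2 for v in window) / len(window))
    return feature


steady = log_energy_variation([0.5] * 16)            # stationary input
burst = log_energy_variation([0.01] * 8 + [1.0] * 8)  # energy jump mid-signal
```

A stationary input yields a feature near zero, while a change in energy produces a clear peak, which is the behaviour a detector would threshold on.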

Click to download Matlab code

Detailed WAD Performance



An Error Correction Scheme for GCI Detection Algorithms using Pitch Smoothness Criterion (Interspeech 2015)

In this work, we propose a postprocessing scheme for correcting the errors of any GCI detection algorithm. The error correction is formulated as an optimization problem such that the pitch contour derived from the corrected GCIs has the least high-frequency content. The proposed error correction scheme is experimentally evaluated on a speech corpus with simultaneous EGG recordings using three state-of-the-art GCI detection algorithms, viz., DPI, ZFR, and SEDREAMS. The proposed scheme is found to improve GCI detection performance in clean speech as well as in noisy conditions at different SNRs.
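The pitch smoothness criterion can be illustrated with a small Python sketch. It is not the optimization used in the paper or in the released MATLAB code: here the high-frequency content of the pitch contour is approximated by the sum of squared second differences, and a single spurious GCI is removed by a greedy search. The function names, the sampling rate `fs`, and the example values are assumptions.

```python
def pitch_roughness(gcis, fs=8000):
    """Sum of squared second differences of the pitch contour.

    gcis : increasing GCI locations in samples
    The second difference acts as a crude high-pass filter, so a smooth
    contour gives a small value (assumed stand-in for the paper's measure).
    """
    f0 = [fs / (b - a) for a, b in zip(gcis, gcis[1:])]
    return sum((f0[i - 1] - 2.0 * f0[i] + f0[i + 1]) ** 2
               for i in range(1, len(f0) - 1))


def drop_worst_gci(gcis, fs=8000):
    """Greedy step: remove the one interior GCI that lowers roughness most."""
    best, best_cost = gcis, pitch_roughness(gcis, fs)
    for i in range(1, len(gcis) - 1):
        candidate = gcis[:i] + gcis[i + 1:]
        cost = pitch_roughness(candidate, fs)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


clean = [0, 80, 160, 240, 320, 400, 480, 560]       # steady 100 Hz at 8 kHz
noisy = [0, 80, 160, 240, 275, 320, 400, 480, 560]  # one spurious GCI at 275
corrected = drop_worst_gci(noisy)
```

In this toy case the spurious instant splits one pitch period into two short ones, which makes the contour jump; removing it restores a flat contour with zero roughness. Handling missed GCIs or several errors at once would need a proper optimization rather than this single greedy step.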

Click to download Matlab code

Cumulative Impulse Strength for Epoch Extraction (SPL 2016)

Algorithms for extracting epochs or glottal closure instants (GCIs) from voiced speech typically fall into two categories: (i) those which operate on the linear prediction residual (LPR) and (ii) those which operate directly on the speech signal. While the former class of algorithms (such as YAGA and DPI) tends to be more accurate, the latter (such as ZFR and SEDREAMS) tends to be more noise-robust. In this paper, a temporal measure termed the cumulative impulse strength is proposed for locating the impulses in a quasi-periodic impulse sequence embedded in noise. Subsequently, it is applied for detecting the GCIs from the inverted integrated LPR using a recursive algorithm. Experiments on two large corpora of speech with simultaneous electroglottographic recordings demonstrate that the proposed method is more robust to additive noise than the state-of-the-art algorithms, despite operating on the LPR.

Click to download Matlab code

Spectrogram enhancement using multiple window Savitzky-Golay (MWSG) filter for robust bird sound detection (IEEE Trans. ASLP 2017)

Bird sound detection from real field recordings is essential for identifying bird species in bioacoustic monitoring. Variations in recording devices and environmental conditions, along with the presence of vocalizations from other animals, make bird sound detection very challenging. To overcome these challenges, we propose an unsupervised algorithm comprising two main stages. In the first stage, a spectrogram enhancement technique is proposed using a multiple window Savitzky-Golay (MWSG) filter. We show that the spectrogram estimate obtained with the MWSG filter is unbiased and has lower variance than its single-window counterpart. Bird sounds are known to be highly structured in the time-frequency (T-F) plane. In the second stage, we exploit this prominence of T-F activity in specific directions in the enhanced spectrogram for bird sound detection. To this end, we use a set of four moving average filters that, when applied to the enhanced spectrogram, yield directional spectrograms capturing direction-specific information. We propose a thresholding scheme on the time-varying energy profile computed from each of these directional spectrograms to obtain frame-level binary decisions of bird sound activity. These individual decisions are then combined to obtain the final decision.
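The second stage can be sketched roughly in Python as follows. This is an illustration under several assumptions and not the released MATLAB code: the four directions are taken to be horizontal, vertical, and the two diagonals; the threshold is a multiple of the median frame energy; and the individual decisions are combined with a logical OR. The function names, the filter `length`, and the `factor` value are hypothetical, and the MWSG enhancement stage is not reproduced.

```python
def directional_energy(spec, length=3):
    """Per-frame energy of four directional moving-average outputs.

    spec : list of rows (frequency bins), each a list over time frames
    """
    n_f, n_t = len(spec), len(spec[0])
    # (frequency step, time step) for each assumed direction
    directions = {"horizontal": (0, 1), "vertical": (1, 0),
                  "diag_up": (1, 1), "diag_down": (-1, 1)}
    half = length // 2
    profiles = {}
    for name, (df, dt) in directions.items():
        energy = [0.0] * n_t
        for f in range(n_f):
            for t in range(n_t):
                acc = 0.0
                for k in range(-half, half + 1):
                    ff, tt = f + k * df, t + k * dt
                    if 0 <= ff < n_f and 0 <= tt < n_t:  # zero padding at edges
                        acc += spec[ff][tt]
                avg = acc / length
                energy[t] += avg * avg
        profiles[name] = energy
    return profiles


def detect_frames(profiles, factor=2.0):
    """Threshold each energy profile and OR the frame-level decisions."""
    n_t = len(next(iter(profiles.values())))
    decision = [False] * n_t
    for energy in profiles.values():
        threshold = factor * sorted(energy)[len(energy) // 2]  # median-based
        for t, e in enumerate(energy):
            if e > threshold:
                decision[t] = True
    return decision


# Toy spectrogram: flat background with a horizontal ridge in frames 4 to 7.
spec = [[0.1] * 10 for _ in range(5)]
for t in range(4, 8):
    spec[2][t] = 5.0
active = detect_frames(directional_energy(spec))
```

In the toy example the ridge frames are flagged as active while the background frames at the start are not. The choice of combination rule and threshold strongly affects the result, so both should be read as placeholders.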

Click to download Matlab code

Click to download MLSP annotations used in this work