SPIRE: Signal Processing Interpretation and Representation Lab

LAB EVENTS > TALKS

3rd June 2022 : Selection of acoustically similar sentences based on phone error rate in the context of ASR by Saurabh Kumar

Talk summary:

For many languages, state-of-the-art ASR systems are reported to perform poorly due to the lack of acoustically and phonetically rich speech data available for system building. Even for resource-rich languages such as English, little effort has been made to find an efficient method for selecting training data similar to the testing conditions. Instead, state-of-the-art ASR systems are data hungry and require large amounts of speech data for training. Therefore, data selection plays a crucial role in the development of robust and computationally efficient ASR systems. In the last few years, several methods have been reported that ensure both acoustic and phonetic richness of the speech data. In this study, several recently reported data selection methods are explored and efforts are made to improve them.

27th May 2022 : WER–BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm by Abhishek Kumar

Talk summary:

Automatic Speech Recognition (ASR) systems are evaluated using Word Error Rate (WER), which is calculated by counting the errors in the ASR transcription relative to the ground truth. This calculation, however, requires manual transcription of the speech signal to obtain the ground truth. Since transcribing audio signals is a costly process, Automatic WER Evaluation (e-WER) methods have been developed to automatically predict the WER of a speech system by relying only on the transcription and the speech signal features. While WER is a continuous variable, previous works have shown that positing e-WER as a classification problem is more effective than regression. However, while converting to a classification setting, these approaches suffer from heavy class imbalance. In this paper, we propose a new balanced paradigm for e-WER in a classification setting. Within this paradigm, we also propose WER-BERT, a BERT-based architecture with speech features for e-WER. Furthermore, we introduce a distance loss function to tackle the ordinal nature of e-WER classification. The proposed approach and paradigm are evaluated on the Librispeech dataset and a commercial (black box) ASR system, Google Cloud’s Speech-to-Text API. The results and experiments demonstrate that WER-BERT establishes a new state-of-the-art in automatic WER estimation.
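
As a rough illustration of the reference-based WER computation described above (not of WER-BERT itself), the following Python sketch counts word-level substitutions, insertions, and deletions with a standard edit-distance recurrence and divides by the reference length. It is not code from the talk or the paper; the function names `edit_distance` and `word_error_rate` are hypothetical, and whitespace tokenization is an assumption.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference length."""
    ref = reference.split()  # assumption: whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sat on mat")` counts one deletion against six reference words. Note that this requires the manually transcribed reference, which is exactly the cost that e-WER methods try to avoid.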

20th May 2022 : Unsupervised representation learning for speaker verification by Prajesh Rana

Talk summary:

The objective of speaker verification is to authenticate a claimed identity from measurements of the voice signal. For speaker verification, I am exploring contrastive-loss-based self-supervised learning (SSL). My work on speaker verification consists of two parts. In the first part, I trained the self-supervised model; in the second part, I use the pretrained model as a feature extractor and train PLDA as a backend model. I will compare my results with the TDNN-PLDA and TDNN-ECAPA algorithms.

13th May 2022 : Extracting features using Self Supervised learning using ASR by Abhishek

Talk summary:

Automatic Speech Recognition (ASR) is the task of transcribing speech, also referred to as speech-to-text conversion. Here we aim to learn features that not only utilize data but are also robust to noise. To support this argument for our target features, I have compared the ASR performance of MFCC and F-bank features with that of features learnt using wav2vec, which is trained with self-supervised representation learning. The metrics used for comparison are PER and CER.

6th May 2022 : A Stage Match For Query-By-Example Spoken Term Detection Based On Structure Information Of Query by Deekshitha G

Talk summary:

State-of-the-art query-by-example spoken term detection (QbE-STD) strategies are usually based on segmental dynamic time warping (S-DTW). However, the sliding window in S-DTW may split the signal of a word into different segments and produce many illegal candidates that must be compared with the query, which significantly reduces the accuracy and efficiency of detection. This paper proposes a stage match strategy based on the structure information of the query, represented by the unvoiced-voiced attributes of its portions. The strategy first locates potential candidates in utterances whose structure is similar to that of the query, and then matches the query with Type-Location DTW (TLDTW), a modified DTW with constraints on the pronunciation types and relative positions of paired frames in the voiced sub-segments. Experiments on the AISHELL-1 corpus showed that the proposed approach achieved a relative improvement over S-DTW and sped up retrieval.
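
For readers unfamiliar with the alignment underlying these methods, here is a minimal sketch of plain dynamic time warping between two sequences. It is neither S-DTW nor the TLDTW variant from the paper (no sliding window, no type or location constraints); the function name `dtw_distance` and the default absolute-difference frame distance are assumptions made for illustration.

```python
import math


def dtw_distance(x, y, dist=lambda a, b: abs(a - b)):
    """Accumulated cost of the best monotonic alignment between x and y."""
    n, m = len(x), len(y)
    # D[i][j] = minimal cost of aligning the first i frames of x
    # with the first j frames of y.
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # x advances, y repeats
                                 D[i][j - 1],      # y advances, x repeats
                                 D[i - 1][j - 1])  # both advance
    return D[n][m]
```

With real speech, `x` and `y` would be sequences of feature vectors and `dist` a vector distance; scalars are used here only to keep the sketch short.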

29th April 2022 : Paper review by Sathvik Udupa

Talk summary:

1. Understanding the Role of Self Attention for Efficient Speech Recognition: Transformer neural networks are increasingly used in automatic speech recognition (ASR). This work investigates the inner workings of such networks in ASR and introduces techniques to reduce recognition latency.

2. Chunked Autoregressive GAN for Conditional Waveform Synthesis: Generative adversarial network (GAN) based neural vocoders have performed well in speech synthesis in recent years. The authors show that these networks are unable to generate accurate pitch and periodicity, and introduce an autoregressive GAN-based vocoder to tackle these issues.

15th April 2022 : Broadcasted Residual Learning for Efficient Keyword Spotting by Siddarth

Talk summary:

We present a broadcasted residual learning method to achieve high accuracy with small model size and computational load. Our method configures most of the residual functions as 1D temporal convolutions while still allowing 2D convolution alongside them through a broadcasted-residual connection that expands the temporal output to the frequency-temporal dimension.

25th March 2022 : Wave Equation and Fundamentals by Veerababu Dharanalakota

Talk summary:

A mathematical description of speech signals (sound waves) is worth knowing not only for its own sake but also because it can mimic reality under certain conditions. To use it, one needs to know the derivation of the wave equation and its underlying assumptions. This talk covers the derivation of the wave equation from the fundamental equations of fluid dynamics: the continuity, momentum, and energy equations, which are in turn derived from natural laws. The talk also covers the general terminology used in the study of sound.
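
A minimal sketch of the standard textbook route to the wave equation is given below, assuming small perturbations (primed quantities) about a uniform medium at rest with density $\rho_0$ and sound speed $c_0$, and neglecting viscosity and heat conduction. The talk's own derivation and notation may differ.

```latex
\begin{align}
\frac{\partial \rho'}{\partial t} + \rho_0 \nabla \cdot \mathbf{u}' &= 0
  && \text{(linearized continuity)} \\
\rho_0 \frac{\partial \mathbf{u}'}{\partial t} + \nabla p' &= 0
  && \text{(linearized momentum)} \\
p' &= c_0^2 \rho'
  && \text{(isentropic relation, from the energy equation)} \\
\frac{1}{c_0^2} \frac{\partial^2 p'}{\partial t^2} - \nabla^2 p' &= 0
  && \text{(wave equation)}
\end{align}
```

The last line follows by taking the time derivative of the continuity equation, subtracting the divergence of the momentum equation to eliminate $\mathbf{u}'$, and substituting $\rho' = p'/c_0^2$.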

11th March 2022 : Pnoi: Development and Challenges by Syed Fahad

Talk summary:

A discussion of developments in the creation of Pnoi, a specialized digital stethoscope for capturing lung and breathing sounds. These sounds can be used for medical diagnosis at a substantially lower cost than current standards.

4th March 2022 : Large Text Corpus Creation using Web Scraping for Language Modelling by Hemantha Krishna Bharadwaj

Talk summary:

The collection of large datasets for training language models requires the use of techniques that extract data from the world wide web in a systematic manner. Collectively known as web scraping, these techniques have been well established by previous research, but there is little research on their use for the collection of data other than that in the English language. This talk will detail improved methods of extracting domain-specific non-English language data from the internet using a combination of HTML parsing libraries and frameworks in Python. The proposed methodology can be utilized to provide large non-English language text datasets in an automated fashion.
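
The summary does not name the libraries used, so the sketch below relies only on Python's standard `html.parser` module as a stand-in. It pulls visible text out of an HTML page and keeps only chunks written mostly in a target script, selected by Unicode range. The class and function names, the 0.5 threshold, and the choice of the Devanagari block (U+0900 to U+097F) as the default are all assumptions for illustration, not the talk's method.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def script_ratio(text, lo, hi):
    """Fraction of non-space characters whose code point lies in [lo, hi]."""
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return 0.0
    return sum(lo <= ord(ch) <= hi for ch in chars) / len(chars)


def extract_script_text(html, lo=0x0900, hi=0x097F, min_ratio=0.5):
    """Return text chunks that are mostly in the given Unicode range."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return [c for c in parser.chunks if script_ratio(c, lo, hi) >= min_ratio]
```

Fetching pages, following links, and domain-specific filtering are left out; a real pipeline would add those steps around this text-extraction core.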

18th February 2022 : Attention and Transformers by Abhayjeet Singh

Talk summary:

Intuitive and mathematical understanding of Attention and Transformer Networks

11th February 2022 : wav2vec 2.0 by Siddarth C

Talk summary:

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned.
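
To make the contrastive task above more concrete, here is a small pure-Python sketch of a contrastive (InfoNCE-style) loss for a single masked time step: the context vector should be more similar to the true quantized target than to a set of distractors. The function names, the toy vectors, and the temperature value of 0.1 are assumptions for illustration; the masking, the quantizer, and any additional loss terms of the actual model are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_loss(context, target, distractors, temperature=0.1):
    """-log softmax score of the true target among target + distractors."""
    logits = [cosine(context, target) / temperature]
    logits += [cosine(context, d) / temperature for d in distractors]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denominator - logits[0]
```

The loss is close to zero when the context vector points at the true target and away from the distractors, and grows when a distractor is the better match.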

28th January 2022 : An error correction scheme for improved air-tissue boundary in real-time MRI video for speech production by Anwesha

Talk summary:

The best performance in Air-tissue boundary (ATB) segmentation of real-time Magnetic Resonance Imaging (rtMRI) videos in speech production is known to be achieved by a 3-dimensional convolutional neural network (3D-CNN) model. However, the evaluation for this model, as well as other ATB segmentation techniques reported in the literature, is done using Dynamic Time Warping (DTW) distance between the entire original and predicted contours. Such an evaluation measure may not capture local errors in the entire contour. Careful analysis of predicted contours reveals errors in regions like the velum part of contour1 and tongue base section of contour2, which are not captured in a global evaluation metric like DTW distance. In this work, we automatically detect such errors and propose a correction scheme for the same. We also propose two new evaluation metrics for ATB segmentation separately in contour1 and contour2 to explicitly capture two types of errors in these contours.

14th January 2022 : Data Visualization by Jeevan

7th January 2022 : Sociolinguistics: Language Variation and Dialect by Sharmistha

31st December 2021 : A brief introduction to muscle synergies in speech by Navaneetha

24th December 2021 : Vocal and Non-vocal segmentation based on the analysis of formant structure by Pranaswi

Talk summary:

The pulmonary system is a network of organs and tissues that help us to breathe. A typical pulmonary system in humans consists of the lungs, larynx, trachea, bronchi, bronchioles, alveoli, and thoracic diaphragm. Inspiratory sounds measured simultaneously over the extrathoracic trachea and at the chest surface contain highly unique regional information. The characteristic patterns in the recorded data are associated with conditions affecting airway patency, such as asthma and obstructive sleep apnea. The recorded sounds have the potential to be used in clinical practice for the diagnosis and monitoring of various respiratory conditions. In the proposed research work, an acoustic model of the pulmonary system will be developed by treating the tracheobronchial tree and the lungs as a flexible branched duct system and plenums, respectively.

17th December 2021 : Overlapped Speech Detection using CNN Architectures by Pooja

Talk summary:

The ability to estimate the overlapped sentences spoken by an individual over a certain period of time is valuable in language acquisition, healthcare, and assessing language development. However, establishing a robust automatic framework to achieve high accuracy is non-trivial in realistic/naturalistic scenarios due to various factors such as different styles of conversation or types of noise that appear in audio recordings, especially in multi-party conversations. Therefore, overlapping speech detection has become an important front-end triage step for speech technology applications. This is crucial for large-scale datasets where manual labeling is not possible. A block-based CNN architecture is proposed to address modeling overlapping speech in audio streams with frames as short as 25 ms. The architecture is robust to both (i) shifts in the distribution of network activations due to the change in network parameters during training, and (ii) local variations in the input features caused by feature extraction, environmental noise, or room interference.

10th December 2021 : Introduction to G2P Systems by Priyanshi

Talk summary:

The orthography of a language does not always have a predictable relationship with its pronunciation. Certain languages have predictable and consistent relationships; however, for languages like English, which have multiple inconsistencies and loan words from other languages, mapping this relationship becomes challenging. The ability to map this relationship can help in producing better-performing ASR and TTS systems. Grapheme-to-phoneme (G2P) conversion systems are used to find the pronunciation of a word given its written form. We look at where G2P plays a role in the aforementioned systems, what challenges are involved, and one approach to doing it.

3rd December 2021 : Acoustic Modeling and Analysis of Pulmonary System by Veerababu

Talk summary:

The pulmonary system is a network of organs and tissues that help us to breathe. A typical pulmonary system in humans consists of the lungs, larynx, trachea, bronchi, bronchioles, alveoli, and thoracic diaphragm. Inspiratory sounds measured simultaneously over the extrathoracic trachea and at the chest surface contain highly unique regional information. The characteristic patterns in the recorded data are associated with conditions affecting airway patency, such as asthma and obstructive sleep apnea. The recorded sounds have the potential to be used in clinical practice for the diagnosis and monitoring of various respiratory conditions. In the proposed research work, an acoustic model of the pulmonary system will be developed by treating the tracheobronchial tree and the lungs as a flexible branched duct system and plenums, respectively.

26th November 2021 : Hindi Language Modelling using text data from domains of agriculture, finance, healthcare and general by Sneha

Talk summary:

Hindi text data is collected from four different domains: general, agriculture, healthcare, and finance. Statistics such as word frequency and the total number of sentences and words are computed from the combined, cleaned data of all the domains. To validate the data, a decision tree algorithm is used for text classification, assigning an unknown text to one of the pre-determined groups.

19th November 2021 : Bengali text data classification of different domains by Sanchari

Talk summary:

Using Bengali texts from four different domains (general, agriculture, healthcare, and finance), a decision tree algorithm is used to separate the classes and predict the domain of any unknown Bengali text. With this approach, any unknown text can be easily classified by the domain it belongs to.

5th November 2021 : Selection of acoustically and phonetically rich sentences in the context of ASR by Saurabh Kumar

28th October 2021 : Neural speech synthesis models by Navneet Kaur

Talk summary:

The most recent advancements in the field of speech synthesis have been brought about by deep learning. In current state-of-the-art models, text-to-speech conversion is accomplished in two steps: i) conversion of text to a lower-resolution intermediate representation, generally a mel-spectrogram, using a seq2seq model (front end), and ii) generation of the speech waveform from the mel-spectrogram using generative models (back end). In this talk, I will discuss and compare different techniques and models for both the front end and the back end. Specifically, for seq2seq models I will cover Tacotron-2, Fastspeech-2, Transformer-TTS, and GlowTTS. Among generative models, I will discuss WaveNet, WaveGlow, and MelGAN.

22nd October 2021 : Analysis of vocal sounds in asthmatic patients by Shivani

15th October 2021 : Diffusion probabilistic models in speech synthesis by Sathvik

Talk summary:

In recent years, there has been progress in a type of generative modelling known as diffusion probabilistic models. The latent features are learnt through a 'diffusion' process, which iteratively adds noise to the data to transform it into a noise distribution. During inference, this process can be reversed to generate data samples from noise. This learning technique has been applied to problems in speech synthesis by modifying it into a conditional generative process.
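
The noise-adding (forward) half of this process has a simple closed form when Gaussian noise with a fixed variance schedule is used, which the sketch below illustrates in plain Python. The linear schedule, its endpoint values, and all function names are assumptions chosen for illustration; the learned reverse process and the conditioning used for speech synthesis are not shown.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] = product over s <= t of (1 - beta[s])."""
    alpha_bars, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def diffuse(x0, t, alpha_bars, rng):
    """Sample x_t directly from x_0:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0)
            for x in x0]
```

As `t` grows, `alpha_bars[t]` shrinks towards zero, so the sample retains less of the original signal and approaches pure Gaussian noise.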

13th October 2021 : Analysis of vocal sounds in asthmatic patients by Shivani

24th September 2021 : Study of ALS/PD classification using slurred speech by Aayushman

Talk summary:

Monitoring disease progression in patients with Amyotrophic Lateral Sclerosis (ALS) and Parkinson’s disease (PD) can be done by analyzing their speech waveforms. Many works in the past have used different acoustic features to classify patients with ALS and PD against healthy controls (HC). In this project, I studied a data-driven approach to learn representations from the raw speech waveform. The model comprises a 1-D Convolutional Neural Network (CNN) layer to extract representations from raw speech, followed by Bi-directional Long Short Term Memory (BLSTM) layers for the classification tasks. Three different classification tasks (ALS vs HC), (PD vs HC), and (ALS vs PD) were considered. The model performs the classification tasks using four different speech stimuli, namely, image description (IMAG), spontaneous speech (SPON), diadochokinetic rate (DIDK), and sustained phoneme production (PHON). Experiments were performed with 90 ALS, 90 PD, and 90 HC subjects.

17th September 2021 : Speech Synergies by Chirag Vasist

3rd September 2021 : TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis by Siddarth

Talk summary:

This paper presents TaLNet, a model for voice reconstruction with ultrasound tongue and optical lip videos as inputs. TaLNet is based on an encoder-decoder architecture. Separate encoders are dedicated to processing the tongue and lip data streams respectively. The decoder predicts acoustic features conditioned on encoder outputs and speaker codes. To mitigate for having only relatively small amounts of dual articulatory-acoustic data available for training, and since our task here shares with text-to-speech (TTS) the common goal of speech generation, we propose a novel transfer learning strategy to exploit the much larger amounts of acoustic-only data available to train TTS models. For this, a Tacotron 2 TTS model is first trained, and then the parameters of its decoder are transferred to the TaLNet decoder.

20th August 2021 : Kannada language modeling using text data from the domains of agriculture, finance, healthcare and general by Karthik S Vasisht

Talk summary:

Language modelling involves decomposing texts into smaller units, sentences and words, and statistically analyzing them to make accurate predictions of phrases and sentences. The n-gram model is a statistical tool that estimates the likelihood of a sequence of words forming a meaningful sentence from the conditional probability of each word given the words that precede it. This presentation discusses the work done over a period of 2 months on link collection, web scraping, text cleaning, and validation of the collected data for building a Kannada language model.
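
As a small illustration of the n-gram idea, the sketch below builds a bigram model (n = 2) with maximum-likelihood estimates, so each word is conditioned on the single word before it. The function names, the `<s>` and `</s>` boundary markers, and the whitespace tokenization are assumptions; no smoothing is applied, so unseen word pairs get probability zero. It is not the model built in the talk.

```python
from collections import Counter


def train_bigram(sentences):
    """Count contexts and bigrams over sentences padded with boundary markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous word, word) pairs
    return contexts, bigrams


def bigram_prob(prev, word, contexts, bigrams):
    """Maximum-likelihood estimate of P(word | prev)."""
    if contexts[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / contexts[prev]


def sentence_prob(sentence, contexts, bigrams):
    """Product of the conditional probabilities along the sentence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        prob *= bigram_prob(prev, word, contexts, bigrams)
    return prob
```

In practice the counts would come from the cleaned, scraped corpus, and some form of smoothing would be needed to handle unseen word pairs.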

6th August 2021 : AAI and Palate contour estimation by Anish

Talk summary:

Acoustic-to-articulatory inversion (AAI) is a model that maps MFCCs to electromagnetic articulography (EMA) points. EMA data corresponds to the movement of the articulators in the mouth. Predicting EMA points helps us visualize the movement of our mouth while we speak, but predicting the palate contour along with the EMA points would help us better understand that movement. The presentation focuses on the different models that were trained and the different preprocessing techniques that were employed to predict the palate contours.

23rd July 2021 : Language Identification of ALS patients using X-Vector model by Yasaswini

Talk summary:

Amyotrophic lateral sclerosis (ALS) is a rare neurological disease that primarily affects the nerve cells (neurons) responsible for controlling voluntary muscle movements like chewing, walking, and talking. Because ALS greatly hampers speech, speech recognition techniques become important. To build a model that recognizes the speech of ALS patients, language identification is the first crucial step.

16th July 2021 : Accent conversion using Cotatron by Chinmay

Talk summary:

Accent conversion (AC) aims to make non-native speech sound as if the speaker has a certain native accent. Typical AC methods attempt to convert only the native speaker's voice to that of a non-native speaker, leaving the basic content and pronunciation unchanged. This hinders their practical use in real-world applications, because native-accented utterances are required at the conversion stage. Students who acquire a second language after the “critical age” often speak a language other than their mother tongue. This can lead to their being poorly understood, and speakers may face discriminatory situations. Therefore, students who communicate with native speakers have much to gain by improving their pronunciation. The presentation discusses the work done over a period of 2 months on testing, comparing, and improving the existing methods for accent conversion.

9th July 2021 : Age Estimation for ALS Patients Speech Utterance Based on LSTM by Lavanya

Talk summary:

Speaker age is part of the non-verbal information contained in speech. Age estimation consists of automatically determining the age of a speaker from a given segment of a speech utterance. Age estimation from speech has recently received increased interest, as it is useful for many applications such as user profiling, targeted marketing, and personalized call routing. These kinds of applications need to quickly estimate the age of the speaker and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) networks have been shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required.

2nd July 2021 : Speech-based classification of ALS patients and Healthy subjects by Sonakshi

Talk summary:

Amyotrophic Lateral Sclerosis (ALS) is a rare neurological disease that affects the motor neurons, causing loss of the ability to speak, eat, move, and breathe. There is no cure for ALS yet. Early detection is crucial so that therapeutic measures can be started at an early stage, which would help in prolonging life expectancy and improving quality of life for patients. Unfortunately, diagnosis of the disease is difficult and time consuming. Hence, there is a need to develop an automatic device/app that can detect the disease, which would allow therapy to begin early, leading to greater life expectancy. The presentation discusses the work done over a period of 6 months on testing, comparing, and improving the existing methods for this classification.

18th June 2021 : Segnet based ATB segmentation in rtMRI videos by Jelwin

11th June 2021 : Brain Stroke Segmentation using Deep Learning by Nikhil

Talk summary:

Stroke is one of the main causes of adult death around the globe, impacting 6.2 million people per annum. Over the past 20 years, there has been a 26 percent increase in stroke deaths worldwide. Across the world, stroke is the second leading cause of death. In recent years, machine and deep learning algorithms have had a huge impact on research challenges in several domains, including health care, natural language processing, speech processing, and more. The medical field also benefits greatly from improving deep learning models, which save time and produce accurate results. Typically, manual segmentation of strokes is done by expert radiologists or doctors who excel in this field. Manual segmentation is said to be time-consuming (it takes nearly three to four hours to diagnose the problem) and also introduces inter- and intra-rater variability among radiologists. Patients affected by brain stroke suffer if careful clinical decisions are not made in a short time. To augment a radiologist's or doctor's effort, deep learning algorithms can be used effectively for segmenting clinical brain images and can be a valuable tool for this work.

21st May 2021 : A Scalable Deep Learning Model for Arbitrary Transmitter Configurations in Inverse Scattering by Karthik

7th May 2021 : Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling by Abhayjeet

30th April 2021 : Learning Rate Warmups and the Variance of Adaptive Learning Rates by Bheshaj

23rd April 2021 : ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing by Bhargava

9th April 2021 : Speech to EMG mapping by Navaneetha

19th March 2021 : A literature survey on audio recording device identification by Bhavuk

12th March 2021 : A brief tutorial on Android app development and design patterns by Shankar

5th March 2021 : Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by

26th February 2021 : Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by

19th February 2021 : Principal Component Analysis (PCA), Kernel PCA and Independent Component Analysis (ICA) by Anwesha

13th February 2021 : Graph Convolutional Networks by Manthan

5th February 2021 : Tutorial on Equivariant Networks by

29th January 2021 : A Brief Introduction to Density Based Spatial Clustering for Applications with Noise by Priyanshi

22nd January 2021 : Graph Neural Networks for solving PDEs by Karthik

15th January 2021 : Capsule Networks by Siddharth

8th January 2021 : Whisper to neutral speech conversion by Subhadeep, Pritam and Debojyoti

1st January 2021 : A closer look at loss functions by Achuth

24th December 2020 : Approximate inference methods by Sathvik

14th December 2020 : Acoustic-Articulatory Mapping: Analysis and Improvements with Neural Network Learning Paradigms by Aravind

4th December 2020 : Fader Network by Tanuka

27th November 2020 : An unsupervised segmentation of vocal breath sounds by Shivani

19th November 2020 : Pulmonary function test graph digitizer by Sandhya

13th November 2020 : Overview of Microphones by Jeevan and Shaique

6th November 2020 : Feasibility of Learning (Continued) by Karthik

30th October 2020 : Feasibility of Learning by Karthik

17th October 2020 : Inverse scattering using two stage networks by Karthik

9th October 2020 : Introduction to Phonetics by Sharmistha

11th September 2020 : Introduction to Git and Docker by Sanjeev

27th August 2020 : On the power of curriculum learning in training deep networks by Siddharth

21st August 2020 : Generative models based on normalizing flows by Achuth

14th August 2020 : Generative models based on normalizing flows by Achuth

7th August 2020 : Electrical Impedance Tomography: A portable low cost setup for biomedical imaging and other applications by Karthik

31st July 2020 : Neural Networks and Differential Equations by Avni

24th July 2020 : Regularization for deep learning by Aravind

17th July 2020 : Thesis defence by Renuka

9th July 2020 : Speech task-specific representation learning using acoustic-articulatory data by Renuka

3rd July 2020 : Neural Turing Machines by Chiranjeevi

26th June 2020 : An overview of gradient descent optimization algorithms by Abinay

19th June 2020 : Speech rate estimation using representations learned from speech with convolutional neural network by Renuka

12th June 2020 : Acoustic-to-articulatory inversion of dysarthric speech by utilizing cross-corpus acoustic-articulatory data

5th June 2020 : Temporal decomposition of Speech by Tilak

29th May 2020 : Quantum Computing for breaking encryption by Pavan Kumar J

13th March 2020 : Linguistics: An Introduction by Sharmistha

6th March 2020 : AUTOMATIC CLASSIFICATION OF VOLUMES OF WATER USING SWALLOW SOUNDS FROM CERVICAL AUSCULTATION by Siddharth

28th February 2020 : Web Interface for acoustic feature analysis by Heena and Vaibhav

21st February 2020 : Deep Canonical Correlation Analysis by Sanjeev

Talk summary:

An introduction to correlation and to linear algebra concepts such as eigendecomposition, SVD, and PCA, followed by how Canonical Correlation Analysis (CCA) uses them to find ideal transformations.

14th February 2020 : Voice-based classification of patients with ALS, Parkinson's disease and healthy controls with CNN-LSTM using transfer learning by Jhansi

7th February 2020 : Hypothesis Testing by Shivani

31st January 2020 : Improving fundamental frequency generation in EMG-to-Speech Conversion using a Quantization Approach by Tejas

17th January 2020 : Comparison of interpolation schemes for the perception of speech in the presence of missing samples by Amit

3rd January 2020 : Inverse Scattering by Mahima

1st January 2020 : Out-of-Pronunciation Distribution Detection: An Unsupervised Approach by Parth

27th December 2019 : Data driven analysis of critical articulators in speech production by Anusuya

13th December 2019 : Multichannel Acoustic Source Localization by Tarun

6th December 2019 : Variational Methods by Achuth

29th November 2019 : Computational wave scattering by Karthik

22nd November 2019 : Basics of Graph Signal Processing by Aravind

11th October 2019 : Medical image segmentation on GPUs – A comprehensive review by Divya

4th October 2019 : Dynamic Programming: An Overview and Some Optimization Techniques by Shankar

27th September 2019 : Origins of Fourier Series by Pavan Kumar

Talk summary:

• Introduction
• The Heat Equation
• Solution of PDE
• Fourier Series

2nd September 2019 : Comparison of automatic syllable stress detection quality with time-aligned boundaries and context dependencies by Manoj

2nd September 2019 : A comparative study of noise robustness of goodness of pronunciation (GoP) measures and its modifications based on teacher's utterance by Manoj

2nd September 2019 : Whisper to neutral mapping using cosine similarity maximization in i-vector space for speaker verification by Abinay

2nd September 2019 : An investigation on speaker specific articulatory synthesis with speaker independent articulatory inversion by Aravind

31st August 2019 : Low resource automatic intonation classification using gated recurrent unit (GRU) networks pre-trained with synthesized pitch patterns by Atreyee

23rd August 2019 : Achievements and the goals of the lab by Prasanta Kumar Ghosh

23rd August 2019 : ASR inspired syllable stress detection for pronunciation evaluation without using a supervised classifier and syllable level features by Manoj

23rd August 2019 : An improved goodness of pronunciation (GoP) measure for pronunciation evaluation with DNN-HMM system considering HMM transition probabilities by Manoj

16th August 2019 : Acoustic and articulatory feature based speech rate estimation using a convolutional dense neural network by Renuka

Talk summary:

In this paper, we propose a speech rate estimation approach using a convolutional dense neural network (CDNN). The CDNN based approach uses acoustic and articulatory features for speech rate estimation. The Mel Frequency Cepstral Coefficients (MFCCs) are used as acoustic features, and the articulograms representing the time-varying vocal tract profile are used as articulatory features. The articulogram is computed from a real-time magnetic resonance imaging (rtMRI) video in the midsagittal plane of a subject while speaking. However, in practice, the articulogram features are not directly available, unlike acoustic features from a speech recording. Thus, we use an acoustic-to-articulatory inversion method using a bidirectional long short-term memory network, which estimates the articulogram features from the acoustics. The proposed CDNN based approach using estimated articulatory features requires both acoustic and articulatory features during training, but it requires only acoustic data during testing. Experiments are conducted using rtMRI videos from four subjects, each speaking 460 sentences. The Pearson correlation coefficient is used to evaluate the speech rate estimation. It is found that the CDNN based approach gives a better correlation coefficient than the temporal correlation and selected sub-band correlation (TCSSBC) based baseline scheme by 81.58% and 73.68% (relative) in seen and unseen subject conditions respectively.

2nd August 2019 : Rethinking Model Scaling for Convolutional Neural Networks by Aparna

23rd July 2019 : Breath cycle segmentation by Shruthi

Talk summary:

Segmentation of individual inhales and exhales from an audio recording of continuous breaths using two approaches: a spectral entropy approach and a Parzen window-based approach.
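As a rough illustration of the spectral entropy idea named in the summary above, the sketch below computes a normalized spectral entropy for a single frame of samples. The function name `spectral_entropy`, the naive DFT, and the normalization are assumptions made for illustration; the framing, features, and thresholds actually used in the talk are not given here.

```python
import cmath
import math


def spectral_entropy(frame):
    """Return the normalized spectral entropy (between 0 and 1) of one frame.

    Hypothetical helper: a flat spectrum gives a value near 1, while energy
    concentrated in a single frequency bin gives a value near 0.
    """
    n = len(frame)
    # Naive DFT power spectrum over the non-negative frequency bins.
    power = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0.0 or len(power) < 2:
        return 0.0  # silent or degenerate frame: zero entropy by convention
    # Treat the normalized power spectrum as a probability distribution.
    probs = [p / total for p in power]
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(power))


# A pure tone (energy in one bin) versus an impulse (flat spectrum).
tone = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
impulse = [1.0] + [0.0] * 63
```

In a segmentation setting, such a value would be computed frame by frame and the resulting contour inspected for transitions between breath phases; how the talk turns the contour into inhale and exhale boundaries is not specified in the summary.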

19th July 2019 : Performance characterization of microphones by Suhas

Talk summary:

We look at the parameters that affect a microphone recording, what to look for in a specification sheet, and how to assess the performance of different microphones qualitatively.

16th July 2019 : Construction of an anthropomorphic thorax phantom using CT scan segmentation and 3D printing by Srishti

Talk summary:

Phantoms that mimic human physiology have long been used for designing and testing various diagnostic medical/imaging techniques. The most important advantage of using phantoms is easy access to ground truth information which in most cases cannot be obtained from a human subject. The objective of this project is to construct an anthropomorphic thorax phantom that can be used to develop a system for multi-channel active/passive acoustic characterisation of lungs. However, to construct anthropomorphic phantoms we need a suitable way to capture anthropomorphic parameters and replicate them in the form of a phantom. To this end, we first obtain the anthropomorphic parameters of a human thorax using a CT scan. The CT scan is then segmented into various regions which would be finally printed using a 3D printer.

12th July 2019 : The task of Sound Event Detection by Shoureen

Talk summary:

The task of Sound Event Detection can be broadly divided into two categories, namely classification and localization. The former amounts to simple audio tagging, while the latter additionally requires specifying the onset and offset times of each event taking place in the given audio stream. The main challenge in audio tagging is the lack of frame-wise ground truth, which essentially turns it into a Multiple Instance Learning problem. In my work, I have tested multiple pooling functions by incorporating them at various stages in order to maximize the F-score of the audio tagging system.

12th July 2019 : Call recording app with some additional features by Utkarsh

12th July 2019 : Trend Statistics Network and Channel invariant EEG Network for sleep arousal study by Achuth

Talk summary:

Sleep is a very important part of life, and lack of sleep or a sleep disorder can have a negative impact on day-to-day life and serious long-term consequences. In this work, we propose an end-to-end trainable neural network for automated arousal scoring. The network consists of two main parts. Firstly, a trend statistics network computes the moving average of the filtered signals at different scales. Secondly, we propose a channel invariant EEG network to detect the EEG arousals in any channel. Finally, we combine the features from various channels through a convolution network and bi-directional long short-term memory to predict the probability of the arousal. Further, we propose an objective function that uses only respiratory effort related arousal (RERA) and non-arousal regions to optimize the network. We also propose a method to estimate the respiratory disturbance index (RDI) from the probability predicted by the network. Evaluation on the Physionet Challenge 2018 database shows that the proposed method detects the RERA with an area under the precision-recall curve (AUPRC) of 0.50 in a 10-fold cross validation setup. The mean absolute error of RDI prediction is 6.11, while a two-class RDI severity prediction yields a specificity of 75% and a sensitivity of 83%.

5th July 2019 : Effect of consonant context in TIMIT VCV sequences on pitch trend by Vaibhav

Talk summary:

This analysis describes how the pitch trend in VCV sequences depends on the voicing characteristics of the consonant, in the vowel region followed by the consonant.

5th July 2019 : An exhaustive study on involvement of articulators in the production of plosives by Minulakshmi

Talk summary:

The study focuses on the occurrence and duration of constriction for bilabial and laminal-alveolar plosives across the vowels /a,e,i,o,u/ in a symmetric VCV sequence.

28th June 2019 : Quantitative Trading by Sanjeev

Talk summary:

Exploring methods used by Quants in trading - Risk Model, Alpha Model, and strategies.

28th June 2019 : A Comparison of Different Methods for Audio Declipping by Sandhiya

Talk summary:

A deep dive into state-of-the-art algorithms for audio declipping: Constrained Blind Amplitude Reconstruction, Constrained Orthogonal Matching Pursuit Reconstruction, and two variants of sparse audio declippers.

20th June 2019 : An acoustic investigation on the effect of consonant context and speaking rate on vowel space and coarticulation in Toda VCV sequences by Nayan

Talk summary:

This study analyzes the effect of consonant context and speaking rate on vowel space and coarticulation in Toda vowel-consonant-vowel (VCV) sequences. The vowels /a/,/e/, /i/, /o/, /u/, and two intervocalic consonants, /p/ (labial) and /t/ (alveolar), are considered to form asymmetrical VCV sequences in slow and very fast speaking rates. Results from these acoustic analyses indicate that there are differences in the nature in which rate and consonant context affect the coarticulatory organization.

20th June 2019 : Acoustic analysis of swallow sounds in individuals with head and neck cancer by Divya

Talk summary:

This paper describes the effect of the volume of water swallowed by healthy controls on the acoustic sound signals captured by means of cervical auscultation. This study indicates that the peak intensity of the second swallow segment is the best parameter to differentiate different volumes, since it changes significantly across the three volumes of water considered in this study.

14th June 2019 : A study on the problem of heart-rate estimation from facial videos by Vishay

7th June 2019 : Unsupervised syllable stress detection by Manoj

Talk summary:

Estimation of stress markings in an automatic speech recognition (ASR) framework involving a finite-state transducer (FST), without using annotated stress markings or segmental information.

6th May 2019 : AIR-TISSUE BOUNDARY SEGMENTATION IN REAL TIME MAGNETIC RESONANCE IMAGING VIDEO USING A CONVOLUTIONAL ENCODER-DECODER NETWORK by Renuka

6th May 2019 : AN IMPROVED AIR TISSUE BOUNDARY SEGMENTATION TECHNIQUE FOR REAL TIME MAGNETIC RESONANCE IMAGING VIDEO USING SEGNET by Renuka

26th April 2019 : Representation learning using convolution neural network for acoustic-to-articulatory inversion by Aravind

26th April 2019 : A Study on Robustness of Articulatory Features for Automatic Speech Recognition of Neutral and Whispered Speech by Gokul

19th April 2019 : FORMANT-GAPS FEATURES FOR SPEAKER VERIFICATION USING WHISPERED SPEECH by Abhinay

12th April 2019 : K-SVD by Karthik

15th March 2019 : Methods to work with class imbalanced datasets by Shivani Yadav

Talk summary:

• What class imbalance is
• How it affects classifier performance
• Methods to handle class imbalance

8th March 2019 : Initial value problem by Prasanta Kumar Ghosh

Talk summary:

Solving initial value problems in the context of ordinary differential equations (ODEs) using numerical methods, which are often required when ODEs are not analytically solvable. In this regard, both the theory and MATLAB coding of the Runge-Kutta family of methods will be discussed.
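To make the Runge-Kutta idea concrete, here is a minimal sketch of the classical fourth-order member of the family (RK4) for a scalar initial value problem y' = f(t, y), y(t0) = y0. It is written in Python rather than MATLAB, and the function name `rk4_solve` and its parameters are assumptions for illustration, not the code discussed in the talk.

```python
def rk4_solve(f, t0, y0, t_end, steps):
    """Integrate y' = f(t, y) from t0 to t_end with the classical RK4 scheme.

    Hypothetical helper: uses a fixed step size and returns only y(t_end).
    """
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        # Four slope estimates: start, two midpoints, and end of the step.
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        # Weighted average of the slopes advances the solution by one step.
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Example: y' = y with y(0) = 1 has the exact solution y(t) = exp(t),
# so the value at t = 1 approximates e = 2.71828...
approx_e = rk4_solve(lambda t, y: y, 0.0, 1.0, 1.0, 100)
```

A fixed step size keeps the sketch short; adaptive step-size variants of the family trade this simplicity for error control.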

22nd February 2019 : Font and Background Color Independent Text Binarization by Vishay

Talk summary:

The talk starts with the motivation for binarization of text images, continues with a discussion of global and adaptive thresholding techniques, and ends with a discussion of a novel approach for text binarization.
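As an example of a global thresholding technique, the sketch below implements Otsu's method on a flat list of grayscale values. Otsu's method is chosen here only as a well-known representative; the summary does not say which global or adaptive techniques were covered, and the novel approach from the talk is not reproduced. The names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level that maximizes the between-class variance.

    Assumes `pixels` is a flat list of integers in the range [0, levels).
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of the darker class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no pixels at or below this level yet
        w1 = total - w0
        if w1 == 0:
            break  # every pixel is already in the darker class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 and all others to 0."""
    return [1 if p > threshold else 0 for p in pixels]
```

A single global threshold like this tends to struggle when the font or background color varies across the image, which is the situation the talk title refers to; adaptive methods instead compute a threshold per local neighbourhood.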

22nd February 2019 : Prediction of articulatory motion at different rates by Abhay

Talk summary:

Predicting the articulatory trajectories in speech production from neutral to fast or slow rates using an encoder-decoder based model with some alterations.

15th February 2019 : A SegNet Based Image Enhancement Technique for Air-Tissue Boundary Segmentation in Real-Time Magnetic Resonance Imaging Video by Renuka

8th February 2019 : Weighted Finite-State Transducers in Speech Recognition by Manoj

Talk summary:

• Different operations on WFSTs
• How WFSTs are used in decoding an utterance

1st February 2019 : Articulatory Phonology by Renuka

Talk summary:

• Description of articulatory phonology
• Gestural computational model
• Gestural analysis

25th January 2019 : Learning deep features for one-class classification by Shankar

Talk summary:

We take a look at the problem of one-class classification and a deep learning-based solution for feature learning for one-class classification

11th January 2019 : Solving a system of linear equations by Karthik

28th December 2018 : On Utility of Multi-taper Modified Group Delay by Narendra

Talk summary:

Function Representations for Speaker and Language Recognition The abstract can be found in the attached document

14th December 2018 : Visualizing High Dimensional Data using t-SNE by Aravind

Talk summary:

• Basic concepts of Information theory
• Introduction to visualization
• t-SNE (t-distributed stochastic neighbor embedding)

7th December 2018 : Overview of ASR by Avni

Talk summary:

• Mathematical equation of ASR
• Overview of HMM-GMM ASR
• Viterbi decoding
• Advantages of WFST over tree based structures

30th November 2018 : How does Netflix & Amazon Prime recommend movies by Sweekar

Talk summary:

The talk is about how Matrix Factorization & Gradient Descent collectively work towards suggesting the best content possible for the viewer
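A minimal sketch of how matrix factorization and gradient descent can work together: each observed rating is approximated by the dot product of a user vector and an item vector, and both are updated by stochastic gradient descent on the squared error. The function names, hyperparameters, and toy ratings are assumptions for illustration and do not describe how Netflix or Amazon Prime actually build their recommenders.

```python
import random


def factorize(ratings, n_users, n_items, rank=2, lr=0.01, reg=0.02,
              epochs=2000, seed=0):
    """Learn user factors P and item factors Q from (user, item, rating) triples."""
    rng = random.Random(seed)
    P = [[rng.random() for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.random() for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][k] * Q[i][k] for k in range(rank))
            err = r - pred
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Gradient step on the squared error with L2 regularization.
                P[u][k] += lr * (err * qi - reg * pu)
                Q[i][k] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of item i by user u."""
    return sum(pk * qk for pk, qk in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    """Root mean squared error over the observed ratings."""
    squared = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (squared / len(ratings)) ** 0.5


# Toy data: 3 users, 3 items, 6 observed ratings (hypothetical values).
toy_ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
               (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P, Q = factorize(toy_ratings, n_users=3, n_items=3)
```

Once trained, `predict(P, Q, u, i)` can be evaluated for user and item pairs that have no observed rating, and the highest predicted items suggested to that user.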

9th November 2018 : Introduction to MIR, Audio licensing and blockchain technology by Suhas

Talk summary:

In this talk, we look at what music information retrieval is, why audio licensing is required and how audio watermarking and blockchains make data secure, accurate and reliable

2nd November 2018 : Attention in the neural network by Achuth

Talk summary:

We will see how basic attention works in neural networks and understand how attention is used in general seq2seq mapping problems, including ASR, TTS, machine translation and image captioning.
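The basic mechanism can be sketched in a few lines: a query is compared with every key, the scores are turned into weights with a softmax, and the output is the weighted sum of the values. The sketch below uses scaled dot-product scoring, which is one common choice and an assumption here; the summary does not say which scoring function the talk covered, and the names `softmax` and `attend` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Return (weights, context) for one query over a sequence of key/value pairs."""
    scale = math.sqrt(len(query))
    # One similarity score per key: scaled dot product with the query.
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    # Context vector: attention-weighted sum of the value vectors.
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, context


# The query is aligned with the first key, so the first value dominates.
weights, context = attend([5.0, 0.0],
                          keys=[[1.0, 0.0], [0.0, 1.0]],
                          values=[[10.0, 0.0], [0.0, 10.0]])
```

In a seq2seq model the query would typically come from the decoder state and the keys and values from the encoder outputs, so the weights indicate which input positions each output step relies on.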

12th October 2018 : Necessity for cloud computing by Valliappan

12th October 2018 : We’re creating a dystopia of misinformation and emotional manipulation by Aparna

5th October 2018 : Learning better models to sparsify yellow marks in manuscripts by Nisha

Talk summary:

• Faulty writing practices leading to "yellowing" of submitted drafts
• Factors in writing that are inversely proportional to dimension of "yellow marks" subspace
• "Check-List" algorithm to improve writing

28th September 2018 : Fisher Linear Discriminant by Chiranjeevi

24th August 2018 : Subband Weighting for Binaural Speech Source Localization by Karthik

24th August 2018 : Speech Enhancement Using Deep Mixture of Experts Based on Hard Expectation Maximization by Pavan

24th August 2018 : Relating Articulatory Motions in Different Speaking Rates by Astha

24th August 2018 : Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs by Nisha

17th August 2018 : Air-Tissue Boundary Segmentation in Real-Time Magnetic Resonance Imaging Video Using Semantic Segmentation with Fully Convolutional Networks by Valliappan

17th August 2018 : Automatic visual augmentation for concatenation based synthesized articulatory videos from real-time MRI data for spoken language training by Chiranjeevi

17th August 2018 : Inferring speaker identity from articulatory motion during speech by Aravind

17th August 2018 : Low resource acoustic-to-articulatory inversion using bi-directional long short-term memory by Aravind

3rd August 2018 : Responsive Website Tool to Rate the Pronunciation Quality by Abhishek Gaonkar

3rd August 2018 : Richer convolutional features for edge detection by Renuka

27th July 2018 : Glottal Segmentation GUI in Python by Varun

27th July 2018 : STUDY OF USE OF ARTICULATORY INFORMATION FOR ASR OF NEUTRAL AND WHISPERED SPEECH by Gokul

24th July 2018 : Interpretability in Machine learning (ML) by Deep

Talk summary:

We have been deploying ML algorithms ("black box models") in various problems (e.g. classification tasks), and as a result it has become imperative that we develop tools for the interpretability of these "black boxes" so as to enable their deployment in real-life applications. My aim is to give a brief overview of this science of interpretability.

20th July 2018 : A study on acoustic-to-articulatory inversion for understanding inter-speaker dependency by Siddant

20th July 2018 : Intonation classification using temporal structures in pitch contour by Atreyee

13th July 2018 : Rendering head gestures based on MO-CAP (OptiTrack) data by Varshini

13th July 2018 : Prediction of the air-tissue boundary in the upper airway of the vocal tract by Avinash

13th July 2018 : Implementation of frame selective dynamic programming based pitch estimation by Aswin

13th July 2018 : A Maximum Likelihood Formulation to Exploit Heart Rate Variability for Robust Heart Rate Estimation from Facial Video by Raseena

6th July 2018 : Detection and Delineation of P and T waves in an ECG signal by Prakhar

6th July 2018 : Broad Phoneme Class Specific Deep Neural Network Based Speech Enhancement by Pavan

6th July 2018 : Classification between story-telling and poem recitation using head gesture of the talker by Anurag

6th July 2018 : Comparison of Cough, Wheeze and Sustained Phonations for Automatic Classification between Healthy subjects and Asthmatic patients by Shivani

29th June 2018 : A Maximum Likelihood Formulation to Exploit Heart Rate Variability for Robust Heart Rate Estimation from Facial Video by Raseena

Talk summary:

• Motivation for non-contact heart rate measurement
• Challenges in non-contact heart rate measurement
• The proposed maximum likelihood approach
• Experiments and results

29th June 2018 : Comparison of Cough, Wheeze and Sustained Phonations for Automatic Classification between Healthy subjects and Asthmatic patients by Shivani

Talk summary:

Introduction, motivation, proposed method, dataset, experimental setup, results, conclusion and future work.

22nd June 2018 : Automatic visual augmentation for articulatory videos from real-time MRI data by Chandana

15th June 2018 : A Brief introduction to Lungs anatomy, physiology, pathology and pulmonary function tests by Shivani

Talk summary:

Lungs anatomy and physiology: Explanation and overview of lung function tests. Pulmonary diseases: Overview of two papers related to sound-based analysis or detection of pulmonary diseases.

4th May 2018 : Git and GitHub by Nitin

Talk summary:

• What are Git and GitHub?
• Demonstration of the tools

23rd Feb 2018 : Introduction to Bootstrap by Kausthubha

Talk summary:

• Introduction to Bootstrap
• File Structure
• Typography
• Different classes of buttons used in Bootstrap

10th Feb 2018 : Joint Learning of Phonetic Units and Word Pronunciations for ASR by Avni

Talk summary:

• Problem statement.
• Background.
• Model formulation for the underlying problem.
• Discussion.

2nd Feb 2018 : Why Deep learning? by Valliappan

Talk summary:

• DNN vs Linear Classifier
• Back Propagation
• Understanding Back-Propagation for Batch Normalisation Layer
• Introduction to CNN
• GPUs
• Speech and CNN

5th Dec 2018 : Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor by Abinay Reddy

Talk summary:

One of the current challenges in automatic speech recognition (ASR) is robust recognition in noisy conditions. We will discuss the idea of using acoustic vector sensor to improve ASR in noisy conditions.

29th Dec 2017 : Connectionist temporal classification (CTC) by Achuth Rao

Talk summary:

CTC is one of the key components in recent state-of-the-art automatic speech recognition systems from Google and in Deep Speech 2. We will discuss the key ideas and motivation involved in developing CTC.

22nd Dec 2017 : Time Scaling of Articulatory Motion in Speech Production by Astha Singh

Talk summary:

• Introduction
• Problem Statement
• Approaches: Interpolation, Affine Invariant DTW and Interpolation
• Some Results

15th Dec 2017 : Arc-cosine kernels and neural networks by Pavan Karjol

Talk summary:

• Kernel functions
• Arc-cosine kernels
• Neural networks
• Conclusions

8th Dec 2017 : On the importance/unimportance of phase in speech signal processing by Prasanta

Talk summary:

• Definition of Phase
• Key results (perception)
• Role of phase in speech enhancement, watermarking, synthesis, recognition

1st Dec 2017 : The Quantum Bit (Qubit) by Karthik

Talk summary:

• States of a qubit
• Information and Measurement of a qubit state
• Single qubit gates
• Multi qubit gates
• Bell States / EPR pairs
• Quantum Entanglement

10th Nov 2017 : Wagner-Fischer string-to-string correction algorithm and its optimality by Chiranjeevi Yarra

Talk summary:

• Problem definition
• Wagner-Fischer algorithm
• Objective function
• Modifications to the objective function
• Optimality
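For reference, the standard dynamic programming form of the Wagner-Fischer algorithm can be sketched as follows. It fills a table in which each cell holds the minimum cost of converting a prefix of the source string into a prefix of the target string. The function name `edit_distance` and the configurable unit costs are assumptions; the modified objective functions discussed in the talk are not described in the summary and are not reproduced here.

```python
def edit_distance(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Minimum total cost of insertions, deletions, and substitutions
    needed to turn `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    # Converting a prefix to the empty string needs only deletions,
    # and building a prefix from the empty string needs only insertions.
    for i in range(1, rows):
        dist[i][0] = i * del_cost
    for j in range(1, cols):
        dist[0][j] = j * ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                      # delete
                dist[i][j - 1] + ins_cost,                      # insert
                dist[i - 1][j - 1] + (0 if same else sub_cost), # match or substitute
            )
    return dist[-1][-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

Because every cell depends only on its three already-computed neighbours, the table is filled in time proportional to the product of the two string lengths, and the bottom-right cell holds the optimal cost.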

3rd Nov 2017 : The impact of speaking rate on acoustic-to-articulatory inversion by Aravind Illa

Talk summary:

• Speech production
• Acoustic to articulatory inversion
• Effect of rate on inversion

13th Oct 2017 : Simple introduction to Blind Source Separation by Karthik

Talk summary:

• Introduction to blind source separation
• Ambiguities due to permutation, scaling and Gaussianity
• Principle of Independent Component Analysis (ICA)
• Maximum Likelihood based algorithm for ICA

6th Oct 2017 : Partial Least Squares Regression (contd.) by Nisha Meenakshi

Talk summary:

• Issues in Multiple Linear Regression
• Nonlinear Iterative Partial Least Squares (NIPALS) Algorithm
• Discussion

9th June 2017 : Audio-Visual Keyword Spotting by Astha Singh

Talk summary:

• Introduction
• Idea for implementing AV-KWS
• Feature extraction: Audio, Visual
• HMM - Overview
• Fusion Strategy for audio and visual modalities output
• Expected Results

2nd June 2017 : Audio-Visual Speech Enhancement by Ajay Mahender Singh

Talk summary:

• Introduction to the problem statement
• Motivation
• Initial work - just speech
• Feature extraction
• The Menpo Project
• Visual features
• Enhancement techniques
• Conclusion and future work

26th May 2017 : Illumination Variation-Resistant Video-Based Heart Rate Measurement Using Joint Blind Source Separation and Ensemble Empirical Mode Decomposition by Raseena KT

Talk summary:

• Photoplethysmography
• Joint Blind Source Separation
• Estimating Heart Rate from face Video

19th May 2017 : non-ASR based keyword spotting by Samik Sadhu

Talk summary:

• Motivation
• Recap of Poisson Process Models in Keyword Spotting
• Discriminative Training of Poisson Process Models in Keyword Spotting
• Unsupervised Online Learning of Poisson Process Models
• Posteriorgram Filtering based Keyword Spotting
• Future Scope of Work

5th May 2017 : Variational RNN by Pavan Karjol

Talk summary:

• Dynamic Bayesian Networks
• Recurrent Neural Networks
• Variational Recurrent Neural Networks
• Experiments

28th April 2017 : Finite State Transducers and its Application in KALDI by Avni Rajpal

Talk summary:

• Motivation
• Basic terms and definitions
• Operations: particularly composition and determinization
• Speech Recognition using FST

21st April 2017 : WaveNet: A Generative Model for Raw Audio by Achuth Rao

Talk summary:

• Recap of generative and discriminative models
• Generative models used in speech
• Why modeling raw audio directly is difficult
• How WaveNet overcomes these difficulties
• How WaveNet combines features of both generative and discriminative models
• How a single model can be used to solve four different problems in speech: (a) TTS, (b) multi-speaker speech generation, (c) music generation, (d) speech recognition

14th April 2017 : Video Editing with Blender by Gaurav Fotedar

Talk summary:

• Introduction to the Blender VSE
• Extracting audio from video
• Cutting/cropping videos
• Replacing audio in a video with audio from another source
• Changing frame rates
• Adding Subtitles
• Video Overlay
• Making Compilation Videos

31st March 2017 : Hypothesis Testing by Prasanta Ghosh

Talk summary:

• Definition
• Null and alternative hypothesis
• Test procedure
• Error in hypothesis testing
• Significance level
• Tests about a population mean
• Tests concerning a population proportion
• P-value
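Several of the topics above (null and alternative hypothesis, significance level, a test about a population mean, and the P-value) come together in a one-sample z-test. The sketch below is a minimal illustration that assumes a known population standard deviation and a two-sided alternative; the function names and the numbers in the example are hypothetical and are not taken from the talk.

```python
import math


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_test_mean(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mean == mu0 against H1: mean != mu0.

    Assumes the population standard deviation `sigma` is known.
    Returns (z statistic, p-value, whether H0 is rejected at level alpha).
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Probability, under H0, of a statistic at least this extreme.
    p_value = 2.0 * (1.0 - normal_cdf(abs(z)))
    return z, p_value, p_value < alpha


# Hypothetical example: 25 samples with mean 52, testing mu0 = 50, sigma = 5.
z, p_value, reject = z_test_mean(52.0, 50.0, 5.0, 25)
```

Here z equals 2.0 and the p-value is about 0.0455, so the null hypothesis is rejected at the 0.05 significance level but would not be rejected at 0.01.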

24th March 2017 : Hypothesis Testing by Prasanta Ghosh

Talk summary:

• Definition
• Null and alternative hypothesis
• Test procedure
• Error in hypothesis testing
• Significance level
• Tests about a population mean
• Tests concerning a population proportion
• P-value

10th March 2017 : Variational Auto encoder by Pavan Karjol

Talk summary:

• Introduction
• Stochastic gradient variational Bayes (SGVB) estimator
• Experiments and conclusion

24th February 2017 : Mock Presentations

Talk summary:

• Pitch Prediction from Mel-frequency Cepstral Coefficients Using Sparse Spectrum Recovery by Achuth Rao
• A Comparative Study on the Effect of Different Codecs on Speech Recognition Accuracy Using Various Acoustic Modeling Techniques by Nisha Meenakshi
• Classification of Healthy Subjects and Patients with Essential Vocal Tremor using Empirical Mode Decomposition of High-Resolution Pitch Contour by Mekhala H S

17th February 2017 : Automatic detection of syllable stress using sonority based prominence features by Chiranjeevi Yarra

Talk summary:

• How is sonority useful?
• Existing works on measuring sonority
• Proposed approach
• Results
• Conclusion

10th February 2017 : A COMPARATIVE STUDY OF ACOUSTIC-TO-ARTICULATORY INVERSION FOR NEUTRAL AND WHISPERED SPEECH by Aravind Illa

Talk summary:

• Introduction.
• Data collection.
• Experimental set-up.
• Results.
• Conclusion.

3rd February 2017 : Automatic detection and diagnosis of phoneme pronunciation quality: a review by Chiranjeevi Yarra

Talk summary:

• Introduction.
• Mispronunciation detection.
• Error diagnosis.
• Conclusion.

20th January 2017 : Classification of Voluntary Cough Airflow Patterns for Prediction of Abnormal Spirometry by Shivani Yadav

Talk summary:

• Introduction: what spirometry and its variables are, and why automatic classification using cough flow patterns is needed.
• Study design.
• Method used.
• Result.

13th January 2017 : Poisson Process Based Keyword Spotting and its Variants by Samik Sadhu

Talk summary:

• Introduction
• Generating Events
• Model Phonetic Events
• Keyword Searching with Poisson Process Models
• Bayesian Approach to training
• A Better phonetic event selection technique
• One of our works in PPM - Discriminative Training of PPM
• Receiver Operating Characteristic (ROC) curves and Figure of Merit (FOM)
• Conclusion

6th January 2017 : The Task Dynamic Model of Speech Production by Nisha Meenakshi

Talk summary:

What is articulatory phonology? How do you model the movement of articulators? Applications of the task dynamic model.

30th December 2016 : Degenerate Unmixing Estimation Technique (DUET) by Girija Ramesan Karthik

Talk summary:

• What does that even mean?
• W-Disjoint Orthogonality
• Approximate W-Disjoint Orthogonality of Speech
• ML Parameter Estimation for 2-mixture Speech Separation
• The phase wrapping problem
• Weighted histogram based Estimators for 2-mixture Speech Separation

16th December 2016 : Why does deep and cheap learning work so well? -- Part II by Achuth Rao

Talk summary:

We focus our attention on the approximation of radial functions, i.e. functions of the form fun(|x|). We construct a simple radial function and show that it can be approximated by a three-layer network with complexity poly(d), whereas a two-layer network requires exp(d) units (d is the dimension of the input). We then show that the result for this simple function generalizes to any radial function.

2nd December 2016 : Associative Networks by Karthik Ramesan

Talk summary:

• What is association?
• Types of Associative Networks
• Linear & Nonlinear Associators
• Energy function
• Conclusion

25th November 2016 : Mail server by Kausthubha

Talk summary:

• Introduction
• How a mail server works
• SMTP/POP3
• Summary

18th November 2016 : Keyword Spotting in Continuous Speech; An Overview of Different Approaches to Keyword Spotting by Samik Sadhu

Talk summary:

• Keyword Spotting: What is So Special in That?
• Going Deep! - DNNs, CNNs
• Going Sparse! - Dictionary Learning
• Go Semi (pseudo!) Unsupervised! - Query by Example
• Go Completely Unsupervised!

11th November 2016 : Automatic Prosodic Event Detection by Vijayakrishna

Talk summary:

• Introduction
• Explanation of the ToBI convention
• Existing methods to tackle the prosodic event detection problem
• Conclusion and future work

28th October 2016 : Why does deep and cheap learning work so well? by Achuth Rao

Talk summary:

• Introduction to neural networks
• Proof overview of the universal approximation property of neural networks
• Visual proof
• How depth helps

21st October 2016 : Return of the Savitzky-Golay (SG) filters by Nisha Meenakshi

Talk summary:

• Differentiation filter
• Moment preservation property of SG filters
• When is an SG filter an optimal filter?

14th October 2016 : Savitzky-Golay (SG) filters by Nisha Meenakshi

Talk summary:

• Filter formulation
• Properties of the SG filter with illustrative examples
• Exemplary application: ECG denoising
• Conclusions

7th October 2016 : Generalized Triangular Decomposition in Transform coding by Aravind Illa

Talk summary:

• Theorem definition
• Karhunen-Loeve Transform (KLT)
• Prediction-based lower triangular transform (PLT)
• GTD Transform Coder
• Conclusions

30th September 2016 : The Perron-Frobenius theorem and its applications -- Part 2 by Pavan Karjol

Talk summary:

• Theorem definition
• Proof of the theorem
• Applications involving Markov chains and the PageRank algorithm
• Conclusions

23rd September 2016 : The Perron-Frobenius theorem and its applications by Pavan Karjol

Talk summary:

• Theorem definition
• Proof of the theorem
• Applications involving Markov chains and the PageRank algorithm
• Conclusions

16th September 2016 : Allpass modeling of phase spectrum of speech signal by Prasanta Ghosh

Talk summary:

• Significance of the phase spectrum in speech signal processing
• Allpass modeling of the phase spectrum of speech
• Applications of allpass modeling, including formant tracking and GCI identification
• Conclusions

2nd September 2016 : A General Regression Neural Network by Aravind Illa

Talk summary:

• Introduction to GRNN
• Advantages
• Limitations
• Conclusion

19th August 2016 : Linear regression fit under Laplacian noise by Chiranjeevi Yarra

Talk summary:

• Problem definition
• Problem formulation
• Solution under special cases: when C=0, when m=0
• Generic solution: alternating minimization, samples based solution
• Experimental results

12th August 2016 : Automatic recognition of social roles using long term role transitions in small group interactions by Gaurav Fotedar

Talk summary:

• Introduction to Roles
• Data
• Proposed Method
• Experiments & Results
• Conclusion & Future Work

5th August 2016 : Distributed Maximum Likelihood estimation of GMM parameters by Varsha Satish

Talk summary:

• Introduction
• Basic EM
• Distributed EM
• Application of distributed EM
• Two distributed EM algorithms
• Implementation of the EM algorithm in C
• Problems faced while implementing it in C

29th July 2016 : Classification of Healthy Subjects and Patients with Essential vocal tremor using Empirical Mode Decomposition of High Resolution Pitch Contour by Mekhala

Talk summary:

• Introduction
• Obtaining High Resolution Pitch using Glottal Closure Instants (GCIs)
• Pitch Oscillation Characteristics (POC) extraction using Empirical Mode Decomposition
• Experimentation: Baseline and Evaluation metric
• Results and discussion

29th July 2016 : Audio Visual Synthesis by Valliappan

Talk summary:

• Introduction to the Problem
• Dataset (PRAV Corpus)
• Approaches (Dynamic programming and LSTM-RNN)
• Results
• Conclusion and Further Work

22nd July 2016 : Music Reconstruction, Separation and synthesis by Anurendra

Talk summary:

• Problem formulation
• Background
• Our new model
• Theoretical derivations and solutions
• Implementation issues

15th July 2016 : Finding a relation between acoustic features of speech and head motion of the speaker by Pranav

Talk summary:

• Introduction
• Brief idea about HMM
• Data Acquisition and Preparation
• Different methods that have been adopted previously to cluster head motion
• The different tests we performed to check for a relation between speech and head motion
• Scope for further research

11th July 2016 : Implementing an intonation practice environment for voisTUTOR by Anand

Talk summary:

• Why intonation is important in speech
• The methodology undertaken in creating the stylisation (Intonation practice)
• Results
• Further work

8th July 2016 : Sparse modelling of Residual for Vocal tract estimation at high pitch by Chaithya

Talk summary:

• Introduction
• Problem Statement
• Earlier Methods
• Sparse Modelling of residual
• Properties
• GCI Corrective Algorithm

8th July 2016 : Detection and delineation of SLEEP APNEA and HYPOPNEA from EDR (ECG Derived Respiratory) Signal by Salma B

Talk summary:

• Initial work
• Understanding of the ECG
• Link between respiration and ECG
• Mathematical implementation

4th July 2016 : Identification and labelling of prosodic groups in utterances using ToBI by Vaidhya

Talk summary:

• What is ToBI?
• What are its applications?
• Explanation of the 4 tiers in brief
• Explaining the tone tier in detail

4th July 2016 : Carnatic Music app in Android by Priyadarshini S

Talk summary:

• Different components of Carnatic music
• Implementation of various aspects of Carnatic music training
• Online feedback for practice sessions

30th June 2016 : Comparative study of the pulse rate estimation from facial video under different video compression schemes by Paridhi Maheshwari

Talk summary:

• Prior Methods
• Sparse Spectral Peak Tracking Algorithm
• Motivation
• Database & Recording Setup I & II
• Results
• Conclusion

30th June 2016 : Glottal source modelling for improving text-to-speech (TTS) systems by Tom Francis

Talk summary:

• Introduction
• A biologically inspired glottal model for TTS
• A novel parameterization for the glottal waveform using the beta distribution
• Conclusion

24th June 2016 : Rank sparsity incoherence for matrix decomposition by Pavan Karjol

Talk summary:

• Introduction.
• Problem formulation.
• Conditions for unique decomposition.
• Results and conclusions.

17th June 2016 : Speaker Verification by Achuth Rao

Talk summary:

• Introduction to speaker verification
• Models for handling inter-speaker variability
• Models for handling inter-session variability
• Models that can handle both: i-vector

3rd June 2016 : Face and Body Gesture Recognition and Analysis by Dr. Tanaya Guha

Talk summary:

We will cover two aspects of gesture understanding - recognition and analysis. In the first part, we will discuss sparse representation-based classification algorithms for recognizing face and body gestures in videos. In the second part, we'll concentrate on analyzing facial gestures of children with autism using motion capture (mocap) data.

27th May 2016 : SPIRE-ABC: An online tool for acoustic-unit boundary correction (ABC) via crowdsourcing by Kausthubha N K

Talk summary:

• Introduction to Annotation
• Motivation for annotation
• Online tool for the annotation (wavesurfer.js) and its limitations
• Modifications made to achieve the proposed system
• Hands-on session to use the online tool

20th May 2016 : Highs in my Life and my Selling Mishaps by Sanjeev Mittal

Talk summary:

• SPIRE recipe for super speaking skills to sell your ideas from the stage.
• Showcase videos: Two great public talks based on a similar recipe.
• Open discussion on the recipe.
• Open platform: Opportunity for anyone wishing to practice their ideas to utilize the occasion to deliver a flash talk based on the introduced framework.

29th April 2016 : Comparison of acoustic to articulatory inversion of ALS patients and healthy controls by Neha Koundal

Talk summary:

Comparison of acoustic to articulatory inversion of ALS patients and healthy controls

16th April 2016 : Acoustic based speech rate estimation using data driven approaches by Chiranjeevi Yarra

Talk summary:

• Introduction
• NMF based speech rate estimation
• Mode-shape based peak detection strategy for speech rate estimation
• Experimental results
• Conclusions

17th March 2016 : ICASSP poster talks by Chiranjeevi, Navaneet, Prasanta

Talk summary:

• A robust speech rate estimation based on the activation profile from the selected acoustic unit dictionary.
• Multiple spectral peak tracking for heart rate monitoring from photoplethysmography signal during intensive physical exercise.
• Better acoustic normalization in subject-independent acoustic-to-articulatory inversion: benefit to recognition.

4th March 2016 : Spatial Hearing by Karthik Ramesan

Talk summary:

• Introduction to spatial hearing
• Cues that help in spatial hearing
• Structural approximations for binaural hearing
• Conclusions

19th February 2016 : A model selection approach to audio segmentation via the Bayesian Information Criterion (BIC) by Nisha Meenakshi

Talk summary:

• Problem 1: Audio Segmentation
• Problem 2: Model Selection
• BIC for model selection
• How can audio segmentation be viewed as a model selection problem?
• Literature: BIC in audio segmentation

12th February 2016 : Phase Processing for Single-Channel Speech Enhancement by Pavan Karjol

Talk summary:

• Introduction
• Iterative algorithms for phase estimation
• Sinusoidal model-based phase estimation
• Group delay and transient processing
• Relation between phase and magnitude estimation
• Results and conclusion

5th February 2016 : Speaker Verification methods by Achuth Rao

Talk summary:

• Introduction to speaker verification
• GMM based methods
• GMM-UBM based methods (MAP adaptation)
• EMAP adaptation
• Eigenvoices

29th January 2016 : HTML & CSS Building a Static Website by Gaurav Fotedar

Talk summary:

• HTML Structure
• HTML Basic Elements
• HTML Forms
• Basic CSS Syntax (Inline and File)
• HTML5 Elements
• CSS3 Elements

15th January 2016 : SIGNAL SUBSPACE APPROACH FOR SPEECH ENHANCEMENT by Pavan Karjol

Talk summary:

• Speech enhancement overview
• Signal and noise models
• Signal and noise subspaces
• Linear estimators (TDC and LDC)
• Results and conclusions

1st January 2016 : Speech Analysis/Synthesis Based on a Sinusoidal Representation by Aravind Illa

Talk summary:

• Sinusoidal Speech Model
• Estimation of Speech Parameters
• Frame-To-Frame Peak Matching
• Synthesis System
• Extension to Harmonic Models

11th December 2015 : Voice Conversion by Achuth Rao

Talk summary:

• Overview
• GMM based voice conversion
• Modifications to GMM for voice conversion
• Frequency warping and amplitude scaling for voice conversion

11th December 2015 : On generative and discriminative models by Prasanta Kumar Ghosh

Talk summary:

• What is a generative model (including examples)?
• What is a discriminative model (including examples)?
• Asymptotic performance of generative and discriminative models
• Discriminative training of generative models: blending generative and discriminative models

20th November 2015 : Blind Source Separation Using Wigner-Ville Distribution (WVD) by Chiranjeevi Yarra

Talk summary:

• Problem statement
• Problem formulation with WVD
• Joint diagonalization
• Simulation results

5th November 2015 : Language models and an introduction to the IRSTLM toolkit by Nisha Meenakshi

Talk summary:

• What are language models?
• Where are they used?
• How does the IRSTLM toolkit perform language modeling?
• A few examples of IRSTLM implementation.

30th October 2015 : vi basics & survival skills by Sanjeev Mittal

Talk summary:

• Various states and transitions in vi.
• Quick basic survival commands.
• Day-to-day commands.
• Advanced commands.
• Pitfalls and troubleshooting.

22nd October 2015 : Understanding OOPS Concepts using Java by Gaurav Fotedar

Talk summary:

Classes and objects, polymorphism, inheritance, method and operator override, interfaces and abstract classes, arrays and string class, generics.

16th October 2015 : Robust real-time pulse rate estimation from facial video using sparse spectral peak tracking by Aditya Gaonkar

Talk summary:

Estimating pulse rate of subjects from facial videos. Usage of "Independent Component Analysis" in Biomedical Signal Processing. An overview of the proposed method. Discussion on the obtained results.

9th October 2015 : Video Lecture by

Talk summary:

• Finite state transducer
• Speech recognition using finite state transducer

1st October 2015 : Low Rank and Sparse Matrix Decomposition by Jitendra Kumar Dhiman

Talk summary:

• Type of the problem: inverse problem.
• Problem formulation for low-rank and sparse matrix decomposition.
• Problem solution using the "Augmented Lagrangian Method of Multipliers."
• Application to musical noise removal for speech signals.

Abstract: Inverse problems arise in many applications of science and engineering, and there have been several approaches to solving them. We will discuss one particular type of inverse problem: "Low-rank and sparse matrix decomposition." In this problem, we are given data in the form of a matrix that is the sum of two unknown matrices, one of which is low-rank and the other sparse. The goal is to recover this decomposition from the given matrix. The problem can be solved in an optimization framework. Although many algorithms are available for it, we will focus on one particular optimization algorithm (the Augmented Lagrangian Method). Finally, we will apply the algorithm to separate musical noise from denoised speech signals. To do so, the algorithm exploits the structure of musical noise, which is sparse in the time-frequency domain, and of speech, which is low-rank.
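
The decomposition described in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch of one common variant (an inexact augmented Lagrangian iteration that alternates singular value thresholding with entrywise soft thresholding), not the speaker's implementation. The function names, the default weight `lam`, the initial penalty `mu`, and the growth factor `rho` are assumptions chosen for illustration.

```python
import numpy as np


def soft_threshold(X, tau):
    """Entrywise shrinkage: the proximal operator of tau * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def svd_threshold(X, tau):
    """Shrink the singular values of X: the proximal operator of tau * ||X||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt


def low_rank_plus_sparse(M, lam=None, mu=None, rho=1.5, tol=1e-7, max_iter=500):
    """Split M into L (low-rank) + S (sparse) by approximately solving
    min ||L||_* + lam * ||S||_1  subject to  L + S = M
    with an augmented Lagrangian iteration (hypothetical defaults)."""
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))       # assumed default weight
    if mu is None:
        mu = 1.25 / np.linalg.norm(M, 2)     # assumed initial penalty (M must be nonzero)
    mu_max = mu * 1e7
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                     # Lagrange multiplier for L + S = M
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(max_iter):
        L = svd_threshold(M - S + Y / mu, 1.0 / mu)    # low-rank update
        S = soft_threshold(M - L + Y / mu, lam / mu)   # sparse update
        R = M - L - S                                  # constraint residual
        Y = Y + mu * R                                 # multiplier update
        mu = min(mu * rho, mu_max)                     # tighten the penalty
        if np.linalg.norm(R, "fro") <= tol * norm_M:
            break
    return L, S
```

In the setting of the abstract, `M` would be a time-frequency representation (for example a magnitude spectrogram) of the denoised speech, with `L` taken as the speech component and `S` as the musical noise. That mapping follows the abstract's description and is not verified here.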

25th September 2015 : Estimation of the air-tissue boundaries (ATBs) of the vocal tract in the mid-sagittal plane from electromagnetic articulograph (EMA) data by Pattem Ashok Kumar

Talk summary:

• Introduction to EMA and real-time magnetic resonance imaging (rtMRI)
• Co-registration of the EMA data and ATBs in the rtMRI
• Estimation of the ATBs from registered EMA
• Results
• Discussion on the quality of estimated ATBs

18th September 2015 : Speech Beyond Speech - IS2015? by Prasanta Ghosh

Talk summary:

• SPIRE lab's paper presentations
• Some good/significant works
• Latest trends
• Challenges
• New Tools/Datasets
• Dresden

11th September 2015 : Fundamentals of HMM-based speech synthesis by Achuth Rao

Talk summary:

Vocoding techniques: Speech parameter modeling and generation algorithm, spectrum parameter, F0 parameter, context-clustering, advantages, and disadvantages.

4th September 2015 : A discriminative analysis within and across voiced and unvoiced consonants in neutral and whispered speech in multiple Indian languages by Nisha Meenakshi

Talk summary:

Typically, voiced consonants are voiceless when whispered, as whispered speech lacks vocal cord vibrations. Therefore, we ask the following questions. Is the discrimination between the voiced and unvoiced (V-UV) consonants still preserved in whispered speech? Is the variation of the acoustics from neutral to whispered speech consonant-specific? Does language affect V-UV consonant discrimination?

29th August 2015 : Speech recognition using HTK by Amber Afshan

Talk summary:

• Brief introduction to HMM and Viterbi recognition
• Tutorial on building a basic recognition system using HTK: preparing data, training, recognition
• HLDA transform
• Speaker adaptation
• GMM using HTK

28th August 2015 : Video Lecture by

Talk summary:

• A brief history of speech recognition
• The probabilistic approach
• Feature extraction, acoustic modelling, language modelling, search
• Where we stand

21st August 2015 : Android App Development by Ataur Rehman

Talk summary:

• What is Android?
• Android internals
• Briefing about application building blocks
• Steps to develop a Hello Android app in Android Studio
• Demo of some crazy Android apps

31st July 2015 : Implementation of Automatic Gender Classification using Normal and Whispered speech in Android by Shashidhar Prabhu

Talk summary:

• Finding the pitch in neutral speech
• Recording whisper and finding the MFCC features of the recorded speech
• SVM modeling
• Results

31st July 2015 : Classification of Voiced and Unvoiced frames in speech using Periodicity Transforms by Shashidhar Prabhu

Talk summary:

• Introduction to periodicity transforms
• Choosing the best algorithm in periodicity transforms for the classification
• Feature extraction and SVM modeling
• Results

31st July 2015 : Part-1: Insight into real-time magnetic resonance imaging (rtMRI) and its applications towards the understanding of human speech.
Part-2: Automatic classification of eating conditions from speech using acoustic feature selection and a set of hierarchical support vector machine classifiers by Abhay Prasad

Talk summary:

• Introduction to rtMRI.
• Voice activity detection using rtMRI.
• Broad-class phonetic recognition with rtMRI.
• Extraction of variant and invariant components of speech and its application towards speaker identification using rtMRI.

3rd July 2015 : An introduction to the nature of whispered speech and an overview of LTLEV as a feature for whisper activity detection (WAD) by Nisha Meenakshi

Talk summary:

• The differences between the nature of whispered speech and neutral speech will be discussed, with a few illustrations to aid understanding.
• The use of the newly developed signal characteristic—Long Term Logarithmic Energy Variation (LTLEV) will be explained.
• The performance of this feature and four other baseline schemes for WAD will be assessed in the presence of eight different noises.

26th June 2015 : Light on Photoplethysmography signal which requires processing by Vijitha Periyasamy

Talk summary:

• Insight into Photoplethysmography (PPG)
• Challenges in processing PPG
• Paths to analyze PPG
• Opportunities for contribution in PPG

22nd June 2015 : Exploring the use of discriminative dictionary learning for enhancement of audio in additive magnetic resonance imaging noise by Ataur Rehman

Talk summary:

• Introduction to MRI recording: advantages and problems (noisy speech recording) with MRI recording.
• Techniques to enhance the noisy speech recording, i.e., NMF, PLCA, and DDL algorithms-based audio enhancement.
• Discuss the consequences of the aforementioned algorithms.
• Conclusions and future work.
