WELCOME TO THE TALKS PAGE


3rd June 2022 : Selection of acoustically similar sentences based on phone error rate in the context of ASR by Saurabh Kumar

Talk summary:

  • For many languages, state-of-the-art ASR systems are reported to perform poorly due to the lack of acoustically and phonetically rich speech data available for system building. Even for resource-rich languages such as English, little effort has been made to find an efficient method for selecting training data similar to the testing conditions. Instead, state-of-the-art ASR systems are data hungry and require lots of speech data for training. Therefore, data selection plays a crucial role in the development of robust and computationally efficient ASR systems. In the last few years, several methods have been reported that ensure both acoustic and phonetic richness of the speech data. In this study, several recently reported data selection methods have been explored and efforts have been made to improve them.


27th May 2022 : WER–BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm by Abhishek Kumar

Talk summary:

  • Automatic Speech Recognition (ASR) systems are evaluated using Word Error Rate (WER), which is calculated by comparing the number of errors between the ground truth and the transcription of the ASR system. This calculation, however, requires manual transcription of the speech signal to obtain the ground truth. Since transcribing audio signals is a costly process, Automatic WER Evaluation (e-WER) methods have been developed to automatically predict the WER of a speech system by only relying on the transcription and the speech signal features. While WER is a continuous variable, previous works have shown that positing e-WER as a classification problem is more effective than regression. However, while converting to a classification setting, these approaches suffer from heavy class imbalance. In this paper, we propose a new balanced paradigm for e-WER in a classification setting. Within this paradigm, we also propose WER-BERT, a BERT based architecture with speech features for e-WER. Furthermore, we introduce a distance loss function to tackle the ordinal nature of e-WER classification. The proposed approach and paradigm are evaluated on the Librispeech dataset and a commercial (black box) ASR system, Google Cloud’s Speech-to-Text API. The results and experiments demonstrate that WER-BERT establishes a new state-of-the-art in automatic WER estimation.
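
As a point of reference for the metric that e-WER methods try to predict, the following is a minimal sketch of the conventional WER computation, assuming whitespace tokenization and a word-level Levenshtein distance. The function name `word_error_rate` is hypothetical and is not taken from the paper.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

This computation needs the ground-truth transcript, which is exactly the requirement that automatic WER estimation tries to remove.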


20th May 2022 : Unsupervised representation learning for speaker verification by Prajesh Rana

Talk summary:

  • The objective of speaker verification is authentication of a claimed identity from measurements on the voice signal. For speaker verification, I am exploring contrastive-loss-based self-supervised learning (SSL). My work on speaker verification consists of two parts. In the first part, I trained the self-supervised model; in the second part, I use the pretrained model as a feature extractor and train a PLDA back-end model. I will compare my results with the TDNN-PLDA and TDNN-ECAPA algorithms.


13th May 2022 : Extracting features using Self Supervised learning using ASR by Abhishek

Talk summary:

  • Automatic Speech Recognition (ASR) is the task of transcribing speech; in simple terms, it is also called speech-to-text conversion. Here we aim to learn features that not only utilize the data but are also robust to noise. To support this argument for our target features, I have compared the performance of MFCC and F-bank features with that of features learnt using wav2vec, which relies on self-supervised representation learning, for ASR. The metrics used for comparison are PER and CER.


6th May 2022 : A Stage Match For Query-By-Example Spoken Term Detection Based On Structure Information Of Query by Deekshitha G

Talk summary:

  • State-of-the-art query-by-example spoken term detection (QbE-STD) strategies are usually based on segmental dynamic time warping (S-DTW). However, the sliding window in S-DTW may split the signal of a word into different segments and produce many illegal candidates that must be compared with the query, which significantly reduces the accuracy and efficiency of detection. This paper proposes a stage match strategy based on the structure information of the query, represented by the unvoiced-voiced attribute of its portions. The strategy first locates potential candidates in the utterances whose structure is similar to that of the query, and then matches the query with Type-Location DTW (TLDTW), a modified DTW with constraints on the pronunciation types and relative positions of paired frames in the voiced sub-segments. Experiments on the AISHELL-1 Corpus showed that the proposed approach achieved a relative improvement over S-DTW and sped up retrieval.
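
To make the baseline concrete, here is a simplified sketch of sliding-window matching with plain DTW. It assumes 1-D sequences, an absolute-difference frame distance, and non-overlapping handling left out; real S-DTW systems operate on feature vectors, and this is not the TLDTW method from the paper. The names `dtw_distance` and `sliding_window_match` are hypothetical.

```python
def dtw_distance(query, candidate):
    """Classic DTW cost between two 1-D sequences (absolute difference)."""
    inf = float("inf")
    n, m = len(query), len(candidate)
    # cost[i][j] is the best cost of aligning the first i query frames
    # with the first j candidate frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(query[i - 1] - candidate[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def sliding_window_match(query, utterance, window):
    """Return (start, cost) of the fixed-length window that best matches the query."""
    best = None
    for start in range(len(utterance) - window + 1):
        c = dtw_distance(query, utterance[start:start + window])
        if best is None or c < best[1]:
            best = (start, c)
    return best


if __name__ == "__main__":
    print(sliding_window_match([1, 2, 3], [0, 0, 1, 2, 3, 0], window=3))
```

Every window position is scored against the query, which illustrates why a fixed window can cut a word across segments and why many candidates must be compared.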


29th April 2022 : Paper review by Sathvik Udupa

Talk summary:

  • 1. Understanding the Role of Self Attention for Efficient Speech Recognition: Transformer neural networks are increasingly used in automatic speech recognition (ASR). This work investigates the inner workings of such networks in ASR and introduces techniques to reduce recognition latency.
  • 2. Chunked Autoregressive GAN for Conditional Waveform Synthesis: Neural vocoders based on generative adversarial networks (GANs) have been performing well in speech synthesis in recent years. The authors show that these networks are unable to generate accurate pitch and periodicity, and introduce an autoregressive GAN-based vocoder to tackle the issues.


15th April 2022 : Broadcasted Residual Learning for Efficient Keyword Spotting by Siddarth

Talk summary:

  • We present a broadcasted residual learning method to achieve high accuracy with small model size and computational load. Our method configures most of the residual functions as 1D temporal convolution while still allowing 2D convolution together using a broadcasted-residual connection that expands temporal output to frequency-temporal dimension.


25th March 2022 : Wave Equation and Fundamentals by Veerababu Dharanalakota

Talk summary:

  • It is necessary to know the mathematical description of speech signals (sound waves), not merely for its own sake but because it has the potential to mimic reality under certain conditions. To do so, one needs to know the derivation of the wave equation and its underlying assumptions. This talk covers the derivation of the wave equation from the fundamental equations of fluid dynamics: the continuity, momentum, and energy equations, which in turn are derived from natural laws. Further, the talk covers the general terminology used in the study of sound.


11th March 2022 : Pnoi: Development and Challenges by Syed Fahad

Talk summary:

  • Discussion of the developments in the creation of a specialized digital stethoscope called Pnoi for capturing lung and breathing sounds. These sounds can be used for medical diagnosis at a substantially lower cost than the current standards.


4th March 2022 : Large Text Corpus Creation using Web Scraping for Language Modelling by Hemantha Krishna Bharadwaj

Talk summary:

  • The collection of large datasets for training language models requires the use of techniques that extract data from the world wide web in a systematic manner. Collectively known as web scraping, these techniques have been well established by previous research, but there is little research on their use for the collection of data other than that in the English language. This talk will detail improved methods of extracting domain-specific non-English language data from the internet using a combination of HTML parsing libraries and frameworks in Python. The proposed methodology can be utilized to provide large non-English language text datasets in an automated fashion.
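
The HTML parsing step can be illustrated with a small sketch. It assumes the page has already been downloaded as a string and that the useful text sits inside `<p>` elements; the class `ParagraphExtractor` and the function `extract_paragraphs` are hypothetical names, and the talk's actual libraries and frameworks are not specified here, so only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self._depth = 0       # greater than 0 while inside a <p> element
        self._buffer = []     # text fragments of the current paragraph
        self.paragraphs = []  # cleaned paragraphs collected so far

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Collapse runs of whitespace and drop empty paragraphs.
                text = " ".join("".join(self._buffer).split())
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def extract_paragraphs(html):
    """Return the list of non-empty paragraph texts in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    page = "<html><body><script>var x = 1;</script><p>नमस्ते <b>दुनिया</b></p></body></html>"
    print(extract_paragraphs(page))
```

Because Python strings are Unicode, the same extraction works unchanged for non-English scripts; script and navigation content outside `<p>` elements is ignored.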


18th February 2022 : Attention and Transformers by Abhayjeet Singh

Talk summary:

  • Intuitive and mathematical understanding of Attention and Transformer Networks


11th February 2022 : wav2vec 2.0 by Siddarth C

Talk summary:

  • We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned.


28th January 2022 : An error correction scheme for improved air-tissue boundary in real-time MRI video for speech production by Anwesha

Talk summary:

  • The best performance in Air-tissue boundary (ATB) segmentation of real-time Magnetic Resonance Imaging (rtMRI) videos in speech production is known to be achieved by a 3-dimensional convolutional neural network (3D-CNN) model. However, the evaluation for this model, as well as other ATB segmentation techniques reported in the literature, is done using Dynamic Time Warping (DTW) distance between the entire original and predicted contours. Such an evaluation measure may not capture local errors in the entire contour. Careful analysis of predicted contours reveals errors in regions like the velum part of contour1 and tongue base section of contour2, which are not captured in a global evaluation metric like DTW distance. In this work, we automatically detect such errors and propose a correction scheme for the same. We also propose two new evaluation metrics for ATB segmentation separately in contour1 and contour2 to explicitly capture two types of errors in these contours.


14th January 2022 : Data Visualization by Jeevan

Talk summary:



7th January 2022 : Sociolinguistics: Language Variation and Dialect by Sharmistha

Talk summary:



31st December 2021 : A brief introduction to muscle synergies in speech by Navaneetha

Talk summary:



24th December 2021 : Vocal and Non-vocal segmentation based on the analysis of formant structure by Pranaswi

Talk summary:

  • A pulmonary system is a network of organs and tissues that help us to breathe. A typical pulmonary system in humans consists of lungs, larynx, trachea, bronchi, bronchioles, alveoli and thoracic diaphragm. Inspiratory sounds measured simultaneously over the extrathoracic trachea and at the chest surface contain highly unique regional information. The characteristic patterns in the recorded data are associated with conditions affecting airway patency, such as asthma and obstructive sleep apnea. There is potential for the recorded sounds to be used in clinical practice for the diagnosis and monitoring of various respiratory conditions. In the proposed research work, an acoustic model of the pulmonary system will be developed by treating the tracheobronchial tree and the lungs as a flexible branched duct system and plenums, respectively.


17th December 2021 : Overlapped Speech Detection using CNN Architectures by Pooja

Talk summary:

  • The ability to estimate the overlapped sentences spoken by an individual over a certain period of time is valuable in language acquisition, healthcare, and assessing language development. However, establishing a robust automatic framework to achieve high accuracy is non-trivial in realistic/naturalistic scenarios due to various factors such as different styles of conversation or types of noise that appear in audio recordings, especially in multi-party conversations. Therefore, overlapping speech detection has become an important front-end triage step for speech technology applications. This is crucial for large-scale datasets where manual labeling is not possible. A block-based CNN architecture is proposed to address modeling overlapping speech in audio streams with frames as short as 25 ms. The architecture is robust to both (i) shifts in the distribution of network activations due to the change in network parameters during training, and (ii) local variations in the input features caused by feature extraction, environmental noise, or room interference.


10th December 2021 : Introduction to G2P Systems by Priyanshi

Talk summary:

  • The orthography of a language does not always have a predictable relationship with its pronunciation. Certain languages have predictable and consistent relationships; however, for languages like English, which have many inconsistencies and loan words from other languages, mapping this relationship becomes challenging. The ability to map this relationship can help in producing better-performing ASR and TTS systems. Grapheme-to-phoneme (G2P) conversion systems are used to find the pronunciation of a word given its written form. We look at where G2P plays a role in the aforementioned systems and what challenges are involved, and also look at one approach to it.


3rd December 2021 : Acoustic Modeling and Analysis of Pulmonary System by Veerababu

Talk summary:

  • A pulmonary system is a network of organs and tissues that help us to breathe. A typical pulmonary system in humans consists of lungs, larynx, trachea, bronchi, bronchioles, alveoli and thoracic diaphragm. Inspiratory sounds measured simultaneously over the extrathoracic trachea and at the chest surface contain highly unique regional information. The characteristic patterns in the recorded data are associated with conditions affecting airway patency, such as asthma and obstructive sleep apnea. There is potential for the recorded sounds to be used in clinical practice for the diagnosis and monitoring of various respiratory conditions. In the proposed research work, an acoustic model of the pulmonary system will be developed by treating the tracheobronchial tree and the lungs as a flexible branched duct system and plenums, respectively.


26th November 2021 : Hindi Language Modelling using text data from domains of agriculture, finance, healthcare and general by Sneha

Talk summary:

  • Hindi text data is collected from four different domains: general, agriculture, healthcare, and finance. Statistics such as word frequency and the total numbers of sentences and words are computed from the combined cleaned data of all the domains. To validate the data, a decision tree algorithm is used for text classification, classifying an unknown text into pre-determined groups.


19th November 2021 : Bengali text data classification of different domains by Sanchari

Talk summary:

  • For Bengali texts from 4 different domains (general, agriculture, healthcare, and finance), a decision tree algorithm is used to separate the classes and predict the domain of any unknown Bengali text. With this approach, any unknown text can be easily classified according to the domain it belongs to.


5th November 2021 : Selection of acoustically and phonetically rich sentences in the context of ASR by Saurabh Kumar

Talk summary:



28th October 2021 : Neural speech synthesis models by Navneet Kaur

Talk summary:

  • The most recent advancements in the field of speech synthesis have been brought about by deep learning. In current state-of-the-art models, text-to-speech conversion is accomplished in two steps: i) conversion of text to a lower-resolution intermediate representation, generally a mel-spectrogram, using a seq2seq model (front-end), and ii) generation of the speech waveform from the mel-spectrogram using generative models (back-end). In this talk, I will discuss and compare different techniques and models for both the front-end and the back-end. Specifically, among seq2seq models I will cover Tacotron-2, Fastspeech-2, Transformer-TTS, and GlowTTS. Among generative models, I will discuss WaveNet, WaveGlow, and MelGAN.


22nd October 2021 : Analysis of vocal sounds in asthmatic patients by Shivani

Talk summary:



15th October 2021 : Diffusion probabilistic models in speech synthesis by Sathvik

Talk summary:

  • In recent years, there has been progress in a type of generative modelling known as diffusion probabilistic models. The latent features are learnt through a 'diffusion' process, which iteratively adds noise to the data to transform it into a noise distribution. During inference, this process can be reversed to generate data samples from noise. This learning technique has been applied in problem statements in speech synthesis by modifying it to a conditional generative process.
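
The noising direction described above can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian forward process with a linear noise schedule; the schedule values (1e-4 to 0.02 over 1000 steps) and all function names are assumptions chosen for illustration, not details from the talk. It uses the closed form x_t = sqrt(a_t) * x_0 + sqrt(1 - a_t) * noise, where a_t is the running product of (1 - beta), which is equivalent to applying the small noising steps one after another.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, increasing linearly (assumed schedule)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Running product of (1 - beta): the fraction of signal variance kept."""
    alphas, product = [], 1.0
    for beta in betas:
        product *= 1.0 - beta
        alphas.append(product)
    return alphas


def diffuse(x0, t, alpha_bars, rng):
    """Sample the noised version of x0 after t + 1 forward steps."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


if __name__ == "__main__":
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    rng = random.Random(0)
    print(diffuse([0.5, -0.5], 10, alpha_bars, rng))   # still close to the data
    print(diffuse([0.5, -0.5], 999, alpha_bars, rng))  # almost pure noise
```

A trained model would learn to undo these steps; only the forward (noising) direction is shown here.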


13th October 2021 : Analysis of vocal sounds in asthmatic patients by Shivani

Talk summary:



24th September 2021 : Study of ALS/PD classification using slurred speech by Aayushman

Talk summary:

  • Monitoring disease progression in patients with Amyotrophic Lateral Sclerosis (ALS) and Parkinson’s disease (PD) can be done by analyzing their speech waveforms. Many past works have used different acoustic features to distinguish patients with ALS and PD from healthy controls (HC). In this project, I studied a data-driven approach to learn representations from the raw speech waveform. The model comprises a 1-D Convolutional Neural Network (CNN) layer to extract representations from raw speech, followed by Bi-directional Long Short Term Memory (BLSTM) layers for the classification tasks. Three different classification tasks, (ALS vs HC), (PD vs HC), and (ALS vs PD), were considered. The model performs the classification tasks using four different speech stimuli, namely, image description (IMAG), spontaneous speech (SPON), diadochokinetic rate (DIDK), and sustained phoneme production (PHON). Experiments were performed with 90 ALS, 90 PD, and 90 HC subjects.


17th September 2021 : Speech Synergies by Chirag Vasist

Talk summary:



3rd September 2021 : TaLNet: Voice Reconstruction from Tongue and Lip Articulation with Transfer Learning from Text-to-Speech Synthesis by Siddarth

Talk summary:

  • This paper presents TaLNet, a model for voice reconstruction with ultrasound tongue and optical lip videos as inputs. TaLNet is based on an encoder-decoder architecture. Separate encoders are dedicated to processing the tongue and lip data streams respectively. The decoder predicts acoustic features conditioned on encoder outputs and speaker codes. To mitigate for having only relatively small amounts of dual articulatory-acoustic data available for training, and since our task here shares with text-to-speech (TTS) the common goal of speech generation, we propose a novel transfer learning strategy to exploit the much larger amounts of acoustic-only data available to train TTS models. For this, a Tacotron 2 TTS model is first trained, and then the parameters of its decoder are transferred to the TaLNet decoder.


20th August 2021 : Kannada language modeling using text data from the domains of agriculture, finance, healthcare and general by Karthik S Vasisht

Talk summary:

  • Language modelling involves decomposing texts into smaller units, namely sentences and words, and statistically analyzing them to make accurate predictions of phrases and sentences. The n-gram model is a statistical tool that estimates the likelihood of a word sequence from the conditional probability of each word given the words that precede it. This presentation will discuss the work done over a period of 2 months on link collection, web scraping, text cleaning, and validation of the collected data for building a Kannada language model.
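
As an illustration of the n-gram idea, the sketch below builds a bigram model (n = 2) with maximum-likelihood estimates and no smoothing. It assumes whitespace tokenization and sentence boundary markers `<s>` and `</s>`; the function names are hypothetical and the toy sentences are placeholders, not data from the talk.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count contexts and bigrams over whitespace-tokenized sentences."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        context_counts.update(tokens[:-1])              # every token that has a successor
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    return context_counts, bigram_counts


def bigram_probability(context_counts, bigram_counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


if __name__ == "__main__":
    contexts, bigrams = train_bigram_model(["a b", "a c"])
    print(bigram_probability(contexts, bigrams, "a", "b"))
```

A practical model would add smoothing so that unseen word pairs do not receive zero probability.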


6th August 2021 : AAI and Palate contour estimation by Anish

Talk summary:

  • AAI is a model which maps MFCC to EMA points. EMA data corresponds to movement of articulators in the mouth. Predicting EMA points helps us visualize the movement of our mouth while we speak, but predicting the palate contour along with the EMA points would help us better understand the movement of our mouth. The presentation would focus on different models which were trained along with different preprocessing techniques which were employed to predict the palate contours.


23rd July 2021 : Language Identification of ALS patients using X-Vector model by Yasaswini

Talk summary:

  • Amyotrophic lateral sclerosis (ALS) is a rare neurological disease that primarily affects the nerve cells (neurons) responsible for controlling voluntary muscle movements like chewing, walking, and talking. As ALS hampers speech a great deal, speech recognition techniques become important. To build a model that recognizes the speech of ALS patients, language identification is a crucial first step.


16th July 2021 : Accent conversion using Cotatron by Chinmay

Talk summary:

  • Accent conversion (AC) aims to make non-native speech sound as if the speaker has a certain native accent. Typical AC methods attempt to convert only the native speaker's voice to that of a non-native speaker, leaving the basic content and pronunciation unchanged. This hinders their practical use in real-world applications, because native-accented utterances are required at the conversion stage. Students who acquire a second language after the “critical age” often speak a language other than their mother tongue. This can lead to poor understanding, and speakers may face discriminatory situations. Therefore, students who communicate with native speakers have much to gain by improving their pronunciation. The presentation will discuss the work done over a period of 2 months on testing, comparing, and improving the existing methods for accent conversion.


9th July 2021 : Age Estimation for ALS Patients Speech Utterance Based on LSTM by Lavanya

Talk summary:

  • Speaker age is part of the non-verbal information contained in speech. Age estimation consists of automatically determining the age of a speaker from a given segment of a speech utterance. Age estimation from speech has recently received increased interest as it is useful for many applications such as user profiling, targeted marketing, or personalized call routing. These kinds of applications need to estimate the age of the speaker quickly and might greatly benefit from real-time capabilities. Long short-term memory (LSTM) networks have been shown to outperform state-of-the-art approaches in related speech-based tasks, such as language identification or voice activity detection, especially when an accurate real-time response is required.


2nd July 2021 : Speech-based classification of ALS patients and Healthy subjects by Sonakshi

Talk summary:

  • Amyotrophic Lateral Sclerosis (ALS) is a rare neurological disease that affects the motor neurons, causing loss of the ability to speak, eat, move, and breathe. There is no cure for ALS yet. Early detection is crucial so that therapeutic measures can be started at an early stage, which would help in prolonging the life expectancy and improving the quality of life of the patients. Unfortunately, the diagnosis of the disease is difficult and time consuming. Hence, there is a need to develop an automatic device/app that can detect the disease, which would allow therapy to begin early, leading to greater life expectancy. The presentation will discuss the work done over a period of 6 months on testing, comparing, and improving the existing methods for classification.


18th June 2021 : Segnet based ATB segmentation in rtMRI videos by Jelwin

Talk summary:



11th June 2021 : Brain Stroke Segmentation using Deep Learning by Nikhil

Talk summary:

  • Stroke is one of the main causes of adult deaths around the globe, impacting 6.2 million people per annum. Over the past 20 years, there has been a 26 percent increase in stroke deaths worldwide. Across the world, stroke is the second leading cause of death. In recent years, machine and deep learning algorithms have had a huge impact on research challenges in several domains, including health care, natural language processing, speech processing, and more. The medical field also benefits greatly from improving deep learning models, which save time and produce accurate results. Typically, the manual segmentation of strokes is done by expert radiologists or doctors who specialize in this field. Manual segmentation is said to be time-consuming (taking nearly three to four hours to diagnose the problem) and also introduces inter- and intra-rater variability among radiologists. Patients affected by brain stroke suffer if careful clinical decisions are not made quickly. To augment a radiologist's or doctor's effort, deep learning algorithms can be used effectively for segmenting clinical brain images and can be a valuable tool for this work.


21st May 2021 : A Scalable Deep Learning Model for Arbitrary Transmitter Configurations in Inverse Scattering by Karthik

Talk summary:



7th May 2021 : Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling by Abhayjeet

Talk summary:



30th April 2021 : Learning Rate Warmups and the Variance of Adaptive Learning Rates by Bheshaj

Talk summary:



23rd April 2021 : ISTA-Net: Interpretable Optimization-Inspired Deep Network for Image Compressive Sensing by Bhargava

Talk summary:



9th April 2021 : Speech to EMG mapping by Navaneetha

Talk summary:



19th March 2021 : A literature survey on audio recording device identification by Bhavuk

Talk summary:



12th March 2021 : A brief tutorial on Android app development and design patterns by Shankar

Talk summary:



5th March 2021 : Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by

Talk summary:



26th February 2021 : Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains by

Talk summary:



19th February 2021 : Principal Component Analysis (PCA), Kernel PCA and Independent Component Analysis (ICA) by Anwesha

Talk summary:



13th February 2021 : Graph Convolutional Networks by Manthan

Talk summary:



5th February 2021 : Tutorial on Equivariant Networks by

Talk summary:



29th January 2021 : A Brief Introduction to Density Based Spatial Clustering for Applications with Noise by Priyanshi

Talk summary:



22nd January 2021 : Graph Neural Networks for solving PDEs by Karthik

Talk summary:



15th January 2021 : Capsule Networks by Siddharth

Talk summary:



8th January 2021 : Whisper to neutral speech conversion by Subhadeep, Pritam and Debojyoti

Talk summary:



1st January 2021 : A closer look at loss functions by Achuth

Talk summary:



24th December 2020 : Approximate inference methods by Sathvik

Talk summary:



14th December 2020 : Acoustic-Articulatory Mapping: Analysis and Improvements with Neural Network Learning Paradigms by Aravind

Talk summary:



4th December 2020 : Fader Network by Tanuka

Talk summary:



27th November 2020 : An unsupervised segmentation of vocal breath sounds by Shivani

Talk summary:



19th November 2020 : Pulmonary function test graph digitizer by Sandhya

Talk summary:



13th November 2020 : Overview of Microphones by Jeevan and Shaique

Talk summary:



6th November 2020 : Feasibility of Learning (Continued) by Karthik

Talk summary:



30th October 2020 : Feasibility of Learning by Karthik

Talk summary:



17th October 2020 : Inverse scattering using two stage networks by Karthik

Talk summary:



9th October 2020 : Introduction to Phonetics by Sharmistha

Talk summary:



11th September 2020 : Introduction to Git and Docker by Sanjeev

Talk summary:



27th August 2020 : On the power of curriculum learning in training deep networks by Siddharth

Talk summary:



21st August 2020 : Generative models based on normalizing flows by Achuth

Talk summary:



14th August 2020 : Generative models based on normalizing flows by Achuth

Talk summary:



7th August 2020 : Electrical Impedance Tomography: A portable low cost setup for biomedical imaging and other applications by Karthik

Talk summary:



31st July 2020 : Neural Networks and Differential Equations by Avni

Talk summary:



24th July 2020 : Regularization for deep learning by Aravind

Talk summary:



17th July 2020 : Thesis defence by Renuka

Talk summary:



9th July 2020 : Speech task-specific representation learning using acoustic-articulatory data by Renuka

Talk summary:



3rd July 2020 : Neural Turing Machines by Chiranjeevi

Talk summary:



26th June 2020 : An overview of gradient descent optimization algorithms by Abinay

Talk summary:



19th June 2020 : Speech rate estimation using representations learned from speech with convolutional neural network by Renuka

Talk summary:



12th June 2020 : Acoustic-to-articulatory inversion of dysarthric speech by utilizing cross-corpus acoustic-articulatory data

Talk summary:



5th June 2020 : Temporal decomposition of Speech by Tilak

Talk summary:



29th May 2020 : Quantum Computing for breaking encryption by Pavan Kumar J

Talk summary:



13th March 2020 : Linguistics: An Introduction by Sharmistha

Talk summary:



6th March 2020 : AUTOMATIC CLASSIFICATION OF VOLUMES OF WATER USING SWALLOW SOUNDS FROM CERVICAL AUSCULTATION by Siddharth

Talk summary:



28th February 2020 : Web Interface for acoustic feature analysis by Heena and Vaibhav

Talk summary:



21st February 2020 : Deep Canonical Correlation Analysis by Sanjeev

Talk summary:

  • Introduction to correlation and to linear algebra concepts such as eigendecomposition, SVD, and PCA, followed by how Canonical Correlation Analysis (CCA) uses them to find linear transformations that maximally correlate two sets of variables.
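
As an illustration of how these pieces fit together, the following is a minimal sketch of linear CCA in Python with NumPy. It is not taken from the talk: the function name `cca`, the small ridge term `reg`, and the whitening-plus-SVD formulation are assumptions chosen for brevity.

```python
import numpy as np


def cca(X, Y, reg=1e-8):
    """Linear CCA for X of shape (n, p) and Y of shape (n, q).

    Returns the canonical correlations (descending) and the projection
    matrices A (p, k) and B (q, k), where k = min(p, q).
    `reg` is a hypothetical ridge term added for numerical stability.
    """
    # Center both views.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance and cross-covariance matrices.
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via eigendecomposition of a
        # symmetric positive definite matrix.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(1.0 / np.sqrt(w)) @ V.T

    Sxx_is = inv_sqrt(Sxx)
    Syy_is = inv_sqrt(Syy)

    # The SVD of the whitened cross-covariance gives the canonical
    # correlations (singular values) and the canonical directions.
    U, s, Vt = np.linalg.svd(Sxx_is @ Sxy @ Syy_is, full_matrices=False)
    A = Sxx_is @ U
    B = Syy_is @ Vt.T
    return s, A, B
```

Here `s` holds the canonical correlations in decreasing order, and the columns of `A` and `B` project the centered `X` and `Y` onto the corresponding canonical directions.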


14th February 2020 : Voice-based classification of patients with ALS, Parkinson's disease and healthy controls with CNN-LSTM using transfer learning by Jhansi

Talk summary:



7th February 2020 : Hypothesis Testing by Shivani

Talk summary:



31st January 2020 : Improving fundamental frequency generation in EMG-to-Speech Conversion using a Quantization Approach by Tejas

Talk summary:



17th January 2020 : Comparison of interpolation schemes for the perception of speech in the presence of missing samples by Amit

Talk summary:



3rd January 2020 : Inverse Scattering by Mahima

Talk summary:



1st January 2020 : Out-of-Pronunciation Distribution Detection: An Unsupervised Approach by Parth

Talk summary:



27th December 2019 : Data driven analysis of critical articulators in speech production by Anusuya

Talk summary:



13th December 2019 : Multichannel Acoustic Source Localization by Tarun

Talk summary:



6th December 2019 : Variational Methods by Achuth

Talk summary:



29th November 2019 : Computational wave scattering by Karthik

Talk summary:



22nd November 2019 : Basics of Graph Signal Processing by Aravind

Talk summary:



11th October 2019 : Medical image segmentation on GPUs – A comprehensive review by Divya

Talk summary:



4th October 2019 : Dynamic Programming: An Overview and Some Optimization Techniques by Shankar

Talk summary:



27th September 2019 : Origins of Fourier Series by Pavan Kumar

Talk summary:

  • Introduction
  • The Heat Equation
  • Solution of PDE
  • Fourier Series


2nd September 2019 : Comparison of automatic syllable stress detection quality with time-aligned boundaries and context dependencies by Manoj

Talk summary:



2nd September 2019 : A comparative study of noise robustness of goodness of pronunciation (GoP) measures and its modifications based on teacher's utterance by Manoj

Talk summary:



2nd September 2019 : Whisper to neutral mapping using cosine similarity maximization in i-vector space for speaker verification by Abinay

Talk summary:



2nd September 2019 : An investigation on speaker specific articulatory synthesis with speaker independent articulatory inversion by Aravind

Talk summary:



31st August 2019 : Low resource automatic intonation classification using gated recurrent unit (GRU) networks pre-trained with synthesized pitch patterns by Atreyee

Talk summary:



23rd August 2019 : Achievements and the goals of the lab by Prasanta Kumar Ghosh

Talk summary:



23rd August 2019 : ASR inspired syllable stress detection for pronunciation evaluation without using a supervised classifier and syllable level features by Manoj

Talk summary:



23rd August 2019 : An improved goodness of pronunciation (GoP) measure for pronunciation evaluation with DNN-HMM system considering HMM transition probabilities by Manoj

Talk summary:



16th August 2019 : Acoustic and articulatory feature based speech rate estimation using a convolutional dense neural network by Renuka

Talk summary:

  • In this paper, we propose a speech rate estimation approach using a convolutional dense neural network (CDNN). The CDNN based approach uses acoustic and articulatory features for speech rate estimation. Mel Frequency Cepstral Coefficients (MFCCs) are used as acoustic features, and articulograms representing the time-varying vocal tract profile are used as articulatory features. The articulogram is computed from a real-time magnetic resonance imaging (rtMRI) video in the midsagittal plane of a subject while speaking. In practice, however, articulogram features are not directly available, unlike acoustic features from a speech recording. Thus, we use an acoustic-to-articulatory inversion method based on a bidirectional long short-term memory network, which estimates the articulogram features from the acoustics. The proposed CDNN based approach with estimated articulatory features requires both acoustic and articulatory features during training, but only acoustic data during testing. Experiments are conducted using rtMRI videos from four subjects, each speaking 460 sentences. The Pearson correlation coefficient is used to evaluate the speech rate estimation. The CDNN based approach gives a better correlation coefficient than the temporal correlation and selected sub-band correlation (TCSSBC) based baseline scheme by 81.58% and 73.68% (relative) in seen and unseen subject conditions, respectively.


2nd August 2019 : Rethinking Model Scaling for Convolutional Neural Networks by Aparna

Talk summary:



23rd July 2019 : Breath cycle segmentation by Shruthi

Talk summary:

  • Segmentation of individual inhales and exhales from an audio recording of continuous breaths using two approaches: a spectral entropy approach and a Parzen window-based approach.


19th July 2019 : Performance characterization of microphones by Suhas

Talk summary:

  • We look at the parameters that affect a microphone recording and what to look for in a specification sheet, and we assess the performance of different microphones qualitatively.


16th July 2019 : Construction of an anthropomorphic thorax phantom using CT scan segmentation and 3D printing by Srishti

Talk summary:

  • Phantoms that mimic human physiology have long been used for designing and testing various diagnostic medical/imaging techniques. The most important advantage of using phantoms is easy access to ground truth information which in most cases cannot be obtained from a human subject. The objective of this project is to construct an anthropomorphic thorax phantom that can be used to develop a system for multi-channel active/passive acoustic characterisation of lungs. However, to construct anthropomorphic phantoms we need a suitable way to capture anthropomorphic parameters and replicate them in the form of a phantom. To this end, we first obtain the anthropomorphic parameters of a human thorax using a CT scan. The CT scan is then segmented into various regions which would be finally printed using a 3D printer.


12th July 2019 : The task of Sound Event Detection by Shoureen

Talk summary:

  • The task of Sound Event Detection can be broadly divided into two categories: classification and localization. The former amounts to simple audio tagging, while the latter additionally requires specifying the onset and offset times of each event taking place in the given audio stream. The main challenge in audio tagging is the lack of frame-wise ground truth, which essentially turns it into a Multiple Instance Learning problem. In my work, I have tested multiple pooling functions, incorporating them at various stages, in order to maximize the F-score of the audio tagging system.


12th July 2019 : Call recording app with some additional features by Utkarsh

Talk summary:



12th July 2019 : Trend Statistics Network and Channel invariant EEG Network for sleep arousal study by Achuth

Talk summary:

  • Sleep is a very important part of life, and lack of sleep or a sleep disorder can negatively affect day-to-day life and have serious long-term consequences. In this work, we propose an end-to-end trainable neural network for automated arousal scoring. The network consists of two main parts. First, a trend statistics network computes the moving average of the filtered signals at different scales. Second, a channel invariant EEG network detects EEG arousals in any channel. Finally, we combine the features from the various channels through a convolution network and a bi-directional long short-term memory network to predict the probability of arousal. Further, we propose an objective function that uses only respiratory effort related arousal (RERA) and non-arousal regions to optimize the network. We also propose a method to estimate the respiratory disturbance index (RDI) from the probability predicted by the network. Evaluation on the Physionet Challenge 2018 database shows that the proposed method detects RERA with an area under the precision-recall curve (AUPRC) of 0.50 in a 10-fold cross validation setup. The mean absolute error of RDI prediction is 6.11, while a two-class RDI severity prediction yields a specificity of 75% and a sensitivity of 83%.


5th July 2019 : Effect of consonant context in TIMIT VCV sequences on pitch trend by Vaibhav

Talk summary:

  • This analysis describes how the pitch trend in VCV sequences depends on the voicing characteristics of the consonant in the vowel region followed by the consonant.


5th July 2019 : An exhaustive study on the involvement of articulators in the production of plosives by Minulakshmi

Talk summary:

  • The study focuses on the occurrence and duration of constriction for bilabial and laminal-alveolar plosives across the vowels /a,e,i,o,u/ in a symmetric VCV sequence.


28th June 2019 : Quantitative Trading by Sanjeev

Talk summary:

  • Exploring methods used by Quants in trading - Risk Model, Alpha Model, and strategies.


28th June 2019 : A Comparison of Different Methods for Audio Declipping by Sandhiya

Talk summary:

  • A deep dive into state-of-the-art algorithms for audio declipping: Constrained Blind Amplitude Reconstruction, Constrained Orthogonal Matching Pursuit Reconstruction, and two variants of the Sparse Audio Declipper.


20th June 2019 : An acoustic investigation on the effect of consonant context and speaking rate on vowel space and coarticulation in Toda VCV sequences by Nayan

Talk summary:

  • This study analyzes the effect of consonant context and speaking rate on vowel space and coarticulation in Toda vowel-consonant-vowel (VCV) sequences. The vowels /a/, /e/, /i/, /o/, /u/ and two intervocalic consonants, /p/ (labial) and /t/ (alveolar), are used to form asymmetrical VCV sequences at slow and very fast speaking rates. Results from these acoustic analyses indicate differences in how rate and consonant context affect the coarticulatory organization.


20th June 2019 : Acoustic analysis of swallow sounds in individuals with head and neck cancer by Divya

Talk summary:

  • This paper describes the effect of the volume of water swallowed by healthy controls on the acoustic signals captured by means of cervical auscultation. The study indicates that the peak intensity of the second swallow segment is the best parameter for differentiating volumes, since it changes significantly across the three volumes of water considered.


14th June 2019 : A study on the problem of heart-rate estimation from facial videos by Vishay

Talk summary:



7th June 2019 : Unsupervised syllable stress detection by Manoj

Talk summary:

  • Estimating stress markings in an automatic speech recognition (ASR) framework involving a finite-state transducer (FST), without using annotated stress markings or segmental information.


6th May 2019 : AIR-TISSUE BOUNDARY SEGMENTATION IN REAL TIME MAGNETIC RESONANCE IMAGING VIDEO USING A CONVOLUTIONAL ENCODER-DECODER NETWORK by Renuka

Talk summary:



6th May 2019 : AN IMPROVED AIR TISSUE BOUNDARY SEGMENTATION TECHNIQUE FOR REAL TIME MAGNETIC RESONANCE IMAGING VIDEO USING SEGNET by Renuka

Talk summary:



26th April 2019 : Representation learning using convolution neural network for acoustic-to-articulatory inversion by Aravind

Talk summary:



26th April 2019 : A Study on Robustness of Articulatory Features for Automatic Speech Recognition of Neutral and Whispered Speech by Gokul

Talk summary:



19th April 2019 : FORMANT-GAPS FEATURES FOR SPEAKER VERIFICATION USING WHISPERED SPEECH by Abhinay

Talk summary:



12th April 2019 : K-SVD by Karthik

Talk summary:



15th March 2019 : Methods to work with class imbalanced datasets by Shivani Yadav

Talk summary:

  • What class imbalance is
  • How it affects classifier performance
  • Methods to handle class imbalance


8th March 2019 : Initial value problem by Prasanta Kumar Ghosh

Talk summary:

  • Solving initial value problems in the context of ordinary differential equations (ODEs) using numerical methods, which are often required when ODEs are not analytically solvable. In this regard, both the theory and MATLAB coding of the Runge-Kutta family of methods will be discussed.
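
For readers without MATLAB, a minimal Python sketch of the classical fourth-order Runge-Kutta (RK4) scheme is given here. It is an illustration only, not the code from the talk; the function name `rk4`, its signature, and the fixed step size are assumptions.

```python
import math


def rk4(f, t0, y0, t_end, n_steps):
    """Integrate y' = f(t, y) from t0 to t_end with classical RK4.

    Uses a fixed step size h = (t_end - t0) / n_steps and returns the
    approximation of y(t_end). The interface is hypothetical.
    """
    h = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        # Four slope evaluations per step.
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        # Weighted average of the slopes advances the solution.
        y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t = t + h
    return y


# Example: y' = -2y with y(0) = 1 has the exact solution y(t) = exp(-2t).
approx = rk4(lambda t, y: -2.0 * y, 0.0, 1.0, 1.0, 100)
error = abs(approx - math.exp(-2.0))
```

The test problem has a known closed-form solution, so `error` gives a direct check of the scheme; halving the step size should shrink it by roughly a factor of 16 for a fourth-order method.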


22nd February 2019 : Font and Background Color Independent Text Binarization by Vishay

Talk summary:

  • Starts with the motivation for binarization of text images, continues with a discussion of global and adaptive thresholding techniques, and ends with a discussion of a novel approach for text binarization.
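
The difference between global and adaptive thresholding can be sketched as follows. This is an illustrative NumPy example, not the novel approach presented in the talk; the function names, the mean-based thresholds, and the `window` and `offset` parameters are all assumptions.

```python
import numpy as np


def global_threshold(img, t=None):
    """Binarize with one threshold for the whole image (default: image mean)."""
    t = img.mean() if t is None else t
    return img > t


def adaptive_threshold(img, window=15, offset=0.0):
    """Binarize each pixel against the mean of its local window.

    `window` must be odd. A pixel is marked True (background) when it is
    brighter than its local mean minus `offset`.
    """
    pad = window // 2
    padded = np.pad(img.astype(float), pad, mode="edge")
    # Integral image with a leading row and column of zeros, so that any
    # window sum can be read off with four lookups.
    ii = np.pad(padded, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    local_sum = (
        ii[window:window + h, window:window + w]
        - ii[:h, window:window + w]
        - ii[window:window + h, :w]
        + ii[:h, :w]
    )
    local_mean = local_sum / (window * window)
    return img > local_mean - offset
```

On an image with uneven illumination, a single global threshold tends to merge the darker part of the background with the text, while the local mean follows the illumination and keeps the two apart.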


22nd February 2019 : Prediction of articulatory motion at different rates by Abhay

Talk summary:

  • Predicting the articulatory trajectories in speech production from neutral to fast or slow rates using an encoder-decoder based model with some alterations.


15th February 2019 : A SegNet Based Image Enhancement Technique for Air-Tissue Boundary Segmentation in Real-Time Magnetic Resonance Imaging Video by Renuka

Talk summary:



8th February 2019 : Weighted Finite-State Transducers in Speech Recognition by Manoj

Talk summary:

  • Different operations on WFSTs
  • How WFSTs are used in decoding an utterance


1st February 2019 : Articulatory Phonology by Renuka

Talk summary:

  • Description of articulatory phonology
  • Gestural computational model
  • Gestural analysis


25th January 2019 : Learning deep features for one-class classification by Shankar

Talk summary:

  • We take a look at the problem of one-class classification and a deep learning-based solution for feature learning for one-class classification


11th January 2019 : Solving a system of linear equations by Karthik

Talk summary:



28th December 2018 : On Utility of Multi-taper Modified Group Delay Function Representations for Speaker and Language Recognition by Narendra

Talk summary:

  • The abstract can be found in the attached document


14th December 2018 : Visualizing High Dimensional Data using t-SNE by Aravind

Talk summary:

  • Basic concepts of information theory
  • Introduction to visualization
  • t-SNE (t-distributed stochastic neighbor embedding)


7th December 2018 : Overview of ASR by Avni

Talk summary:

  • Mathematical equation of ASR
  • Overview of HMM-GMM ASR
  • Viterbi decoding
  • Advantages of WFST over tree based structures


30th November 2018 : How do Netflix & Amazon Prime recommend movies by Sweekar

Talk summary:

  • The talk covers how matrix factorization and gradient descent work together to suggest the best possible content for the viewer.
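
A minimal sketch of the idea, assuming a small user-by-item ratings matrix with missing entries: the matrix is approximated by the product of two low-rank factors, which are fitted by gradient descent on the observed ratings only. The function name `factorize`, the hyperparameters, and the toy setup are assumptions for illustration and do not describe what Netflix or Amazon Prime actually run.

```python
import numpy as np


def factorize(R, mask, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Fit R ~ P @ Q.T on observed entries with full-batch gradient descent.

    R    : (n_users, n_items) ratings matrix
    mask : same shape, 1.0 where a rating is observed, 0.0 elsewhere
    k    : number of latent factors (hypothetical default)
    """
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))  # user factors
    Q = 0.1 * rng.standard_normal((n_items, k))  # item factors
    for _ in range(epochs):
        # Error on observed entries only; unobserved entries contribute 0.
        E = mask * (R - P @ Q.T)
        # Gradients of 0.5 * ||E||^2 + 0.5 * reg * (||P||^2 + ||Q||^2).
        grad_P = -E @ Q + reg * P
        grad_Q = -E.T @ P + reg * Q
        P -= lr * grad_P
        Q -= lr * grad_Q
    return P, Q
```

After fitting, `P @ Q.T` gives a predicted rating for every user-item pair, including the unobserved ones, and the highest predicted unseen items for a user can be offered as recommendations.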


9th November 2018 : Introduction to MIR, Audio licensing and blockchain technology by Suhas

Talk summary:

  • In this talk, we look at what music information retrieval is, why audio licensing is required and how audio watermarking and blockchains make data secure, accurate and reliable


2nd November 2018 : Attention in the neural network by Achuth

Talk summary:

  • We will see how basic attention works in neural networks and understand how attention is used in general seq2seq mapping problems, including ASR, TTS, machine translation, and image captioning.


12th October 2018 : Necessity for cloud computing by Valliappan

Talk summary:



12th October 2018 : We’re creating a dystopia of misinformation and emotional manipulation by Aparna

Talk summary:



5th October 2018 : Learning better models to sparsify yellow marks in manuscripts by Nisha

Talk summary:

  • Faulty writing practices leading to "yellowing" of submitted drafts
  • Factors in writing that are inversely proportional to dimension of "yellow marks" subspace
  • "Check-List" algorithm to improve writing


28th September 2018 : Fisher Linear Discriminant by Chiranjeevi

Talk summary:



24th August 2018 : Subband Weighting for Binaural Speech Source Localization by Karthik

Talk summary:



24th August 2018 : Speech Enhancement Using Deep Mixture of Experts Based on Hard Expectation Maximization by Pavan

Talk summary:



24th August 2018 : Relating Articulatory Motions in Different Speaking Rates by Astha

Talk summary:



24th August 2018 : Whispered Speech to Neutral Speech Conversion Using Bidirectional LSTMs by Nisha

Talk summary:



17th August 2018 : Air-Tissue Boundary Segmentation in Real-Time Magnetic Resonance Imaging Video Using Semantic Segmentation with Fully Convolutional Networks by Valliappan

Talk summary:



17th August 2018 : Automatic visual augmentation for concatenation based synthesized articulatory videos from real-time MRI data for spoken language training by Chiranjeevi

Talk summary:



17th August 2018 : Inferring speaker identity from articulatory motion during speech by Aravind

Talk summary:



17th August 2018 : Low resource acoustic-to-articulatory inversion using bi-directional long short-term memory by Aravind

Talk summary:


Provide your feedback

3rd August 2018 : Responsive Website Tool to Rate the Pronunciation Quality by Abhishek Gaonkar

Talk summary:


Provide your feedback

3rd August 2018 : Richer convolutional features for edge detection by Renuka

Talk summary:


Provide your feedback

27th July 2018 : Glottal Segmentation GUI in Python by Varun

Talk summary:


Provide your feedback

27th July 2018 : STUDY OF USE OF ARTICULATORY INFORMATION FOR ASR OF NEUTRAL AND WHISPERED SPEECH by Gokul

Talk summary:


Provide your feedback

24th July 2018 : Interpretability in Machine learning (ML) by Deep

Talk summary:

  • We have been deploying ML algorithms ("black box models") on various problems (e.g. classification tasks), and as a result it has become imperative that we develop tools for the interpretability of these "black boxes" so as to enable their deployment in real-life applications. My aim is to give a brief overview of this science of interpretability.

Provide your feedback

20th July 2018 : A study on acoustic-to-articulatory inversion for understanding inter-speaker dependency by Siddant

Talk summary:


Provide your feedback

20th July 2018 : Intonation classification using temporal structures in pitch contour by Atreyee

Talk summary:


Provide your feedback

13th July 2018 : Rendering head gestures based on MO-CAP (OptiTrack) data by Varshini

Talk summary:


Provide your feedback

13th July 2018 : Prediction of the air-tissue boundary in the upper airway of the vocal tract by Avinash

Talk summary:


Provide your feedback

13th July 2018 : Implementation of frame selective dynamic programming based pitch estimation by Aswin

Talk summary:


Provide your feedback

13th July 2018 : A Maximum Likelihood Formulation to Exploit Heart Rate Variability for Robust Heart Rate Estimation from Facial Video by Raseena

Talk summary:


Provide your feedback

6th July 2018 : Detection and Delineation of P and T waves in an ECG signal by Prakhar

Talk summary:


Provide your feedback

6th July 2018 : Broad Phoneme Class Specific Deep Neural Network Based Speech Enhancement by Pavan

Talk summary:


Provide your feedback

6th July 2018 : Classification between story-telling and poem recitation using head gesture of the talker by Anurag

Talk summary:


Provide your feedback

6th July 2018 : Comparison of Cough, Wheeze and Sustained Phonations for Automatic Classification between Healthy subjects and Asthmatic patients by Shivani

Talk summary:


Provide your feedback

29th June 2018 : A Maximum Likelihood Formulation to Exploit Heart Rate Variability for Robust Heart Rate Estimation from Facial Video by Raseena

Talk summary:

  • Motivation for Non contact heart rate measurement
  • Challenges in Non contact heart rate measurement
  • The proposed maximum likelihood approach
  • Experiments and results

Provide your feedback

29th June 2018 : Comparison of Cough, Wheeze and Sustained Phonations for Automatic Classification between Healthy subjects and Asthmatic patients by Shivani

Talk summary:

  • INTRODUCTION
  • MOTIVATION
  • PROPOSED METHOD
  • DATASET
  • EXPERIMENTAL SETUP
  • RESULTS
  • CONCLUSION AND FUTURE WORK

Provide your feedback

22nd June 2018 : Automatic visual augmentation for articulatory videos from real-time MRI data by Chandana

Talk summary:


Provide your feedback

15th June 2018 : A Brief introduction to Lungs anatomy, physiology, pathology and pulmonary function tests by Shivani

Talk summary:

  • lungs anatomy and physiology explanation
  • overview of lung function test
  • pulmonary diseases
  • Overview of 2 papers related to sound-based analysis or detection of pulmonary diseases

Provide your feedback

4th May 2018 : Git and GitHub by Nitin

Talk summary:

  • What is Git and GitHub
  • Demonstration of tool

Provide your feedback

23rd Feb 2018 : Introduction to Bootstrap by Kausthubha

Talk summary:

  • Introduction to Bootstrap
  • File Structure
  • Typography
  • Different classes of buttons used in Bootstrap

Provide your feedback

10th Feb 2018 : Joint Learning of Phonetic Units and Word Pronunciations for ASR by Avni

Talk summary:

  • Problem Statement
  • Background
  • Model Formulation for the underlying problem
  • Discussion

Provide your feedback

2nd Feb 2018 : Why Deep learning? by Valliappan

Talk summary:

  • DNN vs Linear Classifier
  • Back Propagation
  • Understanding Back-Propagation for Batch Normalisation Layer
  • Introduction to CNN
  • GPUs
  • Speech and CNN

Provide your feedback

5th Dec 2018 : Enhanced voice user interface employing spatial filtration of signals from acoustic vector sensor by Abinay Reddy

Talk summary:

  • One of the current challenges in automatic speech recognition (ASR) is robust recognition in noisy conditions. We will discuss the idea of using acoustic vector sensor to improve ASR in noisy conditions.

Provide your feedback

29th Dec 2017 : Connectionist temporal classification (CTC) by Achuth Rao

Talk summary:

  • CTC is one of the key components in recent state-of-the-art automatic speech recognition systems from Google and in Deep Speech 2. We will discuss the key ideas and motivation involved in developing CTC.
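
One key idea in CTC is the many-to-one mapping from frame-level label paths to output sequences: consecutive repeated labels are merged first, and blank symbols are removed afterwards. A minimal sketch of that mapping follows; the blank symbol `-` and the function name are illustrative assumptions, not taken from the talk.

```python
from itertools import groupby

BLANK = "-"  # assumed blank symbol, chosen only for this illustration


def ctc_collapse(path):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    merged = [label for label, _ in groupby(path)]
    return "".join(label for label in merged if label != BLANK)


print(ctc_collapse("hh-e-ll-lo--"))
```

Because repeats are merged before blanks are dropped, a blank between two identical labels is what lets a doubled letter such as "ll" survive.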

Provide your feedback

22nd Dec 2017 : Time Scaling of Articulatory Motion in Speech Production by Astha Singh

Talk summary:

  • Introduction
  • Problem Statement
  • Approaches : Interpolation, Affine Invariant DTW and Interpolation
  • Some Results

Provide your feedback

15th Dec 2017 : Arc-cosine kernels and neural networks by Pavan Karjol

Talk summary:

  • Kernel functions
  • Arc-cosine kernels
  • Neural networks
  • Conclusions

Provide your feedback

8th Dec 2017 : On the importance/unimportance of phase in speech signal processing by Prasanta

Talk summary:

  • Definition of Phase
  • Key results (perception)
  • Role of phase in speech enhancement, watermarking, synthesis, recognition

Provide your feedback

1st Dec 2017 : The Quantum Bit (Qubit) by Karthik

Talk summary:

  • States of a qubit
  • Information and Measurement of a qubit state
  • Single qubit gates
  • Multi qubit gates
  • Bell States/ EPR pairs
  • Quantum Entanglement
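
The outline items on gates and Bell states can be made concrete with a small state-vector sketch: a Hadamard gate on the first qubit of |00⟩ followed by a CNOT yields the Bell state (|00⟩ + |11⟩)/√2. The helper names below are illustrative assumptions, and the basis order is taken to be 00, 01, 10, 11 with the first qubit as the control.

```python
import math

H = [[1 / math.sqrt(2), 1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]   # Hadamard gate
I2 = [[1, 0], [0, 1]]                          # single-qubit identity
CNOT = [[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 1, 0]]                          # flips qubit 2 when qubit 1 is 1


def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def apply_gate(gate, state):
    """Multiply a gate matrix by a state vector."""
    return [sum(g * s for g, s in zip(row, state)) for row in gate]


def measurement_probabilities(state):
    """Probability of each basis outcome is the squared amplitude magnitude."""
    return [abs(amplitude) ** 2 for amplitude in state]


bell = apply_gate(CNOT, apply_gate(kron(H, I2), [1, 0, 0, 0]))
print(measurement_probabilities(bell))
```

Only the outcomes 00 and 11 have nonzero probability, so measuring one qubit fixes the other, which is the entanglement the outline refers to.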

Provide your feedback

10th Nov 2017 : Wagner-Fischer string-to-string correction algorithm and its optimality by Chiranjeevi Yarra

Talk summary:

  • Problem definition
  • Wagner-Fischer algorithm
  • Objective function
  • Modifications to the objective function
  • Optimality
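
As a reference point for the outline above, here is a minimal sketch of the standard dynamic-programming recurrence for string-to-string correction. The cost parameters are illustrative assumptions standing in for the "modifications to the objective function"; the talk's actual modifications are not stated in this summary.

```python
def edit_distance(source, target, sub_cost=1, ins_cost=1, del_cost=1):
    """Minimum total cost of edits that turn `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dist[i][0] = i * del_cost          # delete every source symbol
    for j in range(1, cols):
        dist[0][j] = j * ins_cost          # insert every target symbol
    for i in range(1, rows):
        for j in range(1, cols):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,
                dist[i][j - 1] + ins_cost,
                dist[i - 1][j - 1] + (0 if same else sub_cost),
            )
    return dist[-1][-1]


print(edit_distance("kitten", "sitting"))
```

Each cell depends only on its three neighbours, so the table is filled in time proportional to the product of the two string lengths.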

Provide your feedback

3rd Nov 2017 : The impact of speaking rate on acoustic-to-articulatory inversion by Aravind Illa

Talk summary:

  • Speech production
  • Acoustic to articulatory inversion
  • Effect of rate on inversion

Provide your feedback

13th Oct 2017 : simple introduction to Blind Source Separation by Karthik

Talk summary:

  • Introduction to blind source separation
  • Ambiguities due to permutation, scaling and Gaussianity
  • Principle of Independent Component Analysis (ICA)
  • Maximum Likelihood based algorithm for ICA

Provide your feedback

6th Oct 2017 : Partial Least Squares Regression (contd.) by Nisha Meenakshi

Talk summary:

  • Issues in Multiple Linear Regression
  • Nonlinear Iterative Partial Least Squares (NIPALS) Algorithm
  • Discussion

Provide your feedback

9th June 2017 : Audio-Visual Keyword Spotting by Astha Singh

Talk summary:

  • Introduction
  • Idea for implementing AV-KWS
  • Feature extraction Audio, Visual
  • HMM - Overview
  • Fusion Strategy for audio and visual modalities output
  • Expected Results

Provide your feedback

2nd June 2017 : Audio-Visual Speech Enhancement by Ajay Mahender Singh

Talk summary:

  • Introduction to the problem statement
  • Motivation
  • Initial work - just speech
  • Feature extraction
  • The Menpo Project
  • Visual features
  • Enhancement techniques
  • Conclusion and future work

Provide your feedback

26th May 2017 : Illumination Variation-Resistant Video-Based Heart Rate Measurement Using Joint Blind Source Separation and Ensemble Empirical Mode Decomposition by Raseena KT

Talk summary:

  • Photoplethysmography
  • Joint Blind Source Separation
  • Estimating Heart Rate from face Video

Provide your feedback

19th May 2017 : non-ASR based keyword spotting by Samik Sadhu

Talk summary:

  • Motivation
  • Recap of Poisson Process Models in Keyword Spotting
  • Discriminative Training of Poisson Process Models in Keyword Spotting
  • Unsupervised Online Learning of Poisson Process Models
  • Posteriorgram Filtering based Keyword Spotting
  • Future Scope of Work

Provide your feedback

5th May 2017 : Variational RNN by Pavan Karjol

Talk summary:

  • Dynamic Bayesian Networks
  • Recurrent Neural Networks
  • Variational Recurrent Neural Networks
  • Experiments

Provide your feedback

28th April 2017 : Finite State Transducers and its Application in KALDI by Avni Rajpal

Talk summary:

  • Motivation
  • Basic terms and definitions
  • Operations: particularly composition and determinization
  • Speech Recognition using FST

Provide your feedback

21st April 2017 : WaveNet: A Generative Model for Raw Audio by Achuth Rao

Talk summary:

  • Recall generative and discriminative models
  • Generative models used in speech
  • Why modeling raw audio directly is difficult
  • How WaveNet overcomes these difficulties
  • How WaveNet combines features of both generative and discriminative models
  • How a single model can be used to solve four different problems in speech: (a) TTS, (b) multi-speaker speech generation, (c) music generation, (d) speech recognition

Provide your feedback

14th April 2017 : Video Editing with Blender by Gaurav Fotedar

Talk summary:

  • Introduction to the Blender VSE
  • Extracting audio from video
  • Cutting/cropping videos
  • Replacing audio in a video with audio from another source
  • changing frame rates
  • Adding Subtitles
  • Video Overlay
  • Making Compilation Videos

Provide your feedback

31st March 2017 : Hypothesis Testing by Prasanta Ghosh

Talk summary:

  • Definition
  • Null and alternative hypothesis
  • Test procedure
  • Error in hypothesis testing
  • Significance level
  • Tests about a population mean
  • Tests concerning a population proportion
  • P-value
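
To illustrate the items on tests about a population mean and the P-value, here is a minimal sketch of a two-sided one-sample z-test. It assumes the population standard deviation `sigma` is known, which is a simplification chosen for brevity; the function name is hypothetical and the talk may have used a different test.

```python
import math
from statistics import NormalDist


def z_test_mean(sample, mu0, sigma):
    """Two-sided z-test of H0: population mean == mu0, with known sigma."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # P-value: probability, under H0, of a statistic at least this extreme.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = z_test_mean([12.0, 12.0, 12.0, 12.0], mu0=10.0, sigma=2.0)
print(z, p)
```

The null hypothesis is rejected when the P-value falls below the chosen significance level, for example 0.05.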

Provide your feedback

24th March 2017 : Hypothesis Testing by Prasanta Ghosh

Talk summary:

  • Definition
  • Null and alternative hypothesis
  • Test procedure
  • Error in hypothesis testing
  • Significance level
  • Tests about a population mean
  • Tests concerning a population proportion
  • P-value

Provide your feedback

10th March 2017 : Variational Auto encoder by Pavan Karjol

Talk summary:

  • Introduction
  • Stochastic gradient variational Bayes (SGVB) estimator
  • Experiments and conclusion

Provide your feedback

24th February 2017 : Mock Presentations by

Talk summary:

  • Pitch Prediction from Mel-frequency Cepstral Coefficients Using Sparse Spectrum Recovery by Achuth Rao
  • A Comparative Study on the Effect of Different Codecs on Speech Recognition Accuracy Using Various Acoustic Modeling Techniques by Nisha Meenakshi
  • Classification of Healthy Subjects and Patients with Essential Vocal Tremor using Empirical Mode Decomposition of High-Resolution Pitch Contour by Mekhala H S

Provide your feedback

17th February 2017 : automatic detection of syllable stress using sonority based prominence features by Chiranjeevi Yarra

Talk summary:

  • How is sonority useful?
  • Existing works on measuring sonority
  • Proposed approach
  • Results
  • Conclusion

Provide your feedback

10th February 2017 : A COMPARATIVE STUDY OF ACOUSTIC-TO-ARTICULATORY INVERSION FOR NEUTRAL AND WHISPERED SPEECH by Aravind Illa

Talk summary:

  • Introduction
  • Data Collection
  • Experimental Set-up
  • Results
  • Conclusion

Provide your feedback

3rd February 2017 : Automatic detection and diagnosis of phoneme pronunciation quality: a review by Chiranjeevi Yarra

Talk summary:

  • Introduction
  • Mispronunciation detection
  • Error diagnosis
  • Conclusion

Provide your feedback

20th January 2017 : Classification of Voluntary Cough Airflow Patterns for Prediction of Abnormal Spirometry by Shivani Yadav

Talk summary:

  • INTRODUCTION -- 1) What is Spirometry and its variables, 2) What is the need of automatic classification using cough flow pattern
  • STUDY DESIGN
  • METHOD USED
  • RESULT

Provide your feedback

13th January 2017 : Poisson Process Based Keyword Spotting and its Variants by Samik Sadhu

Talk summary:

  • Introduction
  • Generating Events
  • Model Phonetic Events
  • Keyword Searching with Poisson Process Models
  • Bayesian Approach to training
  • A Better phonetic event selection technique
  • One of our works in PPM - Discriminative Training of PPM
  • Receiver Operating Curves (ROC) and Figure of Merit (FOM)
  • Conclusion

Provide your feedback

6th January 2017 : The Task Dynamic Model of Speech Production by Nisha Meenakshi

Talk summary:

  • What is articulatory phonology?
  • How do you model the movement of articulators?
  • Applications of the task dynamic model.

Provide your feedback

30th December 2016 : Degenerate Unmixing Estimation Technique (DUET) by Girija Ramesan Karthik

Talk summary:

  • What does that even mean?
  • W-Disjoint Orthogonality
  • Approximate W-Disjoint Orthogonality of Speech
  • ML Parameter Estimation for 2-mixture Speech Separation
  • The phase wrapping problem
  • Weighted histogram based Estimators for 2-mixture Speech Separation

Provide your feedback

16th December 2016 : Why does deep and cheap learning work so well? -- Part II by Achuth Rao

Talk summary:

  • We focus our attention on the approximation of radial functions [fun(|x|)]. We construct a simple radial function and show how it can be approximated by a three-layer network with complexity poly(d), whereas a 2-layer network requires exp(d) units (d is the dimension of the input). We show that the simple function can generalize to any radial function.

Provide your feedback

2nd December 2016 : Associative Networks by Karthik Ramesan

Talk summary:

  • What is association?
  • Type of Associative Networks
  • Linear & Non linear Associators
  • Energy function
  • Conclusion

Provide your feedback

25th November 2016 : Mail server by Kausthubha

Talk summary:

  • Introduction
  • How mail server works
  • SMTP/POP3
  • Summary

Provide your feedback

18th November 2016 : Keyword Spotting in Continuous Speech; An Overview of Different Approaches to Keyword Spotting by Samik Sadhu

Talk summary:

  • Keyword Spotting What is So Special in That?
  • Going Deep! - DNNs, CNNs
  • Going Sparse! - Dictionary Learning
  • Go Semi(pseudo!) Unsupervised! - Query by Example
  • Go Completely Unsupervised!

Provide your feedback

11th November 2016 : Automatic Prosodic Event Detection by Vijayakrishna

Talk summary:

  • Introduction
  • ToBI convention explanation
  • Existing methods to tackle the prosodic event detection problem
  • Conclusion and future work

Provide your feedback

28th October 2016 : Why does deep and cheap learning work so well? by Achuth Rao

Talk summary:

  • Intro about neural network
  • Proof overview of neural network universal approximation.
  • Visual proof
  • How the depth helps.

Provide your feedback

21st October 2016 : Return of the Savitzky-Golay (SG) filters by Nisha Meenakshi

Talk summary:

  • Differentiation filter
  • Moment preservation property of SG filters.
  • When is an SG filter an optimal filter?

Provide your feedback

14th October 2016 : Savitzky-Golay (SG) filters by Nisha Meenakshi

Talk summary:

  • Filter formulation
  • Properties of the SG filter with illustrative examples
  • Exemplary application: ECG denoising.
  • Conclusions
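
As a small illustration of the filter formulation, the sketch below applies the well-known 5-point quadratic Savitzky-Golay smoothing coefficients (-3, 12, 17, 12, -3)/35, which come from a least-squares polynomial fit over each window. The function name and the choice to leave the two samples at each edge unfiltered are assumptions made for brevity.

```python
SG_COEFFS = (-3, 12, 17, 12, -3)  # 5-point quadratic/cubic smoothing weights
SG_NORM = 35


def sg_smooth(signal):
    """Smooth interior samples with a 5-point Savitzky-Golay filter."""
    half = len(SG_COEFFS) // 2
    out = list(signal)  # edge samples are copied through unchanged
    for i in range(half, len(signal) - half):
        window = signal[i - half:i + half + 1]
        out[i] = sum(c * v for c, v in zip(SG_COEFFS, window)) / SG_NORM
    return out


quadratic = [float(x * x) for x in range(7)]
print(sg_smooth(quadratic))
```

A quadratic input passes through unchanged because the filter reproduces any polynomial up to the degree of the fit, which is what lets it preserve peak shapes better than a plain moving average.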

Provide your feedback

7th October 2016 : Generalized Triangular Decomposition in Transform coding by Aravind Illa

Talk summary:

  • Theorem definition
  • Karhunen-Loeve Transform (KLT)
  • Prediction-based lower triangular transform (PLT)
  • GTD Transform Coder
  • Conclusions

Provide your feedback

30th September 2016 : The Perron-Frobenius theorem and its applications -- Part 2 by Pavan Karjol

Talk summary:

  • Theorem definition
  • Proof of the theorem.
  • Applications involving Markov chains and page rank algorithm
  • Conclusions
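
The PageRank application listed above can be sketched as power iteration on a damped link matrix. With damping, every entry of the transition matrix is positive, so the Perron-Frobenius theorem guarantees a unique positive stationary vector to which the iteration converges. The function name, the damping value 0.85, and the toy graph are illustrative assumptions; every link target is assumed to appear as a key of `links`.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Approximate PageRank scores by power iteration.

    `links` maps each node to the list of nodes it links to.
    """
    nodes = sorted(links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            # A node with no outgoing links spreads its rank evenly.
            targets = links[node] or nodes
            share = damping * rank[node] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


print(pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]}))
```

The scores stay normalised to sum to one at every step, since each node redistributes exactly its own rank.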

Provide your feedback

23rd September 2016 : The Perron-Frobenius theorem and its applications by Pavan Karjol

Talk summary:

  • Theorem definition
  • Proof of the theorem.
  • Applications involving Markov chains and page rank algorithm
  • Conclusions

Provide your feedback

16th September 2016 : Allpass modeling of phase spectrum of speech signal by Prasanta Ghosh

Talk summary:

  • Significance of phase spectrum in speech signal processing
  • Allpass modeling of phase spectrum of speech
  • Applications of allpass modeling including formant tracking and GCI identification
  • Conclusions

Provide your feedback

2nd September 2016 : A General Regression Neural Network by Aravind Illa

Talk summary:

  • Introduction to GRNN
  • Advantages
  • Limitations
  • Conclusion
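
For orientation, a GRNN prediction is a kernel-weighted average of the training targets, with one Gaussian unit per training sample and a single smoothing parameter. The sketch below shows that computation; the function name, the default `sigma`, and the toy data are assumptions for illustration only.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=1.0):
    """Predict the target at `query` as a Gaussian-weighted mean of train_y."""
    weights = [
        math.exp(-sum((a - b) ** 2 for a, b in zip(x, query)) / (2 * sigma ** 2))
        for x in train_x
    ]
    return sum(w * y for w, y in zip(weights, train_y)) / sum(weights)


print(grnn_predict([[0.0], [1.0]], [0.0, 2.0], [0.5]))
```

There is no iterative training, which is a commonly cited advantage; the cost is that every training sample must be stored and visited at prediction time.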

Provide your feedback

19th August 2016 : Linear regression fit under Laplacian noise by Chiranjeevi Yarra

Talk summary:

  • Problem definition
  • Problem formulation
  • Solution under special cases -- When C=0, When m=0
  • Generic solution -- Alternating minimization, Samples based solution
  • Experimental results

Provide your feedback

12th August 2016 : Automatic recognition of social roles using long term role transitions in small group interactions by Gaurav Fotedar

Talk summary:

  • Introduction to Roles
  • Data
  • Proposed Method
  • Experiments & Results
  • Conclusion & Future Work

Provide your feedback

5th August 2016 : Distributed Maximum Likelihood estimation of GMM parameters by Varsha Satish

Talk summary:

  • Introduction
  • Basic EM
  • Distributed EM
  • Application of Distributed EM
  • Two distributed EM algorithms
  • Implementation of EM Algorithm in C
  • Problems faced while implementing it on C

Provide your feedback

29th July 2016 : Classification of Healthy Subjects and Patients with Essential vocal tremor using Empirical Mode Decomposition of High Resolution Pitch Contour by Mekhala

Talk summary:

  • Introduction
  • Obtaining High Resolution Pitch using Glottal Closure Instants (GCIs)
  • Pitch Oscillation Characteristics (POC) extraction using Empirical Mode Decomposition
  • Experimentation - Baseline and Evaluation metric
  • Results and discussion

Provide your feedback

29th July 2016 : Audio Visual Synthesis by Valliappan

Talk summary:

  • Introduction to the Problem
  • Dataset (PRAV Corpus)
  • Approaches (Dynamic programming and LSTM - RNN)
  • Results
  • Conclusion and Further Work

Provide your feedback

22nd July 2016 : Music Reconstruction, Separation and synthesis by Anurendra

Talk summary:

  • Problem formulation
  • Background
  • Our new model
  • Theoretical derivations and solutions
  • Implementation issues

Provide your feedback

15th July 2016 : Finding a relation between acoustic features of speech and head motion of the speaker by Pranav

Talk summary:

  • Introduction
  • Brief idea about HMM
  • Data Acquisition and Preparation
  • Different methods that have been adopted previously to cluster head motion
  • The different tests we performed to check for a relation between speech and head motion
  • Scope for further research

Provide your feedback

11th July 2016 : Implementing an intonation practice environment for voisTUTOR by Anand

Talk summary:

  • Why intonation is important in speech
  • The methodology undertaken in creating the stylisation (Intonation practice)
  • Results
  • Further work

Provide your feedback

8th July 2016 : Sparse modelling of Residual for Vocal tract estimation at high pitch by Chaithya

Talk summary:

  • Introduction
  • Problem Statement
  • Earlier Methods
  • Sparse Modelling of residual
  • Properties
  • GCI Corrective Algorithm

Provide your feedback

8th July 2016 : Detection and delineation of SLEEP APNEA and HYPOPNEA from EDR-ECG Derived Respiratory Signal by Salma B

Talk summary:

  • INITIAL WORK
  • UNDERSTANDING OF THE ECG
  • LINK OF RESPIRATION AND ECG
  • MATHEMATICAL IMPLEMENTATION

Provide your feedback

4th July 2016 : Identification and labelling of prosodic groups in utterances using ToBI by Vaidhya

Talk summary:

  • What is ToBI
  • What are its applications
  • Explanation of the 4 tiers in brief
  • Explaining the tone tier in detail

Provide your feedback

4th July 2016 : Carnatic Music app in Android by Priyadarshini S

Talk summary:

  • Different components of carnatic music
  • Implementation of various aspects of carnatic music training.
  • Online feedback for practice sessions

Provide your feedback

30th June 2016 : Comparative study of the pulse rate estimation from facial video under different video compression schemes by Paridhi Maheshwari

Talk summary:

  • Prior Methods
  • Sparse Spectral Peak Tracking Algorithm
  • Motivation
  • Database & Recording Setup I & II
  • Results
  • Conclusion

Provide your feedback

30th June 2016 : Glottal source modelling for improving text-to-speech (TTS) systems by Tom Francis

Talk summary:

  • Introduction
  • A biologically inspired glottal model for TTS
  • A novel parameterization for the glottal waveform using the beta distribution
  • Conclusion

Provide your feedback

24th June 2016 : Rank sparsity incoherence for matrix decomposition by Pavan Karjol

Talk summary:

  • Introduction
  • Problem formulation
  • Conditions for unique decomposition
  • Results and conclusions

Provide your feedback

17th June 2016 : Speaker Verification by Achuth Rao

Talk summary:

  • Introduction to speaker verification
  • Models for handling inter speaker variability
  • Models for handling inter session variability
  • Models that can handle both - i-vector

Provide your feedback

3rd June 2016 : Face and Body Gesture Recognition and Analysis by Dr. Tanaya Guha

Talk summary:

  • We will cover two aspects of gesture understanding - recognition and analysis. In the first part, we will discuss sparse representation-based classification algorithms for recognizing face and body gestures in videos. In the second part, we'll concentrate on analyzing facial gestures of children with autism using motion capture (mocap) data.

Provide your feedback

27th May 2016 : SPIRE-ABC: An online tool for acoustic-unit boundary correction (ABC) via crowdsourcing by Kausthubha N K

Talk summary:

  • Introduction to Annotation
  • Motivation for annotation
  • Online tool for the annotation (wavesurfer.js) and its limitations
  • Modifications made to achieve the proposed system
  • Hands-on session to use the online tool

Provide your feedback

20th May 2016 : Highs in my Life and my Selling Mishaps by Sanjeev Mittal

Talk summary:

  • SPIRE recipe for super speaking skills to sell your ideas from stage
  • Showcase videos: two great public talks based on a similar recipe
  • Open discussion on the recipe
  • Open platform: anyone who wishes to practice their ideas can use the occasion to give a flash talk based on the framework introduced

Provide your feedback

29th April 2016 : Comparison of acoustic to articulatory inversion of ALS patients and healthy controls by Neha Koundal

Talk summary:

  • Comparison of acoustic to articulatory inversion of ALS patients and healthy controls

Provide your feedback

16th April 2016 : Acoustic based speech rate estimation using data driven approaches by Chiranjeevi Yarra

Talk summary:

  • Introduction
  • NMF based speech rate estimation
  • Mode-shape based peak detection strategy for speech rate estimation
  • Experimental results
  • Conclusions

Provide your feedback

17th March 2016 : ICASSP poster talks by Chiranjeevi, Navaneet, Prasanta

Talk summary:

  • A ROBUST SPEECH RATE ESTIMATION BASED ON THE ACTIVATION PROFILE FROM THE SELECTED ACOUSTIC UNIT DICTIONARY
  • MULTIPLE SPECTRAL PEAK TRACKING FOR HEART RATE MONITORING FROM PHOTOPLETHYSMOGRAPHY SIGNAL DURING INTENSIVE PHYSICAL EXERCISE
  • Better acoustic normalization in subject independent acoustic-to-articulatory inversion: benefit to recognition

Provide your feedback

4th March 2016 : Spatial Hearing by Karthik Ramesan

Talk summary:

  • INTRODUCTION TO SPATIAL HEARING
  • CUES THAT HELP IN SPATIAL HEARING
  • STRUCTURAL APPROXIMATIONS FOR BINAURAL HEARING
  • CONCLUSIONS

Provide your feedback

19th February 2016 : A model selection approach to audio segmentation via the Bayesian Information Criterion (BIC) by Nisha Meenakshi

Talk summary:

  • Problem 1: Audio Segmentation
  • Problem 2: Model Selection
  • BIC for model selection
  • How can audio segmentation be viewed as a model selection problem?
  • Literature: BIC in audio segmentation.

12th February 2016 : Phase Processing for Single-Channel Speech Enhancement by Pavan Karjol

Talk summary:

  • INTRODUCTION
  • ITERATIVE ALGORITHMS FOR PHASE ESTIMATION
  • SINUSOIDAL MODEL-BASED PHASE ESTIMATION
  • GROUP DELAY AND TRANSIENT PROCESSING
  • RELATION BETWEEN PHASE AND MAGNITUDE ESTIMATION
  • RESULTS AND CONCLUSION

5th February 2016 : Speaker Verification methods by Achuth Rao

Talk summary:

  • Introduction to speaker verification
  • GMM based methods
  • GMM UBM based methods (MAP adaptation)
  • EMAP adaptation
  • Eigen voices

29th January 2016 : HTML & CSS Building a Static Website by Gaurav Fotedar

Talk summary:

  • HTML Structure
  • HTML Basic Elements
  • HTML Forms
  • Basic CSS Syntax (Inline and File)
  • HTML 5 Elements
  • CSS3 elements

15th January 2016 : SIGNAL SUBSPACE APPROACH FOR SPEECH ENHANCEMENT by Pavan Karjol

Talk summary:

  • Speech enhancement overview
  • Signal and noise models
  • Signal and noise subspaces
  • Linear estimators (TDC and LDC)
  • Results and conclusions

1st January 2016 : Speech Analysis/Synthesis Based on a Sinusoidal Representation by Aravind Illa

Talk summary:

  • Sinusoidal Speech Model
  • Estimation of Speech Parameters
  • Frame-To-Frame Peak Matching
  • Synthesis System
  • Extension to Harmonic Models

11th December 2015 : Voice Conversion by Achuth Rao

Talk summary:

  • Overview
  • GMM based voice conversion
  • Modifications to GMM for voice conversion
  • Frequency warping and amplitude scaling for voice conversion

11th December 2015 : On generative and discriminative models by Prasanta Kumar Ghosh

Talk summary:

  • What is a generative model (including examples)?
  • What is a discriminative model (including examples)?
  • Asymptotic performance of generative and discriminative models
  • Discriminative training of a generative model: blending generative and discriminative models

20th November 2015 : Blind Source Separation Using Wigner-Ville Distribution (WVD) by Chiranjeevi Yarra

Talk summary:

  • Problem statement
  • Problem formulation with WVD
  • Joint diagonalization
  • Simulation results

5th November 2015 : Language models and an introduction to the IRSTLM toolkit by Nisha Meenakshi

Talk summary:

  • What are language models? Where are they used? How does the IRSTLM toolkit perform language modeling? A few examples of IRSTLM implementation.

30th October 2015 : vi basics & survival skills by Sanjeev Mittal

Talk summary:

  • Various states and transitions in vi
  • Quick basic survival commands
  • Day-to-day commands
  • Advanced commands
  • Pitfalls and troubleshooting

22nd October 2015 : Understanding OOPS Concepts using Java by Gaurav Fotedar

Talk summary:

  • Classes & Objects
  • Polymorphism
  • Inheritance
  • Method and Operator Override
  • Interfaces and Abstract Classes
  • Arrays and String Class
  • Generics

16th October 2015 : Robust real-time pulse rate estimation from facial video using sparse spectral peak tracking by Aditya Gaonkar

Talk summary:

  • Estimating pulse rate of subjects from facial videos. Usage of "Independent Component Analysis" in Biomedical Signal Processing. An overview of the proposed method. Discussion on the obtained results.

9th October 2015 : Video Lecture by

Talk summary:

  • Finite state transducer
  • Speech recognition using finite state transducer

1st October 2015 : Low Rank and Sparse Matrix Decomposition by Jitendra Kumar Dhiman

Talk summary:

  • Type of the problem: Inverse problem
  • Problem formulation for low rank and sparse matrix decomposition
  • Problem solution using the "Augmented Lagrangian method of multipliers"
  • Application to musical noise removal for speech signal
  • Abstract: Inverse problems arise in many applications of science and engineering, and there have been several approaches to solving them. We will discuss one particular type of inverse problem, "low rank and sparse matrix decomposition". In this type of problem, the data is given as a matrix that is the sum of two unknown matrices, one of which is low rank and the other sparse. The goal is to recover this decomposition of the given matrix. This problem can be solved in an optimization framework. Although there are many algorithms available to solve this problem, we will focus on one particular optimization algorithm (Augmented Lagrangian Method). Finally, we will apply the algorithm to musical noise separation for speech signals. To separate musical noise from a denoised speech signal, the algorithm exploits the structure of musical noise being sparse in the time-frequency domain and speech being low rank.
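
The decomposition described in the abstract can be illustrated with a minimal sketch. It assumes the common convex formulation (minimise the nuclear norm of the low rank part plus a weighted L1 norm of the sparse part, subject to their sum equalling the data matrix) and an inexact augmented Lagrangian iteration; the talk may have used a different variant. The function names, the default weight lam, and the penalty schedule (mu, rho) are illustrative assumptions, not taken from the talk.

```python
import numpy as np


def soft_threshold(X, tau):
    """Elementwise shrinkage (proximal operator of tau * L1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)


def singular_value_threshold(X, tau):
    """Shrink singular values by tau (proximal operator of tau * nuclear norm)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def low_rank_plus_sparse(M, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Split a non-zero matrix M into L (low rank) + S (sparse).

    Hypothetical helper: minimises ||L||_* + lam * ||S||_1 subject to
    L + S = M using an inexact augmented Lagrangian iteration.
    """
    M = np.asarray(M, dtype=float)
    m, n = M.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))   # common default weight (assumption)
    mu = 1.25 / np.linalg.norm(M, 2)     # initial penalty (assumption)
    L = np.zeros_like(M)
    S = np.zeros_like(M)
    Y = np.zeros_like(M)                 # Lagrange multiplier for L + S = M
    norm_M = np.linalg.norm(M, "fro")
    for _ in range(max_iter):
        # Minimise the augmented Lagrangian over L, then over S.
        L = singular_value_threshold(M - S + Y / mu, 1.0 / mu)
        S = soft_threshold(M - L + Y / mu, lam / mu)
        residual = M - L - S
        Y = Y + mu * residual            # multiplier (dual) update
        mu *= rho                        # tighten the constraint penalty
        if np.linalg.norm(residual, "fro") <= tol * norm_M:
            break
    return L, S


if __name__ == "__main__":
    # Synthetic check: rank-2 matrix corrupted by a few large entries.
    rng = np.random.default_rng(0)
    low_rank = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
    sparse = np.zeros((40, 40))
    mask = rng.random((40, 40)) < 0.05
    sparse[mask] = 5.0 * rng.standard_normal(mask.sum())
    L, S = low_rank_plus_sparse(low_rank + sparse)
    print(np.linalg.matrix_rank(L, tol=1e-6), np.count_nonzero(np.abs(S) > 1e-6))
```

For the speech application mentioned in the abstract, M would be a time-frequency magnitude representation of the denoised signal, with L taken as the speech estimate and S as the musical noise estimate; how the talk built that representation is not stated here.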

25th September 2015 : Estimation of the air-tissue boundaries (ATBs) of the vocal tract in the mid-sagittal plane from electromagnetic articulograph (EMA) data by Pattem Ashok Kumar

Talk summary:

  • Introduction to EMA and real-time magnetic resonance imaging (rtMRI)
  • Co-registration of the EMA data and ATBs in the rtMRI
  • Estimation of the ATBs from registered EMA
  • Results
  • Discussion on the quality of estimated ATBs

18th September 2015 : Speech Beyond Speech - IS2015? by Prasanta Ghosh

Talk summary:

  • SPIRE lab's paper presentations
  • Some good/significant works
  • Latest trends
  • Challenges
  • New Tools/Datasets
  • Dresden

11th September 2015 : Fundamentals of HMM-based speech synthesis. by Achuth Rao

Talk summary:

  • Vocoding techniques
  • Speech parameter modeling and generation algorithm
  • Spectrum parameter
  • F0 parameter
  • Context-clustering
  • Advantages and disadvantages

4th September 2015 : A discriminative analysis within and across voiced and unvoiced consonants in neutral and whispered speech in multiple Indian languages. by Nisha Meenakshi

Talk summary:

  • Typically, voiced consonants are voiceless when whispered, as whispered speech lacks the vocal cord vibrations. Therefore, we ask the following questions. Is the discrimination between the voiced and unvoiced (V-UV) consonants still preserved in whispered speech? Is the variation of the acoustics from neutral to whispered speech consonant-specific? Does language affect V-UV consonant discrimination?

29th August 2015 : Speech recognition using HTK by Amber Afshan

Talk summary:

  • Brief introduction to HMM and Viterbi recognition
  • Tutorial on building a basic recognition system using HTK: preparing data, training, recognition
  • HLDA transform
  • Speaker adaptation
  • GMM using HTK

28th August 2015 : Video Lecture by

Talk summary:

  • A brief history of speech recognition
  • The probabilistic approach
  • Feature extraction, acoustic modelling, language modelling, search
  • Where we stand

21st August 2015 : Android App Development by Ataur Rehman

Talk summary:

  • What is Android? Android internals. Briefing about application building blocks. Steps to develop a Hello Android app in Android Studio. Demo of some crazy Android apps.

31st July 2015 : Implementation of Automatic Gender Classification using Normal and Whispered speech in Android by Shashidhar Prabhu

Talk summary:

  • Finding the Pitch in Neutral Speech. Recording Whisper and finding the MFCC Features of the recorded speech. SVM Modeling. Results.

31st July 2015 : Classification of Voiced and Unvoiced frames in speech using Periodicity Transforms by Shashidhar Prabhu

Talk summary:

  • Introduction to Periodicity Transforms. Choosing the best algorithm in Periodicity Transforms for the classification. Feature Extraction and SVM Modeling. Results.

31st July 2015 : Part-1: Insight into real time magnetic resonance imaging (rtMRI) and its applications towards the understanding of human speech; Part-2: Automatic classification of eating conditions from speech using acoustic feature selection and a set of hierarchical support vector machine classifiers by Abhay Prasad

Talk summary:

  • Introduction to rtMRI
  • Voice Activity Detection using rtMRI
  • Broad class phonetic recognition rtMRI
  • Extraction of variant and invariant components of speech and its application towards speaker identification using rtMRI

3rd July 2015 : An introduction to the nature of whispered speech and an overview of LTLEV as a feature for whisper activity detection (WAD) by Nisha Meenakshi

Talk summary:

  • The differences between the nature of whispered speech and neutral speech will be discussed, with a few illustrations to aid understanding. The use of the newly developed signal characteristic, Long Term Logarithmic Energy Variation (LTLEV), will be explained. The performance of this feature and 4 other baseline schemes for WAD will be assessed in the presence of 8 different noises.

26th June 2015 : Light on Photoplethysmography signal which requires processing by Vijitha Periyasamy

Talk summary:

  • Insight into Photoplethysmography (PPG)
  • Challenges in processing PPG
  • Paths to analyse PPG
  • Opportunities for contribution in PPG

22nd June 2015 : Exploring the use of discriminative dictionary learning for enhancement of audio in additive magnetic resonance imaging noise by Ataur Rehman

Talk summary:

  • Introduction to MRI recording. Advantages and problems (noisy speech recording) with MRI recording. Techniques to enhance the noisy speech recording, i.e. audio enhancement based on the NMF, PLCA, and DDL algorithms. Discussion of the consequences of the aforementioned algorithms. Conclusions and future work.
