A motion capture system, or MoCap as it is generally called, tracks the 3D positions of several reflective markers attached to a subject. This technology has a wide variety of applications, from 3D animation for movies to gesture analysis. Optitrack is an industry leader that provides various MoCap and 3D tracking products. Its product specialized for facial motion capture consists of a system of 7 Flex IR cameras and the Arena software. The reflective markers can be attached at custom positions on the face or according to pre-specified Optitrack templates. The system tracks the 3D position (x, y, and z) of each marker at 120 frames per second.
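As a rough illustration of the data such a system produces, the sketch below stores a capture as a list of frames, each holding one (x, y, z) tuple per marker, and converts frame indices to timestamps at the 120 frames per second stated above. The names FRAME_RATE_HZ, frame_time, and marker_trajectory are hypothetical, the coordinates are made up, and this layout is an assumption rather than the export format of the Arena software.

```python
FRAME_RATE_HZ = 120  # capture rate stated in the text


def frame_time(frame_index, rate_hz=FRAME_RATE_HZ):
    """Return the timestamp in seconds of a zero-based frame index."""
    return frame_index / rate_hz


def marker_trajectory(frames, marker_index):
    """Extract one marker's (x, y, z) path from a list of frames."""
    return [frame[marker_index] for frame in frames]


# Two hypothetical frames with two markers each (made-up coordinates).
frames = [
    [(0.0, 1.0, 2.0), (0.5, 1.5, 2.5)],
    [(0.1, 1.0, 2.0), (0.5, 1.6, 2.5)],
]
```

With this layout, marker_trajectory(frames, 1) yields the path of the second marker across all frames, and frame_time(120) is one second into the capture.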
In the general problem statement, prosodic variations are treated as the training data (X) and the corresponding 3D coordinates as the labels (Y).
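One way to picture this setup is as a list of (X, Y) pairs, where each prosodic feature vector is matched with the MoCap frame closest to it in time. The sketch below assumes the prosodic features are sampled at a fixed rate that may differ from the 120 frames per second of the MoCap system; the function pair_by_time and its parameters are hypothetical, and nearest-frame matching is only one possible alignment strategy.

```python
def pair_by_time(prosody, coords, prosody_hz, mocap_hz=120):
    """Pair each prosodic feature vector (X) with the nearest MoCap frame (Y).

    prosody: list of feature vectors sampled at prosody_hz (assumed rate).
    coords:  list of MoCap frames sampled at mocap_hz.
    """
    pairs = []
    for i, features in enumerate(prosody):
        # Index of the MoCap frame nearest in time to prosody sample i,
        # clamped so it never runs past the end of the capture.
        j = min(round(i * mocap_hz / prosody_hz), len(coords) - 1)
        pairs.append((features, coords[j]))
    return pairs
```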
This paper departs slightly from the general problem statement and treats the 3D coordinates as the training data. It aims to predict whether the subject is reciting a poem or telling a story based only on the coordinates (head gestures).
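The notes above do not say which features or classifier the paper uses. Purely as an illustration of classifying from coordinates alone, the sketch below reduces each marker trajectory to one number (its mean frame-to-frame displacement) and assigns the label of the nearest class centroid. The functions mean_speed, fit_centroids, and predict are hypothetical stand-ins, not the paper's method.

```python
import math


def mean_speed(trajectory):
    """Mean frame-to-frame displacement of one marker's (x, y, z) path."""
    steps = [math.dist(a, b) for a, b in zip(trajectory, trajectory[1:])]
    return sum(steps) / len(steps)


def fit_centroids(examples):
    """examples: list of (trajectory, label). Returns label -> mean feature."""
    sums, counts = {}, {}
    for trajectory, label in examples:
        sums[label] = sums.get(label, 0.0) + mean_speed(trajectory)
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, trajectory):
    """Return the label whose centroid is closest to the trajectory's feature."""
    feature = mean_speed(trajectory)
    return min(centroids, key=lambda label: abs(centroids[label] - feature))
```

A single scalar feature keeps the sketch short; a realistic system would likely use richer features computed over many markers.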
The paper thoroughly analyzes the temporal synchrony between head gestures and prosodic patterns in spontaneous speech.
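A common way to quantify temporal synchrony between two signals is to find the lag that maximizes their cross-correlation. The notes do not state whether the paper does this, so the sketch below is an assumption: best_lag is a hypothetical helper that takes two equally sampled one-dimensional signals (for example, a head-motion measure and a prosodic measure) and returns the offset in frames at which they align best.

```python
def best_lag(a, b, max_lag):
    """Return the lag in frames at which b best aligns with a.

    A positive result means b is delayed relative to a.
    """
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Sum of products of mean-centred samples over the overlapping part.
        score = 0.0
        for i in range(len(a)):
            j = i + lag
            if 0 <= j < len(b):
                score += (a[i] - mean_a) * (b[j] - mean_b)
        if score > best_score:
            best, best_score = lag, score
    return best
```

At 120 frames per second, a lag of n frames corresponds to n / 120 seconds.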
The paper poses a classification problem: identifying speakers from their head gestures.