AI-powered feature analysis and categorization of musician performances and technique
PI: Dr. Mikhail Gilman (Associate Research Professor, NCSU)
Support: NCSU Data Science and AI Academy
Period of Performance: November 2024 – May 2025
Budget: $27,000
Summary: The rapid growth of Artificial Intelligence (AI) has produced a wealth of new AI-based applications, such as ChatGPT (large language models), Alexa (speech recognition and natural language processing), and facial recognition (computer vision models). This progress has been enabled by the vast amounts of collected text, audio, and video data, and by researchers who have found novel ways to apply those data to these use cases. In the context of human performance instruction, many AI models rely on similar datasets to measure how well a task was performed. Although skeletal tracking from video is possible, it is prone to errors caused by camera viewpoint, and it misses the micro-muscular gestures that could provide deeper insight into the performer's technique. For example, music instructors can use video and audio data to evaluate a player's note and tempo accuracy, but they would have only a rough estimate of body position from which to offer guidance for improvement.
We will address these deficiencies by augmenting the dataset with high-resolution motion data. For example, a pianist's body can be tracked with LiDAR cameras (for depth and motion capture) and electromyography (EMG) sensors (for micro-gesture analysis during the execution of musical passages), providing data for a more insightful analysis of body dynamics. Using AI-based approaches, we will establish the mapping between musical features (notes, their durations, and dynamics) and the performer's body motion and micro-gestures. Similar augmented datasets can be collected for other domains of human performance, such as dance, sports, and public speaking.
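As an illustration of the intended analysis, the sketch below shows one way such a mapping could be prototyped: per-note musical features (duration, pitch, dynamics) are paired with summary statistics of time-aligned EMG and LiDAR windows, and a baseline multi-output regressor is fit between the two. All sampling rates, channel counts, the synthetic data, and the choice of regressor are illustrative assumptions rather than the project's actual recording setup or models.

# Minimal sketch (hypothetical): pairing per-note musical features with
# time-aligned EMG and LiDAR motion summaries and fitting a baseline model.
# Sampling rates, channel counts, and the synthetic data are illustrative
# assumptions, not the project's actual recording setup.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder recordings (stand-ins for a ~2-minute performance).
n_notes = 500
notes = np.column_stack([
    np.sort(rng.uniform(0, 119, n_notes)),           # onset time (s)
    rng.uniform(0.05, 1.0, n_notes),                 # duration (s)
    rng.integers(36, 96, n_notes),                   # MIDI pitch
    rng.integers(20, 127, n_notes),                  # velocity (dynamics proxy)
])
emg_rate, lidar_rate = 1000, 30                      # Hz (assumed)
emg = rng.normal(size=(121 * emg_rate, 8))           # 8 EMG channels
lidar = rng.normal(size=(121 * lidar_rate, 22 * 3))  # 22 joints x (x, y, z)

def window_summary(stream, rate, onset, duration):
    """Mean and standard deviation of a sensor stream over one note window."""
    lo = int(onset * rate)
    hi = int((onset + duration) * rate) + 1
    seg = stream[lo:hi]
    return np.concatenate([seg.mean(axis=0), seg.std(axis=0)])

# Note-aligned dataset: musical features X -> motion/EMG summaries y.
X = notes[:, 1:]                                     # duration, pitch, velocity
y = np.array([
    np.concatenate([
        window_summary(emg, emg_rate, onset, dur),
        window_summary(lidar, lidar_rate, onset, dur),
    ])
    for onset, dur, _, _ in notes
])

# Baseline multi-output regression as a first mapping between the modalities.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)
print("held-out R^2:", r2_score(y_te, model.predict(X_te)))

The per-note windowing above is only one possible alignment; sequence models operating on the full, continuous sensor streams are an equally plausible framing for the same mapping.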