
Emotional Report


Description

Speech Emotion Recognition & ASR - Emotional Report

🎤 The Project

Emotional Report is an AI application that combines Speech Emotion Recognition (SER) and Automatic Speech Recognition (ASR). From a single audio signal, it extracts both what is said (the transcript) and how it is said (the emotion).

🛠️ Technical Implementation

Audio Processing Pipeline

  • Feature Extraction: Used Librosa to compute MFCCs, turning raw waveforms into compact spectral features suitable for model input.
  • Preprocessing: Implemented padding and truncation to normalize every audio segment to a fixed 10-second length.

Deep Learning Architecture

  • Emotion Model: Custom PyTorch classifier using a BiLSTM architecture with an Attention Mechanism. This structure allows the model to focus on the most expressive moments in the audio.
  • Transcription (ASR): Integrated the state-of-the-art Wav2Vec2 model (facebook/wav2vec2-large-xlsr-53-french) via Hugging Face for precise French transcription.
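A minimal sketch of the emotion model's shape, assuming MFCC inputs and an additive attention pooling over the BiLSTM outputs; the hidden size and number of emotion classes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    """BiLSTM emotion classifier with attention pooling over timesteps."""

    def __init__(self, n_mfcc: int = 40, hidden: int = 128, n_emotions: int = 7):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        # One scalar score per timestep over the BiLSTM outputs
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, n_emotions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_mfcc) -> outputs: (batch, time, 2 * hidden)
        outputs, _ = self.lstm(x)
        # Softmax over the time axis lets the model weight expressive frames
        weights = torch.softmax(self.attn(outputs), dim=1)  # (batch, time, 1)
        context = (weights * outputs).sum(dim=1)            # (batch, 2 * hidden)
        return self.head(context)                           # emotion logits
```

The attention weights form a distribution over timesteps, so the pooled context vector emphasizes the frames the model finds most emotionally salient instead of averaging the whole clip uniformly.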

User Interface & Deployment

  • Dashboard: Interactive Streamlit web app with real-time recording, probability visualization, and analysis history.
  • Scalability: Containerized with Docker for seamless deployment in any environment (Hugging Face Spaces, local servers).
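A containerized deployment along these lines could use a Dockerfile like the following sketch; the file names, Python version, and port are assumptions, not taken from the project:

```dockerfile
# Sketch of a container image for the Streamlit dashboard
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Streamlit's default port; adjust for the target host
# (Hugging Face Spaces, for instance, expects the app on port 7860)
EXPOSE 8501
CMD ["streamlit", "run", "app.py", \
     "--server.port=8501", "--server.address=0.0.0.0"]
```

Binding to `0.0.0.0` is what makes the app reachable from outside the container; the default `localhost` binding would only serve requests from inside it.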

Technologies

Python · Wav2Vec2 · LSTM · Streamlit · Deep Learning
