Emotional Report

Description
Speech Emotion Recognition (SER) & Automatic Speech Recognition (ASR)
🎤 The Project
Emotional Report is a sophisticated AI application dedicated to Speech Emotion Recognition (SER) and Automatic Speech Recognition (ASR). It extracts both what is said (text) and how it is said (emotion) from audio signals.
🛠️ Technical Implementation
Audio Processing Pipeline
- Feature Extraction: used Librosa to compute MFCCs, transforming raw waveforms into fixed-size feature matrices the models can consume.
- Preprocessing: padded or truncated every clip to a fixed 10-second window so all inputs share the same shape.
Deep Learning Architecture
- Emotion Model: custom PyTorch classifier built on a BiLSTM with an attention mechanism, letting the model focus on the most expressive moments in the audio.
- Transcription (ASR): integrated the state-of-the-art Wav2Vec2 model (facebook/wav2vec2-large-xlsr-53-french) via Hugging Face for precise French transcription.
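A compact sketch of a BiLSTM-with-attention classifier of the kind described above; the hidden size, MFCC dimension, and number of emotion classes are assumptions, not the project's actual hyperparameters:

```python
import torch
import torch.nn as nn

class BiLSTMAttention(nn.Module):
    """BiLSTM encoder with attention pooling over time, then an emotion head.

    Input: (batch, frames, n_mfcc) feature sequences.
    Hyperparameters below are illustrative placeholders.
    """
    def __init__(self, n_mfcc: int = 40, hidden: int = 128, n_emotions: int = 7):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # score each time step
        self.head = nn.Linear(2 * hidden, n_emotions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)                        # (B, T, 2*hidden)
        weights = torch.softmax(self.attn(out), dim=1)  # (B, T, 1), sums to 1 over T
        context = (weights * out).sum(dim=1)         # attention-weighted pooling
        return self.head(context)                    # emotion logits
```

The attention weights give each frame a learned importance, so expressive moments contribute more to the pooled representation than silence or neutral speech.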
User Interface & Deployment
- Dashboard: interactive Streamlit web app with real-time recording, probability visualization, and analysis history.
- Scalability: containerized with Docker for seamless deployment in any environment (Hugging Face Spaces, local servers).
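A minimal Dockerfile sketch for a containerized Streamlit app of this kind; the entry-point name, requirements file, and port are assumptions, not taken from the project:

```dockerfile
# Hypothetical layout: app.py is the Streamlit entry point.
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Streamlit's default port
EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```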