
Emotional Report


Description

Speech Emotion Recognition & ASR - Emotional Report

🎤 The Project

Emotional Report is an AI application that combines Speech Emotion Recognition (SER) and Automatic Speech Recognition (ASR). From a single audio signal, it extracts both what is said (the transcript) and how it is said (the emotion).

🛠️ Technical Implementation

Audio Processing Pipeline

  • Feature Extraction: Used Librosa to compute MFCCs, turning raw waveforms into compact spectral features suitable for model input.
  • Preprocessing: Implemented padding and truncation to normalize every audio segment to a fixed 10-second length.

Deep Learning Architecture

  • Emotion Model: Custom PyTorch classifier using a BiLSTM architecture with an Attention Mechanism. This structure allows the model to focus on the most expressive moments in the audio.
  • Transcription (ASR): Integrated the state-of-the-art Wav2Vec2 model (facebook/wav2vec2-large-xlsr-53-french) via Hugging Face for precise French transcription.
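A minimal sketch of the emotion model's shape, assuming MFCC inputs and an additive attention pooling over the BiLSTM outputs; the hidden size and number of emotion classes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class BiLSTMAttentionClassifier(nn.Module):
    """BiLSTM emotion classifier with attention pooling over timesteps."""

    def __init__(self, n_mfcc: int = 40, hidden: int = 128, n_emotions: int = 7):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True, bidirectional=True)
        # One scalar score per timestep over the BiLSTM outputs
        self.attn = nn.Linear(2 * hidden, 1)
        self.head = nn.Linear(2 * hidden, n_emotions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_mfcc) -> outputs: (batch, time, 2 * hidden)
        outputs, _ = self.lstm(x)
        # Softmax over the time axis lets the model weight expressive frames
        weights = torch.softmax(self.attn(outputs), dim=1)  # (batch, time, 1)
        context = (weights * outputs).sum(dim=1)            # (batch, 2 * hidden)
        return self.head(context)                           # emotion logits
```

The attention weights form a distribution over timesteps, so the pooled context vector emphasizes the frames the model finds most emotionally salient instead of averaging the whole clip uniformly.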

User Interface & Deployment

  • Dashboard: Interactive Streamlit web app with real-time recording, probability visualization, and analysis history.
  • Scalability: Containerized with Docker for seamless deployment in any environment (Hugging Face Spaces, local servers).
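A containerized deployment along these lines could use a Dockerfile like the following sketch; the file names, Python version, and port are assumptions, not taken from the project:

```dockerfile
# Sketch of a container image for the Streamlit dashboard
FROM python:3.11-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Streamlit's default port; adjust for the target host
# (Hugging Face Spaces, for instance, expects the app on port 7860)
EXPOSE 8501
CMD ["streamlit", "run", "app.py", \
     "--server.port=8501", "--server.address=0.0.0.0"]
```

Binding to `0.0.0.0` is what makes the app reachable from outside the container; the default `localhost` binding would only serve requests from inside it.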

Technologies

Python · Wav2Vec2 · LSTM · Streamlit · Deep Learning
