A robust speech recognition and natural language processing system that transcribes audio content and performs advanced text analysis in real-time. Perfect for meetings, lectures, and audio content analysis.
Audio Input → Format Conversion → Preprocessing → Chunking → Recognition → Text Output
- Format Conversion:
  - Uses `pydub` for lossless format conversion
  - Supports sample rate adjustment (default: 16 kHz)
  - Maintains audio quality during conversion
- Recognition:
  - Uses the Google Speech Recognition API
  - Implements error handling and a retry mechanism
  - Supports chunked processing for long audio files
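The retry behaviour can be sketched with a small standard-library wrapper; in the real pipeline it would wrap the call to the recognition API (the wrapper below is illustrative, not the project's code):

```python
import time


def with_retries(fn, attempts=3, base_delay=1.0):
    # Wrap a callable so transient failures (e.g. network errors
    # from the recognition service) are retried with linear backoff.
    def wrapper(*args, **kwargs):
        last_err = None
        for attempt in range(attempts):
            try:
                return fn(*args, **kwargs)
            except ConnectionError as err:
                last_err = err
                time.sleep(base_delay * attempt)  # 0s, 1s, 2s, ...
        raise last_err
    return wrapper
```

For example, `transcribe = with_retries(lambda audio: recognizer.recognize_google(audio))`, where `recognizer` is a `speech_recognition.Recognizer` instance.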
- Separate threads for audio recording and processing
- Thread-safe queue for audio chunks
- Real-time processing pipeline
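The recording/processing split above can be sketched as a producer–consumer pair connected by a thread-safe queue (a minimal stdlib sketch, not the project's implementation):

```python
import queue
import threading


def run_pipeline(chunks, process):
    # One thread enqueues audio chunks (the "recorder"); another
    # dequeues and processes them. A None sentinel ends the stream.
    q = queue.Queue(maxsize=8)
    results = []

    def producer():
        for chunk in chunks:
            q.put(chunk)
        q.put(None)  # signal end of recording

    def consumer():
        while True:
            chunk = q.get()
            if chunk is None:
                break
            results.append(process(chunk))

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```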
- Graceful degradation for API failures
- Automatic retry mechanism
- Error logging and recovery
- Implements chunked processing for large files
- Uses generators for memory-efficient processing
- Cleans up temporary files after processing
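The generator-based chunking and temporary-file cleanup can be sketched like this (illustrative helpers, assuming in-memory byte buffers rather than the project's actual file handling):

```python
import os
import tempfile


def iter_chunks(data: bytes, chunk_size: int):
    # Yield fixed-size slices lazily, so only one chunk of a large
    # file needs to be held in memory at a time.
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def process_via_tempfile(chunk: bytes) -> int:
    # Write a chunk to a temporary file, hand it to a processing
    # step, and guarantee cleanup afterwards.
    fd, path = tempfile.mkstemp(suffix=".wav")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(chunk)
        return os.path.getsize(path)  # stand-in for real processing
    finally:
        os.remove(path)
```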
```python
# config.py
import pyaudio

AUDIO_CONFIG = {
    'SAMPLE_RATE': 16000,
    'CHANNELS': 1,
    'CHUNK_SIZE': 1024,
    'FORMAT': pyaudio.paFloat32,
    'RECORD_SECONDS': 5
}

NLP_CONFIG = {
    'MIN_PHRASE_LENGTH': 3,
    'MAX_PHRASE_LENGTH': 40,
    'MIN_TOPIC_COHERENCE': 0.3,
    'SENTIMENT_THRESHOLD': 0.05,
    'MAX_SUMMARY_RATIO': 0.3
}
```
- Real-time audio transcription
- Support for multiple audio formats (WAV, MP3, M4A, FLAC, OGG)
- Audio preprocessing and noise reduction
- Live recording capabilities
- Sentiment analysis using VADER
- Automatic text summarization
- Topic modeling using LDA
- Key phrase extraction
- Real-time analysis updates
- Web-based interface using Streamlit
- File upload functionality
- Real-time processing feedback
- Formatted analysis display
- Real-time Audio Waveform
- Spectrogram Analysis
- Sentiment Gauge
- Topic Distribution Charts
- Word Clouds
- Performance Metrics Dashboard
- Python 3.8 or higher
- pip package manager
- Virtual environment (recommended)
- Clone the repository
```bash
git clone https://github.com/ansh-info/SpeechSense.git
cd SpeechSense
```
- Create and activate a virtual environment (use Python 3.12)

```bash
# On macOS/Linux, install these packages first (via Homebrew)
brew install ffmpeg
brew install portaudio
brew install gcc

# Using Conda (recommended)
conda create --name SpeechSense python=3.12
conda activate SpeechSense

# Or using venv on macOS/Linux
python3.12 -m venv venv
source venv/bin/activate
```
- Install dependencies
```bash
pip install -r requirements.txt
```
- Install NLTK data
```bash
python setup_nltk.py
python setup_nlp.py
```
- Start the Streamlit interface
```bash
streamlit run app/main.py
```
- Open your browser and navigate to `http://localhost:8501`
```
speech_recognition_project/
├── app/
│   ├── main.py               # Main application
│   ├── visualization.py      # Visualization components
│   └── static/css/
├── src/
│   ├── audio_file_handler.py
│   ├── audio_preprocessing.py
│   ├── nlp_processor.py
│   ├── realtime_transcription.py
│   └── speech_recognition.py
├── tests/                    # Test suite
└── data/                     # Data storage
```
- Transcription Accuracy: ~85%
- Processing Speed: 1.2x real-time
- Real-time Analysis Delay: <2 seconds
- Memory Usage: ~200MB baseline
- Added comprehensive visualization dashboard
- Implemented real-time metrics tracking
- Enhanced project structure and organization
- Improved error handling and stability
- Added export functionality for analysis results
- Initial release with basic functionality
- File-based transcription
- Basic NLP analysis
- Simple user interface
- Select "File Upload" from the sidebar
- Upload your audio file (WAV, MP3, M4A, FLAC, OGG)
- Click "Process Audio"
- View results in the analysis dashboard
- Select "Real-time Recording" from the sidebar
- Click "Start Recording" to begin
- Monitor real-time transcription and analysis
- Click "Stop Recording" to view complete analysis
- View transcription text
- Explore sentiment analysis
- Check topic distribution
- Generate and download reports
- Uses Google Speech Recognition API
- Supports multiple audio formats through format conversion
- Implements audio preprocessing for better recognition
- Real-time audio streaming and processing
- Sentiment Analysis using VADER algorithm
- Text summarization using frequency-based approach
- Topic modeling using Latent Dirichlet Allocation (LDA)
- Key phrase extraction using statistical methods
- Real-time transcription with minimal delay
- Efficient memory usage (~200MB baseline)
- Scalable for longer recordings
- Handles multiple audio formats efficiently
- Core Framework: Python 3.8+
- Speech Recognition: Google Speech Recognition API
- NLP Libraries: NLTK, scikit-learn
- Audio Processing: PyAudio, librosa, sounddevice
- Visualization: Streamlit, Plotly, Matplotlib, Altair
- Data Processing: NumPy, Pandas
- Fork the repository
- Create a new branch
```bash
git checkout -b feature/your-feature-name
```
- Install development dependencies

```bash
pip install -r requirements-dev.txt
```

- Run the test suite

```bash
python -m pytest tests/
```
| Feature | Performance |
|---|---|
| Real-time Transcription Delay | <2s |
| Audio Processing Speed | 1.2x real-time |
| NLP Analysis Time | ~0.1s/KB |
| Memory Usage (Baseline) | ~200MB |
| Memory Usage (Peak) | ~500MB |
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
- Multi-language support
- Speaker diarization
- Advanced sentiment analysis
- Custom topic models
- Mobile responsive interface
- Cloud deployment support
Q: What audio formats are supported? A: The system supports WAV, MP3, M4A, FLAC, and OGG formats.
Q: Can it transcribe in real-time? A: Yes, the system supports real-time transcription with minimal delay.
Q: How accurate is the sentiment analysis? A: The sentiment analysis achieves approximately 85% accuracy using the VADER algorithm.
Q: Can it handle long recordings? A: Yes, the system is optimized for both short and long recordings.
If you have any questions or need help, please:
- Check the FAQ section
- Search in Issues
- Open a new Issue if needed
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Speech Recognition API
- NLTK Team
- scikit-learn developers
- Streamlit community
Ansh Kumar - [email protected]