Copilot is an advanced assistant that uses speech recognition, natural language processing, and screen analysis to provide real-time assistance for various tasks. It combines multiple technologies to create an interactive and responsive AI companion.
- Speech Recognition: Captures and transcribes spoken input in real-time.
- Natural Language Processing: Utilizes OpenAI's API to generate contextual responses.
- Text-to-Speech: Converts AI responses to audible speech.
- Screen Capture and OCR: Analyzes on-screen content to provide context-aware assistance.
- Mouse Tracking: Monitors cursor position for potential context cues.
- Rust (latest stable version)
- Cargo (Rust's package manager)
- A valid OpenAI API key
- A valid Google Cloud API key with Speech-to-Text API enabled
- Tesseract OCR installed on your system
-
Clone the repository:
git clone https://github.com/doziestar/copilot.git cd copilot
-
Set up environment variables:
export OPENAI_API_KEY=openai_api_key export GOOGLE_API_KEY=google_api_key
-
Install dependencies:
cargo build
-
Install Tesseract OCR:
- On macOS:
brew install tesseract
- On Ubuntu:
sudo apt-get install tesseract-ocr
- On Windows: Download and install from Tesseract GitHub
- On macOS:
Run the program with:
cargo run
Once started, the copilot will:
- Listen for speech input for 10 seconds.
- Transcribe and process the speech.
- Generate an AI response.
- Speak the response aloud.
- Capture and analyze the screen content.
- Track mouse position.
This cycle repeats continuously until the program is terminated.
You can modify the following parameters in the code:
- Speech recognition duration (default: 10 seconds)
- OpenAI model (default: "gpt-3.5-turbo")
- OCR language (default: English)
- No audio input: Ensure your microphone is properly connected and set as the default input device.
- Speech recognition errors: Speak clearly and minimize background noise.
- OCR not working: Make sure Tesseract is properly installed and its path is correctly set.
- API errors: Verify that your API keys are correct and have the necessary permissions.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for their powerful language model API
- Google Cloud for their Speech-to-Text API
- The Rust community for excellent libraries and tools