Quickstart Guide

Paci edited this page Jun 7, 2023 · 18 revisions

Installation

  1. Download the ZIP file for HOSCY
  2. Unpack it and start the executable
  3. If prompted, install the .NET Runtime (use the Desktop x64 version; HOSCY will not work with any other)
  4. Allow HOSCY through your firewall and make sure your antivirus is not blocking it
  5. Turn on OSC in your radial menu in VRChat and rejoin the instance
  6. HOSCY is now ready to go. Select your preferred speech recognition mode in the "Speech" tab, then press the button labeled "Stopped" to start recognition
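
If the textbox stays empty after step 5, it can help to check that VRChat is receiving OSC at all, independently of HOSCY. The following is a minimal sketch that is not part of HOSCY. It assumes VRChat's default OSC input port (UDP 9000 on 127.0.0.1) and its `/chatbox/input` address, both of which come from VRChat's OSC documentation rather than from this guide. The function names are made up for illustration.

```python
import socket


def osc_string(value: str) -> bytes:
    """Encode an OSC string: UTF-8 bytes, null-terminated, padded to 4 bytes."""
    data = value.encode("utf-8") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_chatbox_message(text: str) -> bytes:
    """Build an OSC packet for /chatbox/input with a string and a True flag.

    The type tag ",sT" means: one string argument, then boolean True.
    True carries no payload bytes; it asks VRChat to show the text immediately.
    """
    return osc_string("/chatbox/input") + osc_string(",sT") + osc_string(text)


def send_chatbox_message(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send the packet over UDP. Host and port are assumed VRChat defaults."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(encode_chatbox_message(text), (host, port))
```

Calling `send_chatbox_message("OSC test")` while in an instance should make the text appear above your head if OSC is enabled. If nothing shows up, the problem is likely on the VRChat or firewall side rather than in HOSCY.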

Extra steps I recommend

I highly recommend switching from the "Windows Recognizer" to one of the following right away, as they're much better:

A. Whisper AI Recognizer

Whisper is a highly precise AI recognizer that uses both CPU and GPU for speech recognition

  1. Download an AI model here
    • If you have a powerful PC, I recommend Medium EN, or Medium for multi-language needs
    • For weaker PCs, I recommend Base EN or Base
    • Alternatively, you can go for a middle ground with Small EN or Small
    • As a last resort, you can also use Tiny EN or Tiny
  2. Go to the "Speech" page and press the "Edit list" button next to the "AI Model" dropdown
  3. Add the path of the .bin file, close the window and select the model in the dropdown
  4. That's it. Starting this recognizer usually takes a while, so make sure you have selected the correct microphone beforehand

B. Vosk AI Recognizer

Vosk is a fairly precise AI recognizer that uses the CPU for speech recognition

  1. Download an AI model here
  2. Unzip the files
  3. Go to the "Speech" page and press the "Edit list" button next to the "AI Model" dropdown
  4. Add the path of the folder containing all the unzipped files, close the window and select the model in the dropdown
  5. That's it. Starting this recognizer usually takes a while, so make sure you have selected the correct microphone beforehand

UI pages

HOSCY has multiple "pages" with different settings and uses

  • Main contains all your information and allows you to quickly mute, clear and stop recognition
  • Input features a manual input box with configurable presets
  • Speech contains all settings related to speech recognition including shortcuts and replacements, as well as the recognizer picker
  • API is all about external services like translation or remote speech recognition
  • Output contains all settings about the textbox and TTS
  • OSC lets you control your OSC parameters and routing
  • Config mostly includes logging information

Things I recommend checking out

  • Shortcuts and Replacements on the "Speech" page let you swap words for other words and trigger commands. A Replacement looks for certain words and replaces only those words, while a Shortcut looks for certain words and replaces the entire message
  • Voice Commands let you do things like clearing your textbox with "clear" or controlling your currently playing media with "media skip", "media pause", "media resume" and more
  • Media Display shows what you are currently listening to above your head (on the "Output" page)
  • AFK Timer and Counters display how long you've been AFK and count parameters
  • OSC Commands let you control OSC parameters using, for example, your voice
  • OSC Parameters let you control HOSCY via your avatar's radial menu
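
The difference between Replacements and Shortcuts in the first item above can be illustrated with a small sketch. This is not HOSCY's actual implementation; it assumes case-insensitive, whole-word matching and that the first matching Shortcut wins, and the function names and example entries are hypothetical.

```python
import re


def _word_pattern(phrase: str) -> str:
    """Regex that matches the phrase as whole words (assumed matching rule)."""
    return r"\b" + re.escape(phrase) + r"\b"


def apply_replacements(message: str, replacements: dict[str, str]) -> str:
    """Swap each matched phrase for its substitute and keep the rest of the message."""
    for target, substitute in replacements.items():
        # A lambda avoids backslashes in the substitute being treated as escapes.
        message = re.sub(
            _word_pattern(target),
            lambda _match, s=substitute: s,
            message,
            flags=re.IGNORECASE,
        )
    return message


def apply_shortcuts(message: str, shortcuts: dict[str, str]) -> str:
    """If any trigger phrase occurs, replace the ENTIRE message with its text."""
    for trigger, full_message in shortcuts.items():
        if re.search(_word_pattern(trigger), message, flags=re.IGNORECASE):
            return full_message
    return message


# Hypothetical entries, only for illustration.
replacements = {"a f k": "AFK"}
shortcuts = {"be right back": "BRB, back in a few minutes"}

print(apply_replacements("I am a f k now", replacements))
print(apply_shortcuts("um be right back please", shortcuts))
```

With a Replacement, the surrounding words survive ("I am AFK now"). With a Shortcut, the filler words around the trigger are discarded and only the preset text is sent.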

Video Example

https://youtu.be/hbMl33J_kYs
