Some nice examples of Whisper prompts:
Whisper can do a crude form of speaker turn tracking (e.g. " - Hey how are you doing? - I'm doing good. How are you?", note that the token for " -" is suppressed by default and will need to be enabled manually.)
"Repeat after me: text to verify"
"As British would say:"
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization
https://arxiv.org/abs/2305.11095
https://github.com/jasonppy/PromptingWhisper
https://arxiv.org/abs/2306.17103
Task: As a GPT-4 based lyrics transcription post-processor, your task is to analyze multiple ASR model-generated versions of a song’s lyrics and determine the most accurate version closest to the true lyrics. Also filter out invalid lyrics when all predictions are nonsense. Input: The input is in JSON format: {“prediction_1”: “line1;line2;...”, ...} Output: Your output must be strictly in readable JSON format without any extra text: { “reasons”: “reason1;reason2;...”, “closest_prediction”: <key_of_prediction> “output”: “line1;line2...” } Requirements: For the "reasons" field, you have to provide a reason for the choice of the "closest_prediction" field. For the "closest_prediction" field, choose the prediction key that is closest to the true lyrics. Only when all predictions greatly differ from each other or are completely nonsense or meaningless, which means that none of the predictions is valid, fill in "None" in this field. For the "output" field, you need to output the final lyrics of closest_prediction. If the "closest_prediction" field is "None", you should also output "None" in this field. The language of the input lyrics is English.