<< Click to Display Table of Contents >> Navigation: 3. Script Language > AI - Artificial Intelligence Commands > AIC. - Artificial Intelligence Command > ! Open AI - Whisper > Open AI - Whisper Commands |
MiniRobotLanguage (MRL)
AIC.Dictate Letter
Initiates voice recording and transcribes the audio.
Whisper is the "Speech to Text" Option from Open AI.
Intention
The Dictate Letter command is the easiest command to Dictate a longer Letter and get it transcribed into text.
The AIC.Dictate Letter command starts the recording process, showing you the Recording Form below.
This is shown while the Recording is running. This is shown while the recording is paused.
The AIC.Dictate Letter command is the "One stop get all" command and will do this:
•start the Synchronous recording using a unique, temporary file name
•start the Recording Button
• record until you press the "PAUSE" or the "STOP" - Button
•If you want to interrupt the recording, just press "PAUSE".
•If you want to end the "Dictation" press "STOP".
Internally pressing "PAUSE" will also start the transcription process.
This is done to prevent the recorded files from getting too long.
Depending on the length of the recorded Speech this may take some seconds.
The resulting text is kept internally and will altogether be returned when you press the STOP-Button.
Pressing "Pause" will internally start the Text transcription.
The transcribed Text is not shown until you press the "STOP"-Button.
Recording will Start immediately when you see the Recording Button.
It will last until you press "STOP" or "PAUSE".
If you press "PAUSE" you can rest and continue the recording when you are ready.
If you press "STOP" then the command will close the Recording Button and deliver you the Text.
Important:
WHISPER my need some time to transcribe longer Texts, therefore the longer your Speech,
the longer you will need to wait to get the final result.
Also Whisper has a limitation of 24 MB for upload. In our Tests a 30 Seconds Recording will take up to 385 kb.
Therefore this command has a limit of about 30 Minutes Recording Time,
after that the resulting MP3-File would be larger then 25 MB and can not be transcribed using WHISPER.
Using the PAUSE-Button will eliminate this Limit, as each "Segment" is transcribed by itself.
Means there is no real world limit, you should be able to dictate a full book using this command.
Just press Pause at least all 30 Minutes.
'***********************************
' AIC.-Sample
'***********************************
AIC.Set Key|file
AIC.Dictate Letter|$$OUT
' Here we get the result
DBP.$$OUT
ENR.
Once the recording is stopped (typically using the PAUSE or the STOP Button), the dictated text is transcribed.
If you used the PAUSE Button the Text will be kept internally until its complete. And returned once you press STOP.
You can Dictate Letter in a lot of languages (see below). While you can use the
The AIC.Set Language for Whisper command, to set the Input Language,
in most case that is not needed, as Whisper will identify the languages automatically.
Whisper will take into account the following Settings:
•response_format
•language
•temperature
•prompt
If you have Set the
Supported languages:
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch,
English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese,
Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese,
Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu,
Vietnamese, And Welsh.
You can use the Command: AIC.Set Format for Whisper to decide which Output-.Format you want.
The default format is "text".
Supported Output Formats:
Number |
Format |
Description |
0 |
text |
Plain text |
1 |
vtt |
WebVTT (Web Video Text Tracks) |
2 |
srt |
SubRip Text (Subtitles) |
3 |
raw |
Raw data |
4 |
json |
JSON format |
5 |
verbose_json |
Verbose JSON format |
Syntax
AIC.Dictate Letter|P1
Parameter Explanation
P1: Optional. The variable where the transcribed text will be stored. If omitted, the result is placed on the TOS.
Example
'***********************************
' AIC.-Sample
'***********************************
AIC.Set Key|file
AIS.Set Key|file
AIC.Dictate Letter|$$OUT
' Here we get the result
DBP.$$OUT
'We choose a speaker
AIS.Set Voice|7
' And we speak it using Elevenlabs
AIS.Say Text|$$OUT
ENR.
Remarks
Ensure that the environment is quiet enough for clear recording and accurate transcription. The Record Button will be displayed during the recording process, and pressing it will stop the recording.
It's important to keep in mind that Whisper only considers the first 244 tokens of the prompt.
Limitations:
As we explored in the prompting section, one of the most common challenges faced when using Whisper is the model often does not recognize uncommon words or acronyms.
To address this, we have highlighted different techniques which improve the reliability of Whisper in these cases:
1. The first method involves using the optional prompt parameter to pass a dictionary of the correct spellings.
Since it wasn't trained using instruction-following techniques, Whisper operates more like a base GPT model.
It's important to keep in mind that Whisper only considers the first 244 tokens of the prompt.
2. The second method involves a post-processing step using GPT-4 or GPT-3.5-Turbo.
We start by providing instructions for GPT-4 through the system_prompt variable.
Similar to what we did with the prompt parameter earlier, we can define our company and product names.
Sample: "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided."
See also:
•