<< Click to Display Table of Contents >> Navigation: 3. Script Language > AI - Artificial Intelligence Commands > AIC. - Artificial Intelligence Command > ! Open AI - Whisper > Open AI - Whisper Commands |
MiniRobotLanguage (MRL)
AIC.Dictate Text
Initiates voice recording and transcribes the audio.
Whisper is the "Speech to Text" Option from Open AI.
Intention
The dictate text command is the easiest command to dictate text and get it transcribed into text.
The AIC.Dictate Text command starts the recording process, allowing the user to dictate text.
The AIC.Dictate Text command is the "One stop get all" command and will do this:
•start the asynchronous recording using a unique, temporary filename
•start the Recording Button
• record until you press the recording Button
AIC.Create Rec Button|1|0|0 AIC.Create Rec Button|0|0
Recording will Start immediately when you see the Recording Button.
It will last until you press "STOP".
Then it will close the Recording Button and deliver you the Text.
Important:
WHISPER my need some time to transcribe longer Texts, therefore the longer your Speech,
the longer you will need to wait to get the final result.
Also Whisper has a limitation of 24 MB for upload. In our Tests a 30 Seconds Recording will take up to 385 kb.
Therefore this command has a limit of about 30 Minutes Recording Time,
after that the resulting MP3-File would be larger then 25 MB and can not be transcribed using WHISPER.
If you plan to make longer Recordings, use the
AIC.Dictate Letter - Command and make Pause between the Recordings, then there is virtually no limit.
'***********************************
' AIC.-Sample
'***********************************
AIC.Set Key|file
AIC.Dictate Letter|$$OUT
' Here we get the result
DBP.$$OUT
ENR.
Once the recording is stopped (typically using the Record Button), the dictated text is transcribed,
and returned in the specified variable or on the Top of Stack (TOS) if no variable is provided.
You can dictate text is a lot of languages (see below).
While you can use the
The AIC.Set Language for Whisper command, to set the Input Language,
in most case that is not needed, as Whisper will identify the languages automatically.
Whisper will take into account the following Settings:
•response_format
•language
•temperature
•prompt
If you have Set the
Supported languages:
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch,
English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese,
Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese,
Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu,
Vietnamese, And Welsh.
You can use the Command: AIC.Set Format for Whisper to decide which Output-.Format you want.
The default format is "text".
Supported Output Formats:
Number |
Format |
Description |
0 |
text |
Plain text |
1 |
vtt |
WebVTT (Web Video Text Tracks) |
2 |
srt |
SubRip Text (Subtitles) |
3 |
raw |
Raw data |
4 |
json |
JSON format |
5 |
verbose_json |
Verbose JSON format |
Syntax
AIC.Dictate Text|P1
Parameter Explanation
P1: Optional. The variable where the transcribed text will be stored. If omitted, the result is placed on the TOS.
Example
'***********************************
' AIC.-Sample
'***********************************
' This Sample will transcribe what you say and send it to ElevenLabs.io to say it loud.
AIC.Set Key|file
AIS.Set Key|file
AIC.Dictate Text|$$OUT
' Here we get the result
DBP.$$OUT
'We choose a speaker
AIS.Set Voice|7
' And we speak it using Elevenlabs
AIS.Say Text|$$OUT
ENR.
Remarks
Ensure that the environment is quiet enough for clear recording and accurate transcription. The Record Button will be displayed during the recording process, and pressing it will stop the recording.
It's important to keep in mind that Whisper only considers the first 244 tokens of the prompt.
Limitations:
As we explored in the prompting section, one of the most common challenges faced when using Whisper is the model often does not recognize uncommon words or acronyms.
To address this, we have highlighted different techniques which improve the reliability of Whisper in these cases:
1. The first method involves using the optional prompt parameter to pass a dictionary of the correct spellings.
Since it wasn't trained using instruction-following techniques, Whisper operates more like a base GPT model.
It's important to keep in mind that Whisper only considers the first 244 tokens of the prompt.
2. The second method involves a post-processing step using GPT-4 or GPT-3.5-Turbo.
We start by providing instructions for GPT-4 through the system_prompt variable.
Similar to what we did with the prompt parameter earlier, we can define our company and product names.
Sample: "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided."
See also:
•