Open AI - Whisper Commands

<< Click to Display Table of Contents >>

Navigation:  3. Script Language > AI - Artificial Intelligence Commands > AIC. - Artificial Intelligence Command > ! Open AI - Whisper >

Open AI - Whisper Commands

AIC.Dictate Text

Previous Top Next


MiniRobotLanguage (MRL)

 

AIC.Dictate Text

Initiates voice recording and transcribes the audio.

 

clip0793

Whisper is the "Speech to Text" Option from Open AI.

 

 

Intention

 

The dictate text command is the easiest command to dictate text and get it transcribed into text.

The AIC.Dictate Text command starts the recording process, allowing the user to dictate text.

 

The AIC.Dictate Text command is the "One stop get all" command and will do this:

start the asynchronous recording using a unique, temporary filename

start the Recording Button

record until you press the recording Button

 

clip0802   clip0803

         AIC.Create Rec Button|1|0|0                   AIC.Create Rec Button|0|0

 

Recording will Start immediately when you see the Recording Button.

 

It will last until you press "STOP".  

Then it will close the Recording Button and deliver you the Text.

 

Important:

WHISPER my need some time to transcribe longer Texts, therefore the longer your Speech,

the longer you will need to wait to get the final result.

 

Also Whisper has a limitation of 24 MB for upload. In our Tests a 30 Seconds Recording will take up to 385 kb.

Therefore this command has a limit of about 30 Minutes Recording Time,

after that the resulting MP3-File would be larger then 25 MB and can not be transcribed using WHISPER.

 

If you plan to make longer Recordings, use the

AIC.Dictate Letter - Command and make Pause between the Recordings, then there is virtually no limit.

 


 

'***********************************

' AIC.-Sample

'***********************************

AIC.Set Key|file

 

AIC.Dictate Letter|$$OUT

 

' Here we get the result

DBP.$$OUT

ENR.

 


 

Once the recording is stopped (typically using the Record Button), the dictated text is transcribed,

and returned in the specified variable or on the Top of Stack (TOS) if no variable is provided.

 

You can dictate text is a lot of languages (see below).

While you can use the

 

The AIC.Set Language for Whisper command, to set the Input Language,

in most case that is not needed, as Whisper will identify the languages automatically.

 

Important: 

Whisper will take into account the following Settings:

response_format

language

temperature

prompt

 

If you have Set the

 

 

Supported languages:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch,

English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese,

Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese,

Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu,

Vietnamese, And Welsh.

 

You can use the Command: AIC.Set Format for Whisper to decide which Output-.Format you want.

The default format is "text".

 

 

Supported Output Formats:
 

Number

Format

Description

0

text

Plain text

1

vtt

WebVTT (Web Video Text Tracks)

2

srt

SubRip Text (Subtitles)

3

raw

Raw data

4

json

JSON format

5

verbose_json

Verbose JSON format

 

 

 

Syntax

 

 

AIC.Dictate Text|P1

 

 

 

Parameter Explanation

 

  P1: Optional. The variable where the transcribed text will be stored. If omitted, the result is placed on the TOS.

 

 

 

Example

 

'***********************************

' AIC.-Sample

'***********************************

' This Sample will transcribe what you say and send it to ElevenLabs.io to say it loud.

AIC.Set Key|file

AIS.Set Key|file

 

AIC.Dictate Text|$$OUT

 

' Here we get the result

DBP.$$OUT

 

'We choose a speaker

AIS.Set Voice|7

 

' And we speak it using Elevenlabs

AIS.Say Text|$$OUT

ENR.

 

 

 

 

Remarks

 

Ensure that the environment is quiet enough for clear recording and accurate transcription. The Record Button will be displayed during the recording process, and pressing it will stop the recording.

 
It's important to keep in mind that Whisper only considers the first 244 tokens of the prompt.

 

 

Limitations:

 

As we explored in the prompting section, one of the most common challenges faced when using Whisper is the model often does not recognize uncommon words or acronyms.

To address this, we have highlighted different techniques which improve the reliability of Whisper in these cases:

 

1. The first method involves using the optional prompt parameter to pass a dictionary of the correct spellings.

Since it wasn't trained using instruction-following techniques, Whisper operates more like a base GPT model.

It's important to keep in mind that Whisper only considers the first 244 tokens of the prompt.

 

2. The second method involves a post-processing step using GPT-4 or GPT-3.5-Turbo.

We start by providing instructions for GPT-4 through the system_prompt variable.

Similar to what we did with the prompt parameter earlier, we can define our company and product names.

 

Sample: "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided."

 

 

See also: