Open AI - Whisper Commands

<< Click to Display Table of Contents >>

Navigation:  3. Script Language > AI - Artificial Intelligence Commands > AIC. - Artificial Intelligence Command > ! Open AI - Whisper >

Open AI - Whisper Commands

AIC.Dictate Letter

Previous Top Next


MiniRobotLanguage (MRL)

 

AIC.Dictate Letter

Initiates voice recording and transcribes the audio.

 

clip0823

Whisper is the "Speech to Text" Option from Open AI.

 

 

Intention

 

The Dictate Letter command is the easiest command to Dictate a longer Letter and get it transcribed into text.

The AIC.Dictate Letter command starts the recording process, showing you the Recording Form below.

 

clip0806   clip0807

This is shown while the Recording is running.           This is shown while the recording is paused.

 

 

The AIC.Dictate Letter command is the "One stop get all" command and will do this:

start the Synchronous  recording using a unique, temporary file name

start the Recording Button

record until you press the "PAUSE" or the "STOP" - Button

If you want to interrupt the recording, just press "PAUSE".

If you want to end the "Dictation" press "STOP".

 

Internally pressing "PAUSE" will also start the transcription process.

This is done to prevent the recorded files from getting too long.

Depending on the length of the recorded Speech this may take some seconds.

The resulting text is kept internally and will altogether be returned when you press the STOP-Button.

Pressing "Pause" will internally start the Text transcription.
The transcribed Text is not shown until you press the "STOP"-Button.

 

Recording will Start immediately when you see the Recording Button.

 

It will last until you press "STOP" or "PAUSE".  

If you press "PAUSE" you can rest and continue the recording when you are ready.

 

If you press "STOP" then the command will close the Recording Button and deliver you the Text.

 

Important:

WHISPER my need some time to transcribe longer Texts, therefore the longer your Speech,

the longer you will need to wait to get the final result.

 

Also Whisper has a limitation of 24 MB for upload. In our Tests a 30 Seconds Recording will take up to 385 kb.

Therefore this command has a limit of about 30 Minutes Recording Time,

after that the resulting MP3-File would be larger then 25 MB and can not be transcribed using WHISPER.

Using the PAUSE-Button will eliminate this Limit, as each "Segment" is transcribed by itself.

Means there is no real world limit, you should be able to dictate a full book using this command.

Just press Pause at least all 30 Minutes.

 


 

'***********************************

' AIC.-Sample

'***********************************

AIC.Set Key|file

 

AIC.Dictate Letter|$$OUT

 

' Here we get the result

DBP.$$OUT

ENR.

 


 

Once the recording is stopped (typically using the PAUSE or the STOP Button), the dictated text is transcribed.

If you used the PAUSE Button the Text will be kept internally until its complete. And returned once you press STOP.

 

You can Dictate Letter in a lot of languages (see below). While you can use the

The AIC.Set Language for Whisper command, to set the Input Language,

in most case that is not needed, as Whisper will identify the languages automatically.

 

Important: 

Whisper will take into account the following Settings:

response_format

language

temperature

prompt

 

If you have Set the

 

 

Supported languages:

Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch,

English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese,

Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese,

Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu,

Vietnamese, And Welsh.

 

You can use the Command: AIC.Set Format for Whisper to decide which Output-.Format you want.

The default format is "text".

 

 

Supported Output Formats:
 

Number

Format

Description

0

text

Plain text

1

vtt

WebVTT (Web Video Text Tracks)

2

srt

SubRip Text (Subtitles)

3

raw

Raw data

4

json

JSON format

5

verbose_json

Verbose JSON format

 

 

 

Syntax

 

 

AIC.Dictate Letter|P1

 

 

 

Parameter Explanation

 

  P1: Optional. The variable where the transcribed text will be stored. If omitted, the result is placed on the TOS.

 

 

 

Example

 

'***********************************

' AIC.-Sample

'***********************************

AIC.Set Key|file

AIS.Set Key|file

 

AIC.Dictate Letter|$$OUT

 

' Here we get the result

DBP.$$OUT

 

'We choose a speaker

AIS.Set Voice|7

 

' And we speak it using Elevenlabs

AIS.Say Text|$$OUT

ENR.

 

 

 

 

Remarks

 

Ensure that the environment is quiet enough for clear recording and accurate transcription. The Record Button will be displayed during the recording process, and pressing it will stop the recording.

 
It's important to keep in mind that Whisper only considers the first 244 tokens of the prompt.

 

 

Limitations:

 

As we explored in the prompting section, one of the most common challenges faced when using Whisper is the model often does not recognize uncommon words or acronyms.

To address this, we have highlighted different techniques which improve the reliability of Whisper in these cases:

 

1. The first method involves using the optional prompt parameter to pass a dictionary of the correct spellings.

Since it wasn't trained using instruction-following techniques, Whisper operates more like a base GPT model.

It's important to keep in mind that Whisper only considers the first 244 tokens of the prompt.

 

2. The second method involves a post-processing step using GPT-4 or GPT-3.5-Turbo.

We start by providing instructions for GPT-4 through the system_prompt variable.

Similar to what we did with the prompt parameter earlier, we can define our company and product names.

 

Sample: "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided."

 

 

See also: