AIU. - Artificial Intelligence Utility



MiniRobotLanguage (MRL)

 

AIU.ResponsesVision
Perform a Responses Completion with Vision

 

Intention

 

ResponsesVision Command: Generate Vision-Based Responses
 
The ResponsesVision command performs a Responses completion with vision capabilities, processing a text prompt together with an image input to generate a response.

It relies on multimodal models such as GPT-4o, which handle both vision and text.

It is part of the AIU - OpenAI API suite.

 

What is the ResponsesVision Command?

 

The ResponsesVision command sends a text prompt and an image file path to the OpenAI API, using a vision-capable model (e.g., GPT-4o) to generate a text response based on both inputs.

It requires three parameters: a prompt, an image file path, and a variable for the response, and it integrates settings such as AIU.Set_User for context.

 

Why Do You Need It?

 

The ResponsesVision command is useful for:

Multimodal Analysis: Interpret images alongside text for richer responses.

Visual Queries: Answer questions about image content (e.g., "What’s in this photo?").

Enhanced Interaction: Combine visual and textual input for complex tasks.

 

How to Use the ResponsesVision Command?

 

Provide a prompt, an image file path, and a variable for the response.

It uses GPT-4o ($5.00/1M input tokens, $15.00/1M output tokens, 128K context, multimodal) as of March 18, 2025. Costs include text tokens and image processing (approximately 170 tokens per 512x512 image tile).
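From the figures above, the approximate cost of a single vision request can be estimated. The sketch below is illustrative arithmetic only, assuming the stated rates ($5.00/1M input tokens, $15.00/1M output tokens) and ~170 tokens per 512x512 image tile; actual OpenAI billing may differ.

```python
# Rough cost estimate for one GPT-4o vision request, using the rates and
# tile figure quoted above (assumptions for illustration, not billing logic).

INPUT_RATE = 5.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 15.00 / 1_000_000  # USD per output token
TOKENS_PER_TILE = 170            # approx. tokens per 512x512 image tile

def estimate_cost(prompt_tokens, image_tiles, output_tokens):
    """Return the estimated USD cost of one vision completion."""
    input_tokens = prompt_tokens + image_tiles * TOKENS_PER_TILE
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: 50-token prompt, a 1024x1024 image (4 tiles), 100-token reply.
cost = estimate_cost(prompt_tokens=50, image_tiles=4, output_tokens=100)
print(f"{cost:.6f}")  # roughly 0.005150 USD
```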

 

Example Usage

 

AIU.ResponsesVision|What’s in this image?|C:\Images\dog.png|$$RES

DBP.Vision Response: $$RES

 

Analyzes "dog.png" and responds (e.g., "The image shows a brown dog.") in $$RES.
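Because ResponsesVision integrates settings such as AIU.Set_User, a typical script sets the context before the call. The sketch below assumes AIU.Set_User takes a single user-name parameter; the user name, image path, and $$RES variable are illustrative placeholders.

AIU.Set_User|VisionUser
AIU.ResponsesVision|Describe this picture in one sentence.|C:\Images\sample.png|$$RES
DBP.Vision Response: $$RES
ENR.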

 

Illustration

 

┌──────────────┬──────────────┬────────────────────┐

│ Prompt       │ Image        │ Response Example   │

├──────────────┼──────────────┼────────────────────┤

│ What’s this? │ cat.png      │ "It’s a gray cat." │

├──────────────┼──────────────┼────────────────────┤

│ Describe it  │ tree.png     │ "A tall oak tree." │

└──────────────┴──────────────┴────────────────────┘

Illustration of vision-based prompt-response pairs.

 

Syntax

 

AIU.ResponsesVision|P1|P2|P3

 

Parameter Explanation

 

P1 - A string containing the prompt or query for the AI. Required.

P2 - The file path to an image (e.g., "C:\Images\dog.png"). Required.

P3 - The variable (e.g., $$RES) where the generated response is stored. Required.

 

Example

 

AIU.ResponsesVision|What color is this flower?|C:\Images\rose.png|$$COL

DBP.Color: $$COL

ENR.

 

Remarks

 

- Requires a vision-capable model like GPT-4o; other models may fail.

- Image processing adds token costs (e.g., ~170 tokens per 512x512 tile).

 

Limitations

 

- Requires exactly three parameters; incorrect counts trigger an error (%IC_ER_PA).

- Image file must exist and be accessible; invalid paths cause errors.

 

See also:

 

AIU.Responses

AIU.Set_User

AIU.Set_Store