3. Script Language > AI - Artificial Intelligence Commands > AIU. - OpenAI API > !Core Operations > AIU.Responses Vision - Perform a responses completion with vision

AIU.ResponsesVision

Previous Top Next

MiniRobotLanguage (MRL)

AIU.ResponsesVision
Perform a Responses Completion with Vision

Intention

ResponsesVision Command: Generate Vision-Based Responses

The ResponsesVision command performs a completion that incorporates vision capabilities, processing both text prompts and image inputs to generate responses.

It leverages multimodal models like GPT-4o for vision and text processing.

It’s part of the AIU - OpenAI API suite.

What is the ResponsesVision Command?

The ResponsesVision command sends a text prompt and an image file path to the OpenAI API, using a vision-capable model (e.g., GPT-4o) to generate a text response based on both inputs.

It requires three parameters: a prompt, an image file path, and a variable for the response, integrating settings like AIU.SetUser for context.

Why Do You Need It?

The ResponsesVision command is useful for:

•Multimodal Analysis: Interpret images alongside text for richer responses.

•Visual Queries: Answer questions about image content (e.g., "What’s in this photo?").

•Enhanced Interaction: Combine visual and textual input for complex tasks.

How to Use the ResponsesVision Command?

Provide a prompt, an image file path, and a variable for the response.

It uses GPT-4o ($5.00/1M input tokens, $15.00/1M output tokens, 128K context, multimodal) as of March 18, 2025. Costs include text tokens and image processing (approximately 170 tokens per 512x512 image tile).

Example Usage

AIU.ResponsesVision|What’s in this image?|C:\Images\dog.png|$$RES

DBP.Vision Response: $$RES

Analyzes "dog.png" and responds (e.g., "The image shows a brown dog.") in $$RES.

Illustration

┌──────────────┬──────────────┬────────────────────┐

│ Prompt │ Image │ Response Example │

├──────────────┼──────────────┼────────────────────┤

│ What’s this? │ cat.png │ "It’s a gray cat." │

├──────────────┼──────────────┼────────────────────┤

│ Describe it │ tree.png │ "A tall oak tree." │

└──────────────┴──────────────┴────────────────────┘

Illustration of vision-based prompt-response pairs.

Syntax

AIU.ResponsesVision|P1|P2|P3

Parameter Explanation

P1 - A string containing the prompt or query for the AI. Required.

P2 - The file path to an image (e.g., "C:\Images\dog.png"). Required.

P3 - The variable where the generated response is stored. Required.

Example

AIU.ResponsesVision|What color is this flower?|C:\Images\rose.png|$$COL

DBP.Color: $$COL

ENR.

Remarks

- Requires a vision-capable model like GPT-4o; other models may fail.

- Image processing adds token costs (e.g., ~170 tokens per 512x512 tile).

Limitations

- Requires exactly three parameters; incorrect counts trigger an error (%IC_ER_PA).

- Image file must exist and be accessible; invalid paths cause errors.

See also:

• AIU.Responses

• AIU.Set_User