|
<< Click to Display Table of Contents >> Navigation: 3. Script Language > AI - Artificial Intelligence Commands > AIU. - OpenAI API > !Core Operations > AIU. - Artificial Intelligence Utility |
MiniRobotLanguage (MRL)
AIU.ResponsesVision
Perform a Responses Completion with Vision
Intention
ResponsesVision Command: Generate Vision-Based Responses
The ResponsesVision command performs a completion that incorporates vision capabilities, processing both text prompts and image inputs to generate responses.
It leverages multimodal models like GPT-4o for vision and text processing.
It’s part of the AIU - OpenAI API suite.
The ResponsesVision command sends a text prompt and an image file path to the OpenAI API, using a vision-capable model (e.g., GPT-4o) to generate a text response based on both inputs.
It requires three parameters: a prompt, an image file path, and a variable for the response, integrating settings like AIU.SetUser for context.
The ResponsesVision command is useful for:
•Multimodal Analysis: Interpret images alongside text for richer responses.
•Visual Queries: Answer questions about image content (e.g., "What’s in this photo?").
•Enhanced Interaction: Combine visual and textual input for complex tasks.
Provide a prompt, an image file path, and a variable for the response.
It uses GPT-4o ($5.00/1M input tokens, $15.00/1M output tokens, 128K context, multimodal) as of March 18, 2025. Costs include text tokens and image processing (approximately 170 tokens per 512x512 image tile).
Example Usage
AIU.ResponsesVision|What’s in this image?|C:\Images\dog.png|$$RES
DBP.Vision Response: $$RES
Analyzes "dog.png" and responds (e.g., "The image shows a brown dog.") in $$RES.
Illustration
┌──────────────┬──────────────┬────────────────────┐
│ Prompt │ Image │ Response Example │
├──────────────┼──────────────┼────────────────────┤
│ What’s this? │ cat.png │ "It’s a gray cat." │
├──────────────┼──────────────┼────────────────────┤
│ Describe it │ tree.png │ "A tall oak tree." │
└──────────────┴──────────────┴────────────────────┘
Illustration of vision-based prompt-response pairs.
Syntax
AIU.ResponsesVision|P1|P2|P3
Parameter Explanation
P1 - A string containing the prompt or query for the AI. Required.
P2 - The file path to an image (e.g., "C:\Images\dog.png"). Required.
P3 - The variable where the generated response is stored. Required.
Example
AIU.ResponsesVision|What color is this flower?|C:\Images\rose.png|$$COL
DBP.Color: $$COL
ENR.
Remarks
- Requires a vision-capable model like GPT-4o; other models may fail.
- Image processing adds token costs (e.g., ~170 tokens per 512x512 tile).
Limitations
- Requires exactly three parameters; incorrect counts trigger an error (%IC_ER_PA).
- Image file must exist and be accessible; invalid paths cause errors.
See also: