|
<< Click to Display Table of Contents >> Navigation: 3. Script Language > AI - Artificial Intelligence Commands > AIU. - OpenAI API > !Core Operations > AIU. - Artificial Intelligence Utility |
MiniRobotLanguage (MRL)
AIU.ChatVision
Perform a Chat Completion with Vision
Intention
ChatVision Command: Chat with Visual Input
The ChatVision command enables you to perform a chat completion by combining text prompts with visual input, leveraging OpenAI’s vision-capable models.
This allows for interactive dialogues where the AI can interpret and respond to images alongside text, enhancing multimodal applications.
It’s part of the AIU - OpenAI API suite.
This command sends a text prompt and an image file to the OpenAI API, which processes both inputs to generate a chat response.
The response can be stored in a variable, optionally copied to the clipboard, and is influenced by settings like model, temperature, and max tokens configured via other AIU commands.
The ChatVision command is essential for:
•Multimodal Interaction: Combine text and images for richer AI conversations.
•Visual Analysis: Get AI insights or descriptions based on image content.
•Automation: Integrate vision-based responses into automated workflows.
Provide a text prompt and an image file path as mandatory parameters, with optional parameters for storing the result and clipboard output.
The command relies on a vision-capable model (e.g., gpt-4o), which must be set via AIU.SetModel.
Available models and their prices (as of March 18, 2025, from OpenAI’s pricing page) include:
•gpt-4o: $5.00/1M input tokens, $15.00/1M output tokens (multimodal, 128K context).
•gpt-4o-mini: $0.15/1M input tokens, $0.60/1M output tokens (cost-effective, 128K context).
•gpt-4-turbo: $10.00/1M input tokens, $30.00/1M output tokens (high performance, 128K context).
Example Usage
AIU.SetModel|gpt-4o
AIU.ChatVision|What’s in this image?|C:\Images\cat.jpg|$$RES|1
DBP.AI Response: $$RES (also copied to clipboard)
This example asks the AI to describe an image of a cat, storing the response in $$RES and copying it to the clipboard.
Illustration
┌────────────────────┐
│ Input │
├──────┬─────────────┤
│ Text │ Image │
├──────┼─────────────┤
│ "What│ C:\cat.jpg │
│ is it│ │
│ ?" │ │
└──────┴─────────────┘
Combining text and image input for a chat response.
Syntax
AIU.ChatVision|P1|P2[|P3[|P4]]
AIU.ChatV|P1|P2[|P3[|P4]]
Parameter Explanation
P1 - The text prompt to send to the AI (required).
P2 - The file path to the image (required).
P3 - (Optional) The variable where the AI’s response will be stored. If omitted, the response is not stored in a variable.
P4 - (Optional) A numeric value: 1 to copy the response to the clipboard, 0 (or omitted) to skip clipboard output.
Example
AIU.SetModel|gpt-4o
AIU.ChatVision|Describe this scene|C:\Images\sunset.jpg|$$DES
DBP.Scene Description: $$DES
ENR.
Remarks
- Requires a vision-capable model like gpt-4o; non-vision models will fail.
- The image file must exist and be accessible at the specified path.
- Clipboard output depends on both P4 and the global setting from AIU.SetClipboardOutput.
Limitations
- Requires 2 to 5 parameters; fewer or more will trigger an error.
- Image processing increases token usage, impacting cost with high-resolution or complex images.
- Limited by the model’s vision capabilities and context window.
See also:
• AIU.Chat