3. Script Language > AI - Artificial Intelligence Commands > AIU. - OpenAI API > !Core Operations > AIU.Chat Vision

AIU.ChatVision

Previous Top Next

MiniRobotLanguage (MRL)

AIU.ChatVision
Perform a Chat Completion with Vision

Intention

ChatVision Command: Chat with Visual Input

The ChatVision command enables you to perform a chat completion by combining text prompts with visual input, leveraging OpenAI’s vision-capable models.

This allows for interactive dialogues where the AI can interpret and respond to images alongside text, enhancing multimodal applications.

It’s part of the AIU - OpenAI API suite.

What is the ChatVision Command?

This command sends a text prompt and an image file to the OpenAI API, which processes both inputs to generate a chat response.

The response can be stored in a variable, optionally copied to the clipboard, and is influenced by settings like model, temperature, and max tokens configured via other AIU commands.

Why Do You Need It?

The ChatVision command is essential for:

•Multimodal Interaction: Combine text and images for richer AI conversations.

•Visual Analysis: Get AI insights or descriptions based on image content.

•Automation: Integrate vision-based responses into automated workflows.

How to Use the ChatVision Command?

Provide a text prompt and an image file path as mandatory parameters, with optional parameters for storing the result and clipboard output.

The command relies on a vision-capable model (e.g., gpt-4o), which must be set via AIU.SetModel.

Available models and their prices (as of March 18, 2025, from OpenAI’s pricing page) include:

•gpt-4o: $5.00/1M input tokens, $15.00/1M output tokens (multimodal, 128K context).

•gpt-4o-mini: $0.15/1M input tokens, $0.60/1M output tokens (cost-effective, 128K context).

•gpt-4-turbo: $10.00/1M input tokens, $30.00/1M output tokens (high performance, 128K context).

Example Usage

AIU.SetModel|gpt-4o

AIU.ChatVision|What’s in this image?|C:\Images\cat.jpg|$$RES|1

DBP.AI Response: $$RES (also copied to clipboard)

This example asks the AI to describe an image of a cat, storing the response in $$RES and copying it to the clipboard.

Illustration

┌────────────────────┐

│ Input │

├──────┬─────────────┤

│ Text │ Image │

├──────┼─────────────┤

│ "What│ C:\cat.jpg │

│ is it│ │

│ ?" │ │

└──────┴─────────────┘

Combining text and image input for a chat response.

Syntax

AIU.ChatVision|P1|P2[|P3[|P4]]

AIU.ChatV|P1|P2[|P3[|P4]]

Parameter Explanation

P1 - The text prompt to send to the AI (required).

P2 - The file path to the image (required).

P3 - (Optional) The variable where the AI’s response will be stored. If omitted, the response is not stored in a variable.

P4 - (Optional) A numeric value: 1 to copy the response to the clipboard, 0 (or omitted) to skip clipboard output.

Example

AIU.SetModel|gpt-4o

AIU.ChatVision|Describe this scene|C:\Images\sunset.jpg|$$DES

DBP.Scene Description: $$DES

ENR.

Remarks

- Requires a vision-capable model like gpt-4o; non-vision models will fail.

- The image file must exist and be accessible at the specified path.

- Clipboard output depends on both P4 and the global setting from AIU.SetClipboardOutput.

Limitations

- Requires 2 to 5 parameters; fewer or more will trigger an error.

- Image processing increases token usage, impacting cost with high-resolution or complex images.

- Limited by the model’s vision capabilities and context window.

See also:

• AIU.Chat

• AIU.SetModel

• AIU.GetContent

• AIU.SetClipboardOutput