MiniRobotLanguage (MRL)
AIN.AskV
Perform a Vision Chat Completion with an Image
Intention
AskV Command: Vision-Based Chat Completion
The AIN.AskV command performs vision-based chat completions using a local Ollama model (e.g., llava). It accepts a text prompt and an image input, which can be a URL, local filepath, or base64-encoded string.
Because filesystem images and embedded base64 data can be passed directly, scripts can analyze images on-premises through AnythingLLM without hosting them anywhere.
It’s part of the AIN - AnythingLLM AI suite.
The AIN.AskV command sends a POST request to the AnythingLLM API (default: http://localhost:3001/api/chat), using a local Ollama model like llava to analyze an image and respond to a prompt.
The JSON payload includes message (prompt), either imageUrl (for URLs) or imageData (for base64 from files or strings), mode ("chat"), maxTokens (default 4096), and temperature (default 0.7).
Ollama runs at http://localhost:11434, bridged by AnythingLLM for local processing.
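To make the payload description concrete, here is a minimal Python sketch of how such a request body could be assembled. The field names (message, imageUrl, imageData, mode, maxTokens, temperature) and defaults are taken from the text above. The function name build_askv_payload is hypothetical, and the command's actual internals may differ.

```python
import json

def build_askv_payload(prompt, image_url=None, image_data=None,
                       max_tokens=4096, temperature=0.7):
    """Assemble a request body with the fields named on this page.

    Hypothetical helper: exactly one of image_url / image_data must be given.
    """
    if (image_url is None) == (image_data is None):
        raise ValueError("provide exactly one of image_url or image_data")
    payload = {
        "message": prompt,           # the text prompt (P1)
        "mode": "chat",
        "maxTokens": max_tokens,     # default 4096 per the description
        "temperature": temperature,  # default 0.7 per the description
    }
    if image_url is not None:
        payload["imageUrl"] = image_url    # used for URL inputs
    else:
        payload["imageData"] = image_data  # used for base64 inputs
    return payload

if __name__ == "__main__":
    body = build_askv_payload("Describe this image",
                              image_url="http://localhost:8080/screen.jpg")
    print(json.dumps(body, indent=2))
```

Keeping imageUrl and imageData mutually exclusive mirrors the "either ... or" wording of the payload description.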
Key use cases include:
•Flexible Input: Supports URLs, local files, or base64 strings for vision tasks.
•Local Automation: Directly process filesystem images without hosting.
•Privacy: Keeps all data on-premises with Ollama.
Provide a prompt and an image input (URL, filepath, or base64 string). Optionally, specify a response variable and clipboard flag.
Requires Ollama at http://localhost:11434, integrated with AnythingLLM. Set an API key with AIN.SetKey and a model like llava via AIN.SetModel.
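For readers who want to check the endpoint outside MRL, the following Python sketch builds an equivalent POST request without sending it. The URL is the default named on this page. The bearer-token Authorization header is an assumption, since this page does not document how the key set with AIN.SetKey is transmitted. The name make_request is hypothetical.

```python
import json
import urllib.request

API_URL = "http://localhost:3001/api/chat"  # default endpoint named on this page

def make_request(api_key, payload, url=API_URL):
    """Build (but do not send) a JSON POST request for the chat endpoint."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the key travels as a bearer token.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = make_request("your_api_key_here", {"message": "ping", "mode": "chat"})
    print(req.get_method(), req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it. That step is left out so the sketch runs without a server.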
Example Usage
' Using a local filepath
AIN.SetKey|your_api_key_here
AIN.SetModel|llava
AIN.AskV|What’s on this screen?|C:\Screenshots\screen.jpg|$$RES|1
DBP.Screen Content: $$RES
Analyzes a local screenshot (converted to base64 internally), stores the answer in $$RES, and copies it to the clipboard because P4 is "1".
' Using a base64 string
AIN.AskV|Describe this image|data:image/jpeg;base64,/9j/4AAQSkZJRg...|$$OUT
DBP.Image Description: $$OUT
Processes a base64-encoded image directly.
Syntax
AIN.AskV|P1|P2[|P3][|P4]
Parameter Explanation
P1 - Text prompt (required), e.g., "What’s on this screen?"
P2 - Image input (required): URL (e.g., "http://localhost:8080/screen.jpg"), filepath (e.g., "C:\Screenshots\screen.jpg"), or base64 string (e.g., "data:image/jpeg;base64,...").
P3 - (Optional) Variable that receives the response, e.g., "$$RES". If omitted, retrieve the response with AIN.GetRaw.
P4 - (Optional) "1" copies the response to the clipboard; omit or use "0" to leave the clipboard unchanged.
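Since P2 accepts three kinds of input, it can help to see how a value might be told apart. The Python sketch below is a heuristic, not the command's documented detection logic: the function name classify_image_input and the rules (http/https prefix, data: prefix, Windows drive or UNC path) are assumptions for illustration.

```python
import re

def classify_image_input(value):
    """Guess which kind of P2 input a string is (heuristic, hypothetical)."""
    if re.match(r"https?://", value, re.IGNORECASE):
        return "url"
    if value.lower().startswith("data:"):
        return "data_uri"
    # Windows drive path (C:\...) or UNC path (\\server\share\...)
    if re.match(r"[A-Za-z]:[\\/]", value) or value.startswith("\\\\"):
        return "filepath"
    # Anything else is treated as a raw base64 string.
    return "base64"

if __name__ == "__main__":
    for sample in ("http://localhost:8080/screen.jpg",
                   r"C:\Screenshots\screen.jpg",
                   "data:image/jpeg;base64,/9j/4AAQ"):
        print(sample, "->", classify_image_input(sample))
```

Only Windows-style paths are recognized as files here, because raw JPEG base64 typically begins with "/9j/" and would otherwise be mistaken for a POSIX path.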
Example
AIN.SetKey|your_api_key_here
AIN.SetModel|llava
AIN.AskV|What’s in this photo?|C:\Photos\photo.jpg|$$OUT
DBP.Photo Description: $$OUT
ENR.
Describes a local photo, converting it to base64 internally.
Remarks
- Requires a vision-capable Ollama model (e.g., llava), installed via ollama pull llava.
- Local filepaths are converted to base64; ensure files are readable.
- Base64 inputs can be raw strings or data URIs (e.g., data:image/jpeg;base64,...).
- AnythingLLM must support imageData for base64 inputs. Because local files are also sent as base64, filepath inputs depend on this as well.
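The filepath-to-base64 conversion and the two accepted base64 forms mentioned in the remarks can be sketched in Python as follows. Both helper names are hypothetical. Whether the imageData field expects the bare base64 string or the full data URI is not stated on this page, so strip_data_uri only shows how the two forms relate.

```python
import base64

def file_to_base64(path):
    """Read a file in binary mode and return its contents as base64 text."""
    with open(path, "rb") as fh:
        return base64.b64encode(fh.read()).decode("ascii")

def strip_data_uri(value):
    """Return the bare base64 part of a data URI; raw strings pass through."""
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value

if __name__ == "__main__":
    print(strip_data_uri("data:image/jpeg;base64,/9j/4AAQ"))
```

Opening the file raises an OSError if it is missing or unreadable, which is the failure the "ensure files are readable" remark guards against.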
Limitations
- Base64 support requires AnythingLLM to accept imageData; untested as of March 20, 2025.
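- Large files or base64 strings may increase processing time or exceed payload limits. Base64 encoding enlarges binary data by roughly one third, so a file that looks small on disk produces a noticeably larger request.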
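To judge in advance whether an image might run into a payload limit, the encoded size can be estimated from the file size. The Python sketch below uses the standard padded-base64 length formula. The function names are hypothetical, and the limit is a value the caller supplies, since this page does not state an actual limit.

```python
def base64_length(num_bytes):
    """Length in characters of padded base64 for num_bytes of input."""
    return 4 * ((num_bytes + 2) // 3)

def fits_limit(num_bytes, limit_chars):
    """True if the encoded image alone stays within a caller-chosen limit."""
    return base64_length(num_bytes) <= limit_chars

if __name__ == "__main__":
    size = 3_000_000  # example: a 3 MB image
    print(base64_length(size))
```

The estimate covers only the image data. The prompt and the other JSON fields add a little more on top.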
See also:
• AIN.Ask