MiniRobotLanguage (MRL)
9. Computer Use Commands
Autonomous UI Interaction via AI Vision
Overview
The "Computer Use" commands let the robot locate UI elements from natural language descriptions instead of hard-coded coordinates or pixel patterns. The robot "sees" the screen through Google Gemini's vision capabilities and interacts with the elements you describe.
This enables scenarios such as:
• "Click the blue Save button in the top right."
• "Find the text box labeled 'Username'."
• "Locate the icon that looks like a gear."
Available Commands
This section contains the following commands:
• AIG.SetCuOptions - Configures global settings: screenshot resolution, image compression quality, temporary file paths, and the AI system prompt template.
• AIG.ComputerUse - The primary command. It captures a specific window (or the active one), processes the image, asks the AI to locate the described element, and returns that element's X/Y screen coordinates.
How It Works
1. The robot captures a high-resolution screenshot of the target window.
2. If the image is larger than MaxRes, it is scaled down. It is then converted to JPG to keep uploads fast and API costs low.
3. The image is sent to the Google Gemini Vision model together with your description.
4. The AI returns coordinate data for the described element.
5. The library automatically rescales these coordinates back to the original screen resolution, so they can be used directly for mouse clicks.
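The resize and rescale steps above can be illustrated with a small Python sketch. This is not the library's actual implementation: it assumes that MaxRes limits the longer side of the image, that the aspect ratio is preserved, and that the captured window may sit at an offset on the screen. The names fit_within, to_screen, window_left and window_top are hypothetical and used only for illustration.

```python
def fit_within(width, height, max_res):
    """Return (new_width, new_height, scale) so the longer side is at most max_res.

    Assumption: max_res caps the longer side and the aspect ratio is kept.
    """
    longest = max(width, height)
    if longest <= max_res:
        # Image is already small enough; no resizing needed.
        return width, height, 1.0
    scale = max_res / longest
    return max(1, round(width * scale)), max(1, round(height * scale)), scale


def to_screen(x_img, y_img, scale, window_left=0, window_top=0):
    """Map a point found in the resized image back to screen coordinates.

    Dividing by the scale factor undoes the downscaling; the window offset
    (hypothetical parameters) shifts the point from window space to screen space.
    """
    return window_left + round(x_img / scale), window_top + round(y_img / scale)


# A 2560x1440 capture limited to 1280 pixels on its longer side.
new_w, new_h, scale = fit_within(2560, 1440, 1280)
print(new_w, new_h, scale)  # 1280 720 0.5

# A point the model reported at (640, 360) in the resized image,
# for a window whose top-left corner is at (100, 50) on the screen.
print(to_screen(640, 360, scale, window_left=100, window_top=50))  # (1380, 770)
```

Because the same scale factor is used in both directions, a point located in the smaller image maps back to the matching position in the full-resolution capture, up to rounding.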