AIG. - AI Google Gemini Integration




MiniRobotLanguage (MRL)

 

AIG.ComputerUse
Autonomous UI Element Finding via AI Vision

 

Intention

 

To find the exact screen coordinates of a UI element (button, icon, text field) based solely on a natural language description. This allows the robot to "see" the screen and click items even if they don't have standard Windows Controls or IDs.

 

What is the ComputerUse Command?

 

This is a high-level "Agent" command that performs a complex workflow automatically:

1. Captures: Takes a screenshot of the specified window (or of the active window if no handle is given).

2. Optimizes: Resizes and converts the image according to AIG.SetCuOptions (default: 1500px width, JPG) to keep requests fast and inexpensive.

3. Analyzes: Sends the image to Google Gemini together with your description.

4. Calculates: Translates the AI's response back into real-world screen coordinates.

5. Returns: Puts the X and Y coordinates into the variables you provide.

 

Why Do You Need It?

 

Robustness: Finds elements even if they move, change color, or appear at a different screen resolution.

Ease of Use: No need to capture "Image Search" bitmaps or record coordinate offsets.

Semantic Finding: You can ask for "The red delete icon" or "The third checkbox in the list".

 

Example Usage

 

' 1. Select the target window (e.g. Calculator)
STW.t|Calculator
HTV.$$HND

' 2. Ask the AI to find the "equals" button
AIG.ComputerUse|$$HND|the equals sign button|$$X|$$Y

' 3. Check results (0,0 means not found)
IVV.$$X>0
  ' 4. Click it (using window-relative coordinates)
  STW.h|$$HND
  MLC.t|$$X,$$Y
ELS.
  DBP.Element not found!
EIF.

 

Syntax

 

AIG.ComputerUse|P1|P2|P3|P4

AIG.cpu|P1|P2|P3|P4

 

Parameter Explanation

 

P1 - (String/Int) Window Handle.
   - Provide a handle variable (e.g., `$$HND`) to capture a specific window.
   - Pass `0` or an empty string to capture the currently **Active Window**.

 

P2 - (String) Description of the element.
   - e.g., "The blue Login button", "The gear icon in the top right".

 

P3 - (Variable) Variable to receive the X coordinate (Integer).

 

P4 - (Variable) Variable to receive the Y coordinate (Integer).
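
The parameters above can be combined as in the following minimal sketch. It uses the short form AIG.cpu and passes 0 as P1 so that the currently active window is captured. The description text and the debug messages are illustrative only, and the result check follows the same pattern as in "Example Usage".

```
' Sketch: search the active window (P1 = 0) using the short form
AIG.cpu|0|The gear icon in the top right|$$X|$$Y

' 0 in $$X means the element was not found
IVV.$$X>0
  DBP.Gear icon found
ELS.
  DBP.Gear icon not found
EIF.
```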

 

Returns

 

- Sets P3 and P4 to the pixel coordinates of the element, usually relative to the client area of the window.

- If the element is not found or an error occurs, P3 and P4 are set to 0.
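
Because a result of 0 signals "not found", a script can react by retrying with a more specific description. The following is only a sketch: it assumes that $$HND already holds a valid window handle, that IVV. blocks may be nested, and that the second description fits the target application. None of these assumptions is confirmed by this page.

```
' First attempt with a short description
AIG.ComputerUse|$$HND|the equals sign button|$$X|$$Y

IVV.$$X>0
  DBP.Found on first attempt
ELS.
  ' Hypothetical retry with a more specific description
  AIG.ComputerUse|$$HND|the large button with the equals sign in the bottom right corner|$$X|$$Y
  IVV.$$X>0
    DBP.Found on second attempt
  ELS.
    DBP.Element not found!
  EIF.
EIF.
```

Each call sends a new screenshot to the AI, so keeping the number of retries small limits both run time and cost.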

 

See also:

 

- AIG.SetCuOptions (Configure resolution/quality)

- AIG.AskVision (Manual vision requests)

- MCP.Screenshot