HomeMachine LearningI gave Qwen3.7-Plus a screenshot and found the exact pixel to click...

I gave Qwen3.7-Plus a screenshot and found the exact pixel to click for $0.40

Last updated on June 8, 2026 by the editorial team

How Qwen3.7-Plus Revolutionizes GUI Interaction

Introduction to Qwen3.7-Plus

In an era where technology is rapidly evolving, the ability to automate and enhance user interface interactions has become crucial. Qwen3.7-Plus, a vision-enabled AI model, has emerged as a powerful tool in this domain. As demonstrated by Chew Loong Nian, an AI engineer, this model can precisely identify the pixel coordinates necessary to interact with specific elements on a graphical user interface (GUI), such as the AWS console.

Precision and Cost-Effectiveness

In a test scenario, a screenshot of the AWS console was uploaded with a query: which pixel should be clicked to launch an instance? Qwen3.7-Plus responded with pinpoint accuracy, identifying the coordinates (x=1147, y=283) that landed exactly on the “Launch instance” button. This precision is offered at a competitive price of $0.40 per million entry tokens, significantly less than the text-only Qwen3.7-Max from Alibaba, making it an attractive option for cost-conscious developers.

I gave a screenshot to Qwen3.7-Plus and found the exact pixel to click for alt=

Understanding ScreenSpot Pro

Qwen3.7-Plus achieves an impressive score of 79.0 on ScreenSpot Pro, a benchmark index that evaluates the effectiveness of a model’s ability to understand and interact with GUIs. This score highlights the model’s capability to facilitate successful computer use by accurately identifying the required user interface elements.

Practical Applications and Implementation

The implementation of Qwen3.7-Plus is streamlined through Alibaba Cloud Model Studio via the OpenAI-enabled SDK. The article discusses four crucial “glue” calls: converting screenshot coordinates to JSON, transforming these coordinates into actual clicks with a trust gate, utilizing Playwright for browser tasks, and employing “screenshot to code” to recreate user interface components. These methods illustrate the practical utility and versatility of Qwen3.7-Plus in real-world applications.

Choosing Qwen3.7-Plus

While Qwen3.7-Plus offers numerous advantages, it is important to note its main limitation: it is proprietary and available only through an API, with no open weights or self-hosted options. Despite this, it remains a cost-effective solution for prototyping and screen grounding, providing a solid foundation before transitioning to more advanced managed or self-hosted solutions.

Conclusion

Qwen3.7-Plus represents a significant advancement in AI-driven GUI interaction. Its ability to accurately pinpoint GUI elements and its cost-effectiveness make it a valuable tool for developers. For more insights and details, you can read the full article Here.

Towards AI Academy

Towards AI Academy is dedicated to equipping students and professionals with the knowledge and skills needed to excel in AI engineering. With over 100,000 students and a team of 15 engineers, the academy provides a comprehensive curriculum designed to withstand the demands of production environments.

Start for free – no obligation:

→ 6-Day Agentic AI Engineering Email Guide — One Practical Lesson Per Day

→ Agents Architecture Cheatsheet — 3 years of architectural decisions in 6 pages

Our courses:

→ AI Engineering Certification — 90+ lessons from project selection to deployed product. The most comprehensive practical LLM course available.

→ Agent Engineering Course — Hands-on with production agent architectures, memory, routing, and evaluation frameworks — built from real enterprise engagements.

→ AI for Work — Understand, evaluate, and apply AI for complex work tasks.

Note: The content of the article reflects the views of the contributing authors and not those of Towards AI.

“`

Must Read
Related News

LEAVE A REPLY

Please enter your comment!
Please enter your name here