UI-TARS-desktop is an open-source desktop application built on the UI-TARS vision-language model that lets you control your computer with plain-English commands. It takes screenshots as input, detects buttons, text fields, and menus through a unified perception module, and performs clicks, typing, drags, and other actions as a human user would. It is licensed under Apache 2.0, which permits commercial use and community contributions without vendor lock-in. The underlying research, described in ByteDance's UI-TARS arXiv paper, combines perception, System-2 reasoning, and memory to handle complex, multi-step workflows.
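The perceive, reason, act cycle described above can be pictured as a small loop. What follows is only a toy sketch under stated assumptions: `Action`, `Step`, `run_agent`, `FakeDesktop`, and `scripted_policy` are hypothetical names invented for illustration, the "screen" is a string rather than a screenshot, and the policy is a hard-coded stand-in for the vision-language model. It is not the project's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Action:
    kind: str          # "click" or "done" in this toy example
    target: str = ""   # symbolic element name; a real agent would use coordinates


@dataclass
class Step:
    observation: str   # what the agent "saw" before acting
    thought: str       # the reasoning recorded for this step
    action: Action     # what it decided to do


# Hypothetical policy signature: (task, observation, memory) -> (thought, action)
Policy = Callable[[str, str, List[Step]], Tuple[str, Action]]


def run_agent(task: str, observe: Callable[[], str], decide: Policy,
              act: Callable[[Action], None], max_steps: int = 10) -> List[Step]:
    """Observe, decide, act, and remember until the policy says it is done."""
    memory: List[Step] = []
    for _ in range(max_steps):
        observation = observe()                        # perception
        thought, action = decide(task, observation, memory)  # reasoning
        memory.append(Step(observation, thought, action))    # memory
        if action.kind == "done":
            break
        act(action)                                    # action
    return memory


class FakeDesktop:
    """Stand-in for a real screen: tracks which window is visible."""

    def __init__(self) -> None:
        self.screen = "desktop"

    def observe(self) -> str:
        return self.screen

    def act(self, action: Action) -> None:
        if action.kind == "click" and action.target == "settings_icon":
            self.screen = "settings"
        elif action.kind == "click" and action.target == "privacy_tab":
            self.screen = "privacy"


def scripted_policy(task: str, observation: str,
                    memory: List[Step]) -> Tuple[str, Action]:
    """Hard-coded decisions standing in for the model's output."""
    if observation == "desktop":
        return "Settings is not open yet.", Action("click", "settings_icon")
    if observation == "settings":
        return "The privacy tab is visible.", Action("click", "privacy_tab")
    return "The privacy page is showing.", Action("done")
```

Calling `run_agent("open privacy settings", desktop.observe, scripted_policy, desktop.act)` on a fresh `FakeDesktop` walks through the three screens and returns the recorded steps. Re-observing before every decision is what would let a real agent adapt when the interface changes between steps.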
How to Use
- Download or view the repository at https://github.com/bytedance/UI-TARS-desktop
- Install the Agent TARS CLI with npm (requires Node.js 22 or later):
  npm install @agent-tars/cli@latest -g
- Or install the native desktop packages from the Releases page for Windows, macOS ARM/x64, or Linux: https://github.com/bytedance/UI-TARS-desktop/releases
- Launch the GUI by running
  agent-tars
  in your terminal, or by clicking the desktop icon.
- Type or speak commands like “open Chrome,” “click File → Save,” or “scroll down,” and watch it execute them in real time.
- Configure advanced settings (model endpoints, hotkeys, themes) in the Settings panel, accessed via the gear icon.
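If the launch step fails, the usual cause is that the `agent-tars` command is not on your PATH. The snippet below is a minimal sketch, using only the Python standard library, that checks for the command before starting it. The helper names `find_cli` and `launch` are hypothetical and not part of the project.

```python
import shutil
import subprocess
from typing import Optional


def find_cli(name: str = "agent-tars") -> Optional[str]:
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(name)


def launch(name: str = "agent-tars") -> int:
    """Start the CLI and return its exit code, or 1 if it is not installed."""
    path = find_cli(name)
    if path is None:
        print(f"'{name}' was not found on PATH; check that the install step succeeded.")
        return 1
    # Blocks until the launched process exits.
    return subprocess.call([path])
```

Running `launch()` from a Python prompt either starts the tool or prints a hint about the missing command.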
Features
- Multimodal Perception: Processes full-screen captures to locate UI elements without relying on the DOM or accessibility trees.
- Unified Action API: A single interface for clicks, typing, drags, and key presses across Windows, macOS, Linux, and web apps.
- System-2 Reasoning: Plans multi-step tasks (e.g., “open settings, navigate to privacy, toggle option”) and adapts dynamically to UI changes.
- Zero-Script Automation: No manual scripting; define tasks through examples or natural-language prompts.
- Open Source & Extensible: Source code under Apache 2.0, with plugin support for custom models, macros, and enterprise integrations.
- Cross-Platform Support: Native installers and CLI packages for all major desktop OSes; automatic updates via MCP server.
- Community & Documentation: Comprehensive guides on GitHub, a Discord server for discussion, and a showcase gallery of real-world workflows.
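To make the idea of a unified action interface concrete, here is a rough sketch of how a model's text output might be turned into a structured action and screen coordinates. The action syntax (`click(point='(500,250)')`), the 0 to 1000 coordinate grid, and the names `ParsedAction`, `parse_action`, and `to_pixels` are all assumptions chosen for illustration. They are not taken from the project's documentation.

```python
import re
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ParsedAction:
    name: str              # e.g. "click", "type", "drag"
    args: Dict[str, str]   # keyword arguments, kept as raw strings


# Assumed format: name(key='value', key='value')
_ACTION_RE = re.compile(r"^\s*(\w+)\((.*)\)\s*$", re.DOTALL)
_ARG_RE = re.compile(r"(\w+)\s*=\s*'((?:[^'\\]|\\.)*)'")


def parse_action(text: str) -> ParsedAction:
    """Parse one action string in the assumed name(key='value') format."""
    match = _ACTION_RE.match(text)
    if match is None:
        raise ValueError(f"not a recognizable action: {text!r}")
    name, raw_args = match.groups()
    return ParsedAction(name, dict(_ARG_RE.findall(raw_args)))


def to_pixels(point: str, width: int, height: int,
              scale: int = 1000) -> Tuple[int, int]:
    """Map an '(x,y)' point on an assumed 0..scale grid to screen pixels."""
    x, y = (int(part) for part in point.strip("() ").split(","))
    return round(x * width / scale), round(y * height / scale)
```

For example, `parse_action("click(point='(500,250)')")` yields an action named `click`, and `to_pixels("(500,250)", 1920, 1080)` maps its point to the pixel position on a 1920×1080 display. Keeping coordinates resolution-independent until the last moment is one way the same action description could work across different screens and operating systems.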
Suitable Scenarios
- Robotic Process Automation (RPA): Automate legacy desktop applications without APIs or UI toolkits.
- GUI Testing & QA: Create robust end-to-end tests that mimic real user behavior, reducing flaky failures.
- Accessibility Assistants: Build on-screen helpers for users with motor or vision impairments by scripting UI flows.
- Customer Support Simulations: Script and record support scenarios for training and debugging.
- DevOps & CI/CD: Embed GUI validation in pipelines to catch interface regressions before deployment.
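For the testing and CI scenarios, one simple form of GUI validation is comparing a screenshot taken after an automated run against a stored baseline. The sketch below is a generic pixel-diff check, not something the project provides. It assumes screenshots have already been loaded as grids of grayscale values, and `changed_fraction` and `check_regression` are hypothetical names.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # rows of grayscale pixel values (0-255)


def changed_fraction(baseline: Grid, current: Grid, tolerance: int = 0) -> float:
    """Fraction of pixels whose value differs by more than `tolerance`."""
    same_shape = len(baseline) == len(current) and all(
        len(a) == len(b) for a, b in zip(baseline, current)
    )
    if not same_shape:
        raise ValueError("screenshots must have identical dimensions")
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0
    changed = sum(
        1
        for row_a, row_b in zip(baseline, current)
        for a, b in zip(row_a, row_b)
        if abs(a - b) > tolerance
    )
    return changed / total


def check_regression(baseline: Grid, current: Grid,
                     max_changed: float = 0.01) -> bool:
    """Return True when the screen is close enough to the baseline to pass."""
    return changed_fraction(baseline, current) <= max_changed
```

A pipeline step could fail the build whenever `check_regression` returns False. The 1% threshold is an arbitrary starting point and would need tuning for screens with animations or clocks.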
Original article from Libre Depot. Publisher: Libre Depot. Please credit the source when reprinting: https://www.libredepot.top/5396.html