PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World

Demo

Check out our demo of PC Agent autonomously controlling a computer to complete complex tasks involving dozens of steps!

[Demo video: Attention-.mp4]

Introduction

PC Agent introduces a novel framework to empower autonomous digital agents through human cognition transfer. This transfer is implemented through three key components:

PC Tracker, the first lightweight infrastructure for large-scale human-computer interaction data collection;
A Cognition Completion post-processing pipeline that transforms raw interaction data into cognitive trajectories;
A multi-agent system combining a planning agent for decision-making with a grounding agent for robust visual grounding.

Quick Start

Setup

To get started with PC Agent, we recommend setting up your Python environment using conda:

# Clone the repository and navigate to the folder
git clone https://github.com/GAIR-NLP/PC-Agent.git
cd PC-Agent
# Create and activate conda environment
conda env create -f environment.yml
conda activate pcagent

PC Tracker

PC Tracker is an infrastructure for collecting human-computer interaction data. The source code in the tracker/ directory can be modified to fit your specific data collection requirements.

To deploy:

Build the executable (Windows):

cd tracker
.\package.ps1

Customize tasks.json according to your annotation needs
Distribute to annotators
Collect annotation data from annotators. Annotated data is saved in the hidden events/ folder under the working directory.
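
Because the events/ folder is hidden, it is easy to overlook when gathering results from annotators. The following is a minimal sketch for listing what was recorded, assuming the folder sits directly under the directory the tracker was run from. The helper name `list_event_files` is hypothetical and not part of PC Tracker, and since the layout inside events/ is not described here, the code simply lists every file it finds.

```python
from pathlib import Path


def list_event_files(working_dir="."):
    """Return the sorted paths of all files under <working_dir>/events.

    Hypothetical helper for illustration; it is not part of PC Tracker.
    Returns an empty list if the events/ folder does not exist.
    """
    events_dir = Path(working_dir) / "events"
    if not events_dir.is_dir():
        return []
    # Walk the folder recursively, since its internal layout is not assumed.
    return sorted(p for p in events_dir.rglob("*") if p.is_file())
```

Calling `list_event_files()` from the annotator's working directory returns the recorded files, which can then be copied into postprocess/data/ for the next stage.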

For user instructions, please refer to our PC Tracker User Manual.

Post Processing

To convert raw interaction data into cognitive trajectories, follow these steps:

Place your data in the postprocess/data/ directory. Example data is available in this directory for reference.
Run the post-processing pipeline:

python postprocess/refinement.py    # Data refinement
python postprocess/completion.py    # Cognition completion

Note: Cognition completion calls the OpenAI API, so you need to have your OpenAI API key ready before running it.
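
The two steps must run in order, and the second one needs the API key. Here is a minimal sketch of a wrapper that checks for the key before starting, so the pipeline does not fail halfway through. It assumes the key is supplied through the `OPENAI_API_KEY` environment variable, which is an assumption rather than something this README states; check the scripts for how they actually read the key. The names `build_commands` and `run_pipeline` are hypothetical.

```python
import os
import subprocess
import sys

# The two steps listed above, in the order they must run.
PIPELINE_STEPS = ["postprocess/refinement.py", "postprocess/completion.py"]


def build_commands(python=sys.executable):
    """Return one command (as an argument list) per pipeline step."""
    return [[python, script] for script in PIPELINE_STEPS]


def run_pipeline(env=None):
    """Run refinement, then cognition completion, from the repository root."""
    env = os.environ if env is None else env
    # Assumption: the key is read from OPENAI_API_KEY. Adjust if the scripts differ.
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is not set; cognition completion needs it.")
    for command in build_commands():
        # check=True stops the pipeline if a step exits with an error.
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` from the repository root is then equivalent to running the two commands above by hand, with an early failure if the key is missing.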

Agent

We provide a reference implementation of our multi-agent system in the agent/ directory, combining planning and grounding agents. To run:

python agent/main.py

Reference scripts for model deployment can be found in the agent/server/ directory.

Citation

If you find this work helpful, please consider citing:

@article{he2024pcagent,
      title={PC Agent: While You Sleep, AI Works - A Cognitive Journey into Digital World},
      author={Yanheng He and Jiahe Jin and Shijie Xia and Jiadi Su and Runze Fan and Haoyang Zou and Xiangkun Hu and Pengfei Liu},
      year={2024},
      journal={arXiv preprint arXiv:2412.17589},
      url={https://arxiv.org/abs/2412.17589}
}
