An example of a high-stakes scenario involving irreversible file deletion, which requires human confirmation before execution. In fact, situations requiring human assistance are widespread.
Distribution of our InquireBench dataset. The top three frequent apps are listed for each category.
Data Collection Pipeline of our InquireBench. Among them, we employ a random walk approach to trigger the potential inquiry scenario, in which the agent seek human assistance.
Our training framework consists of two stages:
Format finetuning by supervised fine-tuning (SFT).
Inquiry enhancement via GRPO training with verifiable rewards, to improve the agent's interactive capabilities.
Main results on InquireBench. ISR denotes the inquiry success rate and SR denotes the task success rate.
Comparison of InquireMobile and Qwen2.5-VL-3B-Instruct
Task: "One of my socks is torn"