Xiaohongshu has introduced Evolving-RL, a reinforcement learning framework that lets AI agents improve their own skills through experience. It targets a key limitation of current agents: they cannot learn or adapt once training ends. Because a conventional model's parameters are fixed after training, it cannot improve when it meets new challenges.
Existing methods for self-evolving agents mostly optimize how experiences are stored and retrieved, and overlook the task of extracting useful skills from those experiences. The result is 'skill amnesia': because training signals are poor, agents struggle to apply past experience effectively. Evolving-RL addresses this with a single-model co-evolution architecture in which skill extraction and skill application are trained at the same time.
Evolving-RL trains in four stages. First, the solver interacts with source tasks to generate interaction trajectories. Next, the extractor identifies several candidate skills from these trajectories. The skills are then evaluated on related downstream tasks to determine how effective they are. In the final stage, a joint optimization signal updates both roles, the solver and the extractor, so that they improve together. This approach rewards high-quality skills while teaching the solver to differentiate between useful and unhelpful skills.
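The four stages can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Xiaohongshu's implementation: the function names, the made-up success model, and the mean-centered advantage used as the "joint signal" are all hypothetical stand-ins, and no real model is trained.

```python
import random
from statistics import mean


def rollout(task, skill_quality, rng):
    """Toy solver (hypothetical): succeeds more often with a better skill."""
    p_success = 0.3 + 0.6 * skill_quality  # made-up success model
    return 1.0 if rng.random() < p_success else 0.0


def extract_candidates(trajectory, k, rng):
    """Toy extractor (hypothetical): proposes k skills with a hidden quality."""
    return [
        {"text": f"skill {i} from {trajectory}", "quality": rng.random()}
        for i in range(k)
    ]


def training_step(source_task, downstream_tasks, k=4, seed=0):
    rng = random.Random(seed)

    # Stage 1: the solver interacts with a source task and leaves a trajectory.
    outcome = rollout(source_task, 0.0, rng)
    trajectory = f"trajectory({source_task}, success={outcome})"

    # Stage 2: the extractor proposes several candidate skills.
    candidates = extract_candidates(trajectory, k, rng)

    # Stage 3: each skill is scored by the solver's success on downstream tasks.
    skill_scores = []
    solver_outcomes = []
    for skill in candidates:
        results = [rollout(t, skill["quality"], rng) for t in downstream_tasks]
        skill_scores.append(mean(results))
        solver_outcomes.extend(results)

    # Stage 4: one shared signal for both roles. Assumption: skills are
    # rewarded relative to the average candidate, and the solver is rewarded
    # by its downstream success. The real objective may differ.
    baseline = mean(skill_scores)
    extractor_advantages = [s - baseline for s in skill_scores]
    return {
        "skill_scores": skill_scores,
        "extractor_advantages": extractor_advantages,
        "solver_reward": mean(solver_outcomes),
    }


if __name__ == "__main__":
    step = training_step("put mug in sink", ["task a", "task b", "task c"])
    print(step)
```

In this sketch, a skill that helps the solver on downstream tasks gets a positive advantage and one that hurts gets a negative advantage, which is the sense in which the same evaluation drives both roles.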
Benchmark results support the approach. On the ALFWorld indoor interaction benchmark, Evolving-RL achieved a success rate of 96.0% on known tasks and 88.6% on unseen tasks; the unseen-task figure is reported as a 98.7% relative improvement over GRPO, described as the previous state-of-the-art method. On the Mind2Web web navigation benchmark, Evolving-RL reached an action accuracy of 30.87%, compared with GRPO's 22.73%. In cross-task scenarios it achieved 42.0% accuracy compared with GRPO's 28.8%.
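The percentages above mix absolute scores with a relative gain, so a short calculation helps keep them apart. The sketch below assumes the 98.7% figure is a relative improvement (an absolute gain of that size is impossible for an 88.6% score); the implied GRPO baseline it prints is derived from that assumption and is not a number reported in the article. The function names are illustrative.

```python
def relative_improvement(new, baseline):
    """Relative gain of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


def implied_baseline(new, relative_gain_pct):
    """Baseline score implied by a new score and a relative gain in percent."""
    return new / (1.0 + relative_gain_pct / 100.0)


if __name__ == "__main__":
    # ALFWorld unseen tasks: baseline implied by 88.6% and a 98.7% relative gain.
    print(round(implied_baseline(88.6, 98.7), 1))        # about 44.6
    # Mind2Web action accuracy: 30.87% vs. GRPO's 22.73%.
    print(round(relative_improvement(30.87, 22.73), 1))  # about 35.8
    # Cross-task accuracy: 42.0% vs. GRPO's 28.8%.
    print(round(relative_improvement(42.0, 28.8), 1))    # about 45.8
```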
Ablation studies have confirmed the importance of joint training for both the extractor and solver components. Training either component in isolation resulted in subpar performance: skills produced by the extractor alone often overfitted to training data, while the solver trained alone showed indifference to all skills. The comprehensive co-evolution framework was crucial for achieving optimal performance across both familiar and new contexts.
Importantly, the skills generated by Evolving-RL demonstrate impressive cross-model transferability. When incorporated into the Qwen2.5-7B-Instruct model, the skills extracted through Evolving-RL increased its success rate on the ALFWorld benchmark from 45.5% to 60.4%. A GRPO-trained model also benefited from these skills, raising its success rate from 79.9% to 88.8%. These findings indicate that Evolving-RL produces genuinely transferable experiential knowledge rather than model-specific artifacts.
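To compare the two transfer results on equal footing, the gains can be expressed in percentage points and as relative change. This is a small illustrative calculation using only the before and after scores quoted above; the dictionary labels and function name are hypothetical.

```python
# ALFWorld success rates (percent) before and after adding extracted skills,
# as quoted in the article.
TRANSFER_RESULTS = {
    "Qwen2.5-7B-Instruct": (45.5, 60.4),
    "GRPO-trained model": (79.9, 88.8),
}


def transfer_gain(before, after):
    """Return (gain in percentage points, relative gain in percent)."""
    points = after - before
    return points, points / before * 100.0


if __name__ == "__main__":
    for name, (before, after) in TRANSFER_RESULTS.items():
        points, relative = transfer_gain(before, after)
        print(f"{name}: +{points:.1f} points ({relative:.1f}% relative)")
```

On these figures the weaker starting model gains more in both absolute and relative terms, while the GRPO-trained model still improves from a much higher starting point.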
Evolving-RL could influence how future AI agents are built. By enabling agents to learn and adapt autonomously, the framework may lead to AI systems that can handle a growing range of tasks. As researchers examine its implications, the work points toward agents that are more autonomous and more effective.
Quick answers
What is the Evolving-RL framework?
Evolving-RL is a reinforcement learning framework developed by Xiaohongshu that allows AI agents to autonomously enhance their skills through experience.
How does Evolving-RL address skill amnesia?
Evolving-RL uses a single-model co-evolution architecture that simultaneously extracts and applies skills, helping agents avoid skill amnesia.
What benchmarks has Evolving-RL performed well on?
Evolving-RL achieved a 96.0% success rate on known tasks and 88.6% on unseen tasks in the ALFWorld benchmark, and 30.87% action accuracy on the Mind2Web benchmark.



