LoongFlow: An Open-Sourced Agent Framework That Transforms Expert Experience into Autonomous AI Productivity

Published January 19, 2026

From "Chatting" to "Reasoning": How LoongFlow’s PES Paradigm and Evolutionary Memory are breaking the glass ceiling of Agentic AI.

The "Stuck Agent" Problem

We’ve all been there. You give an LLM-based Agent a complex task—optimizing a machine learning pipeline or solving a multi-step logic puzzle. It starts strong, but then it gets stuck in a loop, hallucinates a solution, or settles for a mediocre result (a local optimum) because it lacks the ability to "step back" and reflect.

Most current frameworks (like LangChain or pure ReAct loops) provide the scaffolding, but they don't solve the cognitive depth problem. They are excellent at following instructions but struggle to "think" like a scientist or an engineer over long horizons.

Enter LoongFlow (released by Baidu's Baige team), a new open-source framework that introduces Evolutionary Strategies into the agentic loop. It has already achieved SOTA results on 11 complex mathematical problems (beating human mathematicians) and secured 23 Gold Medals on Kaggle competitions from MLE-Bench.

Here is how it works, and why it might be the missing piece in your Agent stack.

1. The Core Innovation: "Thinking" + "Learning"

LoongFlow isn't just another tool-calling library. It is designed around two biological and cognitive principles: Structured Thinking and Evolutionary Memory.

The PES Paradigm (Thinking)

Instead of a simple "Action $\rightarrow$ Observation" loop, LoongFlow enforces a cognitive rhythm called PES:

Plan: The agent doesn't blindly try code. It reviews history, analyzes the problem, and generates a strategic blueprint.
Execute: The agent implements the plan, but with strict verification contracts (e.g., checking if the code actually runs or if the math holds up).
Summary: This is the game-changer. The agent performs an "abductive reflection"—analyzing why a specific attempt failed or succeeded and writing that knowledge into its long-term memory.
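
To make the rhythm concrete, here is a minimal sketch of a Plan, Execute, Summary loop on a toy one-dimensional search problem. It is not LoongFlow's implementation: the names `Attempt` and `pes_loop`, the step-halving rule, and the "improved"/"worse" lessons are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Attempt:
    candidate: float
    score: float
    lesson: str  # written by the Summary phase, read by the next Plan phase


def pes_loop(score_fn: Callable[[float], float], start: float,
             iterations: int = 30) -> List[Attempt]:
    """Toy Plan-Execute-Summary loop (hypothetical, not the LoongFlow API)."""
    memory: List[Attempt] = [Attempt(start, score_fn(start), "baseline")]
    step = 1.0
    for _ in range(iterations):
        best = max(memory, key=lambda a: a.score)

        # Plan: review history before acting. If the last attempt made
        # things worse, reverse direction and take a smaller step.
        if memory[-1].lesson == "worse":
            step = -step / 2
        candidate = best.candidate + step

        # Execute: run the candidate through the verification contract,
        # which here is simply the scoring function.
        score = score_fn(candidate)

        # Summary: record whether the attempt beat the best so far,
        # so the next Plan phase can use that knowledge.
        lesson = "improved" if score > best.score else "worse"
        memory.append(Attempt(candidate, score, lesson))
    return memory
```

Calling `pes_loop(lambda x: -(x - 3.0) ** 2, start=0.0)` returns the full attempt history; the best entry can be picked out with `max(history, key=lambda a: a.score)`. The point of the sketch is only that each phase reads or writes the shared memory, rather than acting on the latest observation alone.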

Evolutionary Memory (Learning)

Most agents "forget" everything once the session ends. LoongFlow uses Evolutionary Algorithms (EA) to maintain a population of solutions, similar to biological evolution.

Multi-Island Model: It maintains diverse "species" of solutions on different "islands" to prevent the agent from converging on a bad solution too early.
MAP-Elites: It archives high-quality solutions based on their features, allowing the agent to "teleport" to a promising previous state rather than starting from scratch.
Evolution Tree: A global view of the decision history, allowing the agent to balance exploration (trying new things) and exploitation (refining what works).
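
The first two ideas can be sketched in a few lines of Python. This is a generic illustration of a MAP-Elites archive combined with ring migration between islands, not LoongFlow's code: `MapElitesArchive`, `evolve_islands`, the Gaussian mutation, and every parameter default are assumptions chosen for the example.

```python
import random
from typing import Callable, Dict, List, Tuple

Cell = Tuple[int, ...]            # discretized feature descriptor
Entry = Tuple[float, List[float]]  # (score, solution)


class MapElitesArchive:
    """Keeps only the best solution seen so far in each feature cell."""

    def __init__(self) -> None:
        self.cells: Dict[Cell, Entry] = {}

    def add(self, cell: Cell, score: float, solution: List[float]) -> bool:
        current = self.cells.get(cell)
        if current is None or score > current[0]:
            self.cells[cell] = (score, solution)
            return True
        return False

    def sample(self, rng: random.Random) -> List[float]:
        # Pick a stored elite uniformly at random across cells.
        return self.cells[rng.choice(sorted(self.cells))][1]

    def best(self) -> Entry:
        return max(self.cells.values(), key=lambda entry: entry[0])


def evolve_islands(score_fn: Callable[[List[float]], float],
                   describe: Callable[[List[float]], Cell],
                   n_islands: int = 3, generations: int = 200,
                   migrate_every: int = 50, seed: int = 0) -> List[MapElitesArchive]:
    rng = random.Random(seed)
    islands = [MapElitesArchive() for _ in range(n_islands)]
    for archive in islands:
        x = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
        archive.add(describe(x), score_fn(x), x)

    for gen in range(1, generations + 1):
        # Each island evolves independently from its own archive.
        for archive in islands:
            parent = archive.sample(rng)
            child = [v + rng.gauss(0, 0.3) for v in parent]
            archive.add(describe(child), score_fn(child), child)

        # Occasionally copy each island's best solution to its neighbor.
        if gen % migrate_every == 0:
            elites = [archive.best() for archive in islands]
            for i, (score, solution) in enumerate(elites):
                islands[(i + 1) % n_islands].add(describe(solution), score, solution)
    return islands
```

Because every cell keeps its own elite, a mediocre but unusual solution survives alongside the current champion and can be resampled later. Infrequent migration lets islands share progress without collapsing into a single population.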

2. Benchmarks: Breaking the Ceiling

LoongFlow has been tested on tasks where "guessing" is impossible.

🧮 Mathematical Discovery (Tao & AlphaEvolve Challenges)

On high-precision geometry and algebra problems proposed by Terence Tao and the AlphaEvolve team, LoongFlow didn't just match baselines—it set new records.

Circle Packing: Maximizing the sum of the radii of 26 non-overlapping circles packed in a unit square. Previous best (Human): $2.634$. LoongFlow: $2.635$.
Second Autocorrelation Inequality: LoongFlow pushed the lower bound to $0.9027$, beating the previous AlphaEvolve SOTA.
Efficiency: Compared to other evolutionary frameworks (like OpenEvolve or ShinkaEvolve), LoongFlow achieves these results with 60% higher sample efficiency.
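
Results like the circle packing record are only meaningful if every candidate is checked mechanically. Below is a minimal sketch of such a check, assuming the usual formulation of the problem (circles inside the unit square, no overlaps, score equal to the sum of radii). The function name `packing_score` and the tolerance value are assumptions for illustration, not part of LoongFlow or its evaluation harness.

```python
import math
from typing import List, Tuple

Circle = Tuple[float, float, float]  # (center_x, center_y, radius)


def packing_score(circles: List[Circle], tol: float = 1e-9) -> float:
    """Return the sum of radii, or raise ValueError if the packing is invalid."""
    for i, (x, y, r) in enumerate(circles):
        if r <= 0:
            raise ValueError(f"circle {i} has a non-positive radius")
        # Containment: the whole disc must lie inside [0, 1] x [0, 1].
        if x - r < -tol or x + r > 1 + tol or y - r < -tol or y + r > 1 + tol:
            raise ValueError(f"circle {i} leaves the unit square")
        # Non-overlap: centers must be at least r_i + r_j apart.
        for j in range(i):
            xj, yj, rj = circles[j]
            if math.hypot(x - xj, y - yj) < r + rj - tol:
                raise ValueError(f"circles {j} and {i} overlap")
    return sum(r for _, _, r in circles)
```

For example, four circles of radius 0.25 centered at (0.25, 0.25), (0.75, 0.25), (0.25, 0.75), and (0.75, 0.75) pass the check with a score of 1.0. A claimed 26-circle solution would be validated the same way before its score is compared against the figures above.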

🏆 Kaggle Grandmaster Level (MLE-Bench)

Data Science requires intuition + rigorous testing. On the OpenAI MLE-Bench (a benchmark of 75 Kaggle competitions), LoongFlow's "ML-Evolve" agent won Gold Medals in 23 distinct challenges.

3. Real-World Use Case: The "GPU Doctor" Agent

LoongFlow is highly adaptable. A practical example currently in development is a GPU Troubleshooting Agent designed for Cloud Ops.

4. Getting Started

LoongFlow is open source and Python-native, and it works with common LLM providers (OpenAI, DeepSeek, etc.).

Installation:

# Run from the directory that contains your local clone of the LoongFlow repository
cd LoongFlow
uv venv .venv --python 3.12
source .venv/bin/activate
uv pip install -e .

Defining a Simple Agent: You can quickly extend the "Planner", "Executor", and "Summary" classes to fit your domain, or use the pre-built General-Evolve agent for algorithmic tasks.

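import asyncio

from loongflow.agents import GeneralEvolveAgent


async def main():
    # Initialize with your LLM config
    agent = GeneralEvolveAgent()

    # Run a complex task (await is only valid inside a coroutine)
    result = await agent.run()
    print(result)


asyncio.run(main())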

Conclusion

We are moving past the era of "Prompt Engineering" and into the era of "Agent Evolution." If you are working on complex tasks where standard RAG or ReAct loops are failing, LoongFlow provides the structured reasoning and memory needed to break through.