Teaching an LLM to Survive Node.js Dependency Hell using RL and OpenEnv
If you are a JavaScript developer, you have seen this wall of red text:
```
npm ERR! code ERESOLVE
npm ERR! ERESOLVE unable to resolve dependency tree
```
Fixing these peer-dependency conflicts usually involves twenty minutes of frantic Googling, manually downgrading packages, and praying your app still builds. Standard LLMs aren't much help either; they hallucinate versions because they treat Semantic Versioning as text generation rather than as a strict mathematical constraint.
For the Meta OpenEnv Hackathon, we decided to fix this by treating package resolution not as a chat prompt, but as a playable game. We built AutoResolve.
The Architecture: OpenEnv + GRPO
We utilized the OpenEnv framework to build a strict, Gym-style Python environment (`env.py` and `openenv.yaml`) that acts as a mock NPM registry.
Instead of just telling an LLM the answer, we let it play inside this registry. We deployed Llama-3 8B (using Unsloth for 4-bit quantization) and trained it using Hugging Face TRL’s Group Relative Policy Optimization (GRPO).
How the Environment Works
- The State: The environment generates a broken `package.json` alongside a realistic NPM error trace.
- The Action: The agent acts as a greedy 1-step optimizer. It must output a strict JSON payload, for example: `{"package_to_update": "react", "new_version": "^18.0.0"}`.
- The Reward: The environment validates the action against the mock registry (see the scoring sketch after this list):
  - Fixing the tree grants +50.
  - Hallucinating a fake package yields -100.
  - Forgetting the caret (`^`) symbol yields -5.
Overcoming "Reward Hacking"
During training, we encountered a classic RL phenomenon: Reward Hacking. Our environment initially validated only the major version number, so the AI figured out it could earn maximum points while saving generation tokens by dropping the caret symbol (`^`). Rather than wasting compute on a complete retraining cycle, we implemented a lightweight post-processing formatter (a standard practice in production LLM pipelines) to re-inject the caret.
The Results: 93.3% Zero-Shot Accuracy
Training an RL agent on the entire 2-million-package NPM registry requires a massive compute cluster. To prove our architecture within the hackathon timeline, we curated a "Mega Registry" containing 5 distinct ecosystems (React, Vue, Express, Webpack, Mongoose).
To evaluate the model, we generated 15 rigorous, blind test cases spanning cross-ecosystem conflicts.
The agent scored 93.3% (14/15) accuracy. The single edge-case failure was an attempt to update `babel-loader` instead of its missing peer `@babel/core`, a known artifact of our MVP's "greedy 1-step optimizer" design, which we plan to resolve in V2 using Monte Carlo Tree Search (MCTS) for multi-step rollouts.
Conclusion
By utilizing OpenEnv, we proved that Reinforcement Learning can teach an LLM to respect rigid mathematical constraints. The architecture is complete; scaling to the full 2-million-package NPM registry requires only deploying the same pipeline to a production GPU cluster.
Explore the Project:
- 🧠 Try the Gradio UI / View Training Code (<- Insert Colab Link)
- 📦 View the OpenEnv Space (<- Insert HF Space Link)
- 🚀 Download the Weights