	---
	tags:
	- ml-intern
	---
	# ML Intern Plugin for OpenAI Codex

	Hugging Face ML Intern reimagined as an OpenAI Codex plugin. Research papers, inspect datasets and models, plan and evaluate AI/RAG systems, run training and evaluation on Hugging Face Jobs, and ship ML artifacts — all inside Codex.

	## What This Is

	This repository replicates the core functionality of [`huggingface/mlintern-plugin`](https://github.com/huggingface/mlintern-plugin) (a Claude Code plugin) for the OpenAI Codex ecosystem. It is a Codex plugin, not a Claude Code plugin, and is packaged in the Codex Skills and Plugin format.

	The original `mlintern-plugin` wraps the `ml-intern` CLI binary inside Claude Code with slash commands. This Codex plugin instead uses:
	- Skills (`SKILL.md` files) that teach Codex how to use Hugging Face tools
	- Commands (`commands/run.md`) for `/mlintern:run` task execution
	- Scripts for operations the HF plugin doesn't directly expose (dataset inspection, paper research)
	- The Hugging Face Codex plugin (`hugging-face` MCP) for direct API tools (`model_search`, `dataset_search`, `paper_search`, `hub_repo_details`, `hf_doc_search`, `hf_doc_fetch`, `hf_jobs`)

	## Plugin Structure

	- `./.agents/plugins/marketplace.json` - Repo marketplace for Codex
	- `./plugins/ml-intern/.codex-plugin/plugin.json` - Plugin manifest
	- `./plugins/ml-intern/agents/openai.yaml` - UI metadata for Codex
	- `./plugins/ml-intern/commands/run.md` - `/mlintern:run` command definition
	- `./plugins/ml-intern/skills/ml-intern-harness/` - Core ML Intern behavior
	- `./plugins/ml-intern/skills/hf-model-search/` - Model discovery and validation
	- `./plugins/ml-intern/skills/hf-dataset-search/` - Dataset discovery and schema inspection
	- `./plugins/ml-intern/skills/hf-paper-search/` - Paper research, reading, citations
	- `./plugins/ml-intern/skills/hf-docs/` - Hugging Face library documentation lookup
	- `./plugins/ml-intern/skills/github-example-search/` - GitHub example-file discovery
	- `./plugins/ml-intern/skills/web-search/` - Current web search with source filtering
	- `./plugins/ml-intern/skills/hf-jobs/` - Hugging Face cloud job submission and monitoring
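	You can confirm that a checkout matches this layout with a small shell check. The helper below is hypothetical: `check_plugin_layout` is not shipped with the repo. It only tests that each listed path exists. It does not validate file contents or look inside the skill directories.

	```bash
	# Hypothetical helper: report any expected plugin path missing under a checkout root.
	# The path list mirrors the "Plugin Structure" section; file contents are not checked.
	check_plugin_layout() {
	  root="${1:-.}"
	  missing=0
	  for rel in \
	    .agents/plugins/marketplace.json \
	    plugins/ml-intern/.codex-plugin/plugin.json \
	    plugins/ml-intern/agents/openai.yaml \
	    plugins/ml-intern/commands/run.md \
	    plugins/ml-intern/skills/ml-intern-harness \
	    plugins/ml-intern/skills/hf-model-search \
	    plugins/ml-intern/skills/hf-dataset-search \
	    plugins/ml-intern/skills/hf-paper-search \
	    plugins/ml-intern/skills/hf-docs \
	    plugins/ml-intern/skills/github-example-search \
	    plugins/ml-intern/skills/web-search \
	    plugins/ml-intern/skills/hf-jobs
	  do
	    # -e accepts both files and directories
	    if [ ! -e "$root/$rel" ]; then
	      echo "missing: $rel"
	      missing=$((missing + 1))
	    fi
	  done
	  # Exit status is 0 only when every path is present
	  [ "$missing" -eq 0 ]
	}
	```

	For example, `check_plugin_layout ~/src/ml-intern-codex-plugin` prints one `missing:` line per absent path and returns non-zero if any are missing.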

	## Installation

	This plugin is hosted on GitHub, and the easiest local install is to clone the repo and let Codex load the plugin from the local checkout.

	### Method 1: Clean Codex UI Install

	This is the recommended path if you want a clean install inside the Codex UI without copying files into a global plugin directory.

	1. Clone this repo somewhere local that Codex can read:

	```bash
	git clone https://github.com/razvan/ml-intern-codex-plugin.git
	cd ml-intern-codex-plugin
	```

	2. Restart Codex so it reloads the repo marketplace.

	3. Use the repository marketplace entry in this repo. The marketplace file is:

	`./.agents/plugins/marketplace.json`

	4. The marketplace points Codex at the plugin bundle here:

	`./plugins/ml-intern`

	After Codex reloads, the plugin should appear in the Codex UI as ML Intern for Codex.
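	If the plugin does not appear after the restart, a malformed marketplace file is one possible cause. The sketch below is a hypothetical check: `check_marketplace` is not part of the repo. It assumes that `python3` is available and that `marketplace.json` names the bundle path as a plain string. It checks JSON syntax and that string only. It does not check the Codex marketplace schema.

	```bash
	# Hypothetical sanity check for the repo marketplace, run from the repository root.
	# Assumes python3 is on PATH and the bundle path appears verbatim in the JSON.
	check_marketplace() {
	  file="${1:-.agents/plugins/marketplace.json}"
	  # json.tool exits non-zero on a syntax error
	  if ! python3 -m json.tool "$file" >/dev/null 2>&1; then
	    echo "invalid JSON: $file"
	    return 1
	  fi
	  # Loose text match only; this is not a schema validation
	  if ! grep -q 'plugins/ml-intern' "$file"; then
	    echo "no reference to plugins/ml-intern in $file"
	    return 1
	  fi
	  echo "ok: $file"
	}
	```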

	### Method 2: Manual Local Install

	If you want to install it into your local Codex plugin directory manually:

	1. Clone the repository locally:

	```bash
	git clone https://github.com/razvan/ml-intern-codex-plugin.git
	cd ml-intern-codex-plugin
	```

	2. Copy the plugin bundle into your Codex plugins directory:

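	```bash
	# Create the target dir first; copying into the parent avoids a nested ml-intern/ml-intern on reinstall
	mkdir -p ~/.codex/plugins
	cp -R plugins/ml-intern ~/.codex/plugins/
	```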

	3. Make sure Codex can see the plugin via a local marketplace entry or by reloading the Codex UI, depending on how your Codex setup is configured.

	4. Restart Codex.

	5. Look for ML Intern for Codex in the plugin list.
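	After a `git pull`, re-running the copy merges new files but leaves stale ones in place. The function below is a hypothetical helper, not part of the repo, that replaces the installed copy instead. Its default paths mirror the steps above and may not match your Codex setup.

	```bash
	# Hypothetical reinstall helper: replace (not merge) the installed plugin bundle.
	# Defaults mirror the manual-install paths; both are assumptions about your setup.
	reinstall_plugin() {
	  src="${1:-plugins/ml-intern}"
	  dest_root="${2:-$HOME/.codex/plugins}"
	  if [ ! -d "$src" ]; then
	    echo "source bundle not found: $src" >&2
	    return 1
	  fi
	  name=$(basename "$src")
	  # Refuse names that would make the rm below dangerous
	  case "$name" in ""|.|..|/) echo "bad bundle name: $name" >&2; return 1 ;; esac
	  mkdir -p "$dest_root" || return 1
	  # Remove the old copy so files deleted upstream disappear too
	  rm -rf "$dest_root/$name"
	  cp -R "$src" "$dest_root/$name"
	}
	```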

	## Dependencies

	- OpenAI Codex (with the `hugging-face` MCP plugin enabled)
	- `HF_TOKEN` environment variable for private/gated resources and Jobs
	- Python 3.10+ for the helper scripts (dataset inspection, paper research)
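	The shell sketch below is a hypothetical preflight check for these dependencies; `preflight` and `version_at_least` are illustrative names. It compares versions numerically, so 3.10 correctly ranks above 3.9. A missing `HF_TOKEN` only produces a warning, because public resources work without it.

	```bash
	# Hypothetical preflight for the dependencies above.
	# version_at_least REQUIRED ACTUAL: compare major, then minor; patch level is ignored.
	version_at_least() {
	  req_major=${1%%.*}; rest=${1#*.}; req_minor=${rest%%.*}
	  act_major=${2%%.*}; rest=${2#*.}; act_minor=${rest%%.*}
	  [ "$act_major" -gt "$req_major" ] ||
	    { [ "$act_major" -eq "$req_major" ] && [ "$act_minor" -ge "$req_minor" ]; }
	}

	preflight() {
	  if [ -z "${HF_TOKEN:-}" ]; then
	    echo "warn: HF_TOKEN is not set (needed for private/gated resources and Jobs)"
	  fi
	  if ! command -v python3 >/dev/null 2>&1; then
	    echo "error: python3 not found"
	    return 1
	  fi
	  py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
	  if ! version_at_least 3.10 "$py"; then
	    echo "error: Python $py found, 3.10+ required for the helper scripts"
	    return 1
	  fi
	}
	```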

	## Usage

	Once installed, invoke the plugin in Codex by typing:

	```
	/mlintern:run "fine-tune Qwen3-4B for code completion on my dataset"
	```

	Or use the skill name in your prompt:

	```
	Use ml-intern-harness to research DPO training recipes, find a suitable dataset, and implement a training script.
	```

	The bundled skills are the main entry points:

	- `ml-intern-harness` for the end-to-end research, validation, implementation, and job loop, including plan-only AI/RAG/search/QA system design
	- `ml-intern` as the short alias for the main plugin workflow
	- `hf-model-search` for model discovery and validation
	- `hf-dataset-search` for dataset discovery and schema inspection
	- `hf-paper-search` for paper research and recipe extraction
	- `hf-docs` for current Hugging Face library docs
	- `github-example-search` for finding working example files in GitHub repos
	- `web-search` for current web sources and general research outside the Hugging Face Hub
	- `hf-jobs` for Hugging Face job submission and monitoring

	## Skills Reference

	| Skill | Purpose | Key Tools |
	|---|---|---|
	| `ml-intern-harness` | Core autonomous ML workflow | Research, validate, implement, test, run, evaluate, ship |
	| `hf-model-search` | Find and validate models | `_model_search`, `_hub_repo_details` |
	| `hf-dataset-search` | Find and validate datasets | `_dataset_search`, `_hub_repo_details`, `inspect_dataset.py` |
	| `hf-paper-search` | Research papers and extract recipes | `_paper_search`, `papers.py` (details, citations, resources) |
	| `hf-docs` | Look up current HF library APIs | `_hf_doc_search`, `_hf_doc_fetch` |
	| `github-example-search` | Find working example files in GitHub repos | GitHub repo/file search plus `fetch_file` |
	| `web-search` | Find current web sources with filters | Codex web browsing/search tools and citation links |
	| `hf-jobs` | Submit and monitor cloud jobs | `_hf_jobs` (run, uv, ps, logs, inspect, cancel) |
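	You can cross-check the table against the skills actually bundled in a checkout by reading each `SKILL.md`. The function below is a hypothetical sketch. It assumes that each `SKILL.md` opens with YAML front matter containing a `name:` line, following the common SKILL.md convention. Verify that assumption against the files in this repo. The function is a line scanner, not a YAML parser, so quoted values keep their quotes.

	```bash
	# Hypothetical listing helper: print "<directory> <front-matter name>" per skill.
	# Assumes SKILL.md starts with '---' front matter holding a `name:` line.
	list_skills() {
	  skills_dir="${1:-plugins/ml-intern/skills}"
	  for f in "$skills_dir"/*/SKILL.md; do
	    [ -f "$f" ] || continue   # the glob matched nothing
	    dir=$(basename "$(dirname "$f")")
	    # Only look between the opening '---' and the closing '---'
	    name=$(awk 'NR == 1 && $0 == "---" { fm = 1; next }
	                fm && $0 == "---" { exit }
	                fm && /^name:/ { sub(/^name:[ \t]*/, ""); print; exit }' "$f")
	    echo "$dir ${name:-<no name>}"
	  done
	}
	```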

	## Comparison to Original

	| Feature | `huggingface/mlintern-plugin` (Claude Code) | This Codex Plugin |
	|---|---|---|
	| Platform | Claude Code | OpenAI Codex |
	| Format | `.claude-plugin` manifest + companion script | `.codex-plugin` manifest + Skills |
	| Interaction | Slash commands (`/mlintern:run`) | Slash commands + skill invocation |
	| Agent Runtime | Spawns `ml-intern` CLI binary | Uses Codex's native agent loop + Skills |
	| Paper Research | Built into `ml-intern` binary | `papers.py` script shim |
	| Dataset Inspection | Built into `ml-intern` binary | `inspect_dataset.py` script shim |
	| Job Submission | Built into `ml-intern` binary | `_hf_jobs` via Hugging Face Codex plugin |
	| Sandbox | HF Space sandboxes | Codex local shell + `_hf_jobs` |

	## Parity Status

	This plugin intentionally mirrors the parts of ML Intern that matter most for HF ML work:

	| Area | Status | Notes |
	|---|---|---|
	| Paper discovery | Done | HF paper search plus the deeper `papers.py` research flow is available. |
	| Paper reading | Done | Section reading, citations, recommendations, and linked resources are implemented. |
	| Dataset validation | Done | Schema, splits, sample rows, parquet availability, and compatibility notes are covered. |
	| HF docs lookup | Done | Search plus fetch are available for current HF library guidance. |
	| End-to-end ML workflow | Done | The harness pushes research-first, validate-first behavior. |
	| Generic web search | Partial | Best-effort Codex `web-search` guidance exists, but it is not the exact ML Intern DuckDuckGo wrapper. |

	For the closest ML Intern feel, use the paper and dataset skills first, then docs, then the harness workflow.

	## Behavioral Contract

	When ML Intern is invoked directly, the plugin should behave like a research harness rather than a generic assistant:

	- Route through `ml-intern` -> `ml-intern-harness` for non-trivial AI/ML/RAG/search/evaluation tasks, even when they are not purely Hugging Face tasks.
	- Use plan tracking at the start of a task, at each phase transition, and on completion, whenever Codex exposes a planning tool.
	- Split research into explicit tracks before synthesis, such as platform constraints, technical approaches, and evaluation methodology.
	- Use `hf-paper-search` for literature, benchmarks, and evaluation methods.
	- Use `web-search` for current platform/API constraints, official docs, policies, pricing, rate limits, SDK behavior, and other non-HF facts.
	- Cite important architecture and evaluation decisions with papers, official docs, or primary sources.
	- If the user asks for "plan only", stop after research and do not write implementation code or scaffold files.
	- If a write, shell, network, or sandbox step fails, fail forward with an inline deliverable when possible.

	### Codex Compatibility Layer

	This plugin cannot inject upstream Python tools into Codex directly, so it should reproduce the behavior of the upstream tools using Codex-native primitives:

	| Upstream tool | Codex compatibility layer | Required behavior to preserve |
	|---|---|---|
	| `plan_tool` | `update_plan` | Full-plan replacement, one `in_progress` item, updates at start/phase transition/completion |
	| `research` | delegated sub-agent research when explicitly allowed, otherwise focused sequential research | Separate research context when possible, read-only scope, papers-first workflow, compact evidence-backed summary |

	Faithful `plan_tool` semantics to preserve:
	- Use it for tasks with 3 or more meaningful steps.
	- Replace the whole visible plan each update.
	- Keep exactly one item in progress.
	- Mark completed only after full success.
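	Some of the plan rules above are mechanical enough to check automatically. The sketch below is a hypothetical validator. `check_plan` and its one-item-per-line `status|step` input are invented for illustration and are not the `update_plan` payload. The `in_progress` status name comes from the table above; `pending` and `completed` are assumed. It enforces exactly one in-progress item while work remains and accepts zero once every item is completed. It cannot verify the "full success" rule.

	```bash
	# Hypothetical plan checker; input is one "status|step" line per plan item.
	check_plan() {
	  awk -F'|' '
	    NF == 0 { next }                      # ignore blank lines
	    { n++ }
	    $1 == "in_progress" { active++ }
	    $1 == "completed"   { finished++ }
	    $1 != "pending" && $1 != "in_progress" && $1 != "completed" {
	      printf "unknown status on line %d: %s\n", NR, $1
	      bad = 1
	    }
	    END {
	      if (bad) exit 1
	      if (n == 0) { print "empty plan"; exit 1 }
	      # A fully completed plan has no item in progress
	      if (finished == n) { print "plan complete"; exit 0 }
	      if (active != 1) { printf "expected 1 in_progress item, found %d\n", active; exit 1 }
	      print "plan ok"
	    }'
	}
	```

	For example, `printf '%s\n' 'completed|Research' 'in_progress|Validate dataset' 'pending|Implement' | check_plan` prints `plan ok`. Adding a second `in_progress` line makes it fail.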

	Faithful `research` semantics to preserve:
	- Main context stays focused on synthesis and decisions.
	- Research uses papers, citation graphs, dataset inspection, docs, GitHub examples, and web search.
	- Research returns compact findings with concrete references and recipe-level claims.
	- If separate delegation is unavailable, preserve the same research floor directly rather than skipping it.

	Example plan-only trigger:

	```text
	[@ml-intern](plugin://ml-intern@ml-intern-codex)
	i want to query generic Discord servers in natural language.
	first figure out constraints and challenges, then research how to build and test quality.
	i'm only interested in the plan for now.
	```

	Expected behavior: track a plan, use `web-search` for Discord API constraints, use `hf-paper-search` for RAG/forum/social QA and evaluation research, synthesize a cited build-and-test plan, and avoid implementation.

	## Why This Exists

	The original `huggingface/mlintern-plugin` is Claude Code only — it's a companion script that spawns the `ml-intern` CLI inside Claude Code sessions. There is no equivalent for Codex. The `huggingface/skills` repo provides general HF skills but not the full ML Intern harness. This plugin bridges the gap.

	## Contributing

	This is an early version. The biggest improvement would be testing the skill instructions in real Codex sessions and tightening the guardrails where the LLM deviates. PRs welcome.

	## License

	Apache-2.0

	<!-- ml-intern-provenance -->
	## Generated by ML Intern

	This model repository was generated by [ML Intern](https://github.com/huggingface/ml-intern), an agent for machine learning research and development on the Hugging Face Hub.

	- Try ML Intern: https://smolagents-ml-intern.hf.space
	- Source code: https://github.com/huggingface/ml-intern