razvan/ml-intern-codex-plugin: plugins/ml-intern/skills/github-example-search/SKILL.md


	---
	name: github-example-search
	description: "Find working example files in GitHub repositories using path heuristics and GitHub file search, then read the best candidates."
	disable-model-invocation: false
	---

	# GitHub Example Search

	## Purpose

	Replicate the core of ML Intern's GitHub example discovery:
	find example scripts, tutorials, notebooks, and guides in a target repository, then read the best matches before writing any implementation code.

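	This skill is intentionally path-first: it ranks candidates by where files live in the repository (directory and file names), not by what the code does. It is not a semantic code search engine.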

	## Tools

	Use the GitHub plugin tools:

	- `search_repositories` or `search_installed_repositories_v2` to identify the repo if the user did not name it.
	- `search` to search files within the target repository.
	- `fetch_file` to read the candidate file contents.
	- `search_branches` only if you need to confirm branch names.

	## Search Strategy

	Use prioritized file-path patterns, roughly in this order:

	1. `scripts`
	2. `examples`, `example`
	3. `notebooks`, `notebook`
	4. `tutorials`, `tutorial`, `quickstart`, `walkthrough`, `walkthroughs`
	5. `cookbook`, `recipe`, `recipes`
	6. `demos`, `demo`, `samples`, `sample`
	7. `guides`, `guide`, `getting-started`, `getting_started`
	8. `playground`, `howto`, `how-to`
	9. `use-cases`, `usecases`, `use_cases`
	10. `sandbox`, `showcase`
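The priority list can be expressed as a small scoring function. This is a minimal Python sketch, not ML Intern's actual implementation: it assumes a path matches a tier when any path component equals one of the listed patterns (case-insensitive), with the file name compared without its extension. `PATTERN_TIERS` and `pattern_priority` are hypothetical names.

```python
import posixpath
from typing import Optional

# Tiers copied from the Search Strategy list; index 0 is the highest priority.
PATTERN_TIERS: list[tuple[str, ...]] = [
    ("scripts",),
    ("examples", "example"),
    ("notebooks", "notebook"),
    ("tutorials", "tutorial", "quickstart", "walkthrough", "walkthroughs"),
    ("cookbook", "recipe", "recipes"),
    ("demos", "demo", "samples", "sample"),
    ("guides", "guide", "getting-started", "getting_started"),
    ("playground", "howto", "how-to"),
    ("use-cases", "usecases", "use_cases"),
    ("sandbox", "showcase"),
]


def pattern_priority(path: str) -> Optional[int]:
    """Return the tier index of the best pattern found in `path`, or None.

    Assumption: a component matches when it equals a pattern exactly
    (case-insensitive); the file name is compared without its extension.
    """
    parts = path.lower().strip("/").split("/")
    parts[-1] = posixpath.splitext(parts[-1])[0]  # "quickstart.ipynb" -> "quickstart"
    # Tiers are scanned best-first, so the first hit is the best one.
    for tier, names in enumerate(PATTERN_TIERS):
        if any(part in names for part in parts):
            return tier
    return None
```

Under these assumptions, `examples/scripts/grpo.py` lands in tier 0 because it contains `scripts`, and a path such as `src/pkg/trainer.py` matches no tier at all.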

	## Workflow

	1. Resolve the repository first.
	2. If the repository is ambiguous, search repositories by name or organization until you have a strong candidate.
	3. Search the repo with the highest-priority path patterns first.
	4. If the user gave a keyword, combine it with the pattern queries.
	5. Rank matches by the highest-priority pattern in their path, so `scripts/` beats other example-like directories.
	6. Among equally ranked matches, prefer files under `examples/` or `example/`.
	7. If matches are still similar, prefer shallower paths.
	8. Read the top candidate files with `fetch_file`, using line ranges for large files.
	9. Use the exact file path from the search result to continue the investigation.
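The tie-breaking steps can be read as a single sort key: pattern tier first, then a preference for paths under `examples/` or `example/`, then path depth. The sketch below assumes that reading; `TIER`, `UNMATCHED`, `rank_key`, and `rank_candidates` are hypothetical helpers, and the exact-match rule for path components is an assumption rather than the upstream tool's documented behavior.

```python
import posixpath

UNMATCHED = 100  # sort key for paths that match no pattern at all

# Pattern name -> tier (0 = best), flattened from the Search Strategy list.
TIER: dict[str, int] = {
    name: tier
    for tier, names in enumerate([
        ("scripts",),
        ("examples", "example"),
        ("notebooks", "notebook"),
        ("tutorials", "tutorial", "quickstart", "walkthrough", "walkthroughs"),
        ("cookbook", "recipe", "recipes"),
        ("demos", "demo", "samples", "sample"),
        ("guides", "guide", "getting-started", "getting_started"),
        ("playground", "howto", "how-to"),
        ("use-cases", "usecases", "use_cases"),
        ("sandbox", "showcase"),
    ])
    for name in names
}


def rank_key(path: str) -> tuple[int, int, int, str]:
    """Sort key: best pattern tier, then examples/ preference, then depth."""
    parts = path.lower().strip("/").split("/")
    dirs, stem = parts[:-1], posixpath.splitext(parts[-1])[0]
    tier = min((TIER[p] for p in dirs + [stem] if p in TIER), default=UNMATCHED)
    # 0 sorts before 1, so paths under examples/ or example/ win ties.
    under_examples = 0 if {"examples", "example"} & set(dirs) else 1
    return (tier, under_examples, len(dirs), path)


def rank_candidates(paths: list[str]) -> list[str]:
    """Order search-result paths from most to least promising."""
    return sorted(paths, key=rank_key)
```

Putting the tier first keeps `scripts/` ahead of other example-like directories, while the `examples/` flag only breaks ties within a tier; the trailing `path` element just makes the order deterministic.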

	## Practical Query Plan

	When looking for a specific method or trainer, search in this order:

	1. `<keyword> scripts`
	2. `<keyword> examples`
	3. `<keyword> tutorial`
	4. `<keyword> notebook`
	5. `<keyword> guide`

	When no keyword is given, search for the directories themselves:

	1. `scripts`
	2. `examples`
	3. `example`
	4. `notebooks`
	5. `tutorials`
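Both query lists can be generated mechanically. In this sketch, `search_fn` is a hypothetical stand-in for the plugin's `search` tool (any callable that maps a query string to a list of file paths), and stopping at the first query that returns hits is an assumption, not something the plan prescribes. `build_queries` and `first_hits` are illustrative names.

```python
from typing import Callable, Optional, Sequence

# Query suffixes and bare directory queries, in the order given above.
KEYWORD_SUFFIXES = ("scripts", "examples", "tutorial", "notebook", "guide")
BARE_QUERIES = ("scripts", "examples", "example", "notebooks", "tutorials")


def build_queries(keyword: Optional[str] = None) -> list[str]:
    """Return the ordered query list, with or without a user keyword."""
    if keyword and keyword.strip():
        kw = keyword.strip()
        return [f"{kw} {suffix}" for suffix in KEYWORD_SUFFIXES]
    return list(BARE_QUERIES)


def first_hits(
    queries: Sequence[str], search_fn: Callable[[str], list[str]]
) -> tuple[Optional[str], list[str]]:
    """Run queries in order; return the first one with results and its hits.

    `search_fn` is a hypothetical wrapper around the plugin's `search` tool.
    """
    for query in queries:
        hits = search_fn(query)
        if hits:
            return query, hits
    return None, []
```

Returning the successful query alongside its hits keeps a record of which search produced the candidate, which helps when explaining why a file was selected.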

	## Reading Pattern

	After finding a candidate file:

	1. Read the file header and argument parsing.
	2. Read the model, dataset, and trainer setup.
	3. Read the training loop or main execution path.
	4. If the file is long, fetch only the relevant line range.
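For long files, one way to choose a line range is to locate the first line of each reading stage and read a window around it. The marker regexes below are hypothetical heuristics aimed at typical Hugging Face training scripts (they look for `argparse`/`ArgumentParser`, which also covers `HfArgumentParser`, then `from_pretrained`, `load_dataset`, or a `...Trainer(` call, then `def main(`, a `__main__` guard, or `.train(`). `find_sections` and `window` are illustrative names, and the 40-line radius is arbitrary.

```python
import re

# Hypothetical markers for each reading stage; adjust them per repository.
SECTION_MARKERS = {
    "arguments": re.compile(r"argparse|ArgumentParser"),
    "setup": re.compile(r"from_pretrained|load_dataset|Trainer\("),
    "main": re.compile(r"def main\(|__name__ == ['\"]__main__|\.train\("),
}


def find_sections(source: str) -> dict[str, int]:
    """Map each stage to the 1-based line number of its first marker match."""
    found: dict[str, int] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, pattern in SECTION_MARKERS.items():
            if name not in found and pattern.search(line):
                found[name] = lineno
    return found


def window(center: int, total_lines: int, radius: int = 40) -> tuple[int, int]:
    """Inclusive 1-based line range around `center`, clamped to the file."""
    return max(1, center - radius), min(total_lines, center + radius)
```

A stage with no marker match is simply absent from the result, which is a signal to skim that file manually rather than trust the heuristic.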

	## Output Expectations

	Return:

	- the best candidate file path
	- why it was selected
	- the repository it came from
	- the exact line range to read next if the file is large
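These four fields map naturally onto a small record. The dataclass below is a hypothetical shape for the result, not a format the plugin requires; `ExampleCandidate` and `summary` are illustrative names, and `next_lines` is an inclusive 1-based range that stays `None` for short files.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleCandidate:
    """Hypothetical result record for one selected example file."""

    repository: str  # e.g. "huggingface/trl"
    path: str  # exact path from the search result
    reason: str  # why this file was selected
    next_lines: Optional[tuple[int, int]] = None  # inclusive range, large files only

    def summary(self) -> str:
        """One-line report combining all four expected outputs."""
        text = f"{self.repository}:{self.path} ({self.reason})"
        if self.next_lines is not None:
            start, end = self.next_lines
            text += f"; read lines {start}-{end} next"
        return text
```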

	## Example

	```
	_search(query="grpo scripts", repository_name="huggingface/trl", topn=10)
	_fetch_file(repository_full_name="huggingface/trl", ref="main", path="examples/scripts/grpo.py", encoding="utf-8")
	```

	## Notes

	- This is the closest Codex-native replacement for ML Intern's `github_find_examples`.
	- It relies on GitHub repo/file search plus the same path-priority intuition as the upstream tool.
	- Once you have a candidate, always read the actual file before implementing.