---
name: github-example-search
description: "Find working example files in GitHub repositories using path heuristics and GitHub file search, then read the best candidates."
disable-model-invocation: false
---

# GitHub Example Search

## Purpose

Replicate the useful part of ML Intern's GitHub example discovery:
find example scripts, tutorials, notebooks, and guides in a target repo, then read the best matches before implementing code.

This skill is intentionally path-first. It is not a semantic code search engine.

## Tools

Use the GitHub plugin tools:

- `search_repositories` or `search_installed_repositories_v2` to identify the repo if the user did not name it.
- `search` to search files within the target repository.
- `fetch_file` to read the candidate file contents.
- `search_branches` only if you need to confirm branch names.

## Search Strategy

Use prioritized file-path patterns, roughly in this order:

1. `scripts`
2. `examples`, `example`
3. `notebooks`, `notebook`
4. `tutorials`, `tutorial`, `quickstart`, `walkthrough`, `walkthroughs`
5. `cookbook`, `recipe`, `recipes`
6. `demos`, `demo`, `samples`, `sample`
7. `guides`, `guide`, `getting-started`, `getting_started`
8. `playground`, `howto`, `how-to`
9. `use-cases`, `usecases`, `use_cases`
10. `sandbox`, `showcase`
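The priority list above can be read as a ranking function over candidate paths. The Python below is a minimal sketch. It assumes a pattern matches when it equals one of the path's directory names, compared case-insensitively. This skill does not say whether the upstream tool matches whole segments or substrings. `PRIORITY_TIERS` and `pattern_priority` are hypothetical names.

```python
from __future__ import annotations

# Priority tiers from the Search Strategy list; index 0 is the highest priority.
PRIORITY_TIERS = [
    ["scripts"],
    ["examples", "example"],
    ["notebooks", "notebook"],
    ["tutorials", "tutorial", "quickstart", "walkthrough", "walkthroughs"],
    ["cookbook", "recipe", "recipes"],
    ["demos", "demo", "samples", "sample"],
    ["guides", "guide", "getting-started", "getting_started"],
    ["playground", "howto", "how-to"],
    ["use-cases", "usecases", "use_cases"],
    ["sandbox", "showcase"],
]


def pattern_priority(path: str) -> int | None:
    """Return the highest-priority tier whose pattern names a directory in
    `path`, or None when no example-like directory appears."""
    # Assumption: only directory segments count; the file name is ignored.
    dirs = [segment.lower() for segment in path.split("/")[:-1]]
    for tier, patterns in enumerate(PRIORITY_TIERS):
        if any(pattern in dirs for pattern in patterns):
            return tier
    return None
```

Under this sketch, `examples/scripts/grpo.py` lands in tier 0 because the `scripts` segment is checked before `examples`. A path with no example-like directory returns `None` and can be dropped.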

## Workflow

1. Resolve the repository first.
2. If the repository is ambiguous, search repositories by name or organization until you have a strong candidate.
3. Search the repo with the highest-priority path patterns first.
4. If the user gave a keyword, combine it with the pattern queries.
5. Prefer files under `examples/` or `example/` when candidates are otherwise tied.
6. Prefer `scripts/` over the other example-like directories, matching its top spot in the priority list.
7. Prefer shallower paths when multiple matches are similar.
8. Read the top candidate files with `fetch_file`, using line ranges for large files.
9. Use the exact file path from the search result to continue the investigation.
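One way to combine the priority list with the preferences in steps 5 to 7 is a single sort key:

- priority tier first, which already puts `scripts/` on top;
- then a preference for paths under `examples/` or `example/`;
- then path depth.

This is a minimal sketch. It assumes each candidate already carries its tier (0 for `scripts`, 1 for `examples`/`example`, and so on). `rank_candidates` is a hypothetical helper.

```python
from __future__ import annotations


def rank_candidates(candidates: list[tuple[str, int]]) -> list[str]:
    """Order (path, tier) pairs best-first: lower tier, then paths under
    examples/ or example/, then shallower paths."""

    def sort_key(item: tuple[str, int]) -> tuple[int, int, int]:
        path, tier = item
        dirs = path.lower().split("/")[:-1]
        # 0 sorts before 1, so paths under examples/ or example/ win ties.
        under_examples = 0 if ("examples" in dirs or "example" in dirs) else 1
        depth = path.count("/")  # fewer slashes means a shallower path
        return (tier, under_examples, depth)

    return [path for path, _ in sorted(candidates, key=sort_key)]
```

Sorting on a tuple keeps the rules ordered. Path depth only matters when the tier and the `examples/` preference are both equal.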

## Practical Query Plan

When looking for a specific method or trainer, search in this order:

1. `<keyword> scripts`
2. `<keyword> examples`
3. `<keyword> tutorial`
4. `<keyword> notebook`
5. `<keyword> guide`

When no keyword is given, search for the directories themselves:

1. `scripts`
2. `examples`
3. `example`
4. `notebooks`
5. `tutorials`
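Both query orders can be generated mechanically. This is a minimal sketch. It assumes each string is passed as the `query` argument of the repository file search, presumably stopping at the first query that yields a good candidate. `build_queries` is a hypothetical helper.

```python
from __future__ import annotations

from typing import Optional

# Query orders copied from the plan above.
KEYWORD_SUFFIXES = ["scripts", "examples", "tutorial", "notebook", "guide"]
DIRECTORY_QUERIES = ["scripts", "examples", "example", "notebooks", "tutorials"]


def build_queries(keyword: Optional[str] = None) -> list[str]:
    """Return search queries in the order the plan tries them."""
    if keyword and keyword.strip():
        kw = keyword.strip()
        # With a keyword: pair it with each example-like suffix.
        return [f"{kw} {suffix}" for suffix in KEYWORD_SUFFIXES]
    # Without a keyword: search for the directories themselves.
    return list(DIRECTORY_QUERIES)
```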

## Reading Pattern

After finding a candidate file:

1. Read the file header and argument parsing.
2. Read the model, dataset, and trainer setup.
3. Read the training loop or main execution path.
4. If the file is long, fetch only the relevant line range.
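For a long file, the reading order can be turned into concrete line ranges to fetch. This is a minimal sketch. It assumes some of the file text is already available, for example from a first `fetch_file` call. It also assumes typical Python training-script idioms (`argparse`, `load_dataset`, `Trainer(`) are good markers. `SECTION_MARKERS` and `line_ranges` are hypothetical, and the patterns would need tuning per repository.

```python
from __future__ import annotations

import re

# Hypothetical markers for the regions worth reading; tune per repository.
SECTION_MARKERS = {
    "args": re.compile(r"argparse|HfArgumentParser|add_argument"),
    "setup": re.compile(r"from_pretrained|load_dataset|Trainer\("),
    "main": re.compile(r"def main\(|if __name__ == .__main__.:|\.train\("),
}


def line_ranges(text: str, context: int = 5) -> dict[str, tuple[int, int]]:
    """Map each section to a 1-based, inclusive (start, end) line range that
    spans its first and last matching line, padded by `context` lines."""
    lines = text.splitlines()
    ranges: dict[str, tuple[int, int]] = {}
    for name, pattern in SECTION_MARKERS.items():
        hits = [i for i, line in enumerate(lines, start=1) if pattern.search(line)]
        if hits:
            start = max(1, hits[0] - context)
            end = min(len(lines), hits[-1] + context)
            ranges[name] = (start, end)
    return ranges
```

Sections with no matching line are omitted. This makes it easy to see when a file does not follow the expected script layout.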

## Output Expectations

Return:

- the best candidate file path
- why it was selected
- the repository it came from
- the exact line range to read next if the file is large
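The four return items map naturally onto a small record type. This is a minimal sketch. `ExampleSearchResult` and its field names are hypothetical, not a format that any GitHub tool produces.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ExampleSearchResult:
    """Hypothetical container for the items this skill returns."""

    repository: str  # "owner/name" of the repository the file came from
    path: str  # exact file path from the search result
    reason: str  # why this file was selected
    next_lines: Optional[tuple[int, int]] = None  # 1-based inclusive range, large files only

    def summary(self) -> str:
        """Render the result as a one-line report."""
        text = f"{self.repository}:{self.path} ({self.reason})"
        if self.next_lines is not None:
            start, end = self.next_lines
            text += f"; read lines {start}-{end} next"
        return text
```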

## Example

```
_search(query="grpo scripts", repository_name="huggingface/trl", topn=10)
_fetch_file(repository_full_name="huggingface/trl", ref="main", path="examples/scripts/grpo.py", encoding="utf-8")
```

## Notes

- This is the closest Codex-native replacement for ML Intern's `github_find_examples`.
- It relies on GitHub repo/file search plus the same path-priority intuition as the upstream tool.
- Once you have a candidate, always read the actual file before implementing.
|