metadata
name: github-example-search
description: >-
  Find working example files in GitHub repositories using path heuristics and
  GitHub file search, then read the best candidates.
disable-model-invocation: false

GitHub Example Search

Purpose

Replicate the useful part of ML Intern's GitHub example discovery: find example scripts, tutorials, notebooks, and guides in a target repo, then read the best matches before implementing code.

This skill is intentionally path-first. It is not a semantic code search engine.

Tools

Use the GitHub plugin tools:

  • search_repositories or search_installed_repositories_v2 to identify the repo if the user did not name it.
  • search to search files within the target repository.
  • fetch_file to read the candidate file contents.
  • search_branches only if you need to confirm branch names.

Search Strategy

Use prioritized file-path patterns, roughly in this order:

  1. scripts
  2. examples, example
  3. notebooks, notebook
  4. tutorials, tutorial, quickstart, walkthrough, walkthroughs
  5. cookbook, recipe, recipes
  6. demos, demo, samples, sample
  7. guides, guide, getting-started, getting_started
  8. playground, howto, how-to
  9. use-cases, usecases, use_cases
  10. sandbox, showcase
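
The priority list above can be sketched as a small lookup that maps a file path to the rank of the first pattern group it matches. This is a minimal sketch, not part of the GitHub plugin: the names PATTERN_GROUPS and path_priority are hypothetical, and matching on exact directory segments (rather than substrings) is an assumption.

```python
# Pattern groups in the priority order listed above (index 0 = highest priority).
PATTERN_GROUPS = [
    ["scripts"],
    ["examples", "example"],
    ["notebooks", "notebook"],
    ["tutorials", "tutorial", "quickstart", "walkthrough", "walkthroughs"],
    ["cookbook", "recipe", "recipes"],
    ["demos", "demo", "samples", "sample"],
    ["guides", "guide", "getting-started", "getting_started"],
    ["playground", "howto", "how-to"],
    ["use-cases", "usecases", "use_cases"],
    ["sandbox", "showcase"],
]


def path_priority(path: str) -> int:
    """Return the rank of the best-matching group; len(PATTERN_GROUPS) if none match.

    Assumption: only directory segments are compared, and they must match a
    pattern exactly (case-insensitive). The file name itself is ignored.
    """
    segments = [segment.lower() for segment in path.split("/")[:-1]]
    for rank, group in enumerate(PATTERN_GROUPS):
        if any(segment in group for segment in segments):
            return rank
    return len(PATTERN_GROUPS)


if __name__ == "__main__":
    for candidate in ["examples/scripts/grpo.py", "docs/guide/intro.md", "src/model.py"]:
        print(candidate, path_priority(candidate))
```

A path that sits under several matching directories takes the rank of its highest-priority one, so examples/scripts/grpo.py ranks as a scripts match.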

Workflow

  1. Resolve the repository first.
  2. If the repository is ambiguous, search repositories by name or organization until you have a strong candidate.
  3. Search the repo with the highest-priority path patterns first.
  4. If the user gave a keyword, combine it with the pattern queries.
  5. Prefer scripts/ over other example-like directories.
  6. Among the remaining candidates, break ties in favor of files under examples/ or example/.
  7. Prefer shallower paths when multiple matches are similar.
  8. Read the top candidate files with fetch_file, using line ranges for large files.
  9. Use the exact file path from the search result to continue the investigation.
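
The tie-breaking preferences in the workflow can be expressed as a single sort key. The sketch below is one possible reading of those rules, assuming scripts/ wins first, then examples/ or example/, then shallower paths; the function names sort_key and rank_candidates are hypothetical.

```python
from typing import List, Tuple


def sort_key(path: str) -> Tuple[bool, bool, int, str]:
    """Build a sort key so that preferred paths sort first.

    False sorts before True, so each preference is negated.
    """
    dirs = path.lower().split("/")[:-1]  # directory segments only
    in_scripts = "scripts" in dirs
    in_examples = "examples" in dirs or "example" in dirs
    depth = len(dirs)
    # The path itself is the final component so the ordering is deterministic.
    return (not in_scripts, not in_examples, depth, path)


def rank_candidates(paths: List[str]) -> List[str]:
    """Return candidate paths ordered from most to least preferred."""
    return sorted(paths, key=sort_key)


if __name__ == "__main__":
    found = [
        "docs/guides/grpo.md",
        "examples/notebooks/deep/grpo.ipynb",
        "examples/scripts/grpo.py",
        "examples/grpo.py",
    ]
    for path in rank_candidates(found):
        print(path)
```

The sort only orders paths that the search already returned, so each path stays exactly as the search reported it and can be passed on unchanged.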

Practical Query Plan

When looking for a specific method or trainer, search in this order:

  1. <keyword> scripts
  2. <keyword> examples
  3. <keyword> tutorial
  4. <keyword> notebook
  5. <keyword> guide

When no keyword is given, search for the directories themselves:

  1. scripts
  2. examples
  3. example
  4. notebooks
  5. tutorials
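
The two query plans above can be generated by one helper. This is an illustrative sketch only: build_queries and the two constant names are hypothetical, and the resulting strings are meant to be passed as the query argument of the search tool one at a time, in order.

```python
from typing import List, Optional

# Suffixes combined with a keyword, in the order given above.
KEYWORD_SUFFIXES = ["scripts", "examples", "tutorial", "notebook", "guide"]

# Directory names searched on their own when no keyword is given.
DIRECTORY_QUERIES = ["scripts", "examples", "example", "notebooks", "tutorials"]


def build_queries(keyword: Optional[str] = None) -> List[str]:
    """Return the ordered list of search queries for an optional keyword."""
    if keyword and keyword.strip():
        cleaned = keyword.strip()
        return [f"{cleaned} {suffix}" for suffix in KEYWORD_SUFFIXES]
    # No usable keyword: fall back to searching for the directories themselves.
    return list(DIRECTORY_QUERIES)


if __name__ == "__main__":
    print(build_queries("grpo"))
    print(build_queries())
```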

Reading Pattern

After finding a candidate file:

  1. Read the file header and argument parsing.
  2. Read the model, dataset, and trainer setup.
  3. Read the training loop or main execution path.
  4. If the file is long, fetch only the relevant line range.
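
For long files, one simple way to pick a line range is a fixed window around the line of interest, clamped to the file bounds. The sketch below assumes 1-indexed, inclusive line numbers and a default of 40 lines of context on each side; both are illustrative choices, and line_window is a hypothetical helper rather than a plugin function.

```python
from typing import Tuple


def line_window(match_line: int, total_lines: int, context: int = 40) -> Tuple[int, int]:
    """Return an inclusive (start, end) range around match_line.

    The range is clamped so it never goes below line 1 or past the last line.
    """
    if total_lines < 1:
        raise ValueError("total_lines must be at least 1")
    start = max(1, match_line - context)
    end = min(total_lines, match_line + context)
    return start, end


if __name__ == "__main__":
    # For example, a trainer setup found at line 250 of a 500-line script.
    print(line_window(250, 500))
```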

Output Expectations

Return:

  • the best candidate file path
  • why it was selected
  • the repository it came from
  • the exact line range to read next if the file is large
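
The four items above fit naturally into a small record type. The dataclass below is a hypothetical shape for that result, not a format the skill or the plugin mandates; the field names are assumptions chosen to mirror the list.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ExampleCandidate:
    repository: str  # e.g. "owner/name"
    path: str        # exact path as returned by the search
    reason: str      # why this file was selected
    next_line_range: Optional[Tuple[int, int]] = None  # set only for large files

    def summary(self) -> str:
        """Render the candidate as a short, human-readable report."""
        lines = [
            f"Repository: {self.repository}",
            f"Path: {self.path}",
            f"Reason: {self.reason}",
        ]
        if self.next_line_range is not None:
            start, end = self.next_line_range
            lines.append(f"Read next: lines {start}-{end}")
        return "\n".join(lines)


if __name__ == "__main__":
    candidate = ExampleCandidate(
        repository="huggingface/trl",
        path="examples/scripts/grpo.py",
        reason="Matched the keyword under a scripts directory.",
    )
    print(candidate.summary())
```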

Example

_search(query="grpo scripts", repository_name="huggingface/trl", topn=10)
_fetch_file(repository_full_name="huggingface/trl", ref="main", path="examples/scripts/grpo.py", encoding="utf-8")

Notes

  • This is the closest Codex-native replacement for ML Intern's github_find_examples.
  • It relies on GitHub repo/file search plus the same path-priority intuition as the upstream tool.
  • Once you have a candidate, always read the actual file before implementing.