add CLAUDE.md architecture docs and include tests in validate script
- Add CLAUDE.md with full architecture overview: dataset version support
(v2.0, v2.1, v3.0), key files, chart data pipeline, testing setup,
URL structure, and post-process instructions
- Update validate script to include `bun test` so tests run as part of
the full CI check alongside type-check, lint, and format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- CLAUDE.md +116 -0
- package.json +1 -1
CLAUDE.md
ADDED
@@ -0,0 +1,116 @@
# CLAUDE.md: LeRobot Dataset Visualizer

## Package manager

Always use **bun** (`bun install`, `bun dev`, `bun run build`, `bun test`). Never use npm or yarn.

## Post-process: run after every code change

After making any code changes, always run these commands in order and fix any errors before finishing:

```
bun run format       # auto-fix formatting (prettier)
bun run type-check   # TypeScript: app + test files
bun run lint         # ESLint (next lint)
bun test             # unit tests
```

Or run them all at once (format first, then the full validate suite):

```
bun run format && bun run validate
```

`bun run validate` runs: type-check → lint → format:check → test

## Key scripts

```
bun dev              # Next.js dev server
bun test             # Run all unit tests (bun:test)
bun run type-check   # tsc --noEmit (app) + tsc -p tsconfig.test.json --noEmit (tests)
bun run lint         # next lint
bun run validate     # type-check + lint + format:check + test
```

## Architecture

### Dataset version support

Three versions are supported. The version is detected from the `codebase_version` field of `meta/info.json`.

| Version  | Path pattern                                                      | Episode metadata                           | Video                                          |
| -------- | ----------------------------------------------------------------- | ------------------------------------------ | ---------------------------------------------- |
| **v2.0** | `data/{episode_chunk:03d}/episode_{episode_index:06d}.parquet`    | None (computed from `chunks_size`)         | Full file per episode                          |
| **v2.1** | Same as v2.0                                                      | None                                       | Full file per episode                          |
| **v3.0** | `data/chunk-{N:03d}/file-{N:03d}.parquet` (via `buildV3DataPath`) | `meta/episodes/chunk-{N}/file-{N}.parquet` | Segmented (timestamps per episode, per camera) |
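
As a rough illustration of the v3.0 path pattern in the table, the sketch below pads a chunk index and a file index to three digits. The name `sketchV3DataPath` and its two-argument signature are assumptions made for this example; the project's own `buildV3DataPath` may differ.

```typescript
// Hypothetical helper (not the project's buildV3DataPath): mirrors the
// `data/chunk-{N:03d}/file-{N:03d}.parquet` pattern from the table.
function sketchV3DataPath(chunkIndex: number, fileIndex: number): string {
  // Zero-pad to three digits, matching the `:03d` specifier.
  const pad3 = (n: number): string => n.toString().padStart(3, "0");
  return `data/chunk-${pad3(chunkIndex)}/file-${pad3(fileIndex)}.parquet`;
}

console.log(sketchV3DataPath(0, 7)); // "data/chunk-000/file-007.parquet"
```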

### Routing to parsers

In `src/app/[org]/[dataset]/[episode]/fetch-data.ts`, `getEpisodeData()` dispatches to:

- `getEpisodeDataV2()` for v2.0 and v2.1
- `getEpisodeDataV3()` for v3.0
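
The dispatch rule above could be sketched roughly as follows. This is only an illustration: `pickParserName` is a hypothetical function, and the assumption that `codebase_version` holds strings of the form `"v2.0"`, `"v2.1"`, and `"v3.0"` is made here, not confirmed by the source.

```typescript
type ParserName = "getEpisodeDataV2" | "getEpisodeDataV3";

// Hypothetical: returns the name of the parser that would handle a version.
// Assumes codebase_version values look like "v2.0", "v2.1", "v3.0".
function pickParserName(codebaseVersion: string): ParserName {
  switch (codebaseVersion) {
    case "v2.0":
    case "v2.1":
      return "getEpisodeDataV2"; // both v2.x versions share one parser
    case "v3.0":
      return "getEpisodeDataV3";
    default:
      // Fail loudly rather than guessing a parser for an unknown version.
      throw new Error(`Unsupported codebase_version: ${codebaseVersion}`);
  }
}

console.log(pickParserName("v2.1")); // "getEpisodeDataV2"
```

Throwing on unknown versions is a design choice of this sketch; the real code may handle that case differently.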

### v3.0 specifics

- Episode metadata row has named keys (`episode_index`, `data/chunk_index`, `data/file_index`, `dataset_from_index`, `dataset_to_index`, `videos/{key}/chunk_index`, etc.)
- Integer columns from parquet come out as **BigInt**; always use `bigIntToNumber()` from `src/utils/typeGuards.ts`
- Row-range selection: `dataset_from_index` / `dataset_to_index` allow reading only the episode's rows from a shared parquet file
- Fallback format uses numeric keys `"0"`..`"9"` when column names are unavailable
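
To make the BigInt point concrete, here is a minimal sketch of a safe conversion applied to the row-range fields. `toNumberSketch` is a hypothetical stand-in for `bigIntToNumber()` (whose real behaviour is not shown in this document), the metadata values are invented, and treating `dataset_to_index` as exclusive is an assumption.

```typescript
// Hypothetical stand-in for bigIntToNumber(); the real helper may behave differently.
function toNumberSketch(value: unknown): number {
  if (typeof value === "bigint") {
    // Refuse values that would lose precision as a JS number.
    if (
      value > BigInt(Number.MAX_SAFE_INTEGER) ||
      value < BigInt(Number.MIN_SAFE_INTEGER)
    ) {
      throw new RangeError(`BigInt ${value} does not fit in a safe integer`);
    }
    return Number(value);
  }
  if (typeof value === "number") return value;
  throw new TypeError(`Expected bigint or number, got ${typeof value}`);
}

// Invented example row; parquet integer columns arrive as BigInt.
const episodeMeta = {
  dataset_from_index: BigInt(120),
  dataset_to_index: BigInt(180),
};

const fromIndex = toNumberSketch(episodeMeta.dataset_from_index);
const toIndex = toNumberSketch(episodeMeta.dataset_to_index);
// Assumes a half-open range [from, to).
console.log(toIndex - fromIndex); // 60
```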

### v2.x path construction

```ts
formatStringWithVars(info.data_path, {
  episode_chunk: Math.floor(episodeId / chunkSize)
    .toString()
    .padStart(3, "0"),
  episode_index: episodeId.toString().padStart(6, "0"),
});
// → "data/000/episode_000042.parquet"
```

`formatStringWithVars` strips `:03d` format specifiers, so padding must be done by the caller.
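
The sketch below shows one way such a formatter could behave: it substitutes `{name}` and `{name:spec}` placeholders and discards the `:spec` part, so padding has to happen before the call. `formatStringWithVarsSketch` is a hypothetical re-implementation for illustration, and the `chunkSize` of 1000 is an assumed value.

```typescript
// Hypothetical re-implementation; not the project's formatStringWithVars.
function formatStringWithVarsSketch(
  template: string,
  vars: Record<string, string>,
): string {
  // Match `{name}` or `{name:spec}`; the `:spec` part is ignored.
  return template.replace(/\{(\w+)(?::[^}]*)?\}/g, (match: string, name: string) => {
    const value = vars[name];
    // Leave unknown placeholders untouched.
    return value !== undefined ? value : match;
  });
}

const template = "data/{episode_chunk:03d}/episode_{episode_index:06d}.parquet";
const episodeId = 42;
const chunkSize = 1000; // assumed value for this example

const padded = formatStringWithVarsSketch(template, {
  episode_chunk: Math.floor(episodeId / chunkSize).toString().padStart(3, "0"),
  episode_index: episodeId.toString().padStart(6, "0"),
});
console.log(padded); // "data/000/episode_000042.parquet"

// Without caller-side padding the specifier has no effect.
const unpadded = formatStringWithVarsSketch(template, {
  episode_chunk: "0",
  episode_index: "42",
});
console.log(unpadded); // "data/0/episode_42.parquet"
```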

## Key files

| File                                              | Purpose                                                                                                                                  |
| ------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------- |
| `src/app/[org]/[dataset]/[episode]/fetch-data.ts` | Main data-loading entry point; v2/v3 parsers; `computeColumnMinMax`                                                                      |
| `src/utils/versionUtils.ts`                       | `getDatasetInfo`, `getDatasetVersionAndInfo`, `buildVersionedUrl`                                                                        |
| `src/utils/stringFormatting.ts`                   | `buildV3DataPath`, `buildV3VideoPath`, `buildV3EpisodesMetadataPath`, padding helpers                                                    |
| `src/utils/parquetUtils.ts`                       | `fetchParquetFile`, `readParquetAsObjects`, `formatStringWithVars`                                                                       |
| `src/utils/dataProcessing.ts`                     | Chart grouping pipeline: `buildSuffixGroupsMap` → `computeGroupStats` → `groupByScale` → `flattenScaleGroups` → `processChartDataGroups` |
| `src/utils/typeGuards.ts`                         | `bigIntToNumber`, `isNumeric`, `isValidTaskIndex`, etc.                                                                                  |
| `src/utils/constants.ts`                          | `PADDING`, `EXCLUDED_COLUMNS`, `CHART_CONFIG`, `THRESHOLDS`                                                                              |
| `src/types/`                                      | TypeScript types: `DatasetVersion`, `EpisodeMetadataV3`, `VideoInfo`, `ChartDataGroup`, etc.                                             |

## Chart data pipeline

Series keys use `" | "` as delimiter (e.g. `observation.state | 0`).
`groupRowBySuffix` groups by **suffix**: if two different prefixes share suffix `"0"` (e.g. `observation.state | 0` and `action | 0`), they are merged under `result["0"] = { "observation.state": ..., "action": ... }`. A series with a unique suffix stays flat with its full original key.
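
The grouping rule can be sketched as below. This is a minimal illustration, not the project's `groupRowBySuffix`: the function name, the types, splitting on the last delimiter, and passing through keys that contain no delimiter are all assumptions.

```typescript
type FlatRow = Record<string, number>;
type GroupedRow = Record<string, number | Record<string, number>>;

const SERIES_DELIMITER = " | ";

// Hypothetical sketch of suffix grouping.
function groupRowBySuffixSketch(row: FlatRow): GroupedRow {
  const result: GroupedRow = {};
  const bySuffix: Record<string, { prefix: string; key: string }[]> = {};

  // First pass: bucket every delimited key by its suffix.
  for (const key of Object.keys(row)) {
    const cut = key.lastIndexOf(SERIES_DELIMITER);
    if (cut === -1) {
      result[key] = row[key]; // assumption: keys without a delimiter stay flat
      continue;
    }
    const prefix = key.slice(0, cut);
    const suffix = key.slice(cut + SERIES_DELIMITER.length);
    if (!bySuffix[suffix]) bySuffix[suffix] = [];
    bySuffix[suffix].push({ prefix, key });
  }

  // Second pass: merge shared suffixes, keep unique ones under the full key.
  for (const suffix of Object.keys(bySuffix)) {
    const entries = bySuffix[suffix];
    if (entries.length === 1) {
      result[entries[0].key] = row[entries[0].key];
    } else {
      const nested: Record<string, number> = {};
      for (const entry of entries) nested[entry.prefix] = row[entry.key];
      result[suffix] = nested;
    }
  }
  return result;
}

const grouped = groupRowBySuffixSketch({
  "observation.state | 0": 1,
  "action | 0": 2,
  "observation.state | 1": 3,
});
console.log(JSON.stringify(grouped));
```

In this example the two series ending in `0` are merged under `"0"`, while `observation.state | 1` keeps its full key because no other series shares the suffix `1`.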

## Testing

- Test files live in `**/__tests__/` directories alongside source
- Uses `bun:test` (built-in, no extra install)
- BigInt literals (`42n`) require `tsconfig.test.json` (target ES2020); test files are excluded from `tsconfig.json`
- `@types/bun` is installed as a devDependency for `bun:test` type resolution
- Mocking fetch: `globalThis.fetch = mock(() => Promise.resolve(new Response(...))) as unknown as typeof fetch`
- CI: `.github/workflows/test.yml` runs `bun test` on push/PR to main

## URL structure

All dataset URLs:

```
https://huggingface.co/datasets/{org}/{dataset}/resolve/main/{path}
```

Built by `buildVersionedUrl(repoId, version, path)`. The `version` param is accepted but currently unused in the URL (always `main` revision).
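
A minimal sketch of a URL builder with that shape follows. `buildVersionedUrlSketch` is hypothetical, the repository ID is a placeholder, and the assumption that `repoId` has the form `{org}/{dataset}` is made here for illustration.

```typescript
const DATASET_BASE_URL = "https://huggingface.co/datasets";

// Hypothetical sketch; `_version` is accepted but ignored, matching the note
// that the URL always points at the `main` revision.
function buildVersionedUrlSketch(
  repoId: string, // assumed to be "{org}/{dataset}"
  _version: string,
  path: string,
): string {
  return `${DATASET_BASE_URL}/${repoId}/resolve/main/${path}`;
}

console.log(buildVersionedUrlSketch("some-org/some-dataset", "v3.0", "meta/info.json"));
// "https://huggingface.co/datasets/some-org/some-dataset/resolve/main/meta/info.json"
```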

## Excluded columns (not shown in charts)

- v2.x: `timestamp`, `frame_index`, `episode_index`, `index`, `task_index`
- v3.0: `index`, `task_index`, `episode_index`, `frame_index`, `next.done`
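
As an illustration of how the two lists above might be applied, the sketch below filters column names per version. The constant names, the function `chartableColumns`, and the version strings are assumptions for this example; the project's `EXCLUDED_COLUMNS` constant may be structured differently.

```typescript
// Lists copied from the section above; the names are hypothetical.
const EXCLUDED_V2_SKETCH = ["timestamp", "frame_index", "episode_index", "index", "task_index"];
const EXCLUDED_V3_SKETCH = ["index", "task_index", "episode_index", "frame_index", "next.done"];

// Returns the columns that would remain available for charting.
function chartableColumns(columns: string[], version: string): string[] {
  const excluded = version === "v3.0" ? EXCLUDED_V3_SKETCH : EXCLUDED_V2_SKETCH;
  return columns.filter((column) => excluded.indexOf(column) === -1);
}

const columns = ["timestamp", "index", "next.done", "action", "observation.state"];
console.log(chartableColumns(columns, "v2.1")); // ["next.done", "action", "observation.state"]
console.log(chartableColumns(columns, "v3.0")); // ["timestamp", "action", "observation.state"]
```

Note that, per the lists as written, `timestamp` is excluded only for v2.x and `next.done` only for v3.0.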

package.json
CHANGED
@@ -12,7 +12,7 @@
     "type-check": "tsc --noEmit && tsc -p tsconfig.test.json --noEmit",
     "type-check:watch": "tsc --noEmit --watch",
     "test": "bun test",
-    "validate": "bun run type-check && bun run lint && bun run format:check"
+    "validate": "bun run type-check && bun run lint && bun run format:check && bun test"
   },
   "dependencies": {
     "@react-three/drei": "^10.7.7",