| ---
|
| title: README
|
| emoji: "π"
|
| colorFrom: blue
|
| colorTo: purple
|
| sdk: static
|
| pinned: false
|
| ---
|
|
|
| # OBay Data
|
|
|
| **World-class training data production for frontier AI models.**
|
|
|
| We build high-quality datasets that power the next generation of AI, from large language models to embodied intelligence.
|
|
|
| ## What We Do
|
|
|
| | Domain | Description |
|
| |--------|-------------|
|
| | 🧠 **Pre-training Data** | Large-scale, curated corpora for foundation model training |
|
| | 🎯 **Post-training Data** | SFT, RLHF, and DPO datasets for alignment and instruction following |
|
| | 🤖 **Embodied AI Data** | Robotics trajectories, gameplay recordings, and sensor logs for world models |
|
| | 🖼️ **Multimodal Data** | Image editing, composition, and style transfer instruction sets |
|
|
|
| ## Datasets
|
|
|
| | Dataset | Description |
|
| |---------|-------------|
|
| | trajectory_demo | Terminal agent trajectories (ATIF format) |
|
| | svg-multimodal-rubrics | SVG code generation + evaluation rubrics |
|
| | image-editing-style-instruction-following | Style transfer + instruction following |
|
| | swe-coding-instruction-following | SWE-bench coding tasks |
|
| | world-model-gameplay-recording | Gameplay recording for world model training |
|
| | multi-image-composition-instruction-following | Multi-image composition with instructions |
|
|
|
| ## Contact
|
|
|
| [obaydata.com](https://obaydata.com) · 💻 [GitHub](https://github.com/simonsu20000) · ✉️ simon.su@obaydata.com
|