Built from 7 TB of real Kaggle datasets + 20k notebooks, with real code execution traces generated using Qwen3-Coder and E2B. Training on this data dramatically improves a model's ability to execute code and analyze data.
We (@baptistecolle @hannayukhymenko @lvwerra) have created a novel synthetic data generation pipeline with efficient scaffolding, which gives a big performance boost after training your coding agent 🔥 Starting from real Kaggle notebooks and datasets, we generate synthetic notebooks that analyze the datasets and answer factual questions about them more efficiently. We simulate a real code execution environment by prompting LLMs or with the help of E2B sandboxes. The result is a dataset of 50k+ high-quality LLM-generated notebooks that can help your agent get better at data analysis and question answering.
Snooping on HF is the best because sometimes you just discover that someone (in this case, Earth Species Project) is about to drop terabytes of sick (high quality animal sounds) data...
Just dropped two bigger physics datasets (both on photonics)!
NUMBA 1: SIB-CL
This dataset collects the Surrogate- and Invariance-Boosted Contrastive Learning (SIB-CL) data for two scientific problems:
- PhC2D: 2D photonic crystal density-of-states (DOS) and bandstructure data.
- TISE: 3D time-independent Schrödinger equation eigenvalue and eigenvector solutions.
NUMBA 2: 2D Photonic Topology
Symmetry-driven analysis of 2D photonic crystals: 10k random unit cells across 11 symmetries, 2 polarizations, 5 contrasts. Includes time-reversal breaking cases for 4 symmetries at high contrast.
It's just become easier to share your apps on the biggest AI app store (aka HF Spaces), with unlimited storage, more visibility and community interactions.
Just pick a React, Svelte, or Vue template when you create your Space, or add app_build_command: npm run build and app_file: build/index.html to your README's YAML block.
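For reference, here's roughly what that README YAML block could look like. This is a minimal sketch, not an official template: the title is a made-up placeholder, and `sdk: static` is an assumption (the post doesn't mention it, but the build command is meant for static Spaces). Only `app_build_command` and `app_file` come from the post itself.

```yaml
---
# Hypothetical placeholder title; use your own.
title: My Frontend App
# Assumption: the Space uses the static SDK, which is what the build step targets.
sdk: static
# From the post: command run first to produce the files to serve.
app_build_command: npm run build
# From the post: the built HTML entry point served by the Space.
app_file: build/index.html
---
```

If your bundler writes to a different folder (for example `dist/` instead of `build/`), point `app_file` at the `index.html` inside that folder instead.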
Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus points if they reference HF/open-source in a funny way). These videos are "a cat on the moon rapping 'I love Hugging Face'"!