Recent Activity

cgeorgiaw 
posted an update 22 days ago
cgeorgiaw 
posted an update 3 months ago
🚀🚀🚀 The largest ever dataset of co-folded 3D protein-ligand structures just dropped on HF!!

Meet SAIR (Structurally Augmented IC₅₀ Repository): 5M+ AI-generated complexes with experimentally measured drug potency data from SandboxAQ. 🚀🚀🚀

Check it out and explore here: SandboxAQ/SAIR

hannayukhymenko 
posted an update 3 months ago
Releasing the Jupyter Agent Dataset! 🚀

Built from 7 TB of real Kaggle datasets + 20k notebooks, creating real code exec traces using Qwen3-Coder and E2B.
Training on this data dramatically improves a model's ability to execute code and analyze data.

We ( @baptistecolle @hannayukhymenko @lvwerra ) have created a novel synthetic data generation pipeline with efficient scaffolding, which gives a big performance boost after training your coding agent 🔥 Using real Kaggle notebooks and datasets, we generate synthetic notebooks that analyze datasets and answer factual questions about them more efficiently. We simulate a real code execution environment either by prompting LLMs or by using E2B sandboxes. We have built a dataset of 50k+ high-quality LLM-generated notebooks that can help your agent get better at data analysis and question answering.

Link: https://huggingface.co/datasets/data-agents/jupyter-agent-dataset
cgeorgiaw 
posted an update 4 months ago
clem 
posted an update 4 months ago
merterbak 
posted an update 4 months ago
OpenAI is now open again! Check out OpenAI’s brand new gpt‑oss‑20b model hosted on ZeroGPU 🤗

merterbak/gpt-oss-20b-demo
cgeorgiaw 
posted an update 6 months ago
clem 
posted an update 6 months ago
cgeorgiaw 
posted an update 6 months ago
Snooping on HF is the best because sometimes you just discover that someone (in this case, Earth Species Project) is about to drop terabytes of sick (high-quality animal sounds) data...

EarthSpeciesProject/NatureLM-audio-training
clem 
posted an update 7 months ago
Today, we're unveiling two new open-source AI robots! HopeJR for $3,000 & Reachy Mini for $300 🤖🤖🤖

Let's go open-source AI robotics!
cgeorgiaw 
posted an update 7 months ago
Just dropped two bigger physics datasets (both on photonics)!

NUMBA 1: SIB-CL
This is a collection of Surrogate- and Invariance-Boosted Contrastive Learning (SIB-CL) datasets for two scientific problems:
- PhC2D: 2D photonic crystal density-of-states (DOS) and bandstructure data.
- TISE: 3D time-independent Schrödinger equation eigenvalue and eigenvector solutions.

NUMBA 2: 2D Photonic Topology
Symmetry-driven analysis of 2D photonic crystals: 10k random unit cells across 11 symmetries, 2 polarizations, 5 contrasts. Includes time-reversal breaking cases for 4 symmetries at high contrast.

Check them out: cgeorgiaw/sib-cl & cgeorgiaw/2d-photonic-topology
clem 
posted an update 7 months ago
It's just become easier to share your apps on the biggest AI app store (aka HF spaces) for unlimited storage, more visibility and community interactions.

Just pick a React, Svelte, or Vue template when you create your Space, or add app_build_command: npm run build and app_file: build/index.html to your README's YAML block.

Or follow this link: https://huggingface.co/new-space?sdk=static
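
For anyone going the manual route, the README's YAML block described above might look roughly like this. It is a minimal sketch: the title is a hypothetical placeholder, sdk: static is assumed from the link above, and the build output path assumes your build command writes to build/.

```yaml
---
# Hypothetical Space name; replace with your own
title: My Static App
# Static Space, assumed from the ?sdk=static link above
sdk: static
# Command run first to generate the static files to serve
app_build_command: npm run build
# Entry point served after the build (assumes output lands in build/)
app_file: build/index.html
---
```

If your framework builds into a different folder (dist/, for example), point app_file at that folder's index.html instead.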

Let's build!
clem 
posted an update 7 months ago
Playing with Veo3 this morning. Share your prompt if you want me to create videos for you (bonus points if they funnily reference HF/open-source). These videos are "a cat on the moon rapping 'I love Hugging Face'"!