Yann-CV 
posted an update 6 days ago
🔦 Goldener feature: Semantics-aware sampling for better models

Goldener provides smart data sampling out of the box by combining two GoldDoers (classes that orchestrate data actions):
1️⃣ GoldDescriptor: exposes data semantics through embeddings computed by foundation models.
2️⃣ GoldSelector: selects samples automatically by applying coreset algorithms to those semantics.

Both the foundation model and the coreset algorithm are fully customizable, so a selection goal can be reached in a few lines of Python code.
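
To make the idea concrete, here is a minimal sketch of coreset selection over embeddings. It does not use Goldener's API: the function `k_center_greedy` and the toy 2D "embeddings" are hypothetical stand-ins, and k-center greedy (farthest-first traversal) is just one common coreset algorithm, chosen here for illustration. In practice the embeddings would come from a foundation model rather than a random generator.

```python
import math
import random


def k_center_greedy(embeddings, k, first=0):
    """Return k indices chosen by farthest-first traversal.

    Illustrative helper, not part of Goldener. Each step picks the point
    that is farthest from everything selected so far, so the selection
    spreads across the embedding space instead of oversampling dense areas.
    """
    if not 0 < k <= len(embeddings):
        raise ValueError("k must be between 1 and the number of samples")
    selected = [first]
    # Distance from every point to its nearest selected point.
    min_dist = [math.dist(e, embeddings[first]) for e in embeddings]
    while len(selected) < k:
        nxt = max(range(len(embeddings)), key=min_dist.__getitem__)
        selected.append(nxt)
        for i, e in enumerate(embeddings):
            d = math.dist(e, embeddings[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy stand-in for model embeddings: three tight clusters of 20 points each.
rng = random.Random(0)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
embeddings = [
    (cx + rng.gauss(0, 0.1), cy + rng.gauss(0, 0.1))
    for cx, cy in centers
    for _ in range(20)
]

picked = k_center_greedy(embeddings, k=3)
# Map each picked index back to the cluster it came from.
clusters = sorted(i // 20 for i in picked)
print(picked, clusters)
```

With a budget of three samples, this selection takes one point from each cluster, whereas a random draw of three could easily land twice in the same cluster.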

The result? Goldener can replace the usual random selection and help release better models, faster!

🔗 More details: https://huggingface.co/blog/Yann-CV/goldener-smart-sampling
🔨 Give it a try: pip install goldener