Do not copy and paste! Rewriting strategies for code retrieval Paper • 2605.08299 • Published 6 days ago • 8
License to Call: Introducing Transformers Agents 2.0 Article • m-ric, lysandre, pcuenq +1 • May 13, 2024 • 137
modularStarEncoder/ModularStarEncoder Feature Extraction • 1B • Updated May 22, 2025 • 35 • 2
The Ultra-Scale Playbook 🌌 Space • 3.84k • The ultimate guide to training LLMs on large GPU clusters
Large Language Models Do NOT Really Know What They Don't Know Paper • 2510.09033 • Published Oct 10, 2025 • 17
One Model to Train them All: Hierarchical Self-Distillation for Enhanced Early Layer Embeddings Paper • 2503.03008 • Published Mar 4, 2025 • 1
modularStarEncoder/ModularStarEncoder-finetuned-9 Feature Extraction • 0.3B • Updated May 21, 2025 • 20 • 1
modularStarEncoder/ModularStarEncoder-finetuned Feature Extraction • 1B • Updated May 21, 2025 • 57 • 1
Mixture-of-Recursions: Learning Dynamic Recursive Depths for Adaptive Token-Level Computation Paper • 2507.10524 • Published Jul 14, 2025 • 73
modularStarEncoder/ModularStarEncoder-finetuned-4 Feature Extraction • 0.2B • Updated May 21, 2025 • 21
modularStarEncoder/ModularStarEncoder-finetuned-18 Feature Extraction • 0.6B • Updated May 21, 2025 • 24
modularStarEncoder/ModularStarEncoder-finetuned-27 Feature Extraction • 0.8B • Updated May 21, 2025 • 55