Domain-Specific Data Synthesis for LLMs via Minimal Sufficient Representation Learning Paper • 2605.30039 • Published about 1 month ago
ML-Embed: Inclusive and Efficient Embeddings for a Multilingual World Paper • 2605.15081 • Published May 14
Beyond Retrieval: A Multitask Benchmark and Model for Code Search Paper • 2605.04615 • Published May 6
QuitoBench: A High-Quality Open Time Series Forecasting Benchmark Paper • 2603.26017 • Published Mar 27
F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World Paper • 2603.19223 • Published Mar 19
CodeFuse-CR-Bench: A Comprehensiveness-aware Benchmark for End-to-End Code Review Evaluation in Python Projects Paper • 2509.14856 • Published Sep 18, 2025
C2LLM Technical Report: A New Frontier in Code Retrieval via Adaptive Cross-Attention Pooling Paper • 2512.21332 • Published Dec 24, 2025
D2LLM: Decomposed and Distilled Large Language Models for Semantic Search Paper • 2406.17262 • Published Jun 25, 2024
GALLa: Graph Aligned Large Language Models for Improved Source Code Understanding Paper • 2409.04183 • Published Sep 6, 2024
F2LLM Technical Report: Matching SOTA Embedding Performance with 6 Million Open-Source Data Paper • 2510.02294 • Published Oct 2, 2025
Unifying the Perspectives of NLP and Software Engineering: A Survey on Language Models for Code Paper • 2311.07989 • Published Nov 14, 2023
E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning Paper • 2409.06679 • Published Sep 10, 2024