Harvesting Textual and Structured Data from the HAL Publication Repository • Paper • 2407.20595 • Published Jul 30, 2024 • 22
LLM Reasoning for Machine Translation: Synthetic Data Generation over Thinking Tokens • Paper • 2510.11919 • Published Oct 13, 2025 • 6
Gaperon: A Peppered English-French Generative Language Model Suite • Paper • 2510.25771 • Published Oct 29, 2025 • 17
Disentangling meaning from language in LLM-based machine translation • Paper • 2602.04613 • Published Feb 4 • 1
A Causal Language Modeling Detour Improves Encoder Continued Pretraining • Paper • 2605.12438 • Published May 12 • 7
Language-Switching Triggers Take a Latent Detour Through Language Models • Paper • 2605.18646 • Published May 18 • 4
Where Does Authorship Signal Emerge in Encoder-Based Language Models? • Paper • 2605.19908 • Published May 19 • 5
Triggers Hijack Language Circuits: A Mechanistic Analysis of Backdoor Behaviors in Large Language Models • Paper • 2602.10382 • Published Feb 12 • 2
CamemBERT 2.0: A Smarter French Language Model Aged to Perfection • Paper • 2411.08868 • Published Nov 13, 2024 • 13