
Lutalica

GitHub: https://github.com/RewindL

AI & ML interests

Multimodal LLMs, LLM Reasoning, Reinforcement Learning, Efficient Inference

Organizations

alibaba-inc

commented 2 papers 2 months ago

ReMix: Reinforcement routing for mixtures of LoRAs in LLM finetuning

Paper • 2603.10160 • Published Mar 10 • 26 • 4

MASQuant: Modality-Aware Smoothing Quantization for Multimodal Large Language Models

Paper • 2603.04800 • Published Mar 5 • 25 • 8
commented a paper 3 months ago

D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use

Paper • 2602.02160 • Published Feb 2 • 14 • 9
commented a paper 7 months ago

One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient

Paper • 2509.26313 • Published Sep 30, 2025 • 5 • 4
commented 2 papers about 1 year ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29, 2025 • 99 • 15

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 141 • 21
New activity in monology/pile-uncopyrighted over 1 year ago

Format issue when loading dataset

1
#1 opened over 2 years ago by antoine314