Papers - Fine-tuning
• Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning (arXiv:2310.20587)
• SELF: Language-Driven Self-Evolution for Large Language Model (arXiv:2310.00533)
• QLoRA: Efficient Finetuning of Quantized LLMs (arXiv:2305.14314)
• QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models (arXiv:2309.14717)
• Table-GPT: Table-tuned GPT for Diverse Table Tasks (arXiv:2310.09263)
• Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models (arXiv:2401.01335)
• LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement (arXiv:2403.15042)
• Toolformer: Language Models Can Teach Themselves to Use Tools (arXiv:2302.04761)
• The Unreasonable Ineffectiveness of the Deeper Layers (arXiv:2403.17887)
• InternLM2 Technical Report (arXiv:2403.17297)
• LIMA: Less Is More for Alignment (arXiv:2305.11206)
• Direct Preference Optimization: Your Language Model is Secretly a Reward Model (arXiv:2305.18290)
• sDPO: Don't Use Your Data All at Once (arXiv:2403.19270)
• Deep Reinforcement Learning from Human Preferences (arXiv:1706.03741)
• Fine-tuning Language Models for Factuality (arXiv:2311.08401)
• An Emulator for Fine-Tuning Large Language Models using Small Language Models (arXiv:2310.12962)
• Gecko: Versatile Text Embeddings Distilled from Large Language Models (arXiv:2403.20327)
• Model Stock: All We Need Is Just a Few Fine-tuned Models (arXiv:2403.19522)
• ReFT: Representation Finetuning for Language Models (arXiv:2404.03592)
• UltraFeedback: Boosting Language Models with High-quality Feedback (arXiv:2310.01377)
• RL for Consistency Models: Faster Reward Guided Text-to-Image Generation (arXiv:2404.03673)
• Stream of Search (SoS): Learning to Search in Language (arXiv:2404.03683)
• CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues (arXiv:2404.03820)
• ORPO: Monolithic Preference Optimization without Reference Model (arXiv:2403.07691)
• Learn Your Reference Model for Real Good Alignment (arXiv:2404.09656)
• Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking (arXiv:2402.14811)
• Comprehensive Survey of Model Compression and Speed up for Vision Transformers (arXiv:2404.10407)
• OpenBezoar: Small, Cost-Effective and Open Models Trained on Mixes of Instruction Data (arXiv:2404.12195)
• Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning (arXiv:2303.15647)
• Hyper-X: A Unified Hypernetwork for Multi-Task Multilingual Transfer (arXiv:2205.12148)
• Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models (arXiv:2406.15718)
• In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering (arXiv:2311.06668)
• SpreadsheetLLM: Encoding Spreadsheets for Large Language Models (arXiv:2407.09025)
• LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (arXiv:2403.13372)
• Adapting While Learning: Grounding LLMs for Scientific Problems with Intelligent Tool Usage Adaptation (arXiv:2411.00412)
• CLEAR: Character Unlearning in Textual and Visual Modalities (arXiv:2410.18057)
• LoRA vs Full Fine-tuning: An Illusion of Equivalence (arXiv:2410.21228)
• Cut Your Losses in Large-Vocabulary Language Models (arXiv:2411.09009)
• LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models (arXiv:2411.09595)
• No More Adam: Learning Rate Scaling at Initialization is All You Need (arXiv:2412.11768)
• Group Robust Preference Optimization in Reward-free RLHF (arXiv:2405.20304)