leonardlin's Collections: tuning
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models
Paper • 2309.12307 • Published • 90
NEFTune: Noisy Embeddings Improve Instruction Finetuning
Paper • 2310.05914 • Published • 14
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper • 2312.15166 • Published • 61
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper • 2401.03462 • Published • 29
YaRN: Efficient Context Window Extension of Large Language Models
Paper • 2309.00071 • Published • 81
Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM
Paper • 2401.02994 • Published • 52
A Mechanistic Understanding of Alignment Algorithms: A Case Study on DPO and Toxicity
Paper • 2401.01967 • Published
Zephyr: Direct Distillation of LM Alignment
Paper • 2310.16944 • Published • 123
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Paper • 2305.18290 • Published • 64
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Paper • 2311.03285 • Published • 31
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning
Paper • 2312.15685 • Published • 16
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 152
TOFU: A Task of Fictitious Unlearning for LLMs
Paper • 2401.06121 • Published • 20
Tuning LLMs with Contrastive Alignment Instructions for Machine Translation in Unseen, Low-resource Languages
Paper • 2401.05811 • Published • 8
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper • 2401.01335 • Published • 68
WARM: On the Benefits of Weight Averaged Reward Models
Paper • 2401.12187 • Published • 19
Learning Universal Predictors
Paper • 2401.14953 • Published • 22
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper • 2401.16380 • Published • 51
Language Models can be Logical Solvers
Paper • 2311.06158 • Published • 20
ReFT: Reasoning with Reinforced Fine-Tuning
Paper • 2401.08967 • Published • 31
Continual Learning for Large Language Models: A Survey
Paper • 2402.01364 • Published • 1
Direct Language Model Alignment from Online AI Feedback
Paper • 2402.04792 • Published • 35
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
Paper • 2402.03749 • Published • 15
Suppressing Pink Elephants with Direct Principle Feedback
Paper • 2402.07896 • Published • 11
How to Train Data-Efficient LLMs
Paper • 2402.09668 • Published • 43
QuRating: Selecting High-Quality Data for Training Language Models
Paper • 2402.09739 • Published • 5
DoRA: Weight-Decomposed Low-Rank Adaptation
Paper • 2402.09353 • Published • 32
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Paper • 2402.13228 • Published • 3
FuseChat: Knowledge Fusion of Chat Models
Paper • 2402.16107 • Published • 39
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models
Paper • 2403.13372 • Published • 182
Evolutionary Optimization of Model Merging Recipes
Paper • 2403.13187 • Published • 58
LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Paper • 2403.17919 • Published • 16
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Paper • 2404.03715 • Published • 62
Insights into Alignment: Evaluating DPO and its Variants Across Multiple Tasks
Paper • 2404.14723 • Published • 10
Instruction Tuning with Human Curriculum
Paper • 2310.09518 • Published • 3
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 50
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 41
Self-Play Preference Optimization for Language Model Alignment
Paper • 2405.00675 • Published • 28
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment
Paper • 2405.01481 • Published • 30
FLAME: Factuality-Aware Alignment for Large Language Models
Paper • 2405.01525 • Published • 29
WildChat: 1M ChatGPT Interaction Logs in the Wild
Paper • 2405.01470 • Published • 64
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Paper • 2405.00732 • Published • 122
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 71
Understanding the performance gap between online and offline alignment algorithms
Paper • 2405.08448 • Published • 18
SimPO: Simple Preference Optimization with a Reference-Free Reward
Paper • 2405.14734 • Published • 12
Self-Improving Robust Preference Optimization
Paper • 2406.01660 • Published • 20
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Paper • 2312.01552 • Published • 32
Creativity Has Left the Chat: The Price of Debiasing Language Models
Paper • 2406.05587 • Published • 1
Sailor: Open Language Models for South-East Asia
Paper • 2404.03608 • Published • 21
Continued Pretraining for Better Zero- and Few-Shot Promptability
Paper • 2210.10258 • Published
Are You Sure? Rank Them Again: Repeated Ranking For Better Preference Datasets
Paper • 2405.18952 • Published • 10