Nurmukhamed's Collections
Memory Augmented Language Models through Mixture of Word Experts
Paper
• 2311.10768
• Published • 19
System 2 Attention (is something you might need too)
Paper
• 2311.11829
• Published • 43
Fine-tuning Language Models for Factuality
Paper
• 2311.08401
• Published • 30
Orca 2: Teaching Small Language Models How to Reason
Paper
• 2311.11045
• Published • 77
Beyond Surface: Probing LLaMA Across Scales and Layers
Paper
• 2312.04333
• Published • 19
Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models
Paper
• 2312.06585
• Published • 29
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
Paper
• 2312.09390
• Published • 33
TinyGSM: achieving >80% on GSM8k with small language models
Paper
• 2312.09241
• Published • 39
Gemini: A Family of Highly Capable Multimodal Models
Paper
• 2312.11805
• Published • 49
Secrets of RLHF in Large Language Models Part II: Reward Modeling
Paper
• 2401.06080
• Published • 27
Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Paper
• 2401.03462
• Published • 29
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Paper
• 2401.15077
• Published • 20
A Comprehensive Study of Knowledge Editing for Large Language Models
Paper
• 2401.01286
• Published • 21
H2O-Danube-1.8B Technical Report
Paper
• 2401.16818
• Published • 18
Weaver: Foundation Models for Creative Writing
Paper
• 2401.17268
• Published • 45
Mixtral of Experts
Paper
• 2401.04088
• Published • 160
Understanding LLMs: A Comprehensive Overview from Training to Inference
Paper
• 2401.02038
• Published • 65
Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling
Paper
• 2401.16380
• Published • 53
Self-Rewarding Language Models
Paper
• 2401.10020
• Published • 153
TinyLlama: An Open-Source Small Language Model
Paper
• 2401.02385
• Published • 95
Improving Text Embeddings with Large Language Models
Paper
• 2401.00368
• Published • 82
Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models
Paper
• 2401.01335
• Published • 69
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning
Paper
• 2312.01552
• Published • 31
LLM Augmented LLMs: Expanding Capabilities through Composition
Paper
• 2401.02412
• Published • 38
LLM Pruning and Distillation in Practice: The Minitron Approach
Paper
• 2408.11796
• Published • 60
Building and better understanding vision-language models: insights and future directions
Paper
• 2408.12637
• Published • 133