🔄 In a Training Loop

Joel Wang

joelhenwang

AI & ML interests

None yet

Recent Activity

upvoted a paper about 12 hours ago

When Losses Align: Gradient-Based Composite Loss Weighting for Efficient Pretraining

Paper • 2605.07756 • Published May 8 • 1

upvoted a paper 2 days ago

RoPE-Aware Bit Allocation for KV-Cache Quantization

Paper • 2606.24033 • Published 6 days ago • 8

upvoted an article 2 days ago

Kog Laneformer 2B: The Latency-First Model Behind Kog Inference Engine

Article • kogai • 4 days ago • 27

upvoted 17 papers 5 days ago

Q-Delta: Beyond Key-Value Associative State Evolution

Paper • 2606.08804 • Published 22 days ago • 1

Comparing Linear Probes with Mahalanobis Cosine Similarity

Paper • 2606.19603 • Published 12 days ago • 3

Demystifying Training-Time Augmentation for Data-Constrained Language Model Pretraining

Paper • 2606.16246 • Published 10 days ago • 4

A Verifiable Search Is Not a Learnable Chain-of-Thought

Paper • 2606.21884 • Published 9 days ago • 3

Tmax: A simple recipe for terminal agents

Paper • 2606.23321 • Published 7 days ago • 12

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

Paper • 2606.18844 • Published 12 days ago • 18

KaLM-Reranker-V1: Fast but Not Late Interaction for Compressed Document Reranking

Paper • 2606.22807 • Published 7 days ago • 47

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Paper • 2606.19827 • Published 11 days ago • 3

FastMix: Fast Data Mixture Optimization via Gradient Descent

Paper • 2606.14971 • Published 17 days ago • 3

Tapered Language Models

Paper • 2606.23670 • Published 7 days ago • 8

Causal Discovery in the Era of Agents

Paper • 2606.23608 • Published 7 days ago • 7

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

Paper • 2606.20002 • Published 11 days ago • 8

Training Open Models for Agentic Phone Use

Paper • 2606.23049 • Published 7 days ago • 15

Self-Compacting Language Model Agents

Paper • 2606.23525 • Published 7 days ago • 17
