OlmPool Collection • Collection of models from the paper "Cracks in the Foundation: Seemingly Minor Architectural Choices Impact Long Context Extension" • 26 items • Updated 10 days ago • 2
Efficient Training on Multiple Consumer GPUs with RoundPipe • Paper • 2604.27085 • Published 12 days ago • 40
Why Fine-Tuning Encourages Hallucinations and How to Fix It • Paper • 2604.15574 • Published 25 days ago • 23
Olmo 3.1 Collection • The latest members of the Olmo 3 family: another 3 weeks of RL for 32B Think, the 32B Instruct model, large post-training research datasets... • 9 items • Updated Dec 23, 2025 • 51
Programming with Data: Test-Driven Data Engineering for Self-Improving LLMs from Raw Corpora • Paper • 2604.24819 • Published 14 days ago • 87
Laguna XS.2 Collection • Designed for agentic coding and long-horizon work on a local machine. Apache 2.0. • 5 items • Updated 3 days ago • 19
Parakeet ASR Collection • NeMo Parakeet ASR models attain strong speech recognition accuracy while remaining efficient at inference. Available in CTC and RNN-Transducer variants. • 16 items • Updated 1 day ago • 71
BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation • Paper • 2604.09497 • Published about 1 month ago • 29
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems • Paper • 2604.14228 • Published 27 days ago • 25
Cross-Tokenizer LLM Distillation through a Byte-Level Interface • Paper • 2604.07466 • Published 28 days ago • 6
How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data • Paper • 2604.14164 • Published Mar 23 • 35
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens • Paper • 2603.23516 • Published Mar 6 • 49
SlopCodeBench: Benchmarking How Coding Agents Degrade Over Long-Horizon Iterative Tasks • Paper • 2603.24755 • Published Mar 25 • 30
On Data Engineering for Scaling LLM Terminal Capabilities • Paper • 2602.21193 • Published Feb 24 • 102