Peter Szemraj PRO

pszemraj

https://pszemraj.carrd.co/

AI & ML interests

metallic intuition

Recent Activity

upvoted an article about 4 hours ago

Introducing the FFASR Leaderboard: Benchmarking ASR in the Real World

upvoted an article about 4 hours ago

Which tokens does a hybrid model predict better?

commented on an article about 5 hours ago

Which tokens does a hybrid model predict better?

upvoted 2 articles about 4 hours ago

Article

Introducing the FFASR Leaderboard: Benchmarking ASR in the Real World

+3

daniel-treble, whojavumusic, alessia-treble, georg-goetz, bezzam • 2 days ago • 4

Article

Which tokens does a hybrid model predict better?

allenai • about 18 hours ago • 2

commented on Which tokens does a hybrid model predict better? about 5 hours ago

Cool work, I've been quite excited about AllenAI's take/improved hybrid arch. Question for you though:

The one genuinely matched-data comparison in the paper is the 1B ladder (transformer / hybrid / pure-RNN, identical mix), which you use for the 6 filtered-loss eval, but only as aggregate loss, not the POS/bracket/copy decomposition. Since that only requires forward passes on released checkpoints, have you run (or could you run) the same tag-stratified analysis on those models? It'd help show whether the content-word / open-close / copy structure survives when data is actually held constant (vs. the ~7B case).

Curious if you've looked at this internally as well

published a dataset 3 days ago

BEE-spoke-data/synthsumm-open-v1.0

Viewer • Updated Dec 29, 2025 • 5.01k • 10

liked a model 3 days ago

google/gemma-4-12B-it

Any-to-Any • 12B • Updated 22 days ago • 2.31M • 1.18k

liked 2 models 6 days ago

hustvl/PixelHacker

Updated May 20, 2025 • 12

hustvl/Moebius

Updated 4 days ago • 51

upvoted a paper 6 days ago

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Paper • 2606.19334 • Published 9 days ago • 7

upvoted a collection 10 days ago

SWE-FastContext

A family of code-search models powering the Explore subagent for coding agents. • 3 items • Updated 9 days ago • 15

liked 2 models 12 days ago

mlx-community/gemma-4-E4B-it-qat-6bit

Image-Text-to-Text • 2B • Updated 21 days ago • 360 • 1

mlx-community/gemma-4-12B-it-qat-6bit

Image-Text-to-Text • 3B • Updated 20 days ago • 1.58k • 1

New activity in EssentialAI/rnj-1.5-instruct 12 days ago

any GGUF ?

#1 opened 14 days ago by

published a model 12 days ago

pszemraj/rnj-1.5-instruct-GGUF

8B • Updated 12 days ago • 221

updated a model 12 days ago

pszemraj/rnj-1.5-instruct-GGUF

8B • Updated 12 days ago • 221

upvoted a collection 13 days ago

Gemma 4 QAT Mobile

4 items • Updated 21 days ago • 44

updated a model 13 days ago

pszemraj/rnj-1.5-instruct

Text Generation • 8B • Updated 13 days ago • 46

published a model 13 days ago

pszemraj/rnj-1.5-instruct

Text Generation • 8B • Updated 13 days ago • 46

upvoted 3 articles 13 days ago

Article

MTEB Leaderboard: From a slow demo to feature-rich leaderboard

Samoed • 14 days ago • 22

Article

Unlocking asynchronicity in continuous batching

+1

ror, pcuenq, ariG23498 • May 14 • 61

Article

Continuous batching from first principles

+1

ror, ArthurZ, mcpotato • Nov 25, 2025 • 411