Hugging Face
Jonathan von Rad
jonny-vr
0 followers · 1 following
jonny-vr
AI & ML interests
LLM Compression & Mechanistic Interpretability
Recent Activity
new
activity
3 days ago
Qwen/Qwen3-32B:
Where is the Base Model?
new
activity
10 days ago
Harvard-DCML/boomerang-qwen3-4.9B:
Substantially lower accuracy on reasoning benchmarks such as GSM8K (1.5%) and MATH-500 (4.2%)
updated
a model
17 days ago
jonny-vr/mv-final-assignment
jonny-vr's activity
New activity in
Qwen/Qwen3-32B
3 days ago
Where is the Base Model?
10
3
#34 opened 6 months ago by
jonny-vr
New activity in
Harvard-DCML/boomerang-qwen3-4.9B
10 days ago
Substantially lower accuracy on reasoning benchmarks such as GSM8K (1.5%) and MATH-500 (4.2%)
1
#1 opened 10 days ago by
jonny-vr
updated
a model
17 days ago
jonny-vr/mv-final-assignment
Updated
17 days ago
published
a model
17 days ago
jonny-vr/mv-final-assignment
Updated
17 days ago
New activity in
monology/pile-uncopyrighted
6 months ago
Could you please implement train:1% feature? This way we don't have to download the entire dataset.
1
#12 opened 6 months ago by
jonny-vr
New activity in
Qwen/Qwen3-32B
6 months ago
Low Score on GSM8K on lm-eval-harness? (just 74.91)
2
#36 opened 6 months ago by
jonny-vr
New activity in
nvidia/NV-Embed-v2
6 months ago
TypeError: cannot unpack non-iterable NoneType object
8
5
#37 opened 11 months ago by
Pietroferr
New activity in
google/gemma-3-27b-pt
6 months ago
Model is a Memory Hog - 2xH100 80GB OOM??
1
#5 opened 6 months ago by
jonny-vr
New activity in
google/gemma-3-1b-pt
7 months ago
When evaluating Wiki2, I just get Loss: Nan, while with gemma-3-1b-it it works..
2
#8 opened 7 months ago by
jonny-vr