Ofer Hasson

hassonofer

AI & ML interests

Computer Vision

Recent Activity

updated a model 2 days ago

birder-project/rope_i_vit_l14_nf_swiglu_c1_eva02-clip

Image Feature Extraction • Updated 2 days ago • 43

published a model 2 days ago

birder-project/rope_i_vit_l14_nf_swiglu_c1_eva02-clip

Image Feature Extraction • Updated 2 days ago • 43

upvoted a collection 4 days ago

Perception Encoder

Collection

OpenCLIP (PE Core image + text) and timm PE Core, Spatial, Lang (ViT only) weights. NOTE: These weights do not work with original modeling code. • 19 items • Updated Sep 19, 2025 • 8

reacted to Anran-MLLM's post with 👍 4 days ago

Post

3541

🚀 Introducing PerceptionDLM — the first multimodal diffusion LLM for parallel region perception!

Most MLLMs are autoregressive, so captioning N regions costs N sequential passes. PerceptionDLM instead describes ALL masked regions in a single denoising process. 🧩

✨ Highlights
• ⚡ Up to 3.4× faster on dense multi-region captioning, with stable per-image latency
• 🏆 PerceptionDLM-Base beats LLaDA-V on 15/16 multimodal benchmarks (new SOTA among open diffusion VLMs)
• 📊 New benchmark: ParaDLC-Bench — jointly evaluates caption quality AND inference efficiency
• 🔓 Code, models & benchmark all open-sourced

🤖 Models
MSALab/PerceptionDLM-Base
MSALab/PerceptionDLM

📊 Benchmark
MSALab/ParaDLC-Bench

📄 Paper: PerceptionDLM: Parallel Region Perception with Multimodal Diffusion Language Models (2606.19534)
💻 Code: https://github.com/MSALab-PKU/PerceptionDLM

Diffusion LLMs aren't just for text — they unlock efficient, parallel visual perception. 👁️✨

#multimodal #diffusion #VLM #perception

liked a model 4 days ago

MiniMaxAI/MiniMax-M3

Image-Text-to-Text • 427B • Updated 1 day ago • 143k • 1.23k

updated a Space 6 days ago

The Birder Leaderboard

🏆

Explore and compare Birder model performance across benchmarks

liked a model 6 days ago

zai-org/GLM-5.2

Text Generation • 753B • Updated 2 days ago • 57.2k • 2.36k

updated a model 6 days ago

birder-project/rope_vit5_reg4_b16_nepa-bio

Image Feature Extraction • Updated 6 days ago • 46

published a model 6 days ago

birder-project/rope_vit5_reg4_b16_nepa-bio

Image Feature Extraction • Updated 6 days ago • 46

liked a model 12 days ago

facebook/sam-vit-base

Mask Generation • 93.7M • Updated Jan 11, 2024 • 760k • 170

updated a model 12 days ago

birder-project/vit_sam_b16_sam-sa1b

Image Feature Extraction • Updated 12 days ago • 36

published a model 12 days ago

birder-project/vit_sam_b16_sam-sa1b

Image Feature Extraction • Updated 12 days ago • 36

updated a model 12 days ago

birder-project/vit_so400m_p14_ap_c1_siglip-v2-webli

Image Feature Extraction • Updated 12 days ago • 35

updated a model 16 days ago

birder-project/se_resnext_50_arabian-peninsula

Image Classification • Updated 16 days ago • 42

published a model 16 days ago

birder-project/se_resnext_50_arabian-peninsula

Image Classification • Updated 16 days ago • 42

liked a dataset 17 days ago

pixparse/cc12m-wds

Viewer • Updated Dec 15, 2023 • 11M • 24.1k • 42

reacted to danielhanchen's post with 👍 20 days ago

Post

9184

Gemma 4 12B can now run locally on just 8GB RAM via Dynamic GGUFs.

Google's new model, Gemma 4 12B Unified, supports image and audio input and a 256K context.
You can run and train the model via Unsloth Studio.

GGUF: unsloth/gemma-4-12b-it-GGUF
Guide: https://unsloth.ai/docs/models/gemma-4