abdo hamdy
abdohamdy12739
1 follower · 10 following
AI & ML interests
None yet
Recent Activity
reacted to Yann-CV's post with 🔥 12 days ago
Introducing Goldener: The Python Data Orchestrator for more efficient ML

Machine Learning workflows often rely on randomness: selecting/splitting data for training, batching it variably, and monitoring real-world performance. Nowadays, foundation models give access to the semantics of data. Goldener leverages these semantics to make the entire ML lifecycle more efficient!

Check it out: https://github.com/goldener-data/goldener
Give it a try: pip install goldener
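For readers who want a feel for what "semantic" data selection means in practice, here is a minimal, hypothetical sketch that embeds samples with a foundation model and splits them stratified by semantic cluster instead of purely at random. It deliberately does not use Goldener's own API (see the GitHub link above for that); the embedding model name and cluster count are illustrative assumptions.

# Hypothetical illustration of semantic data splitting; NOT Goldener's API.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

texts = ["a cat on a mat", "a dog in the park", "stock prices fell", "bond yields rose"]

# Foundation-model embeddings expose the semantics of each sample.
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(texts)

# Group samples into semantic clusters.
clusters = KMeans(n_clusters=2, n_init="auto", random_state=0).fit_predict(embeddings)

# Split stratified by semantic cluster rather than uniformly at random.
train_texts, val_texts = train_test_split(texts, test_size=0.5, stratify=clusters, random_state=0)
print(train_texts, val_texts)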
reacted to ManniX-ITA's post with 🔥 12 days ago
Two custom releases, both unusual takes on common problems, on a single RTX 3090 + a vast.ai pod.

🔹 ManniX-ITA/Qwen3.5-27B-Omnimerge-v2
3-source weight-space merge over Qwen3.5-27B combining OBIM-lite magnitude masking + DAREx rescaling + EMR election (sign from consensus, amplitude from max-abs across sources). GPU-accelerated, ~35× over CPU.
Sources: Claude-4.6-Opus-distill (0.40), Esper3.1 code (0.35), Gemini-3.1-Pro-distill (0.25). Density 0.53, DAREx q 0.75.
Q6_K vs best source:
• GPQA Diamond: 53.03 → 69.19 (+16.16 pp)
• MBPP pass@1: 71.20 → 74.60 (+3.40)
• HumanEval pass@1: 76.22 → 79.27 (+3.05)
vs Omnimerge v1 (vanilla DARE-TIES): +8.08 pp GPQA, +2.80 MBPP. Amplitude-from-max + sign-from-consensus is what unlocked the GPQA jump.

🔹 ManniX-ITA/gemma-4-A4B-98e-v3-it
Gemma 4 26B-A4B pruned 128 → 98 experts/layer (-23.4% MoE capacity, -5.2B params), zero GPQA degradation.
GPQA Diamond:
• 128e reference: 75.25%
• 98e v3 (this): 75.25% → +0.00 pp despite -23.4% capacity, -5.2B params
• 109e v3 (older): 71.72% → -3.53 pp
The win over 109e v3 came from changing the importance map: aggregate per-expert contribution across math/logic/code/science/creative via 128-token teacher-force, instead of GPQA-specific per-question top-16 (which overfitted). Result: more experts dropped, quality preserved.
Findings worth flagging:
• Experts NOT topic-specialized → 28/32 overlap math/creative top-32.
• Expert weight cosine ≈ 0.05 max → merging destroys the model. Dropping is the only viable structural compression here.
• Contribution Gini ≈ 0.38 → ~75 experts/layer carry 80% of signal.
Eval: lm-eval gpqa_diamond_cot_zeroshot, llama-server --reasoning-format deepseek --reasoning-budget 8192, Gemma 4 official sampling. Feedback welcome.
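The "sign from consensus, amplitude from max-abs" election rule is concrete enough to sketch. Below is a minimal, hypothetical PyTorch version of that per-parameter rule; it is not ManniX-ITA's actual merge script and omits the OBIM-lite masking and DAREx rescaling steps mentioned above. The function name, toy tensors, and source weights are illustrative assumptions.

import torch

def elect_delta(deltas: list[torch.Tensor], weights: list[float]) -> torch.Tensor:
    """Per-parameter election: sign from the weighted consensus of source deltas,
    amplitude from the max-abs delta across sources."""
    stacked = torch.stack(deltas)                                  # [n_sources, ...]
    w = torch.tensor(weights).view(-1, *([1] * deltas[0].dim()))   # broadcastable source weights
    sign = torch.sign((stacked * w).sum(dim=0))                    # consensus sign
    amplitude = stacked.abs().max(dim=0).values                    # max-abs amplitude
    return sign * amplitude

# Toy example: two source deltas (finetuned minus base) for one 4-element parameter.
base = torch.zeros(4)
deltas = [torch.tensor([0.3, -0.1, 0.2, 0.0]),
          torch.tensor([-0.5, 0.4, 0.1, 0.2])]
merged = base + elect_delta(deltas, weights=[0.40, 0.35])
print(merged)  # tensor([-0.5000, 0.4000, 0.2000, 0.2000])

In a full merge this rule would be applied to every parameter tensor and combined with the density and DAREx q settings listed in the post.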
liked a model 6 months ago
Qwen/Qwen3-VL-2B-Instruct-FP8
Organizations
abdohamdy12739's activity
liked a model 6 months ago
Qwen/Qwen3-VL-2B-Instruct-FP8
Image-Text-to-Text • 2B params • Updated Oct 20, 2025 • 291k downloads • 38 likes