AI & ML interests

We aim to unify the schema across many different biomedical NLP resources.

Recent Activity

Nymbo 
posted an update about 1 hour ago
view post
Post
4
We should really have a release date slider on the /models page. Tired of "trending/most downloaded" being the best way to sort and still seeing models from 2023 on the first page just because they're embedded in enterprise pipelines and get downloaded repeatedly. "Recently Created/Recently Updated" don't solve the discovery problem considering the amount of noise to sift through.

Slight caveat: Trending actually does have some recency bias, but it's not strong/precise enough.
MaziyarPanahi 
posted an update 12 days ago
view post
Post
4303
DNA, mRNA, proteins, AI. I spent the last year going deep into computational biology as an ML engineer. This is Part I of what I found. 🧬

In 2024, AlphaFold won the Nobel Prize in Chemistry.

By 2026, the open-source community had built alternatives that outperform it.

That's the story I find most interesting about protein AI right now. Not just the science (which is incredible), but the speed at which open-source caught up. Multiple teams, independently, reproduced and then exceeded AlphaFold 3's accuracy with permissive licenses. The field went from prediction to generation: we're not just modeling known proteins anymore, we're designing new ones.

I spent months mapping this landscape for ML engineers. What the architectures actually are (spoiler: transformers and diffusion models), which tools to use for what, and which ones you can actually ship commercially.

New post on the Hugging Face blog: https://huggingface.co/blog/MaziyarPanahi/protein-ai-landscape

Hope you all enjoy! 🤗
  • 2 replies
·
albertvillanova 
posted an update 17 days ago
view post
Post
1917
🚀 TRL v0.29.0 introduces trl-training: an agent-native training skill.

This makes the TRL CLI a structured, agent-readable capability, allowing AI agents to reliably execute training workflows such as:
- Supervised Fine-Tuning (SFT)
- Direct Preference Optimization (DPO)
- Group Relative Policy Optimization (GRPO)

We’re excited to see what the community builds on top of this.

If you’re working on AI agents, alignment research, or scalable RL training infrastructure: give TRL v0.29.0 a try! 🤗

The future of ML tooling is agent-native.
🔗 https://github.com/huggingface/trl/releases/tag/v0.29.0
Tonic 
posted an update 22 days ago
view post
Post
3243
🤔 Who would win ?

- a fully subsidized ai lab
OR
- 3 random students named
kurakurai
?

demo : Tonic/fr-on-device

if you like it give the demo a little star and send a shoutout to : @MaxLSB @jddqd and @GAD-cell for absolutely obliterating the pareto frontier of the french language understanding .
·
Tonic 
posted an update 26 days ago
view post
Post
3254
🙋🏻‍♂️hello my lovelies ,

it is with great pleasure i present to you my working one-click deploy 16GB ram completely free huggingface spaces deployment.

repo : Tonic/hugging-claw (use git clone to inspect)
literally the one-click link : Tonic/hugging-claw

you can also run it locally and see for yourself :

docker run -it -p 7860:7860 --platform=linux/amd64 \
-e HF_TOKEN="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_TRUSTED_PROXIES="YOUR_VALUE_HERE" \
-e OPENCLAW_GATEWAY_PASSWORD="YOUR_VALUE_HERE" \
-e OPENCLAW_CONTROL_UI_ALLOWED_ORIGINS="YOUR_VALUE_HERE" \
registry.hf.space/tonic-hugging-claw:latest


just a few quite minor details i'll take care of but i wanted to share here first
  • 2 replies
·
albertvillanova 
posted an update about 1 month ago
view post
Post
1760
5 years already working in democratizing AI 🤗
Grateful to be part of such an awesome team making it happen every day.
MaziyarPanahi 
posted an update about 1 month ago
view post
Post
2248
Announcing: OpenMed Multilingual PII Detection Models

Today I am releasing 105 open-source models for Personally Identifiable Information (PII) detection in French, German, and Italian.

All Apache 2.0 licensed. Free for commercial use. No restrictions.

Performance:

- French: 97.97% F1 (top model)
- German: 97.61% F1 (top model)
- Italian: 97.28% F1 (top model)

All top-10 models per language exceed 96% F1

Coverage:

55+ PII entity types per language
Native ID formats: NSS (French), Sozialversicherungsnummer (German), Codice Fiscale (Italian)
Language-specific address, phone, and name patterns

Training Data:

French: 49,580 samples
German: 42,250 samples
Italian: 40,944 samples

Why Multilingual?

European healthcare operates in European languages. Clinical notes, patient records, and medical documents are generated in French, German, Italian, and other languages.

Effective de-identification requires:

- Native language understanding — not translation
- Local ID format recognition — each country has unique patterns
- Cultural context awareness — names, addresses, and formats vary
- These models deliver production-ready accuracy without requiring data to leave your infrastructure or language.

HIPAA & GDPR Compliance
Built for US and European privacy regulations:

- On-premise deployment: Process data locally with zero external dependencies
- Data sovereignty: No API calls, no cloud services, no cross-border transfers
- Air-gapped capable: Deploy in fully isolated environments if required
- Regulatory-grade accuracy: Supporting Expert Determination standards
- HIPAA and GDPR compliance across languages, without compliance gaps.

Use Cases
- Hospital EHR systems: Automated patient record de-identification
- Clinical research: Multilingual dataset preparation for studies
- Insurance companies: Claims processing across

https://huggingface.co/collections/OpenMed/multilingual-pii-and-de-identification
  • 1 reply
·
MaziyarPanahi 
posted an update about 1 month ago
view post
Post
1260
From Golden Gate Bridge to Broken JSON: Why Anthropic's SAE Steering Fails for Structured Output

I ran 6 experiments trying to use Anthropic's SAE steering for JSON generation.

- Base model: 86.8% valid JSON
- Steering only: 24.4%
- Fine-tuned: 96.6%
- FSM constrained: 100%

Steering is for semantics, not syntax.

https://huggingface.co/blog/MaziyarPanahi/sae-steering-json
MaziyarPanahi 
posted an update about 1 month ago
view post
Post
4003
🚨 Day 8/8: OpenMed Medical Reasoning Dataset Release - THE GRAND FINALE

Today I complete my 8-day release series with Medical-Reasoning-SFT-Mega.
The largest open medical reasoning dataset, combining 7 state-of-the-art AI models with fair distribution deduplication.

THE 7 SOURCE MODELS (Original Sample Counts):

1. Trinity-Mini: 810,284 samples
2. Qwen3-Next-80B: 604,249 samples
3. GPT-OSS-120B: 506,150 samples
4. Nemotron-Nano-30B: 444,544 samples
5. GLM-4.5-Air: 225,179 samples
6. MiniMax-M2.1: 204,773 samples
7. Baichuan-M3-235B: 124,520 samples

TOTAL BEFORE DEDUPLICATION: 2,919,699 samples

TOKEN COUNTS:
- Content tokens: 2.22 Billion
- Reasoning tokens: 1.56 Billion
- Total tokens: 3.78 Billion
- Samples with chain-of-thought: 100%

Quick Start:
from datasets import load_dataset
ds = load_dataset("OpenMed/Medical-Reasoning-SFT-Mega")


All datasets Apache 2.0 licensed. Free for research and commercial use.

Thank you for following OpenMed's release series. I can't wait to see what you build. 🔥

OpenMed/Medical-Reasoning-SFT-Mega
OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B-V2
OpenMed/Medical-Reasoning-SFT-Trinity-Mini
OpenMed/Medical-Reasoning-SFT-GLM_4.5_Air
OpenMed/Medical-Reasoning-SFT-MiniMax-M2.1
OpenMed/Medical-Reasoning-SFT-Qwen3-Next-80B
OpenMed/Medical-Reasoning-SFT-Nemotron-Nano-30B
OpenMed/Medical-Reasoning-SFT-Baichuan-M3-235B



https://huggingface.co/collections/OpenMed/medical-datasets
·
mkurman 
posted an update 2 months ago
view post
Post
1115
# SYNTHLabs

Symbolic reasoning dataset creator & curator.

- Generator: create your own dataset from scratch
- Converter: use existing datasets (Hugging Face support) with reasoning traces to match our SYNTH style
- DEEP Mode: multiple agents working together in various configurations
- Multi-turn Support: pass one DEEP run, let the model ask follow-up questions, and choose who should respond using SYNTH-like thinking
- Firebase/Firestore: download your data directly as a JSONL file or upload it to your Firestore (production mode)
- Data Preview: Have data but unsure what's inside? Explore it directly!
- Verifier View: evaluate generated data, remove duplicates, assign ratings

and many more!

GitHub: https://github.com/mkurman/synthlabs

Dataset:
mkurman/medical-SYNTH-reasoning-preview
Nymbo 
posted an update 2 months ago
view post
Post
2447
Genuine recommendation: You should really use this AutoHotKey macro. Save the file as macros.ahk and run it. Before sending a prompt to your coding agent, press Ctrl + Alt + 1 and paste your prompt to any regular chatbot. Then send the output to the agent. This is the actual, boring, real way to "10x your prompting". Use the other number keys to avoid repeating yourself over and over again. I use this macro prolly 100-200 times per day. AutoHotKey isn't as new or hype as a lot of other workflows, but there's a reason it's still widely used after 17 years. Don't overcomplicate it.

; Requires AutoHotkey v1.1+

; All macros are `Ctrl + Alt + <variable>`

^!1::
    Send, Please help me more clearly articulate what I mean with this message (write the message in a code block):
return

^!2::
    Send, Please make the following changes:
return

^!3::
    Send, It seems you got cut off by the maximum response limit. Please continue by picking up where you left off.
return


In my experience the past few months, Ctrl + Alt + 1 works best with Instruct models (non-thinking). Reasoning causes some models to ramble and miss the point. I've just been using GPT-5.x for this.
MaziyarPanahi 
posted an update 2 months ago
view post
Post
3726
🎉 OpenMed 2025 Year in Review: 6 Months of Open Medical AI

I'm thrilled to share what the OpenMed community has accomplished since our July 2025 launch!

📊 The Numbers

29,700,000 downloads Thank you! 🙏

- 481 total models (475 medical NER models + 6 fine-tuned LLMs)
- 475 medical NER models in [OpenMed](
OpenMed
) organization
- 6 fine-tuned LLMs in [openmed-community](
openmed-community
)
- 551,800 PyPI downloads of the [openmed package](https://pypi.org/project/openmed/)
- 707 followers on HuggingFace (you!)
- 97 GitHub stars on the [toolkit repo](https://github.com/maziyarpanahi/openmed)

🏆 Top Models by Downloads

1. [OpenMed-NER-PharmaDetect-SuperClinical-434M]( OpenMed/OpenMed-NER-PharmaDetect-SuperClinical-434M) — 147,305 downloads
2. [OpenMed-NER-ChemicalDetect-ElectraMed-33M]( OpenMed/OpenMed-NER-ChemicalDetect-ElectraMed-33M) — 126,785 downloads
3. [OpenMed-NER-BloodCancerDetect-TinyMed-65M]( OpenMed/OpenMed-NER-BloodCancerDetect-TinyMed-65M) — 126,465 downloads

🔬 Model Categories

Our 481 models cover comprehensive medical domains:

- Disease Detection (~50 variants)
- Pharmaceutical Detection (~50 variants)
- Oncology Detection (~50 variants)
- Genomics/DNA Detection (~80 variants)
- Chemical Detection (~50 variants)
- Species/Organism Detection (~60 variants)
- Protein Detection (~50 variants)
- Pathology Detection (~50 variants)
- Blood Cancer Detection (~30 variants)
- Anatomy Detection (~40 variants)
- Zero-Shot NER (GLiNER-based)


OpenMed

OpenMed NER: Open-Source, Domain-Adapted State-of-the-Art Transformers for Biomedical NER Across 12 Public Datasets (2508.01630)
https://huggingface.co/collections/OpenMed/medical-and-clinical-ner
https://huggingface.co/collections/OpenMed/zeroshot-medical-and-clinical-ner
OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B
  • 1 reply
·
Nymbo 
posted an update 3 months ago
view post
Post
2791
🚨 New tool for the Nymbo/Tools MCP server: The new Agent_Skills tool provides full support for Agent Skills (Claude Skills but open-source).

How it works: The tool exposes the standard discover/info/resources/validate actions. Skills live in /Skills under the same File_System root, and any bundled scripts run through Shell_Command, no new infrastructure required.

Agent_Skills(action="discover")  # List all available skills
Agent_Skills(action="info", skill_name="music-downloader")  # Full SKILL.md
Agent_Skills(action="resources", skill_name="music-downloader")  # Scripts, refs, assets


I've included a music-downloader skill as a working demo, it wraps yt-dlp for YouTube/SoundCloud audio extraction.

Caveat: On HF Spaces, Shell_Command works for most tasks, but some operations (like YouTube downloads) are restricted due to the container environment. For full functionality, run the server locally on your machine.

Try it out ~ https://www.nymbo.net/nymbot