HugGAN Community

non-profit

Activity Feed Request to join this org

AI & ML interests

GANs!

Recent Activity

nielsr submitted a paper 20 days ago

Scaling Test-Time Compute for Agentic Coding

nielsr submitted a paper 26 days ago

Geometric Context Transformer for Streaming 3D Reconstruction

gagan3012 authored a paper 27 days ago

Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation

View all activity

nielsr

submitted a paper to Daily Papers 20 days ago

Scaling Test-Time Compute for Agentic Coding

Paper • 2604.16529 • Published 27 days ago • 11

nielsr

submitted a paper to Daily Papers 26 days ago

Geometric Context Transformer for Streaming 3D Reconstruction

Paper • 2604.14141 • Published 28 days ago • 20

gagan3012

authored a paper 27 days ago

Beyond LLM-as-a-Judge: Deterministic Metrics for Multilingual Generative Text Evaluation

Paper • 2604.05083 • Published Apr 6

nielsr

submitted 2 papers to Daily Papers about 1 month ago

A Frame is Worth One Token: Efficient Generative World Modeling with Delta Tokens

Paper • 2604.04913 • Published Apr 6 • 11

MDPBench: A Benchmark for Multilingual Document Parsing in Real-World Scenarios

Paper • 2603.28130 • Published Mar 30 • 11

gagan3012

authored a paper about 2 months ago

Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA

Paper • 2603.08501 • Published Mar 9

nielsr

submitted a paper to Daily Papers about 2 months ago

Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders

Paper • 2603.19209 • Published Mar 19 • 5

gagan3012

authored a paper about 2 months ago

What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?

Paper • 2603.19017 • Published Mar 19 • 3

gagan3012

submitted a paper to Daily Papers about 2 months ago

What Really Controls Temporal Reasoning in Large Language Models: Tokenisation or Representation of Time?

Paper • 2603.19017 • Published Mar 19 • 3

nielsr

submitted a paper to Daily Papers about 2 months ago

V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning

Paper • 2603.14482 • Published Mar 15 • 34

gagan3012

submitted a paper to Daily Papers about 2 months ago

Fanar-Sadiq: A Multi-Agent Architecture for Grounded Islamic QA

Paper • 2603.08501 • Published Mar 9

nielsr

submitted a paper to Daily Papers about 2 months ago

Omnilingual MT: Machine Translation for 1,600 Languages

Paper • 2603.16309 • Published Mar 17 • 22

nielsr

authored a paper 2 months ago

Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections

Paper • 2603.12180 • Published Mar 12 • 65

nielsr

submitted 2 papers to Daily Papers 3 months ago

VidEoMT: Your ViT is Secretly Also a Video Segmentation Model

Paper • 2602.17807 • Published Feb 19 • 7

Causal-JEPA: Learning World Models through Object-Level Latent Interventions

Paper • 2602.11389 • Published Feb 11 • 9

AlekseyKorshuk

authored a paper 3 months ago

Evaluation of a Robust Control System in Real-World Cable-Driven Parallel Robots

Paper • 2510.08270 • Published Oct 9, 2025 • 2

nielsr

submitted a paper to Daily Papers 3 months ago

UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders

Paper • 2601.17950 • Published Jan 25 • 4

IlyasMoutawwakil

posted an update 4 months ago

Post

3240

Transformers v5 just landed! 🚀
It significantly unifies and reduces modeling code across architectures, while opening the door to a whole new class of performance optimizations.

My favorite new feature? 🤔
The new dynamic weight loader + converter. Here’s why 👇

Over the last few months, the core Transformers maintainers built an incredibly fast weight loader, capable of converting tensors on the fly while loading them in parallel threads. This means we’re no longer constrained by how parameters are laid out inside the safetensors weight files.

In practice, this unlocks two big things:
- Much more modular modeling code. You can now clearly see how architectures build on top of each other (DeepSeek v2 → v3, Qwen v2 → v3 → MoE, etc.). This makes shared bottlenecks obvious and lets us optimize the right building blocks once, for all model families.
- Performance optimizations beyond what torch.compile can do alone. torch.compile operates on the computation graph, but it can’t change parameter layouts. With the new loader, we can restructure weights at load time: fusing MoE expert projections, merging attention QKV projections, and enabling more compute-dense kernels that simply weren’t possible before.

Personally, I'm honored to have contributed in this direction, including the work on optimizing MoE implementations and making modeling code more torch-exportable, so these optimizations can be ported cleanly across runtimes.

Overall, Transformers v5 is a strong signal of where the community and industry are converging: Modularity and Performance, without sacrificing Flexibility.

Transformers v5 makes its signature from_pretrained an entrypoint where you can mix and match:
- Parallelism
- Quantization
- Custom kernels
- Flash/Paged attention
- Continuous batching
- ...

Kudos to everyone involved! I highly recommend the:
Release notes: https://github.com/huggingface/transformers/releases/tag/v5.0.0
Blog post: https://huggingface.co/blog/transformers-v5

3 replies

IlyasMoutawwakil

posted an update 4 months ago

Post

2464

After 2 months of refinement, I'm happy to announce that a lot of Transformers' modeling code is now significantly more torch-compile & export-friendly 🔥

Why it had to be done 👇
PyTorch's Dynamo compiler is increasingly becoming the default interoperability layer for ML systems. Anything that relies on torch.export or torch.compile, from model optimization to cross-framework integrations, benefits directly when models can be captured as a single dynamo-traced graph !

Transformers models are now easier to:
⚙️ Compile end-to-end with torch.compile backends
📦 Export reliably via torch.export and torch.onnx.export
🚀 Deploy to ONNX / ONNX Runtime, Intel Corporation's OpenVINO, NVIDIA AutoDeploy (TRT-LLM), AMD's Quark, Meta's Executorch and more hardware-specific runtimes.

This work aims at unblocking entire TorchDynamo-based toolchains that rely on exporting Transformers across runtimes and accelerators.

We are doubling down on Transformers commitment to be a first-class citizen of the PyTorch ecosystem, more exportable, more optimizable, and easier to deploy everywhere.

There are definitely some edge-cases that we still haven't addressed so don't hesitate to try compiling / exporting your favorite transformers and to open issues / PRs.

PR in the comments ! More updates coming coming soon !

1 reply

gagan3012

authored a paper 4 months ago

From RAG to Agentic RAG for Faithful Islamic Question Answering

Paper • 2601.07528 • Published Jan 12 • 4

AI & ML interests

Recent Activity

Team members 96

huggan's activity