Custom `generate` methods discussion

#10

by joaogante - opened May 20, 2025

Discussion

joaogante

May 20, 2025

Hi everyone 👋

This thread aims to centralize the discussion about custom generate methods. Questions about how it works, improvement suggestions, and requests for new generation methods are welcome!

Resources:
👉 docs
👉 all custom generate methods

Gausson

Transformers Community org Jul 22, 2025

•

edited Jul 22, 2025

Hi @joaogante

We would like to integrate our method into transformers-community, as advised by your team. See details in https://github.com/huggingface/transformers/pull/38824

Could you please explain the procedure for opening a pull request to transformers-community?

Summary:
We want to merge https://huggingface.co/Gausson/sep_cache [ICML 2025] into transformers-community/sep_cache, but we couldn’t find a pull request option on the Hugging Face Hub. Could you guide us on the correct procedure?

Best Regards

pcuenq

Transformers Community org Jul 28, 2025

Hi @Gausson , amazing contribution! We'll invite you as a Contributor to the organization so you can transfer your repo using the "Rename or transfer this model" section of your repo's settings. This way, the model URL will be automatically redirected, and downloads and like counts will be preserved.

joaogante

Jul 28, 2025

•

edited Jul 28, 2025

Hey @Gausson !

Veeery cool repo -- I'm impressed with the complexity of what you've built with custom_generate, and with how well documented it is 🔥

I'm going to invite you as a contributor to transformers-community, so you can move your repo there. In a nutshell, you would still have complete control over the Hub repo, but it gets higher visibility because of the org. After it is moved, we're going to create a new collection of community-contributed custom_generate repos. Let us know if you have any questions! 🤗

Meanwhile, two minor comments to the repo itself:

In transformers==4.54, we've released a cache refactor: caches are built as a composition of layers, as opposed to being isolated objects. Sadly, I think your current abstraction doesn't work with it. You might want to update references from transformers>=4.53 to transformers==4.53;
On your demo script in README.md, I suggest adding torch_dtype=torch.bfloat16 when loading the model, so that the demo is immediately compatible with 24GB GPUs.
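
To illustrate the second comment, here is a rough back-of-the-envelope sketch of why loading in bfloat16 helps on a 24GB GPU. The 8B parameter count is a hypothetical example, not taken from the sep_cache README, and the estimate covers weights only (activations and the KV cache need extra room):

```python
# Bytes used per parameter for common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Approximate memory taken by the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


# Hypothetical model size (assumption for illustration only).
NUM_PARAMS = 8_000_000_000
GPU_MEMORY_GIB = 24

fp32_gib = weight_memory_gib(NUM_PARAMS, "float32")
bf16_gib = weight_memory_gib(NUM_PARAMS, "bfloat16")

for name, gib in [("float32", fp32_gib), ("bfloat16", bf16_gib)]:
    fits = "fits" if gib < GPU_MEMORY_GIB else "does not fit"
    print(f"{name}: ~{gib:.1f} GiB of weights, {fits} in {GPU_MEMORY_GIB} GiB")
```

Under these assumptions the float32 weights alone exceed 24 GiB, while the bfloat16 weights take half that and leave headroom for the cache.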

Discussion centralized from this GH issue and this Hub issue.

joaogante

Jul 28, 2025

@Gausson you should have an invite to transformers-community

Gausson

Transformers Community org Jul 29, 2025

@joaogante @pcuenq

Thank you very much for your reply.

I have merged sep_cache into the transformers-community repo. I will update my README.md according to your suggestions later. When you have time, please kindly add sep_cache to the collection of custom_generate repos. Thank you very much. 😊😊😊

joaogante

Jul 29, 2025

@Gausson added to a new collection here 🤗

maxholsman

Jan 6

Hi @joaogante , @pcuenq , and team! Hope you're all doing well!

I'm reaching out to see whether our method Fuzzy Speculative Decoding (FSD) [ACL Findings 2025] could also be featured as a transformers-community custom_generate repo.

FSD is a lightweight extension of standard speculative decoding (SD) that allows users to adjust how leniently draft tokens are accepted, enabling a tunable trade-off between small reductions in generation quality and inference acceleration. By relaxing SD's strict distributional equivalence requirement, FSD achieves inference acceleration beyond traditional SD, flexibly offering moderate gains when full generation quality is to be maintained or much larger gains when small reductions in quality are acceptable.
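
As a rough illustration of what "lenient acceptance" could look like, here is a minimal sketch that accepts a draft token whenever the draft and target next-token distributions are close enough. The choice of Jensen-Shannon divergence, the function names, and the threshold values are assumptions made for this example; the actual acceptance criterion is defined in the paper and the repo, and this sketch is not the rejection-sampling rule of standard SD.

```python
import math


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    def kl(a, b):
        # Terms with a zero probability in `a` contribute nothing.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def accept_draft_token(target_probs, draft_probs, threshold):
    """Hypothetical lenient rule: accept when the two distributions are close."""
    return js_divergence(target_probs, draft_probs) <= threshold


# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
target = [0.7, 0.2, 0.1]
draft = [0.6, 0.3, 0.1]

print(accept_draft_token(target, draft, threshold=0.05))   # lenient setting
print(accept_draft_token(target, draft, threshold=0.001))  # strict setting
```

In this sketch, raising the threshold accepts more draft tokens (fewer target-model corrections, so more speed-up) at the cost of drifting further from the target distribution, which is the trade-off described above.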

We have already implemented our method as a custom_generate repo here (https://huggingface.co/maxholsman/fuzzy-spec-dec) with which users can run FSD on any draft-target model pair. We're planning to add some guidance on hyperparameter setting for common draft-target model pairs and are happy to make any further modifications if required!

Thank you!

Paper: https://aclanthology.org/2025.findings-acl.1346/
