patch inference on CPU & Windows + Update README snippets

by tomaarsen (HF Staff), opened Jan 22, 2025

base: refs/heads/main ← from: refs/pr/2 (+32 -18)

tomaarsen, Jan 22, 2025 (edited Jan 22, 2025)

Hello!

Pull Request overview

- Remove the reference_compile config option. When it is not specified in the config, transformers sets it dynamically based on the user's hardware and software: https://github.com/huggingface/transformers/blob/f439e28d32c9fa061c4fd90696ba0b158d273d09/src/transformers/models/modernbert/modeling_modernbert.py#L689-L718
- Update the README:
  - Add a tag for Sentence Transformers to boost visibility.
  - Add model outputs so people get a better feel for what the model does.
  - Remove 'trust_remote_code'; it is not needed for ModernBERT!
  - Update the minimum 'transformers' version to v4.48.0, as that version introduced the ModernBERT architecture.
  - Mention that flash_attn is recommended (but not required) for faster inference.
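The README requirements above (transformers v4.48.0 or newer, no trust_remote_code because the architecture is built in, flash_attn optional) can be checked before loading the model. The following is a minimal, hypothetical sketch using only the standard library; the helper names `parse_release`, `supports_modernbert`, and `check_environment` are illustrative and not part of transformers or Sentence Transformers.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from importlib.util import find_spec

# Minimum transformers release that ships the ModernBERT architecture (per this PR).
MIN_TRANSFORMERS = (4, 48, 0)


def parse_release(v: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '4.48.0.dev0' -> (4, 48, 0).

    Hypothetical helper: pre-release suffixes such as '.dev0' are simply ignored.
    """
    match = re.match(r"(\d+(?:\.\d+)*)", v)
    if match is None:
        raise ValueError(f"unrecognised version string: {v!r}")
    parts = tuple(int(p) for p in match.group(1).split("."))
    # Pad to three components so '4.48' compares like '4.48.0'.
    return parts + (0,) * (3 - len(parts))


def supports_modernbert(transformers_version: str) -> bool:
    """True if this transformers version includes ModernBERT natively."""
    return parse_release(transformers_version) >= MIN_TRANSFORMERS


def check_environment() -> dict:
    """Report the installed transformers version and whether flash_attn is importable."""
    try:
        tf_version = version("transformers")
    except PackageNotFoundError:
        tf_version = None
    return {
        "transformers": tf_version,
        "modernbert_supported": tf_version is not None and supports_modernbert(tf_version),
        # flash_attn is optional: recommended for faster inference, not required.
        "flash_attn_available": find_spec("flash_attn") is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

If `modernbert_supported` is False, upgrading transformers is the fix; adding trust_remote_code should not be necessary.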

Details

Regarding the reference_compile config change: if the option stays in the config, parts of the model are always compiled, even when the user does not have triton installed (a core requirement for compilation) or is running on CPU (which isn't compatible with compilation). Removing the option lets transformers pick the setting for the user's environment instead.
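To illustrate why removing the key helps, here is a simplified, hypothetical sketch of the decision described above. It is not the transformers implementation (the linked modeling_modernbert.py is, and it may check more conditions); `resolve_reference_compile` and `strip_reference_compile` are illustrative names, and the config dicts are made-up examples rather than this repository's actual config.json.

```python
from typing import Optional


def resolve_reference_compile(
    configured: Optional[bool], device_type: str, triton_available: bool
) -> bool:
    """Decide whether to compile parts of the model (simplified, hypothetical)."""
    if configured is not None:
        # A value baked into config.json is taken as-is, so a stale True
        # forces compilation even on CPU or without triton.
        return configured
    if device_type == "cpu":
        # CPU inference is treated as incompatible with compilation.
        return False
    # Otherwise compile only when triton (required for compilation) is installed.
    return triton_available


def strip_reference_compile(config: dict) -> dict:
    """Return a copy of a config dict without the reference_compile key."""
    return {k: v for k, v in config.items() if k != "reference_compile"}


# Illustrative configs, before and after this PR's change.
old_config = {"model_type": "modernbert", "reference_compile": True}
new_config = strip_reference_compile(old_config)

print(resolve_reference_compile(old_config.get("reference_compile"), "cpu", False))  # True
print(resolve_reference_compile(new_config.get("reference_compile"), "cpu", False))  # False
```

With the key removed, the value is derived from the runtime environment, so CPU and triton-less setups (such as many Windows installs) fall back to the non-compiled path.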

Tom Aarsen

Remove reference_compile; set model max length to avoid warning (commit 0e403684)

tomaarsen changed pull request status to open Jan 22, 2025

thenlper changed pull request status to merged Jan 22, 2025
