How to use BAAI/bge-large-en with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="BAAI/bge-large-en")

# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en")
model = AutoModel.from_pretrained("BAAI/bge-large-en")
```
Fix `model_max_length` in `tokenizer_config.json`
#7
by bryant1410 - opened
The current value of `model_max_length` in `tokenizer_config.json` (basically infinity) is inconsistent with `max_position_embeddings` in `config.json`. It's also inconsistent with the value used by bge-base-en.
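For anyone who wants to check a checkpoint for this mismatch, here is a minimal sketch using only the standard library. The JSON contents are illustrative stand-ins, not copied from this repository: 512 is assumed as the usual BERT position limit, and `int(1e30)` is assumed as the "no known limit" placeholder. The helper `max_length_mismatch` is a hypothetical name introduced for this example.

```python
import json

# Illustrative file contents (assumptions, not copied from the repo).
# 512: assumed BERT-style position limit.
# int(1e30): assumed placeholder written when no maximum length is known.
config_json = json.dumps({"max_position_embeddings": 512})
tokenizer_config_json = json.dumps({"model_max_length": int(1e30)})


def max_length_mismatch(config: dict, tokenizer_config: dict) -> bool:
    """Return True when the tokenizer limit disagrees with the model limit."""
    model_limit = config["max_position_embeddings"]
    # A missing key is treated as "no limit" for the purposes of this check.
    tokenizer_limit = tokenizer_config.get("model_max_length", float("inf"))
    return tokenizer_limit != model_limit


config = json.loads(config_json)
tokenizer_config = json.loads(tokenizer_config_json)

# Effectively infinite tokenizer limit vs. 512 positions: mismatch.
print(max_length_mismatch(config, tokenizer_config))
# After a fix that sets model_max_length to 512: no mismatch.
print(max_length_mismatch(config, {"model_max_length": 512}))
```

In practice the two dictionaries would come from reading the repository's `config.json` and `tokenizer_config.json` with `json.load`.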
This also happens with bge-small-en, but I thought it would be good to have any discussion here first, before sending a PR for that one as well.
Thanks!
The `tokenizer_config.json` file is generated automatically by the Hugging Face transformers package. To avoid confusing users, it's better to fix this.
Shitao changed pull request status to merged
Yeah, and it's not only about confusion: I also forgot to mention that, otherwise, tokenization wouldn't respect the max length.
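To illustrate that last point, here is a simplified model of the fallback as I understand it: when truncation is requested without an explicit `max_length`, the tokenizer falls back to `model_max_length`, and an effectively infinite value means nothing gets cut. This is a sketch, not the library's implementation. `truncate_ids` is a hypothetical helper, the `LARGE_INTEGER` threshold is an assumption for illustration, and special tokens are ignored.

```python
from typing import List, Optional

# Assumed threshold above which a limit is treated as "no real limit".
LARGE_INTEGER = int(1e20)


def truncate_ids(
    ids: List[int],
    model_max_length: int,
    max_length: Optional[int] = None,
) -> List[int]:
    """Simplified truncation: fall back to model_max_length if none is given."""
    if max_length is None:
        if model_max_length > LARGE_INTEGER:
            # No usable limit is known, so the input is returned untruncated.
            return list(ids)
        max_length = model_max_length
    return list(ids[:max_length])


token_ids = list(range(600))  # stand-in for a 600-token input

# Effectively infinite model_max_length: all 600 ids survive.
print(len(truncate_ids(token_ids, int(1e30))))
# model_max_length fixed to 512: the input is cut to 512 ids.
print(len(truncate_ids(token_ids, 512)))
# An explicit max_length works regardless of the configured value.
print(len(truncate_ids(token_ids, int(1e30), max_length=512)))
```

Until a checkpoint's config is fixed, passing `max_length=512` together with `truncation=True` to the tokenizer call is a reasonable workaround.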