---
license: apache-2.0
tags:
- text-encoder
- feature-extraction
- sentence-transformers
- contrastive-learning
base_model: mjaliz/vision-text-dual-encoder-v1
---
# Text Encoder extracted from mjaliz/vision-text-dual-encoder-v1
This is the text encoder component extracted from the VisionTextDualEncoder model
[mjaliz/vision-text-dual-encoder-v1](https://huggingface.co/mjaliz/vision-text-dual-encoder-v1).
## Model Details

- **Model type:** XLMRobertaModel
- **Source model:** [mjaliz/vision-text-dual-encoder-v1](https://huggingface.co/mjaliz/vision-text-dual-encoder-v1)
- **Includes projection:** False
## Usage

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the text encoder and its tokenizer
model = AutoModel.from_pretrained("mjaliz/siglip-text-encoder")
tokenizer = AutoTokenizer.from_pretrained("mjaliz/siglip-text-encoder")

# Encode text
texts = ["Hello world", "How are you?"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Get embeddings: pooler output if available, else mean of the last hidden state
if hasattr(outputs, "pooler_output") and outputs.pooler_output is not None:
    embeddings = outputs.pooler_output
else:
    embeddings = outputs.last_hidden_state.mean(dim=1)

print(embeddings.shape)
```
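Since the underlying dual encoder was trained with a contrastive objective, downstream comparisons between texts typically use cosine similarity of the embeddings. A minimal sketch, using a random tensor as a stand-in for the `embeddings` produced above:

```python
import torch
import torch.nn.functional as F

# Stand-in for the `embeddings` tensor from the snippet above
# (batch of 2 texts; 768 is an assumed hidden size)
embeddings = torch.randn(2, 768)

# L2-normalize so that the dot product equals cosine similarity
normalized = F.normalize(embeddings, p=2, dim=-1)

# Pairwise cosine similarity matrix, shape (2, 2); the diagonal is 1
similarity = normalized @ normalized.T

print(similarity.shape)
```

Values near 1 indicate semantically similar texts under the model's embedding space; the same normalization is what aligns text embeddings with image embeddings in the full dual encoder.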
## Citation

If you use this model, please cite the original dual encoder model.