---
license: apache-2.0
base_model: swiss-ai/Apertus-8B-2509
tags:
- text-embeddings
- multilingual
- encoder
- apertus
- experimental
language:
- multilingual
library_name: transformers
pipeline_tag: feature-extraction
model_type: apertus
---

# Apertus-8B-2509-Encoder

## Model Overview

**Apertus-8B-2509-Encoder** is an experimental bidirectional encoder model derived from the decoder-only swiss-ai/Apertus-8B-2509 model. It is a first attempt at a native Apertus-based encoder for text embedding generation and semantic similarity tasks.

**⚠️ Experimental Notice**: This model is at an experimental stage and may not perform well on production embedding tasks. See the Limitations section for details.

## Model Details

- **Model Type**: Bidirectional Transformer Encoder
- **Base Model**: swiss-ai/Apertus-8B-2509
- **Parameters**: 8.053 billion
- **Architecture**: 32-layer transformer with xIELU activation
- **Embedding Dimension**: 4096
- **Supported Languages**: 1811 (inherited from base model)
- **License**: Apache 2.0
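
The key dimensions above can be checked directly from the repository's configuration. This is a minimal sketch; the `hidden_size` and `num_hidden_layers` field names follow the standard Hugging Face convention and are assumed to be exposed here:

```python
from transformers import AutoConfig

# Inspect the shipped configuration (field names assumed to follow the standard convention)
config = AutoConfig.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
)
print(config.hidden_size)        # expected: 4096 (embedding dimension)
print(config.num_hidden_layers)  # expected: 32
```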

## Intended Use

### Primary Use Cases

- Text embedding generation for research purposes
- Cross-lingual semantic analysis experiments
- Proof-of-concept for decoder-to-encoder conversion
- Base model for further fine-tuning on embedding tasks

### Downstream Tasks

- Semantic similarity analysis
- Information retrieval systems
- Cross-lingual text comparison
- Vector database integration

## How to Use

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model.eval()

# Generate embeddings with mask-aware mean pooling so that
# padding tokens do not dilute the sentence representation
def get_embeddings(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts

# Example usage
texts = ["Hello world", "Hallo Welt", "Bonjour monde"]
embeddings = get_embeddings(texts)
print(f"Embeddings shape: {embeddings.shape}")  # (3, 4096)
```
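
For the semantic-similarity use cases listed above, cosine similarity between the pooled embeddings is the usual comparison. A brief sketch building on `get_embeddings` from the snippet above:

```python
import torch.nn.functional as F

# Cosine similarity between pooled embeddings (converted to float32 for the comparison)
emb = get_embeddings(["Hello world", "Hallo Welt", "The stock market fell"]).float()
emb = F.normalize(emb, dim=-1)
similarity = emb @ emb.T  # (3, 3) cosine-similarity matrix
print(similarity)
```

Given the limitations described below, the resulting scores should be treated as indicative only.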

## Model Architecture

The model maintains the original Apertus-8B-2509 architecture with key modifications:

- **Attention Mechanism**: Converted from causal (decoder) to bidirectional (encoder)
- **Configuration Changes**:
  - `is_decoder = False`
  - `is_causal = False`
  - `architectures = ['ApertusModel']`
- **Pooling Strategy**: Mean pooling over last hidden states

## Training Details

### Conversion Process

1. Loaded pre-trained swiss-ai/Apertus-8B-2509 model
2. Disabled causal masking in all attention layers
3. Updated model configuration for encoder usage
4. No additional training performed
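
The steps above can be approximated at the configuration level. The following is a simplified, hypothetical sketch (not the exact script used for this release), assuming the checkpoint's remote code honors the `is_decoder`/`is_causal` flags:

```python
from transformers import AutoConfig, AutoModel

base_id = "swiss-ai/Apertus-8B-2509"

# Flip the decoder-specific flags so the model is treated as an encoder
config = AutoConfig.from_pretrained(base_id, trust_remote_code=True)
config.is_decoder = False
config.is_causal = False
config.architectures = ["ApertusModel"]

# Reload the pretrained weights under the modified configuration and save the result
model = AutoModel.from_pretrained(base_id, config=config, trust_remote_code=True)
model.save_pretrained("Apertus-8B-2509-Encoder")
```

Depending on the implementation, causal masking may also need to be disabled on each attention module explicitly; the sketch only covers the configuration changes documented above.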

### Training Data

Inherits training data from the base model swiss-ai/Apertus-8B-2509. Refer to the base model documentation for detailed data information.

## Performance & Limitations

### Known Limitations

**⚠️ Important Performance Notice**:

- Initial testing revealed suboptimal embedding quality
- Semantic similarity scores appear inconsistent with expected behavior
- Model may produce embeddings that do not accurately reflect semantic relationships
- Performance significantly below specialized embedding models

### Technical Limitations

- **Resource Requirements**: 16GB+ GPU memory for inference
- **Speed**: Significantly slower than specialized embedding models
- **Optimization**: Not fine-tuned for embedding tasks
- **Pooling**: Uses simple mean pooling strategy
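
If the 16GB+ VRAM requirement is a constraint, Transformers' standard loading options can spread or offload the weights. A minimal sketch, assuming the `accelerate` package is installed:

```python
from transformers import AutoModel
import torch

# Let accelerate place layers across available GPU(s) and, if needed, CPU memory
model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```

Expect inference to slow down further when layers are offloaded to CPU.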

### Benchmark Results

Preliminary testing on basic similarity tasks showed:

- Cross-lingual similarity detection: Inconsistent
- Direct translation pairs: Below expected performance
- Semantic relationship recognition: Requires improvement

## System Requirements

### Hardware

- **GPU**: 16GB+ VRAM recommended (A100, H100, or equivalent)
- **CPU**: High-memory alternative possible but significantly slower
- **RAM**: 32GB+ system RAM recommended

### Software

- Python 3.12+
- PyTorch 2.8.0+cu126
- Transformers >= 4.56.1
- `trust_remote_code=True` required

## Ethical Considerations & Biases

### Inherited Considerations

This model inherits all ethical considerations and potential biases from the base swiss-ai/Apertus-8B-2509 model. Users should:

- Review base model documentation for bias analysis
- Conduct appropriate bias testing for their specific use cases
- Consider potential cultural and linguistic biases across the 1811 supported languages

### EU AI Act Compliance

This model is developed in compliance with EU AI Act requirements:

- Comprehensive documentation provided
- Risk assessment conducted
- Transparency obligations fulfilled
- Technical documentation available

## Environmental Impact

- **Energy Consumption**: High due to 8B parameter size
- **Carbon Footprint**: Significant, driven by the model's computational requirements
- **Efficiency**: Substantially less efficient than specialized embedding models

## Future Development

Potential improvements for future versions:

- Fine-tuning on embedding-specific datasets
- Implementation of advanced pooling strategies
- Model distillation for efficiency improvements
- Comprehensive evaluation on standard embedding benchmarks

## Citation

```bibtex
@misc{apertus8b2509encoder,
  title={Apertus-8B-2509-Encoder: Experimental Bidirectional Encoder},
  author={speakdatawith},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/speakdatawith/Apertus-8B-2509-Encoder}
}
```

## Acknowledgments

- Base model: swiss-ai/Apertus-8B-2509
- Architecture: Transformer-based encoder conversion
- Framework: Hugging Face Transformers

## Contact

For questions regarding this model or its implementation, please open an issue in the model repository.

---

**Disclaimer**: This is an experimental model. Production use is not recommended without thorough evaluation and potential fine-tuning for specific embedding tasks.