---
license: apache-2.0
base_model: swiss-ai/Apertus-8B-2509
tags:
- text-embeddings
- multilingual
- encoder
- apertus
- experimental
language:
- multilingual
library_name: transformers
pipeline_tag: feature-extraction
model_type: apertus
---
# Apertus-8B-2509-Encoder
## Model Overview
**Apertus-8B-2509-Encoder** is an experimental bidirectional encoder model derived from the swiss-ai/Apertus-8B-2509 decoder-only model. This model represents the first attempt to create a native Apertus-based encoder for text embedding generation and semantic similarity tasks.
**⚠️ Experimental Notice**: This model is at an experimental stage and may not perform well on production embedding tasks. See the Performance & Limitations section for details.
## Model Details
- **Model Type**: Bidirectional Transformer Encoder
- **Base Model**: swiss-ai/Apertus-8B-2509
- **Parameters**: 8.053 billion
- **Architecture**: 32-layer transformer with the xIELU activation
- **Embedding Dimension**: 4096
- **Supported Languages**: 1811 (inherited from base model)
- **License**: Apache 2.0
## Intended Use
### Primary Use Cases
- Text embedding generation for research purposes
- Cross-lingual semantic analysis experiments
- Proof-of-concept for decoder-to-encoder conversion
- Base model for further fine-tuning on embedding tasks
### Downstream Tasks
- Semantic similarity analysis
- Information retrieval systems
- Cross-lingual text comparison
- Vector database integration
## How to Use
```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True
)
model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16
)
model.eval()

# Generate embeddings via attention-mask-aware mean pooling,
# so that padding tokens do not dilute the sentence representation
def get_embeddings(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True,
                       truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    mask = inputs["attention_mask"].unsqueeze(-1)   # (batch, seq_len, 1)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    embeddings = summed / mask.sum(dim=1)
    return embeddings

# Example usage
texts = ["Hello world", "Hallo Welt", "Bonjour monde"]
embeddings = get_embeddings(texts)
print(f"Embeddings shape: {embeddings.shape}")  # torch.Size([3, 4096])
```
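Since semantic similarity is a primary downstream task, the embeddings returned above can be compared directly with cosine similarity. The snippet below is a small usage sketch that reuses the `get_embeddings` helper and `embeddings` tensor from the example:

```python
import torch.nn.functional as F

# Pairwise cosine similarity between the three example sentences;
# translation pairs should score higher than unrelated text
normalized = F.normalize(embeddings.float(), p=2, dim=1)
similarity = normalized @ normalized.T  # (3, 3)
print(similarity)
```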
## Model Architecture
The model maintains the original Apertus-8B-2509 architecture with key modifications:
- **Attention Mechanism**: Converted from causal (decoder) to bidirectional (encoder)
- **Configuration Changes**:
- `is_decoder = False`
- `is_causal = False`
- `architectures = ['ApertusModel']`
- **Pooling Strategy**: Mean pooling over last hidden states
## Training Details
### Conversion Process
1. Loaded pre-trained swiss-ai/Apertus-8B-2509 model
2. Disabled causal masking in all attention layers
3. Updated model configuration for encoder usage
4. No additional training performed
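A minimal sketch of these steps, assuming the remote Apertus code honors the `is_decoder`/`is_causal` configuration flags listed under Model Architecture (the exact attribute handling may differ in practice):

```python
from transformers import AutoModelForCausalLM
import torch

# Step 1: load the pre-trained decoder-only base model
lm = AutoModelForCausalLM.from_pretrained(
    "swiss-ai/Apertus-8B-2509",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)

# Steps 2-3: flip the configuration so attention runs bidirectionally
# (flag names follow the configuration changes listed above; whether they
# fully disable causal masking depends on the model implementation)
lm.config.is_decoder = False
lm.config.is_causal = False
lm.config.architectures = ["ApertusModel"]

# Step 4: save the backbone without the LM head; no additional training
lm.base_model.save_pretrained("Apertus-8B-2509-Encoder")
```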
### Training Data
This model inherits its training data from the base model swiss-ai/Apertus-8B-2509; refer to the base model documentation for details on the training data.
## Performance & Limitations
### Known Limitations
**⚠️ Important Performance Notice**:
- Initial testing revealed suboptimal embedding quality
- Semantic similarity scores appear inconsistent with expected behavior
- Model may produce embeddings that do not accurately reflect semantic relationships
- Performance significantly below specialized embedding models
### Technical Limitations
- **Resource Requirements**: 16GB+ GPU memory for inference
- **Speed**: Significantly slower than specialized embedding models
- **Optimization**: Not fine-tuned for embedding tasks
- **Pooling**: Uses simple mean pooling strategy
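As one direction for experimentation, a last-token pooling variant (common for decoder-derived embedding models) could replace the mean pooling used above. This is an illustrative sketch only; `last_token_pool` is a hypothetical helper, not part of the released model:

```python
import torch

def last_token_pool(last_hidden_state, attention_mask):
    # Hypothetical alternative to mean pooling: take the hidden state of the
    # final non-padding token in each sequence (assumes right padding)
    seq_lengths = attention_mask.sum(dim=1) - 1  # (batch,)
    batch_idx = torch.arange(last_hidden_state.size(0),
                             device=attention_mask.device)
    return last_hidden_state[batch_idx, seq_lengths]
```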
### Benchmark Results
Preliminary testing on basic similarity tasks showed:
- Cross-lingual similarity detection: Inconsistent
- Direct translation pairs: Below expected performance
- Semantic relationship recognition: Requires improvement
## System Requirements
### Hardware
- **GPU**: 16GB+ VRAM recommended (A100, H100, or equivalent)
- **CPU**: Inference on CPU is possible with sufficient memory but significantly slower
- **RAM**: 32GB+ system RAM recommended
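As a rough sanity check on the VRAM figure: 8.053 billion parameters at 2 bytes each in bfloat16 comes to roughly 16.1 GB for the weights alone, before activations and batch overhead, hence the 16GB+ recommendation.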
### Software
- Python 3.12+
- PyTorch 2.8.0+cu126
- Transformers >= 4.56.1
- `trust_remote_code=True` required
## Ethical Considerations & Biases
### Inherited Considerations
This model inherits all ethical considerations and potential biases from the base swiss-ai/Apertus-8B-2509 model. Users should:
- Review base model documentation for bias analysis
- Conduct appropriate bias testing for their specific use cases
- Consider potential cultural and linguistic biases across 1811 supported languages
### EU AI Act Compliance
This model is developed in compliance with EU AI Act requirements:
- Comprehensive documentation provided
- Risk assessment conducted
- Transparency obligations fulfilled
- Technical documentation available
## Environmental Impact
- **Energy Consumption**: High due to 8B parameter size
- **Carbon Footprint**: Significant computational requirements
- **Efficiency**: Substantially less efficient than specialized embedding models
## Future Development
Potential improvements for future versions:
- Fine-tuning on embedding-specific datasets
- Implementation of advanced pooling strategies
- Model distillation for efficiency improvements
- Comprehensive evaluation on standard embedding benchmarks
## Citation
```
@misc{apertus8b2509encoder,
  title={Apertus-8B-2509-Encoder: Experimental Bidirectional Encoder},
  author={speakdatawith},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/speakdatawith/Apertus-8B-2509-Encoder}
}
```
## Acknowledgments
- Base model: swiss-ai/Apertus-8B-2509
- Architecture: Transformer-based encoder conversion
- Framework: Hugging Face Transformers
## Contact
For questions regarding this model or its implementation, please open an issue in the model repository.
---
**Disclaimer**: This is an experimental model. Production use is not recommended without thorough evaluation and potential fine-tuning for specific embedding tasks.