---
license: apache-2.0
base_model: swiss-ai/Apertus-8B-2509
tags:
- text-embeddings
- multilingual
- encoder
- apertus
- experimental
language:
- multilingual
library_name: transformers
pipeline_tag: feature-extraction
model_type: apertus
---

# Apertus-8B-2509-Encoder

## Model Overview

**Apertus-8B-2509-Encoder** is an experimental bidirectional encoder model derived from the decoder-only swiss-ai/Apertus-8B-2509 model. It is a first attempt at a native Apertus-based encoder for text embedding generation and semantic similarity tasks.

**⚠️ Experimental Notice**: This model is at an experimental stage and may not perform well on production embedding tasks. See the Limitations section for details.

## Model Details

- **Model Type**: Bidirectional Transformer Encoder
- **Base Model**: swiss-ai/Apertus-8B-2509
- **Parameters**: 8.053 billion
- **Architecture**: 32-layer transformer with xIELU activation
- **Embedding Dimension**: 4096
- **Supported Languages**: 1811 (inherited from base model)
- **License**: Apache 2.0
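
The key dimensions above can be checked directly from the repository's configuration. This is a minimal sketch; the `hidden_size` and `num_hidden_layers` field names follow the standard Hugging Face convention and are assumed to be exposed here:

```python
from transformers import AutoConfig

# Inspect the shipped configuration (field names assumed to follow the standard convention)
config = AutoConfig.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
)
print(config.hidden_size)        # expected: 4096 (embedding dimension)
print(config.num_hidden_layers)  # expected: 32
```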

## Intended Use

### Primary Use Cases

- Text embedding generation for research purposes
- Cross-lingual semantic analysis experiments
- Proof-of-concept for decoder-to-encoder conversion
- Base model for further fine-tuning on embedding tasks

### Downstream Tasks

- Semantic similarity analysis
- Information retrieval systems
- Cross-lingual text comparison
- Vector database integration

## How to Use

```python
from transformers import AutoModel, AutoTokenizer
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
)
model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
)
model.eval()

# Generate embeddings with mask-aware mean pooling so that
# padding tokens do not dilute the sentence representation
def get_embeddings(texts):
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)
    mask = inputs["attention_mask"].unsqueeze(-1).to(outputs.last_hidden_state.dtype)
    summed = (outputs.last_hidden_state * mask).sum(dim=1)
    counts = mask.sum(dim=1).clamp(min=1)
    return summed / counts

# Example usage
texts = ["Hello world", "Hallo Welt", "Bonjour monde"]
embeddings = get_embeddings(texts)
print(f"Embeddings shape: {embeddings.shape}")  # (3, 4096)
```
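
For the semantic-similarity use cases listed above, cosine similarity between the pooled embeddings is the usual comparison. A brief sketch building on `get_embeddings` from the snippet above:

```python
import torch.nn.functional as F

# Cosine similarity between pooled embeddings (converted to float32 for the comparison)
emb = get_embeddings(["Hello world", "Hallo Welt", "The stock market fell"]).float()
emb = F.normalize(emb, dim=-1)
similarity = emb @ emb.T  # (3, 3) cosine-similarity matrix
print(similarity)
```

Given the limitations described below, the resulting scores should be treated as indicative only.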

## Model Architecture

The model maintains the original Apertus-8B-2509 architecture with key modifications:

- **Attention Mechanism**: Converted from causal (decoder) to bidirectional (encoder)
- **Configuration Changes**:
  - `is_decoder = False`
  - `is_causal = False`
  - `architectures = ['ApertusModel']`
- **Pooling Strategy**: Mean pooling over last hidden states

## Training Details

### Conversion Process

1. Loaded pre-trained swiss-ai/Apertus-8B-2509 model
2. Disabled causal masking in all attention layers
3. Updated model configuration for encoder usage
4. No additional training performed
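
The steps above can be approximated at the configuration level. The following is a simplified, hypothetical sketch (not the exact script used for this release), assuming the checkpoint's remote code honors the `is_decoder`/`is_causal` flags:

```python
from transformers import AutoConfig, AutoModel

base_id = "swiss-ai/Apertus-8B-2509"

# Flip the decoder-specific flags so the model is treated as an encoder
config = AutoConfig.from_pretrained(base_id, trust_remote_code=True)
config.is_decoder = False
config.is_causal = False
config.architectures = ["ApertusModel"]

# Reload the pretrained weights under the modified configuration and save the result
model = AutoModel.from_pretrained(base_id, config=config, trust_remote_code=True)
model.save_pretrained("Apertus-8B-2509-Encoder")
```

Depending on the implementation, causal masking may also need to be disabled on each attention module explicitly; the sketch only covers the configuration changes documented above.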

### Training Data

Inherits training data from the base model swiss-ai/Apertus-8B-2509. Refer to the base model documentation for detailed data information.

## Performance & Limitations

### Known Limitations

**⚠️ Important Performance Notice**:

- Initial testing revealed suboptimal embedding quality
- Semantic similarity scores appear inconsistent with expected behavior
- Model may produce embeddings that do not accurately reflect semantic relationships
- Performance significantly below specialized embedding models

### Technical Limitations

- **Resource Requirements**: 16GB+ GPU memory for inference
- **Speed**: Significantly slower than specialized embedding models
- **Optimization**: Not fine-tuned for embedding tasks
- **Pooling**: Uses simple mean pooling strategy
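
If the 16GB+ VRAM requirement is a constraint, Transformers' standard loading options can spread or offload the weights. A minimal sketch, assuming the `accelerate` package is installed:

```python
from transformers import AutoModel
import torch

# Let accelerate place layers across available GPU(s) and, if needed, CPU memory
model = AutoModel.from_pretrained(
    "speakdatawith/Apertus-8B-2509-Encoder",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```

Expect inference to slow down further when layers are offloaded to CPU.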

### Benchmark Results

Preliminary testing on basic similarity tasks showed:

- Cross-lingual similarity detection: Inconsistent
- Direct translation pairs: Below expected performance
- Semantic relationship recognition: Requires improvement

## System Requirements

### Hardware

- **GPU**: 16GB+ VRAM recommended (A100, H100, or equivalent)
- **CPU**: High-memory alternative possible but significantly slower
- **RAM**: 32GB+ system RAM recommended

### Software

- Python 3.12+
- PyTorch 2.8.0+cu126
- Transformers >= 4.56.1
- `trust_remote_code=True` required

## Ethical Considerations & Biases

### Inherited Considerations

This model inherits all ethical considerations and potential biases from the base swiss-ai/Apertus-8B-2509 model. Users should:

- Review base model documentation for bias analysis
- Conduct appropriate bias testing for their specific use cases
- Consider potential cultural and linguistic biases across the 1811 supported languages

### EU AI Act Compliance

This model is developed in compliance with EU AI Act requirements:

- Comprehensive documentation provided
- Risk assessment conducted
- Transparency obligations fulfilled
- Technical documentation available

## Environmental Impact

- **Energy Consumption**: High due to 8B parameter size
- **Carbon Footprint**: Significant, driven by the model's computational requirements
- **Efficiency**: Substantially less efficient than specialized embedding models

## Future Development

Potential improvements for future versions:

- Fine-tuning on embedding-specific datasets
- Implementation of advanced pooling strategies
- Model distillation for efficiency improvements
- Comprehensive evaluation on standard embedding benchmarks

## Citation

```bibtex
@misc{apertus8b2509encoder,
  title={Apertus-8B-2509-Encoder: Experimental Bidirectional Encoder},
  author={speakdatawith},
  year={2025},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/speakdatawith/Apertus-8B-2509-Encoder}
}
```

## Acknowledgments

- Base model: swiss-ai/Apertus-8B-2509
- Architecture: Transformer-based encoder conversion
- Framework: Hugging Face Transformers

## Contact

For questions regarding this model or its implementation, please open an issue in the model repository.

---

**Disclaimer**: This is an experimental model. Production use is not recommended without thorough evaluation and potential fine-tuning for specific embedding tasks.