---
license: cc-by-4.0
language:
- en
- tr
tags:
- VLM
- image2text
- lm
---

# TeLVE: Turkish efficient Language Vision Engine 🧿

[License: CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)
[Model on Hugging Face: outsu/TeLVE](https://huggingface.co/outsu/TeLVE)

## First Turkish VLM ever!

TeLVE is the first Visual Language Model (VLM) designed specifically for Turkish language understanding and image description generation. Built on pre-trained Vision Transformer (ViT) and BERT encoders, it fills a gap in Turkish vision-language processing.

## Model Description

TeLVE combines:
- 🖼️ Vision Transformer (ViT-base-patch16-224)
- 📝 Turkish BERT (dbmdz/bert-base-turkish-cased)
- 🔄 Cross-attention mechanism for vision-language fusion

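
As a rough illustration of the fusion step listed above, the sketch below implements single-head scaled dot-product cross-attention in plain Python: each text vector acts as a query over the image patch vectors. It is a toy under stated assumptions, not TeLVE's actual code. The function names, the toy vectors, and the absence of learned projections and multiple heads are all simplifications chosen here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_states, image_states):
    """Let each text vector (query) attend over all image vectors (keys and values).

    Assumption: single head and no learned projections, for illustration only.
    """
    dim = len(image_states[0])
    scale = math.sqrt(dim)
    fused = []
    for query in text_states:
        # Similarity between this text vector and every image patch vector.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale
            for key in image_states
        ]
        weights = softmax(scores)
        # Weighted sum of the image vectors, one output vector per text vector.
        fused.append([
            sum(w * value[i] for w, value in zip(weights, image_states))
            for i in range(dim)
        ])
    return fused


# Toy inputs: 2 text token vectors and 3 image patch vectors, all of width 4.
text_states = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image_states = [
    [2.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.0],
]
fused = cross_attention(text_states, image_states)
```

With these toy inputs, the first text vector puts most of its attention weight on the first image patch, so `fused[0]` is dominated by that patch's values.
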

### Version Logs
- **TeLVE v1.0**: Trained on the Unsplash Lite dataset.
- **TeLVE v1.0dep**: Dataset extended with selected images from Pexels, and the encoder problem with the letter "ü" was fixed. *(Deprecated: performance decreased because of a dataset addressing problem. Not recommended for use.)*

## Usage

The model can be used in two ways:

### Inference (imagine.py)

```bash
# Generate captions for images
python imagine.py
```

This script:
- Loads a trained TeLVE model
- Takes images from the `images` directory
- Generates Turkish captions for each image
- Prints the results to the console

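
The steps above can be sketched as a small loop. This is a hedged outline rather than the contents of `imagine.py`: `caption_fn` is a hypothetical stand-in for whatever function wraps the trained model, and the set of accepted file extensions is an assumption.

```python
from pathlib import Path

# Assumed set of accepted image formats; the real script may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def list_images(directory):
    """Return the image files in `directory`, sorted by name."""
    return sorted(
        path for path in Path(directory).iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )


def caption_directory(directory, caption_fn):
    """Apply `caption_fn` (hypothetical: image path -> Turkish caption) to each image."""
    results = []
    for path in list_images(directory):
        caption = caption_fn(path)
        # Mirror the script's behaviour of writing results to the console.
        print(f"{path.name}: {caption}")
        results.append((path.name, caption))
    return results
```

For example, `caption_directory("images", caption_fn)` prints one `filename: caption` line per image and returns the same pairs as a list.
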

### Training (main.py)

Users can train their own models with ViT and BERT encoders.

```bash
# Train a new model
python main.py
```

This script:
- Loads and preprocesses image-caption pairs
- Initializes ViT and BERT encoders
- Trains the combined model
- Saves the model and tokenizer

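
The first step, loading image-caption pairs, might look like the sketch below. The CSV layout (`image` and `caption` columns), the `load_pairs` name, and the filtering rules are assumptions made for illustration; `main.py` may read its data differently. Opening the file explicitly as UTF-8 avoids depending on the platform's default encoding, which matters for Turkish characters such as "ü".

```python
import csv
from pathlib import Path


def load_pairs(csv_path, image_dir):
    """Read (image path, caption) pairs, skipping unusable rows.

    Assumption: a CSV with `image` and `caption` columns; the real format may differ.
    """
    pairs = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            image_name = row.get("image") or ""
            caption = (row.get("caption") or "").strip()
            image_path = Path(image_dir) / image_name
            # Keep only rows whose image exists and whose caption is non-empty.
            if image_path.is_file() and caption:
                pairs.append((image_path, caption))
    return pairs
```

Rows that point to a missing image or carry an empty caption are dropped before training, so later steps only see complete pairs.
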

## Performance

Performance scores have not been evaluated yet; evaluation is planned.

<!--
| Model Version | Dataset | BLEU-4 | METEOR | CIDEr |
|--------------|---------|---------|---------|--------|
| TeLVE v1.0 | Unsplash | *TBD* | *TBD* | *TBD* |
| TeLVE v1.1 | Unsplash+Pexels | *TBD* | *TBD* | *TBD* |
-->

## Citation

```bibtex
@software{telve2024,
  author = {Öğüt Su Karagün},
  title = {TeLVE: Turkish efficient Language Vision Engine},
  year = {2024},
  url = {https://huggingface.co/outsu/TeLVE}
}
```

## License

<p xmlns:cc="http://creativecommons.org/ns#" xmlns:dct="http://purl.org/dc/terms/"><a property="dct:title" rel="cc:attributionURL" href="https://huggingface.co/outsu/TeLVE">TeLVE</a> © 2024 by <a rel="cc:attributionURL dct:creator" property="cc:attributionName" href="https://outsu.github.io">Öğüt Su Karagün</a> is licensed under <a href="https://creativecommons.org/licenses/by/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer" style="display:inline-block;">Creative Commons Attribution 4.0 International</a></p> |