--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- text-generation |
|
|
- emoji |
|
|
- byte-level |
|
|
- looped-transformer |
|
|
- text2emoji |
|
|
datasets: |
|
|
- KomeijiForce/Text2Emoji |
|
|
--- |
|
|
|
|
|
# EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation |
|
|
|
|
|
This is a byte-level language model with a **Universal Transformer (UT)**-style looped architecture, trained to translate text descriptions into emojis.
|
|
|
|
|
## Model Description |
|
|
|
|
|
- **Model Type:** Causal Language Model with Looped Transformer Architecture |
|
|
- **Task:** Text-to-Emoji Translation |
|
|
- **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs) |
|
|
- **Tokenizer:** Byte-level (vocab size: 258; see the tokenization sketch below)
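
A minimal sketch of how the byte-level tokenizer plausibly works, assuming the 258-token vocabulary is the 256 raw byte values plus two special tokens (the BOS/EOS ids below are an assumption, not confirmed by the repository):

```python
# Hedged sketch: vocab size 258 presumably means 256 raw byte values plus
# two special tokens; the ids below (BOS=256, EOS=257) are an assumption.
BOS_ID, EOS_ID = 256, 257

def encode(text: str) -> list[int]:
    return [BOS_ID] + list(text.encode("utf-8")) + [EOS_ID]

def decode(ids: list[int]) -> str:
    # Drop special tokens, then decode the remaining raw bytes as UTF-8
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

print(encode("🍕"))  # one emoji spans four UTF-8 bytes: [256, 240, 159, 141, 149, 257]
```

Because emojis are multi-byte UTF-8 sequences, a byte-level vocabulary covers every emoji without out-of-vocabulary issues.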
|
|
|
|
|
### Architecture Details |
|
|
|
|
|
**Looped Transformer Architecture:** |
|
|
- **Base Layers:** 24 |
|
|
- **Number of Loops:** 8 (layers are applied iteratively) |
|
|
- **Shared Layers:** True (weights are shared across loops for parameter efficiency)
|
|
- **Loop Residual:** True (residual connections across loops) |
|
|
|
|
|
**Model Dimensions:** |
|
|
- **Hidden Dimension:** 1024 |
|
|
- **Number of Attention Heads:** 16 |
|
|
- **KV Heads:** 16 |
|
|
- **Max Sequence Length:** 512 |
|
|
- **RoPE Theta:** 10000.0 |
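
The RoPE theta above corresponds to the standard rotary-embedding frequency schedule. A sketch under that assumption, with a per-head dimension of 1024 / 16 = 64:

```python
import torch

def rope_frequencies(head_dim: int = 64, theta: float = 10000.0, max_len: int = 512):
    """Standard RoPE angle table: position m is rotated by m * theta**(-2i/d)."""
    inv_freq = theta ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    positions = torch.arange(max_len).float()
    return torch.outer(positions, inv_freq)  # shape: (max_len, head_dim // 2)
```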
|
|
|
|
|
### Training Configuration |
|
|
|
|
|
- **Training Steps:** 5100 |
|
|
- **Batch Size:** 12 |
|
|
- **Sequence Length:** 512 |
|
|
- **Learning Rate:** 0.0003 |
|
|
- **Warmup Steps:** 1000 |
|
|
- **Optimizer:** AdamW (β1=0.9, β2=0.95) |
|
|
- **LR Scheduler:** Cosine with min ratio 0.1 (sketched after this list)
|
|
- **Gradient Clipping:** 1.0 |
|
|
- **Weight Decay:** 0.1 |
|
|
- **Precision:** BF16 |
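
A sketch of what the schedule above plausibly computes, assuming a standard linear-warmup-plus-cosine implementation (the training framework's exact formula may differ):

```python
import math

def lr_at_step(step: int, peak_lr: float = 3e-4, warmup: int = 1000,
               total_steps: int = 5100, min_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_ratio * peak_lr."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)

print(lr_at_step(1000))  # peak: 0.0003
print(lr_at_step(5100))  # floor: min_ratio * peak = 3e-05
```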
|
|
|
|
|
## What is a Looped Transformer? |
|
|
|
|
|
A looped transformer applies the same transformer layers multiple times in an iterative refinement process. |
|
|
This is particularly effective for translation tasks as it allows the model to: |
|
|
- Refine predictions through multiple iterations |
|
|
- Use parameters more efficiently (shared weights across loops) |
|
|
- Model complex input-output mappings with fewer total parameters |
|
|
|
|
|
In this model, 24 layers are applied 8 times with residual connections between loops. |
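
A minimal sketch of that forward pass, assuming shared weights and a residual connection carried across loop iterations (causal masking and RoPE are omitted for brevity, and the real BFlowNet/loopedLM implementation may differ):

```python
import torch
import torch.nn as nn

class LoopedStack(nn.Module):
    """One stack of layers whose weights are reused on every loop."""

    def __init__(self, dim: int = 1024, heads: int = 16,
                 layers: int = 24, loops: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
            for _ in range(layers)
        )
        self.loops = loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.loops):
            h = x
            for layer in self.layers:
                h = layer(h)
            x = x + h  # loop residual: carry state across iterations
        return x

demo = LoopedStack(dim=64, heads=4, layers=2, loops=3)  # small demo sizes
print(demo(torch.randn(1, 16, 64)).shape)  # torch.Size([1, 16, 64])
```

At the full configuration this gives 24 × 8 = 192 layer applications per forward pass while storing only 24 layers' worth of parameters.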
|
|
|
|
|
## Intended Use |
|
|
|
|
|
This model is designed to translate text descriptions into appropriate emojis. |
|
|
|
|
|
**Example Usage:** |
|
|
```
Input: "I love pizza"
Output: "🍕❤️"
```
|
|
|
|
|
## Training Data |
|
|
|
|
|
The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs. |
|
|
|
|
|
## Model Files |
|
|
|
|
|
This repository contains: |
|
|
- `consolidated.pth`: PyTorch model weights |
|
|
- `params.json`: Complete model and training configuration |
|
|
- `train_state_*.json`: Training state information from the checkpoint
|
|
|
|
|
## Usage |
|
|
|
|
|
To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture: |
|
|
|
|
|
```python
import torch
import json

# Load the model and training configuration
with open('params.json', 'r') as f:
    params = json.load(f)

# Load the model weights
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize the model with the BFlowNet loopedLM architecture
# (illustrative import path; adjust to the actual codebase layout):
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)
```
|
|
|
|
|
### Generation Parameters |
|
|
|
|
|
For best results, use the following settings (a sampling sketch follows the list):
|
|
- **Max Tokens:** 128 (outputs are typically short) |
|
|
- **Temperature:** 0.7 (for diverse emoji selection) |
|
|
- **Top-p:** 0.9 |
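
A sketch of temperature plus nucleus (top-p) sampling with these settings. The commented generation loop assumes a hypothetical model forward signature and the EOS id from the tokenization sketch above, neither of which is confirmed by the repository:

```python
import torch

def sample_top_p(logits: torch.Tensor, temperature: float = 0.7,
                 top_p: float = 0.9) -> int:
    """Sample one token id from 1-D logits with temperature and top-p."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out everything outside the smallest nucleus of mass >= top_p
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs /= sorted_probs.sum()
    return sorted_idx[torch.multinomial(sorted_probs, 1)].item()

# Hypothetical generation loop (forward signature is assumed):
# ids = list("I love pizza".encode("utf-8"))
# for _ in range(128):
#     logits = model(torch.tensor([ids]))[0, -1]
#     next_id = sample_top_p(logits)
#     if next_id == 257:  # assumed EOS id
#         break
#     ids.append(next_id)
# print(bytes(i for i in ids if i < 256).decode("utf-8", errors="replace"))
```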
|
|
|
|
|
## Limitations |
|
|
|
|
|
- The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text |
|
|
- Performance is optimized for text-to-emoji translation and may not generalize well to other tasks |
|
|
- The model requires the specific looped transformer architecture implementation to load and use |
|
|
|
|
|
|
|
|
|
|
## Training Framework |
|
|
|
|
|
This model was trained using the BFlowNet framework with a looped transformer architecture.
|
|
|
|
|
Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji) |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 |
|
|
|