---
language:
- en
license: apache-2.0
tags:
- text-generation
- emoji
- byte-level
- looped-transformer
- text2emoji
datasets:
- KomeijiForce/Text2Emoji
---
# EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation
This is a byte-level language model built on a **looped (Universal Transformer-style) architecture**, trained to translate text descriptions into emojis.
## Model Description
- **Model Type:** Causal Language Model with Looped Transformer Architecture
- **Task:** Text-to-Emoji Translation
- **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
- **Tokenizer:** Byte-level (vocab size: 258)
### Architecture Details
**Looped Transformer Architecture:**
- **Base Layers:** 24
- **Number of Loops:** 8 (layers are applied iteratively)
- **Shared Layers:** True (parameter efficient)
- **Loop Residual:** True (residual connections across loops)
**Model Dimensions:**
- **Hidden Dimension:** 1024
- **Number of Attention Heads:** 16
- **KV Heads:** 16
- **Max Sequence Length:** 512
- **RoPE Theta:** 10000.0
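
For reference, the model section of `params.json` should look roughly like the following. The field names here are an assumption based on the values above; check the shipped `params.json` for the exact schema used by the BFlowNet/loopedLM codebase.

```python
# Assumed layout of the "model" section in params.json; key names are
# illustrative -- the actual schema comes from the BFlowNet/loopedLM codebase.
model_config = {
    "dim": 1024,              # hidden dimension
    "n_layers": 24,           # base transformer layers
    "n_loops": 8,             # how many times the layer stack is reapplied
    "share_layers": True,     # weights shared across loops
    "loop_residual": True,    # residual connection between loop iterations
    "n_heads": 16,            # attention heads
    "n_kv_heads": 16,         # key/value heads
    "max_seqlen": 512,
    "rope_theta": 10000.0,
    "vocab_size": 258,        # 256 byte values + special tokens
}
```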
### Training Configuration
- **Training Steps:** 5100
- **Batch Size:** 12
- **Sequence Length:** 512
- **Learning Rate:** 0.0003
- **Warmup Steps:** 1000
- **Optimizer:** AdamW (β1=0.9, β2=0.95)
- **LR Scheduler:** Cosine with min ratio 0.1
- **Gradient Clipping:** 1.0
- **Weight Decay:** 0.1
- **Precision:** BF16
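
A minimal sketch of the learning-rate schedule implied by these settings (linear warmup for 1,000 steps, then cosine decay down to 10% of the peak rate). The exact implementation in the training framework may differ:

```python
import math

def lr_at_step(step, peak_lr=3e-4, warmup_steps=1000, total_steps=5100, min_ratio=0.1):
    """Linear warmup followed by cosine decay to min_ratio * peak_lr."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)

# lr_at_step(0) == 0.0, lr_at_step(1000) == 3e-4, lr_at_step(5100) == 3e-5
```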
## What is a Looped Transformer?
A looped transformer applies the same transformer layers multiple times in an iterative refinement process.
This is particularly effective for translation tasks as it allows the model to:
- Refine predictions through multiple iterations
- Use parameters more efficiently (shared weights across loops)
- Model complex input-output mappings with fewer total parameters
In this model, 24 layers are applied 8 times with residual connections between loops.
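
A simplified sketch of the looped forward pass (not the actual BFlowNet/loopedLM code; module and argument names are illustrative):

```python
import torch
import torch.nn as nn

class LoopedStack(nn.Module):
    """Applies one shared stack of transformer blocks n_loops times,
    with a residual connection carried across loop iterations."""

    def __init__(self, blocks: nn.ModuleList, n_loops: int = 8, loop_residual: bool = True):
        super().__init__()
        self.blocks = blocks          # e.g. the 24 shared transformer layers
        self.n_loops = n_loops
        self.loop_residual = loop_residual

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_loops):
            x = h
            for block in self.blocks:  # the same weights are reused on every loop
                x = block(x)
            h = h + x if self.loop_residual else x
        return h
```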
## Intended Use
This model is designed to translate text descriptions into appropriate emojis.
**Example Usage:**
```
Input: "I love pizza"
Output: "🍕❤️"
```
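
Because the tokenizer is byte-level, both the input text and the emoji output are handled as UTF-8 byte sequences (each emoji spans several bytes). A rough illustration, assuming the 258-entry vocabulary is the 256 byte values plus special tokens:

```python
text = "I love pizza"
emoji = "🍕❤️"

input_bytes = list(text.encode("utf-8"))    # 12 byte tokens
output_bytes = list(emoji.encode("utf-8"))  # 🍕 is 4 bytes, ❤️ is 6 bytes
print(input_bytes)   # [73, 32, 108, 111, 118, 101, 32, 112, 105, 122, 122, 97]
print(output_bytes)  # [240, 159, 141, 149, 226, 157, 164, 239, 184, 143]
print(bytes(output_bytes).decode("utf-8"))  # "🍕❤️"
```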
## Training Data
The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs.
## Model Files
This repository contains:
- `consolidated.pth`: PyTorch model weights
- `params.json`: Complete model and training configuration
- `train_state_*.json`: Training state information from checkpoint
## Usage
To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:
```python
import torch
import json
# Load the model and training configuration
with open('params.json', 'r') as f:
    params = json.load(f)

# Load the model weights
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize the model with the BFlowNet/loopedLM architecture, for example:
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)
```
### Generation Parameters
For best results, use:
- **Max Tokens:** 128 (outputs are typically short)
- **Temperature:** 0.7 (for diverse emoji selection)
- **Top-p:** 0.9
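
A minimal sketch of sampling with these settings, assuming the loaded model returns next-byte logits from a standard forward pass (the real generation utilities live in the BFlowNet/loopedLM codebase):

```python
import torch

@torch.no_grad()
def sample_bytes(model, prompt_ids, max_tokens=128, temperature=0.7, top_p=0.9, eos_id=None):
    """Temperature + nucleus (top-p) sampling over byte-level logits."""
    ids = list(prompt_ids)
    for _ in range(max_tokens):
        logits = model(torch.tensor([ids]))[0, -1]          # logits for the next byte
        probs = torch.softmax(logits / temperature, dim=-1)
        sorted_probs, sorted_idx = torch.sort(probs, descending=True)
        keep = torch.cumsum(sorted_probs, dim=-1) - sorted_probs < top_p
        keep[0] = True                                      # always keep the top byte
        sorted_probs = sorted_probs * keep
        next_id = sorted_idx[torch.multinomial(sorted_probs / sorted_probs.sum(), 1)].item()
        if eos_id is not None and next_id == eos_id:
            break
        ids.append(next_id)
    return ids
```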
## Limitations
- The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
- Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
- The model requires the specific looped transformer architecture implementation to load and use
## Training Framework
This model was trained with the BFlowNet framework using a looped transformer architecture.
Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji)
## License
Apache 2.0