---
language:
- en
license: apache-2.0
tags:
- text-generation
- emoji
- byte-level
- looped-transformer
- text2emoji
datasets:
- KomeijiForce/Text2Emoji
---
# EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation
This is a byte-level language model built on a **looped transformer (Universal Transformer, UT)** architecture, trained to translate text descriptions into emoji.
## Model Description
- **Model Type:** Causal Language Model with Looped Transformer Architecture
- **Task:** Text-to-Emoji Translation
- **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
- **Tokenizer:** Byte-level (vocab size: 258)
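The vocab size of 258 is consistent with the 256 raw byte values plus two special tokens. The sketch below illustrates how a byte-level tokenizer handles emoji as multi-byte UTF-8 sequences; the actual special-token ids used by this model are an assumption (here BOS=256, EOS=257).

```python
# Minimal byte-level tokenizer sketch: 256 byte values plus two assumed
# special tokens (BOS=256, EOS=257), giving a vocab size of 258.
BOS_ID, EOS_ID = 256, 257

def encode(text: str) -> list[int]:
    """UTF-8 encode text into byte ids, wrapped in BOS/EOS."""
    return [BOS_ID] + list(text.encode("utf-8")) + [EOS_ID]

def decode(ids: list[int]) -> str:
    """Drop special tokens and decode the remaining bytes as UTF-8."""
    raw = bytes(i for i in ids if i < 256)
    return raw.decode("utf-8", errors="replace")

ids = encode("🍕")   # the pizza emoji is 4 UTF-8 bytes
print(len(ids))      # 6: BOS + 4 bytes + EOS
print(decode(ids))   # 🍕
```

Because every emoji decomposes into plain bytes, no emoji-specific vocabulary is needed, at the cost of longer sequences than a subword tokenizer would produce.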
### Architecture Details
**Looped Transformer Architecture:**
- **Base Layers:** 24
- **Number of Loops:** 8 (layers are applied iteratively)
- **Shared Layers:** True (parameter efficient)
- **Loop Residual:** True (residual connections across loops)
**Model Dimensions:**
- **Hidden Dimension:** 1024
- **Number of Attention Heads:** 16
- **KV Heads:** 16
- **Max Sequence Length:** 512
- **RoPE Theta:** 10000.0
### Training Configuration
- **Training Steps:** 5100
- **Batch Size:** 12
- **Sequence Length:** 512
- **Learning Rate:** 0.0003
- **Warmup Steps:** 1000
- **Optimizer:** AdamW (β1=0.9, β2=0.95)
- **LR Scheduler:** Cosine with min ratio 0.1
- **Gradient Clipping:** 1.0
- **Weight Decay:** 0.1
- **Precision:** BF16
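The schedule above can be sketched as follows: linear warmup over the first 1,000 steps, then cosine decay toward 10% of the peak learning rate (min ratio 0.1) across the 5,100 total steps. The exact formula used by the training framework is an assumption; this is the standard warmup-plus-cosine shape.

```python
import math

PEAK_LR, WARMUP, TOTAL, MIN_RATIO = 3e-4, 1000, 5100, 0.1

def lr_at(step: int) -> float:
    """Learning rate at a given step under linear warmup + cosine decay."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)      # 0 -> 1 after warmup
    cosine = 0.5 * (1 + math.cos(math.pi * progress))  # 1 -> 0
    return PEAK_LR * (MIN_RATIO + (1 - MIN_RATIO) * cosine)

print(lr_at(500))    # mid-warmup: 1.5e-4
print(lr_at(1000))   # peak: 3e-4
print(lr_at(5100))   # floor: 3e-5 (10% of peak)
```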
## What is a Looped Transformer?
A looped transformer applies the same transformer layers multiple times in an iterative refinement process.
This is particularly effective for translation tasks as it allows the model to:
- Refine predictions through multiple iterations
- Use parameters more efficiently (shared weights across loops)
- Model complex input-output mappings with fewer total parameters
In this model, 24 layers are applied 8 times with residual connections between loops.
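The control flow described above can be sketched as follows. The real BFlowNet/loopedLM forward pass may differ in detail; here a "layer" is a stand-in function, and the point is only the weight sharing (the same 24 layers reused in every loop) and the residual connection across loop boundaries.

```python
# Sketch of weight-shared looping with cross-loop residuals:
# the same 24 layers are applied 8 times, and each loop's input
# is added back to its output.
N_LAYERS, N_LOOPS = 24, 8

def layer(h, i):
    # stand-in for a transformer block (attention + MLP in the real model)
    return h + 1e-3 * (i + 1)

def looped_forward(h):
    for _ in range(N_LOOPS):
        h_in = h                   # save the loop's input
        for i in range(N_LAYERS):  # same layer stack, shared weights
            h = layer(h, i)
        h = h + h_in               # residual connection across loops
    return h

print(looped_forward(0.0))
```

Parameter count scales with the 24 shared layers only, while effective depth is 24 × 8 = 192 layer applications.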
## Intended Use
This model is designed to translate text descriptions into appropriate emojis.
**Example Usage:**
```
Input: "I love pizza"
Output: "🍕❤️"
```
## Training Data
The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs.
## Model Files
This repository contains:
- `consolidated.pth`: PyTorch model weights
- `params.json`: Complete model and training configuration
- `train_state_*.json`: Training state information from checkpoint
## Usage
To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:
```python
import json

import torch

# Load the model and training configuration
with open('params.json', 'r') as f:
    params = json.load(f)

# Load the model weights
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize the model with the BFlowNet loopedLM architecture, e.g.:
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)
```
### Generation Parameters
For best results, use:
- **Max Tokens:** 128 (outputs are typically short)
- **Temperature:** 0.7 (for diverse emoji selection)
- **Top-p:** 0.9
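The temperature and top-p parameters above can be illustrated with a minimal nucleus-sampling sketch over byte logits. This is not the framework's actual sampler, only the standard technique: scale logits by temperature, softmax, then sample from the smallest set of ids whose probability mass reaches top-p.

```python
import math
import random

def sample(logits: list[float], temperature: float = 0.7, top_p: float = 0.9) -> int:
    """Temperature + nucleus (top-p) sampling over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                   # for numerical stability
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]                # softmax
    # keep the smallest set of ids whose cumulative mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    weights = [probs[i] for i in kept]
    return random.choices(kept, weights=weights)[0]

random.seed(0)
print(sample([2.0, 1.0, 0.1, -1.0]))  # one of the kept ids
```

Lower temperatures sharpen the distribution toward the most likely bytes; top-p truncates the long tail so rare bytes (which could produce invalid UTF-8 sequences) are rarely sampled.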
## Limitations
- The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
- Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
- The model requires the specific looped transformer architecture implementation to load and use
## Training Framework
This model was trained using the BFlowNet framework with looped transformer architecture.
Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji)
## License
This model is released under the Apache License 2.0.