---
language:
- en
license: apache-2.0
tags:
- text-generation
- emoji
- byte-level
- looped-transformer
- text2emoji
datasets:
- KomeijiForce/Text2Emoji
---
# EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation
This is a byte-level language model built on a **looped transformer (Universal Transformer, UT)** architecture, trained to translate text descriptions into emoji.
## Model Description
- **Model Type:** Causal Language Model with Looped Transformer Architecture
- **Task:** Text-to-Emoji Translation
- **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
- **Tokenizer:** Byte-level (vocab size: 258)
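The vocab size of 258 is consistent with the 256 raw byte values plus two special tokens. The sketch below illustrates how a byte-level tokenizer handles emoji as multi-byte UTF-8 sequences; the actual special-token ids used by this model are an assumption (here BOS=256, EOS=257).

```python
# Minimal byte-level tokenizer sketch: 256 byte values plus two assumed
# special tokens (BOS=256, EOS=257), giving a vocab size of 258.
BOS_ID, EOS_ID = 256, 257

def encode(text: str) -> list[int]:
    """UTF-8 encode text into byte ids, wrapped in BOS/EOS."""
    return [BOS_ID] + list(text.encode("utf-8")) + [EOS_ID]

def decode(ids: list[int]) -> str:
    """Drop special tokens and decode the remaining bytes as UTF-8."""
    raw = bytes(i for i in ids if i < 256)
    return raw.decode("utf-8", errors="replace")

ids = encode("🍕")   # the pizza emoji is 4 UTF-8 bytes
print(len(ids))      # 6: BOS + 4 bytes + EOS
print(decode(ids))   # 🍕
```

Because every emoji decomposes into plain bytes, no emoji-specific vocabulary is needed, at the cost of longer sequences than a subword tokenizer would produce.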
### Architecture Details
**Looped Transformer Architecture:**
- **Base Layers:** 24
- **Number of Loops:** 8 (layers are applied iteratively)
- **Shared Layers:** True (parameter efficient)
- **Loop Residual:** True (residual connections across loops)
**Model Dimensions:**
- **Hidden Dimension:** 1024
- **Number of Attention Heads:** 16
- **KV Heads:** 16
- **Max Sequence Length:** 512
- **RoPE Theta:** 10000.0
### Training Configuration
- **Training Steps:** 5100
- **Batch Size:** 12
- **Sequence Length:** 512
- **Learning Rate:** 0.0003
- **Warmup Steps:** 1000
- **Optimizer:** AdamW (β1=0.9, β2=0.95)
- **LR Scheduler:** Cosine with min ratio 0.1
- **Gradient Clipping:** 1.0
- **Weight Decay:** 0.1
- **Precision:** BF16
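The schedule above can be sketched as follows: linear warmup over the first 1,000 steps, then cosine decay toward 10% of the peak learning rate (min ratio 0.1) across the 5,100 total steps. The exact formula used by the training framework is an assumption; this is the standard warmup-plus-cosine shape.

```python
import math

PEAK_LR, WARMUP, TOTAL, MIN_RATIO = 3e-4, 1000, 5100, 0.1

def lr_at(step: int) -> float:
    """Learning rate at a given step under linear warmup + cosine decay."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)      # 0 -> 1 after warmup
    cosine = 0.5 * (1 + math.cos(math.pi * progress))  # 1 -> 0
    return PEAK_LR * (MIN_RATIO + (1 - MIN_RATIO) * cosine)

print(lr_at(500))    # mid-warmup: 1.5e-4
print(lr_at(1000))   # peak: 3e-4
print(lr_at(5100))   # floor: 3e-5 (10% of peak)
```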
## What is a Looped Transformer?
A looped transformer applies the same transformer layers multiple times in an iterative refinement process.
This is particularly effective for translation tasks as it allows the model to:
- Refine predictions through multiple iterations
- Use parameters more efficiently (shared weights across loops)
- Model complex input-output mappings with fewer total parameters
In this model, 24 layers are applied 8 times with residual connections between loops.
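The control flow described above can be sketched as follows. The real BFlowNet/loopedLM forward pass may differ in detail; here a "layer" is a stand-in function, and the point is only the weight sharing (the same 24 layers reused in every loop) and the residual connection across loop boundaries.

```python
# Sketch of weight-shared looping with cross-loop residuals:
# the same 24 layers are applied 8 times, and each loop's input
# is added back to its output.
N_LAYERS, N_LOOPS = 24, 8

def layer(h, i):
    # stand-in for a transformer block (attention + MLP in the real model)
    return h + 1e-3 * (i + 1)

def looped_forward(h):
    for _ in range(N_LOOPS):
        h_in = h                   # save the loop's input
        for i in range(N_LAYERS):  # same layer stack, shared weights
            h = layer(h, i)
        h = h + h_in               # residual connection across loops
    return h

print(looped_forward(0.0))
```

Parameter count scales with the 24 shared layers only, while effective depth is 24 × 8 = 192 layer applications.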
## Intended Use
This model is designed to translate text descriptions into appropriate emojis.
**Example Usage:**
```
Input: "I love pizza"
Output: "🍕❤️"
```
## Training Data
The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs.
## Model Files
This repository contains:
- `consolidated.pth`: PyTorch model weights
- `params.json`: Complete model and training configuration
- `train_state_*.json`: Training state information from checkpoint
## Usage
To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:
```python
import json

import torch

# Load the model and training configuration
with open('params.json', 'r') as f:
    params = json.load(f)

# Load the model weights
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize the model with the BFlowNet loopedLM architecture, e.g.:
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)
```
### Generation Parameters
For best results, use:
- **Max Tokens:** 128 (outputs are typically short)
- **Temperature:** 0.7 (for diverse emoji selection)
- **Top-p:** 0.9
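The temperature and top-p parameters above can be illustrated with a minimal nucleus-sampling sketch over byte logits. This is not the framework's actual sampler, only the standard technique: scale logits by temperature, softmax, then sample from the smallest set of ids whose probability mass reaches top-p.

```python
import math
import random

def sample(logits: list[float], temperature: float = 0.7, top_p: float = 0.9) -> int:
    """Temperature + nucleus (top-p) sampling over a list of logits."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                   # for numerical stability
    probs = [math.exp(s - m) for s in scaled]
    total = sum(probs)
    probs = [p / total for p in probs]                # softmax
    # keep the smallest set of ids whose cumulative mass reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    weights = [probs[i] for i in kept]
    return random.choices(kept, weights=weights)[0]

random.seed(0)
print(sample([2.0, 1.0, 0.1, -1.0]))  # one of the kept ids
```

Lower temperatures sharpen the distribution toward the most likely bytes; top-p truncates the long tail so rare bytes (which could produce invalid UTF-8 sequences) are rarely sampled.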
## Limitations
- The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
- Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
- The model requires the specific looped transformer architecture implementation to load and use
## Training Framework
This model was trained using the BFlowNet framework with looped transformer architecture.
Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji)
## License
This model is released under the Apache License 2.0.