---
language:
- en
license: apache-2.0
tags:
- text-generation
- emoji
- byte-level
- looped-transformer
- text2emoji
datasets:
- KomeijiForce/Text2Emoji
---

# EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation

This is a byte-level language model with a **looped (Universal Transformer) architecture**, trained to translate text descriptions into emoji.

## Model Description

- **Model Type:** Causal Language Model with Looped Transformer Architecture
- **Task:** Text-to-Emoji Translation
- **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
- **Tokenizer:** Byte-level (vocab size: 258)
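
A vocabulary of 258 plausibly maps to the 256 raw byte values plus two special tokens. The sketch below illustrates such a scheme; the `BOS_ID`/`EOS_ID` assignments are assumptions for illustration, not this model's actual tokenizer.

```python
# Hypothetical byte-level tokenizer: 256 byte values + 2 assumed specials.
BOS_ID, EOS_ID = 256, 257  # assumed special-token ids

def encode(text: str) -> list[int]:
    """UTF-8 bytes framed by BOS/EOS."""
    return [BOS_ID] + list(text.encode("utf-8")) + [EOS_ID]

def decode(ids: list[int]) -> str:
    """Drop special ids, decode the remaining bytes as UTF-8."""
    raw = bytes(i for i in ids if i < 256)
    return raw.decode("utf-8", errors="replace")

ids = encode("🍕")   # a single emoji spans 4 UTF-8 bytes
print(len(ids))      # 6: BOS + 4 bytes + EOS
print(decode(ids))   # 🍕
```

Byte-level encoding is what makes emoji output straightforward: every emoji is just a short sequence of UTF-8 bytes, so no emoji-specific vocabulary is needed.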

### Architecture Details

**Looped Transformer Architecture:**
- **Base Layers:** 24
- **Number of Loops:** 8 (layers are applied iteratively)
- **Shared Layers:** True (the same weights are reused on every loop, reducing parameter count)
- **Loop Residual:** True (residual connections across loops)

**Model Dimensions:**
- **Hidden Dimension:** 1024
- **Number of Attention Heads:** 16
- **KV Heads:** 16
- **Max Sequence Length:** 512
- **RoPE Theta:** 10000.0
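
As a quick illustration of what the RoPE theta controls, the standard rotary-embedding frequencies for a head dimension of 64 (1024 hidden / 16 heads) can be computed as follows. This is the generic RoPE formula, not code from the training framework:

```python
def rope_freqs(head_dim: int, theta: float = 10000.0) -> list[float]:
    """Per-pair rotation frequencies for rotary position embeddings."""
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# head_dim = hidden_dim / n_heads = 1024 / 16 = 64
freqs = rope_freqs(64)
print(freqs[0])   # 1.0 — the fastest-rotating dimension pair
print(len(freqs)) # 32 — one frequency per dimension pair
```

Larger theta values stretch the slowest frequencies, which matters mainly for long contexts; at a 512-token maximum sequence length the default 10000.0 is ample.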

### Training Configuration

- **Training Steps:** 5100
- **Batch Size:** 12
- **Sequence Length:** 512
- **Learning Rate:** 0.0003
- **Warmup Steps:** 1000
- **Optimizer:** AdamW (β1=0.9, β2=0.95)
- **LR Scheduler:** Cosine with min ratio 0.1
- **Gradient Clipping:** 1.0
- **Weight Decay:** 0.1
- **Precision:** BF16
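
The schedule above (linear warmup, then cosine decay to a 0.1 minimum ratio) can be sketched as a plain function. This is a generic reconstruction from the listed hyperparameters, not the trainer's actual code:

```python
import math

def lr_at(step: int, base_lr: float = 3e-4, warmup: int = 1000,
          total: int = 5100, min_ratio: float = 0.1) -> float:
    """Linear warmup, then cosine decay to min_ratio * base_lr."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return base_lr * (min_ratio + (1.0 - min_ratio) * cosine)

print(lr_at(500))   # halfway through warmup: ~1.5e-4
print(lr_at(5100))  # end of training: ~3e-5 (base_lr * min_ratio)
```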

## What is a Looped Transformer?

A looped transformer applies the same transformer layers multiple times, refining the hidden representation on each pass. This iterative computation suits translation tasks because it allows the model to:
- Refine predictions through multiple iterations
- Use parameters more efficiently (shared weights across loops)
- Model complex input-output mappings with fewer total parameters

In this model, 24 layers are applied 8 times with residual connections between loops.
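
The loop structure can be sketched as follows. This is an illustrative toy (with a small hidden dimension for brevity), not the BFlowNet implementation: the layer internals and the exact placement of the cross-loop residual are assumptions.

```python
import torch
import torch.nn as nn

class LoopedBlockStack(nn.Module):
    """Toy looped transformer: one shared layer stack applied n_loops
    times, with a residual connection across each loop."""
    def __init__(self, dim: int, n_layers: int, n_loops: int):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )
        self.n_loops = n_loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_loops):
            h = x
            for layer in self.layers:  # same weights on every loop
                h = layer(h)
            x = x + h                  # loop-level residual
        return x

model = LoopedBlockStack(dim=64, n_layers=2, n_loops=8)
out = model(torch.randn(1, 16, 64))
print(out.shape)  # torch.Size([1, 16, 64])
```

In the actual model the stack has 24 layers, dim 1024, and 8 loops, giving 192 effective layers of computation at the parameter cost of 24.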

## Intended Use

This model is designed to translate text descriptions into appropriate emojis.

**Example Usage:**
```text
Input: "I love pizza"
Output: "🍕❤️"
```

## Training Data

The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs.

## Model Files

This repository contains:
- `consolidated.pth`: PyTorch model weights
- `params.json`: Complete model and training configuration
- `train_state_*.json`: Training state information from checkpoint

## Usage

To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:

```python
import torch
import json

# Load model parameters
with open('params.json', 'r') as f:
    params = json.load(f)

# Load model weights
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize model with your BFlowNet loopedLM architecture
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)
```

### Generation Parameters

For best results, use:
- **Max Tokens:** 128 (outputs are typically short)
- **Temperature:** 0.7 (for diverse emoji selection)
- **Top-p:** 0.9
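
A single temperature/top-p (nucleus) sampling step with these settings might look like the sketch below; the function name and the use of raw logits are illustrative, not part of this repository's API:

```python
import torch

def sample_top_p(logits: torch.Tensor, temperature: float = 0.7,
                 top_p: float = 0.9) -> int:
    """One sampling step: temperature-scale, keep the smallest set of
    tokens whose cumulative probability covers top_p, then sample."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cum = torch.cumsum(sorted_probs, dim=-1)
    mask = cum - sorted_probs > top_p  # tokens outside the nucleus
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum()
    idx = torch.multinomial(sorted_probs, 1)
    return sorted_ids[idx].item()

logits = torch.full((258,), -100.0)  # vocab size from this model card
logits[65] = 0.0                     # make one token dominant
print(sample_top_p(logits))          # 65
```

With a byte-level vocabulary, each sampled id is a single byte (or special token), so generating one emoji takes several sampling steps.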

## Limitations

- The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
- Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
- The model requires the specific looped transformer architecture implementation to load and use

## Training Framework

This model was trained using the BFlowNet framework with a looped transformer architecture.

Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji)

## License

Apache 2.0