---
language:
- en
license: apache-2.0
tags:
- text-generation
- emoji
- byte-level
- looped-transformer
- text2emoji
datasets:
- KomeijiForce/Text2Emoji
---

# EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation

This model is a byte-level language model trained with a **byte-level Universal Transformer (UT)** architecture for translating text descriptions into emojis.

## Model Description

- **Model Type:** Causal Language Model with Looped Transformer Architecture
- **Task:** Text-to-Emoji Translation
- **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs)
- **Tokenizer:** Byte-level (vocab size: 258)

### Architecture Details

**Looped Transformer Architecture:**

- **Base Layers:** 24
- **Number of Loops:** 8 (layers are applied iteratively)
- **Shared Layers:** True (parameter efficient)
- **Loop Residual:** True (residual connections across loops)

**Model Dimensions:**

- **Hidden Dimension:** 1024
- **Number of Attention Heads:** 16
- **KV Heads:** 16
- **Max Sequence Length:** 512
- **RoPE Theta:** 10000.0

### Training Configuration

- **Training Steps:** 5100
- **Batch Size:** 12
- **Sequence Length:** 512
- **Learning Rate:** 0.0003
- **Warmup Steps:** 1000
- **Optimizer:** AdamW (β1=0.9, β2=0.95)
- **LR Scheduler:** Cosine with min ratio 0.1
- **Gradient Clipping:** 1.0
- **Weight Decay:** 0.1
- **Precision:** BF16

## What is a Looped Transformer?

A looped transformer applies the same transformer layers multiple times in an iterative refinement process. This is particularly effective for translation tasks because it allows the model to:

- Refine predictions through multiple iterations
- Use parameters more efficiently (shared weights across loops)
- Model complex input-output mappings with fewer total parameters

In this model, the 24 layers are applied 8 times, with residual connections between loops (an illustrative sketch of this loop appears under "Training Framework" below).

## Intended Use

This model is designed to translate text descriptions into appropriate emojis.

**Example Usage:**

```
Input: "I love pizza"
Output: "🍕❤️"
```

## Training Data

The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs.

## Model Files

This repository contains:

- `consolidated.pth`: PyTorch model weights
- `params.json`: Complete model and training configuration
- `train_state_*.json`: Training state information from the checkpoint

## Usage

To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture:

```python
import torch
import json

# Load model and training configuration
with open('params.json', 'r') as f:
    params = json.load(f)

# Load model weights
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize model with your BFlowNet loopedLM architecture
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)
```

### Generation Parameters

For best results, use:

- **Max Tokens:** 128 (outputs are typically short)
- **Temperature:** 0.7 (for diverse emoji selection)
- **Top-p:** 0.9

## Limitations

- The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text
- Performance is optimized for text-to-emoji translation and may not generalize well to other tasks
- The model requires the specific looped transformer architecture implementation to load and use

## Training Framework

This model was trained using the BFlowNet framework with a looped transformer architecture.
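### Illustrative Sketch of the Looped Forward Pass

The sketch below illustrates the loop described under "What is a Looped Transformer?": a single shared stack of layers applied several times, with a residual connection carried across each loop. It is a minimal, self-contained example, not the BFlowNet/loopedLM implementation; the `LoopedTransformerSketch` class, the use of `nn.TransformerEncoderLayer` in place of the model's RoPE-based attention blocks, the byte-id offset, and the toy dimensions are all assumptions made for demonstration.

```python
import torch
import torch.nn as nn


class LoopedTransformerSketch(nn.Module):
    """Minimal sketch of a looped (weight-shared) transformer LM.

    NOTE: illustrative stand-in only, not the BFlowNet/loopedLM code.
    """

    def __init__(self, vocab_size=258, dim=128, n_layers=2, n_heads=4,
                 n_loops=4, loop_residual=True):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # One shared stack of layers, reused on every loop (parameter sharing).
        self.layers = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, n_heads, dim_feedforward=4 * dim,
                                       batch_first=True, norm_first=True)
            for _ in range(n_layers)
        ])
        self.n_loops = n_loops
        self.loop_residual = loop_residual
        self.norm = nn.LayerNorm(dim)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)

    def forward(self, byte_ids):
        h = self.embed(byte_ids)
        # Causal mask keeps the model autoregressive.
        mask = nn.Transformer.generate_square_subsequent_mask(byte_ids.size(1))
        for _ in range(self.n_loops):
            loop_input = h
            for layer in self.layers:          # same weights on every loop
                h = layer(h, src_mask=mask)
            if self.loop_residual:
                h = h + loop_input             # residual carried across the loop
        return self.lm_head(self.norm(h))


# Byte-level "tokenization": raw UTF-8 bytes, offset to leave room for
# special ids (the offset of 2 is an assumption about the 258-entry vocab).
text = "I love pizza"
byte_ids = torch.tensor([[b + 2 for b in text.encode("utf-8")]])

model = LoopedTransformerSketch()
logits = model(byte_ids)   # shape: (1, seq_len, 258)
print(logits.shape)
```

The sketch mirrors only the loop structure (shared layers, loop residual) at toy scale; the released checkpoint uses the full configuration listed above (24 layers, 8 loops, dimension 1024, 16 heads, RoPE).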
Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji)

## License

Apache 2.0