--- |
|
|
language: |
|
|
- en |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- text-generation |
|
|
- emoji |
|
|
- byte-level |
|
|
- looped-transformer |
|
|
- text2emoji |
|
|
datasets: |
|
|
- KomeijiForce/Text2Emoji |
|
|
--- |
|
|
|
|
|
# EmojiLM: Byte-Level Looped Transformer for Text-to-Emoji Translation |
|
|
|
|
|
This is a byte-level language model with a **Universal Transformer (UT)**-style looped architecture, trained to translate text descriptions into emojis.
|
|
|
|
|
## Model Description |
|
|
|
|
|
- **Model Type:** Causal Language Model with Looped Transformer Architecture |
|
|
- **Task:** Text-to-Emoji Translation |
|
|
- **Training Data:** KomeijiForce/Text2Emoji dataset (500k+ text-emoji pairs) |
|
|
- **Tokenizer:** Byte-level (vocab size: 258; see the tokenization sketch below)
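
A minimal sketch of how the byte-level tokenizer plausibly works, assuming the 258-token vocabulary is the 256 raw byte values plus two special tokens (the BOS/EOS ids below are an assumption, not confirmed by the repository):

```python
# Hedged sketch: vocab size 258 presumably means 256 raw byte values plus
# two special tokens; the ids below (BOS=256, EOS=257) are an assumption.
BOS_ID, EOS_ID = 256, 257

def encode(text: str) -> list[int]:
    return [BOS_ID] + list(text.encode("utf-8")) + [EOS_ID]

def decode(ids: list[int]) -> str:
    # Drop special tokens, then decode the remaining raw bytes as UTF-8
    return bytes(i for i in ids if i < 256).decode("utf-8", errors="replace")

print(encode("🍕"))  # one emoji spans four UTF-8 bytes: [256, 240, 159, 141, 149, 257]
```

Because emojis are multi-byte UTF-8 sequences, a byte-level vocabulary covers every emoji without out-of-vocabulary issues.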
|
|
|
|
|
### Architecture Details |
|
|
|
|
|
**Looped Transformer Architecture:** |
|
|
- **Base Layers:** 24 |
|
|
- **Number of Loops:** 8 (layers are applied iteratively) |
|
|
- **Shared Layers:** True (weights are shared across loops for parameter efficiency)
|
|
- **Loop Residual:** True (residual connections across loops) |
|
|
|
|
|
**Model Dimensions:** |
|
|
- **Hidden Dimension:** 1024 |
|
|
- **Number of Attention Heads:** 16 |
|
|
- **KV Heads:** 16 |
|
|
- **Max Sequence Length:** 512 |
|
|
- **RoPE Theta:** 10000.0 |
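
The RoPE theta above corresponds to the standard rotary-embedding frequency schedule. A sketch under that assumption, with a per-head dimension of 1024 / 16 = 64:

```python
import torch

def rope_frequencies(head_dim: int = 64, theta: float = 10000.0, max_len: int = 512):
    """Standard RoPE angle table: position m is rotated by m * theta**(-2i/d)."""
    inv_freq = theta ** (-torch.arange(0, head_dim, 2).float() / head_dim)
    positions = torch.arange(max_len).float()
    return torch.outer(positions, inv_freq)  # shape: (max_len, head_dim // 2)
```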
|
|
|
|
|
### Training Configuration |
|
|
|
|
|
- **Training Steps:** 5100 |
|
|
- **Batch Size:** 12 |
|
|
- **Sequence Length:** 512 |
|
|
- **Learning Rate:** 0.0003 |
|
|
- **Warmup Steps:** 1000 |
|
|
- **Optimizer:** AdamW (β1=0.9, β2=0.95) |
|
|
- **LR Scheduler:** Cosine with min ratio 0.1 (sketched after this list)
|
|
- **Gradient Clipping:** 1.0 |
|
|
- **Weight Decay:** 0.1 |
|
|
- **Precision:** BF16 |
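
A sketch of what the schedule above plausibly computes, assuming a standard linear-warmup-plus-cosine implementation (the training framework's exact formula may differ):

```python
import math

def lr_at_step(step: int, peak_lr: float = 3e-4, warmup: int = 1000,
               total_steps: int = 5100, min_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_ratio * peak_lr."""
    if step < warmup:
        return peak_lr * step / warmup
    progress = (step - warmup) / (total_steps - warmup)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return peak_lr * (min_ratio + (1.0 - min_ratio) * cosine)

print(lr_at_step(1000))  # peak: 0.0003
print(lr_at_step(5100))  # floor: min_ratio * peak = 3e-05
```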
|
|
|
|
|
## What is a Looped Transformer? |
|
|
|
|
|
A looped transformer applies the same transformer layers multiple times in an iterative refinement process. |
|
|
This is particularly effective for translation tasks as it allows the model to: |
|
|
- Refine predictions through multiple iterations |
|
|
- Use parameters more efficiently (shared weights across loops) |
|
|
- Model complex input-output mappings with fewer total parameters |
|
|
|
|
|
In this model, 24 layers are applied 8 times with residual connections between loops. |
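
A minimal sketch of that forward pass, assuming shared weights and a residual connection carried across loop iterations (causal masking and RoPE are omitted for brevity, and the real BFlowNet/loopedLM implementation may differ):

```python
import torch
import torch.nn as nn

class LoopedStack(nn.Module):
    """One stack of layers whose weights are reused on every loop."""

    def __init__(self, dim: int = 1024, heads: int = 16,
                 layers: int = 24, loops: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, heads, 4 * dim, batch_first=True)
            for _ in range(layers)
        )
        self.loops = loops

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.loops):
            h = x
            for layer in self.layers:
                h = layer(h)
            x = x + h  # loop residual: carry state across iterations
        return x

demo = LoopedStack(dim=64, heads=4, layers=2, loops=3)  # small demo sizes
print(demo(torch.randn(1, 16, 64)).shape)  # torch.Size([1, 16, 64])
```

At the full configuration this gives 24 × 8 = 192 layer applications per forward pass while storing only 24 layers' worth of parameters.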
|
|
|
|
|
## Intended Use |
|
|
|
|
|
This model is designed to translate text descriptions into appropriate emojis. |
|
|
|
|
|
**Example Usage:** |
|
|
```
Input: "I love pizza"
Output: "🍕❤️"
```
|
|
|
|
|
## Training Data |
|
|
|
|
|
The model was trained on the **KomeijiForce/Text2Emoji** dataset, which contains over 500,000 text-emoji pairs. |
|
|
|
|
|
## Model Files |
|
|
|
|
|
This repository contains: |
|
|
- `consolidated.pth`: PyTorch model weights |
|
|
- `params.json`: Complete model and training configuration |
|
|
- `train_state_*.json`: Training state information from the checkpoint
|
|
|
|
|
## Usage |
|
|
|
|
|
To use this model, you'll need the original BFlowNet/loopedLM codebase to load the architecture: |
|
|
|
|
|
```python
import torch
import json

# Load the model and training configuration
with open('params.json', 'r') as f:
    params = json.load(f)

# Load the model weights
checkpoint = torch.load('consolidated.pth', map_location='cpu')

# Initialize the model with the BFlowNet loopedLM architecture
# (illustrative import path; adjust to the actual codebase layout):
# from apps.loopedLM import LoopedTransformer
# model = LoopedTransformer(**params['model'])
# model.load_state_dict(checkpoint)
```
|
|
|
|
|
### Generation Parameters |
|
|
|
|
|
For best results, use the following settings (a sampling sketch follows the list):
|
|
- **Max Tokens:** 128 (outputs are typically short) |
|
|
- **Temperature:** 0.7 (for diverse emoji selection) |
|
|
- **Top-p:** 0.9 |
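
A sketch of temperature plus nucleus (top-p) sampling with these settings. The commented generation loop assumes a hypothetical model forward signature and the EOS id from the tokenization sketch above, neither of which is confirmed by the repository:

```python
import torch

def sample_top_p(logits: torch.Tensor, temperature: float = 0.7,
                 top_p: float = 0.9) -> int:
    """Sample one token id from 1-D logits with temperature and top-p."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out everything outside the smallest nucleus of mass >= top_p
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs /= sorted_probs.sum()
    return sorted_idx[torch.multinomial(sorted_probs, 1)].item()

# Hypothetical generation loop (forward signature is assumed):
# ids = list("I love pizza".encode("utf-8"))
# for _ in range(128):
#     logits = model(torch.tensor([ids]))[0, -1]
#     next_id = sample_top_p(logits)
#     if next_id == 257:  # assumed EOS id
#         break
#     ids.append(next_id)
# print(bytes(i for i in ids if i < 256).decode("utf-8", errors="replace"))
```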
|
|
|
|
|
## Limitations |
|
|
|
|
|
- The model uses a byte-level tokenizer, which works well for emojis but may be less efficient than subword tokenization for general text |
|
|
- Performance is optimized for text-to-emoji translation and may not generalize well to other tasks |
|
|
- The model requires the specific looped transformer architecture implementation to load and use |
|
|
|
|
|
|
|
|
|
|
## Training Framework |
|
|
|
|
|
This model was trained using the BFlowNet framework with a looped transformer architecture.
|
|
|
|
|
Dataset: [KomeijiForce/Text2Emoji](https://huggingface.co/datasets/KomeijiForce/Text2Emoji) |
|
|
|
|
|
## License |
|
|
|
|
|
Apache 2.0 |
|
|
|