# SuperSheikh Multimodal Model
A state-of-the-art multimodal language model that combines text, image, and audio understanding capabilities with an extended context window of 200,000 tokens.
## Model Description
SuperSheikh is a transformer-based multimodal model designed for:
- **Long-context understanding**: Supports up to 200,000 tokens
- **Text processing**: Advanced natural language understanding and generation
- **Image understanding**: Visual question answering and image captioning
- **Audio processing**: Speech recognition and audio understanding
- **Multimodal reasoning**: Combining information from multiple modalities
## Architecture
- **Base Model**: Transformer decoder with 32 layers
- **Hidden Size**: 4096 dimensions
- **Attention Heads**: 32 heads
- **Context Length**: 200,000 tokens
- **Vision Module**: 24-layer vision transformer with 1024 hidden size
- **Audio Module**: 12-layer audio transformer with 768 hidden size
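Taken together, these hyperparameters can be captured in a small configuration sketch. This is purely illustrative: the field names below are assumptions, not the model's actual `config.json` schema.

```python
# Illustrative configuration mirroring the architecture table above.
# Field names are hypothetical; consult the repository's config.json for the real schema.
super_sheikh_config = {
    "num_hidden_layers": 32,
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "max_position_embeddings": 200_000,
    "vision_config": {"num_hidden_layers": 24, "hidden_size": 1024},
    "audio_config": {"num_hidden_layers": 12, "hidden_size": 768},
}

# Per-head dimension implied by the sizes above: 4096 / 32 = 128
head_dim = super_sheikh_config["hidden_size"] // super_sheikh_config["num_attention_heads"]
```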
## Installation
```bash
pip install transformers torch tokenizers safetensors accelerate
```
Or install from requirements.txt:
```bash
pip install -r requirements.txt
```
## Usage
### Download Model Weights
The model weights (`sheikh.safetensors`) are too large for direct GitHub hosting. Download them from the Hugging Face Hub:
```bash
wget --content-disposition "https://huggingface.co/codedwithlikhon/super-sheikh/resolve/main/sheikh.safetensors"
```
Or use the Hugging Face `transformers` library:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hub; trust_remote_code is needed
# because the model ships custom modeling code
tokenizer = AutoTokenizer.from_pretrained("codedwithlikhon/super-sheikh")
model = AutoModelForCausalLM.from_pretrained("codedwithlikhon/super-sheikh", trust_remote_code=True)

# Generate a short response
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
### Multimodal Processing
```python
from transformers import SuperSheikhProcessor
from PIL import Image

processor = SuperSheikhProcessor.from_pretrained("codedwithlikhon/super-sheikh")

# Process text and image together into a single batch
text = "Describe this image"
image = Image.open("image.jpg")
inputs = processor(text=text, images=image, return_tensors="pt")

# The batch can then be passed to the model loaded above, e.g.:
# outputs = model.generate(**inputs, max_new_tokens=100)
```
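At a high level, multimodal fusion in a decoder of this shape works by projecting each encoder's output into the text hidden size and concatenating along the sequence axis, so the decoder attends over all modalities at once. The toy sketch below illustrates the idea using the widths from the architecture table; it is not SuperSheikh's actual fusion code.

```python
import torch
import torch.nn as nn

hidden_size = 4096  # text hidden size from the architecture table

# Hypothetical projections from each encoder's width into the decoder width
vision_proj = nn.Linear(1024, hidden_size)
audio_proj = nn.Linear(768, hidden_size)

text_emb = torch.randn(1, 16, hidden_size)          # 16 text tokens
image_emb = vision_proj(torch.randn(1, 64, 1024))   # 64 image patches
audio_emb = audio_proj(torch.randn(1, 32, 768))     # 32 audio frames

# Concatenate along the sequence dimension; the decoder attends over all of it
fused = torch.cat([image_emb, audio_emb, text_emb], dim=1)
print(fused.shape)  # torch.Size([1, 112, 4096])
```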
## Features
- **Long Context**: Extended context window for processing large documents
- **Multimodal**: Supports text, image, and audio inputs
- **Efficient**: Optimized for both training and inference
- **Flexible**: Customizable for various downstream tasks
## Training
The model was trained on a diverse dataset including:
- Text corpora from books, articles, and web content
- Image-text pairs from various vision-language datasets
- Audio-text pairs from speech recognition datasets
### Tokenizer Training
You can train a custom BPE tokenizer for SuperSheikh:
```python
from tokenizer_super_sheikh import SuperSheikhTokenizer

# Train a BPE tokenizer from a text iterator
tokenizer = SuperSheikhTokenizer.train_from_iterator(
    text_iterator,
    vocab_size=50000,
    min_frequency=2,
    special_tokens=["<|startoftext|>", "<|endoftext|>", "<pad>", "<unk>"],
)

# Save tokenizer files
tokenizer.save_pretrained("path/to/save/directory")
```
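For reference, the same kind of BPE training can be done directly with the Hugging Face `tokenizers` library. The self-contained sketch below trains on a toy in-memory corpus and does not depend on the `tokenizer_super_sheikh` wrapper above.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

corpus = ["hello world", "hello there, world", "the world says hello"]

# BPE model with whitespace pre-tokenization
tokenizer = Tokenizer(models.BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

trainer = trainers.BpeTrainer(
    vocab_size=200,
    min_frequency=2,
    special_tokens=["<|startoftext|>", "<|endoftext|>", "<pad>", "<unk>"],
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

ids = tokenizer.encode("hello world").ids
```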
### Model Saving
The model supports safetensors format for efficient storage:
```python
# Save the model in safetensors format, sharding large checkpoints
model.save_pretrained(
    "path/to/save/directory",
    safe_serialization=True,
    max_shard_size="10GB",
)
```
This automatically generates:
- `model.safetensors` (or sharded files)
- `model.safetensors.index.json` (for sharded models)
- `config.json`
- `generation_config.json` (if present)
- `chat_template.jinja` (if present for instruction-tuned models)
### Supported File Formats
Saving a trained tokenizer generates the standard Hugging Face tokenizer files:
- `tokenizer.json` - Main tokenizer file
- `vocab.json` - Vocabulary mapping
- `merges.txt` - BPE merges
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `added_tokens.json` - Additional tokens (if any)
## Automated Deployment
This repository includes automated deployment to Hugging Face Hub via GitHub Actions:
### Setup
1. **Fork or clone** this repository to your GitHub account
2. **Set up Hugging Face token**:
- Go to [Hugging Face Settings > Access Tokens](https://huggingface.co/settings/tokens)
- Create a new token with "Write" permissions
- Add it to your GitHub repository secrets as `HF_TOKEN`
3. **Push to main branch** or use manual workflow dispatch
### Workflow Features
- **Automatic deployment**: Triggers on pushes to `main` branch
- **Manual deployment**: Can be triggered manually from GitHub Actions UI
- **Complete model upload**: Automatically uploads all model files including:
- Model weights (`*.safetensors`)
- Tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`)
- Configuration files (`config.json`, `tokenizer_config.json`)
- Chat template (`chat_template.jinja`)
- Special tokens and additional metadata
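A workflow with these triggers might look roughly like the sketch below. This is an illustration, not the repository's actual workflow file: the step names, action versions, and the `huggingface-cli upload` invocation are assumptions.

```yaml
name: Deploy to Hugging Face Hub
on:
  push:
    branches: [main]
  workflow_dispatch:

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install huggingface_hub
        run: pip install -U huggingface_hub
      - name: Upload model files
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: huggingface-cli upload codedwithlikhon/super-sheikh . . --repo-type model
```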
### Repository Links
- **GitHub**: [https://github.com/codedwithlikhon/super-sheikh](https://github.com/codedwithlikhon/super-sheikh)
- **Hugging Face**: [https://huggingface.co/codedwithlikhon/super-sheikh](https://huggingface.co/codedwithlikhon/super-sheikh)
After a successful deployment, the model is automatically available on the Hugging Face Hub.
## Limitations
- Requires significant computational resources
- Large model size may not be suitable for all deployment scenarios
- Performance may vary depending on input quality and domain
## License
This model is released under the MIT License.
## Citation
If you use SuperSheikh in your research, please cite:
```
@misc{super-sheikh-2024,
  title={SuperSheikh: A Multimodal Long-Context Language Model},
  author={SuperSheikh Team},
  year={2024},
  url={https://github.com/codedwithlikhon/super-sheikh}
}
```
## Contact
For questions or support, please open an issue on our GitHub repository.