# SuperSheikh Multimodal Model

A state-of-the-art multimodal language model that combines text, image, and audio understanding capabilities with an extended context window of 200,000 tokens.

## Model Description

SuperSheikh is a transformer-based multimodal model designed for:

- **Long-context understanding**: Supports up to 200,000 tokens
- **Text processing**: Advanced natural language understanding and generation
- **Image understanding**: Visual question answering and image captioning
- **Audio processing**: Speech recognition and audio understanding
- **Multimodal reasoning**: Combining information from multiple modalities
## Architecture

- **Base Model**: Transformer decoder with 32 layers
- **Hidden Size**: 4096 dimensions
- **Attention Heads**: 32 heads
- **Context Length**: 200,000 tokens
- **Vision Module**: 24-layer vision transformer with a hidden size of 1024
- **Audio Module**: 12-layer audio transformer with a hidden size of 768
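The numbers above can be collected into a single configuration sketch. The field names below are illustrative assumptions, not the model's actual config schema:

```python
# Hypothetical configuration mirroring the architecture table above.
# Field names are illustrative, not SuperSheikh's real config keys.
SUPER_SHEIKH_CONFIG = {
    "num_hidden_layers": 32,
    "hidden_size": 4096,
    "num_attention_heads": 32,
    "max_position_embeddings": 200_000,
    "vision": {"num_layers": 24, "hidden_size": 1024},
    "audio": {"num_layers": 12, "hidden_size": 768},
}

# Sanity check: the hidden size must divide evenly across attention heads.
head_dim = (
    SUPER_SHEIKH_CONFIG["hidden_size"]
    // SUPER_SHEIKH_CONFIG["num_attention_heads"]
)
print(head_dim)  # 128
```

With 4096 dimensions split over 32 heads, each head works in a 128-dimensional subspace, which is a common choice for models of this scale.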
## Installation

```bash
pip install transformers torch tokenizers safetensors accelerate
```

Or install from `requirements.txt`:

```bash
pip install -r requirements.txt
```
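After installing, you can quickly confirm that every dependency is importable. This is a small helper sketch; the package list is taken from the install command above:

```python
import importlib.util

# Packages from the pip install command above
REQUIRED_PACKAGES = ["transformers", "torch", "tokenizers", "safetensors", "accelerate"]

def missing_packages(names):
    """Return the package names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

print(missing_packages(REQUIRED_PACKAGES))  # [] if everything is installed
```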
## Usage

### Download Model Weights

The model weights (`sheikh.safetensors`) are too large for direct GitHub hosting. Download them from the Hugging Face Hub:

```bash
wget --content-disposition "https://huggingface.co/codedwithlikhon/super-sheikh/resolve/main/sheikh.safetensors"
```

Or load the model directly with the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("codedwithlikhon/super-sheikh")
model = AutoModelForCausalLM.from_pretrained(
    "codedwithlikhon/super-sheikh", trust_remote_code=True
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
# max_new_tokens bounds only the generated continuation;
# max_length would count the prompt tokens as well.
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
```
### Multimodal Processing

```python
from transformers import SuperSheikhProcessor
from PIL import Image

processor = SuperSheikhProcessor.from_pretrained("path/to/super-sheikh")

# Process text and image together
text = "Describe this image"
image = Image.open("image.jpg")
inputs = processor(text=text, images=image, return_tensors="pt")
```
## Features

- **Long Context**: Extended context window for processing large documents
- **Multimodal**: Supports text, image, and audio inputs
- **Efficient**: Optimized for both training and inference
- **Flexible**: Customizable for various downstream tasks

## Training

The model was trained on a diverse dataset including:

- Text corpora from books, articles, and web content
- Image-text pairs from various vision-language datasets
- Audio-text pairs from speech recognition datasets
### Tokenizer Training

You can train a custom BPE tokenizer for SuperSheikh:

```python
from tokenizer_super_sheikh import SuperSheikhTokenizer

# Train tokenizer from a dataset
tokenizer = SuperSheikhTokenizer.train_from_iterator(
    text_iterator,
    vocab_size=50000,
    min_frequency=2,
    special_tokens=["<|startoftext|>", "<|endoftext|>", "<pad>", "<unk>"],
)

# Save tokenizer files
tokenizer.save_pretrained("path/to/save/directory")
```
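The snippet above uses the repository's custom tokenizer class. The same training step can be sketched with the Hugging Face `tokenizers` library directly; the vocabulary size and corpus here are toy values for illustration:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

SPECIAL_TOKENS = ["<|startoftext|>", "<|endoftext|>", "<pad>", "<unk>"]

# Toy corpus standing in for the real text iterator
corpus = [
    "SuperSheikh handles text, images, and audio.",
    "SuperSheikh supports long context windows.",
]

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    vocab_size=200,           # toy value; the README uses 50000
    min_frequency=1,
    special_tokens=SPECIAL_TOKENS,
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Special tokens are registered at the front of the vocabulary
print(tokenizer.token_to_id("<unk>"))
```

Saving with `tokenizer.save("tokenizer.json")` produces the single-file format that `transformers` fast tokenizers load.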
### Model Saving

The model supports the safetensors format for efficient storage:

```python
# Save model weights in safetensors format, sharded at 10 GB
model.save_pretrained(
    "path/to/save/directory",
    safe_serialization=True,
    max_shard_size="10GB",
)
```

This automatically generates:

- `model.safetensors` (or sharded files)
- `model.safetensors.index.json` (for sharded models)
- `config.json`
- `generation_config.json` (if present)
- `chat_template.jinja` (if present, for instruction-tuned models)
### Supported File Formats

The tokenizer implementation generates the standard Hugging Face tokenizer files:

- `tokenizer.json` - Main tokenizer file
- `vocab.json` - Vocabulary mapping
- `merges.txt` - BPE merges
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `added_tokens.json` - Additional tokens (if any)
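A small helper can verify that a save directory contains the required files. This is a sketch; the file list mirrors the one above, excluding the optional `added_tokens.json`:

```python
from pathlib import Path

# Files every SuperSheikh tokenizer directory should contain
# (added_tokens.json is optional and therefore not checked)
REQUIRED_FILES = [
    "tokenizer.json",
    "vocab.json",
    "merges.txt",
    "tokenizer_config.json",
    "special_tokens_map.json",
]

def missing_tokenizer_files(save_dir):
    """Return the required tokenizer files absent from save_dir."""
    directory = Path(save_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]
```

Running it against a freshly saved tokenizer directory should return an empty list.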
## Automated Deployment

This repository includes automated deployment to the Hugging Face Hub via GitHub Actions:

### Setup

1. **Fork or clone** this repository to your GitHub account
2. **Set up a Hugging Face token**:
   - Go to [Hugging Face Settings > Access Tokens](https://huggingface.co/settings/tokens)
   - Create a new token with "Write" permissions
   - Add it to your GitHub repository secrets as `HF_TOKEN`
3. **Push to the main branch** or use manual workflow dispatch
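The setup steps above correspond to a workflow along these lines. This is a minimal sketch, not this repository's actual workflow file; the file path, action versions, and upload command are assumptions:

```yaml
# .github/workflows/deploy.yml (illustrative sketch)
name: Deploy to Hugging Face Hub
on:
  push:
    branches: [main]
  workflow_dispatch:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install huggingface_hub
      - name: Upload model files
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: huggingface-cli upload codedwithlikhon/super-sheikh . --token "$HF_TOKEN"
```

The `workflow_dispatch` trigger is what enables the manual runs from the GitHub Actions UI described below.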
### Workflow Features

- **Automatic deployment**: Triggers on pushes to the `main` branch
- **Manual deployment**: Can be triggered manually from the GitHub Actions UI
- **Complete model upload**: Automatically uploads all model files, including:
  - Model weights (`*.safetensors`)
  - Tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`)
  - Configuration files (`config.json`, `tokenizer_config.json`)
  - Chat template (`chat_template.jinja`)
  - Special tokens and additional metadata
### Repository Links

- **GitHub**: [https://github.com/codedwithlikhon/super-sheikh](https://github.com/codedwithlikhon/super-sheikh)
- **Hugging Face**: [https://huggingface.co/codedwithlikhon/super-sheikh](https://huggingface.co/codedwithlikhon/super-sheikh)

The model is automatically available on the Hugging Face Hub after a successful deployment.
## Limitations

- Requires significant computational resources
- Large model size may not be suitable for all deployment scenarios
- Performance may vary depending on input quality and domain

## License

This model is released under the MIT License.
## Citation

If you use SuperSheikh in your research, please cite:

```bibtex
@misc{super-sheikh-2024,
  title={SuperSheikh: A Multimodal Long-Context Language Model},
  author={SuperSheikh Team},
  year={2024},
  url={https://github.com/codedwithlikhon/super-sheikh}
}
```
## Contact

For questions or support, please open an issue on our GitHub repository.