# SuperSheikh Multimodal Model
A state-of-the-art multimodal language model that combines text, image, and audio understanding capabilities with an extended context window of 200,000 tokens.
## Model Description

SuperSheikh is a transformer-based multimodal model designed for:

- **Long-context understanding**: Supports up to 200,000 tokens
- **Text processing**: Advanced natural language understanding and generation
- **Image understanding**: Visual question answering and image captioning
- **Audio processing**: Speech recognition and audio understanding
- **Multimodal reasoning**: Combining information from multiple modalities
## Architecture

- **Base Model**: Transformer decoder with 32 layers
- **Hidden Size**: 4096 dimensions
- **Attention Heads**: 32 heads
- **Context Length**: 200,000 tokens
- **Vision Module**: 24-layer vision transformer with 1024 hidden size
- **Audio Module**: 12-layer audio transformer with 768 hidden size
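The hyperparameters above can be collected into a configuration sketch. This is a hypothetical dictionary mirroring the architecture table; the actual field names in SuperSheikh's `config.json` may differ:

```python
# Hypothetical configuration mirroring the architecture table above;
# the real config.json keys for SuperSheikh may differ.
super_sheikh_config = {
    "num_hidden_layers": 32,             # transformer decoder layers
    "hidden_size": 4096,                 # model dimension
    "num_attention_heads": 32,
    "max_position_embeddings": 200_000,  # context length in tokens
    "vision_config": {
        "num_hidden_layers": 24,
        "hidden_size": 1024,
    },
    "audio_config": {
        "num_hidden_layers": 12,
        "hidden_size": 768,
    },
}

# With 32 heads over a 4096-dimensional hidden state, each head sees 128 dims
head_dim = super_sheikh_config["hidden_size"] // super_sheikh_config["num_attention_heads"]
print(head_dim)  # 128
```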
## Installation

```bash
pip install transformers torch tokenizers safetensors accelerate
```

Or install from `requirements.txt`:

```bash
pip install -r requirements.txt
```
## Usage
### Download Model Weights

The model weights (`sheikh.safetensors`) are too large for direct GitHub hosting. Download them from the Hugging Face Hub:

```bash
wget --content-disposition "https://huggingface.co/codedwithlikhon/super-sheikh/resolve/main/sheikh.safetensors"
```
Or use the Hugging Face `transformers` library:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("codedwithlikhon/super-sheikh")
model = AutoModelForCausalLM.from_pretrained("codedwithlikhon/super-sheikh", trust_remote_code=True)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_length=100)
response = tokenizer.decode(outputs[0])
```
### Multimodal Processing

```python
from transformers import SuperSheikhProcessor
from PIL import Image

processor = SuperSheikhProcessor.from_pretrained("path/to/super-sheikh")

# Process text and image together
text = "Describe this image"
image = Image.open("image.jpg")
inputs = processor(text=text, images=image, return_tensors="pt")
```
## Features

- **Long Context**: Extended context window for processing large documents
- **Multimodal**: Supports text, image, and audio inputs
- **Efficient**: Optimized for both training and inference
- **Flexible**: Customizable for various downstream tasks
## Training
The model was trained on a diverse dataset including:
- Text corpora from books, articles, and web content
- Image-text pairs from various vision-language datasets
- Audio-text pairs from speech recognition datasets
### Tokenizer Training

You can train a custom BPE tokenizer for SuperSheikh:

```python
from tokenizer_super_sheikh import SuperSheikhTokenizer

# Train tokenizer from dataset
tokenizer = SuperSheikhTokenizer.train_from_iterator(
    text_iterator,
    vocab_size=50000,
    min_frequency=2,
    special_tokens=["<|startoftext|>", "<|endoftext|>", "<pad>", "<unk>"]
)

# Save tokenizer files
tokenizer.save_pretrained("path/to/save/directory")
```
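If the `tokenizer_super_sheikh` module is unavailable, the same kind of BPE training can be sketched directly with the Hugging Face `tokenizers` library (the corpus below is placeholder data, not the actual training set):

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Empty BPE tokenizer with an unknown-token fallback
tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.pre_tokenizer = Whitespace()

# Same hyperparameters as the SuperSheikh example above
trainer = BpeTrainer(
    vocab_size=50000,
    min_frequency=2,
    special_tokens=["<|startoftext|>", "<|endoftext|>", "<pad>", "<unk>"],
)

# Any iterator of strings works as a training corpus (placeholder data here)
corpus = ["hello world", "hello tokenizer", "multimodal models are fun"] * 100
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Round-trip check: encode to ids, decode back to text
ids = tokenizer.encode("hello world").ids
print(tokenizer.decode(ids))
```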
## Model Saving

The model supports the safetensors format for efficient storage:

```python
# Save model in safetensors format, sharding files above 10 GB
model.save_pretrained(
    "path/to/save/directory",
    safe_serialization=True,
    max_shard_size="10GB"
)
```
This automatically generates:

- `model.safetensors` (or sharded files)
- `model.safetensors.index.json` (for sharded models)
- `config.json`
- `generation_config.json` (if present)
- `chat_template.jinja` (if present, for instruction-tuned models)
### Supported File Formats
The updated implementation generates standard tokenizer files:
- `tokenizer.json` - Main tokenizer file
- `vocab.json` - Vocabulary mapping
- `merges.txt` - BPE merges
- `tokenizer_config.json` - Tokenizer configuration
- `special_tokens_map.json` - Special tokens mapping
- `added_tokens.json` - Additional tokens (if any)
## Automated Deployment
This repository includes automated deployment to Hugging Face Hub via GitHub Actions:
### Setup
1. **Fork or clone** this repository to your GitHub account
2. **Set up Hugging Face token**:
- Go to [Hugging Face Settings > Access Tokens](https://huggingface.co/settings/tokens)
- Create a new token with "Write" permissions
- Add it to your GitHub repository secrets as `HF_TOKEN`
3. **Push to main branch** or use manual workflow dispatch
### Workflow Features
- **Automatic deployment**: Triggers on pushes to `main` branch
- **Manual deployment**: Can be triggered manually from GitHub Actions UI
- **Complete model upload**: Automatically uploads all model files including:
- Model weights (`*.safetensors`)
- Tokenizer files (`tokenizer.json`, `vocab.json`, `merges.txt`)
- Configuration files (`config.json`, `tokenizer_config.json`)
- Chat template (`chat_template.jinja`)
- Special tokens and additional metadata
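A workflow along these lines could implement the behavior described above. This YAML is an illustrative sketch, not the repository's actual workflow file; the job layout and the `huggingface-cli upload` invocation are assumptions:

```yaml
# Hypothetical .github/workflows/deploy.yml sketch
name: Deploy to Hugging Face Hub

on:
  push:
    branches: [main]
  workflow_dispatch:  # allows manual runs from the Actions UI

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install huggingface_hub
      - name: Upload model files
        env:
          HF_TOKEN: ${{ secrets.HF_TOKEN }}
        run: huggingface-cli upload codedwithlikhon/super-sheikh . --repo-type model
```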
### Repository Links
- **GitHub**: [https://github.com/codedwithlikhon/super-sheikh](https://github.com/codedwithlikhon/super-sheikh)
- **Hugging Face**: [https://huggingface.co/codedwithlikhon/super-sheikh](https://huggingface.co/codedwithlikhon/super-sheikh)
After a successful deployment, the model is automatically available on the Hugging Face Hub.
## Limitations
- Requires significant computational resources
- Large model size may not be suitable for all deployment scenarios
- Performance may vary depending on input quality and domain
## License
This model is released under the MIT License.
## Citation

If you use SuperSheikh in your research, please cite:

```bibtex
@misc{super-sheikh-2024,
  title={SuperSheikh: A Multimodal Long-Context Language Model},
  author={SuperSheikh Team},
  year={2024},
  url={https://github.com/codedwithlikhon/super-sheikh}
}
```
## Contact
For questions or support, please open an issue on our GitHub repository.