Instructions to use Gothicdreams/i3-tiny with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Gothicdreams/i3-tiny with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Gothicdreams/i3-tiny", trust_remote_code=True)
```

```python
# Load the model directly (the custom i3 classes are fetched via trust_remote_code)
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Gothicdreams/i3-tiny", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Gothicdreams/i3-tiny with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Gothicdreams/i3-tiny"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Gothicdreams/i3-tiny",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker
```shell
docker model run hf.co/Gothicdreams/i3-tiny
```
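The curl request above can also be issued from Python using only the standard library; a minimal sketch that builds the same request (endpoint, headers, and JSON body copied from the example, assuming the vLLM server is running locally on port 8000):

```python
import json
import urllib.request


def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 512,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build the same OpenAI-compatible /v1/completions request as the curl example."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        "http://localhost:8000/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("Gothicdreams/i3-tiny", "Once upon a time,")
# Send with urllib.request.urlopen(req) once the server is up.
```

Sending the request is left to the caller so the payload construction stays testable without a running server.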
- SGLang
How to use Gothicdreams/i3-tiny with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Gothicdreams/i3-tiny" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Gothicdreams/i3-tiny",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Gothicdreams/i3-tiny" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Gothicdreams/i3-tiny",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use Gothicdreams/i3-tiny with Docker Model Runner:
```shell
docker model run hf.co/Gothicdreams/i3-tiny
```
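The vLLM and SGLang servers above both return OpenAI-style completion JSON. A small sketch of pulling the generated text out of such a response; the field names follow the OpenAI completions schema, and the sample response dict is made up for illustration:

```python
def extract_completion_text(response: dict) -> str:
    """Return the first generation from an OpenAI-compatible completions response."""
    # vLLM and SGLang both place generations under the "choices" key
    return response["choices"][0]["text"]


# Made-up example response, for illustration only:
sample = {
    "id": "cmpl-123",
    "object": "text_completion",
    "model": "Gothicdreams/i3-tiny",
    "choices": [{"text": " there was a tiny model.", "index": 0}],
}
print(extract_completion_text(sample))
```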
The repository's custom modeling code wraps the original `i3Model` in Transformers-compatible config and model classes:

```python
import torch
from torch import nn
from transformers import PreTrainedModel, PretrainedConfig

from i3_modules import i3Model  # import the original i3Model class


class i3Config(PretrainedConfig):
    model_type = "i3"

    def __init__(self, vocab_size=34, d_model=256, n_layers=6, n_heads=8,
                 max_seq_len=128, rank=8, d_state=16, **kwargs):
        super().__init__(**kwargs)
        self.vocab_size = vocab_size
        self.d_model = d_model
        self.n_layers = n_layers
        self.n_heads = n_heads
        self.max_seq_len = max_seq_len
        self.rank = rank
        self.d_state = d_state


class i3(PreTrainedModel):
    config_class = i3Config
    base_model_prefix = "i3"

    def __init__(self, config):
        super().__init__(config)
        # Instantiate the original model from the config's hyperparameters
        self.model = i3Model(
            vocab_size=config.vocab_size,
            d_model=config.d_model,
            n_layers=config.n_layers,
            n_heads=config.n_heads,
            max_seq_len=config.max_seq_len,
            rank=config.rank,
            d_state=config.d_state,
        )

    def forward(self, input_ids, labels=None):
        # Delegate to the wrapped i3Model
        return self.model(input_ids, labels)
```
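As a dependency-free sanity check on the wrapper above, a pure-Python sketch (hypothetical helper, not part of the repository) mirroring how `i3.__init__` forwards `i3Config` fields to the `i3Model` constructor; the even-division check assumes standard multi-head attention, which the `i3Model` internals may or may not follow:

```python
# Default hyperparameters copied from i3Config above
I3_DEFAULTS = {
    "vocab_size": 34, "d_model": 256, "n_layers": 6, "n_heads": 8,
    "max_seq_len": 128, "rank": 8, "d_state": 16,
}


def model_kwargs(config: dict) -> dict:
    """Pick exactly the seven fields that i3.__init__ passes to i3Model."""
    keys = ("vocab_size", "d_model", "n_layers", "n_heads",
            "max_seq_len", "rank", "d_state")
    return {k: config[k] for k in keys}


# Assumption: d_model splits evenly across attention heads (256 / 8 = 32)
assert I3_DEFAULTS["d_model"] % I3_DEFAULTS["n_heads"] == 0
head_dim = I3_DEFAULTS["d_model"] // I3_DEFAULTS["n_heads"]
```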