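"""CLI: run the full training loop."""

import click
import tensorflow as tf

from llm_go.config import ModelConfig, TrainingConfig
from llm_go.data.dataset import GoDataset
from llm_go.training.trainer import Trainer


# NOTE: the click decorators are absent from this listing. The options below are
# reconstructed from main()'s parameter names; their types and required=True
# are assumptions, not the original definitions.
@click.command()
@click.option("--model-size", required=True,
              type=click.Choice(["small", "medium", "large", "xl"]))
@click.option("--data-dir", type=str, required=True)
@click.option("--ckpt-dir", type=str, required=True)
@click.option("--log-dir", type=str, required=True)
@click.option("--batch-size", type=int, required=True)
@click.option("--max-steps", type=int, required=True)
@click.option("--lr", type=float, required=True)
@click.option("--warmup-steps", type=int, required=True)
@click.option("--grad-accum", type=int, required=True)
@click.option("--precision", type=str, required=True)
@click.option("--gpus", type=int, required=True)
def main(model_size, data_dir, ckpt_dir, log_dir, batch_size, max_steps,
         lr, warmup_steps, grad_accum, precision, gpus):
    """Train GoLLM from scratch."""
    # Look up the preset constructor by name and build the model config.
    mc = {"small": ModelConfig.small, "medium": ModelConfig.medium,
          "large": ModelConfig.large, "xl": ModelConfig.xl}[model_size]()
    tc = TrainingConfig(
        learning_rate=lr,
        warmup_steps=warmup_steps,
        max_steps=max_steps,
        batch_size=batch_size,
        gradient_accumulation_steps=grad_accum,
        mixed_precision=precision,
        checkpoint_dir=ckpt_dir,
        log_dir=log_dir,
    )
    # The sequence length is taken from the model config.
    ds = GoDataset(data_dir, seq_len=mc.max_seq_len, batch_size=batch_size)

    # Choose a distribution strategy from the requested GPU count.
    if gpus == 1:
        strategy = tf.distribute.OneDeviceStrategy("/gpu:0")
    elif gpus == -1:
        # No device list: MirroredStrategy uses every GPU visible to TensorFlow.
        strategy = tf.distribute.MirroredStrategy()
    else:
        devices = [f"/gpu:{i}" for i in range(gpus)]
        strategy = tf.distribute.MirroredStrategy(devices=devices)

    trainer = Trainer(mc, tc, ds.train(), ds.val(), strategy=strategy)
    trainer.train()


if __name__ == "__main__":
    main()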
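
The device-selection branch at the end of `main` can be pulled out into a small pure function, so you can check what each `gpus` value maps to without importing TensorFlow or touching a GPU. The sketch below is illustrative only: the helper name `plan_strategy` and its `(strategy name, device list)` return shape are hypothetical and not part of `llm_go`.

```python
from typing import List, Optional, Tuple


def plan_strategy(gpus: int) -> Tuple[str, Optional[List[str]]]:
    """Mirror main()'s branching and return (strategy name, device list).

    Hypothetical helper for illustration; it is not part of llm_go.
    A device list of None means no explicit devices are passed, which
    leaves the choice of GPUs to TensorFlow.
    """
    if gpus == 1:
        return "OneDeviceStrategy", ["/gpu:0"]
    if gpus == -1:
        return "MirroredStrategy", None
    # Any other value builds an explicit list /gpu:0 .. /gpu:(gpus - 1).
    return "MirroredStrategy", [f"/gpu:{i}" for i in range(gpus)]


if __name__ == "__main__":
    for n in (1, -1, 2, 0):
        print(n, plan_strategy(n))
```

With this branching, `gpus=0` (or any value below `-1`) produces an empty device list, so a caller may want to reject those values before constructing the strategy.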