Instructions to use LesterCerioli/LLM-GO with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use LesterCerioli/LLM-GO with Transformers:
# Use a pipeline as a high-level helper from transformers import pipeline pipe = pipeline("text-generation", model="LesterCerioli/LLM-GO")# Load model directly from transformers import AutoModel model = AutoModel.from_pretrained("LesterCerioli/LLM-GO", dtype="auto") - Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use LesterCerioli/LLM-GO with vLLM:
Install from pip and serve model
# Install vLLM from pip: pip install vllm # Start the vLLM server: vllm serve "LesterCerioli/LLM-GO" # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:8000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LesterCerioli/LLM-GO", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker
docker model run hf.co/LesterCerioli/LLM-GO
- SGLang
How to use LesterCerioli/LLM-GO with SGLang:
Install from pip and serve model
# Install SGLang from pip: pip install sglang # Start the SGLang server: python3 -m sglang.launch_server \ --model-path "LesterCerioli/LLM-GO" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LesterCerioli/LLM-GO", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }'Use Docker images
docker run --gpus all \ --shm-size 32g \ -p 30000:30000 \ -v ~/.cache/huggingface:/root/.cache/huggingface \ --env "HF_TOKEN=<secret>" \ --ipc=host \ lmsysorg/sglang:latest \ python3 -m sglang.launch_server \ --model-path "LesterCerioli/LLM-GO" \ --host 0.0.0.0 \ --port 30000 # Call the server using curl (OpenAI-compatible API): curl -X POST "http://localhost:30000/v1/completions" \ -H "Content-Type: application/json" \ --data '{ "model": "LesterCerioli/LLM-GO", "prompt": "Once upon a time,", "max_tokens": 512, "temperature": 0.5 }' - Docker Model Runner
How to use LesterCerioli/LLM-GO with Docker Model Runner:
docker model run hf.co/LesterCerioli/LLM-GO
| # scripts/train.sh — executa o treinamento do GoLLM | |
| set -euo pipefail | |
| MODEL_SIZE=${MODEL_SIZE:-medium} # small | medium | large | xl | |
| DATA_DIR=${DATA_DIR:-data/processed} | |
| CKPT_DIR=${CKPT_DIR:-checkpoints} | |
| LOG_DIR=${LOG_DIR:-logs} | |
| BATCH_SIZE=${BATCH_SIZE:-32} | |
| MAX_STEPS=${MAX_STEPS:-100000} | |
| WARMUP_STEPS=${WARMUP_STEPS:-2000} | |
| GRAD_ACCUM=${GRAD_ACCUM:-4} | |
| PRECISION=${PRECISION:-bfloat16} # float32 | float16 | bfloat16 | |
| GPUS=${GPUS:--1} # -1 = todos, 0 = GPU 0, 1 = GPU 1... | |
| # Verifica dados pré-processados | |
| TRAIN_SHARDS=$(find "$DATA_DIR/train" -name "*.tfrecord" 2>/dev/null | wc -l) | |
| if [ "$TRAIN_SHARDS" -eq 0 ]; then | |
| echo "ERRO: Nenhum TFRecord encontrado em $DATA_DIR/train." | |
| echo " Execute scripts/preprocess.sh primeiro." | |
| exit 1 | |
| fi | |
| echo "==> Iniciando treinamento GoLLM" | |
| echo " model_size=$MODEL_SIZE batch=$BATCH_SIZE steps=$MAX_STEPS" | |
| echo " precision=$PRECISION gpus=$GPUS grad_accum=$GRAD_ACCUM" | |
| echo " train_shards=$TRAIN_SHARDS" | |
| echo "" | |
| # XLA JIT compilation (aceleração de CPU/GPU) | |
| export TF_XLA_FLAGS="--tf_xla_auto_jit=2" | |
| export TF_ENABLE_ONEDNN_OPTS=0 | |
| llm-go-train \ | |
| --model-size "$MODEL_SIZE" \ | |
| --data-dir "$DATA_DIR" \ | |
| --ckpt-dir "$CKPT_DIR" \ | |
| --log-dir "$LOG_DIR" \ | |
| --batch-size "$BATCH_SIZE" \ | |
| --max-steps "$MAX_STEPS" \ | |
| --warmup-steps "$WARMUP_STEPS" \ | |
| --grad-accum "$GRAD_ACCUM" \ | |
| --precision "$PRECISION" \ | |
| --gpus "$GPUS" | |
| echo "" | |
| echo "✅ Treinamento concluído. Checkpoints em $CKPT_DIR" | |