Toy LLaMA-39M
This is a tiny LLaMA model pretrained on Recag/Rp_C4_55, a small subset of C4, with seq_len=512.

- Model architecture:

      {
        "hidden_size": 512,
        "intermediate_size": 2048,
        "max_position_embeddings": 2048,
        "num_attention_heads": 8,
        "num_hidden_layers": 2,
        "num_key_value_heads": 8
      }

- Load model and tokenizer:

      from transformers import AutoTokenizer, AutoModelForCausalLM

      model = AutoModelForCausalLM.from_pretrained("Cheng98/llama-39m")
      tokenizer = AutoTokenizer.from_pretrained("Cheng98/llama-39m")

- Training script: huggingface/transformers/examples/pytorch/language-modeling/run_clm.py

      # The "train" split is created from the last 95% of the samples in the original "train" subset
      raw_datasets["train"] = load_dataset("Recag/Rp_C4_55", split="train[5%:]")

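The architecture values listed above are enough to estimate the model's size by hand. The following is a minimal sketch, not the author's own accounting: it assumes the standard LLaMA layout (bias-free linear layers, RMSNorm weights, a gated MLP with three projections), and it assumes a vocabulary of 32000 tokens with untied input and output embeddings. Neither the vocabulary size nor the embedding tying is stated in this card, so `VOCAB_SIZE` and `TIED_EMBEDDINGS` are labeled assumptions.

```python
# Architecture values copied from the model card.
config = {
    "hidden_size": 512,
    "intermediate_size": 2048,
    "num_attention_heads": 8,
    "num_hidden_layers": 2,
    "num_key_value_heads": 8,
}

# ASSUMPTIONS (not stated in the model card):
VOCAB_SIZE = 32000       # vocabulary size of the original LLaMA tokenizer
TIED_EMBEDDINGS = False  # separate input embedding and output projection


def count_llama_params(cfg, vocab_size, tied_embeddings):
    """Count parameters of a LLaMA-style decoder (no biases, RMSNorm weights only)."""
    h = cfg["hidden_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_dim = head_dim * cfg["num_key_value_heads"]

    attention = 2 * h * h + 2 * h * kv_dim   # q_proj and o_proj, plus k_proj and v_proj
    mlp = 3 * h * cfg["intermediate_size"]   # gate_proj, up_proj, down_proj
    norms = 2 * h                            # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms

    embeddings = vocab_size * h
    lm_head = 0 if tied_embeddings else vocab_size * h
    final_norm = h
    return embeddings + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


total = count_llama_params(config, VOCAB_SIZE, TIED_EMBEDDINGS)
print(f"{total:,} parameters ({total / 2**20:.2f} x 2^20)")
```

Under these assumptions the count is 41,159,168, or roughly 39.25 × 2^20. That may be where the "39M" in the model name comes from, but this is a guess and the card does not confirm it. Most of the total sits in the two embedding matrices; the two transformer layers together contribute about 8.4 million parameters.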
Evaluation (seq_len=512):

| Dataset | Eval loss | Perplexity | Accuracy | block_size |
|---|---|---|---|---|
| Recag/Rp_C4_55 | 3.63 | 37.78 | 0.3561 | 512 |
| Wikitext2 | 4.58 | 97.48 | 0.2719 | 512 |

Evaluation command (Wikitext2):

    # Evaluation command
    python run_clm.py --model_name_or_path Cheng98/llama-39m \
        --dataset_name wikitext \
        --dataset_config_name wikitext-2-raw-v1 \
        --block_size 512 \
        --do_eval \
        --output_dir ./results

Evaluation on Recag/Rp_C4_55 (seq_len=512):

    # The "validation" split is created from the first 5% of the samples in the original "train" subset
    raw_datasets["validation"] = load_dataset("Recag/Rp_C4_55", split="train[:5%]")

Results:

    {
      "eval_accuracy": 0.3561766818954313,
      "eval_loss": 3.6318140029907227,
      "eval_runtime": 190.8411,
      "eval_samples": 19413,
      "eval_samples_per_second": 101.723,
      "eval_steps_per_second": 1.593,
      "perplexity": 37.7812898658763
    }

Evaluation on Wikitext2 (seq_len=512):

    {
      "eval_accuracy": 0.2718795201225219,
      "eval_loss": 4.579628944396973,
      "eval_runtime": 3.939,
      "eval_samples": 575,
      "eval_samples_per_second": 145.976,
      "eval_steps_per_second": 0.762,
      "perplexity": 97.47821765687856
    }
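The perplexity figures reported above follow directly from the evaluation losses. As a quick sanity check, this sketch recomputes them, assuming `eval_loss` is the mean per-token cross-entropy in nats, which is how `run_clm.py` derives perplexity (the exponential of the evaluation loss). The helper name `perplexity_from_loss` is illustrative only.

```python
import math

# eval_loss and perplexity values copied from the results in the model card.
results = {
    "Recag/Rp_C4_55": {"eval_loss": 3.6318140029907227, "perplexity": 37.7812898658763},
    "Wikitext2": {"eval_loss": 4.579628944396973, "perplexity": 97.47821765687856},
}


def perplexity_from_loss(eval_loss):
    """Perplexity is the exponential of the mean cross-entropy loss (in nats)."""
    return math.exp(eval_loss)


recomputed = {name: perplexity_from_loss(r["eval_loss"]) for name, r in results.items()}

for name, r in results.items():
    print(f"{name}: reported {r['perplexity']:.2f}, recomputed {recomputed[name]:.2f}")
```

The same relation explains why a loss gap of about 0.95 nats between the two datasets corresponds to perplexity on Wikitext2 being roughly 2.6 times higher.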