---
language:
- en
license: mit
library_name: transformers
pipeline_tag: text-generation
tags:
- pytorch
- research
- llama
---

# advanced-transformers-lib -- Llama 3 Baseline

A Llama 3-style decoder-only transformer architecture for research. No pretrained
weights -- pull the architecture from the Hub and instantiate a freshly initialised
model from config. Override any parameter at instantiation time.

> **Important:** `trust_remote_code=True` is required. It downloads the architecture
> source files from the Hub and imports them into your Python process. Review the
> source at [smithblack-0/llama3_baseline_dev](https://huggingface.co/smithblack-0/llama3_baseline_dev) before use.
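
Because the remote code is imported into your process, it can help to pin the download to a commit you have already reviewed. The sketch below is one possible way to do that, not part of this repository: `pinned_hub_kwargs` and `load_pinned_config` are hypothetical helper names, and the commit SHA is something you must supply yourself. It relies on the `revision` argument that `from_pretrained` accepts, and assumes the architecture code lives in the same repository as the config.

```python
import re

REPO_ID = "smithblack-0/llama3_baseline_dev"


def pinned_hub_kwargs(commit_sha: str) -> dict:
    """Build from_pretrained kwargs that pin the download to one commit.

    Hypothetical helper: rejects branch names such as "main" so that the
    remote code cannot change silently between runs.
    """
    if not re.fullmatch(r"[0-9a-f]{40}", commit_sha):
        raise ValueError("expected a full 40-character commit SHA")
    return {"trust_remote_code": True, "revision": commit_sha}


def load_pinned_config(commit_sha: str):
    """Load the architecture config at a reviewed commit (needs network access)."""
    # Imported lazily so the helper above works without transformers installed.
    from transformers import AutoConfig

    return AutoConfig.from_pretrained(REPO_ID, **pinned_hub_kwargs(commit_sha))
```

Call `load_pinned_config` with the SHA of the commit you reviewed on the Hub. The same keyword arguments can be passed to the tokenizer and model loaders.
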

## Usage

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

# Pull architecture config -- override any parameter at instantiation time
config = AutoConfig.from_pretrained(
    "smithblack-0/llama3_baseline_dev",
    trust_remote_code=True,
    num_hidden_layers=16,  # example override
)

# Instantiate with fresh random weights -- no checkpoint required
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("smithblack-0/llama3_baseline_dev")

# Save and reload after training
model.save_pretrained("./checkpoint")
model = AutoModelForCausalLM.from_pretrained("./checkpoint", trust_remote_code=True)
```
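
When overriding several attention parameters at once, a quick consistency check before building the config can catch mismatched values early. The following is a minimal sketch under stated assumptions: `check_attention_overrides` is a hypothetical helper, not part of this library, and the two constraints it enforces are the usual ones for grouped-query attention and rotary embeddings. Whether the library performs its own validation is not stated here.

```python
def check_attention_overrides(
    num_attention_heads: int,
    num_key_value_heads: int,
    head_dim: int,
) -> None:
    """Raise ValueError if the attention overrides are mutually inconsistent.

    Hypothetical helper illustrating two common constraints.
    """
    # Grouped-query attention shares each key/value head across an equal
    # number of query heads, so the head counts must divide evenly.
    if num_attention_heads % num_key_value_heads != 0:
        raise ValueError(
            "num_attention_heads must be a multiple of num_key_value_heads"
        )
    # Rotary embeddings rotate dimensions in pairs, so head_dim must be even.
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even for rotary position embeddings")


# The documented defaults pass the check: 16 query heads share 4 KV heads.
overrides = {"num_attention_heads": 16, "num_key_value_heads": 4, "head_dim": 48}
check_attention_overrides(**overrides)
```

Once the check passes, the same dictionary can be unpacked into `AutoConfig.from_pretrained(..., trust_remote_code=True, **overrides)`.
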

## Default Configuration

| Parameter | Default |
|-----------|---------|
| `vocab_size` | 50277 |
| `hidden_size` | 768 |
| `intermediate_size` | 1568 |
| `num_hidden_layers` | 24 |
| `num_attention_heads` | 16 |
| `num_key_value_heads` | 4 |
| `head_dim` | 48 |
| `max_position_embeddings` | 8192 |
| `rope_theta` | 500000.0 |
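
To get a rough sense of model size at these defaults, the parameter count can be estimated from the table. This is a back-of-the-envelope sketch, not the library's own accounting. It assumes a Llama-style layer layout: bias-free linear projections, a gated MLP with three projection matrices, two RMSNorm weight vectors per layer, and one final norm. Whether the input embedding and output head share weights is not stated here, so `tie_embeddings` is left as an explicit assumption. `estimate_params` is a hypothetical helper name.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "vocab_size": 50277,
    "hidden_size": 768,
    "intermediate_size": 1568,
    "num_hidden_layers": 24,
    "num_attention_heads": 16,
    "num_key_value_heads": 4,
    "head_dim": 48,
}


def estimate_params(cfg: dict, tie_embeddings: bool = False) -> int:
    """Estimate the parameter count under the assumptions listed above."""
    h = cfg["hidden_size"]
    q_out = cfg["num_attention_heads"] * cfg["head_dim"]   # query width
    kv_out = cfg["num_key_value_heads"] * cfg["head_dim"]  # key/value width

    attn = h * q_out + 2 * h * kv_out + q_out * h  # q, k, v, and output proj
    mlp = 3 * h * cfg["intermediate_size"]         # gate, up, and down proj
    norms = 2 * h                                  # two norm vectors per layer
    per_layer = attn + mlp + norms

    embed = cfg["vocab_size"] * h
    lm_head = 0 if tie_embeddings else embed       # assumption: see lead-in
    final_norm = h
    return embed + cfg["num_hidden_layers"] * per_layer + final_norm + lm_head


print(f"untied: {estimate_params(DEFAULTS):,}")
print(f"tied:   {estimate_params(DEFAULTS, tie_embeddings=True):,}")
```

To check the estimate against the real architecture, compare it with `sum(p.numel() for p in model.parameters())` on an instantiated model. A mismatch indicates that one of the assumptions above does not hold for this implementation.
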

## License

MIT. Clean-room synthesis: the human author has not read the Llama source code.
Architectural decisions derive from the published paper. Tokenizer is GPT-NeoX
(`EleutherAI/gpt-neox-20b`, Apache 2.0).