Instructions for using andrijdavid/Llama3-1B-Base with libraries, notebooks, and local apps.
- Libraries
- Transformers
How to use andrijdavid/Llama3-1B-Base with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="andrijdavid/Llama3-1B-Base")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("andrijdavid/Llama3-1B-Base")
model = AutoModelForCausalLM.from_pretrained("andrijdavid/Llama3-1B-Base")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use andrijdavid/Llama3-1B-Base with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "andrijdavid/Llama3-1B-Base"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "andrijdavid/Llama3-1B-Base",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker
```shell
# Run the vLLM OpenAI-compatible server in Docker:
docker run --runtime nvidia --gpus all \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
  -p 8000:8000 \
  --ipc=host \
  vllm/vllm-openai:latest \
  --model "andrijdavid/Llama3-1B-Base"
```
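However you start the server, the endpoint is OpenAI-compatible, so it can also be called from Python. A minimal sketch, assuming the `openai` package is installed (`pip install openai`) and the server is listening on its default port:

```python
from openai import OpenAI

# vLLM does not require a real API key by default; any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="andrijdavid/Llama3-1B-Base",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```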
- SGLang
How to use andrijdavid/Llama3-1B-Base with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "andrijdavid/Llama3-1B-Base" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "andrijdavid/Llama3-1B-Base",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "andrijdavid/Llama3-1B-Base" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "andrijdavid/Llama3-1B-Base",
    "messages": [
      { "role": "user", "content": "What is the capital of France?" }
    ]
  }'
```

- Docker Model Runner
How to use andrijdavid/Llama3-1B-Base with Docker Model Runner:
```shell
docker model run hf.co/andrijdavid/Llama3-1B-Base
```
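Docker Model Runner also exposes an OpenAI-compatible API, so the model can be called from Python as well. A minimal sketch, assuming host-side TCP access to Model Runner is enabled on its default port 12434; the host, port, and model identifier all depend on your Docker setup:

```python
from openai import OpenAI

# Assumption: Docker Model Runner's TCP endpoint is enabled on port 12434.
client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="docker")

response = client.chat.completions.create(
    model="hf.co/andrijdavid/Llama3-1B-Base",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```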
Llama-3-1B-Base
Llama3-1B is a trimmed version of Meta's official Llama-3 8B base model, reduced to roughly 1 billion parameters. The smaller size makes it far less demanding to run while retaining a significant portion of the original model's capabilities. It is a base model and has not been fine-tuned for any specific task. It is intended to bring large language models to environments with limited computational resources, offering a balance between performance and resource usage for users who cannot run Meta's larger, more resource-intensive models.
Important: This project is not affiliated with Meta.
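The card does not document how the 8B model was trimmed. Purely as an illustration of the general idea, the sketch below shrinks a Llama-3 checkpoint by keeping a subset of its decoder layers; the layer count and selection are hypothetical, not the recipe behind this model, and a checkpoint trimmed this way would normally need further pretraining to recover quality:

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

# Load the full Llama-3 8B base model (a gated repo; requires access).
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# Hypothetical trim: keep only the first `keep` decoder layers.
keep = 4
model.model.layers = nn.ModuleList(list(model.model.layers)[:keep])
model.config.num_hidden_layers = keep

# Save the trimmed checkpoint for further training.
model.save_pretrained("llama3-trimmed")
```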
Uses
This model can be fine-tuned for a variety of natural language processing tasks, including the following (a minimal fine-tuning sketch appears after the list):
- Text generation
- Question answering
- Sentiment analysis
- Translation
- Summarization
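As a concrete starting point, here is a minimal causal-LM fine-tuning sketch using the Transformers `Trainer`. The dataset and hyperparameters are placeholders chosen only to make the example self-contained, not a recommended recipe:

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "andrijdavid/Llama3-1B-Base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Placeholder dataset: a 1% slice of wikitext-2, purely for illustration.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama3-1b-finetuned",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```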
Bias, Risks, and Limitations
While Llama3-1B is a capable model, it is important to be aware of its limitations and potential biases. As with any language model, it may generate outputs that are factually incorrect or biased, and it may produce offensive or inappropriate content. Users and developers should be aware of these risks and take appropriate measures to mitigate them.
How to Use
To use Llama3-1B, load the model with the Hugging Face Transformers library in Python:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("andrijdavid/Llama3-1B-Base")
model = AutoModelForCausalLM.from_pretrained("andrijdavid/Llama3-1B-Base")
```
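Since this is a base (non-instruct) model, plain text completion is the natural interface. A minimal generation sketch, continuing from the loading snippet above (the prompt and generation settings are illustrative):

```python
import torch

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=40)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```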