Tiny Random LLaMA LoRA

A minimal LoRA adapter for hmellor/tiny-random-LlamaForCausalLM, useful for smoke testing deployments.

Model Details

Model Description

This is a LoRA (Low-Rank Adaptation) adapter trained on a tiny random LLaMA model. The model and adapter are intentionally small and produce random outputs—they are not meant for any real inference tasks. The primary purpose is to provide a lightweight adapter for testing deployment pipelines, inference servers, and LoRA loading mechanisms.

Uses

Direct Use

This adapter is intended for:

  • Smoke testing LoRA adapter loading in inference pipelines
  • Testing deployment configurations with minimal resource usage
  • Validating HuggingFace PEFT integration in your infrastructure
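
For example, a smoke test of the adapter loading path can be as small as the sketch below; the test function name and prompt are illustrative, and the assertion only checks shapes since outputs are meaningless by design:

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

def test_lora_adapter_loads_and_generates():
    # Load the tiny base model and attach this adapter.
    base = AutoModelForCausalLM.from_pretrained("hmellor/tiny-random-LlamaForCausalLM")
    model = PeftModel.from_pretrained(base, "syaffers/tiny-random-llama-lora")
    tokenizer = AutoTokenizer.from_pretrained("syaffers/tiny-random-llama-lora")

    # Run a tiny generation; content is random, so only check that new tokens came back.
    inputs = tokenizer("ping", return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=5)
    assert outputs.shape[1] > inputs["input_ids"].shape[1]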

Out-of-Scope Use

This model should not be used for:

  • Any real text generation or NLP tasks
  • Production applications
  • Any use case requiring meaningful outputs

How to Get Started with the Model

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the tiny random base model and attach the LoRA adapter
base_model = AutoModelForCausalLM.from_pretrained("hmellor/tiny-random-LlamaForCausalLM")
model = PeftModel.from_pretrained(base_model, "syaffers/tiny-random-llama-lora")
tokenizer = AutoTokenizer.from_pretrained("syaffers/tiny-random-llama-lora")

# Generate (output will be random/meaningless)
inputs = tokenizer("Hello world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0]))

Training Details

Training Data

iamholmes/tiny-imdb - A tiny subset of IMDB reviews used for quick training iterations.
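
For a similarly quick iteration, the dataset can be loaded with the Hugging Face datasets library as sketched below; the split name ("train") is an assumption to verify against the dataset card:

from datasets import load_dataset

# Load the tiny IMDB subset used for the quick training run.
dataset = load_dataset("iamholmes/tiny-imdb", split="train")
print(dataset)  # inspect the available columns before tokenizing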

Training Procedure

Training Hyperparameters

  • Training regime: fp32
  • Batch size: 4
  • Learning rate: 1e-4
  • Epochs: 3
  • Warmup steps: 10
  • Max sequence length: 128
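
These settings map roughly onto Hugging Face TrainingArguments as sketched below; the exact training script is not part of this repository, and output_dir is illustrative:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="tiny-random-llama-lora",  # illustrative
    per_device_train_batch_size=4,
    learning_rate=1e-4,
    num_train_epochs=3,
    warmup_steps=10,
    fp16=False,  # fp32 training regime
    bf16=False,
)
# The max sequence length (128) is applied during tokenization, not here.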

LoRA Configuration

Parameter        Value
r (rank)         8
lora_alpha       16
target_modules   q_proj, v_proj
lora_dropout     0.05
bias             none
task_type        CAUSAL_LM
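
In PEFT terms, this table corresponds to a LoraConfig along the following lines (a sketch for reference; the adapter's own adapter_config.json is authoritative):

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)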

Technical Specifications

Model Architecture and Objective

LoRA adapter applied to the query and value projection layers of a tiny random LLaMA architecture for causal language modeling.
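
To verify this after loading the adapter, you can list the injected LoRA modules; a minimal sketch reusing the loading code from above:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("hmellor/tiny-random-LlamaForCausalLM")
model = PeftModel.from_pretrained(base_model, "syaffers/tiny-random-llama-lora")

# Print the injected LoRA layers (expect lora_A/lora_B pairs on q_proj and v_proj).
for name, _ in model.named_modules():
    if "lora_A" in name or "lora_B" in name:
        print(name)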

Compute Infrastructure

Hardware

  • Apple M3 Pro (36GB unified memory)
  • macOS Sequoia 15.6.1

Software

  • Transformers
  • PEFT 0.18.0
  • PyTorch
  • Datasets

Framework versions

  • PEFT 0.18.0