Qwen2-0.5B Reddit LoRA Adapter

Repo: iko-01/LLaMA-1
Base model: Qwen/Qwen2-0.5B
Adapter type: LoRA (via LLaMA-Factory + QLoRA)
Intended use: Simulating casual, Reddit-style comments, discussions, and thread replies

Model Description

This is a LoRA adapter fine-tuned on top of Qwen2-0.5B using a filtered subset of Reddit posts & comments from the Dolma dataset (v1.6 Reddit portion).

The model is trained to generate informal, conversational text typical of Reddit threads, including sarcasm, meme references, casual opinions, the general tone of upvoted comments, and natural thread continuations.

Despite the repository name (LLaMA-1), this is not a LLaMA model; it is purely the Qwen2 architecture.

Key Characteristics

  • Extremely lightweight (only ~0.5B base + small LoRA adapter)
  • Runs comfortably on consumer GPUs, laptops, or even decent CPUs
  • Fast inference (very suitable for local prototyping, chatbots, Reddit simulators, etc.)
  • Casual / internet / meme-friendly tone
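As a rough sanity check on the "lightweight" claim, the fp16 footprint of the base weights can be estimated directly. This is a back-of-envelope sketch; the 0.5B figure is the nominal parameter count, and activations, KV cache, and the small adapter add on top of it:

```python
# Back-of-envelope VRAM estimate for the base model in fp16.
# Each fp16 parameter takes 2 bytes; the LoRA adapter adds only a few MB.
params = 0.5e9                   # nominal parameter count of Qwen2-0.5B
fp16_bytes = params * 2          # bytes at half precision
fp16_gib = fp16_bytes / 1024**3
print(f"{fp16_gib:.2f} GiB")     # under 1 GiB before activations / KV cache
```

With 4-bit quantization (as used for QLoRA training) this drops to roughly a quarter, which is why the model fits comfortably on laptops and free-tier GPUs.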

Training Details

  • Framework: LLaMA-Factory

  • Training method: QLoRA (4-bit base quantization + LoRA)

  • Dataset size: ~6,000 high-quality, deduplicated Reddit samples

  • Hardware: Google Colab T4 (single GPU)

  • Training duration: ≈30 minutes

  • Hyperparameters:

    Parameter              Value
    ---------------------  -----
    LoRA rank (r)          32
    LoRA alpha             64
    Learning rate          2e-4
    Batch size             2
    Gradient accumulation  16
    Epochs                 3
    Optimizer              AdamW
    Warmup ratio           0.03
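Two quick sanity checks on the table above: the effective batch size is the per-device batch multiplied by the accumulation steps, and the trainable parameter count per adapted square projection follows from the rank. The hidden size of 896 is taken from the published Qwen2-0.5B config and is an assumption here, not something stated in this card:

```python
# Effective batch size = per-device batch * gradient accumulation steps
batch_size = 2
grad_accum = 16
effective_batch = batch_size * grad_accum
print(effective_batch)  # 32

# LoRA adds two low-rank matrices per adapted linear layer:
# A is (d, r) and B is (r, d), so 2 * d * r trainable weights
# for a square d x d projection. d = 896 assumes the Qwen2-0.5B config.
r = 32
hidden = 896
params_per_square_proj = 2 * hidden * r
print(params_per_square_proj)  # 57344
</antml>```

Even across all attention and MLP projections, the adapter stays in the low millions of parameters, consistent with the ~30-minute training run on a single T4.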

Usage

pip install -U transformers peft torch accelerate bitsandbytes  # bitsandbytes optional but recommended

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen2-0.5B"
adapter_id    = "iko-01/LLaMA-1"

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(base_model_id, trust_remote_code=True)

# Example prompt
prompt = """Continue this r/AskReddit thread:

After playing for 50 hours I finally"""

# The Qwen2 tokenizer ships a chat template; for a base (non-instruct) model
# you can also skip this and tokenize the raw prompt directly.
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=120,
        temperature=0.75,
        top_p=0.92,
        repetition_penalty=1.08,
        do_sample=True
    )

response = tokenizer.decode(out[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
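The slice in the final decode step drops the prompt tokens so only the model's continuation is returned. A minimal illustration of that indexing with made-up token ids:

```python
# generate() returns the prompt tokens followed by the new tokens,
# so slicing off the prompt length isolates the continuation.
prompt_ids = [101, 42, 7, 9]               # hypothetical input token ids
full_output = [101, 42, 7, 9, 55, 88, 3]   # hypothetical generate() output

new_tokens = full_output[len(prompt_ids):]
print(new_tokens)  # [55, 88, 3]
```

Decoding the full sequence instead would echo the prompt back at the start of `response`, which is usually not what you want in a chat loop.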

Example Outputs

Prompt:
Complete this r/gaming discussion: After playing for 50 hours I finally

Typical model output:
...realized the main story is mid but the side content is actually peak. The open world exploration in the frozen north hits different. Spent like 6 hours just fishing and upgrading my house and I don't even feel bad about it lmao. Anyone else 100% the fishing minigame before the final boss?

Limitations & Responsible Use

  • Model size: being a 0.5B model, it has limited world knowledge, reasoning depth, and coherence over very long contexts compared to 7B+ models.
  • Reddit bias: the training data comes from Reddit, so expect informal language, slang, sarcasm, exaggeration, memes, controversial hot-take opinions, and sometimes toxic phrasing.
  • Hallucinations: can confidently generate plausible but incorrect facts, especially outside popular Reddit topics.
  • Not for production / sensitive use: not suitable for factual Q&A, customer support, education, legal/medical advice, or any high-stakes application.
  • English only: the fine-tune was done exclusively on English Reddit content.

Use this model mainly for creative, entertainment, or research purposes (e.g. generating synthetic discussion data, building Reddit-style bots, style transfer experiments).

Citation / Thanks

If you use this adapter in your work, feel free to mention:

Fine-tuned with LLaMA-Factory on Qwen2-0.5B using Reddit data from Dolma.

Big thanks to the Qwen team, LLaMA-Factory contributors, and AllenAI (Dolma dataset).

Happy hacking! 🚀
