
TinyClaude-1B

A lightweight, locally runnable language model based on TinyLlama 1.1B, enhanced with a system prompt inspired by Claude's behavioral guidelines.

Overview

TinyClaude-1B brings thoughtful AI assistant behavior to edge devices and resource-constrained environments. Built on the efficient TinyLlama architecture, this model incorporates carefully crafted system instructions emphasizing helpfulness, safety, and nuanced conversation.

Quick Start

# Pull the model
ollama pull thatdamai/tinyclaude-1b

# Run interactively
ollama run thatdamai/tinyclaude-1b

Features

  • Compact Size: ~638MB download, runs on minimal hardware
  • Privacy-First: Fully local inference, no API calls required
  • Balanced Responses: System prompt encourages helpful, safe, and thoughtful outputs
  • Low Resource Requirements: Runs on CPUs and entry-level GPUs

Hardware Requirements

| Component | Minimum | Recommended |
| --------- | ------- | ----------- |
| RAM       | 4GB     | 8GB         |
| VRAM      | 2GB     | 4GB         |
| Storage   | 1GB     | 2GB         |

Usage Examples

Basic Chat

ollama run thatdamai/tinyclaude-1b

API Integration

curl http://localhost:11434/api/generate -d '{
  "model": "thatdamai/tinyclaude-1b",
  "prompt": "Explain quantum computing simply.",
  "stream": false
}'

Python

import requests

response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'thatdamai/tinyclaude-1b',
    'prompt': 'What is machine learning?',
    'stream': False
})

print(response.json()['response'])
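The example above waits for the full completion. For longer generations, setting `"stream": True` makes Ollama return one JSON object per line as tokens arrive, with `"done": true` on the final object. A minimal sketch (the helper names here are ours, not part of the Ollama API):

```python
import json

def collect_stream(lines):
    """Join the 'response' fragments from Ollama's line-delimited JSON stream."""
    chunks = []
    for line in lines:
        if not line:  # skip keep-alive blank lines
            continue
        data = json.loads(line)
        chunks.append(data.get("response", ""))
        if data.get("done"):
            break
    return "".join(chunks)

def stream_generate(prompt, model="thatdamai/tinyclaude-1b",
                    url="http://localhost:11434/api/generate"):
    """Stream a completion from a locally running Ollama server."""
    import requests  # deferred import so collect_stream stays dependency-free
    resp = requests.post(url,
                         json={"model": model, "prompt": prompt, "stream": True},
                         stream=True)
    resp.raise_for_status()
    return collect_stream(resp.iter_lines(decode_unicode=True))
```

With the server running, `print(stream_generate("What is machine learning?"))` prints the assembled response.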

With Open WebUI / LibreChat

After pulling the model, select thatdamai/tinyclaude-1b from the model dropdown in either interface.
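These front-ends talk to the local Ollama server for you. If you want to point some other OpenAI-style client at the same model, Ollama also exposes an OpenAI-compatible endpoint at /v1/chat/completions; a minimal sketch (the helper names are ours):

```python
def chat_payload(prompt, model="thatdamai/tinyclaude-1b"):
    """Build an OpenAI-style chat completion request body."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

def chat(prompt, base_url="http://localhost:11434/v1"):
    """POST to Ollama's OpenAI-compatible endpoint and return the reply text."""
    import requests  # assumes requests is installed
    r = requests.post(f"{base_url}/chat/completions", json=chat_payload(prompt))
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```

Any client that accepts a custom OpenAI base URL can use the same endpoint directly.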

Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Define the TinyClaude system prompt
system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

# Format with chat template
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain quantum computing simply."}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Hugging Face with llama-cpp-python

from llama_cpp import Llama

# Download GGUF from Hugging Face Hub
llm = Llama.from_pretrained(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1  # Use all GPU layers
)

system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=512
)

print(output['choices'][0]['message']['content'])

Hugging Face CLI

# Install huggingface_hub
pip install huggingface_hub

# Download model files
huggingface-cli download TinyLlama/TinyLlama-1.1B-Chat-v1.0 --local-dir ./tinyllama

# Download GGUF quantized version
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --local-dir ./tinyllama-gguf

Text Generation Inference (TGI)

# Run with Docker
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --max-input-length 1024 \
  --max-total-tokens 2048

# Query the endpoint
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nHello!</s>\n<|assistant|>\n", "parameters": {"max_new_tokens": 256}}'

Model Details

| Property       | Value          |
| -------------- | -------------- |
| Base Model     | TinyLlama 1.1B |
| Parameters     | 1.1 billion    |
| Context Window | 2048 tokens    |
| License        | Apache 2.0     |
| Quantization   | Q4_0 (default) |

Use Cases

TinyClaude-1B is well-suited for:

  • Quick local prototyping and testing
  • Educational environments
  • IoT and edge deployments
  • Offline assistant applications
  • Low-latency response requirements
  • Development and CI/CD pipelines

Limitations

As a 1.1B parameter model, TinyClaude-1B has inherent limitations:

  • Complex reasoning tasks may produce inconsistent results
  • Limited knowledge compared to larger models
  • May not fully adhere to all system prompt guidelines
  • Context window constrains long-form conversations
  • Not suitable for production applications requiring high accuracy

For demanding tasks, consider larger models like Llama 3.1 8B, Mistral 7B, or Qwen 14B.

Building From Source

Create your own variant:

# Create a Modelfile
cat << 'EOF' > Modelfile
FROM tinyllama

SYSTEM """
Your custom system prompt here.
"""

PARAMETER temperature 0.7
PARAMETER num_ctx 2048
EOF

# Build the model
ollama create my-tinyclaude -f Modelfile

# Test it
ollama run my-tinyclaude

Hugging Face Integration

Uploading to Hugging Face Hub

# Install required tools
pip install huggingface_hub

# Login to Hugging Face
huggingface-cli login

# Create a new model repository
huggingface-cli repo create tinyclaude-1b --type model

# Upload model files
huggingface-cli upload thatdamai/tinyclaude-1b ./model-files --repo-type model

Converting Ollama to GGUF for Hugging Face

# Find your Ollama model location
ollama show thatdamai/tinyclaude-1b --modelfile

# Models are stored in ~/.ollama/models or /usr/share/ollama/.ollama/models
# Copy the blob files and upload to HF

# Alternative: Use ollama's model export (if available)
cp /usr/share/ollama/.ollama/models/blobs/<sha256-hash> ./tinyclaude.gguf
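Rather than guessing which blob holds the weights, you can read the JSON manifest Ollama writes for each model tag (under the manifests directory next to blobs): the layer whose mediaType is application/vnd.ollama.image.model points at the GGUF blob. A sketch under those assumptions (the helper names and default path are ours):

```python
import json
from pathlib import Path

# Default per-user store on Linux/macOS; system installs may use
# /usr/share/ollama/.ollama/models instead.
MODELS_DIR = Path.home() / ".ollama" / "models"

def model_blob_digest(manifest: dict) -> str:
    """Return the digest of the GGUF weights layer from an Ollama manifest."""
    for layer in manifest.get("layers", []):
        if layer.get("mediaType") == "application/vnd.ollama.image.model":
            return layer["digest"]  # e.g. "sha256:ab12..."
    raise ValueError("no model layer found in manifest")

def blob_path(digest: str, models_dir: Path = MODELS_DIR) -> Path:
    """Blob files are stored with the ':' in the digest replaced by '-'."""
    return models_dir / "blobs" / digest.replace(":", "-")
```

Load the manifest with `json.loads(path.read_text())`, then copy `blob_path(digest)` to a `.gguf` file for upload.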

Creating a Hugging Face Model Card

Create a README.md in your HF repo with YAML frontmatter:

---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
  - tinyllama
  - gguf
  - ollama
  - assistant
  - conversational
model_type: llama
pipeline_tag: text-generation
inference: false
---

Downloading from Hugging Face to Ollama

# Method 1: Create Modelfile pointing to HF GGUF
cat << 'EOF' > Modelfile
FROM hf.co/thatdamai/tinyclaude-1b-gguf
EOF

ollama create tinyclaude-local -f Modelfile

# Method 2: Download GGUF first, then import
huggingface-cli download thatdamai/tinyclaude-1b-gguf tinyclaude-1b.Q4_K_M.gguf --local-dir ./

cat << EOF > Modelfile
FROM ./tinyclaude-1b.Q4_K_M.gguf
EOF

ollama create tinyclaude-local -f Modelfile

Contributing

Suggestions and improvements are welcome. Feel free to:

  • Open issues for bugs or feature requests
  • Submit pull requests with improvements
  • Share your custom Modelfile variants

Acknowledgments

  • TinyLlama - Base model architecture
  • Ollama - Local model serving platform
  • Anthropic - Inspiration for behavioral guidelines

License

This model inherits the Apache 2.0 license from TinyLlama. The system prompt and configuration are provided as-is for educational and personal use.


Author: thatdamai
Model: thatdamai/tinyclaude-1b
Platform: Ollama