
TinyClaude-1B

A lightweight, locally runnable language model based on TinyLlama 1.1B, enhanced with a system prompt inspired by Claude's behavioral guidelines.

Overview

TinyClaude-1B brings thoughtful AI assistant behavior to edge devices and resource-constrained environments. Built on the efficient TinyLlama architecture, this model incorporates carefully crafted system instructions emphasizing helpfulness, safety, and nuanced conversation.

Quick Start

# Pull the model
ollama pull thatdamai/tinyclaude-1b

# Run interactively
ollama run thatdamai/tinyclaude-1b

Features

  • Compact Size: ~638MB download, runs on minimal hardware
  • Privacy-First: Fully local inference, no API calls required
  • Balanced Responses: System prompt encourages helpful, safe, and thoughtful outputs
  • Low Resource Requirements: Runs on CPUs and entry-level GPUs

Hardware Requirements

| Component | Minimum | Recommended |
| --------- | ------- | ----------- |
| RAM       | 4GB     | 8GB         |
| VRAM      | 2GB     | 4GB         |
| Storage   | 1GB     | 2GB         |

Usage Examples

Basic Chat

ollama run thatdamai/tinyclaude-1b

API Integration

curl http://localhost:11434/api/generate -d '{
  "model": "thatdamai/tinyclaude-1b",
  "prompt": "Explain quantum computing simply.",
  "stream": false
}'

Python

import requests

response = requests.post('http://localhost:11434/api/generate', json={
    'model': 'thatdamai/tinyclaude-1b',
    'prompt': 'What is machine learning?',
    'stream': False
})

print(response.json()['response'])
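The example above waits for the full completion. For longer generations, setting `"stream": True` makes Ollama return one JSON object per line as tokens arrive, with `"done": true` on the final object. A minimal sketch (the helper names here are ours, not part of the Ollama API):

```python
import json

def collect_stream(lines):
    """Join the 'response' fragments from Ollama's line-delimited JSON stream."""
    chunks = []
    for line in lines:
        if not line:  # skip keep-alive blank lines
            continue
        data = json.loads(line)
        chunks.append(data.get("response", ""))
        if data.get("done"):
            break
    return "".join(chunks)

def stream_generate(prompt, model="thatdamai/tinyclaude-1b",
                    url="http://localhost:11434/api/generate"):
    """Stream a completion from a locally running Ollama server."""
    import requests  # deferred import so collect_stream stays dependency-free
    resp = requests.post(url,
                         json={"model": model, "prompt": prompt, "stream": True},
                         stream=True)
    resp.raise_for_status()
    return collect_stream(resp.iter_lines(decode_unicode=True))
```

With the server running, `print(stream_generate("What is machine learning?"))` prints the assembled response.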

With Open WebUI / LibreChat

After pulling the model, select thatdamai/tinyclaude-1b from the model dropdown in either interface.
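These front-ends talk to the local Ollama server for you. If you want to point some other OpenAI-style client at the same model, Ollama also exposes an OpenAI-compatible endpoint at /v1/chat/completions; a minimal sketch (the helper names are ours):

```python
def chat_payload(prompt, model="thatdamai/tinyclaude-1b"):
    """Build an OpenAI-style chat completion request body."""
    return {"model": model,
            "messages": [{"role": "user", "content": prompt}]}

def chat(prompt, base_url="http://localhost:11434/v1"):
    """POST to Ollama's OpenAI-compatible endpoint and return the reply text."""
    import requests  # assumes requests is installed
    r = requests.post(f"{base_url}/chat/completions", json=chat_payload(prompt))
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]
```

Any client that accepts a custom OpenAI base URL can use the same endpoint directly.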

Hugging Face Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Define the TinyClaude system prompt
system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

# Format with chat template
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain quantum computing simply."}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate response
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

Hugging Face with llama-cpp-python

from llama_cpp import Llama

# Download GGUF from Hugging Face Hub
llm = Llama.from_pretrained(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1  # Use all GPU layers
)

system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=512
)

print(output['choices'][0]['message']['content'])

Hugging Face CLI

# Install huggingface_hub
pip install huggingface_hub

# Download model files
huggingface-cli download TinyLlama/TinyLlama-1.1B-Chat-v1.0 --local-dir ./tinyllama

# Download GGUF quantized version
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --local-dir ./tinyllama-gguf

Text Generation Inference (TGI)

# Run with Docker
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --max-input-length 1024 \
  --max-total-tokens 2048

# Query the endpoint
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nHello!</s>\n<|assistant|>\n", "parameters": {"max_new_tokens": 256}}'

Model Details

| Property       | Value          |
| -------------- | -------------- |
| Base Model     | TinyLlama 1.1B |
| Parameters     | 1.1 billion    |
| Context Window | 2048 tokens    |
| License        | Apache 2.0     |
| Quantization   | Q4_0 (default) |

Use Cases

TinyClaude-1B is well-suited for:

  • Quick local prototyping and testing
  • Educational environments
  • IoT and edge deployments
  • Offline assistant applications
  • Low-latency response requirements
  • Development and CI/CD pipelines

Limitations

As a 1.1B parameter model, TinyClaude-1B has inherent limitations:

  • Complex reasoning tasks may produce inconsistent results
  • Limited knowledge compared to larger models
  • May not fully adhere to all system prompt guidelines
  • Context window constrains long-form conversations
  • Not suitable for production applications requiring high accuracy

For demanding tasks, consider larger models like Llama 3.1 8B, Mistral 7B, or Qwen 14B.

Building From Source

Create your own variant:

# Create a Modelfile
cat << 'EOF' > Modelfile
FROM tinyllama

SYSTEM """
Your custom system prompt here.
"""

PARAMETER temperature 0.7
PARAMETER num_ctx 2048
EOF

# Build the model
ollama create my-tinyclaude -f Modelfile

# Test it
ollama run my-tinyclaude

Hugging Face Integration

Uploading to Hugging Face Hub

# Install required tools
pip install huggingface_hub

# Login to Hugging Face
huggingface-cli login

# Create a new model repository
huggingface-cli repo create tinyclaude-1b --type model

# Upload model files
huggingface-cli upload thatdamai/tinyclaude-1b ./model-files --repo-type model

Converting Ollama to GGUF for Hugging Face

# Find your Ollama model location
ollama show thatdamai/tinyclaude-1b --modelfile

# Models are stored in ~/.ollama/models or /usr/share/ollama/.ollama/models
# Copy the blob files and upload to HF

# Alternative: Use ollama's model export (if available)
cp /usr/share/ollama/.ollama/models/blobs/<sha256-hash> ./tinyclaude.gguf
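Rather than guessing which blob holds the weights, you can read the JSON manifest Ollama writes for each model tag (under the manifests directory next to blobs): the layer whose mediaType is application/vnd.ollama.image.model points at the GGUF blob. A sketch under those assumptions (the helper names and default path are ours):

```python
import json
from pathlib import Path

# Default per-user store on Linux/macOS; system installs may use
# /usr/share/ollama/.ollama/models instead.
MODELS_DIR = Path.home() / ".ollama" / "models"

def model_blob_digest(manifest: dict) -> str:
    """Return the digest of the GGUF weights layer from an Ollama manifest."""
    for layer in manifest.get("layers", []):
        if layer.get("mediaType") == "application/vnd.ollama.image.model":
            return layer["digest"]  # e.g. "sha256:ab12..."
    raise ValueError("no model layer found in manifest")

def blob_path(digest: str, models_dir: Path = MODELS_DIR) -> Path:
    """Blob files are stored with the ':' in the digest replaced by '-'."""
    return models_dir / "blobs" / digest.replace(":", "-")
```

Load the manifest with `json.loads(path.read_text())`, then copy `blob_path(digest)` to a `.gguf` file for upload.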

Creating a Hugging Face Model Card

Create a README.md in your HF repo with YAML frontmatter:

---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
  - tinyllama
  - gguf
  - ollama
  - assistant
  - conversational
model_type: llama
pipeline_tag: text-generation
inference: false
---

Downloading from Hugging Face to Ollama

# Method 1: Create Modelfile pointing to HF GGUF
cat << 'EOF' > Modelfile
FROM hf.co/thatdamai/tinyclaude-1b-gguf
EOF

ollama create tinyclaude-local -f Modelfile

# Method 2: Download GGUF first, then import
huggingface-cli download thatdamai/tinyclaude-1b-gguf tinyclaude-1b.Q4_K_M.gguf --local-dir ./

cat << EOF > Modelfile
FROM ./tinyclaude-1b.Q4_K_M.gguf
EOF

ollama create tinyclaude-local -f Modelfile

Contributing

Suggestions and improvements are welcome. Feel free to:

  • Open issues for bugs or feature requests
  • Submit pull requests with improvements
  • Share your custom Modelfile variants

Acknowledgments

  • TinyLlama - Base model architecture
  • Ollama - Local model serving platform
  • Anthropic - Inspiration for behavioral guidelines

License

This model inherits the Apache 2.0 license from TinyLlama. The system prompt and configuration are provided as-is for educational and personal use.


Author: thatdamai
Model: thatdamai/tinyclaude-1b
Platform: Ollama