Update README.md

README.md

@@ -67,6 +67,91 @@ print(response.json()['response'])

Simply select `thatdamai/tinyclaude-1b` from the model dropdown after pulling.
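
For reference, pulling the model first looks like this (assuming the published tag matches the name above):

```bash
ollama pull thatdamai/tinyclaude-1b
```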

### Hugging Face Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Define the TinyClaude system prompt
system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

# Format with the chat template
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Explain quantum computing simply."}
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate, then decode only the newly generated tokens so the echoed prompt is not printed
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
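
Equivalently, recent `transformers` releases let the high-level `pipeline` API accept chat messages directly; a minimal sketch of the same call:

```python
from transformers import pipeline

# High-level wrapper around the tokenize/generate/decode steps above
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", device_map="auto")

messages = [
    {"role": "system", "content": "You are a helpful, harmless, and honest AI assistant..."},
    {"role": "user", "content": "Explain quantum computing simply."},
]

out = pipe(messages, max_new_tokens=512, do_sample=True, temperature=0.7)
print(out[0]["generated_text"][-1]["content"])  # last message is the assistant reply
```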

### Hugging Face with llama-cpp-python

```python
from llama_cpp import Llama

# Download GGUF from Hugging Face Hub
llm = Llama.from_pretrained(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
    n_ctx=2048,
    n_gpu_layers=-1  # Use all GPU layers
)

system_prompt = """You are a helpful, harmless, and honest AI assistant..."""

output = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "What is machine learning?"}
    ],
    temperature=0.7,
    max_tokens=512
)

print(output['choices'][0]['message']['content'])
```
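
For interactive use, `create_chat_completion` also supports streaming; a small sketch that prints tokens as they arrive:

```python
# stream=True yields OpenAI-style chunks instead of one final dict
for chunk in llm.create_chat_completion(
    messages=[{"role": "user", "content": "What is machine learning?"}],
    temperature=0.7,
    max_tokens=512,
    stream=True,
):
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="", flush=True)
print()
```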

### Hugging Face CLI

```bash
# Install huggingface_hub
pip install huggingface_hub

# Download model files
huggingface-cli download TinyLlama/TinyLlama-1.1B-Chat-v1.0 --local-dir ./tinyllama

# Download GGUF quantized version
huggingface-cli download TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf --local-dir ./tinyllama-gguf
```
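
The same downloads can be scripted with the `huggingface_hub` Python API; a minimal sketch:

```python
from huggingface_hub import hf_hub_download

# Fetch a single GGUF file; returns the local path in the HF cache
path = hf_hub_download(
    repo_id="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
    filename="tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf",
)
print(path)
```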

### Text Generation Inference (TGI)

```bash
# Run with Docker
docker run --gpus all --shm-size 1g -p 8080:80 \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
  --max-input-length 1024 \
  --max-total-tokens 2048

# Query the endpoint
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nHello!</s>\n<|assistant|>\n", "parameters": {"max_new_tokens": 256}}'
```
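
The same `/generate` route can be called from Python; a minimal sketch using `requests` with the prompt format from the curl example:

```python
import requests

# TinyLlama's chat format, as in the curl example above
prompt = "<|system|>\nYou are a helpful assistant.</s>\n<|user|>\nHello!</s>\n<|assistant|>\n"

resp = requests.post(
    "http://localhost:8080/generate",
    json={"inputs": prompt, "parameters": {"max_new_tokens": 256}},
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```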

## Model Details

| Property | Value |

@@ -124,6 +209,77 @@ ollama create my-tinyclaude -f Modelfile

```bash
ollama run my-tinyclaude
```

## Hugging Face Integration

### Uploading to Hugging Face Hub

```bash
# Install required tools
pip install huggingface_hub

# Login to Hugging Face
huggingface-cli login

# Create a new model repository
huggingface-cli repo create tinyclaude-1b --type model

# Upload model files
huggingface-cli upload thatdamai/tinyclaude-1b ./model-files --repo-type model
```
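
The upload can also be scripted; a minimal sketch with the `huggingface_hub` Python API, assuming the files live in `./model-files`:

```python
from huggingface_hub import HfApi

api = HfApi()
# Create the repo if it doesn't exist yet, then upload the folder contents
api.create_repo("thatdamai/tinyclaude-1b", repo_type="model", exist_ok=True)
api.upload_folder(
    folder_path="./model-files",
    repo_id="thatdamai/tinyclaude-1b",
    repo_type="model",
)
```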

### Converting Ollama to GGUF for Hugging Face

```bash
# Find your Ollama model location
ollama show thatdamai/tinyclaude-1b --modelfile

# Models are stored in ~/.ollama/models or /usr/share/ollama/.ollama/models
# Copy the blob files and upload to HF

# Alternative: Use ollama's model export (if available)
cp /usr/share/ollama/.ollama/models/blobs/<sha256-hash> ./tinyclaude.gguf
```
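
To avoid hunting for the hash by hand, the blob path can often be read from the `FROM` line that `ollama show --modelfile` prints; a hedged sketch (the output format varies between Ollama versions, so check it first):

```bash
# Pull the blob path out of the generated Modelfile and copy it as a GGUF
BLOB=$(ollama show thatdamai/tinyclaude-1b --modelfile | awk '/^FROM / {print $2}')
cp "$BLOB" ./tinyclaude.gguf
```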

### Creating a Hugging Face Model Card

Create a `README.md` in your HF repo with YAML frontmatter:

```yaml
---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
tags:
- tinyllama
- gguf
- ollama
- assistant
- conversational
model_type: llama
pipeline_tag: text-generation
inference: false
---
```
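
If you prefer to push the card programmatically, `huggingface_hub` ships a `ModelCard` helper; a minimal sketch with an abbreviated card body:

```python
from huggingface_hub import ModelCard

# Frontmatter plus Markdown body, as in the snippet above (abbreviated here)
content = """---
license: apache-2.0
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
pipeline_tag: text-generation
---
# TinyClaude 1B
"""

card = ModelCard(content)
card.push_to_hub("thatdamai/tinyclaude-1b")
```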

### Downloading from Hugging Face to Ollama

```bash
# Method 1: Create Modelfile pointing to HF GGUF
cat << 'EOF' > Modelfile
FROM hf.co/thatdamai/tinyclaude-1b-gguf
EOF

ollama create tinyclaude-local -f Modelfile

# Method 2: Download GGUF first, then import
huggingface-cli download thatdamai/tinyclaude-1b-gguf tinyclaude-1b.Q4_K_M.gguf --local-dir ./

cat << EOF > Modelfile
FROM ./tinyclaude-1b.Q4_K_M.gguf
EOF

ollama create tinyclaude-local -f Modelfile
```
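
Either way, the imported model then runs like any local Ollama model:

```bash
ollama run tinyclaude-local "Introduce yourself in one sentence."
```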

## Contributing

Suggestions and improvements are welcome. Feel free to:

@@ -144,6 +300,6 @@ This model inherits the Apache 2.0 license from TinyLlama. The system prompt and

---

**Author**: thatdamai
**Model**: thatdamai/tinyclaude-1b
**Platform**: [Ollama](https://ollama.ai/thatdamai/tinyclaude-1b)