Commit a4264be by finnvoorhees (verified) · 1 Parent(s): df34631

Upload README.md

Files changed (1): README.md +105 -0
README.md ADDED
@@ -0,0 +1,105 @@
+ # Tiny Coder Prompt Completion 0.5B
+
+ A tiny model (~494M parameters, ~942MB of weights) for **prompt autocomplete** in agentic coding editors such as Claude Code, Codex CLI, and aider.
+
+ ## What it does
+
+ When you're typing a prompt to an AI coding agent, this model suggests how to **complete your thought**. For example:
+
+ - `fix the bug in src/utils` → `.py where the auth token isn't refreshing`
+ - `refactor the database` → `connection logic to use connection pooling`
+ - `implement caching for the` → `user profile endpoint using Redis with a 5-minute TTL`
+
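The examples above imply a small detail for anyone wiring the model into an editor: some suggestions continue the last word (`src/utils` + `.py`), while others start a new word and need a separating space. A minimal sketch of one way to splice a suggestion onto the typed text, assuming a hypothetical helper `join_completion` and a punctuation-based heuristic, neither of which comes from this repo:

```python
import string


def join_completion(typed: str, suggestion: str) -> str:
    """Append a suggestion to the typed prompt (hypothetical helper, not from this repo).

    Assumed heuristic: a suggestion that starts with punctuation continues the
    current word; anything else starts a new word and needs one space.
    """
    if not suggestion:
        return typed
    continues_word = suggestion[0] in string.punctuation
    if continues_word or not typed or typed.endswith((" ", "\n")):
        return typed + suggestion
    return typed + " " + suggestion


print(join_completion("fix the bug in src/utils", ".py where the auth token isn't refreshing"))
print(join_completion("refactor the database", "connection logic to use connection pooling"))
```

The heuristic is deliberately simple; a real integration may prefer to have the model emit the leading space itself.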
+ ## Base Model
+
+ This is a fine-tuned version of [Qwen2.5-Coder-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-0.5B-Instruct), the instruction-tuned 0.5B model in the code-focused Qwen2.5-Coder family.
+
+ ## On-Device Requirements
+
+ - **RAM**: ~1GB for the weights at 16-bit precision (fp16/bf16), plus a small amount for activations and the KV cache (fits comfortably on a 16GB MacBook)
+ - **Disk**: ~1GB for model weights
+ - **CPU/GPU**: Works well on CPU (for example M1/M2/M3 MacBooks), and faster with a GPU
+
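The size figures above follow from simple arithmetic. A rough sketch, assuming 16-bit weights (2 bytes per parameter) and ignoring activations, the KV cache, and framework overhead; the 494M count is the README's own approximate figure, and `weight_size_bytes` is a name made up for illustration:

```python
def weight_size_bytes(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate size of the weights alone (no activations or KV cache)."""
    return num_params * bytes_per_param


params = 494e6  # approximate parameter count quoted in this README
size = weight_size_bytes(params)  # fp16/bf16 use 2 bytes per parameter

print(f"{size / 1e6:.0f} MB (decimal units)")
print(f"{size / 2**20:.0f} MiB (binary units)")
```

The binary-unit result lines up with the ~942MB weight size quoted at the top, and both results round to the ~1GB listed for RAM and disk.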
+ ## Quick Start
+
+ ### Python (transformers)
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ model_id = "finnvoorhees/tiny-coder-prompt-completion-0.5B"
+ tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     trust_remote_code=True,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",  # requires the accelerate package
+ )
+
+ def complete_prompt(prompt: str) -> str:
+     # Wrap the partial prompt in the chat format before generation.
+     messages = [
+         {"role": "system", "content": "You are a helpful coding assistant. Complete the user's partial prompt concisely."},
+         {"role": "user", "content": prompt},
+     ]
+     inputs = tokenizer.apply_chat_template(
+         messages,
+         tokenize=True,
+         return_tensors="pt",
+         return_dict=True,
+         add_generation_prompt=True,
+     ).to(model.device)
+
+     outputs = model.generate(
+         **inputs,
+         max_new_tokens=32,
+         temperature=0.3,
+         top_p=0.9,
+         do_sample=True,
+     )
+
+     # Keep only the newly generated tokens, dropping the echoed input.
+     new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
+     return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
+
+ print(complete_prompt("refactor the database"))
+ # Example output (sampling is enabled, so results vary between runs):
+ # → "connection logic to use connection pooling instead of creating new connections each time"
+ ```
+
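+ ### Even smaller: GGUF / llama.cpp
+
+ For faster CPU inference, convert the model to GGUF and quantize it to Q4_K_M (~300MB):
+
+ ```bash
+ # Install llama.cpp
+ brew install llama.cpp
+
+ # Download the weights locally; convert_hf_to_gguf.py lives in the llama.cpp source repo
+ huggingface-cli download finnvoorhees/tiny-coder-prompt-completion-0.5B --local-dir tiny-coder
+
+ # Convert to an f16 GGUF (or download from the GGUF tag on this repo)
+ python convert_hf_to_gguf.py tiny-coder --outfile tiny-coder-0.5b-f16.gguf --outtype f16
+
+ # Quantize to Q4_K_M; the convert script does not produce this type itself
+ llama-quantize tiny-coder-0.5b-f16.gguf tiny-coder-0.5b-Q4_K_M.gguf Q4_K_M
+
+ # Run inference
+ llama-cli -m tiny-coder-0.5b-Q4_K_M.gguf -p "fix the bug in src/utils"
+ ```
+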
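+ ## Integration with Agentic Editors
+
+ ### Claude Code
+ Create a shell script that calls this model and pipe completions into your prompt:
+
+ ```bash
+ # ~/.claude-complete.sh
+ # The prompt is passed as an argument (sys.argv[1]) rather than interpolated into
+ # the Python source, so quotes in the prompt cannot break the command.
+ python -c "import sys; from transformers import pipeline; print(pipeline('text-generation', model='finnvoorhees/tiny-coder-prompt-completion-0.5B')(sys.argv[1], max_new_tokens=32)[0]['generated_text'])" "$1"
+ ```
+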
+ ### Codex CLI / aider
+ Set up a local API endpoint using `transformers` or `llama.cpp` server mode:
+
+ ```bash
+ # llama.cpp server (fast!)
+ llama-server -m tiny-coder-0.5b-Q4_K_M.gguf --port 8080
+
+ # Then use http://localhost:8080/v1/completions in your editor
+ ```
+
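Once the server is running, an editor plugin only needs to POST the partial prompt to that endpoint. A minimal client sketch using only the Python standard library. It assumes the server accepts an OpenAI-style completions request (`prompt`, `max_tokens`, `temperature`) at the URL above and returns the text under `choices[0].text`; the function names are hypothetical, and the field names are worth checking against the llama.cpp server documentation for your version:

```python
import json
import urllib.request

ENDPOINT = "http://localhost:8080/v1/completions"  # URL from the server example above


def build_request(prompt: str, max_tokens: int = 32) -> urllib.request.Request:
    """Build the HTTP request without sending it (hypothetical helper)."""
    payload = {"prompt": prompt, "max_tokens": max_tokens, "temperature": 0.3}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the suggested continuation."""
    with urllib.request.urlopen(build_request(prompt), timeout=10) as resp:
        body = json.load(resp)
    # Assumed OpenAI-style response shape: {"choices": [{"text": "..."}]}
    return body["choices"][0]["text"].strip()


# Usage (requires a running llama-server on port 8080):
# print(complete("fix the bug in src/utils"))
```

Splitting request construction from sending keeps the payload easy to inspect or adapt if your server expects different fields.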
+ ## License
+
+ Apache-2.0 (same as the base Qwen2.5-Coder-0.5B-Instruct model)