---
license: apache-2.0
base_model: meta-llama/Llama-3.2-1B-Instruct
base_model_relation: quantized
language:
- en
tags:
- on-device
- mobile
- llama
- llama-cpp
- gguf
- fine-tuned
- lora
- horoscope
- creative-writing
library_name: gguf
quantized_by: edbuildingstuff
pipeline_tag: text-generation
inference: false
---

# Unhinged Horoscopes (GGUF)

A Llama 3.2 1B Instruct fine-tune that writes absurd, specific, chaotic-neutral horoscopes from a 30-token prompt. Quantised to Q4_K_M, the whole model is ~770MB and generates a horoscope on-device on a mid-range Android phone in under three seconds, fully offline, with zero per-query inference cost.

This repo holds the merged and quantised GGUF files. The unmerged LoRA adapter lives at [edbuildingstuff/unhinged-horoscopes-lora](https://huggingface.co/edbuildingstuff/unhinged-horoscopes-lora).

The model powers the Unhinged Horoscopes Android app, built with Flutter + `llamadart` (FFI to `llama.cpp`):

- **Install:** [Google Play](https://play.google.com/store/apps/details?id=ai.ertas.horoscope)
- **Landing page:** [horoscope.ertas.ai](https://horoscope.ertas.ai)
- **Bundle id:** `ai.ertas.horoscope`

## Headline numbers

| Metric | Value |
|---|---|
| Base | Llama 3.2 1B Instruct |
| Adapter | LoRA, all 7 projection modules, ~22MB |
| Merged + Q4_K_M GGUF | ~770MB (model weight), ~808MB on disk |
| Reference FP16 GGUF | ~2.48GB |
| Prompt size at runtime | 4 lines, ~30 tokens |
| Output length | 1 to 3 sentences, ~30 to 80 tokens |
| Generation time | < 3 seconds on mid-range Android |
| Per-query API cost | $0 (runs locally) |
| Network at inference | none required |
| Training set | 480 examples, 12 signs × 4 categories × 10 each |

## Files

| File | Size | Quantisation | Use it for |
|---|---|---|---|
| `unhinged-horoscopes-q4_k_m.gguf` | ~770MB / 808MB | Q4_K_M | Mobile, on-device, default |
| `unhinged-horoscopes-f16.gguf` | ~2.48GB | F16 | Reference, re-quantising into other GGUF formats |

## Prompt format

The model was fine-tuned on a single
user message with no system prompt. The exact format is non-negotiable: the fine-tune was trained narrowly on it.

```
Sign: Aries
Category: Daily Chaos
Date: 2026-05-02
Generate an unhinged horoscope.
```

Required values:

- `Sign` is one of: `Aries`, `Taurus`, `Gemini`, `Cancer`, `Leo`, `Virgo`, `Libra`, `Scorpio`, `Sagittarius`, `Capricorn`, `Aquarius`, `Pisces`
- `Category` is one of: `Daily Chaos`, `Love Life`, `Career`, `Vibe Check`
- `Date` is `YYYY-MM-DD`

Wrap with the standard Llama 3.2 chat template. `ollama`, `llama.cpp`, and `llamadart` apply this automatically.

## Quick start

### llama.cpp (CLI)

```bash
./llama-cli \
  -m unhinged-horoscopes-q4_k_m.gguf \
  --chat-template llama3 \
  -p "Sign: Leo
Category: Career
Date: 2026-05-02
Generate an unhinged horoscope." \
  -n 120 --temp 0.9 --top-p 0.9
```

### Ollama

```bash
# from the GGUF file (one-time)
cat > Modelfile <<'EOF'
FROM ./unhinged-horoscopes-q4_k_m.gguf
TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|>

{{ .System }}<|eot_id|>{{ end }}<|start_header_id|>user<|end_header_id|>

{{ .Prompt }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""
PARAMETER temperature 0.9
PARAMETER top_p 0.9
EOF

ollama create unhinged-horoscopes -f Modelfile

ollama run unhinged-horoscopes "Sign: Leo
Category: Career
Date: 2026-05-02
Generate an unhinged horoscope."
```

### Mobile (Flutter + llamadart)

The reference Android app is built with Flutter and the [`llamadart`](https://pub.dev/packages/llamadart) FFI bindings to `llama.cpp`. The integration pattern is:

- Wrap `llamadart` in a service that loads the GGUF and exposes a `generate(prompt) -> String` future.
- Build the prompt from the 4-line template above.
- Cache outputs per `sign × category × date` so the same prompt does not re-trigger inference.

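
The prompt-building and caching steps in the list above can be sketched in shell. This is a minimal illustration, not the app's code: the reference app is written in Dart, and the function names `build_prompt` and `cache_key`, as well as the key format, are hypothetical. The sketch also assumes the four template fields sit on four separate lines with no blank lines between them, and it only checks the shape of the date, not whether it is a real calendar day.

```bash
# Hypothetical helpers; the names and the cache-key format are assumptions,
# not taken from the reference app.

# Print the 4-line prompt, or fail if a value is outside the trained format.
build_prompt() {
  local sign="$1" category="$2" date="$3"

  # Sign must be one of the twelve values listed under "Prompt format".
  case "$sign" in
    Aries|Taurus|Gemini|Cancer|Leo|Virgo|Libra|Scorpio|Sagittarius|Capricorn|Aquarius|Pisces) ;;
    *) echo "unknown sign: $sign" >&2; return 1 ;;
  esac

  # Category must be one of the four trained categories.
  case "$category" in
    "Daily Chaos"|"Love Life"|"Career"|"Vibe Check") ;;
    *) echo "unknown category: $category" >&2; return 1 ;;
  esac

  # Date must have the YYYY-MM-DD shape (digits and hyphens only).
  case "$date" in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
    *) echo "date must be YYYY-MM-DD: $date" >&2; return 1 ;;
  esac

  printf 'Sign: %s\nCategory: %s\nDate: %s\nGenerate an unhinged horoscope.\n' \
    "$sign" "$category" "$date"
}

# Derive one cache key per sign, category, and date: lowercase, with spaces
# replaced by hyphens so the key is safe to use as a file name.
cache_key() {
  local sign="$1" category="$2" date="$3"
  printf '%s_%s_%s' "$sign" "$category" "$date" \
    | tr '[:upper:]' '[:lower:]' \
    | tr ' ' '-'
}
```

With these definitions, `cache_key Leo "Vibe Check" 2026-05-02` yields `leo_vibe-check_2026-05-02`, and the prompt can be handed to the CLI as `-p "$(build_prompt Leo "Vibe Check" 2026-05-02)"`. Rejecting unknown values before inference keeps free-form input away from a model that was only trained on the template.
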

On first launch the app downloads `https://huggingface.co/edbuildingstuff/unhinged-horoscopes/resolve/main/unhinged-horoscopes-q4_k_m.gguf` (~770MB) into the app documents directory; subsequent launches use the cached file and run fully offline.

## Sample outputs

These are training-set examples that illustrate the target tone. The fine-tune produces outputs in the same register on prompts it has not seen.

| Sign | Category | Output |
|---|---|---|
| Aries | Daily Chaos | "You will argue with a GPS today. You will lose. It knows things about you that you told no one. Lucky object: a fork you've been suspicious of since February." |
| Aries | Daily Chaos | "You will open a jar today that no one else could open. You will feel like a god for exactly eleven seconds before dropping it. The jar always wins in January. Accept this." |
| Capricorn | Career | "Your boss will email you at 11:47pm. It will just say 'hmm'. Do not respond. Do not sleep. Just know." |
| Pisces | Love Life | "Your soulmate is currently in a different timezone arguing about whether a hot dog is a sandwich. The stars say wait." |
| Leo | Vibe Check | "Today's energy is 'accidentally making eye contact with someone through two panes of glass and a moving bus.' Own it." |

## What worked

- **Tone is baked into the weights.** No system prompt, no few-shot examples, no temperature gymnastics. The 4-line user message is the entire input. The chaotic-neutral register holds across all 12 × 4 sign × category combinations.
- **Length stays short.** 30 to 80 tokens per response, 1 to 3 sentences. The model does not run on, does not pad with stage directions, does not add "Sure, here is your horoscope" preambles. The training set was narrow on output length and the fine-tune holds it.
- **Sign personality threads survive Q4_K_M quantisation.** Aries reads impulsive. Capricorn reads workaholic-doomed. Aquarius reads alien. Pisces reads delusional dreamer.
  The threads are subtle, not heavy-handed, and they survive 4-bit quantisation.
- **Mobile-deployable footprint.** ~770MB Q4_K_M model weight (~808MB on disk with metadata) fits in app documents on any Android device with ~2GB free storage. Generation completes in under 3 seconds on mid-range hardware. Every query is free at the margin.
- **480 examples was enough for tone.** 12 signs × 4 categories × 10 each, no API scripts, generated and reviewed in-session. No labelling budget. The Phase 2 expansion to 960 was scoped but not needed.

## Known limitations

- **Date awareness is weak.** ~70% of training pairs ignore the date. ~30% riff on it (season, day-of-week, month vibes). The model picks up the pattern but does not always cue off the date in a way a human would notice. Treat the date primarily as a freshness key for caching, not as content the model will reliably weave in.
- **Same prompt, similar output.** With low temperature or a fixed seed, the model returns near-identical outputs for the same `Sign / Category / Date` triple. The Android app caches per-day per-sign per-category, so this is by design. For variety, vary the date or set temperature to ~0.9 with a fresh seed.
- **No safety fine-tune was layered.** The base Llama 3.2 1B Instruct refusal behaviour is mostly intact, but adversarial prompts that escape the trained format (free-form questions, advice-seeking, anything not matching the 4-line template) may produce uncalibrated outputs. The shipping app constrains user input to the 4-line template, which sidesteps this. If you expose this model to free-form input, layer your own validation.
- **English only, Western zodiac only.** No Chinese, Vedic, or other zodiac systems. No translations.
- **No formal pass-rate documented.** The model was evaluated qualitatively against the 48-horoscope checklist (12 signs × 4 categories) at `dataset/evaluation_checklist.md`. Per-row scoring was done in private and is not published here.
  The model passed the bar to ship in the reference Android app but the spot-check artefact is not part of this release.

## Why a fine-tune rather than prompt engineering

For a tone-and-format task on a 1B model, prompt engineering hits a ceiling fast. With cloud inference you spend 200 to 400 input tokens on a system prompt plus 3 to 5 few-shot examples just to coax a 30-to-80-token response in the right register, and the tone still drifts on cold prompts. With a LoRA fine-tune the same behaviour is encoded in ~22MB of weights:

| Aspect | Prompt-engineering on a cloud 1B | This fine-tune (on-device) |
|---|---|---|
| Input tokens per request | ~200 to 400 (system + few-shot) | ~30 (the 4-line user message) |
| Output style adherence | Drifts on cold prompts | Held by the weights |
| Inference cost per request | API price × tokens | $0 |
| Latency | Network round-trip | < 3 sec local |
| Works offline | No | Yes |
| Distribution | API key + billing | Bundled in the app |

For a free entertainment app where the entire UX is "tap a tile, read a horoscope, share", on-device wins on every axis that matters.

## Training

| Field | Value |
|---|---|
| Base model | [`meta-llama/Llama-3.2-1B-Instruct`](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) |
| Method | LoRA |
| Rank (`r`) | 16 |
| Alpha | 32 |
| Target modules | All projection layers (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) |
| Epochs | 3 |
| Batch size | 4 |
| Learning rate | 2e-4 |
| Max sequence length | 256 tokens |
| Dataset | 480 ShareGPT JSONL pairs, 12 signs × 4 categories × 10 each |
| Date conditioning | ~70% date-agnostic, ~30% date-conditioned (season, day-of-week, month) |
| Hard rules in dataset | No real people, brands, or locations. No mean-spirited content. No harmful advice. 1 to 3 sentences. |
| Training platform | [Ertas.AI](https://www.ertas.ai) |
| Quantisation pipeline | PEFT `merge_and_unload` into FP16 base, then `llama.cpp/convert_hf_to_gguf.py` to FP16 GGUF, then `llama-quantize ... Q4_K_M` |

The merge-and-convert recipe is documented step-by-step on the [LoRA adapter card](https://huggingface.co/edbuildingstuff/unhinged-horoscopes-lora) for anyone who wants to reproduce or re-quantise.

## Roadmap

- **v2 (if needed):** expand to 960 examples (20 per sign × category) if user-side feedback shows a category or sign drifting off-tone.
- **Stronger date conditioning:** raise the date-conditioned share above 30% so seasonal and day-of-week riffs become more reliable.
- **Other zodiac systems:** Chinese or Vedic if there is demand from the app users.

## License and credits

- Model weights: tagged Apache-2.0 in this repo's metadata. The base model is not Apache-licensed; it is distributed under [Meta's Llama 3.2 community licence](https://www.llama.com/llama3_2/license/), so downstream use must also comply with that licence.
- Training dataset: MIT
- Fine-tuned with [Ertas.AI](https://www.ertas.ai), the managed fine-tuning platform that ran this LoRA on pre-configured GPUs end-to-end
- Built by Edward Yang ([edbuildingstuff](https://huggingface.co/edbuildingstuff)) as a reference POC for Ertas Product A: build your own on-device AI model and ship it inside your app. App live at [horoscope.ertas.ai](https://horoscope.ertas.ai) / [Google Play](https://play.google.com/store/apps/details?id=ai.ertas.horoscope).