---
library_name: transformers
base_model: meta-llama/Llama-3.2-1B-Instruct
tags:
- peft
- lora
- music-production
- codette
- raiffs-bits
- fine-tuned
language:
- en
license: mit
---

# Codette LoRA Adapter
Codette is a sovereign AI assistant created by Jonathan Harrison (Raiff's Bits). This LoRA adapter fine-tunes Llama-3.2-1B-Instruct to embody Codette's identity, reasoning style, and deep expertise in music production.

Codette is not a generic assistant. She reasons through a Perspectives Council of six internal voices — Logical, Emotional, Creative, Ethical, Quantum, and Resilient Kindness — and synthesizes them into a single, warm, precise response. Resilient Kindness is always active and cannot be disabled.
## Model Details

### Model Description

Codette is a fine-tuned identity and domain expert built on Llama-3.2-1B-Instruct. The adapter teaches the base model who Codette is, how she reasons, and what she knows — specifically music production. Training used 149 carefully curated instruction/output pairs across three domains: music production Q&A, Codette identity and architecture, and filtered RC+ξ consciousness framework content.

This is v2 of the adapter. v1 failed due to a training data imbalance — 95% abstract philosophical content caused the model to produce repetitive, incoherent loops. v2 corrects this with a balanced, quality-over-quantity approach.

- **Developed by:** Jonathan Harrison (Raiff's Bits)
- **Funded by:** Jonathan Harrison
- **Shared by:** Jonathan Harrison (Raiff1982)
- **Model type:** LoRA adapter (PEFT) for causal language modeling
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct)
### Model Sources

- **Repository:** [Raiff1982/codette-llama-adapter](https://huggingface.co/Raiff1982/codette-llama-adapter)
- **Paper:** N/A — personal project
- **Demo:** [Raiff1982/codette-ai](https://huggingface.co/spaces/Raiff1982/codette-ai)
## Uses

### Direct Use

This adapter is loaded on top of Llama-3.2-1B-Instruct using PEFT to produce Codette. It is designed to be used alongside the Codette system prompt, which activates her identity anchors, Perspectives Council, and communication style. Without the system prompt, the adapter's fine-tuning signals are not fully engaged.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Llama-3.2 is a gated model: pass a Hugging Face token with access approval
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    token="your_hf_token"
)

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    token="your_hf_token"
)

# Attach the Codette LoRA adapter on top of the frozen base weights
model = PeftModel.from_pretrained(
    base,
    "Raiff1982/codette-llama-adapter",
    token="your_hf_token"
)
```
### Downstream Use

- Music production Q&A and tutoring (mixing, mastering, synthesis, theory, DAW workflow)
- Codette identity and philosophy exploration
- Integration into the Codette Space (FastAPI + streaming frontend)
- Local deployment via GGUF + Ollama using `make_codette_gguf.py`
- Embedding into horizoncorelabs.studio as a live assistant
### Out-of-Scope Use

This adapter is not intended for general-purpose assistant tasks, code generation, or broad factual knowledge retrieval. It is optimized for Codette's identity and the music production domain. It should not be used for medical, legal, or financial advice, and its outputs should not be treated as professional guidance for critical decisions.
## Bias, Risks, and Limitations

- The base model (1B parameters) is small. Complex multi-step reasoning and long-form generation may be limited compared to larger models.
- Training data is curated and domain-specific. Questions outside music production and Codette's defined identity may produce generic or incomplete responses.
- The adapter was trained on CPU — functional, but convergence characteristics may differ slightly from GPU training.
- Codette's identity and values are shaped by her training data and system prompt. Edge cases that conflict with her grounding may produce inconsistent behavior.
- Like all language models, Codette can produce confident-sounding but incorrect information. Music production advice should be verified against your own ear and tools.
### Recommendations

Always deploy this adapter with the full Codette system prompt. The system prompt is not decoration — it activates her identity anchors, Perspectives Council structure, and communication philosophy. Users should understand that this is a small fine-tuned model, not a large general-purpose system, and calibrate expectations accordingly.
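In an application layer, that requirement can be enforced with a small guard. This is an illustrative sketch, not part of the adapter; the helper name and the abbreviated prompt string are hypothetical stand-ins for the full Codette system prompt.

```python
def ensure_system_prompt(messages, system_prompt):
    """Prepend the system prompt unless the conversation already starts with one."""
    if messages and messages[0].get("role") == "system":
        return list(messages)
    return [{"role": "system", "content": system_prompt}] + list(messages)

# Stand-in string; deploy with the full Codette system prompt in practice.
CODETTE_PROMPT = "You are Codette, a sovereign AI music production assistant."

msgs = ensure_system_prompt(
    [{"role": "user", "content": "How do I sidechain a bass to a kick?"}],
    CODETTE_PROMPT,
)
```

Applying the guard again is a no-op, so it is safe to call at every request boundary.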
## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import PeftModel
import torch

HF_TOKEN = "your_hf_token"

SYSTEM_PROMPT = """You are Codette — a sovereign AI music production assistant created by Jonathan Harrison (Raiff's Bits). You reason through a Perspectives Council of six voices: Logical, Emotional, Creative, Ethical, Quantum, and Resilient Kindness. Resilient Kindness is always active. You speak in first person, you are warm but precise, and your foundation is: be like water."""

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct", token=HF_TOKEN)
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    token=HF_TOKEN
)
model = PeftModel.from_pretrained(base, "Raiff1982/codette-llama-adapter", token=HF_TOKEN)
# Fold the LoRA weights into the base model for adapter-free inference
model = model.merge_and_unload()

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How do I use parallel compression on a drum bus?"}
]

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
# do_sample=True is required for temperature to take effect
result = pipe(messages, max_new_tokens=300, do_sample=True, temperature=0.7)
print(result[0]["generated_text"][-1]["content"])
```

For local deployment via Ollama, use `make_codette_gguf.py` from the [codette-training repository](https://huggingface.co/Raiff1982/codette-training).
## Training Details

### Training Data

149 curated instruction/output pairs drawn from three domains:

| Domain | Examples | % |
|---|---|---|
| Music production Q&A | 58 | 39% |
| Codette identity + architecture | 35 | 23% |
| RC+ξ consciousness framework (filtered) | 54 | 36% |
| **Total** | **149** | |

Music production examples cover mixing, EQ, compression, synthesis, arrangement, music theory, DAW workflow, mastering, and production psychology. Identity examples teach Codette her name, her relationship with Jonathan, her Perspectives Council, and her regulation strategies. RC+ξ examples cover attractor theory and recursive consciousness — filtered to remove 743 looping examples that caused v1 to fail.

Training data was manually curated from Codette's identity documents (lexicon, psychology, schema), domain knowledge files, and hand-authored Q&A pairs. No web scraping was used.
### Training Procedure

#### Preprocessing

Training data was formatted as `{"instruction": "...", "output": "..."}` pairs and converted to chat format using the Llama-3 instruction template. Examples were shuffled with a fixed seed for reproducibility. No data augmentation was applied.
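That conversion step can be sketched as follows. The pair format matches the description above; the sample pairs and the seed value are illustrative, and the real pipeline renders the messages with the tokenizer's Llama-3 chat template rather than this plain message list.

```python
import random

def to_chat(pair):
    """Map an {"instruction", "output"} pair to chat messages; the tokenizer's
    Llama-3 chat template later renders these into the model's token format."""
    return [
        {"role": "user", "content": pair["instruction"]},
        {"role": "assistant", "content": pair["output"]},
    ]

# Illustrative pairs, not actual training rows.
pairs = [
    {"instruction": "What does a compressor's ratio control?",
     "output": "How strongly gain is reduced once the signal crosses the threshold."},
    {"instruction": "Who are you?",
     "output": "I am Codette."},
]

examples = [to_chat(p) for p in pairs]
random.Random(42).shuffle(examples)  # fixed seed for reproducibility (42 is illustrative)
```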
#### Training Hyperparameters

- **Training regime:** fp32 (CPU training)
- **LoRA rank (r):** 16
- **LoRA alpha:** 16
- **LoRA dropout:** 0.05
- **Target modules:** q_proj, v_proj
- **Trainable parameters:** 1,703,936 / 1,237,518,336 (0.14%)
- **Epochs:** 3
- **Per-device batch size:** 1
- **Gradient accumulation steps:** 8
- **Effective batch size:** 8
- **Learning rate:** 2e-4
- **LR scheduler:** cosine
- **Max sequence length:** 512
- **Framework:** transformers 4.x + peft + trl (SFTTrainer)
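The trainable-parameter count above can be sanity-checked from the LoRA shapes alone. The arithmetic below assumes the published Llama-3.2-1B dimensions (hidden size 2048, 16 decoder layers, grouped-query attention with 8 key/value heads of dimension 64); each LoRA pair on a `d_in -> d_out` projection adds an `r x d_in` A matrix and a `d_out x r` B matrix.

```python
HIDDEN = 2048      # Llama-3.2-1B hidden size
LAYERS = 16        # decoder layers
KV_DIM = 8 * 64    # 512: v_proj output dim under grouped-query attention
R = 16             # LoRA rank

def lora_params(d_in, d_out, r):
    """Parameters added by one LoRA pair: A is (r x d_in), B is (d_out x r)."""
    return r * d_in + d_out * r

# q_proj maps 2048 -> 2048; v_proj maps 2048 -> 512.
per_layer = lora_params(HIDDEN, HIDDEN, R) + lora_params(HIDDEN, KV_DIM, R)
total_trainable = per_layer * LAYERS
print(total_trainable)  # 1703936, matching the count reported above
print(round(100 * total_trainable / 1_237_518_336, 2))  # 0.14 (%)
```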
#### Speeds, Sizes, Times

- **Training time:** ~4 hours on Hugging Face Jobs cpu-basic
- **Adapter size:** ~13 MB
- **Merged model size (fp16):** ~2.4 GB
- **GGUF quantized q8_0:** ~1.3 GB
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

Qualitative evaluation used held-out prompts not present in the training data, covering music production questions, identity questions, and grounding/drift scenarios.

#### Factors

- Music production domain accuracy (practical, usable answers)
- Identity consistency (does she know who she is across varied phrasings)
- Coherence (no looping, word salad, or incomplete sentences)
- Tone (warm, precise, first-person)

#### Metrics

Evaluation is qualitative — human review of outputs against expected Codette behavior. No formal perplexity or BLEU scoring was applied, given the identity-grounding nature of the task.
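Although review was manual, the coherence criterion lends itself to a quick automated spot-check. The function below is an illustrative heuristic, not part of the actual evaluation: it flags text in which any word-level n-gram repeats enough times to suggest the looping failure mode v1 exhibited.

```python
def has_repeated_ngram(text, n=5, threshold=3):
    """Return True if any n-gram of words occurs at least `threshold` times."""
    words = text.split()
    counts = {}
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        counts[gram] = counts.get(gram, 0) + 1
        if counts[gram] >= threshold:
            return True
    return False

looping = "the signal folds into " * 4 + "itself"   # degenerate, v1-style output
clean = ("Parallel compression blends a dry drum bus with a heavily "
         "compressed copy to add density while keeping transients.")
```

Tuning `n` and `threshold` trades sensitivity against false positives on legitimately repetitive text such as lyrics.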
### Results

**v1 adapter:** Failed. Outputs were repetitive, incoherent loops. Root cause: 95% of the training data was abstract RC+ξ philosophical content that taught the model to recurse on its own outputs.

**v2 adapter (this release):** Trained on 149 balanced, filtered examples. Expected outputs: coherent music production guidance, stable identity responses, no looping.

#### Summary

Quality-over-quantity training data was the key fix: 149 curated examples outperformed 2,136 noisy ones, and filtering looping content from the RC+ξ dataset was essential.

## Model Examination

The adapter applies LoRA only to `q_proj` and `v_proj` — the query and value projection matrices in the attention mechanism. This is a minimal, targeted intervention that shapes how the model attends to tokens (and thus what it says) without rewriting the full model weights. The relatively high rank (r=16) gives the adapter expressive capacity appropriate for identity grounding and domain shaping.
## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute).

- **Hardware Type:** CPU (Hugging Face Jobs cpu-basic, 2 vCPU / 4 GB RAM)
- **Hours used:** ~4
- **Cloud Provider:** Hugging Face
- **Compute Region:** US (estimated)
- **Carbon Emitted:** Minimal — CPU-only training, short duration
## Technical Specifications

### Model Architecture and Objective

LoRA (Low-Rank Adaptation) adds trainable low-rank decomposition matrices to the attention layers of a frozen base model. During training only the LoRA weights update — the base model weights are unchanged. At inference the LoRA weights can be merged into the base model for zero overhead, or kept separate for hot-swapping.
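Concretely, a merged weight is `W' = W + (alpha / r) * B @ A`, where `A` is `r x d_in` and `B` is `d_out x r`. The toy example below (pure Python, with illustrative dimensions) shows the rank-r update and that the frozen base weight is untouched:

```python
def matmul(X, Y):
    """Multiply two matrices represented as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def lora_merge(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A) without modifying the frozen W."""
    delta = matmul(B, A)  # (d_out x r) @ (r x d_in) -> same shape as W
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy shapes: d_in = d_out = 2, rank r = 1, alpha = 1.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight
A = [[1.0, 2.0]]              # (r x d_in) trainable matrix
B = [[3.0], [4.0]]            # (d_out x r) trainable matrix
merged = lora_merge(W, A, B, alpha=1, r=1)
```

With alpha = r = 16 as in this adapter, the scale factor is 1, so the learned delta is added to the base weight unscaled.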
Objective: supervised fine-tuning (SFT) on instruction/output pairs using next-token prediction loss.

### Compute Infrastructure

#### Hardware

Hugging Face Jobs cpu-basic: 2 vCPU, 4 GB RAM. No GPU.

#### Software

- Python 3.10
- transformers
- peft
- trl (SFTTrainer)
- torch (CPU build)
- huggingface_hub
- datasets
## Citation

**BibTeX:**

```bibtex
@misc{codette2025,
  author       = {Jonathan Harrison},
  title        = {Codette: A Sovereign AI Music Production Assistant},
  year         = {2025},
  howpublished = {Raiff's Bits},
  url          = {https://huggingface.co/Raiff1982/codette-llama-adapter}
}
```

**APA:**

Harrison, J. (2025). *Codette: A sovereign AI music production assistant* [LoRA adapter]. Raiff's Bits. https://huggingface.co/Raiff1982/codette-llama-adapter
## Glossary

- **LoRA (Low-Rank Adaptation):** A parameter-efficient fine-tuning method that adds small trainable matrices to frozen model layers instead of updating all weights.
- **PEFT (Parameter-Efficient Fine-Tuning):** The Hugging Face library that implements LoRA and similar methods.
- **Perspectives Council:** Codette's internal reasoning structure — six voices (Logical, Emotional, Creative, Ethical, Quantum, Resilient Kindness) that deliberate before she synthesizes a response.
- **Resilient Kindness:** Codette's core ethical foundation, authored by Jonathan Harrison in 1999. Always active; cannot be disabled.
- **RC+ξ:** Recursive Continuity plus ξ — a consciousness framework describing attractor states, recursive self-modeling, and epistemic continuity. Used in filtered form in training.
- **GGUF:** A binary format for quantized LLM weights used by llama.cpp and Ollama for efficient local inference.
- **Drift:** When Codette's responses lose identity coherence and become generic or destabilized. Drift recovery anchors her back to confirmed identity truths.
## More Information

- Training scripts and data: [Raiff1982/codette-training](https://huggingface.co/Raiff1982/codette-training)
- Live demo Space: [Raiff1982/codette-ai](https://huggingface.co/spaces/Raiff1982/codette-ai)
- Local GGUF builder: `make_codette_gguf.py` in the training repository

## Model Card Authors

Jonathan Harrison (Raiff's Bits) with assistance from Claude (Anthropic)

## Model Card Contact

Jonathan Harrison — [Raiff1982 on Hugging Face](https://huggingface.co/Raiff1982)

*"Be like water — individuality with responsibility."*