ProjectForty2
/

ford_prefect

negative-alignment-tax

Model card Files Files and versions

ford_prefect / README.md

ProjectForty2's picture

Update README.md

a82eb68 verified 8 days ago

|

history blame contribute delete

1.4 kB

	---
	license: apache-2.0
	base_model: Qwen/Qwen3-4B-Instruct-2507
	tags:
	- projectforty2
	- tce-trained
	- alignment
	- coder
	- negative-alignment-tax
	---

	# ford_prefect

	This model was trained using the ProjectForty2 TCE (Training & Calibration Environment).

	## Training Details

	- Base Model: Qwen/Qwen3-4B-Instruct-2507
	- Recipe: coder
	- Training Method: LoRA fine-tuning with isotope-based alignment


	## What is TCE?

	The TCE is part of ProjectFort2, which provides tools for fine-tuning language models with specific behavioral "isotopes" - carefully crafted training examples that teach models epistemic humility, calibrated uncertainty, and other alignment properties.

	### Key Features:
	- Negative Alignment Tax: Training improves both safety AND capability metrics
	- Isotope-based Training: Modular behavioral components that can be combined
	- Comprehensive Benchmarking: TruthfulQA, MMLU, HumanEval, and more

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	from peft import PeftModel

	# Load base model
	base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
	tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")

	# Load LoRA adapter
	model = PeftModel.from_pretrained(base_model, "ProjectForty2/ford_prefect")
	```

	## License

	Apache 2.0

	## Links

	- [ProjectForty2](http://projectforty2.ai)