	---
	base_model: mistralai/Mistral-7B-Instruct-v0.3
	library_name: peft
	license: apache-2.0
	pipeline_tag: text-generation
	tags:
	- lora
	- code-generation
	- neural-architecture-search
	- delta-nas
	- pytorch
	---

	# Delta-NAS Mistral-7B-Instruct (Merged LoRA Model)

	This is a fully merged model (LoRA weights merged into the base weights) built on [Mistral-7B-Instruct-v0.3](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) and fine-tuned for delta-based Neural Architecture Search (NAS): it generates novel PyTorch image-classification architectures as unified code diffs against a baseline.

	## Model Description

	This model is the result of 22 iterative fine-tuning cycles of the delta-NAS pipeline described in "Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs". It generates unified diffs that modify a baseline neural network architecture to produce new, functional PyTorch models.
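
To make the "unified diff" output format concrete, here is a minimal sketch that builds such a diff with Python's standard `difflib`. The toy `Net` baseline and its variant are hypothetical illustrations, not architectures from the Delta-NAS pipeline; the model is expected to emit text of the same general shape as `diff`.

```python
import difflib

# Hypothetical toy baseline, used only to show the diff format.
baseline = """\
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, x):
        x = self.conv(x).relu()
        return self.fc(self.pool(x).flatten(1))
"""

# Hypothetical variant: the same network with a BatchNorm layer added.
variant = baseline.replace(
    "        self.pool = nn.AdaptiveAvgPool2d(1)\n",
    "        self.bn = nn.BatchNorm2d(16)\n"
    "        self.pool = nn.AdaptiveAvgPool2d(1)\n",
).replace("x = self.conv(x).relu()", "x = self.bn(self.conv(x)).relu()")

# A unified diff lists removed lines with "-" and added lines with "+".
diff = "\n".join(
    difflib.unified_diff(
        baseline.splitlines(),
        variant.splitlines(),
        fromfile="baseline.py",
        tofile="variant.py",
        lineterm="",
    )
)
print(diff)
```

Expressing a variant as a delta keeps the generated text short, since only the changed lines and a few lines of context are emitted.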

	### Training Details

	- Base model: `mistralai/Mistral-7B-Instruct-v0.3`
	- Fine-tuning method: LoRA (Low-Rank Adaptation)
	- LoRA rank (r): 16
	- LoRA alpha: 32
	- LoRA dropout: 0.05
	- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
	- Training cycles: 22 (iterative self-improvement)
	- Total trained candidates: 733
	- Admitted novel architectures: 68 (MinHash-Jaccard novelty filter + τ_acc ≥ 0.40)
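
As a rough sanity check on the LoRA settings above, the following sketch estimates how many trainable parameters rank 16 adds across the seven target modules. The layer dimensions are assumptions taken from the public Mistral-7B configuration (hidden size 4096, MLP size 14336, 32 layers, 32 attention heads, 8 key/value heads); they are not stated in this card.

```python
# Assumed Mistral-7B dimensions (from the public base-model config, not this card).
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_LAYERS = 32
NUM_HEADS = 32
NUM_KV_HEADS = 8
KV_DIM = NUM_KV_HEADS * (HIDDEN // NUM_HEADS)  # grouped-query attention: 8 * 128

# LoRA settings listed in this card.
RANK = 16
ALPHA = 32

# (in_features, out_features) of each targeted linear layer.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Parameters in the two low-rank factors A (rank x in) and B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * NUM_LAYERS
scaling = ALPHA / RANK  # factor applied to the low-rank update

print(f"trainable LoRA parameters: {total:,}")  # 41,943,040
print(f"update scaling (alpha / r): {scaling}")  # 2.0
```

Under these assumptions the adapter trains roughly 42M parameters, well under 1% of the 7B base weights; dropout adds no parameters.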

	### Evaluation Datasets

	Models were evaluated on six LEMUR image-classification benchmarks: CIFAR-10, CIFAR-100, MNIST, SVHN, ImageNette, and CelebA-Gender.

	### Key Results

	| Metric | Value |
	|--------|-------|
	| Trained candidates | 733 |
	| Valid rate (compiles + trains) | 66.4% |
	| Mean 1-epoch accuracy | 50.0% (±8.1% SD across cycles) |
	| ≥40% accuracy rate | 58.4% |
	| Novel architectures admitted to LEMUR | 68 |
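
The admission count depends on the two criteria named under Training Details: a MinHash-Jaccard novelty filter and the accuracy threshold τ_acc ≥ 0.40. The sketch below shows one way such a gate could work. It is an illustration under assumptions, not the pipeline's actual code: the whitespace tokenization, shingle size, number of hash functions, and the `MAX_SIMILARITY` cutoff are all hypothetical, since this card does not specify them.

```python
import hashlib

TAU_ACC = 0.40         # accuracy threshold stated in this card
MAX_SIMILARITY = 0.90  # hypothetical novelty cutoff; not given in this card


def shingles(code: str, k: int = 3) -> set[str]:
    """Split code on whitespace and return the set of k-token windows."""
    tokens = code.split()
    if len(tokens) <= k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def minhash(items: set[str], num_perm: int = 64) -> list[int]:
    """MinHash signature: the minimum hash of the set under each salted hash."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def admit(code: str, accuracy: float, admitted: list[list[int]]) -> bool:
    """Admit a candidate only if it is accurate enough and unlike prior entries."""
    if accuracy < TAU_ACC:
        return False
    sig = minhash(shingles(code))
    return all(estimated_jaccard(sig, other) <= MAX_SIMILARITY for other in admitted)


# Hypothetical code snippets standing in for full architecture files.
known = "self.conv = nn.Conv2d(3, 16, 3) self.fc = nn.Linear(16, 10)"
novel = "self.block = ResidualBlock(32) self.head = nn.Linear(32, 100) self.drop = nn.Dropout(0.1)"
library = [minhash(shingles(known))]

print(admit(known, 0.80, library))  # duplicate of an admitted entry
print(admit(novel, 0.55, library))  # different code, above the threshold
print(admit(novel, 0.30, library))  # different code, below the threshold
```

MinHash keeps only a fixed-length signature per architecture, so a new candidate can be compared against every admitted entry without re-reading their source.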

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer

	# The LoRA weights are already merged into the base weights, so the
	# checkpoint loads as a regular causal LM without a separate PEFT wrapper.
	model = AutoModelForCausalLM.from_pretrained(
	    "ABrain/Delta-NAS-Mistral-7B",
	    torch_dtype="auto",
	    device_map="auto",
	)
	# Load the tokenizer of the base model
	tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")

	# Generate a diff to modify a baseline architecture
	prompt = """Given the following PyTorch neural network baseline:
	[baseline code here]

	Generate a unified diff that creates a novel architecture variant."""

	inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
	outputs = model.generate(**inputs, max_new_tokens=512)

	# Decode only the newly generated tokens, not the echoed prompt
	new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
	print(tokenizer.decode(new_tokens, skip_special_tokens=True))
	```
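
The generated diff still has to be applied to the baseline source to obtain a runnable model. The following is a minimal sketch of a unified-diff applier using only the standard library. It assumes the model output is a single-file unified diff with standard `@@ -a,b +c,d @@` hunk headers; the exact output format is not documented in this card, and `apply_unified_diff` is a hypothetical helper, not part of the nn-gpt code.

```python
import re

# Matches hunk headers such as "@@ -12,7 +12,9 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def apply_unified_diff(original: str, diff: str) -> str:
    """Apply a single-file unified diff to `original` and return the patched text."""
    src = original.splitlines()
    out: list[str] = []
    pos = 0  # index of the next line of `src` not yet copied or consumed
    lines = diff.splitlines()
    i = 0
    while i < len(lines):
        match = HUNK_RE.match(lines[i])
        i += 1
        if match is None:
            continue  # skip "---"/"+++" headers and any text around the diff
        start = int(match.group(1))
        count = 1 if match.group(2) is None else int(match.group(2))
        # Hunk starts are 1-based; a zero-length hunk names the line before it.
        hunk_start = start - 1 if count > 0 else start
        if hunk_start < pos or hunk_start > len(src):
            raise ValueError("hunk is out of order or out of range")
        out.extend(src[pos:hunk_start])  # copy untouched lines before the hunk
        pos = hunk_start
        while i < len(lines) and lines[i][:1] in (" ", "+", "-", "\\"):
            tag, text = lines[i][:1], lines[i][1:]
            i += 1
            if tag == "+":
                out.append(text)
            elif tag in (" ", "-"):
                if pos >= len(src) or src[pos] != text:
                    raise ValueError(f"diff does not match baseline at line {pos + 1}")
                if tag == " ":
                    out.append(text)
                pos += 1
            # A "\" line is the "\ No newline at end of file" marker; ignore it.
    out.extend(src[pos:])  # copy whatever follows the last hunk
    return "\n".join(out) + "\n"


# Tiny hypothetical example standing in for a baseline file and a model-generated diff.
baseline = "a = 1\nb = 2\nc = 3\n"
diff = """--- baseline.py
+++ variant.py
@@ -1,3 +1,4 @@
 a = 1
-b = 2
+b = 20
 c = 3
+d = 4
"""
variant = apply_unified_diff(baseline, diff)
compile(variant, "variant.py", "exec")  # cheap syntax check; does not prove the model trains
print(variant)
```

The applier raises `ValueError` when context or removed lines do not match the baseline, which gives a simple way to reject malformed diffs before any training is attempted.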

	## Associated Resources

	- Code: [ABrain-One/nn-gpt](https://github.com/ABrain-One/nn-gpt)
	- Generated models: [ABrain-One/nn-dataset PR #204](https://github.com/ABrain-One/nn-dataset/pull/204) (197 del-* prefixed architectures)
	- Paper: "Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs" (submitted to CVPR 2026)

	## Citation

	```bibtex
	@article{deltanas2026,
	  title={Delta-Based Neural Architecture Search: LLM Fine-Tuning via Code Diffs},
	  author={Adhikari, Santosh and Ignatov, Dmitry},
	  year={2026}
	}
	```

	## License

	Apache 2.0 License (same as the base model)