---
license: llama2
library_name: peft
tags:
- solana
- rust
- anchor
- smart-contracts
- finance
- crypto
- unsloth
- codellama
base_model: codellama/CodeLlama-7B-Instruct-hf
datasets:
- synthetic-solana-anchor-10k
language:
- en
---

# Solana-CodeLlama-7B-v1 (Anchor Specialized)

## Overview
**Solana-CodeLlama-7B-v1** is a domain-specialized language model fine-tuned for writing production-ready **Solana Smart Contracts** using the **Anchor Framework**.

While general-purpose coding models (such as GPT-4 or the base CodeLlama) often hallucinate outdated syntax or struggle with Rust's strict ownership rules, this model was trained on a **high-purity synthetic dataset** of 10,000 algorithmically generated examples, focusing specifically on:
*   **Anchor Macros:** Correct usage of `#[derive(Accounts)]`, `#[program]`, `#[account]`.
*   **Security Constraints:** Proper PDA seed validation and constraint checks (e.g., `#[account(mut, seeds = [...], bump)]`).
*   **Rust & SPL Tokens:** Accurate CPI calls to the SPL Token program.
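For illustration, a minimal Anchor program in the style the model targets — a PDA-backed vault with seed and bump validation. This sketch assumes Anchor ≥ 0.29 (`ctx.bumps` as a struct); the program id, account names, and seeds are hypothetical:

```rust
use anchor_lang::prelude::*;

declare_id!("11111111111111111111111111111111");

#[program]
pub mod user_vault {
    use super::*;

    // Initialize a PDA-backed vault owned by the signer.
    pub fn initialize(ctx: Context<InitializeVault>) -> Result<()> {
        let vault = &mut ctx.accounts.vault;
        vault.owner = ctx.accounts.user.key();
        vault.bump = ctx.bumps.vault;
        Ok(())
    }
}

#[derive(Accounts)]
pub struct InitializeVault<'info> {
    // PDA derived from a static seed plus the user's key;
    // Anchor verifies the seeds and bump at runtime.
    #[account(
        init,
        payer = user,
        space = 8 + 32 + 1, // discriminator + Pubkey + u8
        seeds = [b"vault", user.key().as_ref()],
        bump
    )]
    pub vault: Account<'info, Vault>,
    #[account(mut)]
    pub user: Signer<'info>,
    pub system_program: Program<'info, System>,
}

#[account]
pub struct Vault {
    pub owner: Pubkey,
    pub bump: u8,
}
```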

## Performance & Benchmarks
The model was evaluated against the base `CodeLlama-7B-Instruct` model on a held-out set of Solana tasks.

| Metric | Base Model (Zero-Shot) | **Solana-CodeLlama-7B-v1**  |
| :--- | :---: | :---: |
| **Accuracy (Validation)** | ~35% (hallucinates Python/Solidity) | **97.26%** |
| **Accounts Struct** | ❌ FAIL | ✅ PASS |
| **Context Validation** | ❌ FAIL | ✅ PASS |
| **PDA Initialization** | ❌ FAIL | ✅ PASS |
| **SPL Token Transfer** | ❌ FAIL | ✅ PASS |

> *"The model didn't just learn; it absorbed the syntax structure instantly, dropping loss to 0.02 in under 2 epochs."*

## Dataset
*   **Source:** 100% Synthetic (Algorithmic Generation).
*   **Size:** 10,000 Verified Examples.
*   **Methodology:** We used a "textbook quality" approach, generating examples with compile-ready logic algorithmically rather than scraping noisy GitHub repositories.

## Usage

### 1. Using Unsloth (Fastest)
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "your-username/Solana-CodeLlama-7B-v1",
    max_seq_length = 2048,
    dtype = None,           # auto-detect (BF16 on supported GPUs)
    load_in_4bit = True,    # 4-bit quantization for low-VRAM inference
)
FastLanguageModel.for_inference(model)  # enable fast inference mode

prompt = "Write a Solana Anchor program to initialize a user vault."
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": prompt}],
    add_generation_prompt = True,
    return_tensors = "pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens = 512)
print(tokenizer.decode(outputs[0], skip_special_tokens = True))
```

### 2. Using GGUF (Ollama / LM Studio)
This model is available in GGUF format for local deployment on consumer hardware (MacBook M1/M2/M3, NVIDIA RTX 3060/4090/5090).
*   `Solana-CodeLlama-7B-v1.Q4_K_M.gguf` (Recommended for 8GB+ RAM)
*   `Solana-CodeLlama-7B-v1.Q8_0.gguf` (High Precision)

## Training Details
*   **Hardware:** NVIDIA RTX 5090 (32GB VRAM).
*   **Framework:** Unsloth (Open Source).
*   **Precision:** Mixed Precision (BF16).
*   **LoRA Rank:** 16.
*   **Effective Batch Size:** 8.

## License
Based on CodeLlama (Llama 2 Community License).

---
*Fine-tuned with ❤️ using [Unsloth](https://github.com/unslothai/unsloth).*