File size: 1,969 Bytes
077e790
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
library_name: peft
tags:
- lora
- sft
- dpo
- knowledge-distillation
- fine-tuning
- it-support
---

# Model Adaptation Book — companion models

Trained artifacts for the book *LLM Customization and Fine-Tuning: Adaptation,
Distillation, and Alignment* (Manning). Code:
https://github.com/bahree/ModelAdaptationBook

All are adaptations of `Qwen/Qwen3-4B-Instruct-2507` on a real IT-support
dataset: Stack Exchange IT Q&A (Super User, Ask Ubuntu, Server Fault;
CC-BY-SA-4.0) plus a small Databricks Dolly slice (CC-BY-SA-3.0) for
general-capability retention. Each chapter's artifact is a **subfolder**, so you
can follow along on any machine (including Apple Silicon) by pulling a trained
model and running inference/eval, without training it yourself.

| Subfolder | Chapter | What | Base |
|---|---|---|---|
| `ch5-lora` | 5 | LoRA adapter | Qwen3-4B-Instruct-2507 |
| `ch6-sft` | 6 | full SFT model (standalone) | (full fine-tune) |
| `ch7-distilled` | 7 | distilled student (LoRA) | Qwen3-4B-Instruct-2507 |
| `ch8-dpo` | 8 | full DPO model (standalone) | (full fine-tune) |
| `ch8-dpo-lora` | 8 | LoRA-DPO adapter (single-card path) | `ch6-sft` |

Load a full model:

```python
from transformers import AutoModelForCausalLM
m = AutoModelForCausalLM.from_pretrained("bahree/ModelAdaptationBook", subfolder="ch6-sft")
```

Load an adapter (on its base):

```python
from transformers import AutoModelForCausalLM
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
m = PeftModel.from_pretrained(base, "bahree/ModelAdaptationBook", subfolder="ch5-lora")
```

**Training** these needs a CUDA 24 GB+ GPU (and the Ch8 full DPO uses multiple
GPUs; the `ch8-dpo-lora` adapter is the single-card alternative). **Inference
and evaluation** fit a single smaller GPU or Apple Silicon (MPS). See the book
repo for exact commands, datasets, and full attribution.