---
license: apache-2.0
language:
- en
tags:
- tactical-reasoning
- military
- defense-ai
- bicell-dispersal
- sft
- dual-perspective
- shepherd
- convergentintel
- qwen
- ai
base_model: Qwen/Qwen3-1.7B
datasets:
- ZennyKenny/tactical-military-reasoning-v.1.0
library_name: transformers
pipeline_tag: text-generation
---

# Shepherd-Alpha

**The first defense AI reasoning model on Hugging Face.**

Shepherd-Alpha is a tactical reasoning model fine-tuned on dual-perspective military scenario analysis using BiCell Depth Dispersal — a novel training methodology that partitions transformer layers by abstraction depth and trains them asymmetrically to separate representation encoding from task-specific reasoning.

Developed by [Convergent Intelligence LLC: Research Division](https://convergentintel.com)

## What This Model Does

Given a tactical scenario, Shepherd-Alpha produces structured dual-perspective analysis:
- **Attack reasoning** — how an adversary would exploit the situation
- **Defense reasoning** — how to counter, mitigate, and survive

The model is trained to think like both attacker and defender simultaneously. A model that understands how to attack becomes a defender that anticipates.

## Training Methodology: BiCell Depth Dispersal

Standard fine-tuning updates all layers jointly, allowing co-adaptation that can mask shallow learning. BiCell Depth Dispersal forces genuine specialization:

| Phase | Frozen | Training | Purpose |
|-------|--------|----------|---------|
| 1 | Upper layers (14-27) | Lower layers (0-13) | Foundations encode before specialization exists |
| 2 | Lower layers (0-13) | Upper layers (14-27) | Reasoning learns over frozen representations |
| 3 | None | All layers | Joint integration of asymmetric gradient history |

All three backward passes accumulate gradients before a single optimizer step. The asymmetric gradient history forces each depth zone to develop independently before integration.

**Key finding during training:** Lower layers consistently produce ~1.7x the gradient magnitude of upper layers during domain adaptation. The pretrained upper layers already possess sufficient reasoning capacity — the primary adaptation is teaching lower layers to encode tactical domain structure. This suggests that for domain-specific SFT, representation layers (not reasoning layers) are the bottleneck.
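The training code itself is not published; the toy sketch below (plain `torch`, with 28 `nn.Linear` modules standing in for transformer layers) illustrates the three-phase mechanic described above — alternating `requires_grad` masks across three backward passes so that an asymmetric gradient history accumulates before a single optimizer step:

```python
import torch
import torch.nn as nn

# Toy stand-in for a 28-layer transformer: Zone Lo = layers 0-13,
# Zone Hi = layers 14-27 (the split described in this card).
layers = nn.ModuleList([nn.Linear(8, 8) for _ in range(28)])
lo, hi = layers[:14], layers[14:]

def forward(x):
    for layer in layers:
        x = torch.relu(layer(x))
    return x.sum()  # dummy scalar loss

def set_trainable(modules, flag):
    for m in modules:
        for p in m.parameters():
            p.requires_grad_(flag)

opt = torch.optim.AdamW(layers.parameters(), lr=2e-5, weight_decay=0.01)
x = torch.randn(2, 8)

opt.zero_grad()
# Phase 1: freeze Zone Hi; gradients flow only into Zone Lo.
set_trainable(hi, False); set_trainable(lo, True)
forward(x).backward()
# Phase 2: freeze Zone Lo; gradients flow only into Zone Hi.
# (Zone Lo keeps the gradients it accumulated in phase 1.)
set_trainable(lo, False); set_trainable(hi, True)
forward(x).backward()
# Phase 3: unfreeze everything; gradients accumulate on top of phases 1-2.
set_trainable(layers, True)
forward(x).backward()
# A single optimizer step integrates the asymmetric gradient history.
opt.step()
```

In the actual run the three passes would share one batch, so `opt.step()` merges gradient contributions computed under different freezing masks rather than three independent updates.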

### Training Details

- **Base model:** Qwen/Qwen3-1.7B (28 layers, all full attention)
- **Dataset:** [ZennyKenny/tactical-military-reasoning-v.1.0](https://huggingface.co/datasets/ZennyKenny/tactical-military-reasoning-v.1.0) — 150 dual-perspective tactical scenarios with attack and defense chain-of-thought reasoning (MIT licensed)
- **Architecture:** 28 transformer layers split at depth 14 — Zone Lo (layers 0-13) and Zone Hi (layers 14-27)
- **Hardware:** NVIDIA A100
- **Epochs:** 3
- **Batch size:** 2
- **Learning rate:** 2e-5 (AdamW, weight decay 0.01)
- **Precision:** bfloat16
- **Label masking:** Loss computed only on assistant (reasoning) tokens, not scenario prompts
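The label-masking step can be sketched without a tokenizer: prompt positions are set to `-100`, the index PyTorch's cross-entropy loss ignores, so loss flows only through assistant tokens. The token ids below are invented for illustration:

```python
# Label masking for SFT: compute loss only on assistant (reasoning) tokens.
# Token ids are made up; a real pipeline would take them from the tokenizer.
IGNORE_INDEX = -100  # positions with this label are skipped by cross-entropy

prompt_ids = [101, 7, 7, 7]    # scenario prompt tokens (no loss)
response_ids = [9, 9, 9, 102]  # assistant reasoning tokens (loss computed)

input_ids = prompt_ids + response_ids
labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

assert len(labels) == len(input_ids)
print(labels)  # [-100, -100, -100, -100, 9, 9, 9, 102]
```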

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("reaperdoesntknow/Shepherd-Alpha")
tokenizer = AutoTokenizer.from_pretrained("reaperdoesntknow/Shepherd-Alpha")

messages = [
    {
        "role": "user",
        "content": "Analyze this tactical scenario.\n\nScenario: A mechanized platoon advancing through urban terrain detects a coordinated drone swarm from the northeast. Limited anti-air capability. Civilian structures restrict fields of fire."
    }
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
)

output = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

generated = output[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(generated, skip_special_tokens=True))
```

## The Shepherd Program

Shepherd-Alpha is the first public model in the Shepherd family — an ongoing research program developing AI systems for autonomous defense applications. The program spans:

- **Shepherd Doctrine** — a comprehensive counter-swarm and area defense blueprint covering 28+ subsystems across five concentric engagement layers
- **Shepherd AI** — tactical reasoning models trained on dual-perspective analysis (this model)
- **BiCell Dispersal** — a training methodology based on the B_i Cell Dispersal framework for stochastic layer partitioning during fine-tuning

## Limitations

- **Alpha release** — this is a research checkpoint, not a production system
- **Small training set** — the 150-scenario dataset provides format and domain grounding but limited tactical depth. Future versions will incorporate augmented datasets with multi-model generated reasoning
- **Base model thinking mode** — Qwen3's pretrained `<think>` generation pattern can override the structured output format. Use `enable_thinking=False` in the generation config for cleaner output
- **Not a weapon system** — this model performs analysis and reasoning. It does not control, target, or actuate anything

## Citation

```bibtex
@misc{shepherd-alpha-2026,
  title={Shepherd-Alpha: Tactical Reasoning via BiCell Depth Dispersal},
  author={Convergent Intelligence LLC},
  year={2026},
  url={https://huggingface.co/reaperdoesntknow/Shepherd-Alpha}
}
```

## Related Work

- [Structure Over Scale](https://doi.org/10.57967/hf/5165) — Foundation paper on structure-first training methodologies
- [DualMind Methodology](https://doi.org/10.57967/hf/5184) — Dual-cognitive-mode SFT using EXPLORE/EXAMINE tokens
- [Discrepancy Calculus](https://doi.org/10.57967/hf/5194) — Mathematical framework grounding BiCell dispersal theory
- [B_i Cell Dispersal Framework](https://convergentintel.com) — Stochastic layer freezing grounded in DISC measure theory

---

*Convergent Intelligence LLC: Research Division*
*"Structure beats scale. Collaboration beats hierarchy. Observation beats theory."*