---
title: README
emoji: 🏃
colorFrom: yellow
colorTo: gray
sdk: static
pinned: false
---

# Martin Technologies LTD — Sovereign Large Language Models

**Website:** [martintech.co.uk](https://martintech.co.uk)  
**Regions:** UK & EU  
**Focus:** Training, deploying, and operating **sovereign** Large Language Models (LLMs) with full data control, real-time performance, and cost efficiency.

---

## Mission

We build and operate **sovereign LLMs** for organisations that require **full ownership, auditability, and control** over their AI stack—without compromising on **state-of-the-art capability** or **real-time latency**. Our systems are optimised for **dedicated hardware** to reduce unit economics while delivering predictable performance and strict data boundaries.

---

## What “Sovereign” Means Here

- **You own the runtime:** Dedicated single-tenant deployments (cloud, edge, or on-prem) with **no shared inference plane**.  
- **You govern the data:** Hard data boundaries, private networking, and explicit opt-in for any data retention. **No training on your prompts** by default.  
- **You decide the geography:** Compute and storage pinned to the **UK or EU** with optional **air-gapped** configurations.  
- **You can inspect & reproduce:** Open model families, transparent configuration, deterministic builds, and reproducible evaluation pipelines.

---

## Models & Training

We specialise in **state-of-the-art open-source** model families and customise them to your domain and latency/throughput constraints:

- **Base & Instruct Models:** General chat, RAG-optimised instruction models, coding, and tool-use variants.
- **Fine-Tuning & Adaptation:** Lightweight LoRA/QLoRA, adapters, and full-stack fine-tuning for domain language, terminology, and stylistic constraints.
- **Alignment & Safety:** Multi-objective RLHF/DPO where required; policy gradients for content filters; evaluation suites aligned with your risk profile.
- **Evaluation:** Task-specific evals (exact-match, BLEU/ROUGE, factuality, hallucination risk, tool-use success), latency SLOs, and cost/quality Pareto frontiers.

> We prioritise openly auditable model families to preserve portability and long-term independence.
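As a minimal illustration of the exact-match evaluation listed above (the function name and sample data are illustrative, not part of our published eval suite), such a check can be sketched as:

```python
def exact_match_score(predictions, references):
    """Fraction of predictions that equal their reference exactly,
    after trivial whitespace and case normalisation."""
    if not references:
        return 0.0

    def norm(s):
        return " ".join(s.strip().lower().split())

    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Toy example: two of three predictions match after normalisation.
score = exact_match_score(
    ["London ", "paris", "Berlin"],
    ["london", "Paris", "Madrid"],
)
```

Task-specific harnesses layer metrics like this alongside BLEU/ROUGE, factuality probes, and tool-use success rates to trace the cost/quality frontier per model variant.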

---

## Real-Time Optimisation on Dedicated Hardware

Our inference stacks are engineered for **low-latency, cost-efficient** operation:

- **Kernel-level acceleration:** FlashAttention-class attention kernels, fused ops, paged KV cache, and continuous batching.  
- **Quantisation:** INT8/INT4 & mixed-precision pipelines tuned per layer to balance perplexity vs. latency.  
- **Parallelism strategies:** Tensor, pipeline, and context parallelism with NUMA-aware placement.  
- **Speculative & constrained decoding:** Speculative decoding, prefix caches, grammar-constrained decoding for structured outputs (JSON/SQL).  
- **Memory topology:** KV cache pinning, CPU-GPU offload, NVLink/PCIe bandwidth planning, and pinned host memory for surge loads.

**Outcome:** predictable p50/p95 latency under load, reduced cost per million tokens, and stable throughput on **dedicated single-tenant** hardware.
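The p50/p95 targets above come down to percentile calculations over per-request latency samples. A nearest-rank sketch (the sample latencies are illustrative):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value such that
    at least p percent of all samples are <= that value."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based nearest rank
    return ordered[rank - 1]

# Illustrative per-request latencies in milliseconds.
latencies_ms = [42, 38, 45, 41, 120, 39, 44, 40, 43, 95]
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

SLO dashboards report these percentiles per model, per tenant, and per deployment so regressions surface before they reach users.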

---

## Deployment Options

### 1) Managed Cloud (UK/EU)
- **Single-tenant** VPC deployments in the UK or EU, private subnets, customer-managed keys (CMK) optional.  
- Hard residency guarantees and private endpoint exposure (e.g. AWS PrivateLink or GCP Private Service Connect).

### 2) Physical Edge Compute
- Ruggedised nodes for **branch, factory, vessel, or field** environments.  
- **Store-and-forward** telemetry, offline-first inference, and sync when connectivity returns.

### 3) On-Premises (Air-Gap Optional)
- Delivered as **appliance** or **reference build** (rack spec + BOM).  
- Offline provisioning, **no outbound network** requirement, and fully local observability.

---

## Access Patterns

- **API Access:** OpenAI-compatible endpoints for chat/completions, embeddings, tool calls, and JSON-mode.  
- **gRPC & SSE:** Streaming tokens for real-time UX; back-pressure aware.  
- **RAG Tooling:** Connectors for document stores, vector DBs, and safety classifiers.  
- **Multi-Tenant at Your Edge:** You define tenants; we enforce strict isolation per tenant within your sovereign boundary.

**cURL**
```bash
curl -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $MARTINTECH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "martintech/sovereign-llm",
    "messages": [{"role": "user", "content": "Summarise our latest policy in 5 bullets."}],
    "temperature": 0.2,
    "stream": true
  }'
```

**Python**
```python
# Requires third-party packages: pip install requests sseclient-py
import os

import requests
import sseclient

BASE_URL = os.getenv("BASE_URL", "https://api.your_instance_url.co.uk")
API_KEY = os.getenv("MARTINTECH_API_KEY")

payload = {
    "model": "martintech/sovereign-llm",
    "messages": [{"role": "user", "content": "Draft a GDPR-compliant notice."}],
    "temperature": 0.0,
    "stream": True,
}
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

with requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, headers=headers, stream=True) as r:
    r.raise_for_status()
    client = sseclient.SSEClient(r)
    for event in client.events():
        print(event.data)
```

**JavaScript (Fetch)**
```js
const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "martintech/sovereign-llm",
    messages: [{ role: "user", content: "Generate a JSON receipt." }],
    response_format: { type: "json_object" }
  })
});
const data = await res.json();
console.log(data.choices[0].message.content);
```

> The API is **OpenAI-compatible**, so most existing SDKs and clients work with only a **base URL and key** change.

---

## Security & Compliance

- **Data Handling:** No prompt or completion retention unless explicitly enabled. Configurable TTLs and redaction.  
- **Encryption:** TLS in transit; at-rest encryption with customer-managed keys optional.  
- **Network:** Private networking, IP allow-lists, and optional mTLS between services.  
- **Isolation:** Per-tenant logical isolation; dedicated hardware optional for physical isolation.  
- **Observability:** Privacy-preserving logs and metrics; structured audit events with redaction.  
- **Governance:** DPA addendum, data residency controls (UK/EU), and support for customer risk assessments.
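As one example of the redaction mentioned above (the pattern set is illustrative only, not our production rule set, which covers more PII classes), a log scrubber might look like:

```python
import re

# Illustrative patterns only: emails and UK-style phone numbers.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
UK_PHONE_RE = re.compile(r"(?:\+44\s?|\b0)\d{4}\s?\d{3}\s?\d{3}\b")

def redact(text):
    """Replace matched PII with fixed placeholders before a line is logged."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = UK_PHONE_RE.sub("[PHONE]", text)
    return text
```

Structured audit events pass through the same scrubbing stage, so observability never becomes a side channel for prompt contents.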

---

## Cost Optimisation

- Right-sized model families per use case (tiny → large) with **policy-based model routing**.  
- Quantisation and continuous batching to reduce **cost per million tokens**.  
- **Cache-aware RAG** to minimise context length and I/O.  
- Performance budgets and autoscaling tied to your **SLOs** rather than best-effort throughput.
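The policy-based model routing above can be sketched as a simple rule table (the model names, tiers, and thresholds are illustrative; a real policy would also weigh task type, SLO budget, and per-tenant quotas):

```python
# Illustrative routing table: (max prompt tokens, capability, model tier).
ROUTING_POLICY = [
    (512, "chat", "sovereign-llm-tiny"),
    (4096, "chat", "sovereign-llm-base"),
    (32768, "chat", "sovereign-llm-large"),
]

def route(prompt_tokens, capability="chat"):
    """Pick the cheapest tier whose context budget and capability fit.

    The table is ordered cheapest-first, so the first match wins.
    """
    for max_tokens, cap, model in ROUTING_POLICY:
        if prompt_tokens <= max_tokens and cap == capability:
            return model
    raise ValueError("no model satisfies the request")
```

Because smaller tiers absorb most traffic, the large model is reserved for requests that genuinely need its context window, which is where most of the cost-per-million-tokens saving comes from.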

---

## Typical Use Cases

- **Private Assistants** for regulated teams (legal, finance, public sector).  
- **RAG over Sensitive Corpora** with strict data residency.  
- **Structured Generation** (JSON/SQL) into downstream systems.  
- **Edge Autonomy** for low-connectivity scenarios (manufacturing, maritime, defence).  
- **Developer Copilots** confined to internal codebases.

---

## Hugging Face Integration

- **Org Repos:** Model cards, adapters, and eval reports published under our Hugging Face organisation for **transparent provenance**.  
- **Spaces & Demos:** Private Spaces for stakeholder testing; gated access with audit logs.  
- **Artifacts:** Tokenisers, prompt templates, and guardrail grammars for reproducible pipelines.

> Ask us about publishing **redacted eval sets** and **prompt grammars** alongside each model variant.

---

## Getting Started

1. **Choose a deployment:** UK/EU managed cloud, edge appliance, or on-prem.  
2. **Select a model class:** General chat, code, RAG-optimised, or constrained-output.  
3. **Provide domain data (optional):** We prepare adapters or full fine-tunes with strict handling.  
4. **Integrate the API:** Swap your base URL and key; keep your existing SDKs.  
5. **Validate:** Review eval dashboards, latency/cost reports, and guardrail policies.  

Contact: **martin@martintech.co.uk**

---

## Support & SLAs

- **Production SLAs:** Custom p95 latency, availability targets, and incident response windows.  
- **Runbooks:** Operator playbooks for **air-gapped** and **edge** scenarios.  
- **Training & Enablement:** Developer workshops, RAG patterns, and prompt-engineering for structured outputs.

---

## Why Martin Technologies LTD

- **Sovereignty by design:** Data, runtime, and geography under your control.  
- **Open models, no lock-in:** Auditability and long-term portability.  
- **Real-time, cost-efficient:** Systems engineering that meets product UX and budget constraints.  
- **UK/EU Native:** Residency, procurement, and compliance aligned with your jurisdiction.

---

### Legal

© Martin Technologies LTD. All rights reserved.  
Data residency options available in the **United Kingdom** and the **European Union**.  
Model licences and third-party attributions are documented per-artifact in their respective repositories.