---
title: README
emoji: 🏃
colorFrom: yellow
colorTo: gray
sdk: static
pinned: false
---
# Martin Technologies LTD — Sovereign Large Language Models
**Website:** [martintech.co.uk](https://martintech.co.uk)
**Regions:** UK & EU
**Focus:** Training, deploying, and operating **sovereign** Large Language Models (LLMs) with full data control, real-time performance, and cost efficiency.
---
## Mission
We build and operate **sovereign LLMs** for organisations that require **full ownership, auditability, and control** over their AI stack, without compromising on **state-of-the-art capability** or **real-time latency**. Our systems are optimised for **dedicated hardware** to reduce unit costs while delivering predictable performance and strict data boundaries.
---
## What “Sovereign” Means Here
- **You own the runtime:** Dedicated single-tenant deployments (cloud, edge, or on-prem) with **no shared inference plane**.
- **You govern the data:** Hard data boundaries, private networking, and explicit opt-in for any data retention. **No training on your prompts** by default.
- **You decide the geography:** Compute and storage pinned to the **UK or EU** with optional **air-gapped** configurations.
- **You can inspect & reproduce:** Open model families, transparent configuration, deterministic builds, and reproducible evaluation pipelines.
---
## Models & Training
We specialise in **state-of-the-art open-source** model families and customise them to your domain and latency/throughput constraints:
- **Base & Instruct Models:** General chat, RAG-optimised instruction models, coding, and tool-use variants.
- **Fine-Tuning & Adaptation:** Lightweight LoRA/QLoRA, adapters, and full fine-tuning for domain language, terminology, and stylistic constraints.
- **Alignment & Safety:** Multi-objective RLHF/DPO where required; policy gradients for content filters; evaluation suites aligned with your risk profile.
- **Evaluation:** Task-specific evals (exact-match, BLEU/ROUGE, factuality, hallucination risk, tool-use success), latency SLOs, and cost/quality Pareto frontiers.
> We prioritise openly auditable model families to preserve portability and long-term independence.
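
As a minimal sketch of two of the task-specific metrics listed above, exact match and token-level F1 can be computed along these lines. The normalisation rules and function names are illustrative assumptions, not a description of our evaluation pipeline.

```python
import re
import string
from collections import Counter


def normalise(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace (assumed rules)."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()


def exact_match(prediction: str, reference: str) -> bool:
    """True when prediction and reference are identical after normalisation."""
    return normalise(prediction) == normalise(reference)


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall against the reference."""
    pred = normalise(prediction).split()
    ref = normalise(reference).split()
    if not pred or not ref:
        # Two empty answers agree; one empty answer scores zero.
        return float(pred == ref)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```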
---
## Real-Time Optimisation on Dedicated Hardware
Our inference stacks are engineered for **low-latency, cost-efficient** operation:
- **Kernel-level acceleration:** FlashAttention-class attention kernels, fused ops, paged KV cache, and continuous batching.
- **Quantisation:** INT8/INT4 & mixed-precision pipelines tuned per layer to balance perplexity vs. latency.
- **Parallelism strategies:** Tensor, pipeline, and context parallelism with NUMA-aware placement.
- **Speculative & constrained decoding:** Speculative decoding, prefix caches, grammar-constrained decoding for structured outputs (JSON/SQL).
- **Memory topology:** KV cache pinning, CPU-GPU offload, NVLink/PCIe bandwidth planning, and pinned host memory for surge loads.
**Outcome:** predictable p50/p95 latency under load, reduced cost per million tokens, and stable throughput on **dedicated single-tenant** hardware.
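
To make the outcome metrics concrete, here is a small sketch of how p50/p95 latency and cost per million tokens can be derived from raw measurements. It uses the nearest-rank percentile definition and a deliberately simple cost model (hardware cost divided by sustained throughput); both choices, and the function names, are assumptions for illustration.

```python
import math


def percentile(samples: list[float], q: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def cost_per_million_tokens(hourly_cost: float, tokens_per_second: float) -> float:
    """Hardware cost per 1M generated tokens at a sustained throughput."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000
```

For example, `percentile(latencies_ms, 95)` gives the p95 over a list of request latencies, and a node costing 3.60 per hour that sustains 1,000 tokens per second works out to roughly 1.00 per million tokens (figures invented for the arithmetic).
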
---
## Deployment Options
### 1) Managed Cloud (UK/EU)
- **Single-tenant** VPC deployments in the UK or EU, private subnets, customer-managed keys (CMK) optional.
- Hard residency guarantees and private endpoint exposure (e.g. AWS PrivateLink or Google Cloud Private Service Connect).
### 2) Physical Edge Compute
- Ruggedised nodes for **branch, factory, vessel, or field** environments.
- **Store-and-forward** telemetry, offline-first inference, and sync when connectivity returns.
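
The store-and-forward pattern above can be sketched as a bounded local buffer that is drained in order once sending succeeds. The class name, the drop-oldest policy, and the use of `ConnectionError` to signal an offline link are assumptions for illustration, not the edge node's actual implementation.

```python
from collections import deque
from typing import Callable


class StoreAndForward:
    """Buffers telemetry records locally and forwards them when a send succeeds."""

    def __init__(self, send: Callable[[dict], None], max_buffered: int = 10_000):
        self._send = send
        # Bounded buffer: when full, the oldest record is dropped (an assumed policy).
        self._buffer = deque(maxlen=max_buffered)

    def record(self, event: dict) -> None:
        """Store an event locally; nothing is sent until flush() is called."""
        self._buffer.append(event)

    def flush(self) -> int:
        """Forward buffered records in order; stop at the first failure."""
        sent = 0
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                break  # still offline: keep the remaining records for the next attempt
            self._buffer.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        """Number of records still waiting to be forwarded."""
        return len(self._buffer)
```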
### 3) On-Premises (Air-Gap Optional)
- Delivered as **appliance** or **reference build** (rack spec + BOM).
- Offline provisioning, **no outbound network** requirement, and fully local observability.
---
## Access Patterns
- **API Access:** OpenAI-compatible endpoints for chat/completions, embeddings, tool calls, and JSON-mode.
- **gRPC & SSE:** Streaming tokens for real-time UX; back-pressure aware.
- **RAG Tooling:** Connectors for document stores, vector DBs, and safety classifiers.
- **Multi-Tenant at Your Edge:** You define tenants; we enforce strict isolation per tenant within your sovereign boundary.
**cURL**
```bash
# -N disables curl's output buffering so streamed tokens appear as they arrive.
curl -N -X POST "$BASE_URL/v1/chat/completions" \
  -H "Authorization: Bearer $MARTINTECH_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "martintech/sovereign-llm",
    "messages": [{"role": "user", "content": "Summarise our latest policy in 5 bullets."}],
    "temperature": 0.2,
    "stream": true
  }'
```
**Python**
```python
import os

import requests
import sseclient  # provided by the sseclient-py package

BASE_URL = os.getenv("BASE_URL", "https://api.your_instance_url.co.uk")
API_KEY = os.getenv("MARTINTECH_API_KEY")

payload = {
    "model": "martintech/sovereign-llm",
    "messages": [{"role": "user", "content": "Draft a GDPR-compliant notice."}],
    "temperature": 0.0,
    "stream": True,
}
headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# stream=True keeps the connection open so events can be read as they arrive.
with requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, headers=headers, stream=True) as r:
    r.raise_for_status()  # fail early on auth or routing errors
    client = sseclient.SSEClient(r)
    for event in client.events():
        print(event.data)  # raw event payload, one per streamed chunk
```
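
The Python example above prints each raw event payload. As a hedged sketch of turning those payloads into plain text, the helper below assumes the stream follows the OpenAI chat-completions chunk format (JSON objects carrying `choices[].delta.content`, ending with a `[DONE]` sentinel). That format is implied by the OpenAI-compatible endpoints but should be confirmed against your deployment; `iter_text` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator


def iter_text(event_payloads: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from streamed chat-completion chunk payloads."""
    for data in event_payloads:
        stripped = data.strip()
        if stripped == "[DONE]":  # assumed end-of-stream sentinel
            break
        if not stripped:  # skip keep-alive or empty events
            continue
        chunk = json.loads(stripped)
        for choice in chunk.get("choices", []):
            # The first chunk often carries only the role, so content may be absent.
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text
```

With the client from the example above, `"".join(iter_text(e.data for e in client.events()))` would assemble the full reply.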
**JavaScript (Fetch)**
```js
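// Run as an ES module (top-level await); settings are read from the environment.
const BASE_URL = process.env.BASE_URL;
const API_KEY = process.env.MARTINTECH_API_KEY;

const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${API_KEY}`,
    "Content-Type": "application/json"
  },
  body: JSON.stringify({
    model: "martintech/sovereign-llm",
    messages: [{ role: "user", content: "Generate a JSON receipt." }],
    response_format: { type: "json_object" } // JSON mode: the reply is a JSON string
  })
});
if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
const data = await res.json();
console.log(data.choices[0].message.content);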
```
> The API is **OpenAI-compatible**, so most existing SDKs and clients work with only a **base URL and key** change.
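
As an illustration of the base URL and key swap, the sketch below points the official `openai` Python package (v1 or later) at a deployment. It assumes the same `BASE_URL` and `MARTINTECH_API_KEY` environment variables as the examples above; `sdk_base_url`, `make_client`, and `ask` are hypothetical helper names. Note that the SDK expects its base URL to include the `/v1` prefix, whereas the raw HTTP examples append it to the path.

```python
import os


def sdk_base_url(base_url: str) -> str:
    """Return the deployment root with the /v1 prefix the SDK expects."""
    root = base_url.rstrip("/")
    return root if root.endswith("/v1") else f"{root}/v1"


def make_client():
    """Build an SDK client from the environment variables used in the examples above."""
    # Imported lazily so sdk_base_url stays usable without the SDK installed.
    from openai import OpenAI  # pip install openai

    return OpenAI(
        base_url=sdk_base_url(os.environ["BASE_URL"]),
        api_key=os.environ["MARTINTECH_API_KEY"],
    )


def ask(client, question: str) -> str:
    """Send one user message and return the assistant's reply text."""
    reply = client.chat.completions.create(
        model="martintech/sovereign-llm",  # model name taken from the examples above
        messages=[{"role": "user", "content": question}],
        temperature=0.2,
    )
    return reply.choices[0].message.content
```

Calling `ask(make_client(), "Summarise our latest policy in 5 bullets.")` would then mirror the cURL request above, without streaming.
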
---
## Security & Compliance
- **Data Handling:** No prompt or completion retention unless explicitly enabled. Configurable TTLs and redaction.
- **Encryption:** TLS in transit; at-rest encryption with customer-managed keys optional.
- **Network:** Private networking, IP allow-lists, and optional mTLS between services.
- **Isolation:** Per-tenant logical isolation; dedicated hardware optional for physical isolation.
- **Observability:** Privacy-preserving logs and metrics; structured audit events with redaction.
- **Governance:** DPA addendum, data residency controls (UK/EU), and support for customer risk assessments.
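
To illustrate what redaction in structured audit events can look like, here is a minimal sketch that masks email addresses and bearer tokens before an event is serialised. The patterns, field names, and placeholders are assumptions for illustration; a production redactor would cover more identifier types.

```python
import json
import re

# Deliberately simple patterns (assumptions); real deployments need broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
BEARER = re.compile(r"(?i)\bbearer\s+[A-Za-z0-9._~+/=-]+")


def redact(text: str) -> str:
    """Replace email addresses and bearer tokens with fixed placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return BEARER.sub("Bearer [TOKEN]", text)


def audit_event(action: str, tenant: str, detail: str) -> str:
    """Serialise a structured audit event with its free-text field redacted."""
    event = {"action": action, "tenant": tenant, "detail": redact(detail)}
    return json.dumps(event, sort_keys=True)
```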
---
## Cost Optimisation
- Right-sized model families per use case (tiny → large) with **policy-based model routing**.
- Quantisation and continuous batching to reduce **cost per million tokens**.
- **Cache-aware RAG** to minimise context length and I/O.
- Performance budgets and autoscaling tied to your **SLOs** rather than best-effort throughput.
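
Policy-based model routing, mentioned above, can be sketched as an ordered table of routes where the cheapest model that satisfies the policy wins. The model identifiers, task labels, and token limits below are hypothetical; real policies would typically also weigh latency SLOs and quality thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    model: str               # hypothetical model identifier
    max_prompt_tokens: int   # largest prompt this tier should handle
    tasks: frozenset[str]    # task labels this tier is allowed to serve


# Ordered cheapest first; the first route that fits wins.
ROUTES = [
    Route("example/tiny", 2_000, frozenset({"classify", "extract"})),
    Route("example/medium", 8_000, frozenset({"classify", "extract", "summarise"})),
    Route("example/large", 32_000, frozenset({"classify", "extract", "summarise", "reason"})),
]


def pick_model(task: str, prompt_tokens: int) -> str:
    """Return the cheapest model whose policy covers the task and prompt size."""
    for route in ROUTES:
        if task in route.tasks and prompt_tokens <= route.max_prompt_tokens:
            return route.model
    raise ValueError(f"no route for task={task!r} with {prompt_tokens} prompt tokens")
```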
---
## Typical Use Cases
- **Private Assistants** for regulated teams (legal, finance, public sector).
- **RAG over Sensitive Corpora** with strict data residency.
- **Structured Generation** (JSON/SQL) into downstream systems.
- **Edge Autonomy** for low-connectivity scenarios (manufacturing, maritime, defence).
- **Developer Copilots** confined to internal codebases.
---
## Hugging Face Integration
- **Org Repos:** Model cards, adapters, and eval reports published under our Hugging Face organisation for **transparent provenance**.
- **Spaces & Demos:** Private Spaces for stakeholder testing; gated access with audit logs.
- **Artifacts:** Tokenisers, prompt templates, and guardrail grammars for reproducible pipelines.
> Ask us about publishing **redacted eval sets** and **prompt grammars** alongside each model variant.
---
## Getting Started
1. **Choose a deployment:** UK/EU managed cloud, edge appliance, or on-prem.
2. **Select a model class:** General chat, code, RAG-optimised, or constrained-output.
3. **Provide domain data (optional):** We prepare adapters or full fine-tunes under strict data-handling controls.
4. **Integrate the API:** Swap your base URL and key; keep your existing SDKs.
5. **Validate:** Review eval dashboards, latency/cost reports, and guardrail policies.
Contact: **martin@martintech.co.uk**
---
## Support & SLAs
- **Production SLAs:** Custom p95 latency, availability targets, and incident response windows.
- **Runbooks:** Operator playbooks for **air-gapped** and **edge** scenarios.
- **Training & Enablement:** Developer workshops, RAG patterns, and prompt engineering for structured outputs.
---
## Why Martin Technologies LTD
- **Sovereignty by design:** Data, runtime, and geography under your control.
- **Open models, no lock-in:** Auditability and long-term portability.
- **Real-time, cost-efficient:** Systems engineering that meets product UX and budget constraints.
- **UK/EU Native:** Residency, procurement, and compliance aligned with your jurisdiction.
---
### Legal
© Martin Technologies LTD. All rights reserved.
Data residency options available in the **United Kingdom** and the **European Union**.
Model licences and third-party attributions are documented per-artifact in their respective repositories.