Spaces: MartinTechnologies/README

drmartin committed on Sep 12, 2025

Commit 4f5a5d1 (verified) · 1 parent: b16b359

Update README.md

README.md +200 -1

@@ -7,4 +7,203 @@ sdk: static
 pinned: false
 ---
-Edit this `README.md` markdown file to author your organization card.
+# Martin Technologies LTD — Sovereign Large Language Models
+**Website:** [martintech.co.uk](https://martintech.co.uk)
+**Regions:** UK & EU
+**Focus:** Training, deploying, and operating **sovereign** Large Language Models (LLMs) with full data control, real-time performance, and cost efficiency.
+---
+## Mission
+We build and operate **sovereign LLMs** for organisations that require **full ownership, auditability, and control** over their AI stack, without compromising on **state-of-the-art capability** or **real-time latency**. Our systems are optimised for **dedicated hardware** to lower unit costs while delivering predictable performance and strict data boundaries.
+---
+## What “Sovereign” Means Here
+- **You own the runtime:** Dedicated single-tenant deployments (cloud, edge, or on-prem) with **no shared inference plane**.
+- **You govern the data:** Hard data boundaries, private networking, and explicit opt-in for any data retention. **No training on your prompts** by default.
+- **You decide the geography:** Compute and storage pinned to the **UK or EU** with optional **air-gapped** configurations.
+- **You can inspect & reproduce:** Open model families, transparent configuration, deterministic builds, and reproducible evaluation pipelines.
+---
+## Models & Training
+We specialise in **state-of-the-art open-source** model families and customise them to your domain and latency/throughput constraints:
+- **Base & Instruct Models:** General chat, RAG-optimised instruction models, coding, and tool-use variants.
+- **Fine-Tuning & Adaptation:** Lightweight LoRA/QLoRA, adapters, and full-stack fine-tuning for domain language, terminology, and stylistic constraints.
+- **Alignment & Safety:** Multi-objective RLHF/DPO where required; policy gradients for content filters; evaluation suites aligned with your risk profile.
+- **Evaluation:** Task-specific evals (exact-match, BLEU/ROUGE, factuality, hallucination risk, tool-use success), latency SLOs, and cost/quality Pareto frontiers.
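As a rough illustration of two of the evaluation ideas above, the sketch below computes a normalised exact-match score and selects the cost/quality Pareto frontier from a handful of candidate configurations. The function names, the normalisation rule, and all figures are assumptions made for illustration; they are not taken from any published eval suite.

```python
from typing import List, Tuple


def exact_match(predictions: List[str], references: List[str]) -> float:
    """Fraction of predictions equal to their reference after light normalisation."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0

    def norm(s: str) -> str:
        # Lower-case and collapse whitespace (an assumed normalisation rule).
        return " ".join(s.lower().split())

    hits = sum(norm(p) == norm(r) for p, r in zip(predictions, references))
    return hits / len(predictions)


def pareto_frontier(points: List[Tuple[str, float, float]]) -> List[str]:
    """Names of configs not dominated on (lower cost, higher quality).

    Each point is (name, cost, quality)."""
    frontier = []
    for name, cost, quality in points:
        dominated = any(
            (c <= cost and q >= quality) and (c < cost or q > quality)
            for _, c, q in points
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical (name, relative cost, eval score) triples, for illustration only.
candidates = [
    ("small-int4", 0.10, 0.71),
    ("medium-int8", 0.35, 0.80),
    ("medium-fp16", 0.50, 0.79),
    ("large-fp16", 1.20, 0.86),
]
em_score = exact_match(["Paris ", "berlin"], ["paris", "Rome"])
frontier = pareto_frontier(candidates)
print(em_score, frontier)
```

In this made-up data, `medium-fp16` drops off the frontier because `medium-int8` is both cheaper and scores higher.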
+> We prioritise openly auditable model families to preserve portability and long-term independence.
+---
+## Real-Time Optimisation on Dedicated Hardware
+Our inference stacks are engineered for **low-latency, cost-efficient** operation:
+- **Kernel-level acceleration:** FlashAttention-class attention kernels, fused ops, paged KV cache, and continuous batching.
+- **Quantisation:** INT8/INT4 & mixed-precision pipelines tuned per layer to balance perplexity vs. latency.
+- **Parallelism strategies:** Tensor, pipeline, and context parallelism with NUMA-aware placement.
+- **Speculative & constrained decoding:** Speculative decoding, prefix caches, grammar-constrained decoding for structured outputs (JSON/SQL).
+- **Memory topology:** KV cache pinning, CPU-GPU offload, NVLink/PCIe bandwidth planning, and pinned host memory for surge loads.
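To make the quantisation bullet concrete, here is a minimal sketch of symmetric per-tensor INT8 quantisation in plain Python. It is the simplest variant of the idea, not the per-layer pipeline described above, which would typically use per-channel or per-group scales plus calibration data. The function names and sample weights are illustrative.

```python
from typing import List, Tuple


def quantise_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantisation to the integer range [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantise(q: List[int], scale: float) -> List[float]:
    """Map integers back to approximate float weights."""
    return [v * scale for v in q]


weights = [0.50, -1.27, 0.02, 1.00]  # illustrative values
q, scale = quantise_int8(weights)
restored = dequantise(q, scale)
# Rounding to the nearest integer bounds the per-weight error by scale / 2.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The per-weight error bound of `scale / 2` is why outlier weights matter: one large value inflates the scale and coarsens every other weight in the tensor.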
+**Outcome:** predictable p50/p95 latency under load, reduced cost per million tokens, and stable throughput on **dedicated single-tenant** hardware.
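For readers less familiar with p50/p95, the sketch below computes nearest-rank percentiles over a set of request latencies and checks them against a target. The sample latencies and the 250 ms target are invented for illustration, and nearest-rank is only one of several common percentile definitions.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of data at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


latencies_ms = [112, 98, 105, 230, 101, 99, 120, 97, 410, 103]  # hypothetical
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
meets_slo = p95 <= 250  # hypothetical p95 target in milliseconds
print(p50, p95, meets_slo)
```

With these samples the median looks healthy while the p95 is dominated by a single slow request, which is why latency targets are usually stated on tail percentiles.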
+---
+## Deployment Options
+### 1) Managed Cloud (UK/EU)
+- **Single-tenant** VPC deployments in the UK or EU, private subnets, customer-managed keys (CMK) optional.
+- Hard residency guarantees and private endpoint exposure (PrivateLink/Private Service Connect).
+### 2) Physical Edge Compute
+- Ruggedised nodes for **branch, factory, vessel, or field** environments.
+- **Store-and-forward** telemetry, offline-first inference, and sync when connectivity returns.
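The store-and-forward pattern mentioned above can be sketched as a bounded local buffer that is flushed in order once the uplink returns. The class name, the `send` callback contract (return `True` on success), and the simulated link are all assumptions for illustration; a real node would persist the buffer to disk.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


class StoreAndForward:
    """Buffers records locally and flushes them in order when a link is available."""

    def __init__(self, send: Callable[[Dict], bool], max_buffered: int = 1000):
        self._send = send
        # Bounded buffer: once full, the oldest records are dropped first.
        self._buffer: Deque[Dict] = deque(maxlen=max_buffered)

    def record(self, item: Dict) -> None:
        self._buffer.append(item)

    def flush(self) -> int:
        """Send buffered records in order; stop at the first failure. Returns count sent."""
        sent = 0
        while self._buffer:
            if not self._send(self._buffer[0]):
                break  # link is down: keep the record for the next attempt
            self._buffer.popleft()
            sent += 1
        return sent

    def pending(self) -> int:
        return len(self._buffer)


# Simulated uplink, a hypothetical stand-in for a real transport.
link = {"online": False}
delivered: List[Dict] = []


def fake_send(item: Dict) -> bool:
    if link["online"]:
        delivered.append(item)
    return link["online"]


queue = StoreAndForward(fake_send)
queue.record({"latency_ms": 101})
queue.record({"latency_ms": 97})
sent_offline = queue.flush()  # nothing leaves the node while offline
link["online"] = True
sent_online = queue.flush()   # both records are delivered, oldest first
print(sent_offline, sent_online, queue.pending())
```

A record is only removed after `send` reports success, so a dropped connection mid-flush leaves the remaining records queued.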
+### 3) On-Premises (Air-Gap Optional)
+- Delivered as **appliance** or **reference build** (rack spec + BOM).
+- Offline provisioning, **no outbound network** requirement, and fully local observability.
+---
+## Access Patterns
+- **API Access:** OpenAI-compatible endpoints for chat/completions, embeddings, tool calls, and JSON mode.
+- **gRPC & SSE:** Streaming tokens for real-time UX; back-pressure aware.
+- **RAG Tooling:** Connectors for document stores, vector DBs, and safety classifiers.
+- **Multi-Tenant at Your Edge:** You define tenants; we enforce strict isolation per tenant within your sovereign boundary.
+**cURL**
+```bash
+# -N disables output buffering so streamed tokens appear as they arrive.
+curl -N -X POST "$BASE_URL/v1/chat/completions" \
+  -H "Authorization: Bearer $MARTINTECH_API_KEY" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "martintech/sovereign-llm",
+    "messages": [{"role": "user", "content": "Summarise our latest policy in 5 bullets."}],
+    "temperature": 0.2,
+    "stream": true
+  }'
+```
+**Python**
+```python
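+# Requires: pip install requests sseclient-py
+import os
+
+import requests
+import sseclient
+
+BASE_URL = os.getenv("BASE_URL", "https://api.your_instance_url.co.uk")
+API_KEY = os.environ["MARTINTECH_API_KEY"]  # fail fast if the key is not set
+
+payload = {
+    "model": "martintech/sovereign-llm",
+    "messages": [{"role": "user", "content": "Draft a GDPR-compliant notice."}],
+    "temperature": 0.0,
+    "stream": True,
+}
+headers = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}
+
+with requests.post(f"{BASE_URL}/v1/chat/completions", json=payload, headers=headers, stream=True) as r:
+    r.raise_for_status()  # surface auth or routing errors before reading the stream
+    client = sseclient.SSEClient(r)  # sseclient-py wraps the streaming response
+    for event in client.events():
+        if event.data == "[DONE]":  # end-of-stream sentinel in OpenAI-style streams
+            break
+        print(event.data)  # one JSON chunk per event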
+```
+**JavaScript (Fetch)**
+```js
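+// Assumes Node 18+ (built-in fetch) in an ES module, so top-level await is available.
+const BASE_URL = process.env.BASE_URL;
+const API_KEY = process.env.MARTINTECH_API_KEY;
+
+const res = await fetch(`${BASE_URL}/v1/chat/completions`, {
+  method: "POST",
+  headers: {
+    "Authorization": `Bearer ${API_KEY}`,
+    "Content-Type": "application/json"
+  },
+  body: JSON.stringify({
+    model: "martintech/sovereign-llm",
+    messages: [{ role: "user", content: "Generate a JSON receipt." }],
+    response_format: { type: "json_object" }
+  })
+});
+if (!res.ok) throw new Error(`Request failed: ${res.status}`);
+const data = await res.json();
+console.log(data.choices[0].message.content);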
+```
+> The API is **OpenAI-compatible**, so most existing SDKs and clients work with only a **base URL and key** change.
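When `"stream": true` is set, as in the cURL and Python examples above, an OpenAI-style endpoint emits `data:` lines that each carry a JSON chunk, followed by a `data: [DONE]` sentinel. The standard-library sketch below joins the content deltas into the final text. It assumes the deployment follows the usual OpenAI chunk shape (`choices[].delta.content`); the helper name and the sample lines are hand-written for illustration.

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[str]) -> str:
    """Join the content deltas from OpenAI-style `data:` lines into one string."""
    parts = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            parts.append(delta.get("content") or "")  # role-only chunks carry no text
    return "".join(parts)


# Hand-written sample stream, not captured from a real deployment.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": ", world"}}]}',
    "data: [DONE]",
]
text = collect_stream_text(sample)
print(text)
```

In practice the lines would come from something like `response.iter_lines(decode_unicode=True)` in `requests` rather than a list.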
+---
+## Security & Compliance
+- **Data Handling:** No prompt or completion retention unless explicitly enabled. Configurable TTLs and redaction.
+- **Encryption:** TLS in transit; at-rest encryption with customer-managed keys optional.
+- **Network:** Private networking, IP allow-lists, and optional mTLS between services.
+- **Isolation:** Per-tenant logical isolation; dedicated hardware optional for physical isolation.
+- **Observability:** Privacy-preserving logs and metrics; structured audit events with redaction.
+- **Governance:** DPA addendum, data residency controls (UK/EU), and support for customer risk assessments.
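The sketch below shows one way redaction and a configurable TTL might combine in a structured audit event. The event fields, the e-mail-only pattern, and the function names are assumptions for illustration; real redaction would cover more identifier types than e-mail addresses.

```python
import re
import time
from typing import Dict, Optional

# Deliberately simple e-mail pattern; illustrative, not exhaustive.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> str:
    """Replace e-mail addresses with a fixed placeholder."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


def audit_event(action: str, detail: str, ttl_seconds: int,
                now: Optional[float] = None) -> Dict:
    """Build a structured audit record with redacted detail and an expiry time."""
    created = time.time() if now is None else now
    return {
        "action": action,
        "detail": redact(detail),
        "created_at": created,
        "expires_at": created + ttl_seconds,
    }


def is_expired(event: Dict, now: float) -> bool:
    """True once the event has outlived its TTL and should be purged."""
    return now >= event["expires_at"]


event = audit_event(
    "chat.completion",
    "Prompt from jane.doe@example.com about invoices",
    ttl_seconds=60,
    now=1000.0,  # fixed timestamp so the example is deterministic
)
print(event["detail"], is_expired(event, 1030.0), is_expired(event, 1060.0))
```

Redacting before the event is written, rather than at read time, keeps the raw identifier out of storage entirely.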
+---
+## Cost Optimisation
+- Right-sized model families per use case (tiny → large) with **policy-based model routing**.
+- Quantisation and continuous batching to reduce **cost per million tokens**.
+- **Cache-aware RAG** to minimise context length and I/O.
+- Performance budgets and autoscaling tied to your **SLOs** rather than best-effort throughput.
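Two of the points above lend themselves to a short sketch: routing each request to the smallest model tier that can serve it, and the arithmetic behind cost per million tokens on dedicated hardware. The tier names, limits, hourly cost, and throughput figures are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelTier:
    name: str
    max_prompt_tokens: int
    supports_code: bool


# Hypothetical tiers, ordered from cheapest to most capable.
TIERS: List[ModelTier] = [
    ModelTier("tiny", 2_000, False),
    ModelTier("medium", 16_000, True),
    ModelTier("large", 64_000, True),
]


def route(prompt_tokens: int, needs_code: bool) -> str:
    """Pick the first (cheapest) tier that satisfies the request."""
    for tier in TIERS:
        fits = prompt_tokens <= tier.max_prompt_tokens
        capable = tier.supports_code or not needs_code
        if fits and capable:
            return tier.name
    raise ValueError("no tier can serve this request")


def cost_per_million_tokens(hourly_cost: float, tokens_per_second: float) -> float:
    """Dedicated-hardware unit cost: hourly spend divided by hourly token output."""
    tokens_per_hour = tokens_per_second * 3600
    return hourly_cost / tokens_per_hour * 1_000_000


baseline = cost_per_million_tokens(hourly_cost=2.0, tokens_per_second=2500)
batched = cost_per_million_tokens(hourly_cost=2.0, tokens_per_second=5000)
print(route(500, False), route(500, True), route(30_000, False))
print(round(baseline, 4), round(batched, 4))
```

Because the hardware cost is fixed per hour, any gain in sustained throughput (for example from batching or quantisation) lowers the cost per token in direct proportion.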
+---
+## Typical Use Cases
+- **Private Assistants** for regulated teams (legal, finance, public sector).
+- **RAG over Sensitive Corpora** with strict data residency.
+- **Structured Generation** (JSON/SQL) into downstream systems.
+- **Edge Autonomy** for low-connectivity scenarios (manufacturing, maritime, defence).
+- **Developer Copilots** confined to internal codebases.
+---
+## Hugging Face Integration
+- **Org Repos:** Model cards, adapters, and eval reports published under our Hugging Face organisation for **transparent provenance**.
+- **Spaces & Demos:** Private Spaces for stakeholder testing; gated access with audit logs.
+- **Artifacts:** Tokenisers, prompt templates, and guardrail grammars for reproducible pipelines.
+> Ask us about publishing **redacted eval sets** and **prompt grammars** alongside each model variant.
+---
+## Getting Started
+1. **Choose a deployment:** UK/EU managed cloud, edge appliance, or on-prem.
+2. **Select a model class:** General chat, code, RAG-optimised, or constrained-output.
+3. **Provide domain data (optional):** We prepare adapters or full fine-tunes with strict handling.
+4. **Integrate the API:** Swap your base URL and key; keep your existing SDKs.
+5. **Validate:** Review eval dashboards, latency/cost reports, and guardrail policies.
+Contact: **martin@martintech.co.uk**
+---
+## Support & SLAs
+- **Production SLAs:** Custom p95 latency, availability targets, and incident response windows.
+- **Runbooks:** Operator playbooks for **air-gapped** and **edge** scenarios.
+- **Training & Enablement:** Developer workshops, RAG patterns, and prompt engineering for structured outputs.
+---
+## Why Martin Technologies LTD
+- **Sovereignty by design:** Data, runtime, and geography under your control.
+- **Open models, no lock-in:** Auditability and long-term portability.
+- **Real-time, cost-efficient:** Systems engineering that meets product UX and budget constraints.
+- **UK/EU Native:** Residency, procurement, and compliance aligned with your jurisdiction.
+---
+### Legal
+© Martin Technologies LTD. All rights reserved.
+Data residency options available in the **United Kingdom** and the **European Union**.
+Model licences and third-party attributions are documented per-artifact in their respective repositories.