Update README.md

df098b7 verified 4 days ago

7.49 kB

	---
	license: other
	license_name: codynamics-commercial
	license_link: https://www.codynamicslab.com/license
	language:
	- en
	tags:
	- document-question-answering
	- text-generation
	- long-context
	- information-retrieval
	- enterprise-ai
	- latch
	- multi-document-reasoning
	base_model:
	- Qwen/Qwen2.5-14B-Instruct
	pipeline_tag: text-generation
	library_name: vllm
	---

	# LATCH — Qwen 2.5 14B

	CoDynamics Lab Corporation \| [Website](https://www.codynamicslab.com) \| [🛒 Buy Self-Hosted License — $79](https://codynamicslab.gumroad.com/l/latch-qwen14b) \| [Request Gated Access](#request-access) \| [Contact](mailto:mike@codynamicslab.com)

	> ⚠️ This is a gated repository. Model weights are available via two paths — see [Deployment Options](#deployment-options) below.

	---

	## What Is LATCH

	LATCH is a proprietary inference layer built on top of `Qwen/Qwen2.5-14B-Instruct` that eliminates the long-context performance penalty for document-heavy workloads.

	Standard LLMs re-process every document from scratch on every query. LATCH removes this cost entirely — documents are prepared once and subsequent queries run at dramatically reduced latency regardless of document length or count.

	This is not RAG. This is not prompt compression. It is a fundamentally different approach to long-context inference that operates at the model level.

	Architectural details are proprietary.

	---

	## Performance Results

	All benchmarks run on NVIDIA A100 80GB with vLLM serving infrastructure.

	### Speed

	\| Metric \| Baseline (Qwen 2.5 14B) \| LATCH \| Improvement \|
	\|---\|---\|---\|---\|
	\| Time-To-First-Token (cold) \| 23.1s \| 0.11s \| 210× faster \|
	\| TTFT Speedup (avg, customer pack) \| 4.47s \| 0.11s \| 42.9× \|
	\| End-to-End Query Speedup \| 6.55s \| 2.02s \| 5.2× \|
	\| Cache Reload Time \| 23.1s \| 0.0016s \| 246× faster \|

	### Quality — Customer Document Pack

	\| Benchmark Category \| Baseline \| LATCH \| Delta \|
	\|---\|---\|---\|---\|
	\| Cross-Document Comparison \| 41.5% \| 49.4% \| +7.9pp \|
	\| Cross-Document Format \| 40.5% \| 68.8% \| +28.3pp \|
	\| Cross-Document Retrieval \| 40.4% \| 48.1% \| +7.7pp \|
	\| Selective Retrieval \| 35.2% \| 47.2% \| +12.0pp \|
	\| Overall Mean token-F1 \| 39.4% \| 53.4% \| +14.0pp \|

	### Benchmark Gates

	\| Gate \| Result \|
	\|---\|---\|
	\| Single-Document Gate \| 11/12 ✅ \|
	\| Multi-Document Gate \| 11/12 ✅ \|
	\| 256K Memory Sweep \| Passing \|

	> Multi-doc pass rate: 91.7% — the highest of any model family in the current LATCH portfolio.

	---

	## How It Works

	LATCH intercepts the standard inference path and replaces the costly per-query document processing step with a persistent representation that is prepared once and reused across all subsequent queries against the same document set.

	The result is a response that begins in under 120 milliseconds — before the user has practically finished pressing Enter — regardless of how many documents are in the corpus.

	The underlying method is proprietary and patent pending. CoDynamics Lab does not publish architectural details.

	---

	## Hardware Requirements

	\| Component \| Minimum \| Recommended \|
	\|---\|---\|---\|
	\| GPU \| NVIDIA A100 40GB \| NVIDIA A100 80GB \|
	\| VRAM \| ~30 GB \| 80 GB \|
	\| CPU RAM \| 64 GB \| 128 GB \|
	\| Storage \| 50 GB \| 100 GB \|
	\| Inference Runtime \| vLLM \| vLLM ≥ 0.4 \|

	> LATCH reduces peak VRAM consumption by approximately 50% versus standard Qwen 2.5 14B serving, enabling more concurrent instances per node.

	---

	## Deployment Options

	### 🔒 Option 1: Self-Hosted License — $79

	Run LATCH on your own A100 or H100. Your documents never leave your infrastructure.

	[Buy now at codynamicslab.gumroad.com](https://codynamicslab.gumroad.com/l/latch-qwen14b)

	Upon purchase you receive:
	- Private registry pull token for the LATCH Docker image
	- License key (validated at container startup)
	- One-line deployment command
	- Access to future runtime updates

	```bash
	LICENSE_KEY=xxxx-xxxx docker compose pull && docker compose up -d
	```

	Compatible with standard OpenAI-format API clients.

	---

	### ☁️ Option 2: Managed Hosted Instance — Coming Soon

	Spin up a LATCH-ready GPU instance directly from CoDynamics Lab. No infrastructure setup required.

	- Pay by the hour — billed by wall-clock second
	- Includes batch JSON query interface
	- Upload documents, submit a structured prompt list, export results with full telemetry
	- Every session outputs side-by-side cost savings vs. standard Qwen baseline

	[Join the waitlist](mailto:mike@codynamicslab.com?subject=LATCH%20Managed%20Instance%20Waitlist)

	---

	### 🔑 Option 3: Gated Repository Access (Research / Enterprise)

	Request direct access for evaluation, research, or enterprise licensing discussions.

	---

	## Intended Use

	Primary use cases:
	- M&A and private equity due diligence (multi-document data room analysis)
	- Legal document review and cross-contract comparison
	- Compliance and regulatory document monitoring
	- Financial research and filing analysis
	- Any high-volume, repeated-query workload against a fixed document corpus

	Out of scope:
	- Real-time web search or retrieval-augmented generation
	- General-purpose conversational AI without a document corpus
	- Consumer applications

	---

	## Limitations & Known Weaknesses

	- Short-context standard QA: LATCH is optimized for long-context, multi-document workloads. It does not improve performance on standard short-context QA benchmarks.
	- Document pre-preparation required: Documents must be prepared before querying. This is a one-time cost per document set that is fully amortized across subsequent queries.
	- Cross-document retrieval is the weakest benchmark slice: Document-selection tasks with heavy distractors are the most challenging workload category.

	---

	## Request Access

	Three ways to get started:

	\| Path \| Best for \| Action \|
	\|---\|---\|---\|
	\| Self-hosted license \| Teams with their own A100/H100 who need full data privacy \| [Buy on Gumroad — $79](https://codynamicslab.gumroad.com/l/latch-qwen14b) \|
	\| Managed hosted instance \| Teams who want zero infrastructure setup \| [Join waitlist](mailto:mike@codynamicslab.com?subject=LATCH%20Managed%20Instance%20Waitlist) \|
	\| Gated repo access \| Research, enterprise evaluation, volume licensing \| Click Request Access above \|

	For gated access requests:
	1. Click the Request Access button above
	2. Briefly describe your use case and organization
	3. Our team will review and respond within 2 business days

	📧 [mike@codynamicslab.com](mailto:mike@codynamicslab.com)
	🌐 [www.codynamicslab.com](https://www.codynamicslab.com)

	---

	## License

	This model is released under the CoDynamics Commercial License.
	- Purchase includes a single-instance deployment license
	- Commercial or production use beyond the licensed instance requires a separate agreement
	- Redistribution of model weights is strictly prohibited

	See [LICENSE](https://www.codynamicslab.com/license) for full terms.

	---

	## Citation

	If you cite LATCH benchmark results in research, please use:

	```bibtex
	@misc{codynamics2026latch,
	title = {LATCH: Proprietary Long-Context Inference Layer},
	author = {CoDynamics Lab Corporation},
	year = {2026},
	howpublished = {\url{https://huggingface.co/CoDynamicsLab/LATCH-Qwen2.5-14B}},
	note = {Patent Pending. Architectural details proprietary.}
	}
	```

	---

	CoDynamics Lab Corporation — Eliminating the Long-Context Tax.