---
license: other
license_name: codynamics-commercial
license_link: https://www.codynamicslab.com/license
language:
  - en
tags:
  - document-question-answering
  - text-generation
  - long-context
  - information-retrieval
  - enterprise-ai
  - latch
  - multi-document-reasoning
base_model:
  - Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: vllm
---

# LATCH - Qwen 2.5 14B

CoDynamics Lab Corporation | Website | 🛒 Buy Self-Hosted License ($79) | Request Gated Access | Contact

> ⚠️ This is a gated repository. Model weights are available via two paths; see Deployment Options below.


## What Is LATCH

LATCH is a proprietary inference layer built on top of Qwen/Qwen2.5-14B-Instruct that eliminates the long-context performance penalty for document-heavy workloads.

Standard LLMs re-process every document from scratch on every query. LATCH removes this cost entirely: documents are prepared once, and subsequent queries run at dramatically reduced latency regardless of document length or count.

This is not RAG. This is not prompt compression. It is a fundamentally different approach to long-context inference that operates at the model level.

Architectural details are proprietary.


## Performance Results

All benchmarks were run on an NVIDIA A100 80GB with vLLM serving infrastructure.

### Speed

| Metric | Baseline (Qwen 2.5 14B) | LATCH | Improvement |
|---|---|---|---|
| Time-To-First-Token (cold) | 23.1 s | 0.11 s | 210× faster |
| TTFT (avg, customer pack) | 4.47 s | 0.11 s | 42.9× |
| End-to-End Query | 6.55 s | 2.02 s | 5.2× |
| Cache Reload Time | 23.1 s | 0.0016 s | 246× faster |

### Quality - Customer Document Pack

| Benchmark Category | Baseline | LATCH | Delta |
|---|---|---|---|
| Cross-Document Comparison | 41.5% | 49.4% | +7.9 pp |
| Cross-Document Format | 40.5% | 68.8% | +28.3 pp |
| Cross-Document Retrieval | 40.4% | 48.1% | +7.7 pp |
| Selective Retrieval | 35.2% | 47.2% | +12.0 pp |
| Overall (mean token-F1) | 39.4% | 53.4% | +14.0 pp |

### Benchmark Gates

| Gate | Result |
|---|---|
| Single-Document Gate | 11/12 ✅ |
| Multi-Document Gate | 11/12 ✅ |
| 256K Memory Sweep | Passing |

Multi-doc pass rate: 91.7%, the highest of any model family in the current LATCH portfolio.


## How It Works

LATCH intercepts the standard inference path and replaces the costly per-query document processing step with a persistent representation that is prepared once and reused across all subsequent queries against the same document set.

The result is a response that begins in under 120 milliseconds, practically before the user has finished pressing Enter, regardless of how many documents are in the corpus.

The underlying method is proprietary and patent pending. CoDynamics Lab does not publish architectural details.
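Since the actual mechanism is proprietary, the following is only a minimal sketch of the *shape* of the pattern described above: a one-time preparation step whose output persists and is reused by every subsequent query against the same document set. The `DocumentSession` class and its dummy fingerprinting step are entirely hypothetical stand-ins, not LATCH's implementation.

```python
import hashlib

class DocumentSession:
    """Hypothetical sketch of a prepare-once / query-many session.

    'Preparation' is stood in for by a cheap fingerprinting step; the
    point is that it runs exactly once per document set, and every
    query afterwards reuses the persistent result.
    """

    def __init__(self, documents):
        self.documents = documents
        self._prepared = None   # persistent representation, built lazily
        self.prepare_calls = 0  # tracks the one-time cost

    def _prepare(self):
        if self._prepared is None:
            self.prepare_calls += 1
            # Stand-in for the expensive one-time preparation step.
            self._prepared = [
                hashlib.sha256(doc.encode()).hexdigest()
                for doc in self.documents
            ]
        return self._prepared

    def query(self, question):
        rep = self._prepare()  # cached after the first call
        # A real system would run inference against `rep`; we just echo.
        return f"{question} -> searched {len(rep)} prepared documents"

session = DocumentSession(["contract A ...", "contract B ..."])
answer_1 = session.query("Which contract has the earlier termination date?")
answer_2 = session.query("Compare the indemnity clauses.")
# Both queries reuse the same prepared state: session.prepare_calls == 1
```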


## Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA A100 40GB | NVIDIA A100 80GB |
| VRAM | ~30 GB | 80 GB |
| CPU RAM | 64 GB | 128 GB |
| Storage | 50 GB | 100 GB |
| Inference Runtime | vLLM | vLLM ≥ 0.4 |

LATCH reduces peak VRAM consumption by approximately 50% versus standard Qwen 2.5 14B serving, enabling more concurrent instances per node.


## Deployment Options

### 🔒 Option 1: Self-Hosted License ($79)

Run LATCH on your own A100 or H100. Your documents never leave your infrastructure.

Buy now at codynamicslab.gumroad.com

Upon purchase you receive:

- Private registry pull token for the LATCH Docker image
- License key (validated at container startup)
- One-line deployment command
- Access to future runtime updates
```shell
LICENSE_KEY=xxxx-xxxx docker compose pull && docker compose up -d
```

Compatible with standard OpenAI-format API clients.
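As a minimal illustration of that compatibility, the sketch below builds a standard OpenAI-format chat completion payload with only the Python standard library. The endpoint URL and model name are placeholder assumptions; the actual values are determined by your LATCH deployment.

```python
import json

# Hypothetical endpoint and model name; your deployment defines the
# real values -- these are placeholders for illustration only.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(question, model="latch-qwen2.5-14b"):
    """Build a standard OpenAI-format chat completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": question},
        ],
        "temperature": 0.0,  # deterministic answers suit document QA
    }

body = json.dumps(build_chat_request("Summarize the termination clauses."))
# Send with any OpenAI-compatible client or plain HTTP, e.g.:
#   curl $ENDPOINT -H 'Content-Type: application/json' -d "$body"
```

Because the wire format is the standard one, existing OpenAI SDKs can be pointed at the container simply by overriding their base URL.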


### ☁️ Option 2: Managed Hosted Instance (Coming Soon)

Spin up a LATCH-ready GPU instance directly from CoDynamics Lab. No infrastructure setup required.

- Pay-as-you-go: hourly rates, billed by wall-clock second
- Includes a batch JSON query interface
- Upload documents, submit a structured prompt list, export results with full telemetry
- Every session reports side-by-side cost savings vs. the standard Qwen baseline

Join the waitlist


### 🔑 Option 3: Gated Repository Access (Research / Enterprise)

Request direct access for evaluation, research, or enterprise licensing discussions.


## Intended Use

Primary use cases:

- M&A and private equity due diligence (multi-document data room analysis)
- Legal document review and cross-contract comparison
- Compliance and regulatory document monitoring
- Financial research and filing analysis
- Any high-volume, repeated-query workload against a fixed document corpus

Out of scope:

- Real-time web search or retrieval-augmented generation
- General-purpose conversational AI without a document corpus
- Consumer applications

## Limitations & Known Weaknesses

- **Short-context standard QA:** LATCH is optimized for long-context, multi-document workloads; it does not improve performance on standard short-context QA benchmarks.
- **Document preparation required:** Documents must be prepared before querying. This is a one-time cost per document set, fully amortized across subsequent queries.
- **Cross-document retrieval is the weakest benchmark slice:** Document-selection tasks with heavy distractors remain the most challenging workload category.
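How quickly the one-time preparation cost amortizes can be worked out from the end-to-end figures in the speed table above (6.55 s baseline vs. 2.02 s with LATCH per query). The preparation time below is an assumed placeholder; the actual figure is not published.

```python
BASELINE_S = 6.55   # end-to-end per query, standard Qwen 2.5 14B (speed table)
LATCH_S = 2.02      # end-to-end per query with LATCH (speed table)
PREP_S = 30.0       # ASSUMED one-time preparation cost; not a published figure

def break_even_queries(prep_s, baseline_s, latch_s):
    """Smallest query count at which prep + fast queries beats the baseline."""
    n = 1
    while prep_s + n * latch_s >= n * baseline_s:
        n += 1
    return n

n = break_even_queries(PREP_S, BASELINE_S, LATCH_S)  # -> 7 with these assumed numbers
```

Each query saves 4.53 s under these figures, so even a 30-second preparation pass pays for itself within the first handful of queries; longer-running workloads amortize it to effectively zero.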

## Request Access

Three ways to get started:

| Path | Best for | Action |
|---|---|---|
| Self-hosted license | Teams with their own A100/H100 who need full data privacy | Buy on Gumroad ($79) |
| Managed hosted instance | Teams who want zero infrastructure setup | Join the waitlist |
| Gated repo access | Research, enterprise evaluation, volume licensing | Click Request Access above |

For gated access requests:

  1. Click the Request Access button above
  2. Briefly describe your use case and organization
  3. Our team will review and respond within 2 business days

📧 mike@codynamicslab.com
🌐 www.codynamicslab.com


## License

This model is released under the CoDynamics Commercial License.

- Purchase includes a single-instance deployment license
- Commercial or production use beyond the licensed instance requires a separate agreement
- Redistribution of model weights is strictly prohibited

See LICENSE for full terms.


## Citation

If you cite LATCH benchmark results in research, please use:

```bibtex
@misc{codynamics2026latch,
  title        = {LATCH: Proprietary Long-Context Inference Layer},
  author       = {CoDynamics Lab Corporation},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/CoDynamicsLab/LATCH-Qwen2.5-14B}},
  note         = {Patent Pending. Architectural details proprietary.}
}
```

*CoDynamics Lab Corporation - Eliminating the Long-Context Tax.*