---
license: other
license_name: codynamics-commercial
license_link: https://www.codynamicslab.com/license
language:
  - en
tags:
  - document-question-answering
  - text-generation
  - long-context
  - information-retrieval
  - enterprise-ai
  - latch
  - multi-document-reasoning
base_model:
  - Qwen/Qwen2.5-14B-Instruct
pipeline_tag: text-generation
library_name: vllm
---

# LATCH - Qwen 2.5 14B

CoDynamics Lab Corporation | Website | 🛒 Buy Self-Hosted License ($79) | Request Gated Access | Contact

> ⚠️ This is a gated repository. Model weights are available via two paths; see Deployment Options below.


## What Is LATCH

LATCH is a proprietary inference layer built on top of Qwen/Qwen2.5-14B-Instruct that eliminates the long-context performance penalty for document-heavy workloads.

Standard LLMs re-process every document from scratch on every query. LATCH removes this cost entirely: documents are prepared once, and subsequent queries run at dramatically reduced latency regardless of document length or count.

This is not RAG. This is not prompt compression. It is a fundamentally different approach to long-context inference that operates at the model level.

Architectural details are proprietary.


## Performance Results

All benchmarks were run on an NVIDIA A100 80GB with vLLM serving infrastructure.

### Speed

| Metric | Baseline (Qwen 2.5 14B) | LATCH | Improvement |
|---|---|---|---|
| Time-To-First-Token (cold) | 23.1 s | 0.11 s | 210× faster |
| TTFT (avg, customer pack) | 4.47 s | 0.11 s | 42.9× |
| End-to-End Query | 6.55 s | 2.02 s | 5.2× |
| Cache Reload Time | 23.1 s | 0.0016 s | 246× faster |

### Quality - Customer Document Pack

| Benchmark Category | Baseline | LATCH | Delta |
|---|---|---|---|
| Cross-Document Comparison | 41.5% | 49.4% | +7.9 pp |
| Cross-Document Format | 40.5% | 68.8% | +28.3 pp |
| Cross-Document Retrieval | 40.4% | 48.1% | +7.7 pp |
| Selective Retrieval | 35.2% | 47.2% | +12.0 pp |
| Overall (mean token-F1) | 39.4% | 53.4% | +14.0 pp |

### Benchmark Gates

| Gate | Result |
|---|---|
| Single-Document Gate | 11/12 ✅ |
| Multi-Document Gate | 11/12 ✅ |
| 256K Memory Sweep | Passing |

Multi-doc pass rate: 91.7%, the highest of any model family in the current LATCH portfolio.


## How It Works

LATCH intercepts the standard inference path and replaces the costly per-query document processing step with a persistent representation that is prepared once and reused across all subsequent queries against the same document set.

The result is a response that begins in under 120 milliseconds, practically before the user has finished pressing Enter, regardless of how many documents are in the corpus.

The underlying method is proprietary and patent pending. CoDynamics Lab does not publish architectural details.
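Since the actual mechanism is proprietary, the following is only a minimal sketch of the *shape* of the pattern described above: a one-time preparation step whose output persists and is reused by every subsequent query against the same document set. The `DocumentSession` class and its dummy fingerprinting step are entirely hypothetical stand-ins, not LATCH's implementation.

```python
import hashlib

class DocumentSession:
    """Hypothetical sketch of a prepare-once / query-many session.

    'Preparation' is stood in for by a cheap fingerprinting step; the
    point is that it runs exactly once per document set, and every
    query afterwards reuses the persistent result.
    """

    def __init__(self, documents):
        self.documents = documents
        self._prepared = None   # persistent representation, built lazily
        self.prepare_calls = 0  # tracks the one-time cost

    def _prepare(self):
        if self._prepared is None:
            self.prepare_calls += 1
            # Stand-in for the expensive one-time preparation step.
            self._prepared = [
                hashlib.sha256(doc.encode()).hexdigest()
                for doc in self.documents
            ]
        return self._prepared

    def query(self, question):
        rep = self._prepare()  # cached after the first call
        # A real system would run inference against `rep`; we just echo.
        return f"{question} -> searched {len(rep)} prepared documents"

session = DocumentSession(["contract A ...", "contract B ..."])
answer_1 = session.query("Which contract has the earlier termination date?")
answer_2 = session.query("Compare the indemnity clauses.")
# Both queries reuse the same prepared state: session.prepare_calls == 1
```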


## Hardware Requirements

| Component | Minimum | Recommended |
|---|---|---|
| GPU | NVIDIA A100 40GB | NVIDIA A100 80GB |
| VRAM | ~30 GB | 80 GB |
| CPU RAM | 64 GB | 128 GB |
| Storage | 50 GB | 100 GB |
| Inference Runtime | vLLM | vLLM ≥ 0.4 |

LATCH reduces peak VRAM consumption by approximately 50% versus standard Qwen 2.5 14B serving, enabling more concurrent instances per node.


## Deployment Options

### 🔒 Option 1: Self-Hosted License ($79)

Run LATCH on your own A100 or H100. Your documents never leave your infrastructure.

Buy now at codynamicslab.gumroad.com

Upon purchase you receive:

- Private registry pull token for the LATCH Docker image
- License key (validated at container startup)
- One-line deployment command
- Access to future runtime updates
```shell
LICENSE_KEY=xxxx-xxxx docker compose pull && docker compose up -d
```

Compatible with standard OpenAI-format API clients.
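As a minimal illustration of that compatibility, the sketch below builds a standard OpenAI-format chat completion payload with only the Python standard library. The endpoint URL and model name are placeholder assumptions; the actual values are determined by your LATCH deployment.

```python
import json

# Hypothetical endpoint and model name; your deployment defines the
# real values -- these are placeholders for illustration only.
ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(question, model="latch-qwen2.5-14b"):
    """Build a standard OpenAI-format chat completion payload."""
    return {
        "model": model,
        "messages": [
            {"role": "user", "content": question},
        ],
        "temperature": 0.0,  # deterministic answers suit document QA
    }

body = json.dumps(build_chat_request("Summarize the termination clauses."))
# Send with any OpenAI-compatible client or plain HTTP, e.g.:
#   curl $ENDPOINT -H 'Content-Type: application/json' -d "$body"
```

Because the wire format is the standard one, existing OpenAI SDKs can be pointed at the container simply by overriding their base URL.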


### ☁️ Option 2: Managed Hosted Instance (Coming Soon)

Spin up a LATCH-ready GPU instance directly from CoDynamics Lab. No infrastructure setup required.

- Pay-as-you-go: hourly rates, billed by wall-clock second
- Includes a batch JSON query interface
- Upload documents, submit a structured prompt list, export results with full telemetry
- Every session reports side-by-side cost savings vs. the standard Qwen baseline

Join the waitlist


### 🔑 Option 3: Gated Repository Access (Research / Enterprise)

Request direct access for evaluation, research, or enterprise licensing discussions.


## Intended Use

Primary use cases:

- M&A and private equity due diligence (multi-document data room analysis)
- Legal document review and cross-contract comparison
- Compliance and regulatory document monitoring
- Financial research and filing analysis
- Any high-volume, repeated-query workload against a fixed document corpus

Out of scope:

- Real-time web search or retrieval-augmented generation
- General-purpose conversational AI without a document corpus
- Consumer applications

## Limitations & Known Weaknesses

- **Short-context standard QA:** LATCH is optimized for long-context, multi-document workloads; it does not improve performance on standard short-context QA benchmarks.
- **Document preparation required:** Documents must be prepared before querying. This is a one-time cost per document set, fully amortized across subsequent queries.
- **Cross-document retrieval is the weakest benchmark slice:** Document-selection tasks with heavy distractors remain the most challenging workload category.
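How quickly the one-time preparation cost amortizes can be worked out from the end-to-end figures in the speed table above (6.55 s baseline vs. 2.02 s with LATCH per query). The preparation time below is an assumed placeholder; the actual figure is not published.

```python
BASELINE_S = 6.55   # end-to-end per query, standard Qwen 2.5 14B (speed table)
LATCH_S = 2.02      # end-to-end per query with LATCH (speed table)
PREP_S = 30.0       # ASSUMED one-time preparation cost; not a published figure

def break_even_queries(prep_s, baseline_s, latch_s):
    """Smallest query count at which prep + fast queries beats the baseline."""
    n = 1
    while prep_s + n * latch_s >= n * baseline_s:
        n += 1
    return n

n = break_even_queries(PREP_S, BASELINE_S, LATCH_S)  # -> 7 with these assumed numbers
```

Each query saves 4.53 s under these figures, so even a 30-second preparation pass pays for itself within the first handful of queries; longer-running workloads amortize it to effectively zero.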

## Request Access

Three ways to get started:

| Path | Best for | Action |
|---|---|---|
| Self-hosted license | Teams with their own A100/H100 who need full data privacy | Buy on Gumroad ($79) |
| Managed hosted instance | Teams who want zero infrastructure setup | Join the waitlist |
| Gated repo access | Research, enterprise evaluation, volume licensing | Click Request Access above |

For gated access requests:

  1. Click the Request Access button above
  2. Briefly describe your use case and organization
  3. Our team will review and respond within 2 business days

📧 mike@codynamicslab.com
🌐 www.codynamicslab.com


## License

This model is released under the CoDynamics Commercial License.

- Purchase includes a single-instance deployment license
- Commercial or production use beyond the licensed instance requires a separate agreement
- Redistribution of model weights is strictly prohibited

See LICENSE for full terms.


## Citation

If you cite LATCH benchmark results in research, please use:

```bibtex
@misc{codynamics2026latch,
  title        = {LATCH: Proprietary Long-Context Inference Layer},
  author       = {CoDynamics Lab Corporation},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/CoDynamicsLab/LATCH-Qwen2.5-14B}},
  note         = {Patent Pending. Architectural details proprietary.}
}
```

*CoDynamics Lab Corporation - Eliminating the Long-Context Tax.*