🌌 MQ-Cognitive-Base // Quantised LLM Checkpoints

This repository hosts optimized, mixed-precision quantized model weight checkpoints produced by the MAH Quantum Research Scholars cohort. The weights are built for accelerated inference on the NVIDIA® CUDA® and TensorRT™-LLM runtimes.


⚡ Architectural Specifications

  • Quantization Framework: Post-Training Quantization (PTQ) / Activation-aware Weight Quantization (AWQ)
  • Target Precision: INT8 / INT4 weight-only quantization
  • Hardware Optimization: NVIDIA Compute Capability 8.0+ (Ampere, Hopper, Blackwell architectures)
  • Primary Infrastructure Node: NVIDIA® NGC Org ID 0963318590610147
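
To make "INT4 weight-only quantization" concrete, here is a minimal sketch of symmetric, group-wise round-to-nearest quantization in plain Python. It is an illustration only, not the pipeline used to produce these checkpoints: AWQ additionally rescales weight channels using activation statistics before rounding, and that step is omitted here. The function names, the tiny group size of 4, and the sample weights are all hypothetical.

```python
def quantize_group_int4(weights, group_size=4):
    """Symmetric round-to-nearest INT4 quantization with one scale per group.

    Returns (codes, scales); every code is an int in [-8, 7].
    NOTE: plain round-to-nearest, not AWQ (no activation-aware scaling).
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group onto code 7;
        # fall back to scale 1.0 for an all-zero group.
        scale = max_abs / 7.0 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-8, min(7, round(w / scale))) for w in group)
    return codes, scales


def dequantize_group_int4(codes, scales, group_size=4):
    """Reconstruct approximate float weights from INT4 codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


# Hypothetical sample weights: two groups of four values each.
weights = [0.12, -0.50, 0.33, 0.07, 1.40, -0.90, 0.05, 0.60]
codes, scales = quantize_group_int4(weights)
restored = dequantize_group_int4(codes, scales)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scales, max_error)
```

Each group stores one floating-point scale plus a 4-bit code per weight, and the reconstruction error of any weight is at most half of its group's scale. Production group sizes are typically much larger than 4 (64 or 128 are common choices), which keeps the per-group overhead small.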

🔬 Deployment & Performance Intent

These model weights are structured to maximize token throughput and minimize memory footprint under heavy inference workloads. By compressing large parameter sets to lower bit-widths, our distributed node network achieves sub-60 ms Time-To-First-Token (TTFT) on local compute clusters.
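
TTFT is the delay between submitting a request and receiving the first generated token. The sketch below shows one way it could be measured against any streaming token iterator. `fake_stream` is a hypothetical stand-in for a real streaming generation call and is not part of TensorRT-LLM; swap in whatever iterator your runtime exposes.

```python
import time


def measure_ttft_ms(token_stream):
    """Return (first_token, elapsed_ms) for the first item of an iterator.

    The clock starts just before the first token is requested, so the
    measurement includes any prefill work the stream performs lazily.
    """
    start = time.perf_counter()
    first_token = next(iter(token_stream))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return first_token, elapsed_ms


def fake_stream(delay_s=0.01):
    """Hypothetical stand-in for a streaming generate call."""
    time.sleep(delay_s)  # simulates prompt prefill before the first token
    yield "Hello"
    yield " world"


token, ttft_ms = measure_ttft_ms(fake_stream())
print(f"first token {token!r} after {ttft_ms:.1f} ms")
```

For a figure comparable to the sub-60 ms target, the measurement would need to be repeated over many requests with representative prompt lengths and reported as a percentile, since a single sample is noisy.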

📊 Benchmark Logs

{
  "PERFORMANCE_METRICS": {
    "CompilationEngine": "TensorRT-LLM v0.10.x",
    "QuantizationType": "INT4-AWQ",
    "MemoryFootprintReduction": "~72%",
    "TensorCoreUtilization": "Optimal"
  }
}
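
As a rough sanity check on a footprint figure of this kind, the following sketch estimates the reduction relative to a 16-bit baseline. Every number in it is an assumption made for illustration: the FP16 baseline, the group size of 128, the 16-bit scale and 4-bit zero point per group, and the fraction of parameters left unquantized. The log above states none of these.

```python
def effective_bits_per_weight(weight_bits=4, group_size=128,
                              scale_bits=16, zero_point_bits=4):
    """Average storage per weight, including per-group scale and zero point."""
    return weight_bits + (scale_bits + zero_point_bits) / group_size


def footprint_reduction(fp16_fraction=0.0, baseline_bits=16, **quant_kwargs):
    """Fractional memory saving versus storing every weight at baseline_bits.

    fp16_fraction is the (assumed) share of parameters kept unquantized,
    for example embeddings or the output head.
    """
    quant_bits = effective_bits_per_weight(**quant_kwargs)
    avg_bits = fp16_fraction * baseline_bits + (1.0 - fp16_fraction) * quant_bits
    return 1.0 - avg_bits / baseline_bits


all_int4 = footprint_reduction()
mostly_int4 = footprint_reduction(fp16_fraction=0.03)
print(f"all INT4: {all_int4:.1%}, 3% kept in FP16: {mostly_int4:.1%}")
```

Under these assumed values, quantizing every weight gives a reduction of about 74%, and keeping 3% of the parameters in FP16 gives about 72%. This shows only that a figure near 72% is arithmetically plausible for INT4 group quantization; it does not confirm how the reported number was obtained.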