--- license: apache-2.0 language: - en tags: - cuda - tensorrt - quantization - tensor-cores pipeline_tag: text-generation --- # 🌌 MQ-Cognitive-Base // Quantised LLM Checkpoints This repository hosts the optimized, mixed-precision quantized model weight checkpoints engineered by the **MAH Quantum Research Scholars** cohort. These weights are explicitly compiled for accelerated execution layers using native **NVIDIA® CUDA®** and **TensorRT™-LLM** runtimes. --- ## ⚡ Architectural Specifications * **Quantization Framework:** Post-Training Quantization (PTQ) / Activation-aware Weight-Quantization (AWQ) * **Target Precision Target:** INT8 / INT4 Weight-Only Quantization Matrix * **Hardware Optimization Optimization:** NVIDIA Compute Capability 8.0+ (Ampere, Hopper, Blackwell architectures) * **Primary Infrastructure Node:** NVIDIA® NGC Org ID `0963318590610147` --- ## 🔬 Deployment & Performance Intent These model matrices are structured to maximize token throughput and minimize memory footprint during heavy industrial inferencing. By compressing large parameter graphs down to optimized bit-widths, our distributed node network achieves sub-60ms Time-To-First-Token (TTFT) performance on localized compute clusters. ### 📊 Benchmark Logs ```json { "PERFORMANCE_METRICS": { "CompilationEngine": "TensorRT-LLM v0.10.x", "QuantizationType": "INT4-AWQ", "MemoryFootprintReduction": " ~72%", "TensorCoreUtilization": "Optimal" } }