MQ-Cognitive-Base // Quantised LLM Checkpoints
This repository hosts optimized, mixed-precision quantized weight checkpoints produced by the MAH Quantum Research Scholars cohort. The weights are built for accelerated inference on the NVIDIA® CUDA® and TensorRT™-LLM runtimes.
⚡ Architectural Specifications
- Quantization Framework: Post-Training Quantization (PTQ) / Activation-aware Weight Quantization (AWQ)
- Target Precision: INT8 / INT4 weight-only quantization
- Hardware Optimization: NVIDIA Compute Capability 8.0+ (Ampere, Hopper, Blackwell architectures)
- Primary Infrastructure Node: NVIDIA® NGC Org ID 0963318590610147
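To make "INT4 weight-only quantization" concrete, the sketch below shows plain symmetric round-to-nearest quantization with one scale per group of weights. This is a simplified stand-in and not AWQ itself: AWQ additionally rescales salient weight channels using activation statistics before quantizing, and that step is omitted here. The function names and the tiny group size are hypothetical and chosen for illustration only; they are not taken from this repository.

```python
from typing import List, Tuple

INT4_MIN, INT4_MAX = -8, 7  # signed 4-bit integer range


def quantize_int4_group(
    weights: List[float], group_size: int = 4
) -> Tuple[List[int], List[float]]:
    """Symmetric round-to-nearest INT4 quantization, one scale per group.

    Illustrative only: real AWQ also applies activation-aware channel
    scaling before this step, and real kernels pack two codes per byte.
    """
    if group_size <= 0 or len(weights) % group_size != 0:
        raise ValueError("len(weights) must be a positive multiple of group_size")

    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Map the largest magnitude in the group onto code 7.
        scale = max_abs / INT4_MAX if max_abs > 0 else 1.0
        scales.append(scale)
        for w in group:
            q = int(round(w / scale))
            codes.append(max(INT4_MIN, min(INT4_MAX, q)))
    return codes, scales


def dequantize_int4_group(
    codes: List[int], scales: List[float], group_size: int = 4
) -> List[float]:
    """Reconstruct approximate weights from INT4 codes and per-group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


if __name__ == "__main__":
    w = [0.7, -0.3, 0.1, 0.0, 1.2, -0.5, 0.05, 0.9]
    codes, scales = quantize_int4_group(w, group_size=4)
    print(codes)
    print(dequantize_int4_group(codes, scales, group_size=4))
```

Smaller groups track the weight distribution more closely but store more scales, which is the trade-off that group-wise schemes tune.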
Deployment & Performance Intent
These checkpoints are structured to maximize token throughput and minimize memory footprint for high-volume production inference. With the weights compressed to lower bit-widths, our distributed node network achieves sub-60 ms Time-To-First-Token (TTFT) on local compute clusters.
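For readers who want to check a TTFT figure on their own hardware, a minimal timing sketch follows. It assumes the runtime exposes generation as a Python iterable of tokens; `start_stream` and `fake_stream` are hypothetical placeholders, not part of TensorRT-LLM or of this repository, and a real measurement would replace `fake_stream` with the actual streaming call.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def time_to_first_token(
    start_stream: Callable[[], Iterable[str]],
) -> Tuple[float, str]:
    """Return (TTFT in milliseconds, first token).

    The clock starts just before the request is issued and stops when the
    first token arrives, so prompt processing time is included.
    """
    start = time.perf_counter()
    stream = iter(start_stream())
    sentinel = object()
    first = next(stream, sentinel)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    if first is sentinel:
        raise ValueError("stream produced no tokens")
    return elapsed_ms, first  # type: ignore[return-value]


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming generation call."""
    time.sleep(0.02)  # pretend prompt processing takes about 20 ms
    yield "Hello"
    yield " world"


if __name__ == "__main__":
    ttft_ms, token = time_to_first_token(fake_stream)
    print(f"first token {token!r} after {ttft_ms:.1f} ms")
```

Averaging over many requests, after a few warm-up runs, gives a more stable number than a single call.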
Benchmark Logs
{
  "PERFORMANCE_METRICS": {
    "CompilationEngine": "TensorRT-LLM v0.10.x",
    "QuantizationType": "INT4-AWQ",
    "MemoryFootprintReduction": " ~72%",
    "TensorCoreUtilization": "Optimal"
  }
}
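As a rough sanity check on a memory-reduction figure of this size, the arithmetic below estimates the storage cost of group-wise weight-only quantization relative to an FP16 baseline. The group size of 128 and the 16-bit per-group scale are assumptions made for illustration; the actual settings used for these checkpoints are not stated here. The estimate covers quantized weight tensors only, so any layers kept at higher precision would lower a whole-model figure.

```python
def bits_per_weight(
    weight_bits: int,
    group_size: int,
    scale_bits: int = 16,
    zero_point_bits: int = 0,
) -> float:
    """Effective bits per weight, including per-group metadata overhead."""
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    return weight_bits + (scale_bits + zero_point_bits) / group_size


def footprint_reduction(
    weight_bits: int,
    group_size: int,
    baseline_bits: int = 16,
    scale_bits: int = 16,
    zero_point_bits: int = 0,
) -> float:
    """Fractional size reduction of quantized weights versus the baseline."""
    effective = bits_per_weight(weight_bits, group_size, scale_bits, zero_point_bits)
    return 1.0 - effective / baseline_bits


if __name__ == "__main__":
    # Assumed settings: INT4 weights, group size 128, FP16 scale per group.
    r = footprint_reduction(weight_bits=4, group_size=128)
    print(f"INT4, group 128: {r:.1%} smaller than FP16")
    # INT8 with the same assumed grouping, for comparison.
    r8 = footprint_reduction(weight_bits=8, group_size=128)
    print(f"INT8, group 128: {r8:.1%} smaller than FP16")
```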