	---
	license: apache-2.0
	language:
	- en
	tags:
	- cuda
	- tensorrt
	- quantization
	- tensor-cores
	pipeline_tag: text-generation
	---

	# 🌌 MQ-Cognitive-Base // Quantised LLM Checkpoints

	This repository hosts mixed-precision quantized model weight checkpoints produced by the MAH Quantum Research Scholars cohort. The weights are built for accelerated inference with the NVIDIA® CUDA® and TensorRT™-LLM runtimes.

	---

	## ⚡ Architectural Specifications

	* Quantization Framework: Post-Training Quantization (PTQ) / Activation-aware Weight Quantization (AWQ)
	* Target Precision: INT8 / INT4 weight-only quantization
	* Hardware Optimization: NVIDIA Compute Capability 8.0+ (Ampere, Hopper, Blackwell architectures)
	* Primary Infrastructure Node: NVIDIA® NGC Org ID `0963318590610147`
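To make "weight-only quantization" concrete, here is a minimal sketch of symmetric, group-wise INT4 quantization using plain round-to-nearest. It is not the AWQ algorithm (AWQ additionally rescales weight channels based on activation statistics before rounding), and it is not necessarily how these checkpoints were produced. The function names `quantize_int4` and `dequantize_int4` and the tiny `group_size` are hypothetical, chosen only for illustration.

```python
def quantize_int4(weights, group_size=4):
    """Symmetric round-to-nearest INT4 quantization, one scale per group.

    Hypothetical helper for illustration; not part of TensorRT-LLM.
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Map the largest magnitude in the group onto the INT4 limit 7;
        # fall back to a scale of 1.0 for an all-zero group.
        scale = max(abs(x) for x in group) / 7.0 or 1.0
        scales.append(scale)
        # Signed 4-bit integers cover the range [-8, 7].
        codes.extend(max(-8, min(7, round(x / scale))) for x in group)
    return codes, scales


def dequantize_int4(codes, scales, group_size=4):
    """Recover approximate float weights from INT4 codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


weights = [0.71, -0.33, 0.12, 0.0, 2.0, -1.1, 0.52, 0.27]
codes, scales = quantize_int4(weights)
restored = dequantize_int4(codes, scales)
```

Each restored weight differs from the original by at most half of its group's scale, which is why smaller groups (more scales) generally trade extra storage for lower quantization error.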

	---

	## 🔬 Deployment & Performance Intent

	These checkpoints are structured to maximize token throughput and minimize memory footprint under high-volume inference workloads. By compressing the model parameters to lower bit-widths, our distributed node network achieves sub-60 ms Time-To-First-Token (TTFT) on local compute clusters.
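The memory saving from lower bit-widths can be estimated with simple arithmetic. The sketch below assumes an FP16 baseline, one 16-bit scale per group of 128 weights, and a hypothetical 7-billion-parameter model; none of these values come from this repository, and the result covers weight storage only, not activations or the KV cache.

```python
def weight_bytes(num_params, weight_bits, group_size=None, scale_bits=16):
    """Approximate weight storage in bytes.

    group_size=None means no per-group scales are stored (e.g. plain FP16).
    """
    bits = num_params * weight_bits
    if group_size:
        # One scale of scale_bits is stored for every group of weights.
        bits += (num_params / group_size) * scale_bits
    return bits / 8


def reduction_vs_fp16(num_params, weight_bits, group_size=128):
    """Fractional reduction in weight storage relative to an FP16 baseline."""
    baseline = weight_bytes(num_params, 16)
    return 1.0 - weight_bytes(num_params, weight_bits, group_size) / baseline


NUM_PARAMS = 7_000_000_000  # hypothetical model size, for illustration only
int4_saving = reduction_vs_fp16(NUM_PARAMS, 4)
int8_saving = reduction_vs_fp16(NUM_PARAMS, 8)
print(f"INT4: {int4_saving:.1%}, INT8: {int8_saving:.1%}")
```

Under these assumptions the per-weight cost is 4 + 16/128 = 4.125 bits for INT4, so the ratio is independent of the parameter count; end-to-end memory figures will differ once layers kept at higher precision are included.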

	### 📊 Benchmark Logs

	```json
	{
	  "PERFORMANCE_METRICS": {
	    "CompilationEngine": "TensorRT-LLM v0.10.x",
	    "QuantizationType": "INT4-AWQ",
	    "MemoryFootprintReduction": "~72%",
	    "TensorCoreUtilization": "Optimal"
	  }
	}