QCrypt RNG — Monitoring & Observability Guide
Version: 1.0 Date: 2026-03-06
Overview
QCrypt RNG exposes Prometheus metrics for monitoring and observability. The metrics cover:
- Oracle fulfillment operations
- Post-quantum cryptography operations
- Quantum randomness generation
- Hardware device status
- Entropy quality
- API performance
- System resources
Prometheus Metrics Endpoint
Endpoint: GET /api/v2/monitoring/metrics
Content-Type: text/plain; version=0.0.4; charset=utf-8
Returns metrics in Prometheus exposition format.
Example Scrape Configuration
# prometheus.yml
scrape_configs:
  - job_name: 'qcrypt-rng'
    static_configs:
      - targets: ['localhost:8000']
    metrics_path: '/api/v2/monitoring/metrics'
    scrape_interval: 15s
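To inspect the endpoint without a Prometheus server, the exposition text can be parsed directly. The following is a minimal sketch using only the standard library. The names `Sample`, `parse_exposition`, and `fetch_metrics` are hypothetical helpers introduced for illustration, and the parser handles only the common `name{labels} value` form (it does not unescape label values).

```python
import re
import urllib.request
from typing import Dict, List, NamedTuple


class Sample(NamedTuple):
    name: str
    labels: Dict[str, str]
    value: float


# Matches: metric_name{label="value",...} value [timestamp]
_SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+\S+)?$'
)
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_exposition(text: str) -> List[Sample]:
    """Parse Prometheus text exposition lines into samples.

    Comment lines (# HELP, # TYPE) and blank lines are skipped.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # ignore lines this simple parser cannot handle
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append(Sample(name, labels, float(raw_value)))
    return samples


def fetch_metrics(base_url: str = "http://localhost:8000") -> List[Sample]:
    """Fetch and parse the metrics endpoint (base_url is an assumption)."""
    url = base_url + "/api/v2/monitoring/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

For example, `[s for s in fetch_metrics() if s.name == "qcrypt_api_active_connections"]` would list the current connection gauge, assuming the service is running locally.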
Metric Categories
1. Oracle Fulfillment Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `qcrypt_oracle_requests_total` | Counter | `chain, status` | Total oracle requests |
| `qcrypt_oracle_fulfillment_duration_seconds` | Histogram | `chain, status` | Fulfillment duration |
| `qcrypt_oracle_commit_duration_seconds` | Histogram | `chain` | Commit phase duration |
| `qcrypt_oracle_reveal_duration_seconds` | Histogram | `chain` | Reveal phase duration |
| `qcrypt_oracle_transactions_total` | Counter | `chain, type, status` | Blockchain transactions |
| `qcrypt_oracle_gas_used` | Histogram | `chain, type` | Gas used for transactions |
| `qcrypt_oracle_active_requests` | Gauge | `chain` | Active requests count |
Example Query:
# Oracle fulfillment success rate by chain
sum(rate(qcrypt_oracle_requests_total{status="success"}[5m])) by (chain)
/
sum(rate(qcrypt_oracle_requests_total[5m])) by (chain)
# 95th percentile fulfillment duration by chain
histogram_quantile(0.95, sum(rate(qcrypt_oracle_fulfillment_duration_seconds_bucket[5m])) by (le, chain))
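The `histogram_quantile` queries in this guide estimate percentiles from cumulative bucket counters. As a rough illustration of the documented behavior (linear interpolation inside the bucket that contains the target rank), here is a minimal Python sketch. The bucket values are invented for the example, and the function is a simplification, not Prometheus's own code.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q from cumulative (upper_bound, count) buckets.

    The buckets must include an upper bound of +Inf, as Prometheus
    histograms do. The lowest bucket is assumed to start at 0.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("buckets must include le=+Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The quantile falls in the +Inf bucket: report the
                # highest finite upper bound instead.
                return buckets[-2][0] if len(buckets) > 1 else math.nan
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Linear interpolation within the bucket.
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


# Hypothetical fulfillment-duration buckets (seconds, cumulative count).
example = [(0.5, 10), (1.0, 60), (2.5, 95), (math.inf, 100)]
p95 = histogram_quantile(0.95, example)
```

Because the result is interpolated, its accuracy depends on how closely the bucket boundaries bracket the latencies of interest.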
2. PQC Operation Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `qcrypt_pqc_operations_total` | Counter | `algorithm, operation, status` | PQC operations count |
| `qcrypt_pqc_operation_duration_seconds` | Histogram | `algorithm, operation` | Operation duration |
| `qcrypt_pqc_key_size_bytes` | Histogram | `algorithm, key_type` | Generated key sizes |
| `qcrypt_pqc_signature_size_bytes` | Histogram | `algorithm` | Signature sizes |
| `qcrypt_pqc_kem_ciphertext_size_bytes` | Histogram | `algorithm` | KEM ciphertext sizes |
| `qcrypt_pqc_kem_shared_secret_size_bytes` | Histogram | `algorithm` | Shared secret sizes |
Supported Algorithms:
- DILITHIUM2, DILITHIUM3, DILITHIUM5
- FALCON512, FALCON1024
- SPHINCS+-SHA2-128f
- KYBER512, KYBER768, KYBER1024
- NTRU-HPS-2048-509, NTRU-HPS-2048-677
- SABER-LIGHTSABER, SABER-SABER, SABER-FIRESABER
Example Query:
# PQC operation success rate by algorithm
sum(rate(qcrypt_pqc_operations_total{status="success"}[5m])) by (algorithm)
/
sum(rate(qcrypt_pqc_operations_total[5m])) by (algorithm)
# 95th percentile key generation time
histogram_quantile(0.95, rate(qcrypt_pqc_operation_duration_seconds_bucket{operation="generate_keypair"}[5m]))
3. Quantum Randomness Generation Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `qcrypt_qrng_bytes_generated_total` | Counter | `backend, format` | Total bytes generated |
| `qcrypt_qrng_generation_duration_seconds` | Histogram | `backend` | Generation duration |
| `qcrypt_qrng_entropy_bits` | Gauge | `backend` | Entropy pool size |
| `qcrypt_qrng_quality_score` | Gauge | `backend` | Quality score (0-1) |
Example Query:
# Bytes generated per second by backend
sum(rate(qcrypt_qrng_bytes_generated_total[5m])) by (backend)
# Entropy pool health
qcrypt_qrng_entropy_bits > 100
4. Hardware Device Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `qcrypt_hardware_device_status` | Gauge | `device_id, device_type, vendor` | Device status (1=up, 0=down) |
| `qcrypt_hardware_generation_rate_bps` | Gauge | `device_id, device_type` | Generation rate (bps) |
| `qcrypt_hardware_error_rate` | Gauge | `device_id, device_type` | Error rate |
| `qcrypt_hardware_temperature_celsius` | Gauge | `device_id, device_type` | Device temperature |
| `qcrypt_hardware_uptime_seconds` | Gauge | `device_id, device_type` | Device uptime |
Supported Vendors:
- ID Quantique (Quantis)
- QuintessenceLabs (qStream)
- Generic photonic/superconducting devices
Example Query:
# Hardware device availability
qcrypt_hardware_device_status{vendor="ID Quantique"}
# Average generation rate by device type (this metric has no vendor label)
avg(qcrypt_hardware_generation_rate_bps) by (device_type)
5. Entropy Quality Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `qcrypt_entropy_shannon_entropy` | Gauge | `source` | Shannon entropy (bits/byte) |
| `qcrypt_entropy_min_entropy` | Gauge | `source` | Min-entropy (bits/byte) |
| `qcrypt_entropy_chi_square` | Gauge | `source` | Chi-square statistic |
| `qcrypt_entropy_quality_status` | Gauge | `source` | Quality status (1=good, 0=poor) |
Quality Thresholds:
- Shannon entropy: > 7.9 bits/byte (excellent), > 7.5 (good), < 7.0 (poor)
- Min-entropy: > 7.0 bits/byte (acceptable)
- Chi-square: < 293.25 (passes the uniformity test; 293.25 is the 5% critical value for 255 degrees of freedom, i.e. 256 possible byte values)
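To make the three statistics concrete, the sketch below computes them for a byte sample using simple frequency counts. This is an illustrative assumption about how such values can be derived, not a description of QCrypt RNG's internal estimators. The function name `entropy_report` and the result keys are hypothetical, and the min-entropy here is the basic most-common-value estimate.

```python
import math
from collections import Counter

# Thresholds taken from the list above.
SHANNON_GOOD = 7.5
MIN_ENTROPY_ACCEPTABLE = 7.0
CHI_SQUARE_LIMIT = 293.25


def entropy_report(data: bytes) -> dict:
    """Compute per-byte entropy statistics from observed frequencies."""
    if not data:
        raise ValueError("data must not be empty")
    n = len(data)
    counts = Counter(data)

    # Shannon entropy: -sum(p * log2(p)) over observed byte values.
    shannon = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Min-entropy: -log2 of the most likely byte value's probability.
    min_entropy = -math.log2(max(counts.values()) / n)

    # Chi-square against a uniform distribution over 256 byte values.
    expected = n / 256
    chi_square = sum(
        (counts.get(b, 0) - expected) ** 2 / expected for b in range(256)
    )

    return {
        "shannon_entropy": shannon,
        "min_entropy": min_entropy,
        "chi_square": chi_square,
        "shannon_ok": shannon > SHANNON_GOOD,
        "min_entropy_ok": min_entropy > MIN_ENTROPY_ACCEPTABLE,
        "chi_square_ok": chi_square < CHI_SQUARE_LIMIT,
    }
```

These frequency-based estimates are biased low on small samples, so a short sample from a good source can still fall below the thresholds; large samples give more stable readings.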
Example Query:
# Entropy quality alerts
qcrypt_entropy_shannon_entropy < 7.5
qcrypt_entropy_min_entropy < 7.0
qcrypt_entropy_quality_status == 0
6. API Performance Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `qcrypt_api_requests_total` | Counter | `endpoint, method, status` | API request count |
| `qcrypt_api_request_duration_seconds` | Histogram | `endpoint, method` | Request duration |
| `qcrypt_api_request_size_bytes` | Histogram | `endpoint` | Request size |
| `qcrypt_api_response_size_bytes` | Histogram | `endpoint` | Response size |
| `qcrypt_api_active_connections` | Gauge | - | Active connections |
Example Query:
# API error rate
sum(rate(qcrypt_api_requests_total{status="error"}[5m]))
/
sum(rate(qcrypt_api_requests_total[5m]))
# 99th percentile API latency
histogram_quantile(0.99, rate(qcrypt_api_request_duration_seconds_bucket[5m]))
7. System Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `qcrypt_system_info` | Gauge | `version, environment, quantum_backend` | System information |
| `qcrypt_system_memory_usage_bytes` | Gauge | - | Memory usage |
| `qcrypt_system_cpu_usage_percent` | Gauge | - | CPU usage |
Health Check Endpoints
Quick Health Check
Endpoint: GET /api/v2/monitoring/status
Response:
{
  "status": "success",
  "request_id": "status_1234567890",
  "data": {
    "status": "operational",
    "version": "2.0.0",
    "environment": "production",
    "timestamp": 1234567890.0
  }
}
Detailed Health Check
Endpoint: GET /api/v2/monitoring/health/detailed
Response:
{
  "status": "success",
  "request_id": "health_1234567890",
  "data": {
    "status": "healthy",
    "timestamp": 1234567890.0,
    "version": "2.0.0",
    "environment": "production",
    "components": {
      "api": {
        "status": "healthy",
        "uptime_seconds": 3600
      },
      "quantum_backend": {
        "status": "healthy",
        "backend": "qrisp_simulator",
        "total_bytes_generated": 1000000,
        "entropy_pool_size": 500
      },
      "hardware": {
        "status": "healthy",
        "device_count": 2,
        "devices": [...]
      },
      "entropy": {
        "status": "healthy",
        "shannon_entropy": 7.95,
        "min_entropy": 7.8,
        "health_status": "excellent"
      },
      "system": {
        "status": "healthy",
        "cpu_percent": 25.5,
        "memory_percent": 45.2,
        "memory_available_mb": 8192
      }
    }
  }
}
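A script or probe can consume the detailed health response by checking each component's `status` field. The sketch below assumes the response shape shown above and assumes that any value other than `"healthy"` signals a problem (the guide does not list other status values). `unhealthy_components` and `check_health` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Dict


def unhealthy_components(payload: dict) -> Dict[str, str]:
    """Return {component: status} for components not reporting 'healthy'."""
    components = payload.get("data", {}).get("components", {})
    return {
        name: info.get("status", "unknown")
        for name, info in components.items()
        if info.get("status") != "healthy"
    }


def check_health(base_url: str = "http://localhost:8000") -> Dict[str, str]:
    """Call the detailed health endpoint (base_url is an assumption)."""
    url = base_url + "/api/v2/monitoring/health/detailed"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return unhealthy_components(json.load(resp))
```

An empty result means every component reported `healthy`; a readiness probe could fail whenever the dictionary is non-empty.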
Metrics Summary
Endpoint: GET /api/v2/monitoring/metrics/summary
Returns a human-readable JSON summary of key metrics.
Alerting Rules
Prometheus Alert Rules
# alerting_rules.yml
groups:
  - name: qcrypt_rng
    rules:
      # Oracle fulfillment failures
      - alert: OracleFulfillmentHighFailureRate
        expr: |
          sum(rate(qcrypt_oracle_requests_total{status="error"}[5m]))
          /
          sum(rate(qcrypt_oracle_requests_total[5m])) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High oracle fulfillment failure rate"
          description: "Oracle failure rate is {{ $value | humanizePercentage }}"
      # Low entropy quality
      - alert: EntropyQualityPoor
        expr: qcrypt_entropy_shannon_entropy < 7.5
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Low entropy quality detected"
          description: "Shannon entropy is {{ $value }} bits/byte"
      # Hardware device offline
      - alert: HardwareDeviceOffline
        expr: qcrypt_hardware_device_status == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Hardware device offline"
          description: "Device {{ $labels.device_id }} is not operational"
      # High API latency
      - alert: APILatencyHigh
        expr: |
          histogram_quantile(0.95, rate(qcrypt_api_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High API latency"
          description: "95th percentile latency is {{ $value }}s"
      # High CPU usage
      - alert: SystemCPUHigh
        expr: qcrypt_system_cpu_usage_percent > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage"
          description: "CPU usage is {{ $value }}%"
      # High memory usage
      - alert: SystemMemoryHigh
        expr: qcrypt_system_memory_usage_bytes / (1024 * 1024 * 1024) > 7
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage"
          description: "Memory usage exceeds 7GB"
Grafana Dashboard
Example Dashboard JSON
A sample Grafana dashboard configuration is available in monitoring/grafana-dashboard.json.
Key Panels:
- Oracle fulfillment success rate (by chain)
- PQC operation latency (by algorithm)
- QRNG bytes generated over time
- Hardware device status
- Entropy quality metrics
- API request rate and latency
- System resource usage
Import Dashboard
- Open Grafana
- Go to Dashboards → Import
- Upload monitoring/grafana-dashboard.json
- Select Prometheus data source
- Click Import
Metric Recording API
Record PQC Metric
Endpoint: POST /api/v2/monitoring/metrics/record/pqc
Parameters:
- algorithm (string): Algorithm name
- operation (string): Operation type
- status (string): success/error
- duration_seconds (float): Operation duration
- key_size_bytes (int): Key size (optional)
Example:
curl -X POST "http://localhost:8000/api/v2/monitoring/metrics/record/pqc" \
-d "algorithm=KYBER768" \
-d "operation=encapsulate" \
-d "status=success" \
-d "duration_seconds=0.015"
Record Oracle Metric
Endpoint: POST /api/v2/monitoring/metrics/record/oracle
Parameters:
- chain (string): Blockchain name
- event_type (string): request/fulfillment/commit/reveal/transaction
- status (string): success/error
- duration_seconds (float): Event duration
- gas_used (int): Gas used (optional)
Example:
curl -X POST "http://localhost:8000/api/v2/monitoring/metrics/record/oracle" \
-d "chain=ethereum" \
-d "event_type=fulfillment" \
-d "status=success" \
-d "duration_seconds=2.5"
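The same recording calls can be made from Python without extra dependencies. This sketch mirrors the curl examples by sending the parameters as a form-encoded POST body, which is what `curl -d` does. Whether the service also accepts other encodings is not stated in this guide. `build_record_request` and `send_record` are hypothetical helper names.

```python
import urllib.parse
import urllib.request


def build_record_request(
    kind: str, fields: dict, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request for /metrics/record/<kind> ('pqc' or 'oracle')."""
    url = f"{base_url}/api/v2/monitoring/metrics/record/{kind}"
    body = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def send_record(kind: str, fields: dict) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_record_request(kind, fields), timeout=5) as resp:
        return resp.status


pqc_request = build_record_request(
    "pqc",
    {
        "algorithm": "KYBER768",
        "operation": "encapsulate",
        "status": "success",
        "duration_seconds": 0.015,
    },
)
```

Building the request separately from sending it makes the payload easy to inspect or log before it reaches the service.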
Python SDK Integration
Recording Metrics in Code
from app.monitoring import (
    OracleMetrics,
    PQCMetrics,
    QRNGMetrics,
    HardwareMetrics,
    EntropyMetrics,
    APIMetrics,
)

# Record oracle fulfillment
OracleMetrics.record_fulfillment(
    chain="ethereum",
    status="success",
    duration=2.5,
)

# Record PQC operation
PQCMetrics.record_operation(
    algorithm="KYBER768",
    operation="encapsulate",
    status="success",
    duration=0.015,
)

# Record QRNG generation
QRNGMetrics.record_bytes_generated(
    backend="qrisp_simulator",
    format="hex",
    count=1024,
)

# Update hardware status
HardwareMetrics.update_device_status(
    device_id="idq_usb_0",
    device_type="photonic",
    vendor="ID Quantique",
    status=1,  # 1=operational
)

# Update entropy quality
EntropyMetrics.update_shannon_entropy(
    source="qrng_pool",
    entropy=7.95,
)

# Record API request
APIMetrics.record_request(
    endpoint="/api/v2/pqc/kem/generate",
    method="POST",
    status="success",
    duration=0.05,
    request_size=256,
    response_size=2048,
)
Using Decorators
from app.monitoring import track_pqc_operation, track_api_request


@track_pqc_operation(algorithm="KYBER768", operation="encapsulate")
async def encapsulate_shared_secret(public_key: bytes) -> EncapsulationResult:
    # Your implementation
    pass


@track_api_request(endpoint="/pqc/kem/encapsulate", method="POST")
async def encapsulate_endpoint(request: Request):
    # Your implementation
    pass
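For readers curious how a tracking decorator of this kind can work, the sketch below times an async function and records the outcome. It is not the implementation in `app.monitoring`. `track_operation_sketch` and the in-memory `RECORDED` list are hypothetical stand-ins, where a real version would presumably update the Prometheus counter and histogram instead.

```python
import asyncio
import functools
import time
from typing import List, Tuple

# Stand-in for a metrics backend: (algorithm, operation, status, seconds).
RECORDED: List[Tuple[str, str, str, float]] = []


def track_operation_sketch(algorithm: str, operation: str):
    """Decorator factory that times an async function and records its status."""

    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "success"
            try:
                return await func(*args, **kwargs)
            except Exception:
                status = "error"
                raise  # re-raise so callers still see the failure
            finally:
                elapsed = time.perf_counter() - start
                RECORDED.append((algorithm, operation, status, elapsed))

        return wrapper

    return decorator


@track_operation_sketch(algorithm="KYBER768", operation="encapsulate")
async def demo_encapsulate(public_key: bytes) -> bytes:
    await asyncio.sleep(0)  # placeholder for real work
    return public_key[::-1]


result = asyncio.run(demo_encapsulate(b"abc"))
```

Recording in the `finally` clause ensures that failed calls are measured too, which keeps error rates and latency histograms consistent with each other.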
Best Practices
1. Metric Naming
- Use lowercase with underscores
- Include units in metric names (seconds, bytes, etc.)
- Use base units (seconds, not milliseconds)
2. Labels
- Keep label cardinality low
- Don't use high-cardinality data (user IDs, timestamps)
- Use consistent label names across metrics
3. Alerting
- Set appropriate thresholds based on historical data
- Use rate-based metrics for alerts
- Include runbook links in alert annotations
4. Performance
- Metrics endpoint should be fast (< 100ms)
- Use histogram buckets wisely
- Clean up old metrics on shutdown
Troubleshooting
Metrics Not Showing
- Check if metrics are being recorded
- Verify Prometheus scrape configuration
- Check application logs for errors
High Cardinality Issues
- Review label usage
- Remove dynamic labels (user IDs, request IDs)
- Aggregate metrics where possible
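One way to find the metrics responsible for high cardinality is to count how many series each metric name contributes in a scrape. The sketch below does this over raw exposition text. `series_per_metric` is a hypothetical helper, and it treats every sample line as one series (so each histogram bucket counts separately, as it does in Prometheus storage).

```python
import re
from collections import Counter

_NAME_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def series_per_metric(exposition_text: str) -> Counter:
    """Count sample lines per metric name in Prometheus exposition text."""
    counts: Counter = Counter()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _NAME_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `series_per_metric(text).most_common(5)` on the body returned by the metrics endpoint lists the five metric names with the most series, which is usually where a dynamic label is hiding.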
Missing Metrics
- Verify metric registration in app/monitoring/metrics.py
- Check if metric recording code is executed
- Verify Prometheus is scraping the endpoint
Last updated: 2026-03-06