2026-01-22 03:20:10,773 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:20:16,268 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:21:15,887 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:21:19,191 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:21:19,213 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:21:19,216 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:21:19,277 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:21:19,278 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:21:19,278 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:21:19,291 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:21:19,316 - WARNING - Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
2026-01-22 03:27:18,906 - ERROR - Model load failed for HuggingFaceH4/zephyr-7b-beta: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
2026-01-22 03:34:20,341 - ERROR - Model load failed for HuggingFaceH4/zephyr-7b-beta: Some modules are dispatched on the CPU or the disk. Make sure you have enough GPU RAM to fit the quantized model. If you want to dispatch the model on the CPU or the disk while keeping these modules in 32-bit, you need to set `llm_int8_enable_fp32_cpu_offload=True` and pass a custom `device_map` to `from_pretrained`. Check https://huggingface.co/docs/transformers/main/en/main_classes/quantization#offload-between-cpu-and-gpu for more details.
2026-01-22 03:34:41,119 - INFO - System memory: 15.52GB, Estimated model size: 1.20GB
2026-01-22 03:34:41,120 - WARNING - No GPU detected - using CPU only
2026-01-22 03:34:49,182 - INFO - Successfully loaded TinyLlama/TinyLlama-1.1B-Chat-v1.0 with device_map: cpu
2026-01-22 03:34:49,354 - INFO - System memory: 15.52GB, Estimated model size: 6.00GB
2026-01-22 03:34:49,354 - WARNING - No GPU detected - using CPU only
2026-01-22 03:37:33,010 - INFO - System memory: 15.52GB, Estimated model size: 1.20GB
2026-01-22 03:37:33,011 - WARNING - No GPU detected - using CPU only
2026-01-22 03:37:48,312 - INFO - Successfully loaded TinyLlama/TinyLlama-1.1B-Chat-v1.0 with device_map: cpu
2026-01-22 03:37:48,592 - INFO - System memory: 15.52GB, Estimated model size: 6.00GB
2026-01-22 03:37:48,592 - WARNING - No GPU detected - using CPU only
2026-01-22 03:54:42,741 - INFO - Successfully loaded HuggingFaceH4/zephyr-7b-beta with device_map: cpu
2026-01-22 04:21:00,210 - INFO - {"timestamp": "2026-01-22T04:21:00.209239", "query": "What is 2+2?", "model_used": "system1", "routing_reason": "simple_query", "entropy": 0.0, "response": "2 + 2 = 4"}
2026-01-22 05:04:22,917 - INFO - {"timestamp": "2026-01-22T05:04:22.914959", "query": "Compare the architectural differences between transformer and RNN models", "model_used": "system2", "routing_reason": "semantic_complexity", "entropy": 0.0, "response": ", and discuss their potential applications in various industries. Include specific examples of indus..."}
2026-01-22 05:48:18,885 - INFO - {"timestamp": "2026-01-22T05:48:18.879276", "query": "Compare the architectural differences between transformer and RNN models", "model_used": "system2", "routing_reason": "semantic_complexity", "entropy": 0.0, "response": "in natural language processing. Provide examples of use cases where each model is more suitable. Add..."}
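The two ERROR entries above name their own remedy: enable fp32 CPU offload in the quantization config and pass a custom `device_map` to `from_pretrained`. A minimal configuration sketch of that fix follows. It assumes `transformers` and `bitsandbytes` are installed; the model name is taken from the log, but the GPU/CPU layer split shown is illustrative, not tuned for this 15.52GB machine.

```python
# Sketch of the fix described in the ERROR above: keep offloaded modules
# in 32-bit on the CPU while the rest of the model stays quantized on GPU 0.
# Assumptions: transformers + bitsandbytes installed; the device split
# below is illustrative only.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True,  # the flag named in the error message
)

# Custom device_map: modules that fit go to GPU 0, the remainder to CPU.
device_map = {
    "model.embed_tokens": 0,
    "model.layers": 0,
    "model.norm": "cpu",
    "lm_head": "cpu",
}

model = AutoModelForCausalLM.from_pretrained(
    "HuggingFaceH4/zephyr-7b-beta",  # model name taken from the log
    quantization_config=quant_config,
    device_map=device_map,
)
```

Note that the later INFO entry shows the load eventually succeeded with `device_map: cpu`, so forcing everything onto the CPU (as the "No GPU detected" warnings imply) is the simpler alternative when no GPU is present at all.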