2025-08-22 04:49:52 - INFO - Loading model: openbmb/MiniCPM-V-4-int4
2025-08-22 04:49:54 - INFO - vision_config is None, using default vision config
2025-08-22 04:50:54 - INFO - Model loaded in 62.69 seconds
2025-08-22 04:50:54 - INFO - GPU Memory Usage after model load: 2689.45 MB
2025-08-22 04:50:54 - INFO - [d017fd76-3fd3-43ac-96a2-634c37d06506] Processing video: 'videos/sample1_rotated_270.mp4', Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see; do not interpret intentions, relationships, or work efficiency. Avoid all repetitive descriptions of the store's layout or shelves.'
2025-08-22 04:50:54 - INFO - [d017fd76-3fd3-43ac-96a2-634c37d06506] Extracting frames using method: uniform, rate/threshold: 30
2025-08-22 04:50:55 - INFO - [d017fd76-3fd3-43ac-96a2-634c37d06506] Extracted 30 frames successfully. Saving to temporary files...
2025-08-22 04:50:56 - INFO - [d017fd76-3fd3-43ac-96a2-634c37d06506] 30 frames saved to temp_videos/d017fd76-3fd3-43ac-96a2-634c37d06506
2025-08-22 04:51:15 - INFO - vision_config is None, using default vision config
2025-08-22 04:51:27 - INFO - Tokens per second: 5.480411872376593, Peak GPU memory MB: 9236.375
2025-08-22 04:51:27 - INFO - [d017fd76-3fd3-43ac-96a2-634c37d06506] Inference time: 33.09 seconds, CPU usage: 48.6%, CPU core utilization: [46.3, 44.5, 50.7, 52.9]
2025-08-22 04:51:27 - INFO - [d017fd76-3fd3-43ac-96a2-634c37d06506] Cleaned up temporary frame directory: temp_videos/d017fd76-3fd3-43ac-96a2-634c37d06506
2025-08-22 04:51:27 - INFO - [39153060-6451-46d2-bd79-3907b25078cd] Processing video: 'videos/sample1_raw.mp4', Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see; do not interpret intentions, relationships, or work efficiency. Avoid all repetitive descriptions of the store's layout or shelves.'
2025-08-22 04:51:27 - INFO - [39153060-6451-46d2-bd79-3907b25078cd] Extracting frames using method: uniform, rate/threshold: 30
2025-08-22 04:51:32 - INFO - [39153060-6451-46d2-bd79-3907b25078cd] Extracted 30 frames successfully. Saving to temporary files...
2025-08-22 04:51:32 - INFO - [39153060-6451-46d2-bd79-3907b25078cd] 30 frames saved to temp_videos/39153060-6451-46d2-bd79-3907b25078cd
2025-08-22 04:51:45 - INFO - vision_config is None, using default vision config
2025-08-22 04:52:00 - INFO - Tokens per second: 7.228105259193224, Peak GPU memory MB: 9236.375
2025-08-22 04:52:00 - INFO - [39153060-6451-46d2-bd79-3907b25078cd] Inference time: 32.26 seconds, CPU usage: 37.7%, CPU core utilization: [49.7, 33.2, 23.0, 45.0]
2025-08-22 04:52:00 - INFO - [39153060-6451-46d2-bd79-3907b25078cd] Cleaned up temporary frame directory: temp_videos/39153060-6451-46d2-bd79-3907b25078cd
2025-08-22 04:52:00 - INFO - [17f20038-cd68-4523-9ca8-e568483c3597] Processing video: 'videos/sample1_rotated_180.mp4', Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see; do not interpret intentions, relationships, or work efficiency. Avoid all repetitive descriptions of the store's layout or shelves.'
2025-08-22 04:52:00 - INFO - [17f20038-cd68-4523-9ca8-e568483c3597] Extracting frames using method: uniform, rate/threshold: 30
2025-08-22 04:52:01 - INFO - [17f20038-cd68-4523-9ca8-e568483c3597] Extracted 30 frames successfully. Saving to temporary files...
2025-08-22 04:52:01 - INFO - [17f20038-cd68-4523-9ca8-e568483c3597] 30 frames saved to temp_videos/17f20038-cd68-4523-9ca8-e568483c3597
2025-08-22 04:52:13 - INFO - vision_config is None, using default vision config
2025-08-22 04:52:24 - INFO - Tokens per second: 4.33591342429363, Peak GPU memory MB: 9236.375
2025-08-22 04:52:24 - INFO - [17f20038-cd68-4523-9ca8-e568483c3597] Inference time: 24.36 seconds, CPU usage: 32.9%, CPU core utilization: [41.7, 14.4, 17.5, 58.0]
2025-08-22 04:52:24 - INFO - [17f20038-cd68-4523-9ca8-e568483c3597] Cleaned up temporary frame directory: temp_videos/17f20038-cd68-4523-9ca8-e568483c3597
2025-08-22 04:52:24 - INFO - [00745cac-3194-4425-9e97-9a6df15d32b5] Processing video: 'videos/sample1_rotated_90.mp4', Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see; do not interpret intentions, relationships, or work efficiency. Avoid all repetitive descriptions of the store's layout or shelves.'
2025-08-22 04:52:24 - INFO - [00745cac-3194-4425-9e97-9a6df15d32b5] Extracting frames using method: uniform, rate/threshold: 30
2025-08-22 04:52:26 - INFO - [00745cac-3194-4425-9e97-9a6df15d32b5] Extracted 30 frames successfully. Saving to temporary files...
2025-08-22 04:52:26 - INFO - [00745cac-3194-4425-9e97-9a6df15d32b5] 30 frames saved to temp_videos/00745cac-3194-4425-9e97-9a6df15d32b5
2025-08-22 04:52:39 - INFO - vision_config is None, using default vision config
2025-08-22 04:52:55 - INFO - Tokens per second: 8.020485206497014, Peak GPU memory MB: 9236.375
2025-08-22 04:52:55 - INFO - [00745cac-3194-4425-9e97-9a6df15d32b5] Inference time: 31.39 seconds, CPU usage: 42.2%, CPU core utilization: [23.0, 30.6, 48.4, 66.5]
2025-08-22 04:52:55 - INFO - [00745cac-3194-4425-9e97-9a6df15d32b5] Cleaned up temporary frame directory: temp_videos/00745cac-3194-4425-9e97-9a6df15d32b5