# GPU Monitoring and Fan Control System
A GPU monitoring and fan control system for AMD GPUs on Linux. It combines a real-time desktop monitor, profile-based fan control, a web interface, and system service integration.
## Features
### 🖥️ Desktop Monitoring
- **Real-time GPU monitoring** with PyQt5 desktop overlay
- **System tray integration** for minimal-footprint monitoring
- **Configurable display modes** (overlay, tray, full dashboard)
- **Multiple GPU support** with automatic detection
### 🌡️ Advanced Fan Control
- **Multiple temperature curves** (Silent, Balanced, Performance, Custom)
- **Profile-based control** with hotkey switching
- **Safety limits** and automatic fallback modes
- **Manual override** capabilities
### 🌐 Web Interface
- **Remote monitoring** via web browser
- **Real-time charts** and historical data
- **Mobile-responsive** design
- **API endpoints** for integration
### 📊 Data Logging & Analysis
- **Historical data storage** with SQLite
- **Performance analytics** and trend analysis
- **Export capabilities** (CSV, JSON)
- **Alert system** for temperature thresholds
### 🔧 System Integration
- **Systemd service** for automatic startup
- **Configuration management** with JSON profiles
- **Autostart integration** for desktop environments
- **Permission handling** and error recovery
## Installation
### Prerequisites
```bash
# Install required packages
sudo apt update
sudo apt install python3 python3-pip python3-venv
sudo apt install python3-pyqt5 python3-pyqt5.qtopengl
sudo apt install python3-matplotlib python3-flask
```
### Setup
```bash
# Create the project directory (the project files, including requirements.txt and setup.sh, must be placed here)
mkdir -p ~/gpu_monitoring_system
cd ~/gpu_monitoring_system
# Create virtual environment
python3 -m venv venv
source venv/bin/activate
# Install Python dependencies
pip install -r requirements.txt
# Run setup script
./setup.sh
```
## Usage
### Desktop Monitor
```bash
# Start desktop monitoring overlay
python3 gpu_monitor_desktop.py
# Start with system tray
python3 gpu_monitor_desktop.py --tray
# Start full dashboard
python3 gpu_monitor_desktop.py --dashboard
```
### Fan Control
```bash
# Start fan control with default profile
python3 gpu_fan_controller.py
# Start with specific profile
python3 gpu_fan_controller.py --profile performance
# Start with custom configuration
python3 gpu_fan_controller.py --config custom.json
```
### Web Interface
```bash
# Start web server
python3 web_interface.py
# Access at http://localhost:5000
```
### System Service
```bash
# Install as system service
sudo ./install_service.sh
# Start service
sudo systemctl start gpu-monitoring
# Enable auto-start
sudo systemctl enable gpu-monitoring
```
## Configuration
### Fan Control Profiles
Create custom fan control profiles in `config/fan_profiles.json`:
```json
{
  "silent": {
    "name": "Silent",
    "description": "Quiet operation with lower temperatures",
    "curve": {
      "min_temp": 40,
      "max_temp": 65,
      "min_pwm": 120,
      "max_pwm": 220
    },
    "safety": {
      "emergency_temp": 85,
      "emergency_pwm": 255
    }
  },
  "performance": {
    "name": "Performance",
    "description": "Maximum cooling for high performance",
    "curve": {
      "min_temp": 35,
      "max_temp": 55,
      "min_pwm": 180,
      "max_pwm": 255
    },
    "safety": {
      "emergency_temp": 80,
      "emergency_pwm": 255
    }
  }
}
```
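How a profile turns a temperature into a fan setting is not spelled out here. The sketch below shows one plausible reading: it assumes the curve is interpolated linearly between (`min_temp`, `min_pwm`) and (`max_temp`, `max_pwm`), that PWM values use the 0-255 range, and that the `safety` block overrides the curve. The function name `pwm_for_temp` is hypothetical and not taken from `gpu_fan_controller.py`.

```python
def pwm_for_temp(temp_c, profile):
    """Map a temperature in °C to a PWM value using one fan profile."""
    curve = profile["curve"]
    safety = profile["safety"]
    # Assumption: the emergency setting takes priority over the curve.
    if temp_c >= safety["emergency_temp"]:
        return safety["emergency_pwm"]
    # Clamp to the curve endpoints outside the configured range.
    if temp_c <= curve["min_temp"]:
        return curve["min_pwm"]
    if temp_c >= curve["max_temp"]:
        return curve["max_pwm"]
    # Assumption: linear interpolation between the two endpoints.
    span = curve["max_temp"] - curve["min_temp"]
    fraction = (temp_c - curve["min_temp"]) / span
    return round(curve["min_pwm"] + fraction * (curve["max_pwm"] - curve["min_pwm"]))


# The "silent" profile from config/fan_profiles.json above.
SILENT = {
    "curve": {"min_temp": 40, "max_temp": 65, "min_pwm": 120, "max_pwm": 220},
    "safety": {"emergency_temp": 85, "emergency_pwm": 255},
}

print(pwm_for_temp(52.5, SILENT))  # halfway along the curve: 170
```

Under these assumptions the silent profile holds PWM at 120 below 40 °C, ramps to 220 at 65 °C, and jumps to 255 at 85 °C.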
### Monitoring Configuration
Configure monitoring settings in `config/monitoring.json`:
```json
{
  "update_interval": 1.0,
  "display_mode": "overlay",
  "show_gpu_load": true,
  "show_temperature": true,
  "show_fan_speed": true,
  "show_power": true,
  "show_vram": true,
  "alerts": {
    "enabled": true,
    "temp_warning": 75,
    "temp_critical": 85,
    "power_warning": 200
  }
}
```
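As an illustration of how the `alerts` block might be evaluated, the sketch below classifies a temperature reading against `temp_warning` and `temp_critical`. It assumes the thresholds are in °C and inclusive. The level names (`"ok"`, `"warning"`, `"critical"`) and the function name are hypothetical, not the project's own.

```python
import json

# Same structure as the alerts block in config/monitoring.json above.
CONFIG_TEXT = """
{
  "alerts": {
    "enabled": true,
    "temp_warning": 75,
    "temp_critical": 85,
    "power_warning": 200
  }
}
"""


def temperature_alert(temp_c, alerts):
    """Return a hypothetical alert level for a temperature reading."""
    if not alerts.get("enabled", False):
        return "ok"
    # Check the critical threshold first so it wins over the warning level.
    if temp_c >= alerts["temp_critical"]:
        return "critical"
    if temp_c >= alerts["temp_warning"]:
        return "warning"
    return "ok"


alerts = json.loads(CONFIG_TEXT)["alerts"]
print(temperature_alert(78, alerts))  # warning
```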
## Monitoring Data
The system collects and stores the following data:
### GPU Metrics
- **Temperature**: Core temperature in Celsius
- **Load**: GPU utilization percentage
- **Fan Speed**: RPM and PWM percentage
- **Power**: Current power draw in watts
- **VRAM**: Used and total memory
- **Clocks**: Core and memory clock speeds
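On Linux, several of these values are exposed through the kernel's hwmon sysfs interface. The following is a minimal sketch of reading them, assuming the standard hwmon file names (`temp1_input` in millidegrees Celsius, `pwm1` in the 0-255 range, `fan1_input` in RPM, `power1_average` in microwatts). Which files exist varies by GPU and driver version, so missing files are reported as `None`. The function names are hypothetical and not taken from the project.

```python
from pathlib import Path


def read_int(path):
    """Return the integer stored in a sysfs file, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def read_gpu_metrics(hwmon_dir):
    """Read temperature, fan, and power values from one hwmon directory."""
    hwmon = Path(hwmon_dir)
    temp = read_int(hwmon / "temp1_input")      # millidegrees Celsius
    pwm = read_int(hwmon / "pwm1")              # raw PWM, 0-255
    rpm = read_int(hwmon / "fan1_input")        # revolutions per minute
    power = read_int(hwmon / "power1_average")  # microwatts
    return {
        "temperature_c": None if temp is None else temp / 1000,
        "fan_rpm": rpm,
        "fan_pwm_percent": None if pwm is None else round(pwm * 100 / 255),
        "power_w": None if power is None else power / 1_000_000,
    }
```

A call such as `read_gpu_metrics("/sys/class/drm/card0/device/hwmon/hwmon0")` would return a dictionary of readings; the exact `card` and `hwmon` numbers depend on the machine.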
### System Metrics
- **CPU Usage**: Overall system load
- **Memory Usage**: System RAM utilization
- **Disk Usage**: Storage space monitoring
- **Network**: Bandwidth usage
## Web Interface Features
### Dashboard
- Real-time metric display
- Historical charts with configurable time ranges
- System health overview
- Alert status and history
### Charts
- Temperature trends over time
- Fan speed and PWM curves
- Power consumption patterns
- GPU utilization history
### Configuration
- Fan profile management
- Alert threshold configuration
- Display settings
- Data export options
## API Endpoints
### Monitoring Data
- `GET /api/status` - Current GPU status
- `GET /api/history` - Historical data
- `GET /api/metrics` - All available metrics
### Fan Control
- `POST /api/fan/profile` - Set fan profile
- `POST /api/fan/manual` - Manual fan control
- `GET /api/fan/status` - Current fan status
### System
- `GET /api/system` - System information
- `POST /api/alerts` - Configure alerts
- `GET /api/logs` - System logs
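The endpoints can be called from any HTTP client. The sketch below uses only the Python standard library and assumes the server runs at `http://localhost:5000`, as in the usage section, and exchanges JSON. The request and response schemas are not documented here, so the `{"profile": "performance"}` payload mentioned afterwards is a guess, and the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # default address from the usage section


def build_request(path, payload=None):
    """Build a GET request, or a JSON POST request when a payload is given."""
    url = BASE_URL + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def call_api(path, payload=None, timeout=5):
    """Send the request and decode the response body as JSON (assumed format)."""
    request = build_request(path, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the web server running, `call_api("/api/status")` would fetch the current GPU status, and `call_api("/api/fan/profile", {"profile": "performance"})` would request a profile change if the server accepts that payload shape.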
## Troubleshooting
### Permission Issues
If you encounter permission errors with GPU monitoring:
```bash
# Check GPU permissions
ls -la /sys/class/drm/card*/device/hwmon/
# Add user to video group (log out and back in for the change to take effect)
sudo usermod -a -G video $USER
# Or run with sudo for fan control
sudo python3 gpu_fan_controller.py
```
### Missing Dependencies
```bash
# Install missing PyQt5
pip install PyQt5 PyQt5-sip
# Install missing matplotlib
pip install matplotlib
# Install missing Flask
pip install flask
```
### Service Issues
```bash
# Check service status
sudo systemctl status gpu-monitoring
# View service logs
sudo journalctl -u gpu-monitoring -f
# Restart service
sudo systemctl restart gpu-monitoring
```
## Development
### Adding New GPU Support
1. Update `gpu_detector.py` with new GPU detection logic
2. Add temperature sensor paths for new GPU models
3. Test with `python3 test_gpu_detection.py`
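For orientation, detection logic of this kind could look like the sketch below. It assumes AMD cards are identified by the PCI vendor ID `0x1002` in `/sys/class/drm/card*/device/vendor` and that sensors live under `device/hwmon/`. The function name and structure are hypothetical and do not reflect the actual contents of `gpu_detector.py`.

```python
from pathlib import Path

AMD_VENDOR_ID = "0x1002"  # PCI vendor ID assigned to AMD/ATI


def find_amd_hwmon_dirs(drm_root="/sys/class/drm"):
    """Return the hwmon directories of all AMD cards under drm_root."""
    found = []
    for card in sorted(Path(drm_root).glob("card[0-9]*")):
        # Skip connector entries such as card0-DP-1.
        if "-" in card.name:
            continue
        try:
            vendor = (card / "device" / "vendor").read_text().strip()
        except OSError:
            continue  # no vendor file, or not readable
        if vendor.lower() != AMD_VENDOR_ID:
            continue
        found.extend(sorted((card / "device" / "hwmon").glob("hwmon[0-9]*")))
    return found
```

Taking the root directory as a parameter keeps the function testable against a fake directory tree, which helps when adding support for hardware that is not at hand.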
### Custom Fan Curves
1. Create new profile in `config/fan_profiles.json`
2. Test with `python3 gpu_fan_controller.py --profile new_profile`
3. Validate temperature response and stability
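Before testing a new profile on real hardware, it can help to sanity-check its values. The sketch below is one possible check, assuming the profile layout shown in the configuration section and a PWM range of 0-255. The rules themselves (for example, that `emergency_temp` should sit above `max_temp`) are suggestions, not requirements enforced by the project.

```python
def validate_profile(profile):
    """Return a list of problems found in a fan profile (empty if none)."""
    problems = []
    curve = profile.get("curve", {})
    safety = profile.get("safety", {})
    # Report missing keys first; later checks need all of them.
    for key in ("min_temp", "max_temp", "min_pwm", "max_pwm"):
        if key not in curve:
            problems.append(f"curve.{key} is missing")
    for key in ("emergency_temp", "emergency_pwm"):
        if key not in safety:
            problems.append(f"safety.{key} is missing")
    if problems:
        return problems
    if curve["min_temp"] >= curve["max_temp"]:
        problems.append("curve.min_temp must be below curve.max_temp")
    if curve["min_pwm"] > curve["max_pwm"]:
        problems.append("curve.min_pwm must not exceed curve.max_pwm")
    pwm_values = {
        "curve.min_pwm": curve["min_pwm"],
        "curve.max_pwm": curve["max_pwm"],
        "safety.emergency_pwm": safety["emergency_pwm"],
    }
    for name, value in pwm_values.items():
        if not 0 <= value <= 255:
            problems.append(f"{name} must be between 0 and 255")
    if safety["emergency_temp"] <= curve["max_temp"]:
        problems.append("safety.emergency_temp should be above curve.max_temp")
    return problems
```

An empty list means the profile passed every check; otherwise each entry names the field to fix.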
### Web Interface Extensions
1. Add new routes in `web_interface.py`
2. Create templates in `templates/` directory
3. Add static assets in `static/` directory
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Contributing
1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for your changes
5. Submit a pull request
## Support
For support and questions:
- Create an issue on GitHub
- Check the troubleshooting section
- Review the configuration examples