metadata
description: >-
  Comprehensive system health checkup including disk health, SMART status,
  filesystem checks, and overall system status
tags:
  - sysadmin
  - diagnostics
  - health
  - disk
  - smart
  - filesystem
  - comprehensive
Perform a comprehensive system health checkup:
- Disk Health (SMART): Check all disk SMART status and health indicators
- Filesystem Health: Check all mounted filesystems for errors
- System Resources: CPU, memory, swap, and load status
- Critical Services: Verify critical system services are running
- Security Updates: Check for pending security updates
- Disk Space: Check all mounted filesystems for space issues
- System Logs: Check for recent critical errors
- Hardware Errors: Check for hardware-related issues in logs
Run the following comprehensive diagnostic commands:
Disk Health (SMART):
- sudo smartctl --scan to identify all drives
- sudo smartctl -H /dev/sda for health status (repeat for all drives found)
- sudo smartctl -A /dev/sda for SMART attributes (repeat for all drives)
- Check for: Reallocated sectors, Current pending sectors, Offline uncorrectable sectors
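The "repeat for all drives" step can be scripted rather than done by hand. The following is a minimal sketch, not a definitive implementation: the function names list_smart_devices and check_all_drives are hypothetical, and it assumes each line of the scan output starts with the device path and that the drives report the usual ATA attribute names. Some controllers also need the device type option that the scan prints, which this sketch does not pass on.

```shell
# Extract device paths (the first field) from `smartctl --scan` output on stdin.
list_smart_devices() {
    awk '$1 ~ /^\/dev\// { print $1 }'
}

# Run the health and attribute checks for every detected drive.
# Assumption: drives expose the standard ATA attribute names grepped below.
check_all_drives() {
    sudo smartctl --scan | list_smart_devices | while read -r dev; do
        echo "=== $dev ==="
        sudo smartctl -H "$dev" < /dev/null
        sudo smartctl -A "$dev" < /dev/null |
            grep -E 'Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable'
    done
}
```

Calling check_all_drives prints one section per drive. NVMe devices report different health fields, so the attribute filter may print nothing for them.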
Filesystem Health:
- df -h for disk space on all filesystems
- sudo btrfs device stats / if using BTRFS
- Check mounted filesystems with mount | grep -E '^/dev'
- For ext4: sudo tune2fs -l /dev/sdXY | grep -i 'state\|error' for filesystem state
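Disk space output is easier to act on when only the filesystems near capacity are shown. As a rough sketch, the hypothetical helper below reads the POSIX-format output of df -P and prints mount points at or above a usage threshold. The 90% default is an assumed value, not one taken from this checklist, and mount points containing spaces are not handled.

```shell
# Read `df -P` output on stdin; print "mountpoint usage%" for each
# filesystem at or above the threshold (first argument, default 90).
# The default of 90 is an assumption made for this sketch.
flag_full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)            # strip the percent sign from Capacity
        if (use + 0 >= limit + 0) print $6 " " use "%"
    }'
}

# Usage: df -P | flag_full_filesystems 85
```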
System Resources:
- free -h for memory usage
- uptime for load averages
- top -b -n 1 | head -n 20 for process overview
- swapon --show for swap status
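A load average only means something relative to the number of CPU cores. The sketch below shows one possible rule of thumb, which is an assumption rather than part of this checklist: treat a 1-minute load above the core count as a warning. The function name load_status is hypothetical, and the input is assumed to be in the format of /proc/loadavg on Linux.

```shell
# Read a /proc/loadavg-style line on stdin and compare the 1-minute load
# with the core count given as the first argument.
# Assumed heuristic: load above the number of cores is worth a warning.
load_status() {
    cores="$1"
    awk -v cores="$cores" '{
        if ($1 + 0 > cores + 0) print "Warning"
        else print "Healthy"
    }'
}

# Usage: load_status "$(nproc)" < /proc/loadavg
```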
Critical Services:
- systemctl status systemd-journald for logging service
- systemctl status cron or systemctl status crond for task scheduler
- systemctl --failed for any failed services
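For the report it can help to reduce the failed-services listing to bare unit names. A small sketch, assuming systemctl is run with its --no-legend and --plain options so that each output line starts with the unit name; the helper name list_failed_units is hypothetical.

```shell
# Read `systemctl --failed --no-legend --plain` output on stdin and
# print only the unit name (first field) of each non-empty line.
list_failed_units() {
    awk 'NF > 0 { print $1 }'
}

# Usage: systemctl --failed --no-legend --plain | list_failed_units
```

An empty result means no units are currently in the failed state.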
Updates and Security:
- sudo apt-get update to refresh package lists
- apt list --upgradable to check for available updates
- grep -i security /var/log/apt/history.log | tail -n 20 for recent security updates
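The upgradable list can be condensed into two counts for the summary. This is a sketch under a stated assumption: a pending update is counted as a security update when its archive field (the text between the slash and the version) contains "-security", as is typical on Ubuntu and recent Debian releases. Other repository layouts would need a different test. The function name summarize_upgrades is hypothetical.

```shell
# Read `apt list --upgradable` output on stdin and print how many packages
# are upgradable and how many of them come from a "-security" archive.
# Assumption: security updates are identifiable by "-security" in the suite.
summarize_upgrades() {
    awk -F/ 'NF > 1 {
        total++
        split($2, parts, " ")        # parts[1] is the archive/suite list
        if (parts[1] ~ /-security/) security++
    }
    END { printf "upgradable=%d security=%d\n", total, security }'
}

# Usage: apt list --upgradable 2>/dev/null | summarize_upgrades
```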
System Logs:
- journalctl -p 3 -b for errors in current boot
- journalctl -p 2 -b for critical issues in current boot
- dmesg | grep -i 'error\|fail\|critical' | tail -n 20 for kernel errors
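When the journal holds many errors, grouping them by source shows where to look first. The sketch below assumes the default short journalctl output, in which the fifth whitespace-separated field is the identifier (for example kernel: or sshd[123]:). The function name top_error_sources is hypothetical.

```shell
# Read short-format journal lines on stdin and print "identifier count",
# most frequent first. Marker lines beginning with "--" are skipped.
top_error_sources() {
    awk '$1 == "--" || NF < 5 { next }
    {
        id = $5
        sub(/\[.*/, "", id)          # drop a trailing [pid]: part
        sub(/:$/, "", id)            # drop a trailing colon
        count[id]++
    }
    END { for (id in count) print id, count[id] }' | sort -k2,2nr -k1,1
}

# Usage: journalctl -p 3 -b -q --no-pager | top_error_sources
```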
Hardware Status:
- sensors for temperature monitoring (if lm-sensors installed)
- dmesg | grep -i 'hardware error' for hardware errors
- lspci -v | grep -i 'error' for PCIe errors
Additional Checks:
- Check for excessive failed login attempts: sudo grep -i 'failed password' /var/log/auth.log | tail -n 10
- Check for disk I/O errors: dmesg | grep -i 'I/O error'
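Whether failed logins are "excessive" is easier to judge when they are counted per source address. A minimal sketch, assuming the usual sshd log wording in which the address follows the word "from" (for example "Failed password for root from 203.0.113.5 port 22 ssh2"); the function name failed_logins_by_ip is hypothetical.

```shell
# Read auth log lines on stdin and print "address count" for failed
# password attempts, highest count first.
# Assumption: the source address is the field right after "from".
failed_logins_by_ip() {
    awk '/Failed password/ {
        for (i = 1; i < NF; i++) {
            if ($i == "from") { count[$(i + 1)]++; break }
        }
    }
    END { for (ip in count) print ip, count[ip] }' | sort -k2,2nr -k1,1
}

# Usage: sudo cat /var/log/auth.log | failed_logins_by_ip
```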
Analyze all results and provide:
Summary Report:
- Overall system health status (Healthy, Warning, Critical)
- Disk health status for each drive
- Filesystem health and space status
- Memory and swap status
- Any failed services or critical errors
- Pending updates (especially security)
- Temperature warnings if applicable
- Specific issues found with severity levels
Recommendations:
- Immediate actions needed (if any)
- Preventive maintenance suggestions
- Monitoring recommendations
- Whether a reboot is recommended
- Backup reminders if issues detected
Priority Issues:
List any issues in order of urgency:
- Critical (requires immediate attention)
- Warning (should be addressed soon)
- Informational (for awareness)
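The ordering by urgency and the overall status can both be derived mechanically once findings are written down in a uniform way. The sketch below assumes a hypothetical convention of one finding per line in the form "LEVEL: message", where LEVEL is Critical, Warning, or Informational. It also assumes that informational findings alone leave the overall status at Healthy. Both function names are illustrative.

```shell
# Sort findings on stdin so Critical comes first, then Warning,
# then everything else.
prioritize_findings() {
    awk -F': ' '{
        rank = ($1 == "Critical") ? 1 : (($1 == "Warning") ? 2 : 3)
        print rank "\t" $0           # prefix each line with a sortable rank
    }' | sort -k1,1n | cut -f2-
}

# Print the overall status: the worst level seen, or Healthy if neither
# Critical nor Warning findings are present (an assumption of this sketch).
overall_status() {
    awk -F': ' '
        $1 == "Critical" { crit = 1 }
        $1 == "Warning"  { warn = 1 }
        END {
            if (crit) print "Critical"
            else if (warn) print "Warning"
            else print "Healthy"
        }'
}
```

Piping the same list of findings through both functions gives the Priority Issues ordering and the first line of the Summary Report.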
If smartmontools is not installed, offer to install with sudo apt-get install smartmontools.
If lm-sensors is not installed and temperature monitoring is desired, offer to install with sudo apt-get install lm-sensors.
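Before offering to install anything, the presence of the tools can be checked up front. A small sketch with a hypothetical helper name, missing_tools; it relies on the smartmontools package providing the smartctl command and the lm-sensors package providing the sensors command.

```shell
# Print the name of every command given as an argument that is not
# available on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage: missing_tools smartctl sensors
```

Any name printed corresponds to a tool whose package still needs to be installed.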