# Check Failed Systemd Units
You are helping the user identify and diagnose failed systemd units (services, mounts, timers, etc.).
## Task
1. **List all failed units:**
```bash
# Show failed units
systemctl --failed
# Full unit names and descriptions, without truncation or paging
systemctl --failed --full --no-pager
# Include user units
systemctl --user --failed
```
2. **Get detailed status of failed units:**
```bash
# For each failed unit, get details
# (--plain omits the "●" marker so that $1 is the unit name)
for unit in $(systemctl --failed --no-legend --plain | awk '{print $1}'); do
echo "=== $unit ==="
systemctl status "$unit" --no-pager -l
echo ""
done
```
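The loops in this document take the first column of `systemctl --failed --no-legend`. On recent systemd versions that column can be a `●` status marker (or `*` in non-UTF-8 locales) instead of the unit name, which `--plain` suppresses. If you need to handle output where the marker may be present, a small filter can skip it. This is a sketch; `failed_unit_names` is a hypothetical helper name, and it assumes the marker, when present, is its own whitespace-separated field.

```bash
# Print only the unit names from `systemctl --failed --no-legend` output.
# If the first field is a status marker ("●", or "*" in non-UTF-8 locales),
# print the second field; otherwise print the first field as-is.
failed_unit_names() {
  awk '{ if ($1 == "●" || $1 == "*") print $2; else print $1 }'
}
```

Usage: `systemctl --failed --no-legend | failed_unit_names`.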
3. **Check recent failures:**
```bash
# Units currently in the failed state
systemctl list-units --state=failed
# Error-level log entries mentioning failures (current boot)
journalctl -b -p err | grep -i "failed"
# Same for the previous boot
journalctl -b -1 -p err | grep -i "failed"
```
4. **Analyze specific failed unit:**
```bash
# Status with full output
systemctl status UNIT_NAME -l --no-pager
# Recent logs for the unit
journalctl -u UNIT_NAME -n 50 --no-pager
# Logs from current boot
journalctl -b -u UNIT_NAME --no-pager
# Logs from the last 24 hours
journalctl -u UNIT_NAME --since "24 hours ago" --no-pager
```
5. **Check unit dependencies:**
```bash
# What this unit depends on
systemctl list-dependencies UNIT_NAME
# What depends on this unit
systemctl list-dependencies --reverse UNIT_NAME
# Check if dependencies failed (--plain prints one unit per line without tree glyphs)
systemctl list-dependencies UNIT_NAME --all --plain | while read -r dep; do
  systemctl is-failed --quiet "$dep" && echo "FAILED: $dep"
done
```
6. **Common failure patterns:**
```bash
# Mount failures
systemctl --failed --type=mount
# Service failures
systemctl --failed --type=service
# Timer failures
systemctl --failed --type=timer
# Network-related failures (case-insensitive, so NetworkManager units match too)
systemctl --failed | grep -iE "network|dhcp|dns"
```
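To report failed units grouped by type in one pass, rather than checking each type separately, you can count the unit-name suffixes. A minimal sketch; `count_failed_by_type` is a hypothetical name, and it assumes one unit name per line on stdin.

```bash
# Count unit names by type suffix (service, mount, timer, ...),
# most frequent first. Splitting on the last "." means instance
# names such as foo@1.2.service still count as services.
count_failed_by_type() {
  awk -F. '{ print $NF }' | sort | uniq -c | sort -rn
}
```

For example: `systemctl --failed --no-legend --plain | awk '{print $1}' | count_failed_by_type`.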
7. **Attempt to diagnose failure reason:**
```bash
# Exit code and signal
systemctl show UNIT_NAME | grep -E "ExecMainStatus|ExecMainCode|Result"
# Unit file location and settings
systemctl cat UNIT_NAME
# Load and run state (LoadState=not-found: no unit file; bad-setting: invalid directive)
systemctl show UNIT_NAME -p LoadState,ActiveState,SubState,Result
```
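`LoadState` shows whether systemd could load the unit. `systemd-analyze verify` goes further and reports problems inside the unit file itself. The sketch below looks up the file through the `FragmentPath` property; `verify_unit_file` is a hypothetical helper name.

```bash
# Run systemd-analyze verify on the unit file backing a unit.
# FragmentPath is empty when the unit has no unit file on disk
# (for example, transient units), so report that case and return 1.
verify_unit_file() {
  local unit="$1" path
  path=$(systemctl show -p FragmentPath --value "$unit")
  if [ -z "$path" ]; then
    echo "No unit file found for $unit" >&2
    return 1
  fi
  systemd-analyze verify "$path"
}
```

Usage: `verify_unit_file UNIT_NAME`. `verify` may also print warnings about other units the file references, so first look for lines that name this unit.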
8. **Try to restart failed units:**
```bash
# Ask the user before running this, and restart only the units they approve
failed_units=$(systemctl --failed --no-legend --plain | awk '{print $1}')
# Restart each unit and report whether it is active afterwards
for unit in $failed_units; do
  echo "Attempting to restart: $unit"
  sudo systemctl restart "$unit"
  # Note: a oneshot unit that completed successfully is "inactive", not "active"
  if systemctl is-active --quiet "$unit"; then
    echo "✓ $unit restarted successfully"
  else
    echo "✗ $unit is not active after restart"
  fi
done
```
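`is-active` reports a oneshot unit that ran and exited cleanly as not active. For service, mount, and timer units, the `Result` property may be a better signal. This sketch assumes `Result` is reset when the unit starts and reads `success` unless the latest run failed. `last_run_ok` is a hypothetical helper name.

```bash
# Report whether the latest run of a unit succeeded, using its Result
# property ("success", or e.g. "exit-code" / "timeout" after a failure).
# Unlike is-active, this counts a oneshot unit that finished cleanly
# as a success. Returns 1 for any other Result value.
last_run_ok() {
  local unit="$1" result
  result=$(systemctl show -p Result --value "$unit")
  if [ "$result" = "success" ]; then
    echo "✓ $unit: last run succeeded"
  else
    echo "✗ $unit: Result=$result"
    return 1
  fi
}
```

After `sudo systemctl restart "$unit"`, you could run `last_run_ok "$unit"` instead of the `is-active` check. For `Type=simple` services, a crash a moment after startup may not show up yet, so waiting briefly before checking is reasonable.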
9. **Check for masked units:**
```bash
# List masked units
systemctl list-unit-files --state=masked
# Check if a failed unit is masked (prints "masked" if it is)
systemctl is-enabled UNIT_NAME
```
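To check every failed unit for masking in one pass, instead of running `is-enabled` on each by hand, you can loop over the failed list. This is a sketch; `masked_failed_units` is a hypothetical name, and it treats both `masked` and `masked-runtime` as masked.

```bash
# Read unit names on stdin and print those that are masked
# (persistently or for this boot only).
masked_failed_units() {
  local unit
  while read -r unit; do
    case "$(systemctl is-enabled "$unit" 2>/dev/null)" in
      masked|masked-runtime) echo "$unit" ;;
    esac
  done
}
```

Usage: `systemctl --failed --no-legend --plain | awk '{print $1}' | masked_failed_units`.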
10. **Generate failure report:**
```bash
cat > /tmp/failed-units-report.txt << EOF
Failed Units Report - $(date)
======================================
Failed Units Summary:
$(systemctl --failed --no-pager)
Detailed Status:
EOF
for unit in $(systemctl --failed --no-legend --plain | awk '{print $1}'); do
echo "" >> /tmp/failed-units-report.txt
echo "=== $unit ===" >> /tmp/failed-units-report.txt
systemctl status "$unit" --no-pager -l >> /tmp/failed-units-report.txt 2>&1
echo "" >> /tmp/failed-units-report.txt
echo "Recent Logs:" >> /tmp/failed-units-report.txt
journalctl -u "$unit" -n 20 --no-pager >> /tmp/failed-units-report.txt 2>&1
echo "" >> /tmp/failed-units-report.txt
done
cat /tmp/failed-units-report.txt
```
## Present Summary to User
Provide:
- Number of failed units
- List of failed unit names and types
- Failure reasons (exit codes, signals)
- Recent log entries for each
- Recommended actions
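For the failure reasons, `systemctl show` can print each service unit's `Result` (for example `exit-code`, `signal`, `timeout`, or `start-limit-hit`) and its `ExecMainStatus`. `ExecMainStatus` holds the exit code, or the signal number if the process was killed. The sketch below prints one tab-separated line per unit; `summarize_failed_unit` is a hypothetical helper name.

```bash
# Print "<unit>\tResult=<result>\tExecMainStatus=<status>" for one unit.
# ExecMainStatus is only meaningful for service units; for other unit
# types it may be empty.
summarize_failed_unit() {
  local unit="$1" result status
  result=$(systemctl show -p Result --value "$unit")
  status=$(systemctl show -p ExecMainStatus --value "$unit")
  printf '%s\tResult=%s\tExecMainStatus=%s\n' "$unit" "$result" "$status"
}
```

For every failed unit: `systemctl --failed --no-legend --plain | awk '{print $1}' | while read -r u; do summarize_failed_unit "$u"; done`. `systemctl --failed --no-legend --plain | wc -l` gives the count. An `ExecMainStatus` of `203` usually corresponds to systemd's `203/EXEC` code, which means the `ExecStart=` binary could not be executed.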
## Common Failed Units & Solutions
**NetworkManager-wait-online.service:**
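- Fails if the network is not up before its timeout; usually safe to ignore, or to disable if nothing at boot depends on `network-online.target` (for example, network mounts)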
- `sudo systemctl disable NetworkManager-wait-online.service`
**ModemManager.service:**
- May fail if no modem hardware present
- Can disable: `sudo systemctl disable ModemManager.service`
**bluetooth.service:**
- Check firmware messages, which usually come from the kernel driver rather than the service: `journalctl -k -b | grep -i bluetooth`
- Restart: `sudo systemctl restart bluetooth`
**systemd-resolved.service:**
- Check config: `/etc/systemd/resolved.conf`
- DNS issues: `resolvectl status`
**Mount units (*.mount):**
- Check fstab: `cat /etc/fstab`
- Verify device exists: `lsblk`
- Check mount point permissions
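For a failed mount unit, the `What` and `Where` properties show the source and the mount point systemd tried to use. fstab entries written as `UUID=...` typically appear as `/dev/disk/by-uuid/...` paths. The sketch below prints both and flags a `/dev` source that does not exist; `show_mount_source` is a hypothetical name.

```bash
# Print "<unit>: <source> -> <mount point>" for a mount unit and warn
# when a /dev source path is missing. Non-/dev sources (NFS exports,
# tmpfs, etc.) cannot be checked this way, so they are only printed.
show_mount_source() {
  local unit="$1" what where
  what=$(systemctl show -p What --value "$unit")
  where=$(systemctl show -p Where --value "$unit")
  echo "$unit: $what -> $where"
  case "$what" in
    /dev/*) [ -e "$what" ] || echo "  $what does not exist (device missing, renamed, or not attached)" ;;
  esac
}
```

Usage: `show_mount_source mnt-data.mount`. On systems with a recent util-linux, `findmnt --verify` can also check `/etc/fstab` for errors before you reboot.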
**User services:**
- Check user journal: `journalctl --user -u UNIT_NAME`
- If the service must keep running when the user is not logged in, enable lingering: `loginctl enable-linger USER`
## Cleanup Actions
```bash
# Reset failed state (all units; append UNIT_NAME to reset just one)
sudo systemctl reset-failed
# Disable permanently failed units (ask first!)
sudo systemctl disable UNIT_NAME
# Mask unit to prevent activation
sudo systemctl mask UNIT_NAME
# Unmask unit
sudo systemctl unmask UNIT_NAME
# Reload unit files (needed after editing them, before changes take effect)
sudo systemctl daemon-reload
```
## Notes
- Not all failures are critical - some are expected
- Check if service is actually needed before disabling
- Some failures may be due to hardware not present (modems, bluetooth)
- A failing local fstab mount without the `nofail` option can drop the system into emergency mode at boot, so be careful with fstab changes
- User units are separate from system units
- Use `systemctl reset-failed` to clear failed state after fixing