Server Maintenance SOP

Overview

This page is the standard operating procedure for maintaining the personal server environment that supports Justinspace and related self-hosted services.

Primary Goals

Keep public-facing services available
Maintain backups and restore readiness
Track health, uptime, and system alerts
Apply updates safely and intentionally

Maintenance Priorities

Availability
Backups
Security updates
Monitoring accuracy
Documentation quality

Service Inventory

Document the major services running on this server and what role they play.

Service	Purpose	Access / URL	Notes
Justinspace	Personal dashboard / homepage	Main public site	Primary landing page and internal utility hub
Gotify	Push notifications / alerting	Internal hosted service	Receives manual alerts and monitoring notifications
Uptime Kuma	Uptime and endpoint monitoring	Internal hosted service	Sends outage and recovery alerts
Reverse Proxy	HTTPS and routing	Public edge	Required for TLS and app exposure
Docker Containers	Application runtime	Local host	Verify health and restart behavior regularly

Daily Checks

Minimum Daily Review

Check Uptime Kuma for any active incidents or recent recoveries
Confirm Gotify is delivering notifications properly
Review disk alerts and make sure no filesystem is approaching capacity
Verify public-facing sites load successfully
Look for failed containers, failed services, or unusual restarts

Watch closely: Disk usage, certificate problems, reverse proxy issues, and anything affecting public pages should be treated as priority items.
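
The disk check above can be scripted. The following is a minimal sketch, not the existing health script: the function name disks_over_threshold and the 85% default are assumptions chosen for illustration, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical warning threshold in percent; tune it to your own filesystems.
DISK_WARN_PCT="${DISK_WARN_PCT:-85}"

# Read `df -P` output on stdin and print "<mount> <pct>" for every
# filesystem whose usage is at or above the limit given as $1.
disks_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)
        if (pct + 0 >= limit + 0) print $6, pct
    }'
}

# Usage:
#   df -P | disks_over_threshold "$DISK_WARN_PCT"
```

An empty result means no filesystem has reached the threshold.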

Weekly Checks

Review available package updates
Review Docker image/container health
Check server uptime and resource trends
Verify backup jobs completed successfully
Confirm important domains and certificates are healthy
Review logs for recurring warnings or failures
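
For the certificate check, one possible approach is to ask openssl whether the served certificate will still be valid a number of days from now. This is a sketch under assumptions: the function names are made up for illustration, example.com stands in for a real domain, and a fetch failure is reported the same way as an expiring certificate.

```shell
#!/bin/sh
# Exit 0 if the PEM certificate on stdin is still valid $1 days from now.
cert_valid_for_days() {
    openssl x509 -noout -checkend "$(( $1 * 86400 ))" >/dev/null
}

# Print the certificate served by host $1 on port ${2:-443} in PEM form.
fetch_cert() {
    openssl s_client -connect "$1:${2:-443}" -servername "$1" </dev/null 2>/dev/null |
        openssl x509
}

# Usage:
#   fetch_cert example.com | cert_valid_for_days 14 || echo "example.com: renew soon"
```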

Monthly Checks

Apply planned OS and package updates
Update containers where appropriate
Audit alerting setup and remove noisy or duplicate alerts
Test a restore from backup to confirm the data is actually recoverable
Rotate credentials/tokens if needed
Review this SOP and update anything outdated

Update Procedure

Use a cautious, repeatable flow for system updates.

Before updating: Verify backups are recent, check active alerts, and avoid making changes during an unresolved incident unless the update is the fix.

Suggested OS Update Flow

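sudo apt update              # refresh package lists
sudo apt list --upgradable   # review what will change before upgrading
sudo apt upgrade -y          # apply the upgrades
sudo apt autoremove -y       # remove packages that are no longer needed
sudo reboot                  # restart so kernel and library updates take effect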
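
Not every upgrade needs a restart. As a hedged sketch, the reboot can be made conditional on the marker file that Debian and Ubuntu tooling may create at /var/run/reboot-required. Whether that file appears depends on the packages installed on this server, so treat the path as an assumption. The function name reboot_required is hypothetical.

```shell
#!/bin/sh
# True if the reboot marker file exists. The default path is an assumption
# about this host; pass another path as $1 to override it.
reboot_required() {
    [ -f "${1:-/var/run/reboot-required}" ]
}

# Usage:
#   if reboot_required; then sudo reboot; else echo "no reboot needed"; fi
```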

After Reboot Verification

Confirm SSH access returns normally
Confirm Docker containers are running
Check reverse proxy and HTTPS endpoints
Check Uptime Kuma monitors
Send a test Gotify notification if needed
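
The endpoint and notification checks above could be scripted roughly as follows. This is a sketch, not the server's actual tooling: the function names, the gotify.example.com URL, and the token value are placeholders. The message call follows Gotify's documented push API (POST to /message with an application token).

```shell
#!/bin/sh
# Assumed settings: replace with the real Gotify base URL and application token.
GOTIFY_URL="${GOTIFY_URL:-https://gotify.example.com}"
GOTIFY_TOKEN="${GOTIFY_TOKEN:-changeme}"

# Exit 0 if the URL answers with an HTTP status below 400 within 10 seconds.
http_ok() {
    curl -fsS -o /dev/null --max-time 10 "$1"
}

# Post a message to Gotify: gotify_send <title> <message> [priority]
gotify_send() {
    curl -fsS -o /dev/null --max-time 10 \
        -F "title=$1" -F "message=$2" -F "priority=${3:-5}" \
        "$GOTIFY_URL/message?token=$GOTIFY_TOKEN"
}

# Check every URL given as an argument; return non-zero if any of them fails.
verify_endpoints() {
    failed=0
    for url in "$@"; do
        if http_ok "$url"; then
            echo "OK   $url"
        else
            echo "FAIL $url"
            failed=1
        fi
    done
    return "$failed"
}

# Usage:
#   verify_endpoints https://example.com || gotify_send "Post-reboot check" "An endpoint failed"
```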

Monitoring & Alerts

Current Monitoring Stack

Uptime Kuma for endpoint monitoring
Gotify for push notifications
Custom shell scripts for server health checks

Key Alert Categories

Public site down
Service/container unavailable
Disk usage high
Memory usage high
Load average too high
Certificate expiration warnings
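
The memory and load categories could be computed along these lines. This is only a sketch of the idea and not the contents of the existing custom scripts. The function names are hypothetical, and comparing load against the CPU count is an assumed rule of thumb.

```shell
#!/bin/sh
# Print used memory as a whole percentage, reading /proc/meminfo-style text on stdin.
mem_used_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

# Exit 0 if the 1-minute load average ($1) exceeds the CPU count ($2).
load_is_high() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load + 0 > cpus + 0) }'
}

# Usage on a Linux host:
#   mem_used_pct < /proc/meminfo
#   load_is_high "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)" && echo "load is high"
```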

Monitoring Notes

Alerts should be actionable, not noisy
Recovery alerts are useful and should be preserved
Critical services should have dedicated notification sources where possible (for example, a separate Gotify application per service)

Backups

Backups are only valuable if they are recent, complete, and restorable.

Verify backup jobs complete successfully
Confirm backup destination is reachable and has capacity
Retain enough history for accidental deletion and rollback scenarios
Document where application data and config files live
Periodically test restoration, not just backup creation

Important: A backup job reporting “success” is not the same as a verified recovery path.
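
To illustrate the difference between "the job ran" and "the backup restores", here is a minimal sketch. It assumes backups are .tar.gz archives in a single directory, which this page does not state, so adapt it to the real backup tool. The function names and the /srv/backups path are hypothetical.

```shell
#!/bin/sh
# True if directory $1 holds a *.tar.gz file modified within the last $2 minutes.
backup_is_recent() {
    [ -n "$(find "$1" -maxdepth 1 -type f -name '*.tar.gz' -mmin "-$2" | head -n 1)" ]
}

# True if archive $1 extracts cleanly into a throwaway directory.
archive_restorable() {
    scratch=$(mktemp -d) || return 1
    rc=0
    tar -xzf "$1" -C "$scratch" >/dev/null 2>&1 || rc=$?
    rm -rf "$scratch"
    return "$rc"
}

# Usage:
#   backup_is_recent /srv/backups 1440 || echo "no backup in the last 24 hours"
#   archive_restorable /srv/backups/app.tar.gz || echo "archive did not extract"
```

A clean extraction only shows the archive is readable. A full restore test would also start the application against the restored data.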

Incident Response

When Something Breaks

Identify what is actually down: app, container, proxy, DNS, TLS certificate, or full host
Check alerts and the most recent changes made to the server
Review service/container status and logs
Restore service if the fix is obvious and low risk
Roll back or reboot only if justified
Document what happened and what fixed it

Priority Triage Questions

Is the whole server down or just one service?
Did this start after an update, config change, or deployment?
Is HTTPS/reverse proxy working?
Is disk full?
Did a container fail to start?
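
A quick way to narrow down the first questions is to look at the HTTP status code a public URL returns. The mapping below is a rough heuristic, not a diagnosis, and the function names are hypothetical. curl prints 000 when it gets no HTTP response at all, which includes DNS, connection, and certificate failures.

```shell
#!/bin/sh
# Map an HTTP status code (as printed by curl -w '%{http_code}') to a triage hint.
triage_hint() {
    case "$1" in
        000)         echo "no HTTP response: check host, DNS, TLS, or reverse proxy" ;;
        502|503|504) echo "proxy answered but upstream failed: check the app container" ;;
        2??|3??)     echo "endpoint is responding" ;;
        *)           echo "unexpected status $1: check application logs" ;;
    esac
}

# Probe a URL and print the status code with its hint.
probe() {
    code=$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' "$1") || true
    echo "$1 -> $code: $(triage_hint "$code")"
}

# Usage:
#   probe https://example.com
```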

Quick Command Reference

System Health

uptime
free -h
df -h
top
systemctl --failed

Docker

docker ps
docker ps -a
docker logs <container_name>
docker restart <container_name>
docker compose ps
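
To spot problem containers without reading the full docker ps -a output, the built-in filters can be combined. This is a small sketch and the function name problem_containers is hypothetical. Containers that were stopped on purpose also show up as exited, and the unhealthy filter only applies to containers that define a healthcheck.

```shell
#!/bin/sh
# Print the names of containers that have exited or report an unhealthy healthcheck.
problem_containers() {
    {
        docker ps -a --filter status=exited --format '{{.Names}}'
        docker ps --filter health=unhealthy --format '{{.Names}}'
    } | sort -u
}

# Usage:
#   problem_containers
```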

Networking / HTTP

curl -I https://example.com
ss -tulpn
ping -c 4 1.1.1.1

Logs

journalctl -xe
journalctl -u nginx -n 50 --no-pager
docker logs --tail 50 <container_name>

Alerts / Custom Scripts

/usr/local/bin/check_server_health
crontab -l