Checklists
Checklists are actionable ticket templates for change management. They are synchronized with runbooks and provide a structured way to track progress during maintenance operations.
📋 Available Checklists
Wazuh Upgrades
- Wazuh Upgrade AIO - Checklist for All-in-One Wazuh upgrades
- Wazuh Upgrade AIO (Improved) - Enhanced version with additional validation steps
Templates
- Checklist Template - Standard template for creating new checklists
🎯 How to Use Checklists
Before You Start
- Copy the checklist into your change ticket or issue tracker
- Fill out metadata completely:
    operator: <your name>
    customer: <customer name>
    infrastructure: <system identifier>
    change_ticket: <ticket number>
    maintenance_window_start: <ISO 8601 timestamp>
    maintenance_window_end: <ISO 8601 timestamp>
    target_version: <version number>
    snapshot_id: <snapshot identifier>
    runbook_ref: <runbook filename>
- Review the runbook referenced in the checklist
- Verify prerequisites are met
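A completeness check on the metadata header can catch unfilled placeholders before the change starts. The following is a minimal sketch, assuming the header is stored as plain `key: value` lines; `parse_metadata` and `validate_metadata` are hypothetical helper names, not part of any existing tooling.

```python
from datetime import datetime

# Field names as listed in the metadata header.
REQUIRED_FIELDS = (
    "operator", "customer", "infrastructure", "change_ticket",
    "maintenance_window_start", "maintenance_window_end",
    "target_version", "snapshot_id", "runbook_ref",
)
WINDOW_START = "maintenance_window_start"
WINDOW_END = "maintenance_window_end"


def parse_metadata(text):
    """Parse 'key: value' lines into a dict (hypothetical helper)."""
    metadata = {}
    for line in text.splitlines():
        # Split on the first colon only, so timestamps keep their colons.
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


def validate_metadata(metadata):
    """Return a list of problems; an empty list means the header is complete."""
    problems = []
    missing = set()
    for field in REQUIRED_FIELDS:
        value = metadata.get(field, "")
        # An empty value or an untouched <placeholder> counts as missing.
        if not value or (value.startswith("<") and value.endswith(">")):
            missing.add(field)
            problems.append(f"{field}: missing or still a placeholder")

    parsed = {}
    for field in (WINDOW_START, WINDOW_END):
        if field in missing:
            continue
        try:
            # Python versions before 3.11 accept only a subset of ISO 8601
            # here (for example, no trailing "Z").
            parsed[field] = datetime.fromisoformat(metadata[field])
        except ValueError:
            problems.append(f"{field}: not a parseable ISO 8601 timestamp")

    if len(parsed) == 2:
        try:
            if parsed[WINDOW_END] <= parsed[WINDOW_START]:
                problems.append("maintenance window end is not after its start")
        except TypeError:
            # Raised when one timestamp has a UTC offset and the other does not.
            problems.append("maintenance window mixes timestamps with and without a UTC offset")
    return problems
```

Calling `validate_metadata(parse_metadata(header_text))` on a fully filled header should return an empty list; any returned entry names the field that still needs attention.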
During Execution
- Follow the checklist order - items are sequenced for safety
- Check off each item as you complete it:
    - [x] Completed item
- Reference the runbook for detailed commands (don't copy commands into the checklist)
- Document deviations in the notes section
- Stop if you hit a no-go gate and escalate
After Completion
- Complete all validation items
- Attach health snapshots (pre and post)
- Document findings in the checklist notes
- Update the change ticket
- Close the checklist only when all items are checked
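The closing rule can be checked mechanically when the checklist is stored as Markdown task-list items. The sketch below assumes the `- [ ]` / `- [x]` syntax shown on this page; `open_items` and `can_close` are hypothetical names.

```python
import re

# Matches task-list items such as "- [ ] text" or "- [x] text".
ITEM_PATTERN = re.compile(r"^\s*-\s\[( |x|X)\]\s+(.*)$")


def open_items(checklist_text):
    """Return the text of every item that is still unchecked."""
    remaining = []
    for line in checklist_text.splitlines():
        match = ITEM_PATTERN.match(line)
        if match and match.group(1) == " ":
            remaining.append(match.group(2).strip())
    return remaining


def can_close(checklist_text):
    """A checklist may be closed only if it has items and none are open."""
    has_items = any(ITEM_PATTERN.match(line) for line in checklist_text.splitlines())
    return has_items and not open_items(checklist_text)
```

Treating an empty checklist as not closable is a design choice in this sketch: a ticket with no recognizable items is more likely a paste error than a finished change.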
📊 Checklist Structure
A) Metadata Header
Required information for audit trail and tracking:
- Operator name
- Customer/infrastructure details
- Change ticket reference
- Maintenance window
- Target versions
- Snapshot identifiers
B) Pre-Flight Checks (No-Go Gates)
Mandatory safety checks that must pass before proceeding:
- [ ] Disk usage < 90%
- [ ] All services active (running)
- [ ] Valid backup/snapshot exists
- [ ] Change ticket approved
- [ ] Within maintenance window
C) Pre-Change Snapshot
Capture baseline system state:
- [ ] Document current versions
- [ ] Check service status
- [ ] Review disk usage
- [ ] Verify cluster health
- [ ] Check for recent errors
D) Execution Steps
Main procedure items (references runbook for details):
- [ ] Step 1 (reference: Runbook Section X)
- [ ] Step 2 (reference: Runbook Section Y)
- [ ] Validation check
- [ ] Continue...
E) Post-Change Snapshot
Verify system state after changes:
- [ ] Confirm new versions
- [ ] Verify all services running
- [ ] Check cluster health
- [ ] Review logs for errors
F) Post-Validation
Final checks before closure:
- [ ] Functional tests passed
- [ ] Performance acceptable
- [ ] Customer notification sent
- [ ] Documentation updated
G) Notes & Findings
Document any issues, deviations, or observations
🔄 Synchronization with Runbooks
CRITICAL: Checklists and runbooks must stay synchronized.
- Checklists contain brief descriptions and reference runbook sections
- Runbooks contain detailed commands and troubleshooting
- Don't duplicate commands - reference the runbook instead
- When updating a runbook, check whether the corresponding checklist needs updates
- When updating a checklist, verify runbook sections match
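One way to keep the two in sync is to check that every section a checklist references still exists in the runbook. The sketch below rests on two assumptions: references follow the `(reference: Runbook Section X)` form used in the checklist structure, and each runbook heading starts with its section identifier. All function names are hypothetical.

```python
import re

# "(reference: Runbook Section 3.1)" -> "3.1"
REFERENCE = re.compile(r"\(reference:\s*Runbook Section\s+([^)\s]+)\)")
# Assumed runbook heading form: "## 3.1 Stop services" -> "3.1"
HEADING = re.compile(r"^#+\s+(\S+)")


def referenced_sections(checklist_text):
    """Collect every section identifier the checklist points at."""
    return set(REFERENCE.findall(checklist_text))


def runbook_sections(runbook_text):
    """Collect the first token of every Markdown heading in the runbook."""
    sections = set()
    for line in runbook_text.splitlines():
        match = HEADING.match(line)
        if match:
            sections.add(match.group(1))
    return sections


def dangling_references(checklist_text, runbook_text):
    """Return referenced sections that the runbook no longer contains."""
    missing = referenced_sections(checklist_text) - runbook_sections(runbook_text)
    return sorted(missing)
```

A check like this could run in review for any PR that touches a runbook or a checklist; a non-empty result means one side was updated without the other.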
Example of Proper Referencing
❌ WRONG - Duplicating commands in checklist:
✅ CORRECT - Referencing runbook:
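As a hypothetical illustration of the two cases, the sketch below holds one item of each kind and a small lint function that tells them apart. The item texts, the command prefixes in `COMMAND`, and the name `lint_item` are assumptions made for this example only.

```python
import re

# Hypothetical set of words that suggest a pasted shell command.
COMMAND = re.compile(r"\b(sudo|systemctl|apt-get|apt|yum|dnf|curl)\s")
REFERENCE = re.compile(r"\(reference:[^)]+\)", re.IGNORECASE)

# Illustrative items, not taken from an actual checklist.
WRONG_ITEM = "- [ ] Stop the manager: `systemctl stop wazuh-manager`"
CORRECT_ITEM = "- [ ] Stop the manager (reference: Runbook Section X)"


def lint_item(item):
    """Return the problems found in a single checklist item."""
    problems = []
    if COMMAND.search(item):
        problems.append("contains a command; move it to the runbook")
    if not REFERENCE.search(item):
        problems.append("no runbook reference")
    return problems
```

With these definitions, `lint_item(WRONG_ITEM)` reports both problems and `lint_item(CORRECT_ITEM)` reports none. The prefix list is deliberately short and would need tuning for a real repository.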
🛑 No-Go Gates in Checklists
Every checklist includes mandatory no-go gates. DO NOT PROCEED if any check fails:
## B) Pre-Flight Checks (No-Go Gates)
- [ ] Disk usage < 90% on all partitions (STOP if ≥ 90%)
- [ ] All Wazuh services are `active (running)` (STOP if any failed)
- [ ] Valid snapshot/backup created and verified (STOP if missing)
- [ ] Change ticket CHG-XXXXX approved (STOP if not approved)
- [ ] Within maintenance window (STOP if outside window)
- [ ] All prerequisites from runbook verified
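The gates above can be evaluated as a single go/no-go decision. The following is a sketch under stated assumptions, not a replacement for the runbook's own checks: the helper names are hypothetical, and the measured values are passed in as plain arguments so the decision logic stays separate from how they are collected.

```python
import shutil
import subprocess

DISK_LIMIT_PERCENT = 90.0  # the gate passes only below this value


def disk_usage_percent(path):
    """Used space as a percentage of total for the filesystem holding `path`.

    This may differ slightly from the Use% column of `df`, which accounts
    for reserved blocks differently.
    """
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100


def service_is_active(unit):
    """True if systemd reports the unit as active (requires systemctl)."""
    result = subprocess.run(
        ["systemctl", "is-active", "--quiet", unit], check=False
    )
    return result.returncode == 0


def evaluate_gates(disk_percent, services_active, snapshot_verified,
                   ticket_approved, in_window):
    """Return the list of failed gates; an empty list means GO."""
    failures = []
    for mount, percent in sorted(disk_percent.items()):
        if percent >= DISK_LIMIT_PERCENT:
            failures.append(f"disk usage on {mount} is {percent:.0f}%")
    for unit, active in sorted(services_active.items()):
        if not active:
            failures.append(f"service {unit} is not active")
    if not snapshot_verified:
        failures.append("no verified snapshot/backup")
    if not ticket_approved:
        failures.append("change ticket not approved")
    if not in_window:
        failures.append("outside maintenance window")
    return failures
```

Any non-empty result is a stop: record the failures in the notes section and escalate rather than proceeding.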
📝 Creating New Checklists
When creating a new checklist:
- Start with the template: Use CHECKLIST-TEMPLATE.md
- Reference existing runbook: Ensure a corresponding runbook exists
- Include all metadata fields: Don't skip the header section
- Add appropriate no-go gates: Based on the procedure's risks
- Test in practice: Validate in non-production first
- Submit via PR: Include rationale for new checklist
📸 Health Snapshot Tracking
Both pre and post snapshots must be:
1. Captured during the procedure
2. Documented in the checklist
3. Attached to the change ticket
4. Reviewed for any anomalies
What to Include in Snapshots
- Component versions (before/after)
- Service status (all components)
- Resource utilization (disk, memory)
- Cluster health status
- Recent error logs (last 50 lines)
- API connectivity test results
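Reviewing snapshots for anomalies is easier when the pre and post states are compared field by field. The sketch below assumes each snapshot has been reduced to a flat dictionary; the key names and the functions `diff_snapshots` and `unexpected_changes` are hypothetical.

```python
def diff_snapshots(pre, post):
    """Return {key: (pre_value, post_value)} for every key whose value changed.

    A key present in only one snapshot is reported with None on the other side.
    """
    changes = {}
    for key in sorted(set(pre) | set(post)):
        before, after = pre.get(key), post.get(key)
        if before != after:
            changes[key] = (before, after)
    return changes


def unexpected_changes(pre, post, expected_keys):
    """Filter the diff down to changes the procedure was not meant to cause."""
    return {
        key: values
        for key, values in diff_snapshots(pre, post).items()
        if key not in expected_keys
    }
```

For an upgrade, the component version would normally be listed in `expected_keys`, so a changed service status or cluster health value stands out as something to document in the findings.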
🔗 Related Resources
- Runbooks - Detailed procedures referenced by checklists
- Upgrade Guides - Version-specific upgrade information
- Templates - Change note templates for documentation
- Catalog - Customer infrastructure information