Here is a suggested operation book template:
https://docs.google.com/document/d/1xf3wWoQBgHKQtefaWi04Hx545rPB8HzQ
Procedure guide on how to operate the application
How to start from scratch?
How to start from a predeployed and running app?
How to assess whether the application's process has terminated?
Is the process the same?
Also for restarting after an error?
How?
Who should be alerted?
How?
Who should be alerted?
How?
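As an illustration of how the termination question might be answered, here is a minimal sketch. It assumes the application runs as a child process started through Python's `subprocess` module; the helper name `describe_state` and the wording of the returned states are hypothetical and should be adapted to the real application.

```python
import subprocess
import sys


def describe_state(proc: subprocess.Popen) -> str:
    """Return a short, human-readable state for a child process (hypothetical helper)."""
    # poll() returns None while the process is still running,
    # otherwise it returns the exit code.
    code = proc.poll()
    if code is None:
        return "running"
    if code == 0:
        return "terminated normally"
    return f"terminated with error (exit code {code})"


def run_and_describe(args: list) -> str:
    """Start a command, wait for it to finish, and describe its final state."""
    proc = subprocess.Popen(args)
    proc.wait()
    return describe_state(proc)
```

For example, `run_and_describe([sys.executable, "-c", "pass"])` would report a normal termination, while a non-zero exit code is reported as an error that may require alerting.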
What is the start trigger? Event-based? Time-based?
Are there differences between DataPrep and DataApp?
For each brick, what is the expected output?
What is the time frame for intervention (when downtime is acceptable or scheduled)?
Where and how can we see the application status (stopped, waiting, running, etc.)?
Where is the history of run actions?
What form does it take? Logs?
Memory, disk, and CPU used by the application
According to operational requirements, detail the application metrics (processed volume, process duration, ...)
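The metrics listed above could be collected along the following lines. This is only a sketch using the Python standard library; the names `RunMetrics` and `measure_run`, the `data_path` parameter, and the choice of counting records as "processed volume" are assumptions for illustration, not part of the template.

```python
import shutil
import time
from dataclasses import dataclass


@dataclass
class RunMetrics:
    processed_volume: int      # number of records handled during the run
    duration_seconds: float    # wall-clock duration of the run
    cpu_seconds: float         # CPU time consumed by this process during the run
    disk_used_fraction: float  # used / total space on the disk holding data_path


def measure_run(process, records, data_path="."):
    """Apply `process` to each record and return basic run metrics."""
    start_wall = time.perf_counter()
    start_cpu = time.process_time()
    count = 0
    for record in records:
        process(record)
        count += 1
    usage = shutil.disk_usage(data_path)
    return RunMetrics(
        processed_volume=count,
        duration_seconds=time.perf_counter() - start_wall,
        cpu_seconds=time.process_time() - start_cpu,
        disk_used_fraction=usage.used / usage.total,
    )
```

Memory usage is left out on purpose: the standard library has no portable call for it, so it would typically come from the platform's monitoring tooling.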
Where to find the logs for each step?
As a general guideline, the application should stop as soon as possible when an error occurs.
- Contacts
- Meaningful message (timestamps, description, criticality)
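One possible shape for such an alert is sketched below. The `Alert` class, the `Criticality` levels, and the rendered format are hypothetical; they only show how contacts, timestamp, description, and criticality might be carried together in one message.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List


class Criticality(Enum):
    # Assumed levels; replace with the levels used by your organization.
    INFO = "INFO"
    WARNING = "WARNING"
    CRITICAL = "CRITICAL"


@dataclass
class Alert:
    description: str
    criticality: Criticality
    contacts: List[str]
    # Default to the current time in UTC so every alert is timestamped.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def render(self) -> str:
        """Format the alert as a single line suitable for a log or a notification."""
        return (
            f"[{self.timestamp.isoformat()}] {self.criticality.value}: "
            f"{self.description} (notify: {', '.join(self.contacts)})"
        )
```

Keeping the timestamp in UTC and in ISO 8601 form makes alerts easier to correlate with step logs coming from other systems.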
Detail the procedure for specific error cases