Here is a suggested operation book template:
https://drive.google.com/file/d/1JQyzcu3PHDqB_iRCvarIkvVL5ejJIm-r/view
Procedure guide on how to operate the application
How to start from scratch?
How to start from a predeployed and running application?
How to assess whether the application's process has terminated?
Is the process the same?
Is it also the same when restarting after an error?
How?
Who should be alerted?
How?
Who should be alerted?
How?
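For the question of how to assess whether the application's process has terminated, one possible approach is to rely on the exit code returned by the operating system. The sketch below is only an illustration: `run_and_report` is a hypothetical helper, and the launched command is a stand-in for the real application, which this template does not specify.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command, wait for it to end, and report how it terminated."""
    process = subprocess.Popen(command)
    # wait() blocks until the process terminates and returns its exit code.
    exit_code = process.wait()
    # poll() returns None while the process is still running,
    # and the exit code once it has terminated.
    terminated = process.poll() is not None
    # Assumption: exit code 0 means success, anything else is an error.
    status = "success" if exit_code == 0 else "error"
    return {"terminated": terminated, "exit_code": exit_code, "status": status}


if __name__ == "__main__":
    # Hypothetical stand-in for the real application command.
    print(run_and_report([sys.executable, "-c", "raise SystemExit(3)"]))
```

A non-zero exit code is a natural condition for deciding who should be alerted and whether the restart procedure applies.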
What is the start trigger? Event-based? Time-based?
Are there differences between DataPrep and DataApp?
For each brick, what is the expected output?
What is the time frame for intervention (when downtime is acceptable or scheduled)?
Where and how can we see the application status (stopped, waiting, running, etc.)?
Where is the history of run actions kept?
What form does it take? Logs?
Memory / disk / CPU used by the application
According to operational requirements, detail the application metrics (processed volume, process duration, ...)
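As an illustration of how such metrics could be gathered, here is a minimal sketch that uses only the Python standard library. The function name `collect_metrics`, its parameters, and the metric names are assumptions made for this example, not part of the template. Peak memory is read through the `resource` module, which is only available on Unix-like systems.

```python
import shutil
import time

try:
    import resource  # Unix only; not available on Windows
except ImportError:
    resource = None


def collect_metrics(work_dir, started_at, processed_records):
    """Return a snapshot of basic resource and application metrics."""
    disk = shutil.disk_usage(work_dir)
    peak_memory = None
    if resource is not None:
        # ru_maxrss is in kilobytes on Linux and in bytes on macOS.
        peak_memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return {
        # Wall-clock time elapsed since the run started.
        "process_duration_s": round(time.monotonic() - started_at, 3),
        # CPU time consumed by the current Python process.
        "cpu_time_s": round(time.process_time(), 3),
        # Share of the disk holding work_dir that is in use.
        "disk_used_pct": round(100 * disk.used / disk.total, 1),
        "peak_memory_raw": peak_memory,
        "processed_volume": processed_records,
    }


if __name__ == "__main__":
    start = time.monotonic()
    records = sum(1 for _ in range(100_000))  # stand-in for real work
    print(collect_metrics(".", start, records))
```

In practice these values would be sent to whatever monitoring tool the operations team already uses, rather than printed.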
Where to find the logs for each step?
As a general guideline, the application should stop as soon as possible once an error is detected.
- Contacts
- Meaningful message (timestamps, description, criticality)
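The two items above could translate into a structured alert such as the one sketched below. This is an assumption-laden illustration: the `AlertMessage` fields, the `build_alert` helper, the list of criticality levels, and the example contact address are all hypothetical and should be adapted to the team's actual alerting channel.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Assumed criticality scale; replace with the one used by operations.
CRITICALITY_LEVELS = ("INFO", "WARNING", "ERROR", "CRITICAL")


@dataclass
class AlertMessage:
    timestamp: str
    criticality: str
    description: str
    contacts: list


def build_alert(description, criticality, contacts, now=None):
    """Build an alert carrying a timestamp, a description, and a criticality."""
    if criticality not in CRITICALITY_LEVELS:
        raise ValueError(f"unknown criticality: {criticality}")
    # Use UTC so that timestamps are comparable across systems.
    now = now or datetime.now(timezone.utc)
    return AlertMessage(
        timestamp=now.isoformat(),
        criticality=criticality,
        description=description,
        contacts=list(contacts),
    )


def to_json(alert):
    """Serialize the alert so it can be logged or sent to a channel."""
    return json.dumps(asdict(alert), sort_keys=True)


if __name__ == "__main__":
    alert = build_alert(
        "DataPrep step failed: input file missing",
        "ERROR",
        ["ops-team@example.com"],  # hypothetical contact
    )
    print(to_json(alert))
```

Keeping the message machine-readable makes it easier to route alerts by criticality and to search them later.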
Detail the procedure for specific error cases