Here is a suggested operation book template:
https://docs.google.com/document/d/1xf3wWoQBgHKQtefaWi04Hx545rPB8HzQ
Procedure guide on how to operate the application
How to start from a predeployed and running app: verify that all tables and views have been created, then execute the plan.
How to assess that the application's process has terminated: the flow is shown as completed in the TMC.
Is the process the same? Yes.
Also for restarting after an error? If there has been an error in the job, first delete the data from the FACT tables before restarting.
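The cleanup before a restart could be scripted along the following lines. This is only a sketch under stated assumptions: the table names `FACT_SALES` and `FACT_ORDERS` and the helper `clear_fact_tables` are hypothetical, and an in-memory SQLite database stands in for the real database. Adapt it to the actual schema and driver.

```python
import sqlite3

# Hypothetical fact table names; replace with the application's real FACT tables.
FACT_TABLES = ("FACT_SALES", "FACT_ORDERS")


def clear_fact_tables(conn, tables=FACT_TABLES):
    """Delete all rows from the given fact tables in a single transaction.

    Returns a dict mapping each table name to the number of rows it held
    before the cleanup.
    """
    removed = {}
    # Used as a context manager, the connection commits on success and
    # rolls back if any statement raises.
    with conn:
        for table in tables:
            # Table names cannot be bound as SQL parameters, so only
            # names from the known list are accepted.
            if table not in FACT_TABLES:
                raise ValueError(f"unexpected table name: {table}")
            count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
            conn.execute(f"DELETE FROM {table}")
            removed[table] = count
    return removed


if __name__ == "__main__":
    # In-memory database standing in for the real one (assumption).
    conn = sqlite3.connect(":memory:")
    for table in FACT_TABLES:
        conn.execute(f"CREATE TABLE {table} (id INTEGER)")
        conn.execute(f"INSERT INTO {table} VALUES (1)")
    conn.commit()
    print(clear_fact_tables(conn))
```

Running all deletions in one transaction means a failure partway through leaves the tables untouched, so the cleanup can simply be retried.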
How? Stop the flow in the TMC.
Who should be alerted? Contacts are defined on this page.
How?
What is the start trigger? Event-based? Time-based?
Are there differences between DataPrep and DataApp?
For each brick, what is the expected output?
What is the time frame for intervention (when downtime is acceptable or scheduled)?
Where and how can we see the application status (stopped, waiting, running, etc.)?
Where is the history of run actions?
What form does it take? Logs?
Memory / disk / CPU used by the application
According to operational requirements, detail the application metrics (processed volume, process duration, ...)
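As an illustration of the two metrics named above, processed volume and process duration could be captured per step roughly as follows. The names `RunMetrics` and `run_with_metrics` are hypothetical and not part of any existing tool; this is a minimal sketch, assuming each step processes its input row by row.

```python
import time
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Metrics for one run of a step (field names are illustrative)."""
    step: str
    processed_rows: int
    duration_seconds: float


def run_with_metrics(step, func, rows):
    """Apply func to every row and record processed volume and duration."""
    start = time.perf_counter()
    processed = 0
    for row in rows:
        func(row)
        processed += 1
    duration = time.perf_counter() - start
    return RunMetrics(step, processed, duration)


if __name__ == "__main__":
    # A no-op stands in for the real per-row processing.
    metrics = run_with_metrics("load", lambda row: None, range(1000))
    print(metrics)
```

The resulting record can then be written to whichever log or monitoring store the operational requirements call for.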
Where can the logs for each step be found?
As a general guideline, the application should stop as soon as possible when an error occurs.
- Contacts
- Meaningful message (timestamps, description, criticality)
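The guideline and the message requirements above could look like this in practice. This is a sketch only: the logger name `app.flow`, the `StepError` exception and the `run_steps` helper are hypothetical. It uses Python's standard `logging` module so that each message carries a timestamp, a criticality level and a description, and it stops at the first failing step. Alerting the contacts is not shown.

```python
import logging
import sys

# Every message carries a timestamp, a criticality level and a description.
LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"

logger = logging.getLogger("app.flow")  # hypothetical logger name


class StepError(Exception):
    """Raised when a step fails and the flow must stop."""


def run_steps(steps):
    """Run (name, callable) pairs in order and stop at the first failure."""
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            logger.critical("step %r failed: %s", name, exc)
            # Later steps are never started once one step has failed.
            raise StepError(name) from exc
        logger.info("step %r completed", name)


if __name__ == "__main__":
    logging.basicConfig(format=LOG_FORMAT, level=logging.INFO)
    try:
        run_steps([("extract", lambda: None), ("load", lambda: 1 / 0)])
    except StepError:
        # A non-zero exit code lets the scheduler see the failure.
        sys.exit(1)
```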
Detail the procedure for specific error cases