Procedures
Here is a suggested operation book template:
https://drive.google.com/file/d/1JQyzcu3PHDqB_iRCvarIkvVL5ejJIm-r/view
Procedure guide on how to operate the application
DataPrep Flow
Start
Full process
How is the application started from scratch?
Incremental process
How is a run started on an already deployed and running application?
Termination
How can we confirm that the application's process has terminated?
Restart
Is the restart procedure the same as the initial start?
Is it also the same when restarting after an error?
Pause
Procedure
How is the application paused?
Alert contacts
Who should be alerted?
Stop
Procedure
How is the application stopped?
Alert contacts
Who should be alerted?
Reset
How is the application reset?
DataApp Flow
Start
Full process
How is the application started from scratch?
Incremental process
How is a run started on an already deployed and running application?
Termination
How can we confirm that the application's process has terminated?
Restart
Is the restart procedure the same as the initial start?
Is it also the same when restarting after an error?
Pause
Procedure
How is the application paused?
Alert contacts
Who should be alerted?
Stop
Procedure
How is the application stopped?
Alert contacts
Who should be alerted?
Reset
How is the application reset?
Scheduling
Trigger
What is the start trigger? Is it event-based or time-based?
Are there differences between DataPrep and DataApp?
Expected results
For each brick, what is the expected output?
Intervention
What is the time frame for intervention (that is, when is downtime acceptable or scheduled)?
Monitoring
Runtime
Where and how can we see the application status (stopped, waiting, running, etc.)?
Run history
Where is the history of run actions kept?
What form does it take (logs, for example)?
Resources
Memory, disk, and CPU used by the application
Additional metrics
Detail the application metrics needed to meet operational requirements (processed volume, process duration, ...)
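As an illustration of the two example metrics named above, the following minimal sketch records a processed volume and a process duration for one run. The names RunMetrics and measure_run are hypothetical and not part of any existing application; how the values are published (logs, dashboard, metrics store) is left open.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Hypothetical container for the metrics of a single run."""
    processed_volume: int = 0      # number of records processed
    duration_seconds: float = 0.0  # wall-clock duration of the run


@contextmanager
def measure_run():
    """Yield a RunMetrics object and fill in the duration when the run ends."""
    metrics = RunMetrics()
    start = time.perf_counter()
    try:
        yield metrics
    finally:
        # Recorded even if the run raises, so failed runs still report a duration.
        metrics.duration_seconds = time.perf_counter() - start


if __name__ == "__main__":
    with measure_run() as metrics:
        for batch in ([1, 2, 3], [4, 5]):
            metrics.processed_volume += len(batch)
    print(metrics)
```

The duration is set in a finally clause so that a run interrupted by an error still produces a usable measurement.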
Logging
Where can the logs for each step be found?
Error handling
As a general guideline, the application should stop as soon as possible when an error occurs.
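The guideline above can be sketched as a fail-fast runner: steps execute in order, and the first failure stops the run before any later step starts. This is only an illustrative sketch; run_fail_fast, StepFailed, and the step names are hypothetical, and a real DataPrep or DataApp flow may be orchestrated differently.

```python
from typing import Callable, List, Sequence, Tuple


class StepFailed(RuntimeError):
    """Raised when a step fails; carries the name of the failing step."""

    def __init__(self, step_name: str, cause: Exception):
        super().__init__(f"step '{step_name}' failed: {cause}")
        self.step_name = step_name


def run_fail_fast(steps: Sequence[Tuple[str, Callable[[], None]]]) -> List[str]:
    """Run steps in order and stop at the first failure.

    Returns the names of all steps if every step succeeds;
    raises StepFailed as soon as one step raises.
    """
    completed: List[str] = []
    for name, step in steps:
        try:
            step()
        except Exception as exc:
            # Stop immediately: the remaining steps are never started.
            raise StepFailed(name, exc) from exc
        completed.append(name)
    return completed


if __name__ == "__main__":
    def load() -> None:
        print("loading")

    def transform() -> None:
        raise ValueError("bad input")

    try:
        run_fail_fast([("load", load), ("transform", transform)])
    except StepFailed as err:
        print(err)
```

Carrying the failing step's name in the exception gives the alerting and logging layers something concrete to report.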
Alerts
- Contacts
- Meaningful message (timestamps, description, criticality)
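One possible shape for such an alert is sketched below: a small JSON payload that carries the contacts, a timestamp, a description, and a criticality. The field names, the three criticality levels, and the build_alert helper are assumptions made for illustration; the actual alerting channel and format depend on the application.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical criticality scale; adapt to the levels used by your operations team.
CRITICALITY_LEVELS = ("info", "warning", "critical")


@dataclass
class Alert:
    flow: str            # for example "DataPrep" or "DataApp"
    description: str     # what happened, in plain words
    criticality: str     # one of CRITICALITY_LEVELS
    timestamp: str       # ISO 8601, UTC
    contacts: List[str]  # who should be alerted


def build_alert(flow: str, description: str, criticality: str,
                contacts: List[str], now: Optional[datetime] = None) -> str:
    """Return the alert as a JSON string, rejecting unknown criticality levels."""
    if criticality not in CRITICALITY_LEVELS:
        raise ValueError(f"unknown criticality: {criticality!r}")
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    alert = Alert(flow, description, criticality, timestamp, list(contacts))
    return json.dumps(asdict(alert), sort_keys=True)


if __name__ == "__main__":
    print(build_alert("DataPrep", "Input file missing", "critical",
                      ["ops@example.com"]))
```

The optional now parameter exists only so the timestamp can be fixed when testing; in normal use the current UTC time is taken.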
Specificity
Detail the procedure for each specific error case.