| Status | WIP |
|---|---|
| Stakeholders | |
| Outcome | |
| Due Date | |
| Owner | |
| Solution/Domain/Data Architect | |
Most lab sites run a file server on top of an iOMega NAS. In some cases these devices are malfunctioning. They are also at their limit for adding hard disks, both to accommodate new storage demand and to remain compliant with data storage regulations.
From the data producer/consumer and data logistics perspectives, LabPC disks and local file storage are just different levels of data buffering. They are not the place to store data long term, nor to serve data to its consumers with proper interoperability and scalability.
It is therefore essential to classify the data captured from the instruments in terms of country regulations, access restrictions, data lifecycle, and data access experience (the relationship between latency and data availability). This classification is what allows us to design the most cost-efficient solution that effectively guarantees business continuity.
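As a minimal sketch of how such a classification could drive storage decisions, the hypothetical `DataClass` record below captures the dimensions listed above, and `choose_tier` maps it to one of the buffering levels discussed on this page. The tier names, the 7-day buffer window, the 60-second latency threshold and the rule order are all illustrative assumptions, not agreed policy.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    # Hypothetical storage levels, from instrument buffer to long-term store.
    LABPC_DISK = "labpc_disk"          # acquisition buffer next to the instrument
    LOCAL_FILE_SERVER = "file_server"  # site-level buffer (e.g. the NAS)
    CLOUD_HOT = "cloud_hot"            # low-latency cloud storage
    CLOUD_ARCHIVE = "cloud_archive"    # long-term, high-latency retention


@dataclass(frozen=True)
class DataClass:
    # Illustrative classification dimensions taken from the text above.
    export_controlled: bool   # country regulation / export control
    restricted_access: bool   # access restrictions
    age_days: int             # position in the data lifecycle
    max_latency_s: float      # latency consumers accept when opening the data


def choose_tier(dc: DataClass, buffer_days: int = 7, hot_latency_s: float = 60.0) -> Tier:
    """Map a classified data set to a storage tier (assumed rules, not agreed policy)."""
    if dc.export_controlled or dc.restricted_access:
        # Assumption: regulated data stays on site until a compliant cloud target is approved.
        return Tier.LOCAL_FILE_SERVER
    if dc.age_days <= buffer_days:
        # Freshly acquired data is still being processed next to the instrument.
        return Tier.LABPC_DISK
    if dc.max_latency_s <= hot_latency_s:
        # Consumers need near-interactive access.
        return Tier.CLOUD_HOT
    return Tier.CLOUD_ARCHIVE
```

The rules are evaluated in a fixed order so that regulatory constraints always win over cost and latency considerations; the real ordering and thresholds would come out of the classification exercise itself.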
Use cases
- Lyon/RICL
  - For some techniques, OpenLab instruments generate files in the hundreds of GB. These files require local computing power to load into standalone applications for expert analysis. At this size they also cannot be uploaded to a cloud repository, and they are difficult to store in compliance with regulatory terms.
- Shanghai
  - Waters instruments demand 12 TB of space to store application data. This volume reaches the local limit of the LabPC disks and of the iOMega NAS on the local network.
- Bristol
  - Waters instruments demand 12 TB of space to store application data. This volume reaches the local limit of the LabPC disks and of the iOMega NAS on the local network.
- Nuclear Magnetic Resonance:
- Mass spectrometry:
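The use cases above come down to data volume versus available bandwidth. A back-of-the-envelope transfer-time estimate makes the problem concrete. The sketch below uses decimal units (1 GB = 8 × 10^9 bits). The `efficiency` factor is an assumed fraction of nominal bandwidth actually achieved. The link speeds in the usage note are illustrative, not measured site values.

```python
def transfer_hours(size_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate the hours needed to upload `size_gb` gigabytes over a `link_mbps` link.

    Decimal units: 1 GB = 8 * 10**9 bits, 1 Mbit/s = 10**6 bit/s.
    `efficiency` is an assumed share of the nominal bandwidth really available
    (protocol overhead, link shared with other site traffic).
    """
    if size_gb < 0 or link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid size, bandwidth or efficiency")
    bits = size_gb * 8e9
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600
```

For example, `transfer_hours(500, 100, efficiency=1.0)` gives about 11.1 hours for a single 500 GB file on a fully available 100 Mbit/s link. `transfer_hours(12000, 1000)` gives about 33.3 hours to move the 12 TB reported for the Waters instruments over an assumed 1 Gbit/s link at 80% efficiency.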
Questions and Concerns
Architecturally Significant Requirements
- Data rotation
- Retention
- Export Control/Cyber Sec
- Access Control
- Data transfer latency SLA
- Data Consumers (data structure/data model)
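Data rotation and retention interact. A file can only be rotated out of the LabPC buffer once a verified copy exists elsewhere. Retention, in contrast, is measured against the file's age regardless of where it lives. The sketch below separates the two decisions. `FileRecord`, `plan_rotation` and both day thresholds are hypothetical. Expired files are only flagged for review, not deleted.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FileRecord:
    path: str
    created: date
    replicated: bool  # True once a verified copy exists off the LabPC (assumed flag)


def plan_rotation(files: list[FileRecord], today: date,
                  rotate_after_days: int, retain_for_days: int) -> tuple[list[str], list[str]]:
    """Split LabPC files into (rotate, expired) path lists under a hypothetical policy.

    - rotate: replicated files older than `rotate_after_days`; their LabPC copy can be freed.
    - expired: files older than `retain_for_days`; flagged for review, never auto-deleted.
    Files that are neither (recent, or old but not yet replicated) stay where they are.
    """
    rotate, expired = [], []
    for f in files:
        age = (today - f.created).days
        if age > retain_for_days:
            expired.append(f.path)
        elif f.replicated and age > rotate_after_days:
            rotate.append(f.path)
    return rotate, expired
```

The key design choice is that an unreplicated file is never rotated, however old it is. Freeing disk space must never be the step that causes data loss.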
Impediments and Blockers
Tradeoff analysis
- Alternatives
  - XYZ
  - YZX
- Sensitivity Points
- Risks
- Non-Risks
- Architectural Approaches
Quick-Wins
Given the criticality of the problem, at least a temporary solution is needed to mitigate the immediate risk of losing business-sensitive data now that local storage has reached its limit.
Some questions that can facilitate the analysis:
- What type of search criteria is used on historical data?
- What is retrieved from the data found?
- What type of processing is applied to the retrieved data?
- Is it possible to parse this data?
Possible alternatives to consider in advance:
- Increase the storage capacity of the lab file server, then move LabPC disk data to the file server
- Promote historical data close to the (AWS Landing Zone) ACD Labs domain so that it can be ingested for later analysis in reports
- Promote historical data close to the (GCP) Lab-Booster domain so that it can be ingested for later analysis in reports
- Promote historical data to the Azure Fabric Lakehouse so that it can be ingested for later analysis in reports
→ With Azure as the cloud destination for the data, a PoC could also evaluate LabPC virtualization (with no connection to instruments), where only the software vendor's analytics capabilities would be used on the loaded historical data.
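The first alternative moves LabPC disk data to the file server. That copy step must not itself risk the data loss it is meant to prevent. A minimal sketch, assuming the file server share is mounted as a local path: copy each file, then compare SHA-256 checksums before treating the source as safe to free. The function names and the mounted-path approach are illustrative assumptions, not an existing tool.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so multi-GB instrument files never load into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_verified(src_root: Path, dst_root: Path) -> list[Path]:
    """Copy every file under src_root to dst_root, keeping relative paths.

    Returns the source files whose copy matched by SHA-256. Source files are
    NOT deleted: freeing LabPC disk space stays a separate, deliberate step.
    """
    verified = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        dst = dst_root / src.relative_to(src_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also preserves modification times
        if sha256_of(src) == sha256_of(dst):
            verified.append(src)
    return verified
```

Returning only the verified paths lets a later cleanup step delete exactly those files and leave anything that failed verification in place on the LabPC.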
Design Solution Proposal
The building blocks in the following diagram are mostly taken from the Azure tech stack to illustrate the concepts.
Although the Azure tech stack is convenient for this technical assessment, the cloud provider must be chosen strategically, given the tight data dependency between producers and consumers.
Azure File Sync: key component for synchronizing files from the site to the cloud (Azure).
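Azure File Sync's cloud tiering keeps frequently accessed files cached on the site server. It replaces cold files with pointers to the full copy in the Azure file share. Tiering is driven by a volume free space policy and an optional date policy. The sketch below is a deliberately simplified model of those two policies, useful for estimating how much of a site's data would become cloud-only. It is not the sync agent's actual algorithm, and the file names, sizes and thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CachedFile:
    name: str
    size_gb: float
    days_since_access: int


def files_to_tier(files: list[CachedFile], volume_gb: float,
                  min_free_pct: float, tier_after_days: int) -> list[str]:
    """Simplified model of Azure File Sync cloud tiering (illustration only).

    1. Date policy: files not accessed for more than `tier_after_days` are tiered.
    2. Volume free space policy: while the remaining cached files leave less than
       `min_free_pct` percent of the volume free, tier the least recently accessed ones.
    Assumption: the volume holds only synced files. Returns the tiered file names.
    """
    tiered = {f.name for f in files if f.days_since_access > tier_after_days}
    cached = [f for f in files if f.name not in tiered]
    used = sum(f.size_gb for f in cached)
    max_used = volume_gb * (1 - min_free_pct / 100)
    # Most stale first, mimicking least-recently-used eviction.
    for f in sorted(cached, key=lambda f: f.days_since_access, reverse=True):
        if used <= max_used:
            break
        tiered.add(f.name)
        used -= f.size_gb
    return sorted(tiered)
```

For instance, on a hypothetical 1000 GB volume with a 50% free-space target and a 90-day date policy, a 100 GB file untouched for 200 days is tiered by date. A 300 GB file last opened 40 days ago is tiered to meet the free-space target. A 400 GB file opened yesterday stays local.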
Target Architecture - R&I
Meeting Notes
[SoW] LabPC - Storage for lab-local data at scale (Olivier SAUSSOL, Mijajlovic, Julie, Tiago Oliveira)
- LabPC Storage Study
- SoW
- Assess the issue
- Scenarios (business needs?)
- *Pictures* of real-case situations (disks on the floor...)
- Worst case scenarios
- Application data: long-term storage, later combination of data across apps
- Instruments categories (data volume, SLA,...)
- (inventory for storage) LABPC Storage needs consolidation https://docs.google.com/spreadsheets/d/1U-d6W2LEGV9mz4XK9hBmHg1FTZY_-oYxlkXKkukgSNQ/edit?gid=0#gid=0
- Total storage needed now, and the forecast?
- Impact
- Reaching the storage limit
- Users storing data on inappropriate devices (NAS; shadow IT)
- Risks
- 1. Data loss
- 2. Data theft
- 3. Shadow IT
- 4. Export data control/legal regulation concerns
- 5. Business continuity
- Cost
- Maintaining/extending disks is getting expensive
- Impediments
- Hard to standardize instruments
- Blocked from onboarding new instruments
- User complaints
- Bolate: 77 pcs, storage not appropriate
- Shanghai:
- Mark's use case: large files generated
- Evaluate Alternatives
- Meeting Stakeholders
- Business strategy
- Skills required for the assessment
- Solution Vendor providers
- Short-term solution (quick wins)
- Some already in place (*highlight ongoing quick wins*)
- Distributed storage on LabPC (disks): store on LabPC for convenience to avoid losing data
- Leveraging the local file server (Shanghai: non-standard storage)
- Cloud storage for historical LabPC data: depends on application readiness for virtualization deployment
- Recall AWS gateway for data - PoC (ask Khemaies)
- PoC for instrument application virtualization (pick the one with the largest data volume; ask Olivier)
- Consider ACD Labs (as Mark Kwasnik) as the interface to run analytics, in case the instrument application is not ready for virtualization
- Long-term solution
- Outcome
- Presentation
- (1st phase) Structure scenario (deadline ) [Julie, Olivier, Tiago]
- Touchpoints: weekly 30-min meeting in March (Wednesdays 10:30)
- (2nd phase) Engage an expert per domain for the solution alternatives
- Decision to be taken
- Business Case
- Project organization
- Solution Architecture Proposal
References
Presentation https://docs.google.com/presentation/d/1UE4-b8e0GxHoxlVHJomt54JpcWwCv5Uh48Tw3qLA41o/edit#slide=id.g33d7e03e36c_0_0


