Most lab sites are equipped with a file server on top of an iOMega NAS, which in some cases is malfunctioning and is also running at the limit for adding more hard disks to accommodate new storage demands and to remain compliant with data storage regulations.
From the producer/consumer and data-logistics perspectives, LabPC disks and local file storage are just different levels of data buffering. They are not the place to store data for the long term, nor can they serve data to its consumers with proper interoperability and scalability.
It is therefore essential to develop a good classification of the data captured from the instruments in terms of country regulation, access restrictions, data lifecycle, and data-access experience (the relationship between latency and data availability), in order to design the most economically efficient solution that effectively guarantees business continuity.
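As a sketch of what such a classification could look like (all names, attributes, and thresholds here are illustrative assumptions, not decided policy), the dimensions above can be mapped to a storage tier with a small rules function:

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    """Attributes driving the storage-tier decision (illustrative only)."""
    size_gb: float
    days_since_access: int
    export_controlled: bool   # country-regulation / export-control flag
    retention_years: int      # regulatory retention requirement

def storage_tier(d: Dataset) -> str:
    """Map a dataset to a tier; the thresholds are placeholder assumptions."""
    if d.export_controlled:
        return "on-prem-restricted"   # must not leave the site
    if d.days_since_access <= 30:
        return "local-hot"            # analysts need low-latency access
    if d.retention_years > 0:
        return "cloud-archive"        # cheap, compliant long-term store
    return "cloud-cool"

print(storage_tier(Dataset(500, 400, False, 10)))  # -> cloud-archive
```

A real rules table would be driven by the regulatory classification per site and per technique, but even this simple shape makes the latency/availability/cost trade-off explicit and reviewable.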
Use cases
Lyon/RICL
OpenLab instruments generate files at the hundreds-of-GB scale for some techniques, which demands local compute power to load them into stand-alone applications for expert analysis. This magnitude also makes it impractical to load these files into a cloud repository, and creates difficulties in complying with regulatory terms for storage.
Shanghai
Waters instruments demand 12 TB of space to store application data. This storage magnitude reaches the local limit of the LabPC disk and also of the local network on the iOMega NAS.
Bristol
Waters instruments demand 12 TB of space to store application data. This storage magnitude reaches the local limit of the LabPC disk and also of the local network on the iOMega NAS.
Nuclear Magnetic Resonance:
Mass spectrometry:
Questions and Concerns
Architectural Significant Requirements
Data rotation
Retention
Export Control/Cyber Sec
Access Control
Data transfer latency SLA
Data Consumers (data structure/data model)
Impediments and Blockers
Tradeoff analysis
Alternatives
XYZ
YZX
Sensitivity Points
Risks
Non Risks
Architectural Approaches
Quick-Wins
The criticality of the problem demands at least a temporary solution to mitigate the immediate business impact of losing sensitive data due to the current local storage limit.
Some questions that can facilitate the analysis:
What types of search criteria are used on the historical data?
What retrieval is done on the data that is found?
What type of processing is done on the found and retrieved data?
Is it possible to parse this data?
Possible alternatives to consider in advance:
Increase the storage capacity in Lab File Server - then move LabPC disk data to File Server
Promote historical data close to the (AWS Landing Zone) ACD Labs domain so that it can be ingested for later analysis in reports
Promote historical data close to the (GCP) Lab-Booster domain so that it can be ingested for later analysis in reports
Promote historical data to the Azure Fabric Lakehouse so that it can be ingested for later analysis in reports
→ With Azure as the cloud destination for the data, a PoC could also be evaluated for LabPC virtualization (with no connection to instruments), where only the analytics capabilities of the vendor software would be leveraged on the loaded historical data.
Design Solution Proposal
The building blocks in the following diagram are mostly from the Azure tech stack, for the sake of exercising the concepts.
Beyond the convenience of using the Azure tech stack in this technical assessment, and given the tight dependency between data producers and consumers, the cloud provider has to be chosen strategically.
Azure File Sync: key component for synchronizing files from the site to the cloud (Azure)
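A feature of Azure File Sync that matters here is cloud tiering: the full dataset lives in the Azure file share, while the local server keeps only recently used files cached, replacing cold files with pointers once the volume fills past a configured free-space threshold. A minimal sketch of that policy (hypothetical file names and thresholds; not the product's actual algorithm):

```python
def files_to_tier(files, volume_size_gb, free_space_percent=20):
    """Return names of files to replace with cloud stubs, coldest first,
    until the volume free-space policy is met.
    `files` is a list of (name, size_gb, days_since_access) tuples."""
    used = sum(size for _, size, _ in files)
    target_used = volume_size_gb * (1 - free_space_percent / 100)
    tiered = []
    # Evict least-recently-accessed files first.
    for name, size, _ in sorted(files, key=lambda f: f[2], reverse=True):
        if used <= target_used:
            break
        tiered.append(name)
        used -= size
    return tiered

files = [("run_a.raw", 60, 200), ("run_b.raw", 30, 5), ("run_c.raw", 20, 90)]
print(files_to_tier(files, volume_size_gb=100))  # -> ['run_a.raw']
```

This is why the approach fits the 12 TB Waters case: the lab server only needs enough disk for the hot working set, not for the full historical volume.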
Target Architecture - R&I
Meeting Notes
[SoW] LabPC - Storage for lab-local data at scale (Olivier SAUSSOL, Mijajlovic, Julie, Tiago Oliveira)
LabPC Storage Study
SoW
Assess the issue
Scenarios (business needs?)
*Pictures* of the real situation on site (disks on the floor...)
Worst case scenarios
Application data; long-term storage; later data combination across apps