Requirements

The following requirements define the conditions that a data set must meet to be considered of acceptable quality:

  • Data completeness, e.g. % of populated data fields in ingested data, required non-null data fields
  • Data formatting, e.g. required format for date and time values, precision for numeric values
  • Data structure, e.g. expected data schema vs. ingested data schema
  • Naming conventions for data files, data fields etc.
  • Expected volume of data, e.g. minimum number of rows per data file
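
As a rough illustration, several of the requirements above could be checked for one batch of rows as sketched below. All constants (EXPECTED_FIELDS, REQUIRED_NON_NULL, TIMESTAMP_FORMAT, MIN_ROWS, MIN_POPULATED_PCT) and the function names are hypothetical placeholders, not values defined by this page. Naming convention checks are left out of the sketch.

```python
from datetime import datetime

# Hypothetical requirement values; none of these are defined by this page.
EXPECTED_FIELDS = {"id", "timestamp", "amount"}
REQUIRED_NON_NULL = {"id", "timestamp"}
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"
MIN_ROWS = 2
MIN_POPULATED_PCT = 80.0


def is_populated(value):
    """Treat None and empty strings as unpopulated (an assumption)."""
    return value is not None and value != ""


def check_batch(rows):
    """Return a list of requirement violations for a batch of dict rows."""
    violations = []
    # Expected volume of data
    if len(rows) < MIN_ROWS:
        violations.append(f"volume: {len(rows)} rows, expected at least {MIN_ROWS}")
    populated = total = 0
    for index, row in enumerate(rows):
        # Data structure: ingested fields must match the expected schema
        if set(row) != EXPECTED_FIELDS:
            violations.append(f"structure: row {index} has fields {sorted(row)}")
            continue
        # Data completeness: required non-null fields
        for field in sorted(REQUIRED_NON_NULL):
            if not is_populated(row[field]):
                violations.append(f"completeness: row {index} is missing {field}")
        # Data formatting: required date and time format
        if is_populated(row["timestamp"]):
            try:
                datetime.strptime(row["timestamp"], TIMESTAMP_FORMAT)
            except (ValueError, TypeError):
                violations.append(f"formatting: row {index} has a bad timestamp")
        populated += sum(is_populated(v) for v in row.values())
        total += len(row)
    # Data completeness: % of populated fields across the batch
    if total and 100.0 * populated / total < MIN_POPULATED_PCT:
        violations.append("completeness: populated fields below threshold")
    return violations
```

An empty list returned by check_batch means that none of the sketched checks failed for the batch.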


Metrics

The following (non-exhaustive) list of metrics should be captured:

  • Data coverage, i.e. % of populated data fields in a single data row
  • Data field rejections, i.e. number of data fields in a single data row that did not meet data quality requirements
  • Data ingest rejections, i.e. number of data ingest attempts that did not meet data quality requirements
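
A minimal sketch of how the three metrics above might be computed, assuming each row is a dict, each field has a hypothetical validator function, and each ingest attempt is recorded as a boolean (True = accepted). The function names and the definition of "populated" are assumptions made for illustration.

```python
def row_coverage(row):
    """Data coverage: % of populated fields in a single row."""
    if not row:
        return 0.0
    # Assumption: None and empty strings count as unpopulated.
    populated = sum(1 for value in row.values() if value is not None and value != "")
    return 100.0 * populated / len(row)


def field_rejections(row, validators):
    """Data field rejections: number of fields in a row that fail their validator.

    `validators` maps a field name to a function returning True when the
    field value meets the data quality requirements.
    """
    return sum(1 for name, check in validators.items() if not check(row.get(name)))


def ingest_rejections(attempt_results):
    """Data ingest rejections: number of attempts recorded as not accepted."""
    return sum(1 for accepted in attempt_results if not accepted)
```

The row-level values produced this way are raw counts and percentages; the aggregated views are covered by the KPIs.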


KPIs

The following (non-exhaustive) list of KPIs should be calculated and made available for inclusion and visualization in reports:

  • Data coverage per file/batch, i.e. % of populated data fields in a single data file or batch
  • Data coverage per source, i.e. % of populated data fields in all data ingested from a single source
  • Data field rejections per row, i.e. % of data fields in a single data row that did not meet data quality requirements
  • Data field rejections per file/batch, i.e. % of data fields in a single data file or batch that did not meet data quality requirements
  • Data field rejections per source, i.e. % of data fields in all data ingested from a single source that did not meet data quality requirements
  • Data ingest rejection rate, i.e. % of data ingest attempts that did not meet data quality requirements
  • Data ingest rejection rate per source, i.e. % of data ingest attempts from a single source that did not meet data quality requirements
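
The KPIs above are percentages aggregated over a file, batch, or source. The sketch below shows one possible way to compute two of them, assuming rows are dicts and ingest attempts are recorded as hypothetical (source, accepted) pairs. Coverage is computed over all fields in the group rather than by averaging per-row percentages, which is an interpretation of the definitions above.

```python
from collections import defaultdict


def coverage_pct(rows):
    """% of populated fields across all rows of a file, batch, or source."""
    total = sum(len(row) for row in rows)
    if total == 0:
        return 0.0
    # Assumption: None and empty strings count as unpopulated.
    populated = sum(
        1 for row in rows for value in row.values() if value is not None and value != ""
    )
    return 100.0 * populated / total


def rejection_rate_per_source(attempts):
    """% of rejected ingest attempts, keyed by source.

    `attempts` is an iterable of (source, accepted) pairs.
    """
    totals = defaultdict(int)
    rejected = defaultdict(int)
    for source, accepted in attempts:
        totals[source] += 1
        if not accepted:
            rejected[source] += 1
    return {source: 100.0 * rejected[source] / totals[source] for source in totals}
```

Passing the rows of one file to coverage_pct gives the per-file KPI; passing all rows ingested from one source gives the per-source KPI.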

