There are a series of checks that are run during the process of collecting, importing and processing indicators before exporting to the Community Insight tool.
- We apply stringent database checks to ensure that only certain kinds of aggregation and apportion processes can be applied to certain types of data.
- Where data is published at multiple geographic levels we run validation checks to ensure that our derived aggregations match up to published data.
- In order to validate the store procedure code developed in MySQL, we perform the same aggregation approaches across three different statistical packages (SQL, STATA, Python) on a large subset of our data to ensure they each produce the same results.
- For sum aggregation, we run checks to ensure the component areas add up to the aggregated area.
- For apportioning, we run the reverse check to ensure that the count data for the areas we have produced adds up to the parent area.
- For apportioning of derived rate data (data constructed from a numerator and a denominator) we run a check to ensure the derived rate produced is the same for the area we have apportioned to and the parent area we have created the data for.
- Where data is apportioned to smaller areas by copying from larger geographies to smaller geographies (for data that is categorised as ratio/score/life expectancy/rate or other data types where the numerator or denominator is not published) we check that the data value is the same for the larger and smaller area.
- In order to ensure consistency between the reports, maps and published data, we check that the derived indicator definition (the numerator and denominator used to derive the indicator) are consistent across our database, live site and align with approach taken by data suppliers where relevant.
We also run a series of automated checks to supplement the Quality Assurance processes outlined above. These automated checks are designed to flag up potential issues with data loads at source ensuring we catch issues early in the data loading process.
The following checks are run:
- We apply checks to ensure that all indicator values within a time-series are within within 5% of previous time series value (to capture unusual fluctuations in a time series).
- We set maximum value thresholds for indicators to ensure that data values do not exceed the maximum value (e.g. a percentage value cannot exceed 100%)
- We also set maximum values across area types e.g. ensure that LSOA values cannot exceed LA values and LA values cannot exceed regional or national values for each indicator.
- We run checks on all exports to ensure that that data is exported for all standard geographies (Output Areas, LSOAs, MSOAs, Local Authorities, Government Office Regions
- We run checks to ensure that scale factors are applied correctly and in accordance with the indicator type e.g. rates per 1,000 have a 1,000 multiplier applied.
- We run checks for non-standard characters and null values to ensure these are correctly dealt with in accordance with dataset metadata.
- We run checks to ensure that each indicator has an entry in our databases in the following data tables: custom area data, standard area data, metadata. This ensures that indicators displaying on the map have associated underlying data and metadata which appears in the data pop-ups and dashboards.
We run a series of manual checks on report outputs each time these are updated. The following checks are applied.
- Proof read the commentary to check alignment with databoxes and graphs
- Check values that are components of a whole add up to 100%
- Check for notable fluctuations in time series
- Check for consistency between data in boxes and graphs
- Check data on reports is the same value as data on the maps
- Check metadata (indicator name / data source / time-period) is consistent across maps & reports.
- Check consistency of related datasets
- Check that where a dataset is updated there is a different value now than previously
- Ensure values differ for main areas and comparators
- Check consistency between titles and data
- Check all charts appear in report
- Check for N#A, #REF values in templates
- Check for formatting that affects chart layout (e.g. charts spilling over pages)
- Check for missing values in all charts and tables
- Run checks on all complex formulas in templates
- Ensure that all formulas link to an indicator code that is present in the relevant data and metadata tables
- Ensure formulas refer to the relevant indicator
- Document comparisons are run between previous reports and current report to ensure changes that a) all updated indicators are changed in new report b) changes only occur where data is updated c) the magnitude of change is as expected
- Reports are generated for selected standard areas (LA, MSOA,LSOA) with data in the reports compared against source data