Data Quality Considerations in Human Resources Information Systems (HRIS) Strengthening
Samwel Wakibi, IntraHealth International
Organizations rely on data to increase their efficiency today and improve their planning for tomorrow (Rochnik and Dijcks, 2006). Poor data quality results in loss of time, money and customer confidence, and can be cause for embarrassment to an organization. It is estimated that the typical industrial data quality error rate of 1%-5% can constitute a 10% loss in revenue (Redman, 1996). As data are collected, analyzed and translated into meaningful reports for planning and decision-making, data quality problems can occur as information crosses organizational and system boundaries.
Data quality issues have been central to the program experience of the Capacity Project, a USAID-funded global project that helps developing countries strengthen human resources for health (HRH) to better respond to the challenges of implementing and sustaining quality health programs. The Project has gained particular experience with data quality through its work on strengthening human resources information systems (HRIS) to support health workforce planning and management. This brief discusses concepts of data quality and provides examples of the importance of data management specific to the field of HRH, illustrated by the Capacity Project’s experience with HRIS strengthening in developing countries.
Basic Data Quality Concepts
One widely accepted definition of data quality in economics, business and medicine is “fitness for use” now and in the future; in other words, how well data meet user needs and expectations (Chapman, 2005; Carson, 2000). Data quality describes the state of data, the set of processes to achieve such a state and data accuracy.
For data to be fit for use, they should be free of duplications, misspellings, omissions and unnecessary variations, and should conform to a defined structure (Chapman, 2005; Carson, 2000; Brown, Stouffer and Hardee, 2007).
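The fitness-for-use criteria above can be illustrated with a minimal sketch in Python. It is not drawn from any Capacity Project system: the records, the field names (staff_id, name, cadre, hire_date) and the expected date format are all assumptions chosen for illustration. The sketch flags duplications, omissions, unnecessary variations in spelling and values that do not conform to a defined structure.

```python
from collections import Counter
from datetime import datetime

# Hypothetical HRIS extract; field names and values are illustrative only.
RECORDS = [
    {"staff_id": "N001", "name": "A. Otieno", "cadre": "Nurse", "hire_date": "2004-03-15"},
    {"staff_id": "N002", "name": "B. Mwangi", "cadre": "nurse ", "hire_date": "2005-07-01"},
    {"staff_id": "N001", "name": "A. Otieno", "cadre": "Nurse", "hire_date": "2004-03-15"},
    {"staff_id": "D003", "name": "", "cadre": "Doctor", "hire_date": "15/03/2006"},
]

# Assumed set of fields that every record must carry.
REQUIRED_FIELDS = ("staff_id", "name", "cadre", "hire_date")


def find_duplicates(records):
    """Return staff_id values that appear more than once (duplications)."""
    counts = Counter(r["staff_id"] for r in records)
    return sorted(sid for sid, n in counts.items() if n > 1)


def find_omissions(records):
    """Return (row index, field) pairs where a required field is missing or blank."""
    missing = []
    for i, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            if value is None or not str(value).strip():
                missing.append((i, field))
    return missing


def find_variations(records, field):
    """Group raw spellings that become identical after trimming and case-folding."""
    groups = {}
    for record in records:
        raw = record.get(field) or ""
        groups.setdefault(raw.strip().casefold(), set()).add(raw)
    return {key: sorted(raws) for key, raws in groups.items() if len(raws) > 1}


def find_bad_dates(records, field="hire_date", fmt="%Y-%m-%d"):
    """Return row indexes whose date does not parse in the assumed format."""
    bad = []
    for i, record in enumerate(records):
        try:
            datetime.strptime(record.get(field) or "", fmt)
        except ValueError:
            bad.append(i)
    return bad


report = {
    "duplicate_ids": find_duplicates(RECORDS),
    "omissions": find_omissions(RECORDS),
    "cadre_variations": find_variations(RECORDS, "cadre"),
    "nonconforming_dates": find_bad_dates(RECORDS),
}
print(report)
```

Checks of this kind only detect candidate problems. Deciding which spelling of a cadre is the standard one, or which of two duplicate records to keep, remains a data management decision for the system's owners.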
Data quality addresses:
• Accuracy
• Timeliness or currency
• Consistency
• Precision
• Completeness
• Relevance.
• Accuracy refers to the closeness of measured values, observations or estimates to the real or true value, without political or personal bias and manipulation. In other words, accuracy is a measure of the extent to which the data reflect reality. Guiding questions to achieve accuracy relate to the reliability of data sources and the process of data collection.
• Timeliness or currency refers to availability of data when required. Related factors are knowledge about the period when the data were collected, when they were last updated, how long they are likely to remain current and whether they are processed to give information in time to conduct daily business or inform decisions.
• Consistency describes the absence of apparent contradictions and is a measure of internal validity and reliability. Guiding questions to assess consistency include the extent to which the same definitions, codes and formats are followed for the same data across different sources.
• Precision refers to the consistency of an indicator in producing the same results. For example, a data collection form with high precision will elicit the same responses if administered repeatedly on a subject. Precision and accuracy differ in that a measure can be precise without being accurate. For example, a measure can repeatedly generate the same incorrect outcomes.
• Completeness refers to lack of errors of omission, such as omitted records in a dataset or a variable without data. Completeness addresses the question of whether all eligible data are included.
• Relevance refers to availability of required details or data. It helps to answer questions relating to the design of the database or