Can we use routinely collected hospital and GP data for epidemiological study of common hand conditions? A UK Biobank based validation project

By Jennifer C. E. Lane, Christian Schnier, Jane Green, Wee L Lam, Dominic Furniss, Cathie LM Sudlow

Posted 02 Mar 2018
bioRxiv DOI: 10.1101/274167

Objective: Routine health records can be of great value in epidemiological and genetic studies, especially when linked to large cohort studies, provided they reliably identify true disease cases. Little research has examined whether coding within UK electronic health records (EHRs) accurately identifies clinical cases of common hand conditions, and concerns that cases cannot be accurately identified have left a relative paucity of hand surgical research using EHRs. The aim of this study was to investigate the accuracy of hospital and primary care coding in routine EHRs for carpal tunnel syndrome (CTS) and base of thumb osteoarthritis (BTOA). Self-reported disease status as recorded in UK Biobank, a large prospective cohort study, was also investigated.

Methods: Code lists for each condition were generated by a team of clinicians, clinical coders and epidemiologists. All patients recruited to UK Biobank in one geographical region (Lothian, Scotland), where linked primary and secondary care coded datasets were available, were included. A decision-making algorithm was designed to define an administratively confirmed or a clinically confirmed disease case. Patient electronic medical records (EMRs) were independently interrogated by two clinicians, and inter-observer reliability was calculated.

Results: Of the 17,201 Biobank participants in NHS Lothian, 268 had at least one code for CTS and 82 for BTOA. For CTS, 159 cases were confirmed, 100 cases had insufficient information and 9 cases were refuted. Excluding missing data, the positive predictive value (PPV) for true clinical disease cases was 96% for incident disease (90% for prevalent disease; 94% overall). For BTOA, 27 cases were confirmed, 46 cases had insufficient information and 9 cases were refuted. Excluding missing data, PPV was 81% for incident disease (56% for prevalent disease; 75% overall). Examination of the cases with insufficient information showed that a large proportion arose from primary care and self-report coding systems. Analyzing code combinations revealed that secondary care codes had the highest PPV for both CTS and BTOA, which points to a more robust evaluation of PPV for patients requiring hospital-based care. Overall, inter-observer reliability was good, with agreement in 90% of cases (Cohen's kappa 0.79) for clinical disease cases in CTS and in 98% of cases (kappa 0.96) for BTOA.

Conclusions: We have demonstrated that coding within UK Biobank is of sufficient quality to enable use of the resource for epidemiological and genetic research into common hand conditions, and that EMRs can be used for manual validation of UK health coding systems. Further work is needed to consider potential regional and interdisciplinary differences in coding practice and strategies for dealing with missing data in EHRs, and to validate coding of common hand conditions in primary care.
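The overall PPV figures can be checked from the case counts given in the Results. The sketch below assumes that "excluding missing data" means dropping the cases with insufficient information, so that PPV is confirmed / (confirmed + refuted). On that assumption BTOA comes out at exactly 75% and CTS at about 94.6%, in line with the reported overall values. The function names `ppv` and `cohen_kappa` are illustrative and not taken from the study, and the 2x2 rater counts passed to `cohen_kappa` are hypothetical, because the abstract does not report the raters' agreement table.

```python
def ppv(confirmed: int, refuted: int) -> float:
    """Positive predictive value among cases that could be adjudicated.

    Assumption: cases with insufficient information are excluded entirely.
    """
    adjudicated = confirmed + refuted
    if adjudicated == 0:
        raise ValueError("no adjudicated cases")
    return confirmed / adjudicated


def cohen_kappa(both_yes: int, a_only: int, b_only: int, both_no: int) -> float:
    """Cohen's kappa for two raters making a yes/no call on each case."""
    n = both_yes + a_only + b_only + both_no
    if n == 0:
        raise ValueError("no rated cases")
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_only) / n
    b_yes = (both_yes + b_only) / n
    # Agreement expected by chance from each rater's marginal rates
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    if expected == 1:
        raise ValueError("kappa undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Counts reported in the abstract (overall, incident and prevalent combined)
cts = {"confirmed": 159, "insufficient": 100, "refuted": 9}
btoa = {"confirmed": 27, "insufficient": 46, "refuted": 9}

cts_ppv = ppv(cts["confirmed"], cts["refuted"])
btoa_ppv = ppv(btoa["confirmed"], btoa["refuted"])
print(f"CTS overall PPV:  {cts_ppv:.3f}")
print(f"BTOA overall PPV: {btoa_ppv:.3f}")

# HYPOTHETICAL rater counts, for illustration only (not study data)
example_kappa = cohen_kappa(both_yes=20, a_only=2, b_only=3, both_no=25)
print(f"Example kappa (hypothetical counts): {example_kappa:.2f}")
```

Because the 100 CTS and 46 BTOA cases with insufficient information are left out of the denominator, these PPVs describe only the adjudicated cases. Any strategy that instead counted unresolved cases as refuted would give lower values.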

Download data

  • Downloaded 248 times
  • Download rankings, all-time:
    • Site-wide: 108,292
    • In epidemiology: 4,496
  • Year to date:
    • Site-wide: 144,784
  • Since beginning of last month:
    • Site-wide: 142,350
