Taming the Data Monster – the right step towards meaningful analytics


Rapid digitization of healthcare has unleashed aggressive growth in patient data. IDC predicts that healthcare data will grow at a CAGR of 40% [1], reaching 2,314 exabytes in 2020 [2]. Wider adoption of information systems, an explosion of digital imaging and a proliferation of data generated by other electronic devices are key contributors to this growth. According to a McKinsey report, only 20% of physician offices and hospitals had deployed an EMR in 2005, a figure that by 2011 had grown to 50% for physicians and 75% for hospitals. In addition, 45% of hospitals participate in some kind of Health Information Exchange [3], connecting them with their peers. But is all this data usable? How can we mine all of this data for meaningful analytics, benchmarking and decision-making?

All data, including medical data, can be broadly categorized as structured or unstructured. Structured data is data held in designated, identifiable fields such as dates, patient names, identification numbers and diagnosis codes. This kind of data is easier to collect and exchange between systems because it is standardized, pre-defined, computer-readable and typically quickly accessible from a database [4]. But this kind of data is only 20% [5] of the equation. The bigger nut to crack is unstructured data, which can include notes or handwritten information on unstructured paper forms, audio voice dictations, email messages and attachments, chatter on an organization's social media sites and typed transcriptions [4]. In healthcare, unstructured data accounts for up to 80% of the total data [5]. The Health Story Project estimates that some 1.2 billion clinical documents are produced in the U.S. each year, and about 60% of these contain valuable patient care information "trapped" in an unstructured format [4].
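To make the distinction concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption rather than part of any real system: the `StructuredEncounter` record, the sample note and the simplified ICD-10-style pattern are all hypothetical. It shows how a structured field can be read directly, while the same fact buried in free text has to be found first.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class StructuredEncounter:
    """Hypothetical structured record: every value sits in a designated field."""
    patient_id: str
    patient_name: str
    visit_date: date
    diagnosis_code: str


# Hypothetical free-text note: the same kind of information, but "trapped" in prose.
NOTE = (
    "Pt seen 2014-03-02 for follow-up. Assessment: type 2 diabetes (E11.9), "
    "also hypertension I10. Continue metformin."
)

# Simplified, assumed pattern for ICD-10-style codes: one letter, two digits,
# and an optional decimal part. Real-world coding rules are more involved.
CODE_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,4})?\b")


def find_codes(text: str) -> List[str]:
    """Return code-like tokens found in free text, in order of appearance."""
    return CODE_PATTERN.findall(text)


if __name__ == "__main__":
    record = StructuredEncounter("P-001", "Jane Doe", date(2014, 3, 2), "E11.9")
    print(record.diagnosis_code)  # structured: read the field directly
    print(find_codes(NOTE))       # unstructured: search the text first
```

A pattern match like this is only a toy. Handwritten forms, dictations and email would need far more than a regular expression, which is part of why unstructured data is the harder problem.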

Another key challenge is that patient data remains dispersed among several information systems, which are often not connected or interoperable. Structured or unstructured data trapped in disparate formats and separate repositories, sometimes even on paper, is virtually impossible to aggregate and analyze, which hinders meaningful analytics. In addition, when standards are not followed and proprietary ways of storing and tagging data are deployed, it becomes harder still to share and access critical patient data across system silos, sites and organizations. In an era of Accountable Care Organizations and collaborative care networks, a lack of data sharing may also stand in the way of successful clinical outcomes and overall organizational efficiency. When data cannot be shared among organizations, benchmarking across sites for a broad array of metrics remains a daunting challenge.

On a positive note, healthcare providers have been actively looking at tools and technology that can liberate data from its current, less useful state and help tame it for meaningful analytics and insights. Enterprise Content Management strategies and Vendor Neutral Archives are big game changers and foundation stones for enforcing enterprise-wide standards, data consolidation and the aggregation of structured as well as unstructured data.

Let the data be our guide. But before that, let's think about how to liberate this huge volume of valuable patient data from its current dormant state into one that makes it meaningful, so that clinical, operational and financial decisions in healthcare can be faster and better informed.



1. http://www.cio.com/article/2375691/healthcare/healthcare-why-health-data-is-a-big-data-challenge.html

2. http://www.cio.com/article/2860072/healthcare/how-cios-can-prepare-for-healthcare-data-tsunami.html

3. "Big Data revolution in Healthcare", January 2013, McKinsey and Company

4. "Unstructured Data in Electronic Health Record (EHR) Systems: Challenges and Solutions", October 2013, DataMark

5. "IDC FutureScape: Worldwide Healthcare 2015 Predictions"

