Centralized Big Data platform for automated data processing for a Healthcare giant

Industry: Pharmaceuticals


Our client is an American healthcare company that distributes pharmaceuticals at the retail level and provides health information technology, medical supplies, and care management tools.

The company had sales of $122 billion in 2012.

Problem Statement

The client wanted the following from Abzooba.

  • Business optimization metrics that had been available only internally were now to be made directly available to the client's customers, along with self-service capability.
  • Insightful metrics on segments such as Patient Adherence, Provider Management, Finance Management, Pharmacy Inventory Management and Workflow Management. The client also wanted ad-hoc, cross-customer advanced analytics reporting for business users.
  • A centralized big data platform for automated data processing that can support growing data volumes as the customer list grows, built on modern technology with appealing visualization and scalable infrastructure.
  • Robust user management and data security, with Active Directory integration and tenancy control.

Project Journey

The scope was a complete big data platform with a BI and advanced analytics solution, including a Data Lake:

  1. Web Portal, consisting of the following components:
    • User Portal – will serve as a single entry point for all customers. Customers' end users will log in here to access their specific dashboards. Client sales representatives will be able to log in here to access the generic demo dashboard. Admin users will log in here and be redirected to the Admin Page.
    • Admin Page – will enable client System Administrators to do the following:
      • Manage customers (add / modify / remove), and provision them appropriately.
      • In subsequent phases, admin dashboards may be added to monitor system activity (dashboard access by customers, ETL job performance, database performance and load, etc.), manage backups and disaster recovery, and other such system administration tasks.
  2. Business Intelligence (BI) dashboard created using Tableau to achieve the following objectives:
    • Enable MPS and chain customers to access reports and dashboards as defined in [1], [2]
    • Implement authentication using Microsoft Active Directory
    • Implement multi-tenancy, to ensure that each customer can view only their data
    • Implement a generic demo dashboard, using generated / de-identified data
  3. Data Warehouse – a Data Warehouse will be implemented on Microsoft Azure, using PostgreSQL. This DWH will contain aggregated data, segmented by customer, to feed the dashboards. REST APIs will be provided to enable client programs to access the DWH.
  4. Data Lake – a Data Lake will be implemented using HDInsight, Microsoft Azure’s implementation of Hadoop. This will contain transactional data across all customers, and will provide a platform for our client’s analysts to extract and analyze data in subsequent phases. The PostgreSQL DWH will be fed from the Data Lake.
  5. Non-functional
    1. Multi-tenancy
    2. Active Directory integration
    3. Scalability
    4. Concurrent user support
    5. SLA adherence and overall performance
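
The relationship between the Data Lake and the Data Warehouse described above (transactional data across all customers rolled up into aggregates segmented by customer) can be sketched as follows. This is only a minimal illustration in plain Python, not the project's actual ETL; the record fields (`customer_id`, `metric`, `value`) and the function name are hypothetical.

```python
from collections import defaultdict

def aggregate_by_customer(transactions):
    """Roll up raw transaction records into per-customer metric totals."""
    totals = defaultdict(lambda: defaultdict(float))
    for row in transactions:
        totals[row["customer_id"]][row["metric"]] += row["value"]
    # Convert nested defaultdicts to plain dicts for a later load step.
    return {cust: dict(metrics) for cust, metrics in totals.items()}

# Hypothetical sample records standing in for Data Lake rows.
sample_transactions = [
    {"customer_id": "pharmacy_a", "metric": "fills", "value": 3},
    {"customer_id": "pharmacy_a", "metric": "fills", "value": 2},
    {"customer_id": "pharmacy_b", "metric": "fills", "value": 7},
]
warehouse_rows = aggregate_by_customer(sample_transactions)
```

Keeping the customer identifier as the top-level key mirrors the idea that every aggregate in the warehouse belongs to exactly one customer, which is what the dashboards rely on for tenancy.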


Abzooba approached the solution in phases.

Phase I

Phase I covered data ingestion, Data Lake and BI schema design, and ETL.

Data from the existing on-premises systems (mainly Oracle) was moved to a cloud cluster. The cloud vendor was Microsoft Azure. Abzooba installed the PostgreSQL DBMS on Azure and designed a BI schema.

The Data Lake was built in HDInsight, the Hadoop offering in the Azure cloud. HDInsight was chosen to hold cross-customer raw data after preliminary cleansing and to connect to R/Python for predictive modeling as part of Advanced Analytics. The Data Lake also supported ad-hoc reporting across different customers' data.

To transfer and engineer data, Abzooba chose Talend, a leading ETL tool. Talend made populating the Data Lake and the BI schema much easier by offering advanced data source connection adapters and various components for handling complex business transformations.

Before choosing Talend, Abzooba researched and compared three popular ETL tools: Talend, Pentaho and DataMigrator.

Phase II

Phase II was planned to deliver advanced analytics on top of HDInsight and BI reports in Tableau.

Advanced Analytics included predictive reporting such as patient adherence probability and geographical sales production forecasts. The models were built in Python and integrated with Tableau for visualization.
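
To give a sense of what a patient adherence probability looks like in code, here is a minimal sketch. The case study does not say which model type or features were used, so the logistic form, the feature names and the coefficients below are all assumptions made purely for illustration.

```python
import math

# Hypothetical coefficients; a real model would learn these from data.
WEIGHTS = {"refills_on_time_ratio": 3.0, "num_medications": -0.2}
BIAS = -1.0

def adherence_probability(features):
    """Return a logistic-model probability in (0, 1) that a patient stays adherent."""
    score = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-score))

# Two hypothetical patients: one refilling mostly on time, one rarely.
p_high = adherence_probability({"refills_on_time_ratio": 0.9, "num_medications": 2})
p_low = adherence_probability({"refills_on_time_ratio": 0.2, "num_medications": 6})
```

A probability of this kind can be written to a table per patient or per segment and then visualized in a dashboard alongside the descriptive metrics.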

The Tableau reports needed a user interface and a user control mechanism that maintains tenancy control and secures customer data through Row Level Security (RLS). For this purpose, Abzooba demonstrated integration of Active Directory with Tableau Server and implemented Single Sign-On for the entire user base.
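
The idea behind row-level security for tenancy control can be sketched in a few lines of plain Python. This is a conceptual illustration only, not Tableau's or Active Directory's API; the group names, the group-to-tenant mapping and the row fields are hypothetical.

```python
# Hypothetical mapping from directory group to tenant (customer) identifier.
GROUP_TO_TENANT = {"grp-pharmacy-a": "pharmacy_a", "grp-pharmacy-b": "pharmacy_b"}

def rows_visible_to(user_groups, rows):
    """Return only the rows whose tenant matches one of the user's groups."""
    allowed = {GROUP_TO_TENANT[g] for g in user_groups if g in GROUP_TO_TENANT}
    return [row for row in rows if row["tenant"] in allowed]

# Hypothetical report rows belonging to two different customers.
report_rows = [
    {"tenant": "pharmacy_a", "adherence_rate": 0.81},
    {"tenant": "pharmacy_b", "adherence_rate": 0.74},
]
visible = rows_visible_to(["grp-pharmacy-a"], report_rows)
```

A user whose groups map to no tenant sees no rows at all, which is the safe default for a multi-tenant report.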


Business Impact

  • Fully automated business process flow
  • Centralized data management and single point of truth
  • Better customer experience – near-real-time customer performance dashboard along with cross-customer reports
  • Self-service BI and automated data feed generation for internal business users of the client.
  • Average accuracy for adherence prediction is 88% for selected parameters