Pet Food Category Sales Prediction and Social Media Analytics for a Leading Provider of Personalized Digital Media Solutions

Industry: Media


The client wanted to perform social media analytics across a variety of pet food categories. They planned to offer this solution to their customers, partners and brands so that those parties could react to a changing market well in advance.

They wanted to solve two main problems:

  • Identify and define the social factors that influence the sales performance and buying decisions for pet food categories.
  • Develop a multi-factor model that predicts future sales trends from past sales and social-factor attributes.

Project Journey


The objective was to perform data crawling, text analytics and data synopsis. The structured data consisted of transactional data for the past 3 years, covering ~13K Universal Product Codes (UPCs) across ~18K stores.

The data also included semi-structured and unstructured sources such as:

  • Daily weather data
  • US Census data
  • E-retailer product data
  • Customer reviews on e-retailer websites


To start, Abzooba carried out a detailed analysis of the existing data sources to find factors that may change pet food demand in the market.

We crawled e-seller sites such as Amazon, Walmart, Petco, PetSmart and Chewy to collect customer reviews. A knowledge base was created with Abzooba's NLP tool XPRESSO™, which was customized for the pet food domain to capture important concepts and perform descriptive analytics. For each customer review, our text analytics engine provides the following:

  • Mentioned aspect
  • Customer sentiment for the mentioned aspect
    • Negative, Positive, Neutral
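
To show how per-review output of this kind could feed a model, here is a minimal Python sketch that rolls aspect-level sentiment labels up into a monthly score. The record layout, the +1/0/-1 scoring, the months and the aspect names are all assumptions made for illustration; they are not the XPRESSO™ output format or the scoring used in the project.

```python
from collections import defaultdict

# Assumed numeric mapping for the three sentiment labels (illustrative only).
SCORE = {"Positive": 1, "Neutral": 0, "Negative": -1}

def monthly_aspect_sentiment(records):
    """Return the average sentiment score per (month, aspect) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [score sum, mention count]
    for rec in records:
        key = (rec["month"], rec["aspect"])
        totals[key][0] += SCORE[rec["sentiment"]]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical review records: one entry per aspect mentioned in a review.
reviews = [
    {"month": "2016-01", "aspect": "flavor", "sentiment": "Positive"},
    {"month": "2016-01", "aspect": "flavor", "sentiment": "Negative"},
    {"month": "2016-01", "aspect": "packaging", "sentiment": "Neutral"},
    {"month": "2016-02", "aspect": "flavor", "sentiment": "Positive"},
]
series = monthly_aspect_sentiment(reviews)
```

Sorting the resulting keys by month gives one sentiment time series per aspect, which is the shape an explanatory series needs for time-series modelling.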

Customer sentiment data, together with weather data, US Census data and transactional data, was used to develop a predictive model for future sales trends. Because the transactional and customer review data were large (~200GB), we stored all the data in AWS S3 buckets and used AWS EMR and Apache Spark for processing and analytics. Product attributes and sentiments over time, obtained from the XPRESSO™ engine, were sent to the modelling engine. R was then used to develop a statistical model, with the first 28 months' data as the target time series. The following steps were taken to prepare the model and predict sales:

  • Perform a stationarity test on each time series (target and explanatory).
  • If a series is not stationary, convert it to a stationary series.
  • Select the explanatory time series that have a high correlation (positive or negative) with the target time series.
  • Check for multicollinearity among the selected explanatory variables.
  • Drop collinear variables and use the remaining ones to build the model.
  • Use linear regression to create the model.
  • Predict sales for the next quarter.
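
The steps above can be sketched roughly as follows. The project used R; this sketch uses plain Python for illustration only. It makes several assumptions: every series is simply first-differenced instead of being tested for stationarity first (a formal test such as the augmented Dickey-Fuller test is omitted), collinearity is checked with pairwise correlations only, and the thresholds, feature names and toy numbers are hypothetical.

```python
from math import sqrt

def difference(series):
    """First difference, a common way to remove a trend from a series."""
    return [b - a for a, b in zip(series, series[1:])]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)

def select_features(target, candidates, min_corr=0.5, max_pair_corr=0.9):
    """Keep series strongly correlated with the target (either sign), then
    skip any series that is nearly collinear with one already kept."""
    ranked = sorted(((abs(pearson(s, target)), name)
                     for name, s in candidates.items()), reverse=True)
    kept = []
    for corr, name in ranked:
        if corr < min_corr:
            continue
        if all(abs(pearson(candidates[name], candidates[k])) < max_pair_corr
               for k in kept):
            kept.append(name)
    return kept

def fit_ols(columns, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.
    Returns [intercept, coef_1, ...]. Assumes X'X is non-singular."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [v - f * w for v, w in zip(a[r], a[c])]
    return [a[i][p] / a[i][i] for i in range(p)]

# Toy monthly series (hypothetical values, not project data).
sentiment = [1, 3, 2, 5, 4, 7, 6, 9]
avg_rating = [1.0, 3.2, 2.0, 5.2, 4.4, 7.2, 6.4, 9.2]  # nearly collinear with sentiment
temperature = [20, 21, 21, 20, 20, 21, 21, 20]          # weakly related to sales
sales = [10 + 3 * s for s in sentiment]

target = difference(sales)
candidates = {
    "sentiment": difference(sentiment),
    "avg_rating": difference(avg_rating),
    "temperature": difference(temperature),
}
kept = select_features(target, candidates)
coefs = fit_ols([candidates[k] for k in kept], target)

# Forecast the next period from an assumed next change in each kept feature.
next_change = {"sentiment": 1.0}
predicted_diff = coefs[0] + sum(c * next_change[k] for c, k in zip(coefs[1:], kept))
next_sales = sales[-1] + predicted_diff
```

Because the model is fitted on differenced series, the forecast is a predicted change that is added back to the last observed sales level. In practice a library routine for the stationarity test and a variance inflation factor check would be more robust than the pairwise screen used here.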

Solution Architecture:

Business benefits:

  • The solution is a scalable platform solution that can be implemented across retail categories.
  • Five pet food sub-categories, which cover more than 85% of sales, are subdivided into 750 groups based on flavor, packaging, food form and packet size. One model is prepared for each group.
  • The model predicts with more than 90% accuracy for more than 80% of the groups.
  • The model also provides the key driving features for each group.
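
The case study does not say how accuracy was measured. As one possible reading, the Python sketch below defines accuracy as 1 minus the mean absolute percentage error (MAPE) and reports the share of groups above a 90% threshold. The metric, the group names and the numbers are assumptions for illustration, not the project's evaluation method or results.

```python
def accuracy(actual, predicted):
    """Accuracy as 1 - MAPE (an assumed definition; the source does not state one).
    Periods with zero actual sales are skipped to avoid division by zero."""
    errors = [abs(a - p) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    return 1.0 - sum(errors) / len(errors)

def share_meeting_target(results, threshold=0.90):
    """Return (fraction of groups above the threshold, per-group accuracy)."""
    scores = {group: accuracy(a, p) for group, (a, p) in results.items()}
    hits = sum(1 for s in scores.values() if s > threshold)
    return hits / len(scores), scores

# Hypothetical actual vs. predicted sales for two product groups.
results = {
    "group_a": ([100, 120, 110], [95, 126, 110]),
    "group_b": ([200, 180, 220], [150, 230, 200]),
}
share, scores = share_meeting_target(results)
```

With one model per group, a summary like this makes it easy to see which groups fall short of the target and may need different explanatory series.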