Document Retrieval Using Deep Neural Networks


Industry: BFSI

Introduction

Our client, one of the world’s largest investment management organizations, wanted a document retrieval system with contextual search so that users could find the right document with greater precision.

The business objective was to quickly create articles and blogs from millions of internal and external documents (PDFs, business and financial news, reports, etc.) by querying the corpus and retrieving relevant paragraphs and sections from reference materials.

The client also wanted a mechanism for summarizing large documents and HTML pages returned by these searches.

Project Journey

Approach
  • Data crawlers were used to extract data from internal and external websites (e.g. Bloomberg, BlackRock).
  • A Convolutional Deep Structured Semantic Model (CDSSM), a type of Deep Neural Network (DNN), was used.
  • The DNN combined CNNs (Convolutional Neural Networks) and RNNs (Recurrent Neural Networks) with VSMs (vector space models).
  • A continuous feedback framework was implemented in the system, allowing it to learn and mature with every search.
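
The vector-space idea behind this kind of retrieval can be illustrated with a minimal sketch. It is not the client's model: there is no neural network here, only the letter-trigram representation commonly used as the input layer of DSSM-style models, followed by cosine similarity between query and paragraph vectors. The function names (letter_trigrams, cosine, rank) are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def letter_trigrams(text):
    """Count letter trigrams per word, with '#' marking word boundaries.

    Example: 'bond' is padded to '#bond#' and yields #bo, bon, ond, nd#.
    """
    grams = Counter()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = f"#{word}#"
        for i in range(len(padded) - 2):
            grams[padded[i:i + 3]] += 1
    return grams


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, paragraphs):
    """Return (score, paragraph) pairs, most similar to the query first."""
    query_vec = letter_trigrams(query)
    scored = [(cosine(query_vec, letter_trigrams(p)), p) for p in paragraphs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

In a trained CDSSM, learned convolutional layers would map these sparse trigram vectors to dense semantic vectors before the similarity is computed; the sketch skips that step and compares the raw counts directly.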
Input Data Set
  • Around 2 million internal documents (PDFs)
  • Numerous internal as well as external websites were crawled and the data was added to the corpus
  • Various charts, images, videos
  • A random sample of these data sets was used to train the DNN model.
  • User feedback was used as a continuous learning mechanism for the DNN.
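
One simple way to picture feedback-driven learning is a re-ranker that blends a base relevance score with smoothed user votes. The sketch below is an assumption made for illustration, not the mechanism used in the project, which fed feedback to the DNN itself. The class name FeedbackReranker, the weight parameter, and the (doc_id, base_score) input shape are all hypothetical.

```python
from collections import defaultdict


class FeedbackReranker:
    """Blend a base relevance score with smoothed user feedback (hypothetical design)."""

    def __init__(self, weight=0.3):
        self.weight = weight              # share of the final score taken from feedback
        self.helpful = defaultdict(int)   # (query, doc_id) -> number of positive votes
        self.votes = defaultdict(int)     # (query, doc_id) -> number of votes of any kind

    def record(self, query, doc_id, was_helpful):
        """Store one user vote for a document returned for a query."""
        key = (query.lower(), doc_id)
        self.votes[key] += 1
        if was_helpful:
            self.helpful[key] += 1

    def feedback_score(self, query, doc_id):
        """Laplace-smoothed share of positive votes; 0.5 when there are no votes."""
        key = (query.lower(), doc_id)
        return (self.helpful.get(key, 0) + 1) / (self.votes.get(key, 0) + 2)

    def rerank(self, query, scored_docs):
        """scored_docs: list of (doc_id, base_score), base_score assumed in [0, 1]."""
        blended = [
            (doc_id,
             (1 - self.weight) * base + self.weight * self.feedback_score(query, doc_id))
            for doc_id, base in scored_docs
        ]
        return sorted(blended, key=lambda pair: pair[1], reverse=True)
```

The smoothing keeps a single vote from dominating the ranking, and the weight controls how quickly feedback can overturn the model's own ordering.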
Solution Architecture

Business Impact / Benefits

  • Our client was able to retrieve relevant documents from a pool of more than 2 million documents in a fraction of a second.
  • Our solution helped them reduce the average time taken to publish articles by more than 90%.