product manufacturer

Built a Hadoop Data Lake to integrate data for faster analysis for a large product manufacturer

Rendering abstract futuristic background with glowing neon blue lights
Year 2018
Role Website / Mobile app
The Problem

A large product manufacturer wanted to understand the behavior of their products in the field and ultimately improve their reliability by predicting component failure and taking proactive action to repair/replace it.

This initiative would result in a saving of millions of dollars and time for their business operations. The analysis was difficult because of data is in Relational Data Base Management System (RDBMS) with product and components metadata, file systems with historical sensor data, streams of real-time sensor data, etc. Solving the problem required the ability to do analysis of all the data by combining data from all these sources.


The Solution

  • Our full stack data science team designed and implemented a modern data lake using Hadoop. We cleaned and processed data was then loaded into a traditional data.
  • We designed a solution to capture, store, process the data in Hadoop and integrate with spark.
  • The new architecture delivered great performance on interactive processing and analytics performed by the data science team, while provided a richer, integrated data set for aggregated analytics using the data warehouse.


  • Built a cost effective solution to the client.
  • Provided great performance for interactive data processing and analytics functions.
  • Allowed for new Methods and Machine Learning for predictive maintenance.