Data Engineering

Build the infrastructure to get data where it needs to be in your organization.

Data engineering unlocks the value in data by getting it to the people and systems that act on it. We can build the digital infrastructure you need, or give your existing team the boost it needs to move to the next level.

Our Approach

The organizational context is critical for data engineering; there is no one approach that works for every organization. Factors we consider include existing infrastructure, the nature of the data you deal with, where data is currently stored, who needs access to the data, and how data will be used.

Build versus buy is a core decision in any data engineering project. All the major cloud providers have extensive data engineering offerings, such as Google Cloud’s Dataflow and BigQuery, and AWS’s Redshift and EMR, and there are many other vendors that either build on the cloud services or offer their own competing solutions.

Another major architectural decision is choosing between batch and real-time processing, or a mixture of the two. Spark is a fantastic tool for batch jobs. For real-time processing, Kafka combined with fs2 or Akka Streams may be preferred. In both cases cloud offerings can be used if the organization is going for a buy rather than build approach.
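To make the batch versus real-time distinction concrete, here is a minimal sketch in plain Python. It does not use Spark, Kafka, fs2, or Akka Streams; the event shape and the function names `batch_totals` and `streaming_totals` are assumptions made up for illustration. The batch version waits for the complete dataset and produces one answer, while the streaming version updates its answer as each event arrives.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical event shape: (customer, amount)
Event = tuple[str, float]


def batch_totals(events: list[Event]) -> dict[str, float]:
    """Batch style: process a complete, bounded dataset and return one result."""
    totals: dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
    return dict(totals)


def streaming_totals(events: Iterable[Event]) -> Iterator[dict[str, float]]:
    """Streaming style: emit an updated snapshot after every incoming event."""
    totals: dict[str, float] = defaultdict(float)
    for customer, amount in events:
        totals[customer] += amount
        yield dict(totals)  # copy, so earlier snapshots are not mutated later


events = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)]

batch_result = batch_totals(events)
snapshots = list(streaming_totals(iter(events)))
```

Once the stream has been fully consumed, the last snapshot matches the batch result. The difference is that the streaming version has an up-to-date answer available after every event, at the cost of managing state continuously.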

One thing that doesn’t change between engagements is our emphasis on data quality. Decisions made from data are only as good as the data itself, so we always make verifying data quality a core component of the work.
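As a rough illustration of what verifying data quality can look like in code, here is a minimal sketch in plain Python. The record fields (`order_id`, `amount`), the rules, and the names `Check`, `QualityReport`, and `run_checks` are all hypothetical; real checks depend on the data and the organization.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Iterable

Record = dict[str, Any]


@dataclass
class Check:
    """A named rule that every record is expected to satisfy."""
    name: str
    predicate: Callable[[Record], bool]


@dataclass
class QualityReport:
    total: int = 0
    failures: dict[str, int] = field(default_factory=dict)

    @property
    def passed(self) -> bool:
        # The dataset passes only if no check failed on any record.
        return not any(self.failures.values())


def run_checks(records: Iterable[Record], checks: list[Check]) -> QualityReport:
    """Apply every check to every record and count failures per check."""
    report = QualityReport(failures={check.name: 0 for check in checks})
    for record in records:
        report.total += 1
        for check in checks:
            if not check.predicate(record):
                report.failures[check.name] += 1
    return report


# Hypothetical rules for a hypothetical "orders" dataset.
CHECKS = [
    Check("order_id present", lambda r: bool(r.get("order_id"))),
    Check(
        "amount is a non-negative number",
        lambda r: isinstance(r.get("amount"), (int, float)) and r["amount"] >= 0,
    ),
]

orders = [
    {"order_id": "A1", "amount": 19.99},
    {"order_id": "", "amount": 5.00},     # fails: missing order_id
    {"order_id": "A3", "amount": -2.50},  # fails: negative amount
]

report = run_checks(orders, CHECKS)
```

A report like this can be used to stop a pipeline run, or to raise an alert, before poor-quality data reaches the people making decisions from it.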