Building a secure and flexible data pipeline
In 2019, when Rafi joined the company, CB4 did not have a robust central data pipeline to extract, process, and transfer customer data to the machine learning environment. Instead, the team relied on scripts and processes that were custom-built for each client. These got the job done but held back growth at a time when CB4 was rapidly expanding its customer base.
Rafi and his team decided to use Google Cloud tools to develop a more robust solution. They selected Apache Airflow on Google Cloud Composer to build a scalable, agile data pipeline that reliably orchestrates the flow of data before funneling it into the machine learning models. Replacing legacy processes with an entirely new tool was a difficult task, and DoiT International helped ensure a smooth implementation.
“We have to move a lot of data, and move it quite fast, and our data pipelines need to be robust to scale with our growth,” explains Rafi. “Google Cloud Composer helps us achieve a fully automated data pipeline we can use for all of our customers, and DoiT International helped ease the difficult implementation of the tool.”
Besides flexibility, the new system also helps CB4 store data securely and comply with international data protection regulations such as GDPR. In the data repository on Google Cloud, Rafi and his team can apply parameters and manage data by geography to ensure that EU data is stored only in the EU, as required. “We need to trust that data remains in the right place,” explains Rafi. “Our new data pipeline helps with that.”
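As a rough illustration of the kind of residency check described above, the sketch below flags datasets stored outside the region their customer requires. It is not CB4's implementation: the `Dataset` type, the `ALLOWED_LOCATIONS` mapping, the customer names, and the `residency_violations` helper are all hypothetical, and the location strings are only examples of Google Cloud location names.

```python
from dataclasses import dataclass

# Hypothetical mapping from a customer's required residency region to the
# storage locations considered acceptable for it (illustrative values only).
ALLOWED_LOCATIONS = {
    "EU": {"EU", "europe-west1", "europe-west4"},
    "US": {"US", "us-central1", "us-east1"},
}


@dataclass(frozen=True)
class Dataset:
    customer: str   # hypothetical customer identifier
    residency: str  # region the customer's data must stay in, e.g. "EU"
    location: str   # where the dataset is actually stored


def residency_violations(datasets):
    """Return the datasets stored outside their required region.

    An unknown residency region has no allowed locations, so any dataset
    that declares one is reported rather than silently accepted.
    """
    return [
        d for d in datasets
        if d.location not in ALLOWED_LOCATIONS.get(d.residency, set())
    ]


# Example: one EU customer stored correctly, one stored in a US region.
datasets = [
    Dataset("retailer-a", "EU", "europe-west1"),
    Dataset("retailer-b", "EU", "us-central1"),
]
violations = residency_violations(datasets)
```

Here `violations` contains only the `retailer-b` dataset. A check like this could run as one step of a pipeline before data moves on to the models, so that a misplaced dataset is caught early.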
Optimizing the cloud spend
To analyze all this data, CB4 has developed ML models in-house, using Google Compute Engine to host the analysis infrastructure. As a heavy user of Google Compute Engine across different regions, CB4 faced a growing cloud spend, prompting Rafi and his team to search for ways to track and optimize it.
“If we’re going over budget, that’s usually because something is wrong in the stack, but without transparency, we can’t identify these budget issues, wasting money,” says Rafi.
Today, CB4 uses DoiT International’s Cloud Management Platform to analyze and predict cloud costs, balancing the company’s growth with cost-efficiency. “Our cloud spend is going up, because we’re growing,” says Rafi. “DoiT International helps us track more precisely where our money goes, and identify and resolve issues quickly, saving money.”