Letting BigQuery take the data strain
The Lightricks apps ingest and analyze large volumes of customer data, much of it from mobile devices (such as user behavior data), often in near real time. “We have business intelligence use cases, to optimize online ad campaigns, for example, where we need to create reports on a few terabytes of data and act on them almost instantaneously,” says Gilboa. “BigQuery, together with Dataflow, automates the ingestion and analysis of this data. This makes ingesting thousands of events per second very easy to perform at scale. The data connected to user behavior goes through Dataflow and is then ingested into BigQuery. It is automatic and allows us to look at the data from different angles at a very high scale.”
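To make the events-through-Dataflow-into-BigQuery flow concrete, here is a minimal, hypothetical sketch of the kind of per-event transform such a pipeline might apply before streaming rows into BigQuery. The event schema and field names below are assumptions for illustration, not Lightricks’ actual code; in a real Dataflow (Apache Beam) pipeline this function would run inside a Map or ParDo step, with the output written via a streaming BigQuery sink.

```python
import json

def event_to_row(raw_event: bytes) -> dict:
    """Parse one raw analytics event and shape it as a BigQuery row.

    Hypothetical schema: the incoming event is a JSON payload with
    user, event-name, and timestamp fields; the returned dict keys
    would match the columns of the destination BigQuery table.
    """
    event = json.loads(raw_event)
    return {
        "user_id": event["user_id"],
        "event_name": event["event"],
        "timestamp": event["ts"],            # epoch seconds from the client
        "app": event.get("app", "unknown"),  # default when the app field is absent
    }

# Example: one user-behavior event as it might arrive off the wire.
row = event_to_row(b'{"user_id": "u1", "event": "export_video", "ts": 1700000000}')
print(row["event_name"])
```

Keeping the transform a pure function like this is what lets Dataflow autoscale it across thousands of events per second: each event is processed independently, so adding workers adds throughput.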
Lightricks is a big fan of the autoscale feature of Dataflow. With the previous third-party systems it used for uploading data, it often found data blocked because those systems had reached their limits. “We’re always increasing the number of apps and events we run,” Gilboa says. “We’re ingesting around 10,000 events per second, or a billion a day. And we now have a platform that supports that growth by design.
“We can, for example, find a great user who came from a particular marketing campaign and then, using Dataflow, instantly target more users via the same route. Automating this work takes the strain off our in-house team.”
Letting a small team think big
Lightricks often refers back to the possibilities that Google Cloud opens up for its small workforce. Vika Kam, VP Engineering Creators Services and Community at Lightricks, reveals: “After only a few weeks with just a few engineers and DevOps people, we had a working Kubernetes infrastructure on Google Kubernetes Engine. We would never have achieved this with our old system; we always had problems and needed more people to maintain the infrastructure.”
Most of these issues resulted from having no separation between storage and compute, explains Inger, and infuriatingly they always seemed to arise at inopportune times. “Once, we were trying to close a funding round and needed loads of reports sent to prospective investors, and the cluster shut down. And one Christmas, when everyone ran queries to fetch data and make business decisions, it couldn’t cope either.”
Previously, the infrastructure was an obstacle to getting things done; now it’s an enabler, says Inger. “All these services reduce friction, allowing our developers to focus on delivering business value to our users instead of maintaining our infrastructure. We leave that to Google Cloud now, and what’s great is that Google Cloud is constantly adding features to GKE, and we get to use them without having to build them.”
Amplifying machine learning
GKE has enhanced Lightricks’ machine learning capabilities too. Lightricks began experimenting with training machine learning models in the cloud in 2014, but as the need grew, the team had problems with compute availability. This changed following the move to Google Cloud. “The compute capacity we get from Google Cloud is much better,” says Ofir Bibi, VP Research at Lightricks. “Compute resources are there whenever we need them. We initially created another Kubernetes cluster and used that alongside another cloud vendor, but the team shifted to Google Cloud by choice, because it was smoother and more available.”
Inger adds that GKE is cost-effective too. “GKE means we now have fewer people investing their time in configuring clusters and upgrading them,” he says. “In days you can have a cluster set up, and be building Docker images from your repositories and setting up a deployment pipeline.”
Lightricks has worked with tech consultancy DoiT International since 2016, and Bibi says DoiT has been particularly supportive of its machine learning program. “When, for instance, we wanted to create a cluster on GKE and attach it to our machine learning systems, DoiT ensured that the data lakes we use for research stayed synced with our on-premises compute. We have an elaborate machine learning setup, and DoiT provides ongoing support for everything from architecture to problem-solving.”
Lightricks’ marketing, product optimization, and recommendation engine teams are all creating machine learning models on Compute Engine, but they are gradually migrating to managed services on Vertex AI so that they can scale their models even faster. These models will recommend posts to serve on the app feed, so that users see the most relevant templates and tutorials, and will ingest analytics events to optimize user interactions such as notifications.