
Moving Compute to Your Virtual Private Cloud (VPC): Optimizing Visibility, Security and Privacy for Security and Fraud Teams

Ananth Gundabattula

11 February 2024

Darwinium unites digital security with fraud prevention, and is built with data privacy and cost optimization as two core foundational differentiators. Darwinium pioneered moving fraud and risk decisions to the perimeter edge, providing visibility of user behavior across entire digital journeys and extending the capabilities of traditional solutions that operate predominantly on individual web pages. This approach also delivers lower latencies, stronger privacy controls and distributed processing for effective decisioning. Darwinium is now extending the same paradigm to offline processing in the enterprise, pushing compute into the customer's private network of choice and helping them maintain a posture that is more secure, privacy-aware, regulation-friendly and cost-conscious. The following sections explore each of these aspects in finer detail.

Cost:

Pushing compute to where data resides is a cost differentiator for several reasons. Because raw data is no longer shipped to Darwinium for intelligence extraction, customers do not pay data egress costs. Enterprise customers also have account agreements with cloud providers that can secure discounted compute. Darwinium further builds on constructs like AWS spot instances, which are much cheaper and therefore optimize costs for certain workloads. This comes with its own challenges: because a spot instance can be revoked when its lease expires, the data processing framework must be resilient to failures. Darwinium designed its offline processing frameworks to tolerate instances being lost, whether through an expired spot lease or a hardware failure. Moving compute into the customer's network also leads to more transparent pricing and lower friction from stakeholders. In short, moving compute to the data lets businesses leverage existing cloud agreements to maximise efficiency, avoid egress costs and enjoy transparent pricing while arriving at the same outcomes.
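As a concrete illustration of that resilience requirement, here is a minimal sketch of a checkpointed batch loop: if a spot instance is reclaimed mid-run, a replacement instance resumes from the last committed batch instead of starting over. This is a sketch only, not Darwinium's actual framework; the checkpoint location and the `run_batch` work are hypothetical (in practice the checkpoint would live in durable storage such as S3).

```python
import json
import os

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical; in practice a durable store such as S3

def load_checkpoint() -> int:
    """Return the index of the last fully processed batch, or -1 if starting fresh."""
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)["last_batch"]
    return -1

def save_checkpoint(batch_index: int) -> None:
    """Commit progress so a replacement instance can resume rather than restart."""
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump({"last_batch": batch_index}, f)

def run_batch(batch) -> None:
    pass  # hypothetical stand-in for real feature extraction or scoring

def process(batches: list) -> None:
    for i in range(load_checkpoint() + 1, len(batches)):
        run_batch(batches[i])
        save_checkpoint(i)  # if the spot lease is revoked after this line, work up to batch i survives
```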

Privacy:

Moving compute to where the data lives aligns us more closely with CDOs and data stewards, who own the regulatory aspects of data management. Since data is not being moved around, that is one less concern when designing systems, while the maximum value is still extracted from the data. For enterprises using Darwinium's capabilities, data value need not come at the cost of privacy: we believe extracting value from data does not require the entire raw dataset, only insights and intelligence that are sufficiently anonymised to support fast, effective decisions. With Darwinium, enterprises are not forced by data governance to choose between privacy and data intelligence.
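To make the "insights, not raw data" idea concrete, the sketch below reduces raw event rows to an anonymised aggregate inside the private network, so only the summary ever leaves. The field names and the specific aggregates are hypothetical illustrations, not Darwinium's schema.

```python
import hashlib
import statistics

def anonymised_insights(events: list) -> dict:
    """Reduce raw events to aggregates; the raw rows never leave the private network."""
    account_id = events[0]["account_id"]  # hypothetical field names throughout
    return {
        # a one-way hash keeps the identifier linkable without being reversible
        "account_ref": hashlib.sha256(account_id.encode()).hexdigest(),
        "session_count": len(events),
        "median_session_seconds": statistics.median(e["duration_s"] for e in events),
    }
```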

Battle for the right hardware:

With the acceleration of AI use cases in the enterprise, hardware accelerators like GPUs and TPUs are becoming crucial differentiators in the race for AI dominance. The stickiness of a hardware profile arises from multiple factors, ranging from open-source libraries supporting only a particular hardware profile to the big cloud vendors innovating at the chip level in their own distinct ways and building on their own strengths. Flexibility and access to the hardware of choice is therefore a core differentiator from a company's perspective. It is strategic for us to help our customers win this race by enabling them to provision compute on the hardware of their choice, optimized for the problem being solved. Better yet, the ability to provision mixed workload clusters containing both GPUs and CPUs, and to execute an entire machine learning pipeline under a single cluster definition, unburdens infrastructure teams. Where the ecosystem permits it, this leads to simpler stacks and simpler compute cluster definitions.
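As one way such a mixed workload pipeline might look, the sketch below uses Ray purely as an illustrative framework (no claim that it is Darwinium's stack): CPU-bound preprocessing and GPU-bound training run under one cluster definition, with the scheduler placing each stage on the matching hardware. The shard data and the stage bodies are stand-ins.

```python
import ray

ray.init()  # in production, this would attach to the cluster the infra team defined

@ray.remote(num_cpus=2)
def extract_features(shard):
    # CPU-bound preprocessing; scheduled onto CPU capacity
    return [x * 2.0 for x in shard]  # stand-in for real feature logic

@ray.remote(num_gpus=1)
def train(feature_sets):
    # GPU-bound training; scheduled only onto nodes that advertise a GPU
    return sum(len(f) for f in feature_sets)  # stand-in for a trained model

shards = [[1, 2], [3, 4], [5, 6]]  # hypothetical input partitions
features = ray.get([extract_features.remote(s) for s in shards])
model = ray.get(train.remote(features))
```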

Unlocking more use cases:

Bringing compute to the data unlocks a new set of use cases for an enterprise. It helps in scenarios where generating intelligence requires calls to internal enterprise APIs that cannot be exposed to a SaaS vendor. A classic example is a data point that must be enriched, in an intermediate data processing step, by calling an API hosted by another department of the enterprise. Often the department that owns the API cannot expose it outside the corporate firewall for regulatory or governance reasons.
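A minimal sketch of such an enrichment step follows. The endpoint, field names and scoring API are hypothetical; the point is that the call resolves only inside the corporate network, so the enrichment can happen because the compute runs there too.

```python
import requests

# Hypothetical endpoint, resolvable only inside the corporate network.
INTERNAL_API = "http://risk-scores.internal.example.com/v1/score"

def enrich(record: dict) -> dict:
    """Intermediate pipeline step: call an API that never leaves the firewall."""
    resp = requests.get(INTERNAL_API, params={"account_id": record["account_id"]}, timeout=2)
    resp.raise_for_status()
    record["internal_risk_score"] = resp.json()["score"]
    return record
```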

Another use case is when enterprises need to collaborate on machine learning model development over their combined datasets, then use the resulting model in each of their own environments, or within their common SaaS vendor's inferencing stacks or CDNs. Because Darwinium can push compute into each customer's environment, the participants can build a shared model using federated learning approaches without any raw data leaving its owner.
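The sketch below shows the core of one federated learning round, using federated averaging on a simple least-squares model as an illustrative example (the model and learning rate are assumptions, not a description of Darwinium's method): each participant computes an update locally, and only the updates, never the data, are combined.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1):
    """One gradient step computed on data that stays inside this participant's environment."""
    grad = X.T @ (X @ weights - y) / len(y)  # least-squares gradient
    return weights - lr * grad

def federated_round(weights, datasets):
    """Only weight updates cross organisational boundaries; raw data never does."""
    updates = [local_update(weights, X, y) for X, y in datasets]
    return np.mean(updates, axis=0)  # federated averaging
```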

Hybrid cloud:

Enterprise customers see value in avoiding cloud provider lock-in, for multiple reasons. Giving our enterprise customers a mechanism to bring their own cloud provider shortens the time to provision and execute a workload. In this approach, not all data assets reside on a single cloud storage layer, and a flexible mounting option for accessing the data and attaching the compute of choice is valuable flexibility in the next generation of cloud-native architectures. It also matters for data scientist personas, whose tools of choice may only be available in certain cloud vendors' ecosystems.
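One common way to get that flexible, provider-agnostic data access in Python is fsspec, shown below as an illustration (not a statement about Darwinium's internals); the bucket names are hypothetical, and the same read path works across providers with only the URL scheme and credentials changing.

```python
import fsspec
import pandas as pd

# The same read path works whether an asset lives in S3, GCS or Azure; only the
# URL scheme and credentials change (s3fs / gcsfs / adlfs must be installed).
urls = [
    "s3://bucket-a/events.parquet",                                  # AWS
    "gs://bucket-b/events.parquet",                                  # GCP
    "abfs://container@account.dfs.core.windows.net/events.parquet",  # Azure
]
frames = []
for url in urls:
    with fsspec.open(url, "rb") as f:
        frames.append(pd.read_parquet(f))
```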

Orchestration, the secret sauce:

Darwinium builds upon a flexible orchestrator as the key enabler of these architectural patterns. Beyond effective scheduling policies, the orchestrator shoulders the additional complexities of distributed processing, including resuming from a checkpoint after a failure. Another important property of a flexible orchestrator is decoupling execution contexts between runs as well as between customers. This allows non-intrusive parallel processing: when executing in a SaaS context, a failure in one customer's pipeline does not interrupt execution for the remaining customers.
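A minimal sketch of those two properties, checkpointed resume and per-customer failure isolation, appears below. The step count, state store and `do_step` work are hypothetical; a real orchestrator would persist state durably and schedule runs concurrently.

```python
import logging

def do_step(customer_id: str, step: int) -> None:
    pass  # hypothetical unit of pipeline work

def run_pipeline(customer_id: str, state_store: dict) -> None:
    """Run one customer's pipeline from its last checkpoint, in its own context."""
    for step in range(state_store.get(customer_id, 0), 3):
        do_step(customer_id, step)
        state_store[customer_id] = step + 1  # checkpoint: a rerun resumes here

def orchestrate(customers, state_store) -> None:
    """A failure in one customer's pipeline never interrupts the others."""
    for cid in customers:
        try:
            run_pipeline(cid, state_store)
        except Exception:
            logging.exception("pipeline failed for %s; continuing with the rest", cid)
```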

Darwinium unifies all of the above architectural patterns by pushing compute to the data for both real-time and offline processing. This gives our customers full visibility across the intelligence spectrum and maximum utility from their data while they defend the perimeters of their digital ecosystems.
