The Evolution Blog

Separating Trusted Versus Malicious AI Agents

7 November 2025
By Andrew Feutrill

How to Tell Trusted AI Agents from Malicious Ones

Agentic AI is an overused and sometimes poorly understood term in our AI-hyped world. In this blog, we dissect what Agentic AI actually is, and how it has evolved over the last two years.

Agentic AI supercharges attack potential

An emerging challenge in fraud is mitigating malicious actors who build and deploy their own AI agents to drive web browsers automatically toward objectives such as account takeover, impersonating real users and creating fake accounts. Determining whether traffic is driven by an AI agent, and whether that agent can be trusted, is therefore an important challenge for organisations to understand, in order to mitigate the risk and impact of fraud or cyberattacks.

At Darwinium we have developed a variety of techniques to detect and protect customers from threats, using information captured from Device, Network, Timing, Biometric and Journey data sources. From these inputs a suite of detections and models have been developed to detect complex AI-based threats, contextualise their observed behavior and understand the intent of digital traffic.

What are the benefits to malicious actors?

With these advances in technology, and the broad availability of tooling to be able to create and deploy agents, there is a much lower barrier to entry for threat actors to utilize AI agents and scale out fraud operations. Fraud actors are able to automate pre-existing multi-step processes into one step, via a consistent set of instructions, to optimize their current operations.

In particular, Anthropic’s most recent threat intelligence report outlines attack types that have been observed using their AI, such as: 

  • Building and deploying AI agents to run romance scams
  • Validating and selling card information at scale
  • “Vibe Hacking” attacks using AI to write code and run scam operations such as implementing man-in-the-middle proxies for phishing campaigns
  • Using agents to actively scrape information and generate profiles of potential victims

These discoveries demonstrate that AI-based threats are not only possible, but actively being explored and scaled to achieve malicious objectives. As fraudsters embed agentic AI across all stages of their workflows the increase in AI-based cyber fraud threats is inevitable due to the democratisation of skills.

What are AI Agents?

The defining characteristic of AI agents is their ability to make independent decisions to achieve their objectives without human intervention.

An AI agent is an entity that interprets its environment, takes actions autonomously to achieve goals, and may improve its performance through feedback into algorithms or by acquiring knowledge. 

Agentic AI is a class of artificial intelligence that focuses on autonomous systems that can make decisions and perform tasks with or without human intervention.

Within Darwinium, we have developed agentic AI capabilities that provide utility to our customers:

  • A simulator that acts as a malicious AI agent, committing fraudulent tasks so customers can understand their fraud exposure, and
  • A co-pilot that independently assists fraud analysts and scientists to improve their investigations, develop fraud detection strategies and accelerate remediation efforts.

How can we detect malicious Agentic AI behavior?

Using measures from different data sources we are able to detect unusual individual data points and malicious patterns, even when actors are trying to obfuscate malicious intent and behavior.

Our detection technology relies on data from the following sources:

1. Device

Darwinium provides device fingerprinting and analysis that extracts key pieces of data from devices that touch customer endpoints. With the emergence of AI agents, we see both self-declaring and evasive AI agents. For example, OpenAI's agents announce themselves with self-identifying user-agents such as:

  • Mozilla/5.0 (compatible; GPTBot/1.0; +OpenAI Platform)
  • Mozilla/5.0 (compatible; OAI-SearchBot/1.0; +OpenAI Platform)
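A minimal sketch of matching self-declared AI agent user-agents. The regex patterns are assumptions derived from the example strings above, not an exhaustive or official list:

```python
import re
from typing import Optional

# Hypothetical patterns for self-identifying AI agent user-agents,
# based on the OpenAI examples above; real deployments track many more.
SELF_DECLARED_AI_AGENTS = {
    "GPTBot": re.compile(r"GPTBot/\d+(\.\d+)*"),
    "OAI-SearchBot": re.compile(r"OAI-SearchBot/\d+(\.\d+)*"),
}

def classify_user_agent(ua: str) -> Optional[str]:
    """Return the agent name if the UA self-declares as a known AI agent."""
    for name, pattern in SELF_DECLARED_AI_AGENTS.items():
        if pattern.search(ua):
            return name
    return None
```

Self-declared agents are the easy case; the harder problem is the evasive behavior described next.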

However, we see evasive behavior from AI agents, such as: 

  • Spoofed or generic UAs that claim to be from a real browser but don't behave like one. 
  • Headless/automation stacks (Playwright/Puppeteer/Selenium/Patchright) with minimal or missing plugins/mimeTypes, WebGL/Canvas/Audio fingerprints pointing to software renderers (e.g., SwiftShader, llvmpipe), or navigator.webdriver and other automation leaks
  • Inconsistent signals: mobile UA with desktop screen/touch profile; UA-CH contradicts UA.
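The leak checks above can be sketched as a simple scoring function. The signal field names here are illustrative, not Darwinium's actual fingerprint schema:

```python
def automation_leak_score(signals: dict) -> int:
    """Count automation leaks in a dict of collected device signals.
    Field names are hypothetical; real fingerprinting payloads differ."""
    score = 0
    if signals.get("navigator_webdriver"):        # navigator.webdriver leak
        score += 1
    if not signals.get("plugins"):                # minimal/missing plugins or mimeTypes
        score += 1
    if signals.get("webgl_renderer", "").lower() in ("swiftshader", "llvmpipe"):
        score += 1                                # software renderer fingerprint
    if signals.get("ua_mobile") and not signals.get("touch_support"):
        score += 1                                # mobile UA without a touch profile
    return score
```

A score of zero is consistent with a real browser; each leak raises suspicion of a headless or automated stack.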

2. Network

Diving deep into network information when users touch endpoints that Darwinium monitors, we derive value from headers, protocol traits, and fetch patterns, providing strong differentiation between different types of agentic traffic.

Some examples of transparent behavior we see in network readings are headers containing a user agent that is consistent with the claimed browser, a crawling cadence consistent with benign bots/agents, and traffic originating from known and trusted ASNs/IP ranges.

We see evasive network behavior manifesting as:

  • JA3/JA4 TLS fingerprints typical of scripted stacks (e.g., curl, Go, python-requests)
  • Proxy and VPN usage, rotating through residential ranges
  • Request anomalies such as calling JSON APIs without loading the HTML shell or analytics
  • Out-of-order fetches
  • Odd or stale Accept/Accept-Language/Encoding combinations
  • HTTP/2 header ordering that doesn't match the user-agent family
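Two of these mismatches can be sketched as a heuristic over a single request context. The function name, client labels, and flag strings are illustrative assumptions:

```python
# Illustrative labels for scripted HTTP stacks; in practice these come from
# JA3/JA4 TLS fingerprint databases, which are far larger than this set.
SCRIPTED_STACK_LABELS = {"curl", "go-http-client", "python-requests"}

def network_mismatch(ua: str, tls_client_label: str, loaded_html_shell: bool) -> list:
    """Return evasive-network indicators for one request context."""
    flags = []
    claims_browser = any(tok in ua for tok in ("Chrome", "Firefox", "Safari"))
    if claims_browser and tls_client_label in SCRIPTED_STACK_LABELS:
        flags.append("tls_ua_mismatch")        # browser UA over a scripted TLS stack
    if not loaded_html_shell:
        flags.append("api_without_page_load")  # JSON API called without the HTML shell
    return flags
```

Each flag alone is weak evidence; it is the combination with device, timing, and journey signals that supports a confident verdict.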

3. Timing

Using timing data we observe programmatic clients navigating with regularity, rather than the more random timing variation of real humans. The non-evasive behavior we see is predictable: steady request rates, with no attempt to simulate human pacing, and reasonable gaps between page fetches.

Malicious behavior can be hidden with the following strategies we’ve encountered:

  • Rapid navigation through endpoints with millisecond gaps between steps
  • Low variance in the inter-request timing
  • Traffic consistent at all hours of the day without circadian patterns

However, we do see some non-malicious actors with this behavior and therefore correlate other pieces of intelligence to determine the likely intent of the traffic.
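The first two timing strategies above can be sketched with simple statistics over request timestamps. The 50 ms gap and 0.1 coefficient-of-variation thresholds are illustrative assumptions, not production values:

```python
import statistics

def timing_regularity_flags(timestamps: list) -> list:
    """Flag programmatic pacing from a sorted list of request timestamps (seconds)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    flags = []
    if gaps and min(gaps) < 0.05:
        flags.append("millisecond_gaps")           # sub-50ms navigation steps
    if len(gaps) >= 2:
        mean = statistics.mean(gaps)
        if mean > 0 and statistics.pstdev(gaps) / mean < 0.1:
            flags.append("low_timing_variance")    # near-constant inter-request gaps
    return flags
```

Human traffic typically produces a high coefficient of variation between requests, so neither flag fires; metronomic scripted traffic trips the variance check even when its absolute pace looks reasonable.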

4. Biometric

Taking readings from device telemetry separates browsers driven by people from scripts. We see transparency where no biometric data is recorded at all, i.e. the client is not attempting to impersonate human input, and where UI interactions (crawl, fetch, or API calls) appear sparse and mechanical.

However, a rich variety of obfuscation techniques has been observed, such as:

  • Keystroke timing that's uniform or unrealistic
  • Mouse movement that's absent or obviously synthetic (straight lines, constant velocity), alongside odd scrolling behavior
  • Clicking elements that aren't rendered for human view in the UI
  • Repeated interaction patterns across many "devices"
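The straight-line, constant-velocity mouse signature can be sketched as a check over sampled cursor positions. This is a deliberately minimal heuristic, assuming points are sampled at uniform intervals:

```python
def is_synthetic_path(points: list) -> bool:
    """Heuristic: a cursor path sampled at uniform intervals that moves in a
    perfectly straight line with constant step length is likely scripted.
    Human movement shows curvature and speed variation between samples."""
    if len(points) < 3:
        return False
    steps = [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(points, points[1:])]
    first = steps[0]
    return all(step == first for step in steps[1:])  # identical direction and speed
```

Real detectors would tolerate jitter-injection tricks by testing distributions rather than exact equality, but the underlying signal is the same.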

5. Journey

Viewing behavior through multi-part interactions rather than single events gives us the ability to view behavior in aggregate and identify unusual patterns. Transparent agents exhibit purposeful discovery patterns: sitemap/robots-driven navigation, consistent depth, and limited forms or post-auth actions.

Malicious behavior we observe includes:

  • Goal-seeking shortcuts like jumping straight to high-value endpoints rather than navigating through more common paths
  • State-blind actions, e.g., submitting forms without prior page loads
  • Fetching non-essential data at odd times
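The first two journey anomalies above can be sketched as checks over an ordered event stream. The event shape, the "/account" high-value prefix, and the flag names are all illustrative assumptions:

```python
def journey_flags(events: list) -> list:
    """Flag journey-level anomalies in an ordered stream of page events.
    Event dicts ({'type': ..., 'path': ...}) are a hypothetical schema."""
    flags = []
    pages_seen = set()
    for ev in events:
        if ev["type"] == "page_load":
            pages_seen.add(ev["path"])
        elif ev["type"] == "form_submit" and ev["path"] not in pages_seen:
            flags.append("state_blind_submit")   # form posted without a prior page load
    if events and events[0].get("path", "").startswith("/account"):
        flags.append("goal_seeking_entry")       # landed directly on a high-value endpoint
    return flags
```

Humans reach a form by loading its page first; an agent replaying an API recipe often skips straight to the POST, which is exactly what the state check catches.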

How does Darwinium protect against these types of threats?

To counter this new threat landscape and protect our customers we have developed capability to identify security flaws and threats in real-time. 

Detections

Below are some examples of implemented detections that have been built into the Darwinium platform. The list is not exhaustive and is continually being refined and extended to capture and characterise AI agent behavior.

The following example detections produce signals of AI agent behavior or automation tools which are wholly or frequently used by AI agents:

  • BROWSERUSE - Detected browser-use AI agent is in use
  • RTRVR_INSTALLED - Detected Rtrvr AI agent extension is installed
  • RTRVR_RUNNING - Detected Rtrvr AI agent extension is installed and running
  • PLAYWRIGHT - Detected playwright browser automation is in use
  • SELENIUM - Detected selenium browser automation is in use
  • SELENIUMBASE - Detected seleniumbase browser automation is in use
  • PUPPETEER - Detected puppeteer browser automation is in use
  • WEBDRIVER - Detected generic webdriver automation is in use
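Detections like these are typically consumed as a set of fired codes per interaction. A minimal sketch of folding them into a single automation signal, using the code names listed above (the grouping logic is an assumption, not Darwinium's actual rule engine):

```python
# Detection codes from the list above that indicate AI-agent tooling or
# automation frameworks wholly or frequently used by AI agents.
AI_AGENT_SIGNALS = {
    "BROWSERUSE", "RTRVR_INSTALLED", "RTRVR_RUNNING", "PLAYWRIGHT",
    "SELENIUM", "SELENIUMBASE", "PUPPETEER", "WEBDRIVER",
}

def has_ai_agent_signal(detections: set) -> bool:
    """True if any fired detection code indicates AI-agent or automation tooling."""
    return bool(detections & AI_AGENT_SIGNALS)
```

In practice such a boolean feeds into broader rules and models rather than triggering a block on its own.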

In addition, we have developed features that identify unusual traffic and can assign intent to likely AI-based threats. These include graph/network analysis-based features that highlight risky traffic sharing similar access patterns, and that relate newly observed agents to previously discovered ones through their links to other users and traffic patterns.

Journey analysis features have been developed to analyze the navigation paths and timing information of different users interacting with websites to detect AI agent traffic. We can create rules from the probabilities the model outputs, alerting on likely agent behavior and incorporating those alerts into analyst workflows to proactively identify it.

Final Thoughts

For digital businesses operating in the age of AI agents, deciphering human behavior is no longer sufficient. They now need to understand the trustworthiness of AI agents acting independently or on behalf of customers or malicious actors, classifying an interaction before it jeopardises customer accounts, business reputation, or the bottom line.