Is your Bot Solution Causing More Harm than Good?

15 February 2022

By Ben Davey

The Future of Cyber Security - Is Your Bot Solution Causing More Harm Than Good?

A great debate rages in cyber warfare and robotics about how much autonomy should be given to machines capable of great harm. Are we missing warning signs in the bot solutions designed to protect our websites?

Bots - Butterfly Effects

In our previous blog post, we discussed the evolution of bots over the past 10 years. This follow-on post takes the topic further by discussing what happens when a bot defence solution goes wrong. Bot prevention techniques generally focus on device fingerprinting, behavioral signals, IP velocities, and outside threat intelligence. However, by relying on point-in-time markers to differentiate between people and bots, this approach can miss key opportunities to link intelligence together.

Let’s examine some typical responses a bot solution makes to a suspected bot, to understand where vulnerabilities can arise:

CAPTCHA – Invented in 1997 by a team at AltaVista, CAPTCHA is now offered by more than 20 vendors, the largest being Google reCAPTCHA. Not all CAPTCHAs have been created equal over their evolution: second-generation CAPTCHAs provide usability improvements so that consumers are not stuck in the “is that a traffic light or road sign” quagmire.
Rate Limiting – A strategy for limiting network traffic, rate limiting caps how many times an action can be repeated in a given timeframe, e.g., login attempts on an account. It is typically used against brute force attacks, DDoS and web scraping. This slows web sessions down but cannot distinguish between good and bad bots.
Proof of Work – An alternative to CAPTCHA, proof of work challenges the browser to solve a mathematical puzzle using computing power. This raises the cost for bots, forcing them to execute the puzzle code (for example, JavaScript in the browser), whilst remaining invisible to most people. The downside is that there are still concerns over the time taken to solve these puzzles on slower machines, which interrupts the experience of genuine users.
Queueing/Time Boxing – Queueing allows bot vendors to analyse traffic and control virtual queues. Time boxing is similar but lets you configure when to block traffic at specific peak times. Good bots and aggregators are often designed to act during the night, when human traffic is lower. Time boxing ensures that during periods of popular access, such as daytime hours, most of your infrastructure is reserved for real people.
Honeypot – As the name suggests, this is a snare to lure bots into revealing themselves. It adds a hidden field to a form that a human would not see (for example, text in the same colour as the background) but that a bot is tricked into completing. The field is hidden via CSS or JavaScript. Honeypots have a clear advantage over CAPTCHA in that genuine users never see them. The risk is that the honeypot, once identified, can itself be fingerprinted and circumvented by bots. For this reason, honeypots are generally used as an additional security layer and not in isolation.
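To make the rate limiting entry above concrete, here is a minimal sliding-window sketch in Python. The class name, the choice of key (account ID, IP or session) and the injectable clock are illustrative assumptions, not any vendor's implementation; production systems usually keep these counters in a shared store rather than in process memory.

```python
import time
from collections import defaultdict, deque


class SlidingWindowRateLimiter:
    """Allow at most `limit` actions per key within `window_seconds`.

    Hypothetical illustration: the key might be an account ID, an IP address
    or a session ID, depending on what the action is being limited against.
    """

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._events = defaultdict(deque)  # key -> timestamps of allowed actions

    def allow(self, key):
        now = self.clock()
        events = self._events[key]
        # Drop timestamps that have fallen outside the window.
        while events and now - events[0] >= self.window:
            events.popleft()
        if len(events) >= self.limit:
            return False  # over the cap: reject or slow down this attempt
        events.append(now)
        return True
```

With `SlidingWindowRateLimiter(limit=5, window_seconds=60)`, the sixth login attempt on an account within a minute is rejected. Note that the limiter only counts attempts: it has no notion of who is making them, which is exactly why rate limiting cannot tell good bots from bad ones.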
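For the honeypot entry, the server-side half can be as small as checking whether the hidden field came back filled in. The field name `website` and the markup in the comment are hypothetical; the sketch returns a flag rather than a hard block, in line with using honeypots as one layer among several.

```python
# Hypothetical field name. The form is assumed to contain a hidden input like
#   <input type="text" name="website" class="hp" tabindex="-1" autocomplete="off">
# hidden from humans with CSS such as .hp { position: absolute; left: -9999px; }
HONEYPOT_FIELD = "website"


def is_honeypot_triggered(form: dict) -> bool:
    """Return True if the hidden honeypot field was filled in."""
    return bool(form.get(HONEYPOT_FIELD, "").strip())


def handle_signup(form: dict) -> str:
    if is_honeypot_triggered(form):
        # Treat this as a signal, not a verdict: log it and combine it
        # with other detection layers before blocking anything.
        return "flag"
    return "accept"
```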

Bots: A Reject Inference Problem

Reject inference is a contentious topic, puzzling data scientists and statisticians ever since the first credit scorecard was invented in the 1950s. The challenge is that for any type of calculation or algorithm, when an event is declined you cannot know with any certainty whether it was truly bad. Statisticians attempt to resolve this by using models to infer mathematically what would have happened, but the result is still an educated guess. Bot vendors face the same issue: if you block an event upfront, how do you know it was indeed a bot?

Many bot solutions claim to operate at an incredibly low false positive rate, as low as 0.01%. As the saying popularly attributed to Benjamin Disraeli warns, there are ‘lies, damned lies and statistics’, so we must be cautious. What is really being measured when false positive stats are presented? One measure could be the percentage of transactions that fail to solve a CAPTCHA. This is not a good measure, because a transaction can fail for reasons other than being a bot, meaning the true false positive rate may be higher than recorded:

The CAPTCHA did not load due to performance or availability issues. How do you measure the performance of a third-party CAPTCHA? It is difficult unless you have your own bots to test performance.
When the CAPTCHA is loaded as a third-party widget, the user’s adblocker may block it. As many as 43%* of internet users worldwide use adblockers, which means that using CAPTCHA is guaranteed to block some genuine users.
If the user falls back to a non-JavaScript CAPTCHA, e.g. audio, abandonment rates are even higher. Again, these users are incorrectly classified as bad by the bot solution.
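The gap between a recorded and a realistic false positive rate can be sketched by splitting challenge failures by cause. The categories mirror the list above; the counts in the usage example are hypothetical, not measured data. The naive measure counts only sessions that solved the CAPTCHA as humans wrongly challenged, while the upper bound also counts failures that say nothing about whether the session was a bot.

```python
from dataclasses import dataclass


@dataclass
class ChallengeOutcomes:
    """Hypothetical counts for sessions that were shown a CAPTCHA."""

    challenged: int
    solved: int              # passed: human (or CAPTCHA farm) wrongly challenged
    failed_to_load: int      # widget never rendered (performance/availability)
    blocked_by_adblock: int  # third-party widget blocked client-side
    abandoned: int           # gave up, e.g. after falling back to audio
    failed_attempt: int      # attempted the challenge and got it wrong

    def __post_init__(self):
        parts = (self.solved + self.failed_to_load + self.blocked_by_adblock
                 + self.abandoned + self.failed_attempt)
        if parts != self.challenged:
            raise ValueError("outcome counts must add up to `challenged`")


def naive_fp_rate(o: ChallengeOutcomes) -> float:
    """Every failure is assumed to be a bot."""
    return o.solved / o.challenged


def fp_rate_upper_bound(o: ChallengeOutcomes) -> float:
    """Failures unrelated to solving ability may also be genuine users."""
    ambiguous = o.failed_to_load + o.blocked_by_adblock + o.abandoned
    return (o.solved + ambiguous) / o.challenged
```

With hypothetical counts of 1,000 challenges, 50 solved, 20 load failures, 30 adblocker blocks and 40 abandonments, the naive figure is 5% of challenged sessions while the upper bound is 14%. The true value sits somewhere in between, and without further tagging there is no way to tell where.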

A common answer to the reject inference conundrum is A/B testing, but this still presents challenges, depending on the assumptions being made:

Is your proposed segmentation truly representative?
Are you tagging and following these segments correctly?
Are you prepared to expose your business to some damage by letting bad bot traffic through?
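One common way to make an A/B segment both representative and trackable is deterministic hash-based assignment: the same unit always lands in the same group, so it can be followed across its journey. In this sketch the salt, the 10,000-bucket resolution and the group names are assumptions. "control" means no mitigation is applied (the deliberate exposure the last question refers to) and "treatment" means the bot defence runs as normal.

```python
import hashlib


def assign_group(unit_id: str, holdout_pct: float, salt: str = "bot-holdout-v1") -> str:
    """Deterministically assign a unit (session, customer, IP...) to a group.

    Hypothetical sketch: changing `salt` reshuffles all assignments, which is
    useful when starting a new experiment.
    """
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # roughly uniform in 0..9999
    return "control" if bucket < holdout_pct * 10_000 else "treatment"
```

Hashing rather than random sampling at request time keeps assignment stable without storing it, but the choice of `unit_id` matters: assigning by session while measuring by customer, for example, lets one customer appear in both groups.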

Another important factor in A/B testing is how bad definitions are tagged:

At what level is the bad definition tagged? It could be at the customer, IP address, card, or session level.
Is the bad data clearly labelled so that you can trace back and mitigate appropriately?
Is call-center feedback leveraged, so that customer complaints are logged? Even then, the customers who complain are very likely a small subset of those who were actually blocked.
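The level at which a bad label is recorded changes what it covers. In this hypothetical sketch, the event fields and example values are invented for illustration: the same confirmed-bad label tagged at the IP level sweeps in every session behind that address (for example, a shared office network), whereas a session-level tag covers only one.

```python
from dataclasses import dataclass

LEVELS = ("customer", "ip", "card", "session")


@dataclass(frozen=True)
class Event:
    session: str
    ip: str
    card: str
    customer: str


def tag_bad(events, level, bad_keys):
    """Return the events covered by bad labels recorded at the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    return [e for e in events if getattr(e, level) in bad_keys]


events = [
    Event("s1", "10.0.0.1", "c1", "u1"),
    Event("s2", "10.0.0.1", "c2", "u2"),  # different customer, same IP
    Event("s3", "10.0.0.2", "c3", "u3"),
]
```

Here `tag_bad(events, "session", {"s1"})` marks one event, while `tag_bad(events, "ip", {"10.0.0.1"})` marks two, including customer `u2`, who may be entirely genuine.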

Challenge and subsequent abandonment rates are also often hidden by false positive claims. If we estimate that 1 in 5 customers abandon the transaction when challenged, and the challenge rate is 10-20%, that could mean an abandonment rate of 2-4% of good users. This represents a potentially huge amount of lost revenue.
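The arithmetic above is simply the challenge rate multiplied by the probability of abandoning once challenged. This sketch assumes, as the estimate does, that good users are challenged at the same rate as overall traffic.

```python
def abandonment_rate(challenge_rate: float, abandon_given_challenge: float) -> float:
    """Share of good users lost, assuming they are challenged at `challenge_rate`."""
    return challenge_rate * abandon_given_challenge


for challenge_rate in (0.10, 0.20):
    lost = abandonment_rate(challenge_rate, abandon_given_challenge=1 / 5)
    print(f"challenge rate {challenge_rate:.0%} -> {lost:.0%} of good users abandon")
# challenge rate 10% -> 2% of good users abandon
# challenge rate 20% -> 4% of good users abandon
```

Multiplying the result by your own good-user volume and average transaction value turns it into a revenue estimate.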

What’s in the Box?

With the change in culture around transparency, we have seen fraud prevention move away from black-box vendor models towards explainable scores with context. Credit risk, for obvious reasons, has also moved towards transparency. This raises the question: why allow opaque security infrastructure, sitting at the top of the funnel, to materially impact the widest range of customers? Bot vendors are still lagging when it comes to explainable detection logic. Consider what happens if a C-level executive of an organisation gets blocked because of their device configuration. The impact of security protocols on customer experience is immediately and heavily scrutinized, and the security team would need to work closely with the vendor to unpick the score, which might take weeks.
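An explainable score returns its reasons alongside the number. In this minimal sketch the signal names, weights and descriptions are hypothetical, chosen only to show the pattern: each contributing signal is recorded as a reason code, so a blocked executive's case can be unpicked by reading the decision rather than waiting on a vendor.

```python
from dataclasses import dataclass, field

# Hypothetical signals and weights, for illustration only.
RULES = [
    ("headless_browser", 40, "Browser reports headless automation flags"),
    ("ip_velocity_high", 25, "Unusually high request rate from this IP"),
    ("new_device", 10, "Device not previously seen for this customer"),
    ("datacenter_ip", 20, "IP belongs to a hosting/datacenter range"),
]


@dataclass
class Decision:
    score: int
    reasons: list = field(default_factory=list)  # (signal, weight, description)


def score_session(signals: dict) -> Decision:
    """Sum the weights of the signals present and record why each one fired."""
    decision = Decision(score=0)
    for name, weight, description in RULES:
        if signals.get(name):
            decision.score += weight
            decision.reasons.append((name, weight, description))
    return decision
```

For example, `score_session({"new_device": True, "datacenter_ip": True})` scores 30 and shows that a new laptop on a corporate VPN, not automation, drove the decision.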

The answer is that bot detection should always be an extension of the fraud model. It is not an island. Models and strategies, both manual and automated, should therefore span the whole customer journey. This gives security and fraud managers control over risk appetite, particularly around challenge rates and false positives.
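Controlling risk appetite can be as concrete as choosing the score threshold from a target challenge rate rather than accepting a vendor default. This sketch assumes sessions scoring at or above the threshold are challenged; it picks the lowest threshold whose challenge rate stays within the target, stepping over tied scores so the budget is never exceeded.

```python
from collections import Counter


def threshold_for_challenge_rate(scores, target_rate):
    """Lowest threshold such that the share of scores >= threshold <= target_rate."""
    if not 0.0 <= target_rate <= 1.0:
        raise ValueError("target_rate must be between 0 and 1")
    budget = int(target_rate * len(scores))  # max sessions we accept challenging
    best = float("inf")                      # default: challenge nobody
    challenged = 0
    # Walk distinct score values from highest to lowest, adding whole tie groups.
    for value, count in sorted(Counter(scores).items(), reverse=True):
        challenged += count
        if challenged > budget:
            break
        best = value
    return best


def challenge_rate(scores, threshold):
    """Share of sessions that would be challenged at this threshold."""
    return sum(s >= threshold for s in scores) / len(scores)
```

Run against recent score distributions, this lets fraud managers state "challenge at most 2% of traffic" and see the threshold that implies, instead of inferring the challenge rate after the fact.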

Models on Shifting Sand: How to Stay Ahead of Hackers & Fraudsters

All fraud and security solutions have a half-life: models degrade and solutions get bypassed. All classifications rely on signals. Bot authors are fantastic game theorists: they probe for weaknesses through trial and error, or reverse engineer detection systems. Their reputation matters, and they will work hard to uphold the claims they make when selling their services on the dark web and through other nefarious channels.

Unless detection methods are dynamic or new data is added, detection effectiveness degrades over time. Building models for fraud and security is like building on ever-shifting sand, because bad definitions change very quickly. Bots don’t even need to evade detection completely; they just need to mimic good users closely enough that blocking them would push false positives to unacceptable levels, allowing them to remain hidden.
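One widely used way to notice a model sinking into the sand is to monitor drift in its inputs or scores. The sketch below computes the population stability index (PSI) between a baseline sample and a recent sample; the bin edges and the 1e-6 floor for empty bins are assumptions. A commonly cited rule of thumb from credit risk treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift, and above 0.25 as a significant shift worth investigating.

```python
import bisect
import math


def population_stability_index(expected, actual, bins):
    """PSI between a baseline sample and a recent sample of a score or signal.

    `bins` is a sorted list of bin edges; values beyond the edges fall into
    the first or last bin.
    """

    def proportions(values):
        counts = [0] * (len(bins) + 1)
        for v in values:
            counts[bisect.bisect_right(bins, v)] += 1
        total = len(values)
        # Small floor avoids log(0) and division by zero for empty bins.
        return [max(c / total, 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

Drift is not proof of evasion, but a rising PSI on a key signal is a cheap early warning that a model's half-life is running out.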

The right way forward is to make security teams more adaptable, so that during any attack they can respond instantly and deploy a mitigation strategy quickly. Defences should be built by leveraging existing data, testing new hypotheses, and developing mitigation strategies. This type of adversarial learning is exactly how bad bot providers operate; we need to beat them at their own game.
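A common pattern for testing hypotheses quickly without risking false positives is shadow mode: a new rule is registered at runtime, its hits are counted but not acted on, and it is promoted to enforcement once it looks sound. The class names, modes and the example rule below are hypothetical; this is a sketch of the pattern, not a specific product's rule engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    predicate: Callable[[dict], bool]
    mode: str = "shadow"  # "shadow": count hits only; "enforce": apply mitigation


class RuleEngine:
    def __init__(self):
        self.rules = {}
        self.shadow_hits = {}

    def register(self, rule: Rule):
        # Rules can be added or replaced at runtime, without a redeploy.
        self.rules[rule.name] = rule
        self.shadow_hits.setdefault(rule.name, 0)

    def promote(self, name: str):
        self.rules[name].mode = "enforce"

    def evaluate(self, event: dict) -> str:
        action = "allow"
        for rule in self.rules.values():
            if rule.predicate(event):
                if rule.mode == "enforce":
                    action = "challenge"
                else:
                    self.shadow_hits[rule.name] += 1  # evidence for the hypothesis
        return action
```

For instance, a hypothesis that `python-requests` user agents are scripted login attempts can run in shadow mode during an attack, be checked against labelled outcomes, and be promoted within minutes if it holds up.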

Can you Handle the Truth?

The truth is that the bot problem needs to be flipped on its head. You don’t have a bot problem; you have a valuable user identification problem. When deploying CAPTCHA, you are not actually determining intent, as a fraud system would. CAPTCHA farms, where humans are paid to solve challenges, lock you into a game of one-upmanship in the wrong direction. This leads to the following questions:

Is the speed of solving a CAPTCHA genuinely indicative of bot behavior, or could it simply be an internet-savvy user?
Is increasing cognitive load and CAPTCHA complexity just adding an arbitrary delay?
Is deploying a credential stuffing solution suitable when fraudsters use humans, rather than bots, to work the high-value accounts?

Finding the truth behind bot attackers will be key preparation for the next decade. Automation and AI are still relatively young, and we will likely see exponential growth in their complexity, akin to Moore’s law. With any bot defence model come issues around bad definitions: how do you know you made the right decision in the first place? The next generation of systems will need to be more than a simple gateway into a company’s estate. They will need to carefully manage and tag the entire customer journey to understand how a bot is truly operating. Learn, mitigate, and repeat.

About Darwinium

Darwinium is a Digital Risk AI solution built to be more agile than the adversaries attacking it. Future-proofed to protect against tomorrow’s risks, today. In short, Darwinium is digital risk transformed, web security enhanced and customer experience optimized, giving businesses control over every type of bot.

*https://backlinko.com/ad-blockers-users