In parts 1 and 2, we covered what causes and constitutes fraud and the measurements that ought to be pursued. Here we cover a few important data science and computation techniques used to catch fraud, ways to prevent fraud, and policies that could help reduce large-scale e-commerce fraud. While some terms may seem complex, the reader is encouraged to click on the links in the article for better comprehension.
Some data science techniques used in fraud detection
The data science techniques employed to catch or prevent the common frauds are really very exciting. Data analysis, deep data exploration, computation and monitoring all come into play. Most of the time a two-pronged attack is taken:
- Identify suspicious activity using aggregate behaviour
- Drill down to isolate and identify the specific culprits
Aggregate detection generally means spotting a spike along a given dimension, be it a region, the sales velocity of a supplier, the sales velocity of a product type, the number of new users, the number of cancellations and so on.
Once an aggregate behaviour is deemed “suspicious”, largely using historical data and probabilistic measures, further drill-down techniques can be applied to smaller data sets, thereby giving accurate results faster.
Looking at aggregate behaviour is much like localising a problem before deeper analysis. Typically, exploitation of loopholes spreads locally before becoming common knowledge. Growth spurts in various metrics are monitored to identify any anomalous behaviour.
Spike detection is an aggregate, bird’s-eye-view technique for isolating suspicious areas where malpractice or fraud of some sort could be happening. It uses prior probabilities on historical data or moving-average measures.
Some problems where spike detection is used are:
- Sales velocity for a region (city or pincode)
- Sales velocity for a supplier
- Sales velocity of a product
- Acquisition of new customers for a region (city or pincode)
Statistical techniques like an ‘n-day moving average’ with percentage limits and a Poisson process for arrival rates have been used to model systems and identify a spike. Spike detection is used to identify fraud in many of the sections below.
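The n-day moving average with a percentage limit and the Poisson arrival-rate check can be sketched as follows. This is a minimal illustration, not the implementation used by any particular company: the function names, the 7-day window, the 50% limit and the sample order counts are all assumptions made for the example.

```python
from math import exp


def moving_average_spikes(counts, window=7, pct_limit=0.5):
    """Return indices of days whose count exceeds the trailing
    `window`-day average by more than `pct_limit` (0.5 means 50%)."""
    spikes = []
    for i in range(window, len(counts)):
        baseline = sum(counts[i - window:i]) / window
        if baseline > 0 and counts[i] > baseline * (1 + pct_limit):
            spikes.append(i)
    return spikes


def poisson_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam): how likely it is to see k or more
    arrivals when the historical average is lam.
    Computed as 1 - CDF, so extremely small tails lose precision."""
    if k <= 0:
        return 1.0
    term = exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


# Hypothetical daily order counts for one pincode.
daily_orders = [100, 104, 98, 101, 97, 103, 99, 102, 180, 100]
suspicious_days = moving_average_spikes(daily_orders)
chance_of_180 = poisson_tail(180, 100.0)
```

With these made-up numbers only the day with 180 orders is flagged, and the Poisson tail probability of seeing 180 or more orders against an average of 100 is tiny. In practice the window and limit would be tuned per metric and per region.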
Sustained periodic increase
A linear day-on-day rise in sales or certain adoptions could also be used to identify possible exploitation. The really smart fraudster may stay under the radar and exploit an e-commerce company’s loopholes slowly rather than all at once. A sustained increase in a metric for a week or more could be used as a strong case for suspicion.
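One simple way to check for a sustained rise is to count the longest streak of consecutive day-on-day increases. The sketch below is only an assumption about how such a check might look; the function names, the 7-day threshold and the sample figures are hypothetical.

```python
def longest_rise_streak(values):
    """Return the largest number of consecutive day-on-day increases."""
    best = streak = 0
    for prev, cur in zip(values, values[1:]):
        streak = streak + 1 if cur > prev else 0
        best = max(best, streak)
    return best


def has_sustained_increase(values, min_days=7):
    """True if the metric rose every day for at least `min_days` days."""
    return longest_rise_streak(values) >= min_days


# Hypothetical daily sales of one marked-down product.
slow_climb = [50, 52, 53, 55, 58, 60, 61, 63, 62]
normal_noise = [50, 52, 51, 53, 52, 54, 53]
```

Real data is noisy, so a production check would likely tolerate an occasional dip or fit a trend line instead of demanding a strict rise every single day.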
Drilling Down techniques
Drill-down methods are generally considered computationally expensive and are sometimes prohibitive to run for every case. Such techniques could hamper the overall experience of the e-commerce user unless backed by massive computing infrastructure. Discussed below are techniques used to identify actual wrongdoers via patterns, for the cases discussed in part 2 on some of the common e-commerce frauds. These techniques are ever-evolving, much like those seen in the antivirus and internet security domains.
Identifying “Buyer is the Seller”
Many markers exist to catch sellers exploiting supply-side incentives in various industries, two of which are:
- The customer app is often too close to the driver, in the case of taxi-on-demand companies
- Too many sales of marked-down items are concentrated in the same city as the supplier, in the case of retail e-commerce
A simple count of repeat offenders can be gathered, and such suppliers can be held accountable or even fired. Rank Order Clustering, Cosine Similarity and Biclustering are some of the techniques, several of them drawn from operations research, that have been used to identify such fraud with high probability.
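Of the techniques named above, cosine similarity is the easiest to illustrate. The sketch below compares how two buyer accounts spread their purchases across suppliers; the account data, supplier names and function name are hypothetical, and this is only one assumed way of applying the measure.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts.
    Returns 0.0 if either vector is empty or all zeros."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical purchase counts per supplier for three buyer accounts.
buyer_x = {"supplier_1": 9, "supplier_2": 1}
buyer_y = {"supplier_1": 18, "supplier_2": 2}
buyer_z = {"supplier_3": 5}

similar = cosine_similarity(buyer_x, buyer_y)    # same buying pattern
unrelated = cosine_similarity(buyer_x, buyer_z)  # no suppliers in common
```

Accounts whose purchase vectors point in almost the same direction, especially towards a single supplier in the same city, are candidates for a closer look rather than proof of fraud.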
In one cab company, a cab driver doing 18 trips in 12 hours of duty was enough to raise suspicion, which led to further pattern recognition on the kind of bookings he had done. The driver was booking trips for himself via a bug in the booking system and kept getting trips assigned to him. Binomial testing was used extensively, first to compute the probability of such a high number of bookings and so raise suspicion on the driver (typically done via late-night batch processing), and then again to test the likelihood of the kind of bookings he was making. The probabilities were too small to be mere chance, and the driver was reprimanded.
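A one-sided binomial test of this kind can be sketched with the standard library alone. The article does not give the figures that were actually used, so the number of bookings in the zone and the per-driver assignment probability below are purely hypothetical; only the 18 trips come from the story.

```python
from math import comb


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more
    successes in n independent trials with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical setting: 200 bookings in the driver's zone during his shift,
# and roughly 50 comparable drivers, so a fair share is about 2% each.
bookings_in_zone = 200
fair_share = 0.02
trips_done = 18  # from the example in the text

p_value = binomial_tail(trips_done, bookings_in_zone, fair_share)
is_suspicious = p_value < 1e-4  # assumed threshold for the nightly batch
```

Under these assumed numbers the expected count is about 4 trips, so 18 trips has a vanishingly small probability of occurring by chance. The same function can be reused for the second test on the type of booking, with `p` set to the historical share of that booking type.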
Techniques to prevent or identify Multiple Identity
Digital fingerprinting of the user via the user agent, browser cookies and similar signals is a common technique that can quickly identify multiple users signing in from the same machine within a short duration. Facebook and Google have been pioneers in this field, and requiring customers to log into the platform using only their Facebook or Google accounts could be a very good start. Suspicion of individual wrongdoers begins with a probabilistic measure: binomial testing with known priors can be used to identify some of this fraud. Similarity measures can also increase the likelihood of catching fraudsters who try to create multiple identities in a system. The Luhn algorithm, which checks the correctness of an IMEI number, has also been used extensively. It takes only a few Indian rupees to change a phone’s IMEI number. If the incentive is larger than the cost of changing the IMEI number, expect fraud.
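The Luhn check on an IMEI is short enough to show in full. The function names below are our own, and the sample IMEI is a commonly published example number, not one taken from any real fraud case.

```python
def luhn_valid(number):
    """Return True if a string of ASCII digits passes the Luhn check."""
    if not (number.isascii() and number.isdigit()):
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as adding the two digits of the product
        total += d
    return total % 10 == 0


def imei_looks_valid(imei):
    """An IMEI is 15 digits, the last being a Luhn check digit."""
    return len(imei) == 15 and luhn_valid(imei)


ok = imei_looks_valid("490154203237518")   # well-formed example IMEI
bad = imei_looks_valid("490154203237517")  # check digit altered
```

Passing the check only shows that the number is well formed. A spoofed IMEI that was generated with a correct check digit will pass too, so this is a first filter and not a proof of identity.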
Techniques to identify Bulk Buying
Imagine a bunch of users in a pin code buying the same item on a deal but all shipping it to the same address. Let’s also say that these addresses are similar but deliberately not written in the same way. In such cases we rely on similarity measures for string comparison. Traditional methods have relied on edit-distance measures, such as distances based on the Longest Common Subsequence and the Levenshtein distance used in edit similarity. However, these are less efficient than some innovative techniques that use the Hamming distance on character combinations of two addresses. Those techniques are beyond the scope of this article. The curious reader can find more information on the internet.
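To make the two families of measures concrete, here is a small sketch. The Levenshtein function is the standard dynamic-programming version. The second measure is only one possible reading of “Hamming distance on character combinations”: it treats each address as a presence vector over character pairs (bigrams) and counts the positions where the two vectors differ. The article does not specify the actual method, so that interpretation, the normalisation step and the sample addresses are all assumptions.

```python
import re


def normalize(address):
    """Lowercase and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", address.lower())


def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def bigrams(text):
    """Set of adjacent character pairs in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def bigram_hamming(a, b):
    """Hamming distance between bigram presence vectors, which equals
    the number of bigrams found in exactly one of the two strings."""
    return len(bigrams(a) ^ bigrams(b))


# Hypothetical addresses written differently on purpose.
addr_1 = normalize("Flat 12-B, MG Road, Pune 411001")
addr_2 = normalize("flat no 12B M.G. Road Pune-411001")
edit_gap = levenshtein(addr_1, addr_2)
bigram_gap = bigram_hamming(addr_1, addr_2)
```

The set-based comparison costs time roughly linear in the address lengths, while the edit distance is quadratic, which is one plausible reason the former scales better when every address in a pin code has to be compared against every other.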
Techniques to identify and prevent stock-out of a supplier
It may be really hard to identify the real culprit, a supplier who is trying to stock out a competitor, but again a spike in sales can be used to identify potential stock-outs. Since it can be hard to identify a coordinated attack, it is best to put preventive measures in place, such as turning off Cash-On-Delivery for a duration, as a deterrent.
Suggestions for better business
Discussed below are some suggestions that could make the startup space truly beneficial to all players and especially the entrepreneur.
When should you start paying attention to fraud in your startup
In the very early stages of an online or mobile company, say up to ‘N’ TPD (transactions per day), incentives are a must to get the word out and get the desired adoption; as a result, it may be hard to identify fraud. One would resort to behavioural analysis and hours of brainstorming with domain-knowledge-rich SMEs (subject matter experts) to put some basic checks and balances in place to prevent fraud, but that is never going to be enough once you hit a much higher ‘M’ TPD.
Vigilance is the key in many situations. Cheat me once, shame on you. Cheat me again, shame on me! Many detectives and criminal psychologists will tell you that most thieves find it hard to stay under the radar because of greed. They tend to over-exploit the loopholes known to them to “make money while the sun shines” and also to make their actions worth the risk.
Working with your competitor for a better business environment
It may be a good idea to work with competitors and investors by being transparent about risks and their solutions. If fraud eats all companies, investor confidence will drop. It’s best that “devils unite for a better place for doing business” so that competition is fair.
Working with local and national lawmakers to bring fraudsters to book
It is really important for India’s lawmakers to come to terms with e-commerce fraud if India is to realise its goal of digital penetration. Companies need to work with lawmakers and raise their concerns.
Growing up in India in the 80s and 90s, one recalls how much bureaucracy one had to go through before even getting a telephone connection. People were harassed with a thousand questions and buried in paperwork before getting anything. The concept then seemed to be “You are a suspect until you prove your intent not to do fraud”. As we move towards easier access to amenities and a convenient marketplace, lawmakers need to put strict measures in place as a strong deterrent, so that e-commerce fraud is well defined and the financial punishment is prohibitively severe.
Who is equipped to tackle E-commerce fraud
Core data scientists, pure applied statisticians, machine learning experts, computer scientists, big data engineers, product management, legal teams and engineering folk with deep knowledge of the plethora of open source offerings from the Apache foundation are all extremely important. The right blend of people and skills is a must.
E-commerce fraud is everyone’s problem
At a company level, synergy between Product, Data Science, Data Analysts, Business and Engineering is a must for adequately controlling fraud in e-commerce. The problem cannot be brushed aside or understated.
As a country, the Ministry of Commerce and the Finance Ministry need to step in and back companies with strict penalties for consumers and suppliers alike who “game” the system. This is very important for India to be a good destination for investment and for companies to thrive and bring value. Rules for founders of startups who offload their shares for additional funding before a certain minimum period, along with strict valuation measures, need to come in to prevent the thorough destruction of “Brand India”. In tough economic times, profit margins and positive Gross Merchandise Volume may be more relevant to investors than growth in transactions alone. Vigilance is needed along with being highly “data driven”. In a country where “innocent until proven guilty” can take decades to conclude, businesses should be allowed to “refuse to do business if suspicious” without being unfairly held accountable.
1. Text in Blue points to additional data on the topic.