ANOMALY DETECTION

Spotting an anomaly depends on the ability to define what is normal.

What is Anomaly Detection?


Anomalies are data points that do not follow the same pattern or share the same structure as the rest of the data. An anomaly is a deviation from an established normal pattern: anomaly points are generated by a different process than the nominal points. Anomaly detection is one of the most widely used machine learning techniques today, and it can have a big impact on our daily lives.

How can it benefit my business?


Suppose an e-commerce site has a pricing glitch: someone mistakenly enters a price of Rs 1500 instead of Rs 2500 for a product. Typos like this happen all the time. They are great for consumers, who can buy an expensive product for less money, but not for the e-commerce site. Our company reveals this type of problem quickly: an outlier data point indicates an unexpected spike in the product's sales, and deeper inspection can identify the pricing error.

Steps involved

  • Supervised
    Training data is labeled as “nominal” or “anomaly”.
  • Cleaning
    Data is cleaned to differentiate between normal points and the others.
  • Unsupervised
    The algorithm is given a free hand to find patterns that are not normal.

TYPES OF OUTLIERS:


  1. Global Outliers
  2. Contextual Outliers
  3. Collective Outliers

Global Outliers
A data point or group of points is considered a global outlier if its values are far outside everything else in the dataset. For example, think about the Zoom application at the start of the pandemic: within a matter of days, the number of people using Zoom spiked dramatically. Compared with the pre-COVID user base, those numbers were a global outlier.
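
As a rough illustration of the idea, the sketch below flags global outliers with a median-based (modified) z-score. The function name `global_outliers`, the threshold of 3.5, and the `daily_users` figures are assumptions made up for this example, not real usage data or the only way to do it.

```python
from statistics import median

def global_outliers(values, threshold=3.5):
    """Return the indices of values that lie far from the bulk of the data.

    Uses the modified z-score, which is built on the median and the
    median absolute deviation (MAD), so one huge value cannot distort
    the baseline it is compared against.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against; nothing is flagged
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical daily user counts (in thousands); the last day is a spike.
daily_users = [10, 11, 9, 10, 12, 11, 10, 300]
spikes = global_outliers(daily_users)
print(spikes)
```

Here only the final value is flagged, because it sits far outside every other point in the list.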

Contextual Outliers
A data point is considered a contextual outlier if its value deviates substantially from the other data points in the same context. The same value may not be considered an outlier if it occurs in a different context. For example, consider a sudden surge in order volume at an e-commerce company in the middle of the night. It is a contextual outlier because you would not expect such high volume outside daytime.
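
One simple way to sketch this is to compare each value only with past values from the same context, here the hour of the day. The `history` numbers, the function name `is_contextual_outlier`, and the threshold of 3 standard deviations are illustrative assumptions.

```python
from statistics import mean, stdev

def is_contextual_outlier(history, hour, value, threshold=3.0):
    """Judge a value against past observations from the same hour only."""
    baseline = history[hour]
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

# Hypothetical order counts seen on previous days, keyed by hour of day.
history = {
    3: [4, 6, 5, 5, 4, 6],               # 3 a.m. is normally quiet
    14: [110, 125, 118, 122, 130, 115],  # 2 p.m. is normally busy
}

night_surge = is_contextual_outlier(history, 3, 120)   # 120 orders at 3 a.m.
daytime = is_contextual_outlier(history, 14, 120)      # 120 orders at 2 p.m.
print(night_surge, daytime)
```

The same order count of 120 is flagged at 3 a.m. but not at 2 p.m., which is what makes the outlier contextual.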

Collective Outliers
We see collective outliers when a group of data points within a large dataset is significantly different from the rest of the dataset, even though each data point on its own would not be considered anomalous in either a contextual or a global sense. For example, imagine you are running an ad campaign. As your budget increases, you normally see an increase in both impressions and ad clicks. Suppose you increase your budget and the number of impressions goes up, but the number of clicks goes down. Neither the increase in impressions nor the drop in clicks is anomalous on its own, but when they happen together, it suggests an issue with your campaign.
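
The ad campaign example can be sketched by checking the relationship between the two metrics rather than each metric alone. In this minimal sketch the click-through rate (clicks divided by impressions) stands in for that relationship; the campaign figures and the helper names `in_range` and `ctr_zscore` are hypothetical.

```python
from statistics import mean, stdev

def in_range(value, seen):
    """True if the value falls inside the range observed so far."""
    return min(seen) <= value <= max(seen)

def ctr_zscore(history, point):
    """Z-score of a point's click-through rate against past rates."""
    rates = [clicks / impressions for impressions, clicks in history]
    mu, sigma = mean(rates), stdev(rates)
    impressions, clicks = point
    return (clicks / impressions - mu) / sigma

# Hypothetical (impressions, clicks) pairs from earlier budget levels.
history = [(1000, 50), (1200, 61), (1500, 74),
           (1800, 91), (2000, 99), (2200, 111)]
latest = (2100, 55)  # impressions are high, clicks are low

impressions_ok = in_range(latest[0], [i for i, _ in history])
clicks_ok = in_range(latest[1], [c for _, c in history])
z = ctr_zscore(history, latest)
print(impressions_ok, clicks_ok, round(z, 1))
```

Both numbers fall inside their usual ranges, yet the click-through rate is many standard deviations below normal, so only the combination reveals the problem.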


Well-Defined Anomaly Distribution Assumption

Under this assumption, the anomalies are drawn from a well-defined probability distribution, for example repeated instances of a known machine failure. Relying on a well-defined anomaly distribution is often risky: it can break down in adversarial situations (fraud, insider threats, cybersecurity) and when the user’s notion of “anomaly” changes with time (e.g., anomaly == “interesting point”).

Time Series Data:

In business, time-series data represent measurements of key performance indicators collected over time. Detecting anomalies in time-series data is especially valuable because anomalies often point to areas that need attention. For example, take an online store: an anomalous drop in purchases of a particular product may mean that the product has run out of stock. If this anomaly goes undetected, the store loses potential revenue, and some customers may never return.
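
A minimal sketch of how such a drop could be caught: compare each new measurement with a trailing window of recent values. The window length of 7, the threshold of 3, the function name `rolling_anomalies`, and the `purchases` series are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def rolling_anomalies(series, window=7, threshold=3.0):
    """Return the positions whose value deviates strongly from the
    mean of the preceding `window` observations."""
    flagged = []
    for t in range(window, len(series)):
        past = series[t - window:t]
        mu, sigma = mean(past), stdev(past)
        if sigma > 0 and abs(series[t] - mu) / sigma > threshold:
            flagged.append(t)
    return flagged

# Hypothetical daily purchases of one product; day 9 looks like a stock-out.
purchases = [52, 48, 50, 51, 49, 53, 50, 51, 49, 3, 50, 52]
drops = rolling_anomalies(purchases)
print(drops)
```

Only day 9 is flagged. The days that follow are compared against a window that now contains the drop, which widens the spread, so a production system would likely need a more robust baseline than this sketch uses.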

A great example of a business anomaly happened during the onset of the COVID-19 pandemic. As millions of people began working from home and started video conferencing, they created an anomalous spike in bandwidth consumption across residential areas. If telcos did not respond, their quality of service would have suffered. In many cases, it did.

Applications:

  1. Credit Card Fraud Detection
  2. Telecommunication Fraud Detection
  3. Network Intrusion Detection
  4. Fault Detection

Get in touch with us for more details about our products and services.