
Co-author: Khin Radanar Pyae Phyo
If you rely on traditional business intelligence (BI) tools to track anomalies, that is, abnormal situations in business trends, I suggest adding artificial intelligence (AI) anomaly detection: it helps you resolve issues faster, increases efficiency and drives performance. AI can process far more data, far faster, than a human analyst can inspect by hand, and it can provide the incremental improvements that take BI tools to the next level. AI-based solutions are also becoming more approachable for users with no technical background, because much of the functionality is already available in open-source AI libraries and packages. Domain experts' knowledge is still necessary to get the best out of the model.
Why do we need AI-based anomaly detection? Because you have a lot of data, you do not always know what is happening in it, and it is difficult to plot a large number of metrics to find the outliers. With traditional BI tools it can take weeks to track down anomalies, whereas real-life business sectors require immediate answers.
According to a global anomaly detection market analysis report, the anomaly detection market is expected to register a compound annual growth rate (CAGR) of 15.3% and surpass USD 5 billion in market size by the end of the 2020-2024 forecast period.
Figure 1: Global future revenue rate according to MRFR market analysis
Nowadays, AI-driven anomaly detection systems are widely used in many fields, including healthcare, network security, gaming and stock markets. So do not wait to adopt AI-based anomaly detection to tackle your business obstacles. Our Nexidea AI Research Team can help you predict and detect abnormal data points in your business data by applying machine learning and deep learning models.
What is Anomaly Detection?
Before talking about anomaly detection, I would like to ask: have you ever heard of outliers? Outliers, also known as anomalies, are observations or sequences of observations that deviate from the range and distribution of attribute values in the normal input data. They usually make up a very small part of the dataset. The figure below shows red points (outliers) lying apart from the normal data.
Figure 2: Outliers vs. normal data
Anomaly detection is the task of separating such abnormal data from the normal data. Outliers can be spotted with visualization techniques such as scatter plots, histograms and box plots. However, real-life problems may involve hundreds of variables, far too many to plot and inspect one by one. For this reason, AI-based anomaly detection models are the best aid for handling enormous amounts of data.
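To make the box plot idea concrete, here is a minimal sketch of the rule a box plot uses to draw its outlier points: anything more than 1.5 times the interquartile range (IQR) beyond the quartiles. The function name `iqr_outliers` and the `daily_sales` figures are made up for illustration, and the sketch assumes a single numeric variable.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical daily sales figures with one obviously unusual day
daily_sales = [102, 98, 105, 110, 95, 101, 99, 104, 300, 97]
print(iqr_outliers(daily_sales))  # [300]
```

The rule works well for one variable at a time, which is exactly why it becomes impractical when a dataset has hundreds of them.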
It is important to know which kind of anomaly appears in your business data. Three kinds are commonly found: point anomalies, contextual anomalies and collective anomalies.
(i) Point Anomalies
An individual data instance is a point anomaly if it deviates significantly from the rest of the dataset. In credit card fraud detection, for instance, a single transaction with an unusually large amount is a point anomaly.
Figure 3: Point anomalies example
(ii) Contextual Anomalies
An individual data instance is a contextual anomaly (also known as a conditional anomaly) if it is anomalous only within a specific context. Power consumption is a good example, because it has context-based, time-related relationships: the power consumption of an office building is expected to be much higher at midday on a workday than at night on a weekend. A reading that is normal at midday on a workday would therefore be anomalous at night on a weekend.
Figure 4: Contextual anomalies example
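One simple way to act on context is to compare each reading only with other readings from the same context. The sketch below does this with the median and the median absolute deviation (MAD), which stay stable even when the anomaly itself is in the data. The context labels, the readings and the cut-off `k=5.0` are illustrative assumptions, not values from our projects.

```python
import statistics
from collections import defaultdict

def contextual_anomalies(readings, k=5.0):
    """readings: list of (context, value) pairs.
    Flag a value when it lies more than k MADs from the median
    of the other values that share its context."""
    by_context = defaultdict(list)
    for context, value in readings:
        by_context[context].append(value)

    # Per-context median and median absolute deviation (MAD)
    stats = {}
    for context, values in by_context.items():
        median = statistics.median(values)
        mad = statistics.median([abs(v - median) for v in values])
        stats[context] = (median, mad)

    flagged = []
    for context, value in readings:
        median, mad = stats[context]
        # mad == 0 means the context has no spread to compare against
        if mad > 0 and abs(value - median) > k * mad:
            flagged.append((context, value))
    return flagged

# Hypothetical office power readings in kW
readings = [
    ("workday_midday", 120), ("workday_midday", 125), ("workday_midday", 118),
    ("workday_midday", 122), ("workday_midday", 121),
    ("weekend_night", 20), ("weekend_night", 22), ("weekend_night", 19),
    ("weekend_night", 21), ("weekend_night", 118),
]
print(contextual_anomalies(readings))  # [('weekend_night', 118)]
```

Note that 118 kW appears twice: it passes unnoticed at midday on a workday but is flagged at night on a weekend.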
(iii) Collective Anomalies
A collective anomaly is a set of related data instances that is anomalous with respect to the whole dataset, even though the individual instances are not. For instance, three related time series may each look normal when examined on their own, because no single series deviates noticeably from its usual behaviour. Taken together, however, their small deviations combine into a single anomaly that signals a serious issue.
Figure 5: Collective anomalies by anomalous subsequences
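For the subsequence case named in the figure caption, a rough sketch is to score whole windows instead of single points. It assumes a reference stretch of known-normal data and roughly independent noise, so that the mean of a window of `width` points varies by about `std / sqrt(width)`. The function name and all numbers are hypothetical.

```python
import math
import statistics

def anomalous_windows(reference, series, width, k=3.0):
    """Return start indices of windows whose mean is more than k standard
    errors away from the mean of the reference (normal) data."""
    mean = statistics.mean(reference)
    std = statistics.pstdev(reference)
    limit = k * std / math.sqrt(width)  # allowed deviation of a window mean
    starts = []
    for start in range(len(series) - width + 1):
        window_mean = statistics.mean(series[start:start + width])
        if abs(window_mean - mean) > limit:
            starts.append(start)
    return starts

reference = [10, 12, 9, 11, 10, 8, 11, 9, 10, 10, 12, 8]  # known-normal data
series = [10, 9, 11, 10, 12, 12, 12, 12, 12, 9, 10, 11]   # five 12s in a row

# No single point is further than 3 standard deviations from the mean...
std = statistics.pstdev(reference)
print(all(abs(x - 10) / std < 3 for x in series))  # True
# ...but the run of five consecutive 12s is flagged as a window
print(anomalous_windows(reference, series, width=5))  # [4]
```

Every value in `series` also occurs in the normal data, so a point-by-point check finds nothing. Only the unbroken run is unusual.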
Once you know which kind of anomaly is in your data, it is also important to know the methods that data scientists most commonly use. Machine-learning-based anomaly detection methods fall mainly under the following headings.
(i) Supervised Method
A supervised method requires a dataset in which every record is already labelled as normal or abnormal. Because the model is trained on labelled data and then used to predict on unseen data, supervised methods generally give better results when such labels are available.
List of Common algorithms
- Linear Regression
- K-Nearest Neighbours (KNN)
- Support Vector Machine (SVM)
- Long Short-Term Memory (LSTM)
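As an illustration of the supervised setting, the following is a minimal K-Nearest Neighbours classifier written from scratch. It is a sketch rather than a production implementation, and a real project would more likely use a library. The labelled transactions are invented, and the features are left unscaled to keep the example short, although scaling is normally needed so that one feature does not dominate the distance.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest labelled neighbours."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], x),  # Euclidean distance
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled transactions: (amount in USD, hour of day)
train_X = [(20, 10), (35, 12), (15, 9), (50, 14), (900, 3), (1200, 2), (1000, 4)]
train_y = ["normal", "normal", "normal", "normal", "fraud", "fraud", "fraud"]

print(knn_predict(train_X, train_y, (950, 3)))   # fraud
print(knn_predict(train_X, train_y, (30, 11)))   # normal
```

The quality of the predictions depends entirely on the labels: the classifier can only recognise kinds of fraud that appear in the training data.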
(ii) Unsupervised Method
An unsupervised method does not depend on labelled training data. Instead, it requires domain experts' knowledge to choose and arrange the feature values and to connect them with the output results. Many unsupervised methods are based on clustering.
List of Common algorithms
- K-Means Clustering
- Autoencoder
- Principal Component Analysis ( PCA )
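To show how clustering can expose anomalies without any labels, here is a small from-scratch K-Means sketch in which the anomaly score of a point is its distance to the nearest cluster centre. The data points and the choice of two clusters with fixed starting centres are assumptions made for the example. Real data would need a principled choice of the number of clusters and of the score threshold.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm starting from the given initial centroids."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids

def distance_to_nearest(p, centroids):
    """Anomaly score: distance from p to the closest cluster centre."""
    return min(math.dist(p, c) for c in centroids)

# Two tight groups plus one point that belongs to neither
points = [(1, 1), (1.2, 0.8), (0.9, 1.1), (5, 5), (5.1, 4.9), (4.8, 5.2), (9, 0)]
centroids = kmeans(points, centroids=[points[0], points[3]])
scores = {p: distance_to_nearest(p, centroids) for p in points}
outlier = max(scores, key=scores.get)
print(outlier)  # (9, 0)
```

One caveat is visible even in this toy case: the stray point pulls its cluster centre towards itself, which raises the scores of its normal neighbours as well.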
(iii) Semi-supervised Method
In anomaly detection tasks, we often have rich observations of the normal case, but abnormal observations are very hard to gather. With few or no examples of anomalies, the model does not have enough information to learn what an anomaly looks like. If your dataset has a good distribution of normal data but little anomalous data, you can use a semi-supervised method. In this approach, you train the model on normal data only and then compute an anomaly score that measures how much new data deviates from the normal data.
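The train-on-normal, score-the-rest idea can be sketched with a very simple model: learn the mean and standard deviation of each feature from normal data, then score new rows by how many standard deviations they sit from that profile. This stands in for heavier models such as autoencoders. The sensor readings and the 1.5 safety margin on the threshold are assumptions for illustration only.

```python
import math
import statistics

def fit_normal_profile(normal_rows):
    """Learn per-feature mean and standard deviation from normal data only."""
    columns = list(zip(*normal_rows))
    return [(statistics.mean(c), statistics.stdev(c)) for c in columns]

def anomaly_score(row, profile):
    """Root-mean-square z-score of a row against the normal profile."""
    z2 = [((x - mean) / std) ** 2 for x, (mean, std) in zip(row, profile)]
    return math.sqrt(sum(z2) / len(z2))

# Hypothetical sensor readings: (temperature in degrees C, vibration in mm/s)
normal_rows = [(70.1, 1.0), (69.8, 1.1), (70.4, 0.9),
               (70.0, 1.0), (69.7, 1.2), (70.2, 0.8)]
profile = fit_normal_profile(normal_rows)

# Threshold: the worst score seen on normal data, plus a safety margin
threshold = 1.5 * max(anomaly_score(r, profile) for r in normal_rows)

new_rows = [(70.0, 1.0), (75.0, 2.5)]
flags = [anomaly_score(r, profile) > threshold for r in new_rows]
print(flags)  # [False, True]
```

No anomalous example is needed at training time. The threshold is derived from the normal data alone.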
Applied Areas
Let's see how AI-based anomaly detection techniques can be used in different settings. Anomaly detection is applicable in a wide variety of domains, as listed below.
(i) Health Care
Medical errors, careless treatment and wrong diagnoses result in thousands of accidental deaths and injuries each year. With the support of anomaly detection, we can catch problems in a diagnosis at an early stage and give better insights to doctors. Applying machine learning algorithms to healthcare data can enhance patient care while also reducing the workload of healthcare workers.
AIdoc, a leading company in medical anomaly detection, helps radiologists work through their caseload faster, just in time to make a difference. The company reworked deep learning algorithms to analyze imaging and clinical data more effectively, producing highly accurate detection of anomalies in scans. It combines the analyzed scans with patient data, streamlining the radiologists' workflow and freeing them to do what they do best.
(ii) Network Server or Application Failure
If a network server fails or the connection is bad, customers become angry and dissatisfied. In today's IT age, everyone depends on networks and is constantly connected. Mobile games are a familiar example: anomaly detection can pick up the moment the connection gets worse.
For a telecom or network service provider, the time series data to monitor could be network usage at any given moment or the response time of its call centers. These metrics can be incredibly granular, down to the device and browser level, which makes monitoring and investigating anomalies extremely cumbersome. A recent Gartner study found that incidents can cost large enterprises an average of $300,000 an hour and that the average loss of productivity is 31 hours a week. Companies are therefore trying to reduce the cost of not detecting anomalies in time.
Figure 6: Market growth rate according to Mordor intelligence survey
(iii) Fraud Detection
Fraud detection is a set of activities undertaken to prevent money or property from being obtained through false pretenses. It is applied in many industries, such as banking and insurance. When someone steals another person's card or personal details, unauthorized purchases can occur. Machine learning classifiers can detect such anomalous transactions.
PayPal, the well-known online payment company, relies primarily on machine learning to enhance its risk management and fraud detection capabilities.
Figure 7: PayPal Fraud Detection
(iv) Time Series based System
Time series data is data collected at successive points in time. Familiar examples include stock market data, ECG heart-rate data, market analysis and abnormal price detection. For example, when we choose a hotel room on a booking website, most of us first compare the facilities and the room price. Machine learning algorithms can detect abnormally high prices.
Figure 8: Applying time series in different markets
(v) Intrusion Detection
Intrusion detection plays a vital role in detecting malicious activity in order to prevent cyber attacks. Supervised methods can handle known attacks and recognize variations of them. Unsupervised methods can recognize patterns and report anomalies such as DoS attacks and probing.
Figure 9: Intrusion Detection
Our Research
Our Nexidea AI Research Team has already carried out time-series projects with stock data, medical sensor data and other open-source datasets. Through this research on anomaly detection and prediction, we have gained a lot of experience and learned how to handle anomalies depending on the dataset and on what we want to detect and predict.
Before this anomaly detection research, we worked on time series prediction for ECG heart-rate sensor datasets with an LSTM model, so we already knew how to train a model for time series problems. The next step was to focus on how to handle anomalies with supervised and semi-supervised methods.
Let's look at the key points for building an anomaly detection system.
1. Know what output you want from which input data
Check whether your input data is univariate (one feature) or multivariate (many features), and whether the output is single-step (one output) or multi-step (several outputs). You also need to decide whether you will use a supervised or an unsupervised method. If you use a supervised method, labelled data must be prepared. If you use an unsupervised one, domain knowledge is necessary, because you have to define conditions based on the input features.
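The difference between single-step and multi-step output is easiest to see in how a time series is cut into training pairs. The sketch below handles the univariate case. The helper name `make_windows` and the heart-rate values are hypothetical.

```python
def make_windows(series, n_in, n_out=1):
    """Split a univariate series into (input window, target window) pairs."""
    pairs = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]                  # model input
        y = series[start + n_in:start + n_in + n_out]   # values to predict
        pairs.append((x, y))
    return pairs

heart_rate = [72, 74, 73, 75, 76, 78]

single_step = make_windows(heart_rate, n_in=3, n_out=1)
multi_step = make_windows(heart_rate, n_in=3, n_out=2)

print(single_step[0])  # ([72, 74, 73], [75])
print(multi_step[0])   # ([72, 74, 73], [75, 76])
```

For multivariate data the same slicing applies, except that each element of the series is a row of features instead of a single number.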
2. Check whether you have cleaned and standardized the data
Make sure your dataset does not contain missing values. Empty or null values can lead to wrong predictions. Most anomaly detection projects also need to scale the dataset's values, for example with the standardization methods of the Keras library.
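The two preparation steps can be sketched without any library, to show what they actually do to a column of data. This is not the Keras routine itself. It fills missing entries with the column mean, which is only one of several possible strategies, and then rescales the column to zero mean and unit variance. The `prices` column is invented.

```python
import statistics

def impute_and_standardize(column):
    """Fill None with the column mean, then rescale to zero mean, unit variance."""
    observed = [v for v in column if v is not None]
    mean = statistics.mean(observed)
    filled = [mean if v is None else v for v in column]
    std = statistics.pstdev(filled)
    if std == 0:
        # A constant column carries no information; map it to zeros
        return [0.0 for _ in filled]
    return [(v - mean) / std for v in filled]

# Hypothetical price column with two missing entries
prices = [100.0, None, 110.0, 90.0, None, 100.0]
scaled = impute_and_standardize(prices)
print([round(v, 3) for v in scaled])  # [0.0, 0.0, 1.732, -1.732, 0.0, 0.0]
```

In a real project the mean and standard deviation should be computed on the training split only and then reused for the test data, so that no information leaks from the data being evaluated.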
3. Choose the appropriate AI training model
The choice depends on whether you are using a supervised, semi-supervised or unsupervised method. We used LSTM, a KNN classifier, a Support Vector Machine (SVM) classifier and LSTM autoencoders. There is also a well-known Python toolkit for outlier detection in multivariate data, the PyOD library, which contains more than 30 detection algorithms. With a supervised model, the model is trained on the labelled data. With a semi-supervised one, we trained the model on normal data and predicted on test data. Parameter tuning is important for better accuracy.
4. Use machine learning visualization methods
We need to minimize the error defined by the loss function, such as the mean squared error. In a well-fitted model, the error between the predicted values and the actual values is as small as possible. As a data scientist, you should know how to use the Matplotlib library to visualize the data and to set the threshold values that decide which points are anomalies. Since our anomaly detection is mainly based on supervised and semi-supervised methods, the confusion matrix and the ROC curve are useful for measuring how well the system performs.
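To show how a threshold, the confusion matrix and the ROC curve fit together, the sketch below counts the four outcomes for a given threshold and derives one ROC point from them. Sweeping the threshold and plotting the resulting points gives the ROC curve. The labels and anomaly scores are invented for the example.

```python
def confusion_matrix(y_true, scores, threshold):
    """Count TP, FP, TN, FN when scores above the threshold are flagged."""
    tp = fp = tn = fn = 0
    for truth, score in zip(y_true, scores):
        predicted = score > threshold
        if predicted and truth:
            tp += 1      # anomaly correctly flagged
        elif predicted and not truth:
            fp += 1      # false alarm
        elif truth:
            fn += 1      # missed anomaly
        else:
            tn += 1      # normal point correctly ignored
    return tp, fp, tn, fn

def roc_point(y_true, scores, threshold):
    """Return (false positive rate, true positive rate) for one threshold."""
    tp, fp, tn, fn = confusion_matrix(y_true, scores, threshold)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr

# Hypothetical ground truth (1 = anomaly) and model anomaly scores
y_true = [0, 0, 0, 1, 0, 1, 0, 0]
scores = [0.1, 0.3, 0.2, 0.9, 0.6, 0.4, 0.2, 0.1]

for threshold in (0.25, 0.5, 0.8):
    print(threshold, confusion_matrix(y_true, scores, threshold))
# 0.25 (2, 2, 4, 0)
# 0.5 (1, 1, 5, 1)
# 0.8 (1, 0, 6, 1)
```

Lowering the threshold catches more anomalies at the price of more false alarms, and this trade-off is what the ROC curve visualizes.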
We hope the tips above help you move forward with your anomaly detection project. You are warmly welcome to contact us if you are interested in Nexidea's anomaly detection system.
Conclusion
Anomaly detection plays a major role in every business sector, so why not take this opportunity to boost your business revenue? Naturally, the sooner a problem is found, the easier it is to handle. We aim to provide an affordable and easy-to-use platform for our clients. Currently, we work with static data in our research. As the next step, our future work is to implement anomaly detection for big data, so that we can process large volumes of real-time streaming data. Of course, detecting anomalies in real-time streaming data is difficult. Our goal is a system that detects all anomalies as soon as possible, triggers no false alarms, works across a variety of domains and automatically adapts to changes.
AI-driven anomaly detection will point out wrong results in time, before they reach your customers. If you want to bring AI-based anomaly detection into your business, our Nexidea AI team is ready to help solve your business issues and meet your goals.