Graph Anomaly Detection Techniques — Part 1

Rajesh K
7 min readAug 29, 2023

--

Anomaly detection encompasses the identification of unusual behaviors or patterns within data or systems that deviate from expected norms, detecting non-benign alterations in established and accurate system behavior. This process holds significance within both homogeneous and heterogeneous distributed systems. In heterogeneous distributed systems, various components, ranging from minor variations like sensors to substantial elements like control units, collaborate to attain overarching system goals. These components span an extensive network, thus termed “distributed,” and exhibit structural and data production disparities, leading to the term “heterogeneous.” In comparison, homogeneous systems can be considered a specific instance of heterogeneous systems, where components are largely alike, if not identical.

This practice holds crucial importance as it addresses the challenge of identifying signals or patterns that differ from the norm, using extensive data analysis to prevent major faults. Anomaly detection finds application in high-impact domains like cybersecurity, finance, e-commerce, social networks, industrial monitoring, and other mission-critical tasks. Despite various techniques developed over the years to handle unstructured multidimensional data, recent attention has been drawn to graph-structure-aware methods, which cater to the unique characteristics of data organized as graphs.

1 — Examples of Anamoly Detections

Before we dive into the specifics of graph anomalies, let’s take a moment to explore some crucial concepts related to graphs.

A graph data structure has two basic elements: nodes and edges Nodes represent entities in the data such as members of an online social network, while edges symbolise relationships between those entities, such as friendship between members of a social network.

Homophily and Heterophily in graphs

Graph nodes and edges possess the possibility of being associated with types. A graph that features a solitary node type and a sole edge type earns the label “homophily/homogeneous.” An illustration of a homogeneous graph could be observed in an online social network, wherein nodes denote individuals and edges symbolize friendships. In this instance, both the nodes and edges retain a uniform type throughout.

Conversely, a graph encompassing two or more node types and/or two or more edge types is categorized as “heterophily/heterogeneous.” For instance, consider an online social network where nodes represent individuals and edges represent both ‘friendship’ and ‘co-worker’ relationships. In this scenario, the graph is characterized as heterogeneous due to the distinct types attributed to its edges.

Static Vs Dynamic Graphs

Static graphs are fixed structures that do not change over time. In a static graph, nodes and edges remain constant, and the relationships between entities are immutable throughout the analysis. These types of graphs are suitable for scenarios where relationships are stable and not subject to temporal changes more relevant in social networks.

A static plain graph G = {V, E} comprises a node set V = {vi} n 1 and an edge set E = {ei,j} where n is the number of nodes and ei,j = (vi , vj ) denotes an edge between nodes vi and vj . The adjacency matrix A = [ai,j ]n×n restores the graph structure, where ai,j = 1 if node vi and vj is connected, otherwise ai,j = 0. A static attributed graph G = {V, E, X} comprises a node set V , an edge set E and an attribute set X. . The attribute matrix X = [x ]n×k consists of nodes attribute vectors, where xi is the attribute vector associated with node vi and k is the vector’s dimension.

Dynamic graphs, also known as temporal graphs, capture the evolution of relationships over time. These graphs accommodate changes in nodes, edges, or both, reflecting the dynamic nature of the entities being studied. Dynamic graphs are used when it’s essential to capture how relationships evolve, adapt, or dissolve over different time intervals.

A dynamic graph can be represented as an ordered list or an asynchronous stream of timed events, such as additions or deletions of nodes and edges¹. A social network like Twitter is a good illustration: when a person joins the platform, a new node is created. When they follow another person, a follow edge is created. When they change their profile, the node is updated.

Dynamic Grpahs evolve over time and can be seen as time sequenced events

Graph Anamoly Detection Types

Graphs, structures where nodes/vertices symbolize tangible entities and edges depict their connections, have found widespread use in various domains such as social interactions, e-commerce, biology, academia, and communication to convey structural information. The presence of this structural data within graphs introduces a more intricate challenge in detecting anomalies, specifically in a non-Euclidean domain termed “graph anomaly detection”. This endeavor seeks to pinpoint abnormal graph elements (such as nodes, edges, or sub-graphs) within a single graph, as well as identify anomalous graphs within a dataset. Illustrated in Figure as a simple instance, envision an online social network: graph anomaly detection endeavors to spot unusual nodes (like malicious users), atypical edges (denoting unusual relationships), and extraordinary sub-graphs (such as groups of malicious users). However, due to the diversity of graph anomalies that don’t neatly fit into the confines of Euclidean feature space, employing conventional anomaly detection techniques directly for graph anomaly detection becomes impractical.

Graph anomaly detection involves identifying unusual patterns, outliers, or unexpected behaviors in graph-structured data. There are several types of graph anomaly detection methods, each tailored to different types of graphs and scenarios.

Node based:This type focuses on identifying individual nodes that exhibit anomalous behavior within the graph. Node anomalies could include nodes with significantly different characteristics or nodes that deviate from the expected patterns of interaction with their neighbors.

example: In a social network, identifying users who have an unusually high number of connections compared to the average user could indicate potential spam or bot accounts.

Edge based:Edge anomaly detection aims to find anomalous relationships or connections between nodes. These anomalies can include unexpected collaborations, unusual communication patterns, or fraudulent transactions.

example: In a financial transaction network, detecting unusually large or frequent transfers of money between certain accounts could signal potential money laundering.

Subgraph based:Subgraph anomalies involve identifying smaller graph structures that exhibit abnormal behavior within the larger graph. This type of detection is particularly useful for uncovering localized irregularities.

example: In a transportation network, identifying a group of nodes and edges that form a subgraph with a significantly higher density of connections than usual could indicate a traffic congestion anomaly.

Temporal based:Temporal anomaly detection focuses on identifying anomalies that occur over time in dynamic graphs. This can involve changes in node or edge behavior that are unexpected or unusual compared to historical patterns.

example: In a communication network, detecting a sudden increase in communication between specific nodes could indicate a potential security breach or coordinated attack.

Time Series based: For dynamic graphs with temporal information, time series anomaly detection focuses on detecting anomalies that emerge over time. This can involve identifying nodes, edges, or subgraphs that exhibit abnormal behavior patterns in their temporal evolution, making it suitable for scenarios like monitoring network traffic or tracking disease spread in a contact network.

Collective based:
Collective anomaly detection considers the global behavior of nodes and edges in the entire graph. It seeks to identify anomalies that are not necessarily anomalous on their own but are collectively unusual in the context of the entire graph.

example: In a transportation network, detecting a sudden drop in average travel times for all routes during a certain time period could indicate a data collection error or system malfunction.

Each type of graph anomaly detection has its strengths and weaknesses, making them suitable for different applications. The choice of which method to use depends on the nature of the graph data, the desired level of granularity, and the specific anomalies being targeted.

In the the forthcoming parts , I plan to cover detailed on the different architectures and embeddings differnet GNNs for anamoly detection.

References

Graph Anomaly Detection with Graph Neural Networks: Current Status and Challenges HWAN KIM1 , BYUNG SUK LEE2 , WON-YONG SHIN3 (Senior Member, IEEE), and SUNGSU LIM1 (Member, IEEE)

A Comprehensive Survey on Graph Anomaly Detection with Deep Learning Xiaoxiao Ma, Jia Wu, Senior Member, IEEE, Shan Xue, Jian Yang, Chuan Zhou Quan Z. Sheng, and Hui Xiong, Fellow, IEEE and Leman Akoglu

A Survey of Graph-based Deep Learning for Anomaly Detection in Distributed Systems Armin Danesh Pazho* , Ghazal Alinezhad Noghre*, Arnab A Purkayastha, Jagannadh Vempati, Otto Martin, and Hamed Tabkhi

Anomaly Detection in Graph: Unsupervised Learning, Graph-based Features and Deep Architecture Dmitry Vengertsev, Hemal Thakkar, Department of Computer Science, Stanford University

https://medium.com/stellargraph/knowing-your-neighbours-machine-learning-on-graphs-9b7c3d0d5896

https://blog.twitter.com/engineering/en_us/topics/insights/2021/temporal-graph-networks

https://graph-neural-networks.github.io

A Survey on Graph Neural Networks for Time Series: Forecasting, Classification, Imputation, and Anomaly Detection Ming Jin, Huan Yee Koh, Qingsong Wen, Daniele Zambon, Cesare Alippi, Fellow, IEEE, Geoffrey I. Webb, Fellow, IEEE, Irwin King, Fellow, IEEE, Shirui Pan, Senior Member, IEEE

Graph Neural Networks: a bibliometrics overview Abdalsamad Keramatfar 1,2 , Mohadeseh Rafiee1 , Hossein Amirkhani2

Fraud detection: A systematic literature review of graph-based anomaly detection approaches

--

--

Rajesh K
Rajesh K

Written by Rajesh K

All about Language models, Graphs and RL

Responses (1)