Distributed Anomaly Detection in Edge Streams using Frequency based
Sketch Datastructures
Logs hosted in large data centers often record network traffic over a
long period of time. For instance, network traffic logged via a TCP
dump packet sniffer (as in the 1998 DARPA intrusion detection dataset) includes
network packets transmitted between computers. While an online framework
is necessary for detecting anomalous or suspicious network activity, such as
denial-of-service attacks or unauthorized usage, in real time, such large
data centers often log data over long periods (e.g., TCP dump), and hence an
offline framework is more suitable in such scenarios. Given a network log
history of edges from a dynamic graph, how can we assign anomaly scores to
individual edges, indicating suspicious events, with high accuracy, using only
constant memory and less time than state-of-the-art methods? We
propose MDistrib and its variants, which provide (a) faster detection of
anomalous events via distributed processing with GPU support compared to other
approaches, (b) better false positive guarantees than state-of-the-art methods
under fixed space, and (c) collision-aware anomaly scoring for
better accuracy than state-of-the-art approaches. We describe
experiments confirming that MDistrib is more efficient than prior work.
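
To make the idea of frequency-based sketches with constant memory concrete, the following is a minimal sketch of a Count-Min Sketch used to score edges in a stream. It is not the MDistrib algorithm: the class `CountMinSketch`, the function `score_edges`, the table sizes, and the deviation-from-mean score are all assumptions chosen for illustration, and the code is single-threaded with no GPU or distributed processing.

```python
import hashlib


class CountMinSketch:
    """Fixed-size frequency table: memory is rows * cols, whatever the stream length."""

    def __init__(self, rows=4, cols=1024):
        self.rows = rows
        self.cols = cols
        self.table = [[0] * cols for _ in range(rows)]

    def _index(self, row, key):
        # One hash function per row, derived by salting the key with the row number.
        digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big") % self.cols

    def add(self, key, count=1):
        for r in range(self.rows):
            self.table[r][self._index(r, key)] += count

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the tightest estimate.
        return min(self.table[r][self._index(r, key)] for r in range(self.rows))


def score_edges(edges, rows=4, cols=1024):
    """Yield ((src, dst, tick), score) for each edge of the stream.

    `edges` is an iterable of (src, dst, tick) with non-decreasing integer
    ticks starting at 1. The score (an illustrative choice, not taken from
    the paper) grows when an edge is much more frequent in the current tick
    than its average count per tick so far.
    """
    total = CountMinSketch(rows, cols)    # counts over the whole history
    current = CountMinSketch(rows, cols)  # counts within the current tick only
    last_tick = None
    for src, dst, tick in edges:
        if tick != last_tick:
            current = CountMinSketch(rows, cols)  # new tick: reset per-tick counts
            last_tick = tick
        key = f"{src}->{dst}"
        current.add(key)
        total.add(key)
        a = current.estimate(key)
        mean = total.estimate(key) / tick  # always > 0 because the edge was just added
        yield (src, dst, tick), (a - mean) ** 2 / mean
```

An edge that appears once per tick keeps a score of zero under this rule, while a sudden burst of the same edge within one tick drives the score up. Hash collisions can only overestimate counts, which is why scoring schemes built on such sketches have to account for collisions.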
A Case for a Programmable Edge Storage Middleware
Edge computing is a fast-growing computing paradigm where data is processed
at the local site where it is generated, close to the end-devices. This can
benefit a set of disruptive applications like autonomous driving, augmented
reality, and collaborative machine learning, which produce incredible amounts
of data that need to be shared, processed and stored at the edge to meet low
latency requirements. However, edge storage poses new challenges due to the
scarcity and heterogeneity of edge infrastructures and the diversity of edge
applications. In particular, edge applications may impose conflicting
constraints and optimizations that are hard to reconcile on the limited,
hard-to-scale edge resources. In this vision paper, we argue that a new
middleware for constrained edge resources is needed, providing a unified
storage service for diverse edge applications. We identify programmability as a
critical feature that should be leveraged to optimize resource sharing
while delivering the specialization needed for edge applications. Following
this line, we make a case for eBPF and present the design for Griffin, a
flexible, lightweight, programmable edge storage middleware powered by eBPF.
Machine Unlearning: Learning, Polluting, and Unlearning for Spam Email
Machine unlearning is studied here in the context of security. Several spam
email detection methods exist, each of which employs a different algorithm to
detect undesired spam emails. But these models are vulnerable to attacks. Many
attackers exploit a model by polluting, in various ways, the data on which
it is trained. To respond quickly in such situations, the model needs to
readily unlearn the polluted data without retraining. Retraining
is impractical in most cases because the model has already been trained on a
massive amount of data, all of which would have to be processed again just to
remove a small amount of polluted data, which is often significantly less
than 1%. This problem can be solved by developing unlearning frameworks for all
spam detection models. In this research, an unlearning module is integrated into
spam detection models based on the Naive Bayes, Decision Tree, and Random
Forest algorithms. To assess the benefits of unlearning over retraining, the three
spam detection models are polluted and exploited from an attacker's position,
demonstrating the models' vulnerability. Reductions in accuracy and true positive
rate are shown in each case, illustrating the effect of pollution on the models. Then
unlearning modules are integrated into the models, and the polluted data is
unlearned; testing the models after unlearning shows that performance
is restored. Unlearning and retraining times are also compared for different
pollution data sizes on all models. The findings indicate that
unlearning is considerably superior to retraining. Results show
that unlearning is fast, easy to implement, easy to use, and effective.
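
As an illustration of why unlearning can avoid retraining, here is a minimal sketch for the Naive Bayes case only. It assumes a multinomial Naive Bayes classifier stored purely as counts, so that removing a training email amounts to subtracting the counts it contributed. The class `CountingNaiveBayes` and its methods are hypothetical names, not the paper's module, and Decision Trees and Random Forests would need a different mechanism that is not shown here.

```python
import math
from collections import Counter


class CountingNaiveBayes:
    """Multinomial Naive Bayes kept as raw counts, so training is additive."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.doc_counts = Counter()                            # label -> number of emails
        self.word_counts = {label: Counter() for label in self.LABELS}

    @staticmethod
    def _bump(counter, key, sign):
        counter[key] += sign
        if counter[key] == 0:
            del counter[key]  # keep the state identical to never having seen the key

    def _update(self, words, label, sign):
        self._bump(self.doc_counts, label, sign)
        for w in words:
            self._bump(self.word_counts[label], w, sign)

    def learn(self, words, label):
        self._update(words, label, +1)

    def unlearn(self, words, label):
        # Exact inverse of learn(). The caller must only unlearn samples that
        # were previously learned; this sketch does not check that.
        self._update(words, label, -1)

    def predict(self, words):
        vocab = set()
        for counts in self.word_counts.values():
            vocab |= set(counts)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.LABELS:
            counts = self.word_counts[label]
            n_words = sum(counts.values())
            # Laplace-smoothed log prior and log likelihoods; the extra +1 in
            # the denominator reserves probability mass for unseen words.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for w in words:
                score += math.log((counts[w] + 1) / (n_words + len(vocab) + 1))
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Under these assumptions, calling `unlearn` on each polluted email costs time proportional to the polluted data alone and leaves the model in the same state as one trained only on the clean data, whereas retraining would touch every clean email again.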
Technical Report: Edge-centric Programming for IoT Applications with
EdgeProg
IoT application development usually involves separate programming at the
device side and server side. While this separate programming style is sufficient for
many simple applications, it is not suitable for many complex applications that
involve intricate interactions and intensive data processing. We propose
EdgeProg, an edge-centric programming approach to simplify IoT application
programming, motivated by the increasing popularity of edge computing. With
EdgeProg, users can write application logic in a centralized manner with an
augmented If-This-Then-That (IFTTT) syntax and a virtual sensor mechanism. The
program can be processed at the edge server, which can automatically generate
the actual application code and intelligently partition the code into device
code and server code to achieve the optimal latency. EdgeProg employs
dynamic linking and loading to deploy the device code on a variety of IoT
devices, which do not run any application-specific code at the start. Results
show that EdgeProg achieves average reductions of 20.96%, 27.8%, and 79.41% in
execution latency, energy consumption, and lines of code, respectively, compared with
state-of-the-art approaches.
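
The partitioning step can be pictured with a toy model. The sketch below is not EdgeProg's partitioner: it assumes the application is a linear chain of stages, that each stage has a known run time on the device and on the server, and that latency is the sum of compute time and the time to send the data crossing the cut. The names `Stage` and `best_cut` and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    device_ms: float   # assumed run time on the IoT device
    server_ms: float   # assumed run time on the edge server
    output_bytes: int  # size of the data this stage hands to the next one


def best_cut(stages, input_bytes, bytes_per_ms):
    """Return (k, latency_ms): stages[:k] run on the device, stages[k:] on the server.

    Tries every cut point of the chain and keeps the one with the lowest
    end-to-end latency (the earliest cut wins ties).
    """
    best = None
    for k in range(len(stages) + 1):
        device = sum(s.device_ms for s in stages[:k])
        server = sum(s.server_ms for s in stages[k:])
        # Data crossing the cut: the raw input if nothing runs on the device,
        # otherwise the output of the last device-side stage.
        crossing = input_bytes if k == 0 else stages[k - 1].output_bytes
        latency = device + crossing / bytes_per_ms + server
        if best is None or latency < best[1]:
            best = (k, latency)
    return best


pipeline = [
    Stage("sample", device_ms=1, server_ms=1, output_bytes=8000),
    Stage("filter", device_ms=5, server_ms=1, output_bytes=800),
    Stage("fft", device_ms=40, server_ms=2, output_bytes=400),
    Stage("classify", device_ms=100, server_ms=3, output_bytes=1),
]
cut, latency = best_cut(pipeline, input_bytes=8000, bytes_per_ms=100)
```

With these made-up numbers the best cut is after `filter`: filtering on the device shrinks the data before transmission, while the expensive `fft` and `classify` stages stay on the server. A real application graph is generally not a simple chain, so this exhaustive scan only illustrates the trade-off between device compute, transfer size, and server compute.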
Search by a Metamorphic Robotic System in a Finite 3D Cubic Grid
We consider search in a finite 3D cubic grid by a metamorphic robotic system
(MRS) that consists of anonymous modules. A module can perform sliding and
rotation moves as long as the modules as a whole remain connected. As the number of modules
increases, the variety of actions that the MRS can perform increases. The
search problem requires the MRS to find a target in a given finite field. Doi
et al. (SSS 2018) established the necessary and sufficient number of modules for
search in a finite 2D square grid. We consider search in a finite 3D cubic grid
and investigate the effect of common knowledge. We consider three different
settings. First, we show that three modules are necessary and sufficient when
all modules are equipped with a common compass, i.e., they agree on the
direction and orientation of the $x$, $y$, and $z$ axes. Second, we show that
four modules are necessary and sufficient when all modules agree on the
direction and orientation of the vertical axis. Finally, we show that five
modules are necessary and sufficient when the modules are not equipped with a
common compass. Our results show that the shapes of the MRS in the 3D cubic
grid have richer structure than those in the 2D square grid.
Privacy-Preserving Decentralized Exchange Marketplaces
Decentralized exchange markets leveraging blockchain have been proposed
recently to provide open and equal access to traders, improve transparency, and
reduce the systemic risk of centralized exchanges. However, they compromise the
privacy of traders with respect to their asset ownership, account balance,
order details, and identity. In this paper, we present Rialto, a fully
decentralized privacy-preserving exchange marketplace with support for matching
trade orders, on-chain settlement and market price discovery. Rialto provides
confidentiality of order rates and account balances and unlinkability between
traders and their trade orders, while retaining the desirable properties of a
traditional marketplace like front-running resilience and market fairness. We
define formal security notions and present a security analysis of the
marketplace. We perform a detailed evaluation of our solution and demonstrate that
it scales well and is suitable for a large class of goods and financial
instruments traded in modern exchange markets.