Classification and prediction of matrix structured data with applications to recommendation systems or identifying anti-socials and bot-nets – Saber Shokat Fadaee

Date: Monday, September 26, 2016
Time: 4:00pm – 5:00pm
Location: 366 West Village H

Abstract

Matrices are a natural way to represent many forms of networked and tabulated data. These include connections among people, user preferences over items, and (the time-series of) bot-net attacks against entities. Models based on matrix factorization have been extensively studied in machine learning and statistical analysis. In this thesis, we address issues related to learning with matrix-structured data.

In the first part of this proposal, our goal is to determine the structural differences between categories of networks (represented as adjacency matrices) and to use these differences to predict a network's category. We propose Cliqster, a new generative model for unweighted networks based on a Bernoulli process. The generating probabilities are the result of a decomposition that reflects a network’s community structure. Solving this decomposition problem yields an efficient algorithm for transforming a network into a new space that is both concise and discriminative, and that preserves the identity of the network as much as possible. Our algorithm is interpretable and intuitive.
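As a rough illustration of what a Bernoulli edge model for an unweighted network looks like, the sketch below draws each edge independently from a probability matrix. It is not Cliqster itself: the abstract does not specify the decomposition, so the function `edge_probabilities`, the community labels, and the `within`/`between` probabilities are hypothetical stand-ins chosen only to make community structure visible.

```python
import random

def edge_probabilities(memberships, within=0.8, between=0.05):
    """Build a symmetric edge-probability matrix from community labels.

    Assumption for illustration only: nodes in the same community connect
    with probability `within`, all other pairs with probability `between`.
    """
    n = len(memberships)
    return [
        [0.0 if i == j
         else (within if memberships[i] == memberships[j] else between)
         for j in range(n)]
        for i in range(n)
    ]

def sample_network(probs, seed=0):
    """Draw an unweighted, undirected adjacency matrix.

    Each edge (i, j) with i < j is an independent Bernoulli draw with
    success probability probs[i][j]; the result is mirrored to keep
    the matrix symmetric, and the diagonal stays zero (no self-loops).
    """
    rng = random.Random(seed)
    n = len(probs)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < probs[i][j]:
                adj[i][j] = adj[j][i] = 1
    return adj

# Hypothetical toy network: six nodes in two communities of three.
memberships = [0, 0, 0, 1, 1, 1]
probs = edge_probabilities(memberships)
adj = sample_network(probs)
```

In this toy setting the probability matrix is fixed by hand; in a model of the kind described above, those probabilities would instead be estimated from an observed network, and the estimated parameters would serve as its compact representation.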

In the second part of this work, we have matrices that represent users’ preferences (in the form of ordinal or “1-5 star” ratings) over items. Predicting a user’s ratings of items is the task of recommendation systems, which are widely used by commercial service providers to give suggestions to users based on their previous behavior. While a large portion of users faithfully express their opinions, some malicious users add noisy ratings in order to change the overall ratings of a specific group of items. The presence of such noise can bias recommendations, leading to instabilities in estimation and prediction. Although the robustness of different recommendation systems has been extensively studied, designing a robust recommendation system remains a significant challenge because detecting malicious users is computationally expensive. In this work, we propose a new recommendation system that is resistant to manipulation by malicious users.

Finally, in the last part of this proposal, we work with matrices whose entries are time-series representing the attacks of bot-nets on companies. We have obtained access to a large proprietary data set of 206 bot-nets attacking 5,916 entities over a period of 400 consecutive days. Each entity (company) has a size, measured in number of employees, and belongs to a certain sector (e.g. educational sector, financial sector, etc.). Our goal is to classify and predict the behavior of different bot-nets towards certain entities. Moreover, we would like to classify and predict the size and sector of different entities.

Thesis Committee