CS6220 Unsupervised Data Mining

HW3 Clustering: DBSCAN, HIERARCHICAL

Make sure you check the syllabus for the due date. Please use the notations adopted in class, even if the problem is stated in the book using a different notation.

We are not looking for very long answers (if you find yourself writing more than one or two pages of typed text per problem, you are probably on the wrong track). Try to be concise; also keep in mind that good ideas and explanations matter more than exact details.

Submit all code files via Dropbox (create a folder named HW1 or similar). Results can be pdf or txt files, including plots/tables if any.

"Paper" exercises: submit using Dropbox as pdf, either typed or scanned handwritten.


DATASET : Kosarak : click-stream data of a Hungarian on-line news portal

DATASET : Aminer : public citation dataset

DATASET : 20 NewsGroups : news articles

DATASET : MNIST : digit images

https://en.wikipedia.org/wiki/MNIST_database
http://yann.lecun.com/exdb/mnist/

PROBLEM 5: DBSCAN on toy data

You are to cluster, and visualize, a small dataset using DBSCAN (ε = 7.5, MinPts = 3). You have been provided a file, dbscan.csv, that has the following columns for each point in the dataset:

    cluster : originally empty, provided for your convenience
    pt : a unique id for each data point
    x : point x-coordinate
    y : point y-coordinate
    num neighbors : number of neighbors, according to the coordinates above
    neighbors : the ids of all neighbors within ε

As you can see, a tedious O(n^2) portion of the work has been done for you. Your job is to execute, point-by-point, the DBSCAN algorithm, logging your work. For example . . .
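To make the point-by-point procedure concrete, here is a minimal sketch of DBSCAN driven by precomputed neighbor lists. It is not the required solution format. The function name `dbscan_from_neighbors`, the `count_self` flag, and the in-memory dict layout are assumptions made for illustration; the handout does not say how the neighbors column is encoded in dbscan.csv or whether a point counts as its own neighbor, so check both against the file and the definition used in class.

```python
from collections import deque

NOISE = -1  # label for points that end up in no cluster


def dbscan_from_neighbors(neighbors, min_pts, count_self=False):
    """Run DBSCAN given precomputed eps-neighborhoods.

    neighbors: dict mapping point id -> list of ids within eps (assumed layout).
    count_self: set True if the lists exclude the point itself but the
                MinPts definition in use counts it (an assumption to verify).
    Returns (labels, log): labels maps id -> cluster number or NOISE,
    log is a list of human-readable steps.
    """
    def is_core(p):
        return len(neighbors[p]) + (1 if count_self else 0) >= min_pts

    labels, log, cluster_id = {}, [], 0
    for p in sorted(neighbors):
        if p in labels:
            continue  # already assigned while expanding an earlier cluster
        if not is_core(p):
            labels[p] = NOISE  # may later be relabeled as a border point
            log.append(f"{p}: not core, marked noise for now")
            continue
        cluster_id += 1
        labels[p] = cluster_id
        log.append(f"{p}: core, starts cluster {cluster_id}")
        queue = deque(neighbors[p])
        while queue:
            q = queue.popleft()
            if labels.get(q) == NOISE:
                labels[q] = cluster_id  # noise reached from a core point: border
                log.append(f"{q}: noise -> border of cluster {cluster_id}")
                continue
            if q in labels:
                continue
            labels[q] = cluster_id
            if is_core(q):
                queue.extend(neighbors[q])  # only core points expand the cluster
                log.append(f"{q}: core, joins cluster {cluster_id} and expands")
            else:
                log.append(f"{q}: border, joins cluster {cluster_id}")
    return labels, log


# Made-up toy neighborhoods (not taken from dbscan.csv).
toy = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3], 5: []}
labels, log = dbscan_from_neighbors(toy, min_pts=3, count_self=True)
```

In this toy input, points 1 to 3 are core, point 4 is a border point reached through point 3, and point 5 has no neighbors and stays noise. The `log` list gives one entry per decision, which is roughly the kind of trace the problem asks for.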

PROBLEM 6: DBSCAN on real data



PROBLEM 7: Hierarchical Clustering