Generated on Tue Oct 25 10:19:18 2022

CS 488: Introduction to Data Mining

Catalog description: Techniques for exploring large data sets and discovering patterns in them. Data mining concepts and metrics for measuring effectiveness. Methods in classification, clustering, and frequent pattern analysis. Selected topics from current advances in data mining.

Prerequisites: At least a C- in C S 272 and C S 278.

Credits: 3 (3)

Coordinator: Tuan Le

Textbook: Introduction to Data Mining (2nd Edition), by Pang-Ning Tan, Michael Steinbach, Anuj Karpatne, and Vipin Kumar. Pearson, ISBN: 978-0133128901; or Data Mining: Concepts and Techniques (3rd Edition), by Jiawei Han, Micheline Kamber and Jian Pei. Morgan Kaufmann Publishers, ISBN: 978-0123814791; or similar textbooks

BS degree role: selected elective

Course Learning Objectives

  1. Explain and recognize different data mining tasks such as data pre-processing, visualization, classification, regression, clustering, association rules, and anomaly detection
  2. Apply classical data mining / machine learning algorithms for classification, clustering, association rules, and anomaly detection
  3. Evaluate and compare the performance of different data mining / machine learning algorithms
  4. Utilize data mining algorithms to analyze data in real applications using a data mining tool

Course Practicum Requirements

  1. Be able to use programming tools (e.g., Python or Java) for solving data mining tasks
  2. Be able to use popular data mining packages and libraries (e.g., NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, or Weka)
  3. Be able to obtain, prepare, and process real-world datasets using tools (e.g., Beautiful Soup)

Course Topics

  1. Data, Data Pre-processing, Proximity
  2. Regression: Linear Regression
  3. Classification: Decision Trees, kNN, SVM, Naïve Bayes, Bagging, Boosting, Random Forests
  4. Clustering: k-means, Hierarchical Clustering, Gaussian Mixture Models, DBSCAN
  5. Association Analysis: Apriori, FP-Growth, GSP
  6. Anomaly Detection: Distance-based Approaches, Density-based Approaches, Clustering-based Approaches

Course Improvement Decisions

(Course improvement decisions or recommendations from past assessments)

  1. none

ABET Outcome Coverage

(Provide Mapping to ABET Student Outcomes)

  1. TBD

Other Notes

(Any important notes or issues to consider)

  1. none