CMDA 3654

Information

Topic: Intro to Data Analytics & Visualization

Lecture: Online Synchronous (MWF 1:25PM-2:15PM)

Instructor: Xin (Shayne) Xing, Email: xinxing AT vt.edu

TA: Hwasoo Shin, Email: shwasoo@vt.edu; Warren Geither, Email: wgeither@vt.edu


Syllabus

Course description & Prerequisites

Basic principles of data analytics; supervised and unsupervised statistical methods; basic deep learning methods for supervised learning; data visualization of standard-size and large datasets; basic programming in R.

This course is a required course for the Computational Modeling and Data Analytics Degree. The course sequence is listed at the 3000 level so that the students will have been previously exposed to introductory mathematics (linear algebra, multivariate calculus), a programming language, introductory statistics (basic mathematical statistics), and probability. Students without strong preparation in these will need to invest significant additional time to fill in the gaps.

All class materials are distributed online; for example, you may view most class notes and homework assignments on the Schedule. Canvas is used to report scores from quizzes, homework and the final project.


Recommended Textbooks

An Introduction to Statistical Learning with Applications in R
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

Advanced topics:

The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Trevor Hastie, Robert Tibshirani and Jerome Friedman

Deep Learning
Ian Goodfellow, Yoshua Bengio and Aaron Courville


In-class Quiz

The in-class quiz must be submitted on Canvas within one hour after class ends. A quiz submitted on time is guaranteed a score of at least 90. A quiz not submitted on time receives a zero. The two lowest quiz scores will be dropped.


Homework Assignments

Weekly homework assignments will be posted in the online Schedule and are due each Friday at the beginning of class, starting from the second week, unless otherwise announced in class. Late homework is penalized, and missed homework receives a zero. Homework assignments must be submitted on Canvas before the due time. Grades will be returned to you on Canvas.

Students are expected to read the slides and the referenced materials listed in the Schedule. Your work must be legible, include your name, and be submitted as a single PDF file. You are expected to put in 6-8 hours of work outside of class. A few of you will do well with less time than this, and a few of you will need more. You must write up your own final answers and write your own code: copying homework solutions is not allowed.


Final Project

There will be one final project. The final report must be a well-written PDF document with the following sections: Introduction, Data Visualization, Model & Methods, and Results. The report and the code must be your own work. Please see Final Project Instructions for details.


Final Exam

The final exam will be held at the following date and time.
Exam Date: December 15, 2020
Begin Time: 7:45AM
End Time: 9:45AM


Grades

Your grade will consist of in-class quizzes (5%), homework (50%), a final project (25%), and a final exam (20%).


Quiz 5%
Homework 50%
Final Project 25%
Exam 20%

For each category, the score ranges from 0 to 100. For quizzes, the two lowest scores are dropped and the average of the remaining scores is the final quiz score. The total score is the weighted average of the scores in all categories. A total score in 90-100 is guaranteed at least an A-. A total score in 80-90 is guaranteed at least a B-. A total score in 70-80 is guaranteed at least a C-. A total score in 60-69 is guaranteed at least a D-. The lower bound of each interval may be lowered, depending on overall class performance.
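The grade calculation described above can be sketched in a few lines of Python. This is only an illustration, not the official grading script. The function names are hypothetical, homework assignments are assumed to be equally weighted (the syllabus does not say), and a total of exactly 90, 80, or 70 is assumed to fall in the higher interval.

```python
from statistics import mean

# Category weights as stated in the syllabus.
WEIGHTS = {"quiz": 0.05, "homework": 0.50, "project": 0.25, "exam": 0.20}


def quiz_average(quiz_scores, n_drop=2):
    """Average of the quiz scores after dropping the n_drop lowest ones."""
    kept = sorted(quiz_scores)[n_drop:]
    if not kept:
        raise ValueError("need more than n_drop quiz scores")
    return mean(kept)


def total_score(quiz_scores, homework_scores, project, exam):
    """Weighted average of the four category scores (each on a 0-100 scale)."""
    category = {
        "quiz": quiz_average(quiz_scores),
        # Assumption: every homework counts equally toward the homework score.
        "homework": mean(homework_scores),
        "project": project,
        "exam": exam,
    }
    return sum(WEIGHTS[name] * category[name] for name in WEIGHTS)


def guaranteed_minimum(total):
    """Lowest letter grade guaranteed for a total score, or None below 60."""
    # Assumption: a boundary value (e.g. exactly 90) counts for the higher interval.
    for cutoff, letter in [(90, "A-"), (80, "B-"), (70, "C-"), (60, "D-")]:
        if total >= cutoff:
            return letter
    return None


# Example: two missed quizzes (zeros) are dropped before averaging.
example_total = total_score(
    quiz_scores=[0, 0, 90, 100, 95],
    homework_scores=[80, 90],
    project=90,
    exam=70,
)
example_letter = guaranteed_minimum(example_total)
```

In this example the quiz average is 95, the homework average is 85, and the total works out to roughly 83.75, which is guaranteed at least a B-. Because the lower bounds may be lowered, the actual letter grade can be higher than what this sketch reports.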


Academic Integrity

The Undergraduate Honor Code pledge that each member of the university community agrees to abide by states:

“As a Hokie, I will conduct myself with honor and integrity at all times. I will not lie, cheat, or steal, nor will I accept the actions of those who do.”

Students enrolled in this course are responsible for abiding by the Honor Code. A student who has doubts about how the Honor Code applies to any assignment is responsible for obtaining specific guidance from the course instructor before submitting the assignment for evaluation. Ignorance of the rules does not exclude any member of the University community from the requirements and expectations of the Honor Code. Academic integrity expectations are the same for online classes as they are for in person classes. All university policies and procedures apply in any Virginia Tech academic environment. For additional information about the Honor Code, please visit: https://www.honorsystem.vt.edu/

Honor Code Pledge for Assignments: The Virginia Tech honor code pledge for assignments is as follows:

“I have neither given nor received unauthorized assistance on this assignment.”

The pledge is to be written out on all graded assignments at the university and signed by the student. The honor pledge represents an expression of the student’s support of the Honor Code.

The field of Computational Modeling and Data Analytics requires professionals who act with the highest ethical standards. CMDA teaches skills that empower you to have a tremendous impact upon the world. We teach you these skills with the expectation that you will exercise them responsibly.

Responsible practice is a habit forged during your undergraduate studies. CMDA majors demonstrate their sound ethical foundation by completely adhering to the Virginia Tech Honor Code in all their courses. Please read the detailed policy at https://personal.math.vt.edu/embree/cmda_integrity.pdf

Schedule

Time Materials Homework
Week 01 (08/24-08/28) Lecture 1: Intro to data science (video)
Lecture 2: Syllabus and R installation (video)
Lecture 3: Intro to R Programming 1 (video)
Readings: Artificial Intelligence-The Revolution Hasn't Happened Yet
Install R, RStudio and knit to pdf on your laptop (Mac/Windows)
An Introduction to R
R Markdown (cheat sheet)
Homework 01
Code
(due on 1:25PM, 9/4.)
Week 02 (08/31-09/04) Lecture 4: Intro to R Programming 2 (code) (video)
Lecture 5: Intro to R Programming 3 (code) (video)
Lecture 6: Intro to R Programming 4 (code) (video)
Readings: Find basic R functions in the R basic cheat sheet
Homework 02
Code
(due on 1:25PM, 9/11.)
Week 03 (09/07-09/11) Lecture 7: (Labor Day)
Lecture 8: Control Flows (code) (video)
Lecture 9: Advanced R functions (code) (video)
Readings: More on functions
Leibniz formula for pi
Homework 03
Code
(due on 1:25PM, 9/18.)
Week 04 (09/14-09/18) Lecture 10: Data Input (code)
Lecture 11: Data Cleaning (code)
Lecture 12: Case Studies (code)
Readings: Data input and Cleaning Cheatsheet
Homework 04
Code
(due on 1:25PM, 9/25.)
Week 05 (09/21-09/25) Lecture 13: Visualization in R 1 (code)
Lecture 14: Visualization in R 2 (code)
Lecture 15: Visualization in R 3 (code)
Readings: ggplot2: Elegant Graphics for Data Analysis (cheat sheet)
Homework 05
Code
(due on 1:25PM, 10/2.)
Week 06 (09/28-10/02) Lecture 16: Simple Linear Regression 1 (code)
Lecture 17: Simple Linear Regression 2 (code)
Lecture 18: Simple Linear Regression 3 (code)
Readings: Chapter 3 in An Introduction to Statistical Learning with Applications in R
Homework 06
Code
(due on 1:25PM, 10/9.)
Week 07 (10/05-10/09) Lecture 19: Multiple Linear Regression 1 (code)
Lecture 20: Multiple Linear Regression 2 (code)
Lecture 21: Multiple Linear Regression 3 (code)
Readings: Chapter 3 in An Introduction to Statistical Learning with Applications in R
Homework 07
Code
(due on 1:25PM, 10/16.)
Week 08 (10/12-10/16) Lecture 22: Logistic Regression 1 (code)
Lecture 23: Logistic Regression 2 (code)
Lecture 24: (Fall Break)
Readings: Chapter 4 in An Introduction to Statistical Learning with Applications in R
Homework 08
Code
(due on 1:25PM, 10/23.)
Week 09 (10/19-10/23) Lecture 25: Shrinkage Regression (code)
Lecture 26: LDA and QDA (code)
Lecture 27: Real Examples (code)
Readings: Chapter 4.4 and 6.2 in An Introduction to Statistical Learning with Applications in R
Homework 09
Code
(due on 1:25PM, 10/30.)
Week 10 (10/26-10/30) Lecture 28: Principal Component Analysis 1 (code)
Lecture 29: Principal Component Analysis 2 (code)
Lecture 30: Non-linear Dimension Reduction (code)
Readings: Chapter 10 in An Introduction to Statistical Learning with Applications in R
Homework 10
Code
(due on 1:25PM, 11/6.)
Week 11 (11/02-11/06) Lecture 31: Clustering via K-means (code)
Lecture 32: Hierarchical Clustering (code)
Lecture 33: Heatmap and Real Examples (code)
Readings: Chapter 10 in An Introduction to Statistical Learning with Applications in R
Homework 11
Code
(due on 1:25PM, 11/13.)
Week 12 (11/09-11/13) Lecture 34: Intro to Neural Network (code)
Lecture 35: Backpropagation
Lecture 36: Stochastic Gradient Descent (code)
Readings: The MNIST database of handwritten digits
Homework 12
Code
(due on 1:25PM, 11/20.)
Week 13 (11/16-11/20) Lecture 37: Convolutional Neural Networks (code)
Lecture 38: CNN Architectures (code)
Lecture 39: More CNN Architectures (code)
Readings: 2018 Turing Award
Homework 13
Code
(due on 1:25PM, 11/30.)
Week 14 (11/30-12/04) Lecture 40: Advanced Topics in NNs 1 (code)
Lecture 41: Advanced Topics in NNs 2
Sample Exam
Final Project
Final Project Instructions
Due on Dec. 9