CS 4315: Introduction to Data Mining and Information Retrieval

Lecturer: Byron Gao, bgao@txstate.edu, (512)245-0348, Comal 311D
Lectures: R 6:30-9:20pm @ ALK 119 / AVRY 366
Office hours: TR 5-6:30pm and 9:20-10:20pm
TA: Chris Bell (chris-bell@txstate.edu, Derrick M6, TR 5-6:30pm)

Textbook: required

dm Data Mining: Concepts and Techniques, 3rd edition (electronic version available from library)
Jiawei Han, Micheline Kamber and Jian Pei
Morgan Kaufmann, 2011
ISBN: 9780123814791
ir Introduction to Information Retrieval (free at http://nlp.stanford.edu/IR-book/information-retrieval-book.html)
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
Cambridge University Press, 2008
ISBN: 9780521865715

References: recommended


Academic honesty: Texas State Academic Honor Code

Outline: subject to change

Week Date Topics Readings Notes Events
1 Jan 24 Logistics; Introduction Chapter 1 ch1.pdf  
2 Jan 31 Getting to know your data Chapter 2 ch2.pdf a1.pdf (due 6pm Feb 21)
3 Feb 7 Classification Chapters 8,9 ch8-9.pdf project.pdf (due 6pm May 9); search.pdf; lyrics.csv; yahooAPI.pdf
4 Feb 14        
5 Feb 21 Clustering Chapters 10,11 ch10-11.pdf a2.pdf (due 6pm Mar 7); gen.exe; input.txt
6 Feb 28        
7 Mar 7 Frequent pattern and association analysis Chapter 6 ch6.pdf a3.doc (due 6pm Mar 28)
8 Mar 14 Midterm      
9 Mar 21       Spring break, no meeting
10 Mar 28 IR and web search: introduction; Term vocabulary and postings lists IR Chapters 1,2 a4.doc (due 6pm May 2)
11 Apr 4 Dictionaries; Index construction; Index compression; Term weighting and vector space model IR Chapters 3,4,5,6
12 Apr 11 Computing scores; Evaluation and result summaries IR Chapters 7,8
13 Apr 18 Web search basics IR Chapter 19 irch19.pdf  
14 Apr 25 Link analysis IR Chapter 21 irch21.pdf  
15 May 2 Final exam     Last lecture
16 May 9 Project presentation 8-10:30pm      
N/A N/A Data warehouse and OLAP technology; Faceted search Chapter 4 backup materials