CS 4315: Introduction to Data Mining and Information Retrieval

Lecturer: Byron Gao, bgao@txstate.edu, (512)245-0348, Comal 311D
Lectures: R 6:30-9:20pm @ Derr 235 / AVRY 364
Office hours: MR 5-6:30pm and 9:20-10:20pm
TA: Bin Duan (bin.duan@txstate.edu, Comal 303, R 1-5pm)

Textbooks: required

dm Data Mining: Concepts and Techniques, 3rd edition (electronic version available from library)
Jiawei Han, Micheline Kamber and Jian Pei
Morgan Kaufmann, 2011
ISBN: 9780123814791
ir Introduction to Information Retrieval (free at http://nlp.stanford.edu/IR-book/information-retrieval-book.html)
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze
Cambridge University Press, 2008
ISBN: 9780521865715

References: recommended


Grading:  

Academic honesty: Texas State Academic Honor Code

Outline: subject to change

Week Date Topics Readings Notes Events
1 Jan 23 Logistics; Introduction Chapter 1 ch1.pdf  
2 Jan 30 Getting to know your data Chapter 2 ch2.pdf a1.pdf (due 6pm Feb 20)
3 Feb 6 Classification Chapters 8,9 ch8-9.pdf search.pdf (due 6pm Apr 30); lyrics.csv; yahooAPI.pdf
4 Feb 13        
5 Feb 20 Clustering Chapters 10,11 ch10-11.pdf a2.pdf (due 6pm Mar 5); gen.exe; input.txt
6 Feb 27        
7 Mar 5 Frequent pattern and association analysis Chapter 6 ch6.pdf a3.doc (due 6pm Mar 26)
8 Mar 12 Midterm      
9 Mar 19       Spring break, no meeting
10 Mar 26 IR and web search: introduction; Term vocabulary and postings lists IR Chapter 1; IR Chapter 2 irch1.pdf; irch2.pdf a4.doc (due 6pm Apr 30)
11 Apr 2 Dictionaries; Index construction; Index compression; Term weighting and vector space model IR Chapters 3,4,5; IR Chapter 6 irch3-4-5.pdf; irch6.pdf
12 Apr 9 Computing scores; Evaluation and result summaries IR Chapter 7; IR Chapter 8 irch7.pdf; irch8.pdf
13 Apr 16 Web search basics IR Chapter 19 irch19.pdf  
14 Apr 23 Link analysis IR Chapter 21 irch21.pdf  
15 Apr 30       Last lecture
16 May 7 Final exam 8-10:30pm
N/A N/A Data warehouse and OLAP technology; Faceted search Chapter 4 ch4.pdf; facetedsearch.pdf Backup materials

Resources: