
- 作 者:Anand Rajaraman ; Jeffrey D. Ullman
- 出 版 社:Cambridge University Press
- 出版年份:2012
- ISBN:1107015357
- 标注页数:316 页
- PDF页数:328 页
请阅读订购服务说明与试读!
订购服务说明
1、本站所有的书默认都是PDF格式,该格式图书只能阅读和打印,不能再次编辑。
2、除分上下册或者多册的情况下,一般PDF页数一定要大于标注页数才建议下单购买。【本资源328 ≥316页】
图书下载及付费说明
1、所有的电子图书为PDF格式,支持电脑、手机、平板等各类电子设备阅读;可以任意拷贝文件到不同的阅读设备里进行阅读。
2、电子图书在提交订单后一般半小时内处理完成,最晚48小时内处理完成。(非工作日购买会延迟)
3、所有的电子图书都是原书直接扫描方式制作而成。
1 Data Mining 1
1.1 What is Data Mining? 1
1.2 Statistical Limits on Data Mining 4
1.3 Things Useful to Know 7
1.4 Outline of the Book 15
1.5 Summary of Chapter 1 16
1.6 References for Chapter 1 17
2 Large-Scale File Systems and Map-Reduce 18
2.1 Distributed File Systems 18
2.2 Map-Reduce 21
2.3 Algorithms Using Map-Reduce 26
2.4 Extensions to Map-Reduce 37
2.5 Efficiency of Cluster-Computing Algorithms 42
2.6 Summary of Chapter 2 49
2.7 References for Chapter 2 51
3 Finding Similar Items 53
3.1 Applications of Near-Neighbor Search 53
3.2 Shingling of Documents 57
3.3 Similarity-Preserving Summaries of Sets 60
3.4 Locality-Sensitive Hashing for Documents 67
3.5 Distance Measures 71
3.6 The Theory of Locality-Sensitive Functions 77
3.7 LSH Families for Other Distance Measures 83
3.8 Applications of Locality-Sensitive Hashing 88
3.9 Methods for High Degrees of Similarity 96
3.10 Summary of Chapter 3 104
3.11 References for Chapter 3 106
4 Mining Data Streams 108
4.1 The Stream Data Model 108
4.2 Sampling Data in a Stream 112
4.3 Filtering Streams 115
4.4 Counting Distinct Elements in a Stream 118
4.5 Estimating Moments 122
4.6 Counting Ones in a Window 127
4.7 Decaying Windows 133
4.8 Summary of Chapter 4 136
4.9 References for Chapter 4 137
5 Link Analysis 139
5.1 PageRank 139
5.2 Efficient Computation of PageRank 153
5.3 Topic-Sensitive PageRank 159
5.4 Link Spam 163
5.5 Hubs and Authorities 167
5.6 Summary of Chapter 5 172
5.7 References for Chapter 5 175
6 Frequent Itemsets 176
6.1 The Market-Basket Model 176
6.2 Market Baskets and the A-Priori Algorithm 183
6.3 Handling Larger Datasets in Main Memory 192
6.4 Limited-Pass Algorithms 199
6.5 Counting Frequent Items in a Stream 205
6.6 Summary of Chapter 6 209
6.7 References for Chapter 6 211
7 Clustering 213
7.1 Introduction to Clustering Techniques 213
7.2 Hierarchical Clustering 217
7.3 K-means Algorithms 226
7.4 The CURE Algorithm 234
7.5 Clustering in Non-Euclidean Spaces 237
7.6 Clustering for Streams and Parallelism 241
7.7 Summary of Chapter 7 247
7.8 References for Chapter 7 250
8 Advertising on the Web 252
8.1 Issues in On-Line Advertising 252
8.2 On-Line Algorithms 255
8.3 The Matching Problem 258
8.4 The Adwords Problem 261
8.5 Adwords Implementation 270
8.6 Summary of Chapter 8 273
8.7 References for Chapter 8 275
9 Recommendation Systems 277
9.1 A Model for Recommendation Systems 277
9.2 Content-Based Recommendations 281
9.3 Collaborative Filtering 291
9.4 Dimensionality Reduction 297
9.5 The NetFlix Challenge 305
9.6 Summary of Chapter 9 306
9.7 References for Chapter 9 308
Index 310