Mining of massive datasets

Authors: Anand Rajaraman; Jeffrey D. Ullman
Publisher: Cambridge University Press
Year of publication: 2012
ISBN: 1107015357
Listed page count: 316 pages
PDF page count: 328 pages

1 Data Mining 1

1.1 What is Data Mining? 1

1.2 Statistical Limits on Data Mining 4

1.3 Things Useful to Know 7

1.4 Outline of the Book 15

1.5 Summary of Chapter 1 16

1.6 References for Chapter 1 17

2 Large-Scale File Systems and Map-Reduce 18

2.1 Distributed File Systems 18

2.2 Map-Reduce 21

2.3 Algorithms Using Map-Reduce 26

2.4 Extensions to Map-Reduce 37

2.5 Efficiency of Cluster-Computing Algorithms 42

2.6 Summary of Chapter 2 49

2.7 References for Chapter 2 51

3 Finding Similar Items 53

3.1 Applications of Near-Neighbor Search 53

3.2 Shingling of Documents 57

3.3 Similarity-Preserving Summaries of Sets 60

3.4 Locality-Sensitive Hashing for Documents 67

3.5 Distance Measures 71

3.6 The Theory of Locality-Sensitive Functions 77

3.7 LSH Families for Other Distance Measures 83

3.8 Applications of Locality-Sensitive Hashing 88

3.9 Methods for High Degrees of Similarity 96

3.10 Summary of Chapter 3 104

3.11 References for Chapter 3 106

4 Mining Data Streams 108

4.1 The Stream Data Model 108

4.2 Sampling Data in a Stream 112

4.3 Filtering Streams 115

4.4 Counting Distinct Elements in a Stream 118

4.5 Estimating Moments 122

4.6 Counting Ones in a Window 127

4.7 Decaying Windows 133

4.8 Summary of Chapter 4 136

4.9 References for Chapter 4 137

5 Link Analysis 139

5.1 PageRank 139

5.2 Efficient Computation of PageRank 153

5.3 Topic-Sensitive PageRank 159

5.4 Link Spam 163

5.5 Hubs and Authorities 167

5.6 Summary of Chapter 5 172

5.7 References for Chapter 5 175

6 Frequent Itemsets 176

6.1 The Market-Basket Model 176

6.2 Market Baskets and the A-Priori Algorithm 183

6.3 Handling Larger Datasets in Main Memory 192

6.4 Limited-Pass Algorithms 199

6.5 Counting Frequent Items in a Stream 205

6.6 Summary of Chapter 6 209

6.7 References for Chapter 6 211

7 Clustering 213

7.1 Introduction to Clustering Techniques 213

7.2 Hierarchical Clustering 217

7.3 K-means Algorithms 226

7.4 The CURE Algorithm 234

7.5 Clustering in Non-Euclidean Spaces 237

7.6 Clustering for Streams and Parallelism 241

7.7 Summary of Chapter 7 247

7.8 References for Chapter 7 250

8 Advertising on the Web 252

8.1 Issues in On-Line Advertising 252

8.2 On-Line Algorithms 255

8.3 The Matching Problem 258

8.4 The Adwords Problem 261

8.5 Adwords Implementation 270

8.6 Summary of Chapter 8 273

8.7 References for Chapter 8 275

9 Recommendation Systems 277

9.1 A Model for Recommendation Systems 277

9.2 Content-Based Recommendations 281

9.3 Collaborative Filtering 291

9.4 Dimensionality Reduction 297

9.5 The NetFlix Challenge 305

9.6 Summary of Chapter 9 306

9.7 References for Chapter 9 308

Index 310