Mining of Massive Datasets

£9.9
FREE Shipping

Mining of Massive Datasets

Mining of Massive Datasets

RRP: £99
Price: £9.9
£9.9 FREE Shipping

In stock

We accept the following payment methods

Description

Together with each chapter there is aslo a set of lecture slides that we use for teaching Stanford CS246: Mining The difference leads to a new class of algorithms for finding frequent itemsets. We begin with the A-Priori Algorithm, which works by eliminating most large sets as candidates by looking first at smaller sets and recognizing that a large set cannot be frequent unless all its subsets are. We then consider various improvements to the basic A-Priori idea, concentrating on very large data sets that stress the available main memory. CS246: Mining Massive Datasets is graduate level course that discusses data mining and machine learning algorithms for analyzing very large amounts of data. The emphasis is on Map Reduce as a tool for creating parallel algorithms that can process very large amounts of data. Familiarity with basic linear algebra (e.g., any of Math 51, Math 103, Math 113, CS 205, or EE 263 would be much more than necessary).

Although theoretical issues are discussed where relevant, the focus of the text is clearly on practical issues. Readers interested in a more rigorous treatment of the theoretical foundations for these techniques should look elsewhere. Fortunately, each chapter contains key references to guide the more formally minded reader. Knowledge of basic computer science principles and skills, at a level sufficient to write a reasonably non-trivial computer program (e.g., CS107 or CS145 or equivalent are recommended). most welcome. Please let us know if you are using these materials in your course and we will list and link to your course.The problem of finding frequent itemsets differs from the similarity search discussed in Chapter 3. Here we are interested in the absolute number of baskets that contain a particular set of items. In Chapter 3 we wanted items that have a large fraction of their baskets in common, even if the absolute number of baskets is small. Massive Datasets course. Note that the slides do not necessarily cover all the material convered in the corresponding chapters. CS341 Project in Mining Massive Data Sets is an advanced project based course. Students work on data mining and machine learning algorithms for analyzing very large amounts of data. Both interesting big datasets as well as computational infrastructure (large MapReduce cluster) are provided by course staff. It contains new material on Spark, Tensorflow, minhashing, community-finding, simrank, graph algorithms, and decision trees.

Public resources: The lecture slides and assignments will be posted online as the course progresses. We are happy for anyone to use these resources, but we cannot grade the work of any students who are not officially enrolled in the class. To begin, we introduce the “market-basket” model of data, which is essentially a many-many relationship between two kinds of elements, called “items” and “baskets,” but with some assumptions about the shape of the data. The frequent-itemsets problem is that of finding sets of items that appear in (are related to) many of the same baskets. We begin by reviewing the notions of distance measures and spaces. The two major approaches to clustering – hierarchical and point-assignment – are defined. We then turn to a discussion of the “curse of dimensionality,” which makes clustering in high-dimensional spaces difficult, but also, as we shall see, enables some simplifications if used correctly in a clustering algorithm. CS224W: Social and Information Networks is graduate level course that covers recent research on the structure and analysis of such large social and information networks and on models and algorithms that abstract their basic properties. Class explores how to practically analyze large scale network data and how to reason about it through models for network structure and evolution.Stanford Mining Massive Datasets graduate certificate by completing a sequence of four Stanford Computer Science courses. A graduate certificate is a great way to keep the skills and knowledge in your field current. More information is available at the Stanford Center for Professional Development (SCPD). Lecture slides will be posted here shortly before each lecture. If you wish to view slides further in advance, refer to 2022 course offering's slides, which are mostly similar.



  • Fruugo ID: 258392218-563234582
  • EAN: 764486781913
  • Sold by: Fruugo

Delivery & Returns

Fruugo

Address: UK
All products: Visit Fruugo Shop