The Ultimate Hands-On Hadoop: Tame your Big Data!
The world of Hadoop and “Big Data” can be intimidating – hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this Hadoop tutorial, you’ll not only understand what those systems are and how they fit together – but you’ll go hands-on and learn how to use them to solve real business problems!
Learn and master the most popular data engineering technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We’ll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.
- Install and work with a real Hadoop installation right on your desktop with Hortonworks (now part of Cloudera) and the Ambari UI
- Manage big data on a cluster with HDFS and MapReduce
- Write programs to analyze data on Hadoop with Pig and Spark
- Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
- Design real-world systems using the Hadoop ecosystem
- Learn how your cluster is managed with YARN, Mesos, ZooKeeper, Oozie, Zeppelin, and Hue
- Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm
Spark and Hadoop developers are hugely valued at companies with large amounts of data; these are very marketable skills to learn.
Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it’s not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.
This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It’s filled with hands-on activities and exercises, so you get some real experience in using Hadoop – it’s not just theory.
You’ll find a range of activities in this course for people at every level. If you’re a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you’re comfortable with command lines, we’ll show you how to work with them too. And if you’re a programmer, I’ll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.
You’ll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you can apply Hadoop to real-world problems. Plus a valuable completion certificate is waiting for you at the end!
Please note that the focus of this course is on application development, not Hadoop administration, although you will pick up some administration skills along the way.
Knowing how to wrangle “big data” is an incredibly valuable skill for today’s top tech employers. Don’t be left behind – enroll now!
- “The Ultimate Hands-On Hadoop… was a crucial discovery for me. I supplemented your course with a bunch of literature and conferences until I managed to land an interview. I can proudly say that I landed a job as a Big Data Engineer around a year after I started your course. Thanks so much for all the great content you have generated and the crystal clear explanations.” – Aldo Serrano
- “I honestly wouldn’t be where I am now without this course. Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time especially the Spark environment. This course helped me achieve a far greater understanding of the environment and its capabilities.” – Tyler Buck
- 1. Udemy 101: Getting the Most From This Course (video lesson)
- 2. Tips for Using This Course (video lesson)
- 3. If you have trouble downloading Hortonworks Data Platform... (text lesson)
- 4. Warning for Apple M1 users (text lesson)
- 5. Installing Hadoop [Step by Step] (video lesson)
- 6. The Hortonworks and Cloudera Merger, and how it affects this course (video lesson)
- 7. Hadoop Overview and History (video lesson)
- 8. Overview of the Hadoop Ecosystem (video lesson)
- 9. Important note (text lesson)
- 10. HDFS: What it is, and how it works (video lesson)
- 11. Alternate MovieLens download location (text lesson)
- 12. Installing the MovieLens Dataset (video lesson)
- 13. [Activity] Install the MovieLens dataset into HDFS using the command line (video lesson)
- 14. MapReduce: What it is, and how it works (video lesson)
- 15. How MapReduce distributes processing (video lesson)
- 16. MapReduce example: Break down movie ratings by rating score (video lesson)
- 17. [Activity] Install Python, MRJob, and nano (text lesson)
- 18. [Activity] Code up the ratings histogram MapReduce job and run it (video lesson)
- 19. [Exercise] Rank movies by their popularity (video lesson)
- 20. Note: Sorting will only work by partition (text lesson)
- 21. [Activity] Check your results against mine! (video lesson)
- 22. Introducing Ambari (video lesson)
- 23. Introducing Pig (video lesson)
- 24. Example: Find the oldest movie with a 5-star rating using Pig (video lesson)
- 25. [Activity] Find old 5-star movies with Pig (video lesson)
- 26. More Pig Latin (video lesson)
- 27. [Exercise] Find the most-rated one-star movie (video lesson)
- 28. Pig Challenge: Compare Your Results to Mine! (video lesson)
- 29. Why Spark? (video lesson)
- 30. The Resilient Distributed Dataset (RDD) (video lesson)
- 31. [Activity] Find the movie with the lowest average rating - with RDDs (video lesson)
- 32. Datasets and Spark 2.0 (video lesson)
- 33. [Activity] Find the movie with the lowest average rating - with DataFrames (video lesson)
- 34. [Activity] Movie recommendations with MLlib (video lesson)
- 35. [Exercise] Filter the lowest-rated movies by number of ratings (video lesson)
- 36. [Activity] Check your results against mine! (video lesson)
- 37. What is Hive? (video lesson)
- 38. [Activity] Use Hive to find the most popular movie (video lesson)
- 39. How Hive works (video lesson)
- 40. [Exercise] Use Hive to find the movie with the highest average rating (video lesson)
- 41. Compare your solution to mine (video lesson)
- 42. Integrating MySQL with Hadoop (video lesson)
- 43. Cheat sheet for the following lecture (text lesson)
- 44. [Activity] Install MySQL and import our movie data (video lesson)
- 45. [Activity] Use Sqoop to import data from MySQL to HDFS/Hive (video lesson)
- 46. [Activity] Use Sqoop to export data from Hadoop to MySQL (video lesson)
- 47. Why NoSQL? (video lesson)
- 48. What is HBase (video lesson)
- 49. [Activity] Import movie ratings into HBase (video lesson)
- 50. [Activity] Use HBase with Pig to import data at scale (video lesson)
- 51. Cassandra overview (video lesson)
- 52. If you have trouble installing Cassandra... (text lesson)
- 53. [Activity] Installing Cassandra (video lesson)
- 54. [Activity] Write Spark output into Cassandra (video lesson)
- 55. MongoDB overview (video lesson)
- 56. [Activity] Install MongoDB, and integrate Spark with MongoDB (video lesson)
- 57. [Activity] Using the MongoDB shell (video lesson)
- 58. Choosing a database technology (video lesson)
- 59. [Exercise] Choose a database for a given problem (video lesson)
- 60. Overview of Drill (video lesson)
- 61. [Activity] Setting up Drill (video lesson)
- 62. [Activity] Querying across multiple databases with Drill (video lesson)
- 63. Overview of Phoenix (video lesson)
- 64. [Activity] Install Phoenix and query HBase with it (video lesson)
- 65. [Activity] Integrate Phoenix with Pig (video lesson)
- 66. Overview of Presto (video lesson)
- 67. [Activity] Install Presto, and query Hive with it (video lesson)
- 68. [Activity] Query both Cassandra and Hive using Presto (video lesson)
- 69. YARN explained (video lesson)
- 70. Tez explained (video lesson)
- 71. [Activity] Use Hive on Tez and measure the performance benefit (video lesson)
- 72. Mesos explained (video lesson)
- 73. ZooKeeper explained (video lesson)
- 74. [Activity] Simulating a failing master with ZooKeeper (video lesson)
- 75. Oozie explained (video lesson)
- 76. [Activity] Set up a simple Oozie workflow (video lesson)
- 77. Zeppelin overview (video lesson)
- 78. [Activity] Use Zeppelin to analyze movie ratings, part 1 (video lesson)
- 79. [Activity] Use Zeppelin to analyze movie ratings, part 2 (video lesson)
- 80. Hue overview (video lesson)
- 81. Other technologies worth mentioning (video lesson)
- 82. Kafka explained (video lesson)
- 83. [Activity] Setting up Kafka, and publishing some data (video lesson)
- 84. [Activity] Publishing web logs with Kafka (video lesson)
- 85. Flume explained (video lesson)
- 86. [Activity] Set up Flume and publish logs with it (video lesson)
- 87. [Activity] Set up Flume to monitor a directory and store its data in HDFS (video lesson)
- 88. Spark Streaming: Introduction (video lesson)
- 89. [Activity] Analyze web logs published with Flume using Spark Streaming (video lesson)
- 90. [Exercise] Monitor Flume-published logs for errors in real time (video lesson)
- 91. Exercise solution: Aggregating HTTP access codes with Spark Streaming (video lesson)
- 92. Apache Storm: Introduction (video lesson)
- 93. [Activity] Count words with Storm (video lesson)
- 94. Flink: An Overview (video lesson)
- 95. [Activity] Counting words with Flink (video lesson)
- 96. The Best of the Rest (video lesson)
- 97. Review: How the pieces fit together (video lesson)
- 98. Understanding your requirements (video lesson)
- 99. Sample application: consume webserver logs and keep track of top-sellers (video lesson)
- 100. Sample application: serving movie recommendations to a website (video lesson)
- 101. [Exercise] Design a system to report web sessions per day (video lesson)
- 102. Exercise solution: Design a system to count daily sessions (video lesson)