The Ultimate Hands-On Hadoop: Tame your Big Data!
The world of Hadoop and “Big Data” can be intimidating – hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this Hadoop tutorial, you’ll not only understand what those systems are and how they fit together – but you’ll go hands-on and learn how to use them to solve real business problems!
Learn and master the most popular data engineering technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We’ll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.
- Install and work with a real Hadoop installation right on your desktop with Hortonworks (now part of Cloudera) and the Ambari UI
- Manage big data on a cluster with HDFS and MapReduce
- Write programs to analyze data on Hadoop with Pig and Spark
- Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
- Design real-world systems using the Hadoop ecosystem
- Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue
- Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm
Spark and Hadoop developers are hugely valued at companies with large amounts of data; these are very marketable skills to learn.
Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it’s not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.
This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It’s filled with hands-on activities and exercises, so you get some real experience in using Hadoop – it’s not just theory.
You’ll find a range of activities in this course for people at every level. If you’re a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you’re comfortable with command lines, we’ll show you how to work with them too. And if you’re a programmer, I’ll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.
You’ll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you can apply Hadoop to real-world problems. Plus a valuable completion certificate is waiting for you at the end!
Please note that the focus of this course is on application development, not Hadoop administration, although you will pick up some administration skills along the way.
Knowing how to wrangle “big data” is an incredibly valuable skill for today’s top tech employers. Don’t be left behind – enroll now!
- “The Ultimate Hands-On Hadoop… was a crucial discovery for me. I supplemented your course with a bunch of literature and conferences until I managed to land an interview. I can proudly say that I landed a job as a Big Data Engineer around a year after I started your course. Thanks so much for all the great content you have generated and the crystal clear explanations.” – Aldo Serrano
- “I honestly wouldn’t be where I am now without this course. Frank makes the complex simple by helping you through the process every step of the way. Highly recommended and worth your time especially the Spark environment. This course helped me achieve a far greater understanding of the environment and its capabilities.” – Tyler Buck
- 1. Udemy 101: Getting the Most From This Course (video lesson)
- 2. Tips for Using This Course (video lesson)
- 3. If you have trouble downloading Hortonworks Data Platform... (text lesson)
- 4. Warning for Apple M1 users (text lesson)
- 5. Installing Hadoop [Step by Step] (video lesson)
- 6. The Hortonworks and Cloudera Merger, and how it affects this course. (video lesson)
- 7. Hadoop Overview and History (video lesson)
- 8. Overview of the Hadoop Ecosystem (video lesson)
- 9. Important note (text lesson)
- 10. HDFS: What it is, and how it works (video lesson)
- 11. Alternate MovieLens download location (text lesson)
- 12. Installing the MovieLens Dataset (video lesson)
- 13. [Activity] Install the MovieLens dataset into HDFS using the command line (video lesson)
- 14. MapReduce: What it is, and how it works (video lesson)
- 15. How MapReduce distributes processing (video lesson)
- 16. MapReduce example: Break down movie ratings by rating score (video lesson)
- 17. [Activity] Install Python, MRJob, and nano (text lesson)
- 18. [Activity] Code up the ratings histogram MapReduce job and run it (video lesson)
- 19. [Exercise] Rank movies by their popularity (video lesson)
- 20. Note: Sorting will only work by partition. (text lesson)
- 21. [Activity] Check your results against mine! (video lesson)
- 22. Introducing Ambari (video lesson)
- 23. Introducing Pig (video lesson)
- 24. Example: Find the oldest movie with a 5-star rating using Pig (video lesson)
- 25. [Activity] Find old 5-star movies with Pig (video lesson)
- 26. More Pig Latin (video lesson)
- 27. [Exercise] Find the most-rated one-star movie (video lesson)
- 28. Pig Challenge: Compare Your Results to Mine! (video lesson)
- 29. Why Spark? (video lesson)
- 30. The Resilient Distributed Dataset (RDD) (video lesson)
- 31. [Activity] Find the movie with the lowest average rating - with RDDs (video lesson)
- 32. Datasets and Spark 2.0 (video lesson)
- 33. [Activity] Find the movie with the lowest average rating - with DataFrames (video lesson)
- 34. [Activity] Movie recommendations with MLlib (video lesson)
- 35. [Exercise] Filter the lowest-rated movies by number of ratings (video lesson)
- 36. [Activity] Check your results against mine! (video lesson)
- 37. What is Hive? (video lesson)
- 38. [Activity] Use Hive to find the most popular movie (video lesson)
- 39. How Hive works (video lesson)
- 40. [Exercise] Use Hive to find the movie with the highest average rating (video lesson)
- 41. Compare your solution to mine. (video lesson)
- 42. Integrating MySQL with Hadoop (video lesson)
- 43. Cheat sheet for the following lecture (text lesson)
- 44. [Activity] Install MySQL and import our movie data (video lesson)
- 45. [Activity] Use Sqoop to import data from MySQL to HDFS/Hive (video lesson)
- 46. [Activity] Use Sqoop to export data from Hadoop to MySQL (video lesson)
- 47. Why NoSQL? (video lesson)
- 48. What is HBase? (video lesson)
- 49. [Activity] Import movie ratings into HBase (video lesson)
- 50. [Activity] Use HBase with Pig to import data at scale. (video lesson)
- 51. Cassandra overview (video lesson)
- 52. If you have trouble installing Cassandra... (text lesson)
- 53. [Activity] Installing Cassandra (video lesson)
- 54. [Activity] Write Spark output into Cassandra (video lesson)
- 55. MongoDB overview (video lesson)
- 56. [Activity] Install MongoDB, and integrate Spark with MongoDB (video lesson)
- 57. [Activity] Using the MongoDB shell (video lesson)
- 58. Choosing a database technology (video lesson)
- 59. [Exercise] Choose a database for a given problem (video lesson)
- 60. Overview of Drill (video lesson)
- 61. [Activity] Setting up Drill (video lesson)
- 62. [Activity] Querying across multiple databases with Drill (video lesson)
- 63. Overview of Phoenix (video lesson)
- 64. [Activity] Install Phoenix and query HBase with it (video lesson)
- 65. [Activity] Integrate Phoenix with Pig (video lesson)
- 66. Overview of Presto (video lesson)
- 67. [Activity] Install Presto, and query Hive with it. (video lesson)
- 68. [Activity] Query both Cassandra and Hive using Presto. (video lesson)
- 69. YARN explained (video lesson)
- 70. Tez explained (video lesson)
- 71. [Activity] Use Hive on Tez and measure the performance benefit (video lesson)
- 72. Mesos explained (video lesson)
- 73. ZooKeeper explained (video lesson)
- 74. [Activity] Simulating a failing master with ZooKeeper (video lesson)
- 75. Oozie explained (video lesson)
- 76. [Activity] Set up a simple Oozie workflow (video lesson)
- 77. Zeppelin overview (video lesson)
- 78. [Activity] Use Zeppelin to analyze movie ratings, part 1 (video lesson)
- 79. [Activity] Use Zeppelin to analyze movie ratings, part 2 (video lesson)
- 80. Hue overview (video lesson)
- 81. Other technologies worth mentioning (video lesson)
- 82. Kafka explained (video lesson)
- 83. [Activity] Setting up Kafka, and publishing some data. (video lesson)
- 84. [Activity] Publishing web logs with Kafka (video lesson)
- 85. Flume explained (video lesson)
- 86. [Activity] Set up Flume and publish logs with it. (video lesson)
- 87. [Activity] Set up Flume to monitor a directory and store its data in HDFS (video lesson)
- 88. Spark Streaming: Introduction (video lesson)
- 89. [Activity] Analyze web logs published with Flume using Spark Streaming (video lesson)
- 90. [Exercise] Monitor Flume-published logs for errors in real time (video lesson)
- 91. Exercise solution: Aggregating HTTP access codes with Spark Streaming (video lesson)
- 92. Apache Storm: Introduction (video lesson)
- 93. [Activity] Count words with Storm (video lesson)
- 94. Flink: An Overview (video lesson)
- 95. [Activity] Counting words with Flink (video lesson)
- 96. The Best of the Rest (video lesson)
- 97. Review: How the pieces fit together (video lesson)
- 98. Understanding your requirements (video lesson)
- 99. Sample application: consume webserver logs and keep track of top-sellers (video lesson)
- 100. Sample application: serving movie recommendations to a website (video lesson)
- 101. [Exercise] Design a system to report web sessions per day (video lesson)
- 102. Exercise solution: Design a system to count daily sessions (video lesson)