In this course, you will start by learning what the Hadoop Distributed File System (HDFS) is and the most common Hadoop commands required to work with it.
Then you will be introduced to Sqoop Import:
- Understand the lifecycle of a Sqoop command.
- Use the sqoop import command to migrate data from MySQL to HDFS.
- Use the sqoop import command to migrate data from MySQL to Hive.
- Use various file formats, compressions, field delimiters, WHERE clauses, and free-form queries while importing data.
- Understand split-by and boundary queries.
- Use incremental mode to migrate data from MySQL to HDFS (a sample command follows this list).
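To make these topics concrete, here is a minimal sketch of a sqoop import run; the JDBC URL, credentials, table, and paths are illustrative placeholders, not values from the course.

    # Sketch: import a MySQL table into HDFS as Snappy-compressed Parquet.
    # All connection details below are hypothetical.
    sqoop import \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username dbuser \
      --password dbpass \
      --table orders \
      --where "order_status = 'CLOSED'" \
      --split-by order_id \
      --target-dir /user/hadoop/orders \
      --as-parquetfile \
      --compress \
      --compression-codec org.apache.hadoop.io.compress.SnappyCodec

    # For incremental runs, append-mode flags pick up only new rows:
    #   --incremental append --check-column order_id --last-value 0

Importing into Hive instead of a plain HDFS directory is done with --hive-import (optionally --hive-table) in place of --target-dir.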
Next, you will learn Sqoop Export to migrate data in the opposite direction:
- Understand what Sqoop Export is.
- Use sqoop export to migrate data from HDFS to MySQL.
- Use sqoop export to migrate data from Hive to MySQL (a sample command follows this list).
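A minimal sqoop export sketch for the reverse direction; the table and directory names are hypothetical. Exporting a Hive-managed table amounts to exporting its warehouse directory, declaring Hive's default ^A field delimiter.

    # Sketch: push an HDFS directory into an existing MySQL table.
    sqoop export \
      --connect jdbc:mysql://localhost:3306/retail_db \
      --username dbuser \
      --password dbpass \
      --table daily_revenue \
      --export-dir /user/hive/warehouse/daily_revenue \
      --input-fields-terminated-by '\001'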
After that, you will learn about Apache Flume:
- Understand the Flume architecture.
- Use Flume to ingest data from Twitter and save it to HDFS.
- Use Flume to ingest data from netcat and save it to HDFS (a sample agent configuration follows this list).
- Use Flume to ingest data from an exec source and show it on the console.
- Describe Flume interceptors and see examples of using them.
- Flume with multiple agents.
- Flume consolidation.
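To illustrate the source-channel-sink model, here is a minimal agent configuration for the netcat-to-HDFS case; the agent name a1, the component names, and the HDFS path are illustrative placeholders.

    # netcat-hdfs.conf: netcat source -> memory channel -> HDFS sink
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # Listen for newline-separated events on localhost:44444
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Buffer events in memory between source and sink
    a1.channels.c1.type = memory

    # Write events to HDFS as plain text
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /user/flume/netcat_events
    a1.sinks.k1.hdfs.fileType = DataStream

    # Bind source and sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1

An agent built from this file would be launched with something like flume-ng agent --name a1 --conf-file netcat-hdfs.conf.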
In the next section, we will learn about Apache Hive:
- Hive Intro
- External & Managed Tables (a HiveQL sketch follows this list)
- Working with Different File Formats – Parquet, Avro
- Compressions
- Hive Analysis
- Hive String Functions
- Hive Date Functions
- Partitioning
- Bucketing
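As a preview of the table types and functions above, a short HiveQL sketch; the database objects and HDFS location are hypothetical.

    -- Managed table: Hive owns both the metadata and the data files.
    CREATE TABLE orders_managed (
      order_id     INT,
      order_date   STRING,
      order_status STRING
    )
    STORED AS PARQUET;

    -- External table: Hive tracks only metadata; dropping the table
    -- leaves the files at LOCATION untouched.
    CREATE EXTERNAL TABLE orders_ext (
      order_id     INT,
      order_date   STRING,
      order_status STRING
    )
    STORED AS PARQUET
    LOCATION '/user/hadoop/orders';

    -- String and date functions in a simple aggregation.
    SELECT upper(order_status)       AS status,
           year(to_date(order_date)) AS order_year,
           count(*)                  AS cnt
    FROM   orders_ext
    GROUP  BY upper(order_status), year(to_date(order_date));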
Finally, you will learn about Apache Spark:
- Spark Intro
- Cluster Overview
- RDD
- DAG / Stages / Tasks
- Actions & Transformations
- Transformation & Action Examples
- Spark DataFrames
- Spark DataFrames – working with different file formats & compression
- DataFrame APIs
- Spark SQL
- DataFrame Examples
- Spark with Cassandra Integration (a connector sketch follows this list)
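Of the topics above, the Cassandra integration is the least self-explanatory, so here is a hedged PySpark sketch; it assumes the spark-cassandra-connector is supplied at submit time, and the keyspace and table names are made up.

    # Sketch: read a Cassandra table into a Spark DataFrame.
    # Requires the spark-cassandra-connector, e.g. via
    #   spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.12:3.4.1 ...
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("cassandra-demo")
             .config("spark.cassandra.connection.host", "127.0.0.1")
             .getOrCreate())

    customers = (spark.read
                 .format("org.apache.spark.sql.cassandra")
                 .options(keyspace="shop", table="customers")
                 .load())

    customers.show()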
Sqoop Import
Sqoop Export
- 9. Sqoop Introduction
- 10. Managing Target Directories
- 11. Working with Parquet File Format
- 12. Working with Avro File Format
- 13. Working with Different Compressions
- 14. Conditional Imports
- 15. Split-by and Boundary Queries
- 16. Field Delimiters
- 17. Incremental Appends
- 18. Sqoop-Hive Cluster Fix
- 19. Sqoop Hive Import
- 20. Sqoop List Tables/Databases
- 21. Sqoop Assignment 1
- 22. Sqoop Assignment 2
- 23. Sqoop Import Practice 1
- 24. Sqoop Import Practice 2
Apache Flume
Apache Hive
- 37. Hive Introduction
- 38. Hive Database
- 39. Hive Managed Tables
- 40. Hive External Tables
- 41. Hive Inserts
- 42. Hive Analytics
- 43. Working with Parquet
- 44. Compressing Parquet
- 45. Working with Fixed File Format
- 46. Alter Command
- 47. Hive String Functions
- 48. Hive Date Functions
- 49. Hive Partitioning
- 50. Hive Bucketing (a partitioning and bucketing sketch follows this list)
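Since partitioning and bucketing close out the Hive lectures, a brief DDL sketch of both; all names are hypothetical.

    -- Partitioned table: one HDFS subdirectory per order_month value.
    CREATE TABLE orders_part (
      order_id     INT,
      order_status STRING
    )
    PARTITIONED BY (order_month STRING)
    STORED AS PARQUET;

    -- Bucketed table: rows hashed on order_id into 8 files.
    CREATE TABLE orders_bucketed (
      order_id     INT,
      order_status STRING
    )
    CLUSTERED BY (order_id) INTO 8 BUCKETS
    STORED AS PARQUET;

    -- Dynamic-partition insert from an existing table.
    SET hive.exec.dynamic.partition = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    INSERT INTO TABLE orders_part PARTITION (order_month)
    SELECT order_id, order_status, substr(order_date, 1, 7)
    FROM   orders_ext;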
Spark Introduction
Spark: Transformation & Actions
Spark RDD Practice
- 56. Map/FlatMap Transformation
- 57. Filter/Intersection
- 58. Union/Distinct Transformation
- 59. GroupByKey / group people based on birthday months
- 60. ReduceByKey / total number of students in each subject
- 61. SortByKey / sort students based on their roll number
- 62. MapPartition / MapPartitionWithIndex
- 63. Change Number of Partitions
- 64. Join / join email addresses based on customer name
- 65. Spark Actions (a PySpark sketch follows this list)
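A compact PySpark sketch of a few of these transformations, using made-up in-memory data rather than course files.

    # Sketch: common RDD transformations and actions on toy data.
    from pyspark import SparkContext

    sc = SparkContext(appName="rdd-practice")

    scores = sc.parallelize([
        ("math", 1), ("physics", 1), ("math", 1), ("chemistry", 1),
    ])

    # reduceByKey: total number of students in each subject
    per_subject = scores.reduceByKey(lambda a, b: a + b)

    # sortByKey: order subjects alphabetically
    ordered = per_subject.sortByKey()

    # join: attach email addresses by customer name
    names  = sc.parallelize([("alice", 1), ("bob", 2)])
    emails = sc.parallelize([("alice", "alice@example.com")])
    joined = names.join(emails)

    # Actions trigger execution of the lazy transformation graph.
    print(ordered.collect())
    print(joined.collect())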
Spark DataFrames & Spark SQL
Spark with Cassandra
- 72. DataFrame Intro
- 73. DataFrame from JSON Files
- 74. DataFrame from Parquet Files
- 75. DataFrame from CSV Files
- 76. DataFrame from Avro Files
- 77. Working with XML
- 78. Working with Columns
- 79. Working with Strings
- 80. Working with Dates
- 81. DataFrame Filter API
- 82. DataFrame API Part 1
- 83. DataFrame API Part 2
- 84. Spark SQL
- 85. Working with Hive Tables in Spark (a DataFrame sketch follows this list)
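Finally, a short PySpark DataFrame sketch touching several of these lectures; the file paths are placeholders.

    # Sketch: DataFrames from different file formats, plus Spark SQL.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("df-practice").getOrCreate()

    # Readers for the formats covered in the lectures
    df_json    = spark.read.json("/data/customers.json")
    df_parquet = spark.read.parquet("/data/orders.parquet")
    df_csv     = spark.read.option("header", "true").csv("/data/orders.csv")

    # Column, string, and date helpers from the functions module
    enriched = (df_parquet
                .withColumn("status", F.upper(F.col("order_status")))
                .withColumn("year", F.year(F.to_date("order_date"))))

    # Filter API and Spark SQL over a temp view
    enriched.filter(F.col("status") == "CLOSED").show()
    enriched.createOrReplaceTempView("orders")
    spark.sql("SELECT year, count(*) FROM orders GROUP BY year").show()

Working with Hive tables from Spark additionally requires enableHiveSupport() on the session builder.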