4.44 out of 5 (341 reviews on Udemy)

Master Big Data – Apache Spark/Hadoop/Sqoop/Hive/Flume

An in-depth course on Big Data: Apache Spark, Hadoop, Sqoop, Flume & Apache Hive, plus Big Data cluster setup.

In this course, you will start by learning what the Hadoop Distributed File System (HDFS) is and the most common Hadoop commands required to work with it.
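For instance, a few of the basic HDFS shell commands covered in this part look roughly like the following (the paths are only placeholders, not paths used in the course):

    # list the contents of an HDFS directory
    hdfs dfs -ls /user/cloudera

    # create a directory and copy a local file into HDFS
    hdfs dfs -mkdir -p /user/cloudera/input
    hdfs dfs -put localfile.txt /user/cloudera/input/

    # read a file back and check space usage
    hdfs dfs -cat /user/cloudera/input/localfile.txt
    hdfs dfs -du -h /user/cloudera/input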

Then you will be introduced to Sqoop Import:

  • Understand the lifecycle of a Sqoop command.

  • Use the sqoop import command to migrate data from MySQL to HDFS (a sample command is sketched after this list).

  • Use the sqoop import command to migrate data from MySQL to Hive.

  • Use various file formats, compressions, field delimiters, where clauses, and free-form queries while importing the data.

  • Understand split-by and boundary queries.

  • Use incremental mode to migrate data from MySQL to HDFS.
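As a rough sketch of what such an import looks like (the connection string, credentials, table, and target directory are placeholders rather than values from the course):

    # import one MySQL table into HDFS as Parquet, filtered and split across 4 mappers
    sqoop import \
      --connect jdbc:mysql://localhost/retail_db \
      --username retail_user \
      --password '******' \
      --table orders \
      --where "order_status = 'CLOSED'" \
      --split-by order_id \
      --num-mappers 4 \
      --as-parquetfile \
      --target-dir /user/cloudera/warehouse/orders

Incremental runs add options such as --incremental append, --check-column, and --last-value on top of the same command.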

Next, you will learn how to use Sqoop Export to migrate data:

  • Understand what Sqoop export is.

  • Using sqoop export, migrate data from HDFS to MySQL.

  • Using sqoop export, migrate data from Hive to MySQL (a sample export command follows this list).
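As an illustration (database, table, and directory names are invented), exporting files from an HDFS directory, for example a Hive table's warehouse path, back into MySQL might look like:

    # push the contents of an HDFS directory into an existing MySQL table
    sqoop export \
      --connect jdbc:mysql://localhost/retail_db \
      --username retail_user \
      --password '******' \
      --table daily_revenue \
      --export-dir /user/hive/warehouse/daily_revenue \
      --input-fields-terminated-by '\001'

The '\001' delimiter is Hive's default field separator, which is why it appears when exporting data that was written by Hive.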

Next, you will learn about Apache Flume:

  • Understand Flume Architecture.

  • Using Flume, ingest data from Twitter and save it to HDFS.

  • Using Flume, ingest data from netcat and save it to HDFS (a sample agent configuration follows this list).

  • Using Flume, ingest data from an exec source and show it on the console.

  • Understand Flume interceptors and see examples of using them.

  • Flume multi-agent flows.

  • Flume Consolidation.
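To give a flavour of the netcat-to-HDFS flow, a minimal Flume agent configuration could look like this (the agent name, port, and HDFS path are illustrative, not taken from the course files):

    # one source, one channel, one sink
    agent1.sources  = src1
    agent1.channels = ch1
    agent1.sinks    = sink1

    # netcat source listening on a local port
    agent1.sources.src1.type = netcat
    agent1.sources.src1.bind = localhost
    agent1.sources.src1.port = 44444

    # memory channel buffering events between source and sink
    agent1.channels.ch1.type = memory

    # HDFS sink writing plain text files
    agent1.sinks.sink1.type = hdfs
    agent1.sinks.sink1.hdfs.path = /user/cloudera/flume/netcat
    agent1.sinks.sink1.hdfs.fileType = DataStream

    # wire source and sink to the channel
    agent1.sources.src1.channels = ch1
    agent1.sinks.sink1.channel = ch1

The agent would then be started with something like flume-ng agent --name agent1 --conf-file netcat-agent.conf.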

In the next section, we will learn about Apache Hive (a short HiveQL sketch follows the list below):

  • Hive Intro

  • External & Managed Tables

  • Working with Different Files – Parquet, Avro

  • Compressions

  • Hive Analysis

  • Hive String Functions

  • Hive Date Functions

  • Partitioning

  • Bucketing
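As a small HiveQL sketch of the external-table, Parquet, and partitioning topics (the table, columns, and location are invented for illustration):

    -- external, partitioned table stored as Parquet; dropping it leaves the data files in place
    CREATE EXTERNAL TABLE orders (
      order_id    INT,
      customer_id INT,
      amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS PARQUET
    LOCATION '/user/cloudera/warehouse/orders';

    -- simple analysis using a built-in date function on the partition column
    SELECT year(order_date) AS order_year, count(*) AS order_count
    FROM orders
    GROUP BY year(order_date);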

Finally, you will learn about Apache Spark (a short Scala sketch follows the list below):

  • Spark Intro

  • Cluster Overview

  • RDD

  • DAG/Stages/Tasks

  • Actions & Transformations

  • Transformation & Action Examples

  • Spark DataFrames

  • Spark DataFrames – working with different file formats & compression

  • DataFrame APIs

  • Spark SQL

  • DataFrame examples

  • Spark with Cassandra integration
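As a rough Scala sketch of the DataFrame material (the file path, column names, and query are placeholders rather than course material), reading a Parquet file and querying it through both the DataFrame API and Spark SQL looks like:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder()
      .appName("orders-example")
      .getOrCreate()

    // read a Parquet file into a DataFrame
    val orders = spark.read.parquet("/user/cloudera/orders")

    // DataFrame API: closed orders per customer
    orders.filter(col("order_status") === "CLOSED")
      .groupBy("customer_id")
      .count()
      .show()

    // the same query through Spark SQL
    orders.createOrReplaceTempView("orders")
    spark.sql(
      "SELECT customer_id, count(*) FROM orders WHERE order_status = 'CLOSED' GROUP BY customer_id"
    ).show()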

Hadoop Introduction

1. Course Intro
2. Big Data Intro
3. HDFS and Hadoop Commands
4. YARN Cluster Overview
5. Cloudera VM Setup
6. Cluster Setup on Google Cloud
7. GCP Cluster Fixes
8. Environment Update

Sqoop Import

1. Sqoop Introduction
2. Managing Target Directories
3. Working with Parquet File Format
4. Working with Avro File Format
5. Working with Different Compressions
6. Conditional Imports
7. Split-by and Boundary Queries
8. Field Delimiters
9. Incremental Appends
10. Sqoop-Hive Cluster Fix
11. Sqoop Hive Import
12. Sqoop List Tables/Databases
13. Sqoop Assignment 1
14. Sqoop Assignment 2
15. Sqoop Import Practice 1
16. Sqoop Import Practice 2

Sqoop Export

1. Export from HDFS to MySQL
2. Export from Hive to MySQL
3. Export Avro Compressed to MySQL
4. Bonus Lecture: Sqoop with Airflow

Apache Flume

1. Flume Introduction & Architecture
2. Exec Source and Logger Sink
3. Moving Data from Twitter to HDFS
4. Moving Data from NetCat to HDFS
5. Flume Interceptors
6. Flume Interceptor Example
7. Flume Multi-Agent Flow
8. Flume Consolidation

Apache Hive

1. Hive Introduction
2. Hive Database
3. Hive Managed Tables
4. Hive External Tables
5. Hive Inserts
6. Hive Analytics
7. Working with Parquet
8. Compressing Parquet
9. Working with Fixed File Format
10. Alter Command
11. Hive String Functions
12. Hive Date Functions
13. Hive Partitioning
14. Hive Bucketing

Spark Introduction

1. Spark Intro
2. Resilient Distributed Datasets
3. Cluster Overview
4. DAG Overview
5. Spark on GCS Cluster

Spark: Transformations & Actions

1. Map/FlatMap Transformation
2. Filter/Intersection
3. Union/Distinct Transformation
4. GroupByKey / Group People Based on Birthday Months
5. ReduceByKey / Total Number of Students in Each Subject (a short Scala sketch follows this list)
6. SortByKey / Sort Students Based on Their Roll Number
7. MapPartition / MapPartitionWithIndex
8. Change the Number of Partitions
9. Join / Join Email Addresses Based on Customer Name
10. Spark Actions
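As an illustration of lecture 5 above, counting students per subject with reduceByKey can be sketched as follows (run in spark-shell, where the SparkContext sc already exists; the file path and the rollno,name,subject layout are assumptions, not the course dataset):

    // load the text file; each line is assumed to look like: rollno,name,subject
    val students = sc.textFile("/user/cloudera/students.csv")

    // build (subject, 1) pairs and add them up per subject
    val perSubject = students
      .map(_.split(","))
      .map(fields => (fields(2).trim, 1))
      .reduceByKey(_ + _)

    perSubject.collect().foreach(println)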

Spark RDD Practice

1. Scala Tuples
2. Filter Error Logs
3. Frequency of Words in a Text File
4. Population of Each City
5. Orders Placed by Customers
6. Average Rating of Movies

Spark DataFrames & Spark SQL

1. DataFrame Intro
2. DataFrame from JSON Files
3. DataFrame from Parquet Files
4. DataFrame from CSV Files
5. DataFrame from Avro Files
6. Working with XML
7. Working with Columns
8. Working with Strings
9. Working with Dates
10. DataFrame Filter API
11. DataFrame API Part 1
12. DataFrame API Part 2
13. Spark SQL
14. Working with Hive Tables in Spark

Spark with Cassandra

1. Creating Spark RDD from Cassandra Table
2. Processing Cassandra Data in Spark
3. Cassandra Rows to Case Class
4. Saving Spark RDD to Cassandra (a connector sketch follows this list)
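A rough sketch of what this integration looks like with the DataStax spark-cassandra-connector, assuming the connector is on the classpath and spark.cassandra.connection.host is configured; the keyspace, table, and case class are illustrative:

    import com.datastax.spark.connector._

    // case class that Cassandra rows are mapped onto
    case class User(id: Int, name: String)

    // read a Cassandra table into an RDD of User
    val users = sc.cassandraTable[User]("demo_ks", "users")
    users.collect().foreach(println)

    // write an RDD back to the same table
    val newUsers = sc.parallelize(Seq(User(3, "Asha"), User(4, "Ravi")))
    newUsers.saveToCassandra("demo_ks", "users")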

Rating breakdown

5 stars: 139
4 stars: 134
3 stars: 48
2 stars: 14
1 star: 4
30-day money-back guarantee

Includes

8 hours of on-demand video (total course length)
Access on mobile and TV
Certificate of completion