Curriculum
11 Sections
53 Lessons
52 Weeks
You, This Course and Us
1
2.1
You, This Course and Us
2 Minutes
Introduction to Spark
9
3.1
What does Donald Rumsfeld have to do with data analysis?
9 Minutes
3.1
Why is Spark so cool?
12 Minutes
3.1
An introduction to RDDs – Resilient Distributed Datasets
10 Minutes
3.1
Built-in libraries for Spark
16 Minutes
3.1
Installing Spark
7 Minutes
3.1
The PySpark Shell
5 Minutes
3.1
Transformations and Actions
13 Minutes
3.1
See it in Action : Munging Airlines Data with PySpark – I
10 Minutes
3.1
[For Linux/Mac OS Shell Newbies] Path and other Environment Variables
8 Minutes
Resilient Distributed Datasets
9
4.1
RDD Characteristics: Partitions and Immutability
12 Minutes
4.1
RDD Characteristics: Lineage, RDDs know where they came from
6 Minutes
4.1
What can you do with RDDs?
11 Minutes
4.1
Create your first RDD from a file
16 Minutes
4.1
Average distance travelled by a flight using map() and reduce() operations
6 Minutes
4.1
Get delayed flights using filter(), cache data using persist()
5 Minutes
4.1
Average flight delay in one step using aggregate()
15 Minutes
4.1
Frequency histogram of delays using countByValue()
3 Minutes
4.1
See it in Action : Analyzing Airlines Data with PySpark – II
6 Minutes
Advanced RDDs: Pair Resilient Distributed Datasets
6
5.1
Special Transformations and Actions
15 Minutes
5.1
Average delay per airport using reduceByKey(), mapValues() and join()
18 Minutes
5.1
Average delay per airport in one step using combineByKey()
12 Minutes
5.1
Get the top airports by delay using sortBy()
4 Minutes
5.1
Lookup airport descriptions using lookup(), collectAsMap(), broadcast()
14 Minutes
5.1
See it in Action : Analyzing Airlines Data with PySpark – III
5 Minutes
Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind the Scenes
7
6.1
Get information from individual processing nodes using accumulators
13 Minutes
6.1
See it in Action : Using an Accumulator variable
3 Minutes
6.1
Long running programs using spark-submit
6 Minutes
6.1
See it in Action : Running a Python script with Spark-Submit
4 Minutes
6.1
Behind the scenes: What happens when a Spark script runs?
14 Minutes
6.1
Running MapReduce operations
14 Minutes
6.1
See it in Action : MapReduce with Spark
2 Minutes
Java and Spark
5
7.1
The Java API and Function objects
16 Minutes
7.1
Pair RDDs in Java
5 Minutes
7.1
Running Java code
4 Minutes
7.1
Installing Maven
2 Minutes
7.1
See it in Action : Running a Spark Job with Java
5 Minutes
PageRank: Ranking Search Results
5
8.1
What is PageRank?
17 Minutes
8.1
The PageRank algorithm
6 Minutes
8.1
Implement PageRank in Spark
12 Minutes
8.1
Join optimization in PageRank using Custom Partitioning
7 Minutes
8.1
See it in Action : The PageRank algorithm using Spark
4 Minutes
Spark SQL
2
9.1
DataFrames: RDDs + Tables
16 Minutes
9.1
See it in Action : DataFrames and Spark SQL
5 Minutes
MLlib in Spark: Build a recommendations engine
4
10.1
Collaborative filtering algorithms
12 Minutes
10.1
Latent Factor Analysis with the Alternating Least Squares method
12 Minutes
10.1
Music recommendations using the Audioscrobbler dataset
8 Minutes
10.1
Implement code in Spark using MLlib
16 Minutes
Spark Streaming
4
11.1
Introduction to streaming
10 Minutes
11.1
Implement stream processing in Spark using DStreams
11 Minutes
11.1
Stateful transformations using sliding windows
9 Minutes
11.1
See it in Action : Spark Streaming
4 Minutes
Graph Libraries
1
12.1
The Marvel social network using Graphs
18 Minutes
From 0 to 1 : Spark for Data Science with Python
You, This Course and Us
https://dwnk32xmy75f1.cloudfront.net/wp-content/uploads/20180828050117/Lecture1-SparkPromo.m4v3.m4v