From 0 to 1 : Spark for Data Science with Python
Grab this affordable opportunity to learn how to use Spark for a variety of analytics and machine learning tasks with Python.
This course is for you if you are an analyst seeking to leverage Spark for analyzing interesting datasets, a data scientist who wants a single engine for analyzing and modelling data as well as productionizing it, or an engineer looking to use a distributed computing engine for batch processing, stream processing, or both.
This course will give you practical knowledge of data science and analytics with Spark, and will help you understand the complexities involved. Every concept is described in detail and elaborated, to make it easy for you to understand and learn.
The course comprises 53 videos totalling about 9 hours. You can watch them at your own pace and raise doubts or questions whenever you get stuck.
You will learn about the following concepts:
- Music Recommendations using Alternating Least Squares and the Audioscrobbler dataset
- Dataframes and Spark SQL to work with Twitter data
- Using the PageRank algorithm with Google web graph dataset
- Using Spark Streaming for stream processing
- Working with graph data using the Marvel Social network dataset
- Resilient Distributed Datasets, Transformations (map, filter, flatMap), Actions (reduce, aggregate)
- Pair RDDs, reduceByKey, combineByKey
- Broadcast and Accumulator variables
- Spark for MapReduce
- The Java API for Spark
- Spark SQL, Spark Streaming, MLlib and GraphFrames (GraphX for Python)
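To give a flavour of the transformations and actions covered above: Spark's map, filter, and reduce on RDDs behave much like Python's built-in functional operations. The sketch below is a plain-Python analogue (not actual PySpark code), using made-up flight distances; the equivalent RDD calls are noted in the comments.

```python
from functools import reduce

# Hypothetical flight distances in miles (toy data, not the course dataset)
distances = [337, 2565, 1235, 187, 802]

# Transformation analogue of rdd.map(lambda d: d * 1.60934) -- convert to km
km = list(map(lambda d: d * 1.60934, distances))

# Transformation analogue of rdd.filter(lambda d: d > 1000) -- keep long flights
long_flights = list(filter(lambda d: d > 1000, distances))

# Action analogue of rdd.reduce(lambda a, b: a + b) -- total distance
total = reduce(lambda a, b: a + b, distances)

print(long_flights)  # [2565, 1235]
print(total)         # 5126
```

In Spark, the map and filter steps would be lazy transformations; only the reduce action would trigger computation.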
Benefits of enrolling in this course include:
- Quality course material
- Instant & free course updates
- Access to all questions and answers raised by other students
- Personalized support from the instructor on any issue related to the course
- Few free lectures for a quick overview
Grab the opportunity and enroll today!
Curriculum
- 11 Sections
- 53 Lessons
- 52 Weeks
- You, This Course and Us (1 Lesson)
- Introduction to Spark (9 Lessons)
  - 3.1 What does Donald Rumsfeld have to do with data analysis? (9 Minutes)
  - 3.2 Why is Spark so cool? (12 Minutes)
  - 3.3 An introduction to RDDs – Resilient Distributed Datasets (10 Minutes)
  - 3.4 Built-in libraries for Spark (16 Minutes)
  - 3.5 Installing Spark (7 Minutes)
  - 3.6 The PySpark Shell (5 Minutes)
  - 3.7 Transformations and Actions (13 Minutes)
  - 3.8 See it in Action: Munging Airlines Data with PySpark – I (10 Minutes)
  - 3.9 [For Linux/Mac OS Shell Newbies] Path and other Environment Variables (8 Minutes)
- Resilient Distributed Datasets (9 Lessons)
  - 4.1 RDD Characteristics: Partitions and Immutability (12 Minutes)
  - 4.2 RDD Characteristics: Lineage, RDDs know where they came from (6 Minutes)
  - 4.3 What can you do with RDDs? (11 Minutes)
  - 4.4 Create your first RDD from a file (16 Minutes)
  - 4.5 Average distance travelled by a flight using map() and reduce() operations (6 Minutes)
  - 4.6 Get delayed flights using filter(), cache data using persist() (5 Minutes)
  - 4.7 Average flight delay in one step using aggregate() (15 Minutes)
  - 4.8 Frequency histogram of delays using countByValue() (3 Minutes)
  - 4.9 See it in Action: Analyzing Airlines Data with PySpark – II (6 Minutes)
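One lesson in this section computes the average flight delay in a single pass with aggregate(), which takes a zero value, a per-element seq-op, and a partition-merging comb-op. The zero/seq-op/comb-op pattern can be sketched in plain Python (the delay values and the two-partition split are made up for illustration):

```python
# Emulate RDD.aggregate(zero, seq_op, comb_op) across two "partitions".
delays = [[10, -5, 30], [0, 25]]  # hypothetical per-partition delay minutes

zero = (0, 0)  # (running sum, running count)
seq_op = lambda acc, d: (acc[0] + d, acc[1] + 1)   # fold one value into the accumulator
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])  # merge two partition results

# Each partition folds its values locally, then the partials are combined,
# which is how Spark avoids shipping raw data between nodes.
partials = []
for part in delays:
    acc = zero
    for d in part:
        acc = seq_op(acc, d)
    partials.append(acc)

total_sum, total_count = comb_op(partials[0], partials[1])
print(total_sum / total_count)  # 12.0
```

Carrying a (sum, count) pair as the accumulator is what makes a one-pass average possible, since averages themselves cannot be merged directly.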
- Advanced RDDs: Pair Resilient Distributed Datasets (6 Lessons)
  - 5.1 Special Transformations and Actions (15 Minutes)
  - 5.2 Average delay per airport, use reduceByKey(), mapValues() and join() (18 Minutes)
  - 5.3 Average delay per airport in one step using combineByKey() (12 Minutes)
  - 5.4 Get the top airports by delay using sortBy() (4 Minutes)
  - 5.5 Lookup airport descriptions using lookup(), collectAsMap(), broadcast() (14 Minutes)
  - 5.6 See it in Action: Analyzing Airlines Data with PySpark – III (5 Minutes)
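The average-delay-per-airport lessons in this section rely on reduceByKey() and mapValues(). Their semantics can be shown with an ordinary dictionary: group (sum, count) pairs per key, then map each value to an average. The airport codes and delays below are made up, not the course's airlines dataset:

```python
from collections import defaultdict

# Hypothetical (airport, delay) pairs, standing in for the airlines data
pairs = [("SFO", 10), ("JFK", -5), ("SFO", 20), ("JFK", 15)]

# reduceByKey analogue: merge (sum, count) tuples per key
acc = defaultdict(lambda: (0, 0))
for airport, delay in pairs:
    s, c = acc[airport]
    acc[airport] = (s + delay, c + 1)

# mapValues analogue: turn each (sum, count) into an average
avg_delay = {k: s / c for k, (s, c) in acc.items()}
print(avg_delay)  # {'SFO': 15.0, 'JFK': 5.0}
```

In Spark the grouping step happens across the cluster with a shuffle; combineByKey(), covered in the same section, achieves the same result in one step by building the (sum, count) accumulator directly.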
- Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind the Scenes (7 Lessons)
  - 6.1 Get information from individual processing nodes using accumulators (13 Minutes)
  - 6.2 See it in Action: Using an Accumulator variable (3 Minutes)
  - 6.3 Long-running programs using spark-submit (6 Minutes)
  - 6.4 See it in Action: Running a Python script with spark-submit (4 Minutes)
  - 6.5 Behind the scenes: What happens when a Spark script runs? (14 Minutes)
  - 6.6 Running MapReduce operations (14 Minutes)
  - 6.7 See it in Action: MapReduce with Spark (2 Minutes)
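The MapReduce lessons in this section center on the classic two-phase pattern: a map phase that emits (key, 1) pairs, and a reduce phase that sums per key. A plain-Python sketch of that pattern with toy input lines (not Spark code, where the reduce phase would be reduceByKey()):

```python
from collections import Counter

lines = ["spark makes mapreduce easy", "mapreduce with spark"]

# Map phase: emit a (word, 1) pair for every word in every line
mapped = [(w, 1) for line in lines for w in line.split()]

# Reduce phase: sum the counts per key
counts = Counter()
for word, n in mapped:
    counts[word] += n

print(dict(counts))  # {'spark': 2, 'makes': 1, 'mapreduce': 2, 'easy': 1, 'with': 1}
```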
- Java and Spark (5 Lessons)
- PageRank: Ranking Search Results (5 Lessons)
- Spark SQL (2 Lessons)
- MLlib in Spark: Build a recommendations engine (4 Lessons)
- Spark Streaming (4 Lessons)
- Graph Libraries (1 Lesson)
An ex-Google, Stanford and Flipkart team
Loonycorn is a team of Janani Ravi and Vitthal Srinivasan, graduates of Stanford University and IIM Ahmedabad.
We have several years of experience working in technology in the Bay Area, New York, Singapore and Bangalore.
Janani Ravi: 7 Years of work experience (Google, Flipkart and Microsoft)
Vitthal Srinivasan: Worked at Google, Flipkart, Credit Suisse and INSEAD
We have come together to teach various technology courses in the easiest and most entertaining manner possible, with an emphasis on practical elaborations and illustrations.