Learn to process massive streams of data in real time on a cluster with Apache Spark Streaming. Includes 6 hours of on-demand video, hands-on labs, and a certificate of completion.
Buy This Course
Learn at your own pace! Lifetime access to all videos and materials for this course, with a one-time payment.
“Big Data” analysis is a hot and highly valuable skill, but “big data” never stops flowing! Spark Streaming is a new and quickly developing technology for processing massive data sets as they are created. Why wait for a nightly analysis to run when you can update your analysis continuously, in real time? Whether it’s clickstream data from a big website, sensor data from a massive “Internet of Things” deployment, financial data, or something else, Spark Streaming is a powerful technology for transforming and analyzing that data as soon as it is created.
You’ll be learning from a former engineer and senior manager at Amazon and IMDb.
This course gives you hands-on experience with real, live Twitter data, simulated streams of Apache access logs, and even data used to train machine learning models! You’ll write and run real Spark Streaming jobs right at home on your own PC, and toward the end of the course, we’ll show you how to take those jobs to a real Hadoop cluster and run them in a production environment too.
Across over 30 lectures and almost 6 hours of video content, you’ll:
- Get a crash course in the Scala programming language
- Learn how Apache Spark operates on a cluster
- Set up discretized streams with Spark Streaming and transform them as data is received
- Analyze streaming data over sliding windows of time
- Maintain stateful information across streams of data
- Connect Spark Streaming with highly scalable sources of data, including Kafka, Flume, and Kinesis
- Dump streams of data in real time to NoSQL databases such as Cassandra
- Run SQL queries on streamed data in real time
- Train machine learning models in real time with streaming data, and use them to make predictions that keep getting better over time
- Package, deploy, and run self-contained Spark Streaming code on a real Hadoop cluster using Amazon Elastic MapReduce
This course is very hands-on, filled with achievable activities and exercises to reinforce your learning. By the end of this course, you’ll be confidently creating Spark Streaming scripts in Scala, and be prepared to tackle massive streams of data in a whole new way. You’ll be surprised at how easy Spark Streaming makes it!
Frank Kane
Author
Our courses are led by Frank Kane, a former Amazon and IMDb developer with extensive experience in machine learning and data science. With 26 issued patents and 9 years of experience at the forefront of recommendation systems, Frank brings real-world expertise to his teaching. His ability to explain complex concepts in accessible terms has helped over one million students worldwide gain valuable skills in machine learning, data engineering, and AI development.
Getting Started
Tip: Apply for a Twitter Developer Account now!
Lesson 1 of 3 within section Getting Started.
You must enroll in this course to access course content.
Introduction, and Getting Set Up
Lesson 2 of 3 within section Getting Started.
A Crash Course in Scala
[Activity] Scala Basics: Part 1
Lesson 1 of 4 within section A Crash Course in Scala.
[Exercise] Flow Control in Scala
Lesson 2 of 4 within section A Crash Course in Scala.
[Exercise] Functions in Scala
Lesson 3 of 4 within section A Crash Course in Scala.
[Exercise] Data Structures in Scala
Lesson 4 of 4 within section A Crash Course in Scala.
Spark Streaming Concepts
Lesson 1 of 7 within section Spark Streaming Concepts.
The Resilient Distributed Dataset (RDD)
Lesson 2 of 7 within section Spark Streaming Concepts.
[Activity] RDDs in action: simple word count application
Lesson 3 of 7 within section Spark Streaming Concepts.
Introduction to Spark Streaming
Lesson 4 of 7 within section Spark Streaming Concepts.
[Activity] Revisiting the PrintTweets application
Lesson 5 of 7 within section Spark Streaming Concepts.
Windowing: Aggregating data over longer time spans
Lesson 6 of 7 within section Spark Streaming Concepts.
Fault Tolerance in Spark Streaming
Lesson 7 of 7 within section Spark Streaming Concepts.
Spark Streaming Examples with Twitter
[Exercise] Saving Tweets to Disk
Lesson 1 of 3 within section Spark Streaming Examples with Twitter.
[Exercise] Tracking the Average Tweet Length
Lesson 2 of 3 within section Spark Streaming Examples with Twitter.
Spark Streaming Examples with Clickstream / Apache Access Log Data
[Exercise] Tracking the Top URLs Requested
Lesson 1 of 5 within section Spark Streaming Examples with Clickstream / Apache Access Log Data.
[Exercise] Alarming on Log Errors
Lesson 2 of 5 within section Spark Streaming Examples with Clickstream / Apache Access Log Data.
[Exercise] Integrating Spark Streaming with Spark SQL
Lesson 3 of 5 within section Spark Streaming Examples with Clickstream / Apache Access Log Data.
Intro to Structured Streaming in Spark 2
Lesson 4 of 5 within section Spark Streaming Examples with Clickstream / Apache Access Log Data.
Integrating with Other Systems
Integrating with Apache Flume
Lesson 2 of 5 within section Integrating with Other Systems.
Integrating with Amazon Kinesis
Lesson 3 of 5 within section Integrating with Other Systems.
[Activity] Writing Custom Data Receivers
Lesson 4 of 5 within section Integrating with Other Systems.
Integrating with Cassandra
Lesson 5 of 5 within section Integrating with Other Systems.
Advanced Spark Streaming Examples
[Exercise] Stateful Information in Spark Streams
Lesson 1 of 3 within section Advanced Spark Streaming Examples.
[Activity] Streaming K-Means Clustering
Lesson 2 of 3 within section Advanced Spark Streaming Examples.
Spark Streaming in Production
Packaging and running code for a server
Lesson 1 of 4 within section Spark Streaming in Production.
[Activity] Packaging your code with SBT
Lesson 2 of 4 within section Spark Streaming in Production.
Running on a real Hadoop cluster with EMR
Lesson 3 of 4 within section Spark Streaming in Production.
Troubleshooting and Tuning Spark Jobs
Lesson 4 of 4 within section Spark Streaming in Production.
You Made It!
Lesson 1 of 2 within section You Made It!.
Continue your Learning Journey!
Lesson 2 of 2 within section You Made It!.