Install the Course Materials
The scripts and data for this course may be downloaded at
https://s3.amazonaws.com/media.sundog-soft.com/SparkScala/SparkScalaCourse356.zip
Download and unzip this file, and move the SparkScalaCourse folder inside it to a path you’ll remember.
Next, download the MovieLens 100K dataset from:
https://files.grouplens.org/datasets/movielens/ml-100k.zip
Unzip it, and move the resulting ml-100k folder into your SparkScalaCourse/data folder.
If you have trouble with the link above, try this alternate ml-100k download link.
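As a quick sanity check on the folder layout described above, the following is a minimal shell sketch. It assumes a POSIX-style shell (Terminal on Mac or Linux, or Git Bash on Windows). The helper name check_ml100k is hypothetical and not part of the course materials; the only paths it relies on are the SparkScalaCourse/data/ml-100k location named above and the u.data file that ships with MovieLens 100K.

```shell
# Sketch: confirm the ml-100k folder landed inside SparkScalaCourse/data.
# check_ml100k is a hypothetical helper name, not part of the course materials.
check_ml100k() {
  course_dir="$1"                               # path to your SparkScalaCourse folder
  data_file="$course_dir/data/ml-100k/u.data"   # ratings file the course reads

  if [ ! -f "$data_file" ]; then
    echo "missing: $data_file"
    return 1
  fi

  # wc -l counts newline characters; tr strips the padding some platforms add
  lines=$(wc -l < "$data_file" | tr -d ' ')
  echo "found: $data_file ($lines lines)"
}

# Usage (the path is only an example; use wherever you put the folder):
#   check_ml100k "$HOME/SparkScalaCourse"
```

With an intact copy of the dataset, the reported line count should match the 100000 lines that the HelloWorld script prints later in this guide.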
Install IntelliJ and Apache Spark
Make sure you have JDK 11 installed. Apache Spark 3 releases before 3.3 do not support Java 17 or newer, and this course is set up for JDK 11. Enter
java -version
from a command prompt or terminal to see which version, if any, is already installed. If you need JDK 11, download it from Oracle (you’ll need to create an account with them first).
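If the version banner is hard to read, the major version number can be pulled out of it with a small shell sketch like the one below. The helper name java_major is hypothetical. It assumes the banner contains a quoted version string in the usual form, for example version "11.0.19" or the older version "1.8.0_361"; other vendors' banners may differ.

```shell
# java_major is a hypothetical helper: it reads `java -version` text on stdin
# and prints the major version number (8, 11, 17, ...).
java_major() {
  # Pull the quoted version string out of a line such as:
  #   openjdk version "11.0.19" 2023-04-18
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)

  case "$ver" in
    1.*) echo "$ver" | cut -d. -f2 ;;   # legacy numbering: 1.8.0_361 -> 8
    *)   echo "$ver" | cut -d. -f1 ;;   # modern numbering: 11.0.19 -> 11
  esac
}

# Usage: java prints its banner to stderr, so redirect it into the pipe.
#   java -version 2>&1 | java_major
```

For this course, the printed number should be 11.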
Next, install IntelliJ IDEA Community Edition, after selecting your platform (Windows, Mac, or Linux).
After running IntelliJ, select Plugins, and install the Scala plugin.
Then from the Settings menu (the gear icon up top), select “Project Structure” and confirm the Oracle JDK 11 you installed earlier is selected under Project / SDK.
WINDOWS ONLY: Create a new environment variable named HADOOP_HOME whose value is the path to the “hadoop” folder inside SparkScalaCourse. To do so, enter “environment variables” in the Windows search bar, click on “Add Environment Variables,” and add a new system variable. For example, if you installed the SparkScalaCourse folder to the root of your C:\ drive, you would set HADOOP_HOME to C:\SparkScalaCourse\hadoop.
Next, select the PATH environment variable and APPEND a new entry, separated by a semicolon, of %HADOOP_HOME%\bin.
Now restart IntelliJ to make sure the new environment variables are picked up.
Import the Course Project
Click the Open icon, or select File/Open to open a project.
Select your SparkScalaCourse folder.
Try it Out
Expand the project’s tree view to show the SparkScalaCourse/src/main/scala/com.sundogsoftware.spark folder.
Right-click on “HelloWorld” and select “Run HelloWorld”.
You should see a message like:
Hello world! The u.data file has 100000 lines.
However, you might see a “class not found” error instead. If so, quit IntelliJ, restart it, and try again; this is a bug in IntelliJ.
Once you see the “Hello World” message, everything is set up successfully! If not, go back and look for a step you may have missed. Sometimes IntelliJ just gets confused – you might need to refresh the SBT configuration, clear IntelliJ’s cache, or just restart IntelliJ. If you’re stuck, we’re here to help – use the Q&A or comments feature on the site you’re taking this course on.