At JW Player, we use Spark to explore new data features and run reports that help drive product decisions and improve algorithms. But doing data analysis at the terabyte level is time-consuming, especially when we have to manually set up AWS Elastic MapReduce (EMR) clusters. Our code often depends on custom libraries or Spark settings that require bootstrapping. Moreover, iterating on changes is cumbersome and adds extra steps to our workflow.
Open source is part of JW's culture and part of what makes our player great, so we want to share a workflow tool we have been using to launch Spark jobs on EMR. We call it Spark Steps (code).
Spark Steps allows you to configure your cluster and upload your script and its dependencies via AWS S3. All you need to do is define an S3 bucket.
Example
$ AWS_S3_BUCKET=<your-bucket-name>
$ sparksteps report_to_csv.py \
    --s3-bucket $AWS_S3_BUCKET \
    --aws-region us-east-1 \
    --release-label emr-4.7.0 \
    --submit-args="--packages com.databricks:spark-csv_2.10:1.4.0" \
    --app-args="--report-date 2016-05-10"
The above example creates a one-node cluster with the default instance type m4.large, uploads the PySpark script report_to_csv.py to the specified S3 bucket, and copies the file from S3 to the master node. Each operation is defined as an EMR "step" that you can monitor in the EMR Management Console. The final step runs the Spark application with the submit args, which load the spark-csv package, and the app args, which pass --report-date to the script.
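For reference, a report_to_csv.py that fits the submit args and app args above might look roughly like the sketch below. The input path, schema, and aggregation are illustrative placeholders; only the --report-date argument and the spark-csv package come from the example (EMR 4.7.0 ships Spark 1.6, hence the SQLContext-based API).

# Minimal sketch of a script like report_to_csv.py; paths and columns are hypothetical.
import argparse

from pyspark import SparkContext
from pyspark.sql import SQLContext


def main():
    parser = argparse.ArgumentParser(description='Write a daily report as CSV.')
    parser.add_argument('--report-date', required=True,
                        help='Report date in YYYY-MM-DD format')
    args = parser.parse_args()

    sc = SparkContext(appName='report_to_csv')
    sqlContext = SQLContext(sc)

    # Hypothetical input: event data on S3, partitioned by date.
    events = sqlContext.read.parquet(
        's3://example-bucket/events/date=%s/' % args.report_date)

    # Hypothetical aggregation: count events per media item.
    report = events.groupBy('media_id').count()

    # The spark-csv package loaded via --packages provides the CSV writer on Spark 1.x.
    report.write.format('com.databricks.spark.csv') \
        .option('header', 'true') \
        .save('s3://example-bucket/reports/%s/' % args.report_date)

    sc.stop()


if __name__ == '__main__':
    main()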
For more advanced usage, such as uploading custom directories or running inside a virtual private cloud, check out the README.
There are plenty of improvements still to be made, such as dynamic spot pricing, bootstrapping, and monitoring. Our goal is to share Spark Steps as early as possible so that we can improve it together.
Happy Sparking!