Lab 3.2 : Operations On Multiple RDDs

Overview

Learn operations that work on multiple RDDs.

Depends On

None

Run time

15 mins

RDD Documentation :

http://spark.apache.org/docs/latest/


Meetup Recommendation

User1 attends meetups m1, m2 and m3.
User2 attends meetups m2, m3, m4 and m5.

Find meetups common to both users

Find meetups attended by either user1 or user2.
Note that the result contains duplicates. How would you remove them?

Find meetups that only user1 attends

Recommending meetups to users
user1 and user2 have a couple of meetups in common. Let's use this to recommend to each user the meetups that only the other one attends:

  • meetups recommended for user1 : m4 & m5
  • meetups recommended for user2 : m1
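
The recommendations above can be sketched as two set differences. This is a minimal sketch, assuming it runs inside spark-shell (where the SparkContext `sc` is predefined); the contents of `u2` are taken from the problem statement, and the names `recommendForU1` and `recommendForU2` are illustrative, not part of the lab.

```scala
// Assumes spark-shell, where `sc` (the SparkContext) already exists.
val u1 = sc.parallelize(List("m1", "m2", "m3"))
val u2 = sc.parallelize(List("m2", "m3", "m4", "m5"))

// Meetups that user2 attends but user1 does not: candidates for user1.
// Element order after subtract is not guaranteed, so sort before using the result.
val recommendForU1 = u2.subtract(u1).collect().sorted.toList  // List(m4, m5)

// Meetups that user1 attends but user2 does not: candidates for user2.
val recommendForU2 = u1.subtract(u2).collect().sorted.toList  // List(m1)
```

Note that `subtract` is not symmetric: swapping the two RDDs changes which user the result applies to.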

Hints

Step 1: Start the Spark shell

Step 2: Create data sets using the parallelize() method

    val u1 = sc.parallelize(List("m1", "m2", "m3"))
    val u2 = sc.parallelize(???)

Step 3: Try the following operations on the RDDs

union, intersection, distinct, subtract
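
One possible way to apply these four operations to the meetup data is sketched below. It assumes spark-shell (so `sc` is already defined), fills in `u2` from the problem statement, and uses illustrative variable names (`common`, `either`, `eitherUnique`, `onlyU1`) that are not part of the lab. Treat it as a reference to compare against after trying the exercise yourself.

```scala
// Assumes spark-shell, where `sc` (the SparkContext) already exists.
val u1 = sc.parallelize(List("m1", "m2", "m3"))
val u2 = sc.parallelize(List("m2", "m3", "m4", "m5"))

// intersection: meetups common to both users.
// Operations that shuffle data do not guarantee order, so sort the collected result.
val common = u1.intersection(u2).collect().sorted.toList

// union: meetups attended by either user; duplicates (m2, m3) are kept.
val either = u1.union(u2).collect().toList

// distinct: removes the duplicates that union leaves behind.
val eitherUnique = u1.union(u2).distinct().collect().sorted.toList

// subtract: meetups that only user1 attends.
val onlyU1 = u1.subtract(u2).collect().sorted.toList

println(s"common = $common, unique = $eitherUnique, only user1 = $onlyU1")
```

`collect()` brings every element back to the driver, which is fine for a handful of meetups but should be used with care on large RDDs.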