Learn operations that work with multiple RDDs.
None
15 mins
http://spark.apache.org/docs/latest/
User1 attends meetups m1, m2 and m3.
User2 attends meetups m2, m3, m4 and m5.
Find meetups common to both users
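A minimal sketch for spark-shell (or any Scala REPL with spark-core on the classpath). It assumes `SparkContext.getOrCreate`, which returns the shell's existing `sc` or starts a local one; the app name `meetup-set-ops` is a hypothetical placeholder. `intersection` keeps only the elements present in both RDDs:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running SparkContext (e.g. spark-shell's sc) or start a local one.
// The app name is a hypothetical placeholder.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("meetup-set-ops").setMaster("local[*]"))

// Meetups attended by each user, as stated in the lab.
val u1 = sc.parallelize(List("m1", "m2", "m3"))
val u2 = sc.parallelize(List("m2", "m3", "m4", "m5"))

// Elements present in both RDDs. This shuffles, so element order is not guaranteed.
val common = u1.intersection(u2)

// Sort on the driver only to make the printed output deterministic.
println(common.collect().sorted.mkString(", "))  // prints: m2, m3
```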
Find meetups attended by either user1 or user2
Note that the result contains duplicates. How would you remove them?
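A sketch of one way to answer this in spark-shell, assuming `SparkContext.getOrCreate` to reuse or start a context (the app name is hypothetical). `union` simply concatenates the two RDDs, so meetups both users attend appear twice; `distinct` removes the repeats:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running SparkContext or start a local one (hypothetical app name).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("meetup-set-ops").setMaster("local[*]"))

val u1 = sc.parallelize(List("m1", "m2", "m3"))
val u2 = sc.parallelize(List("m2", "m3", "m4", "m5"))

// union keeps every element from both RDDs, including duplicates.
val combined = u1.union(u2)
println(combined.collect().sorted.mkString(", "))  // prints: m1, m2, m2, m3, m3, m4, m5

// distinct drops the duplicates (sorted here only for stable output).
val deduped = combined.distinct()
println(deduped.collect().sorted.mkString(", "))   // prints: m1, m2, m3, m4, m5
```

`union` does not shuffle data, which is why it cannot deduplicate; `distinct` pays for a shuffle to do so.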
Find meetups that only user1 attends
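A sketch using `subtract`, which returns the elements of the first RDD that do not appear in the second. As before, it assumes `SparkContext.getOrCreate` and a hypothetical app name:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running SparkContext or start a local one (hypothetical app name).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("meetup-set-ops").setMaster("local[*]"))

val u1 = sc.parallelize(List("m1", "m2", "m3"))
val u2 = sc.parallelize(List("m2", "m3", "m4", "m5"))

// Meetups in u1 that are not in u2. Note the direction: u2.subtract(u1) differs.
val onlyUser1 = u1.subtract(u2)
println(onlyUser1.collect().mkString(", "))  // prints: m1
```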
Recommending meetups to users
user1 and user2 have a couple of meetups in common. Let's use this to recommend meetups to both users:
- meetups recommended for user1: m4 & m5
- meetups recommended for user2: m1
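A minimal sketch of the recommendation step, combining `intersection` and `subtract`. The rule "recommend only if the users share at least one meetup" is an assumption made for illustration, as are `SparkContext.getOrCreate` and the app name:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse the running SparkContext or start a local one (hypothetical app name).
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("meetup-set-ops").setMaster("local[*]"))

val u1 = sc.parallelize(List("m1", "m2", "m3"))
val u2 = sc.parallelize(List("m2", "m3", "m4", "m5"))

// Assumed similarity signal: the users share at least one meetup.
val haveCommon = !u1.intersection(u2).isEmpty()

// Recommend to each user the meetups the other attends that they do not.
val recsForU1 =
  if (haveCommon) u2.subtract(u1).collect().sorted else Array.empty[String]
val recsForU2 =
  if (haveCommon) u1.subtract(u2).collect().sorted else Array.empty[String]

println("user1: " + recsForU1.mkString(", "))  // prints: user1: m4, m5
println("user2: " + recsForU2.mkString(", "))  // prints: user2: m1
```

A non-empty intersection is the simplest possible overlap check; a real recommender would likely weigh how many meetups the users share.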
val u1 = sc.parallelize(List("m1", "m2", "m3"))
val u2 = sc.parallelize(???)  // TODO: fill in user2's meetups
RDD operations to use: union, intersection, distinct, subtract