Skip to content

Latest commit

 

History

History
115 lines (86 loc) · 3.11 KB

7.1-influencer.md

File metadata and controls

115 lines (86 loc) · 3.11 KB

<< back to main index

7.1 : Find Influencers on Twitter

Overview

Find 'influencers' on Twitter graph

Depends On

None

Run time

20 mins

STEP 1: Start Spark Shell

    $   ~/spark/bin/spark-shell

Step 2: Build the following twitter graph

Here some real world data:

All of the following steps are performed by entering the commands in the Spark Scala shell

Import the necessary libraries

     import org.apache.spark.graphx._
     import org.apache.spark.rdd.RDD

Construct the array of vertices

Data structure: twitter handle, number of followers, gender of the tweeter

     val vertexArray = Array(
        (1L, ("@markkerzner", 309, "M")),  // (Name, # followers, gender)
        (2L, ("@mjbrender", 3101, "M")),
        (3L, ("@dridisahar1", 27, "F")),
        (4L, ("@dez_blanchfield ", 38600, "M")),
        (5L, ("@ch_doig ", 519, "F")),
        (6L, ("@Sunitha_Packt ", 332, "F")),
        (7L, ("@WibiData ", 2477, "N"))  // company, so gender neutral
     )

Construct the array of edges

On this step, these are all my followers, so they connect to me

    val edgeArray = Array(
        Edge(1L, 2L, 7), // src,  dest,   # retweets
        Edge(1L, 3L, 2),
        Edge(1L, 4L, 4),
        Edge(1L, 5L, 3),
        Edge(1L, 6L, 1),
        Edge(1L, 7L, 2)
    )

We are using data from a real Twitter account, if you want, you can use yours

Prepare the data for Spark processing. What do ".parallelize" methods below accomplish?

     val vertexRDD = sc.parallelize(vertexArray)
     // vertexRDD: RDD[(Long, (String, Int))]

     val edgeRDD = sc.parallelize(???)
     // edgeRDD: RDD[Edge[Int]]

Construct the graph from the vertices and edges

     val graph = Graph(???, ???)

Print all my followers and their gender

    graph.vertices.collect.foreach { case (id, (name, nFollow, gender)) =>
    println(s"Tweeter $name has $nFollow followers and is $gender") }

Filter out male followers

    graph.vertices.filter { case (id, (name, followers, gender)) => gender != "M" }.collect.
        foreach { case (id, (name, followers, gender)) =>
        println(s"$name should be a $gender with $followers followers") }

Find my significant followers

    graph.vertices.filter { case (id, (name, nFollow, gender)) => nFollow > 1000 }.collect

Find those followers who do enough re-tweeets for me

    graph.edges.filter { case (edge) => edge.attr > 5 }.collect

    graph.edges.filter { case (edge) => edge.attr > 5 }.count

Count my male and female followers

    val maleFollowerCount = graph.vertices.filter { case (id, (name, nFollow, gender)) => gender == "M" }.count

    val femaleFollowerCount = graph.vertices.filter { case (id, (name, nFollow, gender)) => gender == "F" }.count