7.1 : Find Influencers on Twitter

Overview

Find 'influencers' on Twitter graph

Depends On

None

Run time

20 mins

STEP 1: Start Spark Shell

    $   ~/spark/bin/spark-shell

Step 2: Build the following twitter graph

Here some real world data:

All of the following steps are performed by entering the commands in the Spark Scala shell

Import the necessary libraries

     import org.apache.spark.graphx._
     import org.apache.spark.rdd.RDD

Construct the array of vertices

Data structure: twitter handle, number of followers, gender of the tweeter

     val vertexArray = Array(
        (1L, ("@markkerzner", 309, "M")),  // (Name, # followers, gender)
        (2L, ("@mjbrender", 3101, "M")),
        (3L, ("@dridisahar1", 27, "F")),
        (4L, ("@dez_blanchfield ", 38600, "M")),
        (5L, ("@ch_doig ", 519, "F")),
        (6L, ("@Sunitha_Packt ", 332, "F")),
        (7L, ("@WibiData ", 2477, "N"))  // company, so gender neutral
     )

Construct the array of edges

On this step, these are all my followers, so they connect to me

    val edgeArray = Array(
        Edge(1L, 2L, 7), // src,  dest,   # retweets
        Edge(1L, 3L, 2),
        Edge(1L, 4L, 4),
        Edge(1L, 5L, 3),
        Edge(1L, 6L, 1),
        Edge(1L, 7L, 2)
    )

We are using data from a real Twitter account, if you want, you can use yours

Prepare the data for Spark processing. What do ".parallelize" methods below accomplish?

     val vertexRDD = sc.parallelize(vertexArray)
     // vertexRDD: RDD[(Long, (String, Int))]

     val edgeRDD = sc.parallelize(???)
     // edgeRDD: RDD[Edge[Int]]

Construct the graph from the vertices and edges

     val graph = Graph(???, ???)

Print all my followers and their gender

    graph.vertices.collect.foreach { case (id, (name, nFollow, gender)) =>
    println(s"Tweeter $name has $nFollow followers and is $gender") }

Filter out male followers

    graph.vertices.filter { case (id, (name, followers, gender)) => gender != "M" }.collect.
        foreach { case (id, (name, followers, gender)) =>
        println(s"$name should be a $gender with $followers followers") }

Find my significant followers

    graph.vertices.filter { case (id, (name, nFollow, gender)) => nFollow > 1000 }.collect

Find those followers who do enough re-tweeets for me

    graph.edges.filter { case (edge) => edge.attr > 5 }.collect

    graph.edges.filter { case (edge) => edge.attr > 5 }.count

Count my male and female followers

    val maleFollowerCount = graph.vertices.filter { case (id, (name, nFollow, gender)) => gender == "M" }.count

    val femaleFollowerCount = graph.vertices.filter { case (id, (name, nFollow, gender)) => gender == "F" }.count

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

7.1-influencer.md

7.1-influencer.md

7.1 : Find Influencers on Twitter

Overview

Depends On

Run time

STEP 1: Start Spark Shell

Step 2: Build the following twitter graph

Import the necessary libraries

Construct the array of vertices

Construct the array of edges

Prepare the data for Spark processing. What do ".parallelize" methods below accomplish?

Construct the graph from the vertices and edges

Print all my followers and their gender

Filter out male followers

Find my significant followers

Find those followers who do enough re-tweeets for me

Count my male and female followers

Files

7.1-influencer.md

Latest commit

History

7.1-influencer.md

File metadata and controls

7.1 : Find Influencers on Twitter

Overview

Depends On

Run time

STEP 1: Start Spark Shell

Step 2: Build the following twitter graph

Import the necessary libraries

Construct the array of vertices

Construct the array of edges

Prepare the data for Spark processing. What do ".parallelize" methods below accomplish?

Construct the graph from the vertices and edges

Print all my followers and their gender

Filter out male followers

Find my significant followers

Find those followers who do enough re-tweeets for me

Count my male and female followers