[APA 2017] How to Create a Dataset from Twitter or Facebook: Theory and Demonstration

To get the most benefit from this skill-building session, it is recommended that you work through the two examples below either during or after the session. To run through them, you'll need the following programs installed. They are all FREE; if you are ever asked for a credit card number, you've gone the wrong way. I will be using a Windows machine, but I believe this should work just the same on a Mac. It is recommended that you install them in this order:

  1. The R Project for Statistical Computing
  2. RStudio Desktop
  3. Once R and RStudio Desktop are installed, open RStudio and copy-paste the following commands into the window labeled Console. Installation will probably take a few minutes the first time you install these packages:
    1. install.packages("dplyr")
    2. install.packages("Rfacebook")
    3. install.packages("twitteR")
    4. install.packages("rstudioapi")
    5. install.packages("gender")
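Once the installs finish, you can confirm everything is in place with a quick check in the Console. This is just a sketch: it attaches each workshop package (library() will stop with an error if one failed to install) and prints your R version:

```r
# Attach each workshop package; an error here means that package
# did not install correctly and should be reinstalled
for (pkg in c("dplyr", "Rfacebook", "twitteR", "rstudioapi", "gender")) {
  library(pkg, character.only = TRUE)
}
print(R.version.string)
```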

You can also find all of the R code used in this workshop in these two files. First, code to download data from Facebook:

# Tutorial/Demonstration of Facebook Data Calls using R
# by: Richard N Landers (rnlanders@tntlab.org)
#
# We'll be grabbing some public posts from Facebook from a Group of interest
# This file is structured as a tutorial, so "final" data collection code would be more streamlined

# Set working directory to directory of this saved R file
library(rstudioapi)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

# Open Facebook retrieval library
library(Rfacebook)

# Open dplyr to do some fancy data frame manipulation later
library(dplyr)

# At this point, go grab an API access token from the Graph API Explorer
# Go to https://developers.facebook.com/tools/explorer/ and click "Get Token"
# Store this value in the next variable
token <- "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"

# Grab public group names so that you can determine Facebook's ID number for
# the group you actually want to grab data from (must be OPEN)
ids <- searchGroup(name="psychological methods", token=token)

# Next we'll grab the last 200 posts, using the group ID returned by searchGroup()
# above; note that posts are downloaded 25 at a time
group <- getGroup(group_id=853552931365745, token=token, n=200)

# As an example of what we could do with this, let's look at the gender balance in this group
library(gender)

# First, extract first names (everything before the first space) and append
# to the existing data frame
group$first_name <- sub(" .*", "", group$from_name)

# Next, use gender detection on first names in the dataset; cut out duplicate entries
genders_df <- distinct(gender(group$first_name))

# Use dplyr to lookup each name and cross reference it in the gender table
group <- left_join(group, genders_df, by=c("first_name" = "name"))

# Check gender balance
table(group$gender)
barplot(table(group$gender))

# Save our group table
write.csv(group, "groupFacebook.csv")
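The first-name extraction above leans on a single regular expression, which can look opaque at first. Here is a minimal standalone sketch of what sub(" .*", "", x) does; the names are made up purely for illustration:

```r
# The pattern " .*" matches the first space and everything after it,
# so replacing that match with "" keeps only the first word of each name.
# Strings without a space are left unchanged.
names_in <- c("Ada Lovelace", "Grace Brewster Murray Hopper", "Plato")
first_names <- sub(" .*", "", names_in)
print(first_names)  # "Ada" "Grace" "Plato"
```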

Second, code to download data from Twitter:

# Tutorial/Demonstration of Twitter Data Calls using R
# by: Richard N Landers (rnlanders@tntlab.org)
#
# We'll be grabbing the most recent public posts on #psychology from Twitter
# This file is structured as a tutorial, so "final" data collection code would be more streamlined

# Set working directory to directory of this saved R file
library(rstudioapi)
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))

# Open Twitter retrieval library
library(twitteR)

# Open dplyr to do some fancy data frame manipulation later
library(dplyr)

# At this point, create an "App" on Twitter after logging in by going to 
# http://apps.twitter.com and "creating an application"
# Once you've created an application, open its settings, go to Keys and Access Tokens,
# generate an access token, then copy/paste the four strings required here
consumer_key <- "xxxxxxxxxxxxxxxxxxxxx"
consumer_secret <- "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
access_token <- "xxxxxxxxxxxxxxxxxxxxx"
access_secret <- "xxxxxxxxxxxxxxxxxxxxx"
setup_twitter_oauth(consumer_key, consumer_secret, access_token, access_secret)

# Let's grab all Twitter posts about #psychology that we can and convert to a data frame
psychSearch <- searchTwitter("#psychology", n=200)
psychSearch_df <- twListToDF(psychSearch)

# write out new dataset to play with
write.csv(psychSearch_df, "tweets.csv")
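Once the tweets are saved, you can start exploring them right away. The sketch below assumes the column names twListToDF typically produces (isRetweet and screenName are assumptions here; run names(psychSearch_df) to confirm on your own data):

```r
library(dplyr)

# Drop retweets, then count which accounts posted most often in this sample
psychSearch_df %>%
  filter(!isRetweet) %>%
  count(screenName, sort = TRUE) %>%
  head(10)
```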

Finally, you can download the slides from this skill-building session here (coming soon):

Complete Presentation