@rwells1961
Course Goal: Students will learn the latest data journalism techniques that drive modern newsrooms and public relations / advertising offices. The class will extract and analyze Twitter data with the goal of producing an interactive multimedia presentation.
Course Description: This course will teach students how to code in programs such as R and SQL and how these powerful tools are used in modern news reporting. Quality reporting in newsrooms requires a solid foundation of data analysis. The data skills taught in this class are in high demand in newsrooms and corporations.
Required Text: Machlis, Sharon. Practical R for Mass Communications and Journalism. Chapman & Hall/CRC The R Series. 2018. ISBN 9781138726918 https://www.amazon.com/gp/search?keywords=9781138726918
–
Agenda: –1/14/2019 - Week 1
–Email to students
–Discuss syllabus
–Intro R and R Studio. Open program.
Machlis website:
http://www.machlis.com/R4Journalists/index.html
–R interface explained: Four main windows
Script writing, R Markdown, Table Viewer: Upper Left
Environment - data loaded in R: Upper Right
Console - write commands in R: Lower Left
File Manager and Html table viewer: Bottom Right
–Show basic R skills.
–Loading software. Tidyverse
Rio
–Conventions in coding.
–How many rows? nrow(yourdataset)
–How many columns? ncol(yourdataset)
–What is in the first five rows? head(yourdataset)
–rename columns. create columns
##Wednesday, Jan. 16 |
–Misc |
Class intros Book - Amazon Installation issues on laptops? Twitter feeds: @rstudiotips |
–Run demos from Ch. 3 |
–Show Collins results |
–Twitter analysis of Trump Tweets http://varianceexplained.org/r/trump-tweets/ |
–Ch 1 & 2 of Machlis: Key Points |
Reproducible research Repetitive tasks in modern newsrooms. Employment reports, crime stats, budgets Variables - an R object Assignment operator <- Case sensitive Vector: A vector can only have one type of data - all integers, all strings Dataframe - like a spreadsheet Save files - Don’t save workspace: because all of your variables will be stored and re-loaded the next time you launch RStudio. It’s too easy to forget about previously stored variables that can interfere with later work, |
–Software packages: tidyverse, rio, pacman |
–Software: How to get details and help |
help(package=“dplyr”) browseVignettes(“NameOfPackage”) help(“NameOfFunction”) ??median |
–Data Types and R |
Machlis: 2.4.2 Data types you’re likely to use often |
–EXERCISES: Excel vs R –Load tutorial: Introduction-to-R-January-2019.R > Download this file and open it in R Studio |
–Left click on the page, remove .txt extension, save as all files |
–Keyboard Shortcuts |
Tab - Autocomplete Control (or Command) + UP arrow - last lines run Control (or Command) + Enter - Runs current or selected lines of code in the top left box of RStudio Shift + Control (or Command) +P - Reruns previous region code |
**Notes:** –Basic descriptive statistics —Review ComputerWorld’s Beginner’s Guide To R –Stack Overflow at stackoverflow.com |
Reading: |
–Machlis. Chapter 1 & 2. |
–Beginner’s guide to R: https://www.computerworld.com/article/2497143/business-intelligence/business-intelligence-beginner-s-guide-to-r-introduction.html |
–Twitter analysis of Trump Tweets http://varianceexplained.org/r/trump-tweets/ |
–Review another R tutorial https://docs.google.com/presentation/d/1zICxR7qDM3RQ2Nxi5CqHlM3H8I7qoVkNtqcNcnbbDCw/edit#slide=id.p |
Resources: RStudio Navigation Tricks You Might’ve Missed https://rviews.rstudio.com/2016/11/11/easy-tricks-you-mightve-missed/ |
How Do I? https://smach.github.io/R4JournalismBook/HowDoI.html |
Functions https://smach.github.io/R4JournalismBook/functions.html |
Packages https://smach.github.io/R4JournalismBook/packages.html – |
–Next week Quiz on Basic R functions described so far in exercises.
–Readings / Coursework for MA Students > Sign up for this free class from Nick Diakopolous
--Workload TBD - I will assign select videos and readings from this class.
Exercise
–Load tutorial: Introduction-to-R-January-2019.R > Download this file and open it in R Studio
–Based on this tutorial, perform this exercise:
1. Percentage change from 2010-2017.
2. Produce a table with 5 counties with most growth.
3. Produce table with 5 counties with greatest population loss
4. Graph the top 5 and bottom 5
5. Filter just Benton County’s population for 2015
6. And if you finish that, bring up AOC.csv. How many rows? How many columns?
7. AOC.csv filter the text field for “Pelosi” or “Trump” or “New Deal"
Reading:
–Machlis. Chapter 3 & 4.
–Study Twitter meta data
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/tweet-object.html
–Look at this example: Ocasio.csv (in data folder of course page)
–Cohen, “Numbers in the Newsroom,” Common Mistakes.
–String data manipulation https://dereksonderegger.github.io/570L/13-string-manipulation.html
–Follow StoryBench, Northeastern Univ. https://twitter.com/storybench
Resources
–Use R instead of Excel: Andrew Ba Tran
Excellent Tutorial Spelling out Excel and Comparable Commands in R
https://trendct.org/2015/06/12/r-for-beginners-how-to-transition-from-excel-to-r/
Basic data work- head to http://bit.ly/excel_and_r
–All Cheat Sheets https://www.rstudio.com/resources/cheatsheets/
–
–Quiz on Basic R functions for Wednesday The quiz will be multiple choice and will cover basic file management and data types, stuff I mentioned the first two weeks.
–Describe Assignment #1 due Feb. 6: Managing Data / Static Graphic
–Announcements
Blackboard site has web links for course
Agenda
–Continue with week #2 tutorial
–The course GitHub Page > Here it is
–See Data folder
Click AOC.csv
"View raw"
Cntl + click (or right click) - Save As - AOC.csv
Exercise
–Loading Data from U.S. Census & Student Loans
–Load tutorial: Downloading Data 12-24-18.R
–Alternate download option: Use read.table() works for importing data:
–Loading and basic file management
Bringing in data
Data Frames
Extracting interesting details
Cleaning the data
Reshaping the format
Manipulating the data
Exporting
Add a column with a math conversion
–Ch 3 & 4 of Machlis: Key Points
Ch 3 Exercises:
Stock chart exercise used quantmod is a library for financial analysis.
Median Income for a City
Loading packages
Ch 4 Importing Data
How read.table() works for importing data:
Loading data
Manipulating data: dplyr - stringr
Data Management: mutate rename bind_rows
–Math –Summary Statistics
summary(Crime)
mean(x) Calculate the mean, or average, for variable x. median(x) Calculate the median. max(x) Find the maximum value. min(x) Find the minimum value. sum(x) Add all the values together. n() Count the number of records. Here there isn’t a variable in the brackets of the function, because the number of records applies to all variables. n_distinct(x) Count the number of unique values in variable x.
–Using a function for an equation
percent_change <- function(first_number, second_number) { pc <- (second_number-first_number)/first_number*100 return(pc) }
percent_change(100,150) [1] 50
This is what’s happening in the code above: * percent_change is the name of the function, and assigned to it is the function function() * Two variables are necessary to be passed to this function, first_number and second_number * A new object pc is created using some math calculating percent change from the two variables passed to it * the function return() assigns the result of the math to percent_change from the first line Build enough functions and you can save them as your own package.
–Set up column for math calculations Example: Total column shows winter snowfall in inches. To add a column showing totals in Meters, you can use this format:
.snowdata\(Meters <- snowdata\)Total * 0.0254
–Export data
Write Export output this file to a CSV or Excel write.csv or write.excel write.csv(AR2016_SMALL,“AR2016_SMALL.csv”)
Reading Machlis Chs. 5 & 6.
Seth C. Lewis, et al. “Big Data and Journalism: Epistemology, Expertise, Economics and Ethics,” Digital Journalism, 2015
Resources:
–For analysis library(dplyr)
–Quiz on Basic R functions
–Data Visualization Intro
–Load tutorial: Basic Data Visualization 12-26-18.R
–Visualizing data
ggplot2 - charts and maps
Export Static chart
Exercises
Basic Data Visualization 12-26-18.R
Questions on First Assignment
–Here is a good question that came in today about the first assignment.
Question:
``“What the hell? I converted population to numeric and the calculations come out as NA values! This is driving me insane! What is going on?”
Answer:
One of the obnoxious things about R is it considers commas as text. So it will show 720 as a number but 2,810 as not a number for calculations because it has a friggin comma.
Never fear. There is a solution. Run the find and replace function, called gsub
Crimedata\(Population <- gsub(",", "", Crimedata\)Population)
Translation:
Crimedata\(Population -- is the population column in your crime dataset gsub(",", "", finds a comma and replaces it with nothing. -- and it found the comma in the column Crimedata\)Population) and the <- dumps the results back into the Crimedata$Population column.
Fancy!
Question: “How do I get rid of the last row that only has text in the table that I just imported?”
Answer: Get rid of row using base R commands
Crimedata <- Crimedata[-c(187), ]
Translation:
Crimedata[-c(187), ] looks for row #187, which has this garbage text, and gives it the big minus sign, which eliminates it.
Crimedata <- dumps this slimmed down table back into your table and so you are good to go.
–
Agenda
–Assignment #1 due Feb. 6: Managing Data / Static Graphic
–GGplot
–Work with sample Twitter data
–Conventions in coding
–Visualizing data
ggplot2 - charts and maps
htmlwidgets - web visualization interactives
plotly - exporting charts online
Chart
Exercises –Continue with Basic Data Visualization Exercise questions
–Answer to Monday’s in-class exercise https://bit.ly/2RG6VNy
–GGPLOT –A handy explanation of ggplot and its components
If you’re using ggplot: plus it! For everything else: pipe it!
geom_point() geom_bar() geom_boxplot()
–DPLYR
–Dplyr Presentation
Five basic verbs • filter() • select() • arrange() • mutate() • summarize() plus group_by()
–Pipes - a Much-Used Command to Link Filters, Functions
pipe %>%
CMD + Shift + M
–Presentation from Bob Rudis on Writing Readable Code with Pipes, delivered at the rstudio::conf 2017.
https://www.rstudio.com/resources/videos/writing-readable-code-with-pipes/
Pipes as a way of chaining commands.
object %>% operation() —> result
–Key Concepts - Moving Forward:
Dplyr: Filters, Grouping, Sorting, pipes %>%
Pipe shortcut = CMD + SHIFT + M
Basic data visualization
Tidyverse
Notes
–How Do I? https://smach.github.io/R4JournalismBook/HowDoI.html
Resources
Basic Charts in R
https://www.youtube.com/watch?v=1EUJ0tsVsUA&t=12s
Reading
Machlis Chs. 7 & 8.
Transforming and Analyzing Data dplyr.pdf, Andrew Ba Tran, Washington Post
Charts_with_ggplot by Andrew Ba Tran,
Washington Post
–
–Note: Dr. Wells will be late - 10 a.m. - due to data mining presentation in the library
–Please begin class by working on this exercise: https://bit.ly/2Sqn1j1
–Discuss Twitter Metadata
–Work with sample Twitter data
https://developer.twitter.com/en/docs/tweets/data-dictionary/overview/user-object
–GGplot Video from Andrew Ba Tran
https://www.youtube.com/watch?v=Sx7d7eGRSj0&t=9s
Exercises
–AOC Twitter feed
https://bit.ly/2Sqn1j1
–Graphing in GGPlot
https://bit.ly/2Gqjfj4
–Multiple variable in a graph –Geom_Line, Geom_point, Geom_bar –How to alter the colors in a chart.
Notes
–Data Wrangling-Text Mining in Twitter.
See entire scraping sequence. Extract from Twitter.
Additional Exercises
–Ch. 3.8 Run a remote script to make an interactive map
–Median Income For City.R
–Work with sample Twitter data
–Other Topics
–Setting up an R Workflow
–Review Assignment #1
Assignment1_KEY_StaticGraphic_2_9.R
–Dplyr bootcamp.
See this exercise:
DPLYR_bootcamp_Feb_9
Dplyr bootcamp, #2.
Dplyr_Tutorial_Tran
Dplyr - Andrew Ba Tran - pipes-dplyr.pdf
Homework!
–Hand in your Dplyr Cheat Sheets on Blackboard by 11:59 pm Monday.
–Did I say this was Dplyr bootcamp?
Reading:
Machlis Chs. 9 & 10
Albert Cairo, “The Functional Art,” Principles of Data Visualization.
–Finish last question on DPLYR_bootcamp_Feb_9
6: Calculate a violate crime rate per 1,000 students.
Create a table selecting campuses, enrollment and crime rates
just for the top 5 crime rates
–Andrew Ba Tran dplyr class lesson
Exercises
–Andrew Ba Tran dplyr class lesson
The repo for this class is on Github, but can be easily downloaded to your desktop with the following commands:
install.packages("usethis")
usethis::use_course("https://github.com/r-journalism/learn-chapter-3/archive/master.zip")
--Before running the command, make sure the script is in the working directory folder and that the SHR76_16.sav.zip file is in the data sub folder.
--For example, on my computer, the import_murders.R script is in the dplyr folder and SHR76_16.sav.zip file is in the dplyr/data folder.
–And if this blows up, we have a backup: Return to dplyr bootcamp, graphing exercises.
–Themes for data viz library(ggthemes)
Homework
—Finish Andrew Ba Tran dplyr tutorial.
—Bring questions to class
-Using this data, produce two dplyr queries that investigate the murders dataset for Arkansas.
-In class we looked at black and white victims and unsolved murders. Find other queries that would yield newsworthy material.
--Bring questions and your queries to class on Monday. I will call on you to discuss.
–
Agenda
–Analyzing Tweets from public officials: Dataset AOC
–Study Twitter meta data
https://twittercommunity.com/ https://developer.twitter.com/en/docs
–Register as Twitter Developer https://developer.twitter.com/en/account/get-started
–Extracting Text from Twitter
–Lubridate
Class Discussion
Material we discussed in class, including Mohamed's graphs
https://bit.ly/2GQ1w4O
Homework
—Finish Andrew Ba Tran dplyr tutorial.
—Bring questions to class
-Using this data, produce two dplyr queries that investigate the murders dataset for Arkansas.
-In class we looked at black and white victims and unsolved murders. Find other queries that would yield newsworthy material.
--Bring questions and your queries to class on Monday. I will call on you to discuss.
Assignment #2 Details
Visualization of Twitter data. Due Feb 27, 11:59 p.m.
Students will produce publication-ready graphics from Twitter data.
Data dictionary required
Details:
With the assigned dataset, Coulter.csv, students will produce the following tables:
--Most common words used in Twitter text field
--Most common hashtags
--A chart of the volume of Twitter messages by date
Based on this information, write a 300-500 word report, following AP style, that describes potential newsworthy trends in this Twitter feed and insights from your analysis.
By 11:59 pm., you will submit the following on Blackboard:
3 charts in .jpeg format: 1) Common words 2) Common hashtags 3) Date trend chart
One Word document with your findings, 300-500 words. Append at the end a brief data dictionary describing the Twitter data fields you used in the assignment.
An R script with the coding that shows how you loaded and cleaned the data and produced the charts
Exercises
Extracting Text Strings from Twitter
https://bit.ly/2X5V9jL
What we will produce
Reading:
Machlis Chs. 11 & 12
Twitter metadata
https://twittercommunity.com/ https://developer.twitter.com/en/docs
For working with dates library(lubridate)
Dealing-with-dates.pdf by Andrew Ba Tran
https://github.com/profrobwells/Data-Analysis-Class-Jour-405v-5003/blob/master/Readings/dealing-with-dates.pdf
–
Agenda
–Lubridate
–Processing and counting hashtags
–Using Lubridate
The exercise
Lubridate_Intro_Feb_20.R
https://bit.ly/2H07YpX
What we will produce
–Key to Lubridate Questions in Exercise:
https://bit.ly/2BO92d1
Homework
Work through this exercise and bring questions to class:
Splitting Hashtags 2-25-19.R
https://bit.ly/2BQIE2i
Data Wrangling-Text Mining in Twitter.R
–Solving Problems in R: Bend Templates to Your Will
–Analyzing Hashtags –Finding Narratives In Twitter
–#1: Solve the chronology problem with the chart
–#2: Splitting Hashtags
Questions on Exercise
Splitting Hashtags 2-25-19.R
https://bit.ly/2BQIE2i
–Data Cleaning
–Disaggregating variables for summation
–Searching for narratives in Twitter
Resources:
Lubridate vignette
browseVignettes(“lubridate”)
StackOverflow
https://stackoverflow.com/questions/46691933/r-sort-by-year-then-month-in-ggplot2
Grammar of Graphics http://vita.had.co.nz/papers/layered-grammar.html
–Joining Dataframes in R
https://www.youtube.com/watch?v=gLg4D9bMIyc&t=13s
–Data Wrangling http://learn.r-journalism.com/en/wrangling/
http://learn.r-journalism.com/en/wrangling/dplyr/dplyr/
https://github.com/r-journalism/learn-chapter-3/blob/master/dplyr/pipes-dplyr.R
Notes
--The pie chart focuses the reader on large percentages, and encourages the reader to think of the total
--The stacked bar plot provides the same information, but makes it easier to accurately determine at a glance how large each group is out of the whole.
--This bar chart splits the categories horizontally, and draws attention to how the family members are ordered. It encourages the reader to think about the distribution rather than disconnected categories, and gives a better sense of sense of scale.
Reading Machlis Chs. 13 & 14.
Resources:
–
Agenda
Review Assignment #2
Coulter Tweet Analysis #2 Exercise
https://bit.ly/2TdY9Mv
Assignment #2 Discussion
Answer Key discussion
top_n function makes life easy
aes - reorder in ggplot
Homework
Discuss Kavanaugh text mining story
Complete Coulter Tweet Analysis #2 exercises
Agenda
Coulter Tweet Analysis #2 Exercise
https://bit.ly/2TdY9Mv
More on this topic!
Comparing AOC and Coulter tweets
Misc Items
Check what software packages are running: Global Environment
^ + shift + 8 = Zoom to Environment
Reading
Machlis Chs. 15 & 16
–Read Kavanaugh text mining story: Text analysis of Brett Kavanaugh’s opinion.
http://www.storybench.org/bringing-textual-analysis-tools-to-judge-brett-kavanaughs-latest-opinion/
Samantha Sunne, “The Challenges and Possible Pitfalls of Data Journalism, and How You Can Avoid Them,” American Press Institute, 2016
Resources: –Create R Markdown document, export to PDF, HTML
Exercises
–
Agenda
**Extra Grad Assignment** - Due March 15.
**Report from NICAR Conference**: Katie Beth Nichols
Resume AOC-Coulter Data Mining Exercise:
Coulter Tweet Analysis #2 Exercise
https://bit.ly/2TdY9Mv
Key to the exercise:
https://bit.ly/2F3brlh
Agenda
Mapping Exercise from Machlis book, Ch. 11
https://bit.ly/2VXCSU2
Bots
Bot or Not: Difficulty determining a bot on Twitter
--An app that uses machine learning to guess if a Twitter account is a bot
https://www.r-bloggers.com/botrnot-an-r-app-to-detect-twitter-bots/
https://mikewk.shinyapps.io/botornot/
--Article about Botometer
https://www.vox.com/technology/2018/4/9/17214720/pew-study-bots-generate-two-thirds-of-twitter-links
--Stanford research paper on this topic
https://pdfs.semanticscholar.org/e219/6b47133c2191d380098744c13ba77133e625.pdf
Resources
--Andrew Ba Tran - Week 4 Mapping
http://learn.r-journalism.com/en/mapping/
--Graphing GGplot 12-28.R
Exercises from Machlis Ch. 9. Facets
Ch 3 Exercises:
Stock chart exercise used quantmod is a library for financial analysis.
dygraphs creates *interactive Web graphics* of data over time.
Reading Machlis Chs. 17 & 18
Homework Over Spring Break
R_Homework <- GiveMeABreak(homework = 0), c("enjoy yourself", "make good life choices", "call your parents")
–
.
Agenda
1) Due April 3: Assignment #3. Interactive Map
Produce a basic interactive map in R
2) Updated R software. Open and run script: https://bit.ly/2TdY9Mv
3) Washington Post White House reporter Josh Dawsey, visits with students, 2 p.m. Wednesday April 3, Student Media conference room
4) Sign up for a census key:
https://api.census.gov/data/key_signup.html
5) What is an API?
6) Video of Machlis mapping
https://www.youtube.com/watch?v=HFJOV5XaU_U
See R script:
Maps in R March 24 2019
https://bit.ly/2FvZDrB
Exercises
Adapt Machlis - Maps in R Ch 11 exercise for Median Income in Arkansas
Reading
APIs - basics
https://medium.com/@LewisMenelaws/a-beginners-guide-to-web-apis-and-how-they-will-help-you-23923a0da450
A gentle introduction to APIs for data journalists
https://trendct.org/2016/12/29/fetching-airport-delays-with-python-a-gentle-guide-to-apis-for-journalists/
Twitter historical API
https://developer.twitter.com/en/docs/tutorials/choosing-historical-api
“Connecting the Dots” by Jacob Harris (2015) and discuss how people should or should not be represented through news visualizations.
Agenda
Adapt Machlis - Maps in R Ch 11 exercise for Median Income in Arkansas
Maps in R March 24 2019 - Using the Census API
https://bit.ly/2FvZDrB
Building a Census tract map
We will build this
Resources:
Census Reporter to look up tables
https://censusreporter.org/
–
Agenda
Due April 3: Assignment #3. Interactive Map
Questions on mapping exercises
Joins in R: https://bit.ly/2OFnGJ6
What is Sentiment Analysis
What is TidyText
Resources
Sentiment analysis:
Token - Tokenization
A token usually refers to a word (or could be any "meaningful unit of text") that is in a special table designed for analysis.
Tokenization = the process of splitting text into tokens.
One-token-per-row structure allows matching to sentiment dictionaries.
Sentiment dictionaties
--AFINN from Finn Årup Nielsen,
--Bing from Bing Liu and collaborators, and
--NRD from Saif Mohammad and Peter Turney.
NRC Word-Emotion Association Lexicon (aka EmoLex)
http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
Issues: unigrams. Measure single words. Doesn't assess qualifiers before a word, such as in “no good” or “not true”
Can measure bigrams
Putting it all together
Exercises
Sentiment analysis tutorial 3-31-19.R
https://bit.ly/2V7bxPl
Joins in R:
https://bit.ly/2OFnGJ6
AOC sentiment analysis 3-31-19.R
https://bit.ly/2CKsWq4
Agenda
Continue with Sentiment Analysis
Quick hit: Analyze Dawsey tweets
https://bit.ly/2WKJcih
Sentiment analysis on Josh Dawsey tweets
Adapt this for the Dawsey exercise
https://bit.ly/2CKsWq4
Twitter activity of our guest speaker
Reading
Foundational text on sentiment analysis!
http://varianceexplained.org/r/trump-tweets/
Tidy text mining
https://www.tidytextmining.com/tidytext.html#
Joins in R
Sentiment analysis
http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
Saif Mohammad research on sentiment analysis
http://saifmohammad.com/index.html
Resources:
Reading Bad data visualizations. Data Translation.
The Journalist as Programmer: A Case Study of The New York Times Interactive News Technology Department http://isoj.org/wp-content/uploads/2016/10/ISOJ_Journal_V2_N1_2012_Spring.pdf
–Visual Narrative Tricks by Albert Cairo https://www.youtube.com/watch?v=TSGaueL4Ggk
What is code? http://www.bloomberg.com/graphics/2015-paul-ford-what-is-code/
–
Agenda
Continue with Sentiment Analysis
Exercises
Sentiment analysis of Dawsey tweets
Adapt this for exercise: https://bit.ly/2CKsWq4
AOC, Coulter sentiment analysis
https://bit.ly/2CKsWq4
Reading
Julia Angwin, Terry Parris Jr., Surya Mattu. “Breaking the Black Box: What Facebook Knows About You,” ProPublica, 2016;
Nicholas Diakopolous, “Algorithmic Accountability,” Digital Journalism, 2014.
Resources:
Agenda
Julia Angwin article
Compare sentiment over time.
Bigrams
Twitter sentiment over time
Exercises
Sentiment and Time Analysis 4-9-19.R
https://bit.ly/2UtNP3q
–
Agenda
Happy Tax Day
Happy Boston Marathon Day
Assignment #4 is posted
Online class Wednesday
Adapt April 10 in-class exercise to the Ann Coulter feed
Bigrams
Coulter Bigrams - Relationship
Exercise
Compare AOC, Coulter Original Tweets in a Chart
Use "Coulter Tweet Analysis #2 March 4.R"
Solution: https://bit.ly/2ZcGiox
Bigrams: http://bit.ly/bigramz
Agenda
Online Class. See Blackboard
How to scrape Twitter using an API key
Reading
Amy Webb future of journalism trends
https://futuretodayinstitute.wetransfer.com/downloads/0e84e883e140bafe9a3436a6464032be20171003123607/ecda17
Google search tips
https://blog.expertisefinder.com/top-6-google-search-tips-for-journalists/
Artificial intelligence in the news
https://aiethicsinitiative.org/news/2019/3/12/artificial-intelligence-and-the-news-seven-ideas-receive-funding-to-ensure-ai-is-used-in-the-public-interest
Sharon Machlins Nicar compilation site
http://www.machlis.com/nicar19.html
Agenda
Review quiz
Bigrams
Simple Web Scraping
GitHub
Coulter Bigrams - Score
Exercises
Bigrams:
http://bit.ly/bigramz
Web Scraping in R: Simple Web Scraper
http://bit.ly/scrapeme
Agenda
Simple Web Scraping
GitHub
Resources:
This class is intended to teach you modern workflow techniques for coding. A centerpiece of that workflow is GitHub. This is a website with a system that allows you to collaborate with other programmers on coding projects. It manages versions of software code and is a very popular with the tech elite.
Your GitHub account, which is public, represents an important professional image. Prospective employers and collaborators will look at your GitHub account.
--Create a GitHub account.
https://github.com/
--Follow this tutorial
https://guides.github.com/activities/hello-world/
--Max Harlow Presentation on How to Use GitHub
https://docs.google.com/presentation/d/1MbltRcOerktc-E26HMDjYj0BO9CTubQWu1Z2bB9CpVY/edit#slide=id.g448ccc227721fe56_10
--Simplified GitHub- GitHub Desktop
https://help.github.com/en/desktop
Exercises
See Basic GitHub 4-22-19.R https://bit.ly/2UAMGTd
Resources Installing Git for a Mac - Andrew Ba Tran
–
Agenda
Measuring Engagement: Do the Ugly Tweets Get the Most Likes?
Twitter Engagement April 28.R
http://bit.ly/2GParD5
AOC Engagement and Sentiment
Important!
Course Evaluation
Please do me a favor and evaluate this course.
It's important to me and the department to get your thoughts
on what worked and what did not.
I was very happy with how things turned out this semester and
intend to offer this course again. If you think it is important,
then please take five minutes to fill out the survey.
https://courseval.uark.edu/
Last Class!
Assignment #4: Politicians Twitter Feeds, Due Tonight
Agenda
Return to Twitter Engagement
http://bit.ly/2GParD5
R Markdown
http://bit.ly/2DBLSaX
Turn your R cheatsheet into a PDF
Turn your R cheatsheet into a web page on GitHub
Congratulations!
We covered a lot this semester
–
–Setting up an R Workflow http://learn.r-journalism.com/en/publishing/workflow/r-projects/
Resources on GitHub
–GitHub flow
https://guides.github.com/introduction/flow/
–GitHub Guides
https://guides.github.com/
–Another GitHub guide
https://andrewbtran.github.io/NICAR/2018/workflow/docs/03-integrating_github.html
This program is loaded on all of the computers in Kimpel 146. You can load it on your laptop as well. It is not large and doesn’t tax the memory a lot. R runs on Windows, Mac and Linux, but this course uses the Mac version. If you use Windows on your personal laptop, the instructor is not responsible for any variations in the lessons and instructions.
First, download the most recent version of R at https://www.r-project.org/, . Look for the link to download R. Click that download option and you should be taken to CRAN, the Comprehensive R Archive Network, and a list of CRAN servers, called mirrors, around the world. Pick a server and choose the precompiled binary distribution for your operating system. Once the file finishes downloading, install it like any other software program – run the .exe for Windows or .pkg for Mac.
Accept all of the default settings for Mac.
Second, install RStudio, an excellent user interface that helps you manage and create R code. Download the open source edition of R Studio desktop and follow the prompts to install it. http://www.rstudio.com/products/rstudio/download/
More information: http://www.machlis.com/R4Journalists/download-r-and-rstudio.html