Introduction to Data Journalism
What is data journalism?
- Gathering Data: scrapers.
- Understanding Data: ‘cleaning’, ‘analysing’ or ‘interviewing’ to find leads or evidence.
- Presenting Data: visualisation, interactivity and personalisation
Not just numbers and statistics
Data is everywhere: audio, video, connections on social networks, photographs, documents, financial information, likes on Facebook, web search results, numbers on spreadsheets.
Also Metadata
“When information was scarce, most of our efforts were devoted to hunting and gathering. Now that information is abundant, processing is more important.
We process at two levels: (1) analysis to bring sense and structure out of the never-ending flow of data and (2) presentation to get what’s important and relevant into the consumer’s head.”
Philip Meyer
Professor Emeritus, University of North Carolina at Chapel Hill
Software is your friend
It helps you to: gather data, clean data, analyse the data, find connections, find patterns or changes, find the story, tell the story.
Computer Assisted Reporting (CAR)
- Pioneered in the US with Philip Meyer’s coverage of the 1967 Detroit riots.
- Bill Dedman’s revelations of racial bias in lending policies of banks in the 80s
- Steve Doig’s analyses of the damage from Hurricane Andrew in the 90s
The 1967 Detroit Riots
- Police raided an after-hours drinking club in a predominantly black neighbourhood.
- Among the deadliest riots in US history: 43 dead, 467 injured, 7,200+ arrests, 2,000+ buildings destroyed.
- Psychology professor Nathan Caplan and journalist Philip Meyer used social science techniques to look for the root causes.
- Rioters were thought to have low economic status or low educational levels.
- Recent immigrants from the South were also blamed.
Among the Findings
- There was no correlation between economic status and participation in the disturbance.
- College-educated residents were as likely as high school drop-outs to have taken part.
- Recent immigrants from the South had not played a major role; in fact, Northerners were three times as likely to have rioted.
Precision Journalism
“Precision journalism was a way to expand the tool kit of the reporter to make topics that were previously inaccessible, or only crudely accessible, subject to journalistic scrutiny. It was especially useful in giving a hearing to minority and dissident groups that were struggling for representation.”
Philip Meyer
Author of Precision Journalism:
A Reporter’s Introduction to Social Science Methods (1973)
80s - Racial Bias in Lending Policies
- Through a FoIA request to the Federal Home Loan Bank Board, Bill Dedman obtained loan applications
- Analysed 10 million applications for loans in the US
- Derived rejection rates from a national analysis by race, sex and marital status
- Revealed that in much of the US, high-income blacks were rejected at the same rate as low-income whites
- Blacks were rejected more than twice as often as whites
- Race was a better predictor of success than sex or marital status
The Color of Money
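The rejection-rate comparison described above can be sketched in a few lines of Python. This is only an illustration of the technique, not Dedman's actual method: the records, the field layout and the function name rejection_rates are all hypothetical, and the figures are invented so that the toy result mirrors the pattern on the slide.

```python
from collections import defaultdict

# Hypothetical loan applications, invented purely for illustration.
# Each record: (race, income_band, approved)
applications = [
    ("black", "high", False),
    ("black", "high", True),
    ("black", "low", False),
    ("black", "low", False),
    ("white", "high", True),
    ("white", "high", True),
    ("white", "low", False),
    ("white", "low", True),
]

def rejection_rates(records):
    """Return {(race, income_band): share of applications rejected}."""
    totals = defaultdict(int)
    rejected = defaultdict(int)
    for race, income, approved in records:
        key = (race, income)
        totals[key] += 1
        if not approved:
            rejected[key] += 1
    return {key: rejected[key] / totals[key] for key in totals}

rates = rejection_rates(applications)
for group, rate in sorted(rates.items()):
    print(group, f"{rate:.0%}")
```

Grouping by more than one attribute at once is what allows a comparison such as high-income applicants of one group against low-income applicants of another.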
What went wrong in the 90s with Hurricane Andrew?
- Stephen Doig, Miami Herald
- Compared dates, damage and paths of other Florida hurricanes
- Merged damage reports with the property-tax roll
- Newer houses were more severely damaged
- Discovered that during Florida’s 27-year hurricane-free period, houses were not built to code
- Analysis of campaign contributions showed one out of four campaign dollars came from the building industry
- “Instead of just supplying anecdotes,” said Doig, “we provided powerful evidence with data analysis.”
“CAR is the use of computers and social science methods to acquire and analyze information to tell stories that otherwise would be difficult or impossible”.
Stephen Doig
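Merging damage reports with the property-tax roll, as in the Hurricane Andrew analysis, amounts to joining two datasets on a shared key. The sketch below assumes a hypothetical parcel id as that key and uses invented records; the names tax_roll, damage_reports and damage_by_decade are illustrative and do not come from the Miami Herald's work.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only.
# Property-tax roll: parcel id -> year the house was built.
tax_roll = {"P1": 1962, "P2": 1975, "P3": 1988, "P4": 1990}

# Damage reports: (parcel id, was the house left uninhabitable?)
damage_reports = [
    ("P1", False),
    ("P2", False),
    ("P3", True),
    ("P4", True),
    ("P9", True),  # no matching parcel in the roll
]

def damage_by_decade(roll, reports):
    """Join reports to the roll on parcel id; return {decade built: share uninhabitable}."""
    totals = defaultdict(int)
    damaged = defaultdict(int)
    for parcel_id, uninhabitable in reports:
        year_built = roll.get(parcel_id)
        if year_built is None:
            continue  # unmatched report; a real analysis would log and inspect these
        decade = year_built // 10 * 10
        totals[decade] += 1
        damaged[decade] += int(uninhabitable)
    return {decade: damaged[decade] / totals[decade] for decade in sorted(totals)}

shares = damage_by_decade(tax_roll, damage_reports)
for decade, share in shares.items():
    print(f"{decade}s: {share:.0%} uninhabitable")
```

Once the two sources are joined, the year of construction from one can be set against the damage recorded in the other, which neither dataset could show alone.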
New Approaches to Journalism
- New technology
- A move towards greater openness
- Application of social science techniques to news gathering and reporting
- Greater computer access and faster processing
- Faster, cheaper, more readily available software
- Larger data sets
- More people accessing news digitally, therefore more scope for different ways of telling stories
Modern Data Journalism
- Automate data gathering (scraping)
- Find stories in data sets using spreadsheets (Excel) or databases (Access and SQL)
- Data visualisation and infographics (Google Fusion Tables, Tableau, CartoDB)
- Explain how stories affect an individual
Data Scraping
Automates the process of gathering data: Helium, Outwit, Scraperhub, Quickcode, Python, Ruby, R.
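As a minimal sketch of what a scraper does, the Python example below pulls the rows out of an HTML table using only the standard library's html.parser module. The page fragment, the district figures and the class name TableScraper are hypothetical; a real scraper would first download the page and would need to cope with messier markup.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical page fragment, invented for illustration.
SAMPLE_HTML = """
<table>
  <tr><th>District</th><th>Burglaries</th></tr>
  <tr><td>North</td><td>12</td></tr>
  <tr><td>South</td><td>7</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
scraper.close()
print(scraper.rows)
```

The output is a list of rows that can be written to a spreadsheet or loaded into a database for analysis.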
It Starts with a Question
Crime rates, financial information, public services, environment, planning and building regulations, connections, education, health.
Interviewing Data
- A dataset is a source like any other, though with very specific knowledge and an infallible memory for facts
- To interview it, you must learn to ask questions in a language it understands
- It cannot lie to you, but be aware that it may have been lied to by others…
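One language a dataset understands is SQL. The sketch below loads a small, invented table into an in-memory SQLite database through Python's sqlite3 module and asks it a single question: in which districts did incidents rise between 2013 and 2014? The table name crimes, its columns and all figures are assumptions made for the example.

```python
import sqlite3

# Hypothetical dataset, invented for illustration only.
rows = [
    ("North", 2013, 120),
    ("North", 2014, 150),
    ("South", 2013, 80),
    ("South", 2014, 70),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE crimes (district TEXT, year INTEGER, incidents INTEGER)")
conn.executemany("INSERT INTO crimes VALUES (?, ?, ?)", rows)

# The question: where did incidents rise from 2013 to 2014, and by how much?
question = """
    SELECT a.district, b.incidents - a.incidents AS increase
    FROM crimes AS a
    JOIN crimes AS b ON a.district = b.district
    WHERE a.year = 2013 AND b.year = 2014 AND b.incidents > a.incidents
    ORDER BY increase DESC
"""
answers = conn.execute(question).fetchall()
print(answers)
conn.close()
```

The answer is only as reliable as the figures loaded into the table, which is the point of the warning that the data may have been lied to by others.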
More examples of data journalism
Data of the Week http://gijn.org/2014/09/29/top-ten-ddj-the-weeks-most-popular-data-journalism-links-29/
What is Big Data? http://gijn.org/2014/09/09/what-is-big-data/
Tableau Gallery http://www.tableausoftware.com/public/gallery
DataBlog The Guardian http://www.theguardian.com/news/datablog
Country Statistics http://www.nationmaster.com/
Spurious Correlations http://www.tylervigen.com/
Where do we find data?
- Published data: open government data, datasets published by NGOs, researchers.
- Requested data: Freedom of Information laws, scraping.
- Leaked data: whistle-blowers, inside sources.
- Created data.
“[Journalism is] going to be about poring over data and equipping yourself with the tools to analyse it and picking out what’s interesting. And keeping it in perspective, helping people out by really seeing where it all fits together, and what’s going on in the country.”
Tim Berners-Lee
More online resources about data journalism
16 useless infographics. (August 1st, 2013). The Guardian. www.theguardian.com/news/datablog/gallery/2013/aug/01/16-useless-infographics
Gray, J.; Bounegru, L. and Chambers, L. (2012). The Data Journalism Handbook. Open Knowledge Foundation. http://datajournalismhandbook.org/1.0/en/index.html
Data Journalism Blog. A news site tackling innovative projects made with data, in the newsroom and elsewhere. http://www.datajournalismblog.com/
BBC College of Journalism. Journalism blog of BBC Media. http://www.bbc.co.uk/blogs/collegeofjournalism/entries/89cf3a79-1a82-3b83-8cad-67451dbcee95
Weaver, D. and McCombs, M. (January 1st, 1980). Journalism and Social Science: A New Relationship? Public Opinion Quarterly, Vol. 44, Issue 4, pp. 477–494. https://doi.org/10.1086/268618