2022 Computation + Journalism Conference

June 9-11, 2022
Brown Institute for Media Innovation
Columbia University

You can see these notes on my GitHub or on my site (where you can see writeups of other projects).

Session: AI for Everyone

Speakers: Swapneel Mehta (New York University), Christopher Brennan (Overtone), Zhouhan Chen (New York University), Jessica Davis (Gannett Localizer), Matt Macvey (NYC Media Lab), Steve Dorsey (Gannett)

Swapneel Mehta demonstrated a system that shows a dynamic (and predictive?) model of engagement with news websites. (I notice they use Seven Days Vermont as an example.) The tool is called SimPPL.

Jessica Davis talks about the Localizer platform (which I regret not having seen). Someone asked a “what is your stack” question; Jessica says mostly Python, Chris says mainly Google Cloud based; Swapneel says Python (and says more about a stack that I don’t catch, sadly) but also Postgres. He mentions CrowdTangle, which is interesting since SimPPL looks like something that could fit the niche that CrowdTangle is in (CT seems to be in maintenance mode and it doesn’t seem like there will be more investment).
There’s some talk about having “product people” in newsrooms. This touches on a fundamental issue: few newsrooms are big enough to have a technologist or product manager, and the media spends so little on technology in general that very few commercial entities (or investors) see them as a market to develop specialty tools for. Mainly, newsrooms use tools developed for other use cases.
Chris: “Going out and scraping everything is a very bad way to build a dataset.” (Good point.)
Jessica is talking about a tool that can automate the process of writing a story over a number of locations (think, automating the writing of a story on unemployment numbers, only across 100 major metros). She calls this “generating relevant stories at scale that can inform our communities.” She also says that newsrooms need to “wake up,” because these tools are part of everyday life around us.
“Optimizing our internet for clicks and likes is going to be a nightmare when you can use [AI] to generate a huge number of slightly different or tailored versions of the same content.” (I am paraphrasing Chris Brennan here.) His point is that we will need algorithms that surface quality directly, not just approximate it through clicks and likes. (I think.)
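The multi-location story automation Jessica describes can be sketched roughly as one template filled in per metro area. This is only an illustration of the general idea, not how Localizer works (I haven't seen it); the template wording, field names, and numbers below are all made up.

```python
from string import Template

# Hypothetical template; the wording and field names are illustrative only.
STORY_TEMPLATE = Template(
    "Unemployment in $metro was $rate% in $month, "
    "$direction from $prev_rate% a month earlier."
)


def describe_change(rate, prev_rate):
    """Return a short word for the month-over-month change."""
    if rate > prev_rate:
        return "up"
    if rate < prev_rate:
        return "down"
    return "unchanged"


def write_story(row):
    """Fill the template for one metro area."""
    return STORY_TEMPLATE.substitute(
        metro=row["metro"],
        month=row["month"],
        rate=row["rate"],
        prev_rate=row["prev_rate"],
        direction=describe_change(row["rate"], row["prev_rate"]),
    )


# Made-up numbers, not real labor statistics.
rows = [
    {"metro": "Metro A", "month": "May", "rate": 3.9, "prev_rate": 4.1},
    {"metro": "Metro B", "month": "May", "rate": 5.2, "prev_rate": 5.2},
]

stories = [write_story(row) for row in rows]
for story in stories:
    print(story)
```

The same loop would run over 100 metros as easily as two; the hard editorial work is in the template and in deciding which changes are worth a story.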

Investigations and AI

Julia Angwin from The Markup.

The Markup built a tool that volunteers could install to monitor their FB feeds; with this they showed that Facebook was showing content (political, public health subjects) that it had previously told Congress it would not.
They are investigating how algorithms shape what we see and what choices we have; for instance, Allstate has an algorithm that offers you lower prices if they think you will shop around.

Hilke Schellmann, NYU professor and journalist

You can get into this even if you can’t learn to code, she says. “I can still interrogate these algorithms.”
She says she got into it after talking to an Uber driver who had applied for a baggage handler position and got a phone call asking him three questions (driven by AI?).
“How can we hold AI and algorithms accountable?” She mentions that if you’re rejected for a job by an AI reviewing your resume, you usually have no idea that happened.
She tested an AI that evaluated call center applicants on their grasp of English. But some of these…aren’t great.
Podcast: “In Machines We Trust”
Advice: benchmark algorithms; benchmark them against each other by measuring the same thing. Bring your own dataset for testing.
File a FOIA request when the vendors work with public entities – public universities, public schools, cities, police, etc.
Frustrated with her inbox, she partnered with a team to write a tool that reduced the amount of email she needs to read by 2/3.
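The benchmarking advice above (same measurement, your own dataset) can be sketched like this. The two "tools" are hypothetical stand-ins I wrote for illustration, not any real vendor's product, and the test set is made up; the point is only that both get scored on the same samples with the same metric.

```python
# Hypothetical stand-ins for two tools that score English proficiency
# from 0 to 1. Real tools would be called through their own interfaces.
def tool_a(sample):
    # Naive scorer: longer answers get higher scores, whatever the language.
    return min(1.0, len(sample["transcript"].split()) / 10)


def tool_b(sample):
    # Scorer that relies on a language tag supplied with the sample.
    return 0.9 if sample["language"] == "en" else 0.1


# Bring your own test set with known ground truth (made-up examples).
test_set = [
    {"transcript": "I have worked in customer support for five years now",
     "language": "en", "proficient": True},
    {"transcript": "Ich habe fünf Jahre lang im Kundendienst gearbeitet und helfe gern",
     "language": "de", "proficient": False},
    {"transcript": "Yes", "language": "en", "proficient": True},
]


def accuracy(tool, samples, threshold=0.5):
    """Share of samples where the tool's pass/fail call matches the truth."""
    correct = sum((tool(s) >= threshold) == s["proficient"] for s in samples)
    return correct / len(samples)


# Same dataset, same metric, every tool.
results = {
    name: accuracy(tool, test_set)
    for name, tool in [("tool_a", tool_a), ("tool_b", tool_b)]
}
for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

With a shared test set, a disagreement between tools becomes a concrete question you can put to the people who built them.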

Questions from attendees

“Do you ever put in this effort and don’t find anything?” Schellmann: “Absolutely yes.” (Laughter from the crowd.) Meredith Broussard (online): “This is one of the things that makes it really hard to fund investigative journalism.” (Good point.)
Julia Angwin: “I wish I could figure out a way to publish these null results…but responsible publishing means going to everyone to comment, and so it hasn’t quite worked. But I wish there was a way to contribute these to the public, because a null result is useful.”
Julia Angwin on why the work contributes to accountability: Data is a precursor to policy in our world; we just don’t change without data. (I am paraphrasing as I write these; if you want direct quotes you may need to seek them in a recording of this session.)
Meredith Broussard: “I have been very pleased with the impact.” She did a story on the availability of textbooks in Philly public schools; proved that there weren’t enough textbooks available for students to learn what was on the state-mandated tests. Ultimately they spent $400M to buy enough books.
“Humans are terrible at hiring. SO many biases!” (Me: So maybe algorithms could be better?)
Except: the next anecdote is about BigCo using an algorithm that, it turns out, was tossing applications that referred to “women.” The algorithm had learned that applicants with this on their resume did not get hired. So the algorithm was potentially automating human bias. (Please note that I am paraphrasing. Any misunderstanding is my own.)
I always think of intent as the frosting on the cake. The cake is the harm, that’s most of the story. If you get the smoking gun email that proves intent, that’s the frosting. Our job is to write about the harm so that it can be prevented in the future. (Julia Angwin)
Q: “How do you tell the story where there’s no central villain,” but instead a faceless algorithm? “That’s a problem because humans really do love a villain.” (IMO, “character bias” is a problem in current journalism.)

Online Communities and Local News

Comparing open-ended community dialogue with local news. Hope Schroeder, Doug Beeferman and Deb Roy (MIT Center for Constructive Communication)
Local, Social, and Online: Comparing the Perceptions and Impact of Local Online Groups and Local Media Pages on Facebook. Marianne Aubin Le Quéré (Cornell Tech), Mor Naaman (Cornell Tech) and Jenna Fields (Cornell University)
Storytelling Structures in Data Journalism: Introducing the Water Tower structure. Bahareh Heravi (University of Surrey)

Hope Schroeder

Project: take all tweeted stories from the Boston Globe house and Metro accounts, match them with live (in person?) community conversations, and see what the overlap (or lack thereof) is. See Real Talk For Change.
Interesting: Top 3 topics in community sessions were: Housing, Education, Mental Health. Overall, media under-covers the things that were talked about the most.
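A comparison like the one above can be sketched as a difference in topic shares between the two sources. This is my own toy version, not the method from the talk; the topic labels and counts are invented, and I'm assuming each conversation snippet and each tweeted story has already been tagged with one topic.

```python
from collections import Counter


def topic_shares(labels):
    """Convert a list of topic labels into each topic's share of the total."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {topic: n / total for topic, n in counts.items()}


# Invented labels: one topic per conversation snippet or tweeted story.
conversation_topics = [
    "housing", "housing", "education", "mental health", "housing", "education",
]
news_topics = [
    "sports", "housing", "politics", "politics", "education", "sports",
]

conv = topic_shares(conversation_topics)
news = topic_shares(news_topics)

# Positive gap: talked about more in the community than covered in the news.
gaps = {
    topic: conv.get(topic, 0.0) - news.get(topic, 0.0)
    for topic in set(conv) | set(news)
}
undercovered = sorted(
    (topic for topic, gap in gaps.items() if gap > 0),
    key=lambda topic: (-gaps[topic], topic),
)

for topic in undercovered:
    print(f"{topic}: {gaps[topic]:+.2f}")
```

The tagging step is where most of the real effort (and judgment) would go; the arithmetic afterwards is simple.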

Marianne Aubin Le Quéré

Comparing local news to local online groups.
Started by interviewing essential workers about how they got news; many were telling her that they got it from local pages, FB groups, Nextdoor, Reddit etc.
The experiment asked people to join random hobby groups, a local group, and like a news page (all on FB).
After 4-6 weeks, participants were asked how they felt, and how they felt about their community (and about the local news).
None of these had significant effects on individual community outcomes.

Bahareh Heravi

Intriguing title. “Water Tower Structure for Data Journalism”
Traditional journalistic story structures: Inverted Pyramid; Narrative (rising action, peak, resolution); “Martini” (inverted pyramid, followed by a chronological narrative (the stem) and a kicker (the base)); Five-act structure (stack of blocks).
“Gold coins”: pieces of a data story that help a reader relate or move forward.
Studied 118 data stories from large and small news organizations (love that they included smaller orgs)
Inverted pyramid is still the most common structure in data stories.
2nd is the stack of blocks structure.
3rd is “other.” (It’s a developing space, so structures are still being developed and tried on.)
Suggestion for structure of data story: Water Tower. An inverted pyramid lede, then topical sections (blocks), and an ending.

Questions

Querent says community orgs she has talked to feel that housing is covered more from a homeowner’s point of view, not enough about tenants, affordable housing etc.
Hope: notes that the Globe is paywalled and would like to experiment with other forms of media (hey GBH doesn’t have a paywall!)
Idea: using a GIF as the lede of a data story (love this idea)