A while ago, I recall reading about Public Data Ferret, a nonprofit that writes stories based on government documents and data, mostly from Washington State and local governments. I remember thinking it was a neat idea at the time, then moved on and forgot about it.
Until today, when I Googled a breaking local story about a teacher whose license was suspended, only to find out that I'd been scooped by practically a whole day. Actually, I found out about the story when the school district sent out an e-mail to parents. And since the folks at the Data Ferret's site, socialcapitalreview.org, had a story that included comments from the district's superintendent and PTO, I assume the district knew the story was coming.
Part of me was a little ashamed for not getting the story first, but mostly I was pretty excited. The state Office of Superintendent of Public Instruction publishes teacher discipline decisions (and loads of other data), but the list is organized only alphabetically, not by date, and you can't see which district a teacher works in until you open the PDF linked with each name. For someone whose primary concern is a group of small local school districts, it's easy to forget to check such a site regularly. (OK, I forgot.)
Suffice it to say, I was impressed enough to e-mail the blog post's writer, Matt Rosenberg, who is also the founder and executive director of Data Ferret's parent organization, Public Eye Northwest. I asked how it worked and whether they automatically dug through such documents.
The answer, it turns out, is mostly a lot of people digging through documents. He said they mine documents from a wide range of government agencies, including the state of Washington and others, mostly around Puget Sound. They do original reporting to make the documents understandable and searchable by topic for visitors to the site, such as ethics cases at the University of Washington.
He said they’ve also been doing more work on infographics and want to explore visual and audio storytelling.
And here's something that definitely piqued my interest, since automating anything involved in keeping up with mounds of public data is on my wish list too: "I am also very interested in working with technologists and journalists to develop a more automated, customizable feed of new, properly titled link-bearing documents only – from government Web sites only. A better Web spider, mining the Government Deep Web, if you will." He said it may be difficult, though, because of non-standard HTML on government websites, meaning that would have to be one awfully agile spider.
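The core of that spider idea can be sketched in miniature: given the HTML of a government page, keep only the links that point at actual documents, paired with their visible titles. This is a minimal illustration, not Public Data Ferret's actual tooling; the sample markup, agency URL, and class name below are all hypothetical, and a real spider would need far more tolerance for messy, non-standard HTML.

```python
# A minimal sketch of a "titled document link" extractor, using only the
# Python standard library. All URLs and HTML below are made up for
# illustration; real agency pages vary wildly in markup quality.
from html.parser import HTMLParser
from urllib.parse import urljoin

class DocLinkParser(HTMLParser):
    """Collects (title, absolute_url) pairs for links ending in .pdf."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []            # finished (title, url) pairs
        self._current_href = None  # set while inside a qualifying <a>
        self._current_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if href.lower().endswith(".pdf"):
                self._current_href = urljoin(self.base_url, href)
                self._current_text = []

    def handle_data(self, data):
        if self._current_href is not None:
            self._current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._current_text).strip()
            self.links.append((title, self._current_href))
            self._current_href = None

# Hypothetical sample page resembling an alphabetical discipline list.
sample_html = """
<ul>
  <li><a href="/discipline/smith.pdf">Smith, J. - Final Order</a></li>
  <li><a href="/about.html">About this office</a></li>
  <li><a href="/discipline/doe.pdf">Doe, A. - Reprimand</a></li>
</ul>
"""

parser = DocLinkParser("https://example.wa.gov/reports/")
parser.feed(sample_html)
for title, url in parser.links:
    print(f"{title} -> {url}")
```

Even this toy version shows why the spider would have to be "awfully agile": it assumes every document link is a well-formed `<a>` tag with an `href` ending in `.pdf`, and many government pages meet neither assumption.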
So while I'm no fan of getting beaten on stories, it was great to go through the site again today and learn more about this government-watchdog resource. It's a great model for mining public documents for stories, and worth checking out.