Massachusetts Open Checkbook: running through the ledger of choices and challenges in open government

On December 5, Massachusetts Governor Deval Patrick joined with state treasurer Steven Grossman to create an open government initiative with the promising moniker Open Checkbook. The site launched to some acclaim and has received over 220,000 hits. I decided to take a look at what's offered and what's missing from this site, and to ask someone in the government here in Massachusetts to describe their thinking in creating the site. The results can give us some insight into the effort it takes at each stage to release government data--and even more significantly, what it takes to increase the data's value.

As a finance project, Open Checkbook homes in on one area of open government: how the state spends its money. With Open Checkbook you can find out where the money goes in the Massachusetts state government, right down to particular salaries or particular payments to vendors. This is highly welcome in a tight economy, especially in a state that is still often unfairly tarred as "Taxachusetts," decades after tax rates were lowered--a state where news of patronage and pension scandals is common enough to get tiresome--a state where cynical voters have put referendum questions on the ballot in favor of lower taxes at least three times.

I discussed Open Checkbook with Jeffrey Simon, who works for the Governor as the director of the state's economic stimulus program and who was involved in Open Checkbook from the beginning. The site is run by a steering committee formed by the Governor and Treasurer and made up of members of their staffs. The approach being used in Open Checkbook is based on the experience they had developing the state's stimulus program, and the website that Director Simon's office created for that program. The steering committee has been eager to add context to data, helping visitors who are uninitiated in the arcana of state budgeting get a sense of what expenditures are for.

A first look at the web site

Let's take a quick tour of this service. To get to the home page, visit the main Massachusetts government portal, look down the right-hand side, and click on "Open Checkbook." You can then explore finances along several dimensions. For instance, choosing "View by Department" gives you a table of expenditures at a high level of abstraction, ranging from "Administration and Finance" to "Transportation." An attractive pie chart also appears. At this high level, the pie chart strikes one as odd because it shows an entirely different breakdown of expenditures from the table on the left. However, it makes sense once you realize that the pie chart reflects a breakdown of expenses within a particular category: for instance, what percentage of "Administration and Finance" went to various commitments, such as the Department of Revenue. You can investigate expenditures by drilling down in two ways:

  • View "Administration and Finance" in more detail by clicking on the plus sign next to its row in the table.

  • View the percentages of different expenditures by hovering over parts of the pie chart, and then view (for instance) the breakdown within the Department of Revenue by clicking on its segment in the pie chart.

If you persist in clicking down through the table on the left, you eventually see payments made to individual vendors. And then, by hovering over a vendor's name, you can pull up "Vendor Details" and eventually see a particular expenditure on a particular date, including the Fund Name and Account Name. Other parts of the Open Checkbook site break down expenditures by vendor or by relatively abstract "spending categories" instead of by department.

At lower levels, one also encounters some of the neat twists added by the site's creators. By hovering over the name of an expenditure, one can pull up a brief description. By clicking on the name of an agency, one can go to the agency's web page.

Expert reactions

Open Checkbook gets very high marks from Kaitlin Lee of the Sunlight Foundation. The level of detail provided on each vendor goes far beyond data provided by most states. Crucially, the state shows full information about the program that funded each expenditure (funding type, account number, object code classification, fund name, and whether the source is federal or a general fund), so that researchers can trace the flow of money from programs all the way through to payments. Future plans to include data on tax credits impressed Lee, because most states don't consider tax credits worth reporting in their data sets--but of course, a tax credit does represent an expenditure, and sometimes a quite controversial one. (The Boston Globe, for instance, recently highlighted tax credits to film companies that took wings on their own.) As for the commitment by Open Checkbook to update the site nightly, Lee enthusiastically called it "unprecedented." She was also impressed with the site's informative FAQ and list of future plans.

I asked Beth Noveck of New York Law School (and formerly the Deputy Chief Technology Officer in the Obama Administration) for a comment. She writes:

Open Checkbook is a fabulous exemplar of a government using open data to make itself more transparent to the public. To become even more accountable and effective, the next iteration of Checkbook 2.0 should ideally track how people are using the data downstream, and how agencies are encouraging the creation of useful mashups and visualizations. To promote this kind of participation, the creators can start by articulating what kinds of visualizations they want to see people build and create a feedback loop by showing how they have taken action (e.g., projects cut, money saved) based on this data.

I also exchanged email with John F. Moore, Founder and CEO of Government in The Lab, who wrote me:

Massachusetts, especially under Governor Patrick, has taken a proactive role in using technology to better engage citizens in the process of government. Their early efforts on social media led to a very high quality social media toolkit for all government employees to use, providing best practices and legal guidance. The Open Checkbook project is another solid step forward, working to engage citizens by providing the information that is most interesting to citizens in tough economic times: how the government is spending money. From the viewpoint of the average citizen I think the site does a great job. It is fairly easy to use, has a search that works as you would expect, and allows people to easily answer basic questions.

The State of Massachusetts would be well served by running more advertising to continue to build awareness amongst citizens of this resource. Too many government entities invest time and money building wonderful solutions only to have them underutilized by citizens due to the lack of awareness that the solutions even exist.

Levels of usability for open data

After delving down to the detail pages of Open Checkbook, one can download a table in the form of a CSV file. Why is this only at low levels, rather than at more comprehensive levels where one can glean more interesting data? Part of the reason has to do with sheer volume: some pages have very long tables and data downloads would strain the web site. But a more fundamental challenge exists.

Let's think for a moment about the role of structuring data. Contributions follow a power law. In open government, a few programmers with both the knowledge and the zeal to create applications will sift through government data, and the rest of the population will gratefully consume the information presented in easy-to-read forms. So the value of government data--the topic with which I started off this posting--depends a lot on how easily programmers can get at it and turn it into consumable information.

Alexander Howard has reported on a five-star rating system Tim Berners-Lee developed for government data, which he nicely laid out in a video of a Government 2.0 presentation. Because Berners-Lee's system highlights relatively abstract concepts of open formats and Linked Data, I will condense the possibilities to three:

Moving paper documents to the Internet

In this most rudimentary effort, agencies put up PDFs or Microsoft Word files instead of printing them. This saves an investigator the cost of a stamp or a trip down to the registry, and definitely can make a quantitative difference in the amount of research using government information. But it's not clear how much of this information is useful for third-party applications, and extracting such data becomes an immense chore.

Adding metadata

At the next level, documents are enhanced with semantic mark-up. Laws, regulations, and council agendas, for instance, can be marked up to indicate titles, dates, summaries, and other structured sections. (The U.S. Congress's Thomas site is one of the best-known examples.) Raw data is presented in tabular form, often with pie charts or other visualizations, and sophisticated systems let you sort and filter displays by their columns.

Programmatic access

At the highest level in current use, data can be loaded into programs for big data research. The key here is a regular format (comma-separated value files are fine), although APIs are useful to allow queries according to criteria chosen by the user, such as "Show me all agencies that spent over one million dollars on consultants in 2009."
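To make that sample query concrete, here is a minimal sketch of how a programmer might answer it from a downloaded CSV file. The column names (`department`, `category`, `fiscal_year`, `amount`) and the sample rows are hypothetical, invented for illustration; they are not the actual layout of any Open Checkbook download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; column names and values are invented for
# illustration and do not reflect any real government export.
SAMPLE_CSV = """department,category,fiscal_year,amount
Department of Revenue,Consultants,2009,750000.00
Department of Revenue,Consultants,2009,400000.00
Department of Revenue,Office Supplies,2009,90000.00
Transportation,Consultants,2009,600000.00
Transportation,Consultants,2008,900000.00
"""

def big_spenders(csv_file, category, year, threshold):
    """Return {department: total} for departments whose spending in the
    given category and fiscal year exceeds the threshold."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        if row["category"] == category and int(row["fiscal_year"]) == year:
            totals[row["department"]] += float(row["amount"])
    return {dept: total for dept, total in totals.items() if total > threshold}

result = big_spenders(io.StringIO(SAMPLE_CSV), "Consultants", 2009, 1_000_000)
print(result)
```

With a regular format, the whole query is a dozen lines. The hard part, as the sketch assumes away, is having a `category` column that every agency fills in the same way.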

Programmatic access is the empowering factor that brings data to cell phones and visualizations. (We'll take another look at Berners-Lee's Linked Data later.) And it's increasingly common. Many of the data sets on the federal government's open government data platform can be downloaded programmatically. Socrata is a good model for the use of an API, providing precise data types that facilitate number-crunching by computer. The state of California doesn't seem to have APIs, but does sport some impressive data sets. For instance, with a few clicks, I can download a file of electricity consumption in California by any combination of counties, sectors, and years.

Lee believes that an API, while useful for retrieving small quantities of data, isn't too important. She's just as happy with one button allowing you to download all the data from the site in a ZIP file. Organizations like the Sunlight Foundation can then create API access to the data for the pleasure of other researchers. Lee's criticism of Open Checkbook, therefore, focuses on how little data you get with each download. Each download is limited to the first 10,000 rows, which may be nowhere near enough to retrieve the data on a large page. But the state could probably upgrade the site fairly easily to meet her criteria.
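Until then, a researcher working around the cap has to stitch several small downloads together and watch for truncated ones. The following is a rough sketch of that chore, assuming each download is a CSV file with a header row. The 10,000-row figure is the limit described above; the function name, file names, and sample data are made up for illustration.

```python
import csv
import io

ROW_CAP = 10_000  # per-download row limit described in the article

def merge_downloads(named_files, row_cap=ROW_CAP):
    """Combine rows from several CSV downloads.

    named_files: iterable of (name, file-like object) pairs.
    Returns (rows, suspect), where suspect lists downloads holding at
    least row_cap rows, which may therefore have been cut off.
    """
    rows, suspect = [], []
    for name, handle in named_files:
        chunk = list(csv.DictReader(handle))
        if len(chunk) >= row_cap:
            suspect.append(name)
        rows.extend(chunk)
    return rows, suspect

# Tiny invented example, with the cap lowered to 2 rows so the check fires.
a = io.StringIO("vendor,amount\nAcme,10\nBolt,20\n")
b = io.StringIO("vendor,amount\nCrane,30\n")
rows, suspect = merge_downloads([("a.csv", a), ("b.csv", b)], row_cap=2)
print(len(rows), suspect)
```

A file that lands exactly on the cap is not necessarily truncated, so the sketch only flags it for a second look.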

The research you can do on data is limited to a great extent by the way the data was collected. While I can easily compare electricity use by county in California, I can't ask "What percentage of electricity went to air conditioning?" In the future, if California institutes a "smart grid," we could even investigate the consumption of air conditioning (and perhaps snoop on each other in interesting ways). Meanwhile, someone might be able to ferret out some facts about air conditioning by downloading information on a county basis and applying data retrieved elsewhere about climate.

Open Checkbook is considering an API, according to Simon, but the steering committee has to evaluate the costs of creating the API and compare them with the potential benefits. So it's not certain that Massachusetts will move soon to the highest level of my hierarchy.

It should also be noted that Open Checkbook contains only state expenditures. Municipalities aren't covered (except that one can find what the state paid to them), nor are independent agencies like the MBTA transit system. Incorporating all those entities into Open Checkbook is a long-term goal.

Currently, one can just retrieve tables manually in CSV format (which can then be loaded into any one of many popular spreadsheets or scripting libraries). Furthermore, each table covers a small data set. What would it take to process data on a statewide basis?

The costs of consistent data

Tabular data requires the consistent application of categories during data entry. Data would be of little use--and in fact would offer the deceptive appearance of useful information--if different agencies classified something like advertising or disability payments in different categories but had their different classifications combined in one table.
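The danger is easy to demonstrate. In this small sketch, two agencies record the same kind of expense under different labels, and a naive total by category splits what should be a single figure. The agency names, labels, amounts, and the `CANONICAL` mapping are all invented for illustration.

```python
from collections import defaultdict

# Invented records: (agency, category label as entered, amount in dollars).
RECORDS = [
    ("Agency A", "Advertising", 1000),
    ("Agency B", "Media Buys", 2500),
    ("Agency B", "ADVERTISING ", 500),
    ("Agency A", "Disability Payments", 4000),
]

# Hypothetical mapping from locally used labels to one shared category.
CANONICAL = {
    "advertising": "Advertising",
    "media buys": "Advertising",
    "disability payments": "Disability Payments",
}

def totals_by_category(records, normalize=False):
    """Sum amounts per category, optionally mapping labels to shared names."""
    totals = defaultdict(int)
    for _agency, label, amount in records:
        if normalize:
            label = CANONICAL[label.strip().lower()]
        totals[label] += amount
    return dict(totals)

naive = totals_by_category(RECORDS)
harmonized = totals_by_category(RECORDS, normalize=True)
print(naive)       # advertising is scattered across three labels
print(harmonized)  # advertising appears as a single total
```

The naive table looks authoritative while reporting only a quarter of the advertising total under "Advertising." Writing the mapping is the cheap part; getting every agency to agree on it and apply it is where the cost lies.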

To solve this, Berners-Lee promotes Linked Data, the cause to which he has devoted most of his public life for the past few years. Technologies for Linked Data haven't been widely adopted yet, but are producing impressive results in scattered applications. (The thrust behind Linked Data is beyond the scope of this article. I'll just say that if you're comfortable with the concepts of taxonomies and ontologies you'll adapt quickly to Linked Data, and that if you're not comfortable with those concepts you'll probably want to tiptoe off and not hear any more.) Berners-Lee assures us in his Government 2.0 video that Linked Data solutions can be developed informally and incrementally, shared on a voluntary basis among institutions without bureaucratic intervention. But the creation of a data description is only one small step toward harmonizing data sets.

To answer the question mentioned earlier in this posting ("Show me all agencies that spent over one million dollars on consultants in 2009") the state would have to impose rigid workflow rules about categorizing and entering data. Training and enforcement could entail prohibitive costs. Extracting data in structured formats from many different databases is also a burden.

If consistency could be achieved, the next level would be to coordinate data sets across many different jurisdictions. But this would take a good deal of effort to set and apply standards. Again, creating a data description is just one step in the process of collecting useful data, whether it's data on consultants in Open Checkbook or the use of electricity for air conditioning in California.

Public reactions and future plans

Still, the current level of organization in Open Checkbook promises benefits both inside and outside the state administration. For instance, the Committee for Public Counsel Services streamlined its accounts by combining small payments in checks of $50 or more. Simon hopes that financial officers in agencies will be able to find out who is paying individually for services that the state could purchase more cheaply in bulk. The Secretary of Administration and Finance can quickly find out what an expenditure was for and to whom it was paid, without having to pick up the phone and ask for a report.

Journalists and activists can also indulge their investigative impulses. The salaries of all employees and pensions paid to retirees are all visible. Given some of the recent abuses reported in the press, a lot of citizens will welcome this transparency.

Simon says that the Governor and the Treasurer also want to use the data to create a dialog with the public. In just the month that the data has been released, many comments and suggestions have come in. Some of the interactions include:

  • One vendor complained that the site listed her home address. Investigation revealed that this was a systemic filtering problem affecting about 100 vendors. The team fixed the problem within 12 hours, and the Comptroller made permanent corrections to the accounting system so the issue will not recur.

  • After receiving multiple questions, the team developed a standard response form and updated the site's FAQ.

  • Users pointed out incompatibilities with Internet Explorer 9. A link was created on the Open Checkbook home page instructing users how to enable IE9 compatibility mode. A future update will fix the problem permanently.

Simon welcomes error reports and would like even more public feedback. Lots of positive comments have also been received, such as "I am overjoyed that we now have a tool like this in the state of Massachusetts" and "Finally we have a politician who follows through." Eventually, I think, it would be valuable for the state to set up forums with logins and discussion areas to churn up group discussions of topics of interest to the public.

