BetaNYC’s testimony to NY City Council oversight hearing on open data.

Date: Monday, 27 October 2014
To: NYC Council – Committee on Technology
From: Noel Hidalgo, Executive Director

Re: BetaNYC’s Testimony on Open Data
BetaNYC Testimony Bit.ly bundle: http://bit.ly/betanyc-20141027

Good afternoon Chair and City Council members,

I’m Noel Hidalgo, and I am the Executive Director of BetaNYC. BetaNYC is a civic technology community group. Through our weekly meetings we explore how to make the city’s data useful to its neighbors. Our goal is to demystify technology, design, and data for city council members, community boards, community groups, and businesses alike. We are a community of developers, designers, mappers, and policymakers who volunteer our time to improve the digital city. We are New York City’s civic hackers. We love open data and open government.

Because of the city’s open data program and our advocacy for better data, our community has grown. In the last twelve months, we have grown to over 2,100 members. This year, three of the four NYC BigApps winners were community members. Forty-five percent of NYC BigApps’ semi-finalists were BetaNYC community members. As we discuss the city’s data, we discover more and more individuals who are looking to solve problems with the City’s data. That in turn leads us to more and more feedback about the City’s open data program and the city’s open government practices.

Today, I want to ask one simple question: “How does the city, together with her constituents, build the right open data ecosystem?”

From BetaNYC’s point of view, we need three things:

  • adequate human resources,
  • meaningful data, and
  • improved feedback infrastructure.

Human Resources:

First, thank you for allocating additional resources to run the city’s open data program.

BetaNYC is excited to work with the new DoITT commissioner, CTO, and CAO. We want to thank the Mayor’s Office for appointing such talented leadership to run the City’s information technology and data infrastructure. We hope the three of them can staff up their teams as soon as possible. From BetaNYC’s view, there is not a moment more to lose.

As we pointed out in last year’s report, the People’s Roadmap to a Digital New York City, this city’s open data program is on the verge of revolutionizing how we access municipal, transactional data.  We are at a point where public consumption of data is blossoming into a beautiful garden of data, utility, and stories. 

Making Meaningful Data:

Second, the city’s open data program is starting to breathe meaningful data into our lives. Over the last year, a few key datasets have been operationalized and placed online with daily updates. This operation allows the public to derive daily situation reports. This is a good start, but we need sub-daily reports.

Today, in partnership with Code for America, the City of Charlotte, North Carolina, and the City of Lexington, Kentucky, BetaNYC launches Citygram NYC.

Citygram is a tool that converts municipal open data into meaningful notifications. Similar to the City’s Notify NYC program, you can subscribe to topics and locations. These notifications will arrive via SMS or email.

As this is a free and open source tool, you can tell us what features are needed and we will add them in the coming months.
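
The subscription model described above (a topic plus a location, delivered by SMS or email) can be illustrated with a minimal sketch. This is not Citygram's actual code: the `Event` and `Subscription` classes, the field names, and the radius-based matching rule are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    """One record from an open dataset (hypothetical shape)."""
    topic: str
    lat: float
    lon: float
    title: str


@dataclass
class Subscription:
    """A resident's interest in a topic near a location (hypothetical shape)."""
    contact: str
    topic: str
    lat: float
    lon: float
    radius_km: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def notifications(events, subscriptions):
    """Return (contact, title) pairs for events matching a subscription's topic and area."""
    matches = []
    for sub in subscriptions:
        for event in events:
            same_topic = event.topic == sub.topic
            nearby = haversine_km(sub.lat, sub.lon, event.lat, event.lon) <= sub.radius_km
            if same_topic and nearby:
                matches.append((sub.contact, event.title))
    return matches
```

In this sketch, delivery by SMS or email would be a separate step that consumes the returned pairs.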

Today, Citygram.NYC starts with NYC 311 calls and the NYPD’s crash database. We must note that the city’s NYC 311 dataset is one of the best. NYC 311 provides an unparalleled and timely view of the city. We hope that many other datasets learn from 311. On the other hand, while the NYPD data is better than last year, it needs significant improvement. NYPD’s data is wildly incomplete and contains notable discrepancies; more on this later. That being said, we are excited to work with these two data sets and make NYC’s open data meaningful.

One year from now, we will live in a city where we will receive push notifications about NYC’s public meetings and procurement notices. Currently, BetaNYC and its members are working with the Department of Citywide Administrative Services’ City Record team to place their data into machine readable formats. For this to become a reality, we need sustained, open effort from DoITT and DCAS as we build a data format that maximizes utility.

For another example, you can visit Manhattan Community Board Six’s website. CB6 has taken tools from the NYC open data portal and embedded them into their website. On the CB6 website we see contextualized NYC 311 reports of Road Repair Requests, Broken Parking Meters, and Noise Complaints. This contextualized data puts CB6’s website as one of the top referrers to the City’s open data portal and maximizes the public’s understanding of issues related to Community Board Six’s meetings.
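
A contextualized view like the one on the CB6 website amounts to filtering 311 records to one community board and counting them by complaint type. The sketch below shows that idea only. The field names `community_board` and `complaint_type` and the sample records are assumptions for illustration, not data taken from the portal.

```python
from collections import Counter


def complaint_counts(records, board):
    """Count 311 records by complaint type for a single community board."""
    return Counter(
        record["complaint_type"]
        for record in records
        if record.get("community_board") == board
    )


# Illustrative records only; real 311 data has many more fields.
sample_records = [
    {"community_board": "06 MANHATTAN", "complaint_type": "Noise"},
    {"community_board": "06 MANHATTAN", "complaint_type": "Broken Parking Meter"},
    {"community_board": "06 MANHATTAN", "complaint_type": "Noise"},
    {"community_board": "01 BROOKLYN", "complaint_type": "Noise"},
]

for complaint, count in complaint_counts(sample_records, "06 MANHATTAN").most_common():
    print(f"{complaint}: {count}")
```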

If we can get more automatically geocoded data, community boards & city council members can quickly use the city’s open data tools to digest and share relevant municipal data. When community data is easy to use and displayed within context, people love the data.

Improving Tools and Feedback Infrastructure:

Another iconic example of NYC’s love for data is the “day in a life taxi” visualization. While not exactly an “open” data story, it is a prime example of an overwhelming desire for NYC’s data stories.

This visualization drives you around 2013’s yellow cab data. The original data includes ~170 million trips. The data was provided by the Taxi and Limousine Commission in response to a FOIL request. The intent is to analyze the NYC TLC 2013 taxi tripsheet data, to visualize running counts of fares, tips, and taxes, and to see how and when taxis move around New York.

This visualization has over a quarter of a million unique visitors and 6,500 Facebook likes. In the Facebook ‘like’ world, it is comparable to NYC 311 being liked on Facebook.

Our last story comes from the popular “I Quant NY” blog. I Quant NY is the tenth-highest referrer to the city’s open data portal and beats out the city’s own open data success story blog. Gothamist and Al Jazeera are the only two news outlets that refer more traffic to the City’s data portal. I Quant NY refers more traffic to the city’s portal than other data-friendly media outlets like WNYC and TechPresident. It should be noted that I Quant NY also refers more traffic than Reddit, GitHub, and Facebook.

I Quant NY is so good at humanizing the City’s open data that Mr. Wellington, its editor, has been featured on WNYC and in the New York Times, Gothamist, and the NY Post. Since launching in February, his site has received over half a million unique visitors and 1,500+ Facebook likes. This is about as many Facebook likes as the New York City Department of Consumer Affairs.

Mr. Wellington recently gave an interview on how the city sees his analysis.

“There are two sides to the coin. Anytime you point out something, it could go either way. If you tell the Department of Health that there’s something wrong with the rating system, they could either say, “Wow, let’s look into that” or they could play defensive. Generally, agencies are defensive, but there’s also not a good mechanism for them to take in information like this. They get caught off guard. I hope in the coming years they build in ways to reach out like this. If there were a liaison I could reach out to, maybe I would go that route. But right now, the only way to get attention is through the media. Unfortunately, that can create an adversarial relationship, which I think is the wrong way to look at open data. I really believe that if you empower people, you’ll get much more out than you’ll get criticism.”

BetaNYC’s Conclusion:

To effectively build the right open data ecosystem, the city and her constituents need to develop a shared understanding of possibilities and collaborate on shared outcomes. We need the city to finish hiring its open data team, and we need to have a shared view of the garden.

After two years of planting data seeds, it is time to harvest the garden and build the type of data ecosystem we all want. BetaNYC is honored to represent NYC’s open government data users. We are ready to work with the city and build an equitable and just 21st century city. Attached to this written testimony, you will find nine recommendations, 60 dataset format changes, 13 questions from BetaNYC’s Open Data Fidelity Workgroup, and a listing of 250 top referrers to NYC’s open data portal.

Thank you.
Noel Hidalgo
Executive Director
BetaNYC

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

BetaNYC’s Recommendations:

The following is sourced from the experience of BetaNYC and its allies.

Improve the format of the following 60 datasets < http://bit.ly/1wtKn3u >.

By default, provide geospatial support for all datasets that have an address.

Improve constituent feedback mechanisms.

  • Perform constituent user testing to see how the City’s web portal works for them.
  • From data set requests, to in-dataset comments, to general usability issues, constituent comments need better visibility and connectivity to the data ecosystem. To start, we recommend joining our NYC Open Data Fidelity Workgroup < http://bit.ly/nyc-od-wg >.
  • Similar to GovLab’s Open Data 500, look at who is consuming NYC’s open data and how it can be improved. At a minimum, the city should seek out its top 100 active users.
  • Similar to Chicago’s open data blog, the city should have a one-stop shop for dataset updates and improvements.

Use the Freedom of Information Law (FOIL) as a feedback mechanism.

  • We think that the proposed NYC Open FOIL legislation takes us in a direction that will update and modernize the City’s FOIL practice. We call on the Mayor’s office to work with the Council to pass this bill.
  • In the meantime, all FOIL officers and the City’s open data team should be looking at FOIL requests as a way to lower the burden of FOIL responses.

Update the City’s open data tech manual.

  • It has been two years since this document was written. Many open data practices have improved and we would love to ensure that the city’s data policy is maintained for the 21st Century.
  • We would love to work with the city to review the City’s open data tech manual and update it.

Explore alternative tools that federate with the existing open data portal and enable large data set sharing.

  • Explore open and safe data sharing protocols like BitTorrent and PubSubHubbub.
  • Empower agencies to host more data and share data from within their walls.

Dataset Documentation

  • A majority of the city’s data is self-explanatory only to someone who is a subject matter expert. For the City’s data to deliver its full value, datasets must have better documentation.

Crash Data

  • From Transportation Alternatives: “We know the crash data is inaccurate and significantly under-reports crashes. For instance, earlier this month we pulled numbers for YTD fatalities getting 164 for this year compared to 189 last year. The City said it was actually 195 vs 209. We know the City is aware of the problem but hasn’t told the public that the site under-reports crashes.”
  • Need: Timeline for fixing data quality.
    • Many crashes have intersections but aren’t geocoded.
    • Representation of deaths is inaccurate.
    • There is an overwhelming number of “unspecified” contributing factors. How can this be the case?
  • Need: Add injury types – in the event of an injury crash, add severity of injury.
  • Need: Add if Collision Investigation Squad was called in injury crashes.
  • Need: Add types of citations handed out to individuals.
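
The size of the gap in the fatality figures quoted by Transportation Alternatives above can be worked out directly. The sketch below uses only the four numbers from that quote. The function name `undercount` is ours, and it assumes the City's stated figures are the correct baseline.

```python
def undercount(portal_count, official_count):
    """Return the number of missing records and their share of the official count."""
    missing = official_count - portal_count
    return missing, missing / official_count


# (label, count pulled from the open data, count stated by the City)
figures = [("this year", 164, 195), ("last year", 189, 209)]

for label, portal, official in figures:
    missing, share = undercount(portal, official)
    print(f"{label}: {missing} fatalities missing ({share:.1%} of the City's figure)")
```

By this measure, the open data was missing 31 of 195 fatalities (about 15.9%) this year and 20 of 209 (about 9.6%) last year.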

NYCHA Data

  • For an organization that houses as many people as the City of Oakland, it is shocking that NYCHA only produces 15 datasets. There is very little transparency, and this needs to change.
  • It is appalling to read stories of how our public housing programs are physically broken. While we don’t feel that data will immediately improve the lives of those in public housing, we need the Mayor and the Council to lean on NYCHA and have them release their performance reports and data in machine readable formats. We should know how long it takes to fix elevators, doors, lights, and other fundamental quality of living issues.

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Questions for the NYC open data team:

These questions were developed by the BetaNYC community and its NYC Open Data Fidelity Workgroup.

Responses can be submitted to BetaNYC via noel@beta.nyc or by joining and posting responses to the NYC Open Data Fidelity Workgroup. < http://bit.ly/nyc-od-wg >

Metrics:

  • What are the City’s open data success stories? Within the NYC tech ecosystem, who are the largest consumers of NYC’s open data? – Noel Hidalgo, BetaNYC
  • Both NYC.gov and the City’s Open Data portal are the primary digital engagement mechanisms of the City and are often cited in the City’s Digital Roadmap updates. Can traffic analytics of NYC websites also be released as Open Data? – Joel Natividad, Ontodia

Intergovernmental use of NYC open data:

  • What type of intergovernmental outreach, training, and education do you perform? – Noel Hidalgo, BetaNYC
  • Within government, how are agencies using the City’s open data portal? – Noel Hidalgo, BetaNYC
  • Do you have any plans to use Open Data not only as a transparency program, but as a way to improve data exchange within and outside the City? If open data is used more operationally, it will automatically ensure that the data is more current. – Joel Natividad, Ontodia

Data releases and standards:

  • Why can’t all address data in the NYC Open Data Portal be geocoded? – Ben Wellington, I Quant NY
  • Do you have a roadmap or any plans to adopt open data standards or schemas? – Noel Hidalgo, BetaNYC
  • What are your plans to update NYC’s open data technical standard manual? – Noel Hidalgo, BetaNYC
  • The City Record Online Law is yet another landmark in NYC’s Open Data leadership that will enable all kinds of innovations. Do you have any plans to mandate that all PDF-bound data resources published by the City follow the City Record lead? – Joel Natividad, Ontodia
  • With DEP and DataBridge, inter-agency data exchange is far more efficient. Could it be mandated that open data export to the data portal be a native function of these systems? That way, the data could be sanitized and updated in a more timely and efficient manner. – Joel Natividad, Ontodia
  • What are the reasonable limits of invoking Homeland Security as a rationale for the withholding of environmental datasets by DEP? – Liz Barry, Public Lab
  • Prior to the adoption of the Open Data Law, several city agencies had already taken the initiative to post their data sets online, and continue to do so today, such as City Planning’s “Bytes of the Big Apple” website, Dept of Finance’s property-related data sets, and Dept of Buildings permit data. These agencies often post data sets that are more current, updated more frequently, and include more extensive and understandable metadata about these data sets than what can be found via the Open Data Portal. What has DoITT done to foster and help sustain these individual agency efforts, and how will DoITT ensure that its work on the Open Data Portal will not undermine these agency efforts but instead help strengthen them? – Steven Romalewski
  • Will the City of New York pursue copyright claims against citizens that use Open Data records to create works similar to official ones? For example, the Department of Transportation is claiming ownership and copyright of the WalkNYC data and maps, even though they incorporate DoITT data available from the Open Data portal. – David McCreery