Testimony on Open Data Examination and Verification Report 2017

Good afternoon, Chairman Vacca and members of the Technology Committee. I am Noel Hidalgo, Executive Director of BetaNYC and a member of the NYC Transparency Working Group.

I want to start by thanking Chairman Vacca, the members of this Committee, and the Council for your continued commitment to oversight hearings for the Open Data Law. Your ongoing energetic support for Open Data has made New York City a global leader in open data and is hugely encouraging to open data advocates inside and outside of government.

BetaNYC wants to thank this Council and this Committee for being amazing partners. We hope that, for the next four years, we can find collaborators who are as willing as this current committee.

Second, I want to thank the City’s open data team for continuing to slog on with a notable loss of staff. It is a shame that the City’s open data team isn’t up to its full staff complement.

General Observations:

It is clear that this report is shaped by a lack of resources. Several of last year’s recommendations are not mentioned in this report and don’t seem to carry over. Specifically, 2016’s 1i, 1ii, and 1iii, and 3i, 3ii, and 3iii.

Agencies should make their technical ecosystems more accommodating to Open Data by:
i. Using automations, rather than manual uploads, to update datasets currently on the Open Data Portal.
ii. Writing Open Data requirements into procurements of new data systems and analytics technologies.
iii. Allocating more resources to Open Data personnel, especially Open Data Coordinators.

MODA should improve the Examination and Verification plan for future years by:
i. Consulting with the Department of Investigation on potential improvements.
ii. Creating clear guidelines and definitions of “data” and “dataset.”
iii. Creating clear guidelines on determining whether a dataset is “public” or “private.”

What happened to these recommendations?

Last year’s report introduced a results snapshot. At a high level, we could see agencies and their released datasets, soon-to-be-released datasets, the status of automation, their update frequency, data related to the MMR, and public requests against these agencies.

As these reports move forward, we would love for MODA to standardize a summary data table. Also, we recommend that the Council ask for a detailed snapshot to be included in every annual report.

BetaNYC and some members of the public were fortunate to get copies of the accompanying spreadsheets. Frustratingly, these spreadsheets bury information that should be within the report. Both the DEP and FDNY will archive a significant number of existing datasets, yet the 2017 report says that the agency identified a number of datasets that don’t constitute a “public dataset” and that they will be removed to improve the data portal’s hygiene.

It is important for the public summary to have more detail. The public shouldn’t have to pore through spreadsheets to understand this audit.

Specific to this audit, several questions have come up.

Why is the FDNY’s D. Reports blank of any legally mandated reports? How was that approved?
If these reports were put in place during the summer, why will it take DEP over a year to release two datasets? Shouldn’t these compliance reports outline a path to compliance with some sort of partial compliance along the way?
What is MODA’s “improved” examination and verification process?
How has it improved open data coordinators’ work?
What is being done within these three agencies to move data and open data from insight to action?

Lastly, neither the 2016 nor the 2017 report is easily accessible via MODA’s website or the open data portal. All reports and accompanying supporting documents should be accessible via MODA’s website and the open data portal.

Recommendations on observations:

In 2016, in partnership with the NYC Parks Department, we outlined a data release workflow that benefits users and producers. Through this process we were able to make the 2015 TreesCount! data set one of the most accessible datasets on the city’s data portal. This process outlined three phases.

Phase 1: Research & Discovery

Establish target audiences
Draft data standards that appeal to broadest possible audience
Draft data dictionary

Phase 2: User Testing

Release sample dataset for feedback
Perform user testing and get feedback on data dictionary & dataset
Develop a framework and/or guide on how to explore the dataset’s important values
Allow time for revisions

Phase 3: Initial Deployment

Upload data set and data dictionary to Portal with an event or video explaining the key features of the data.
Allow time for user testing on Portal and gather open feedback.
Share insights and communicate them out to the public. Continue to gather public feedback.

This “user centered data release” helped the Parks Department navigate three things: (1) collectively, we were able to understand data users and ensure data was accessible to them; (2) we were able to identify data quality problems; (3) we demonstrated value in and out of the agency. It is great that this examination and verification report validates these ideas. Sadly, the report doesn’t speak to this research.

When agencies are going to release data, there must be some sort of public and private stakeholder convening. For this to happen, the open data team and agencies need resources for engagement.

For the past two years, we have worked with the Manhattan Borough President’s Office and alongside Manhattan Community Boards. The open data team and MODA have supported our efforts. We can attest to this report’s recommendations on better citywide compliance.

Each agency is unique.

Executive buy-in is crucial.

Open data is an opportunity to teach users how City operations work.

Open data drives data governance and analysis.

People who use open data must be well-networked and trust each other.

Organizational knowledge retention is a challenge.

It is great that this report outlines that open data drives governance and analysis, but there needs to be considerable support in moving data from awareness to action.

This open data examination and verification report is a testimony to slow culture change. Five years into the open data law, we still have agencies that don’t know how to review their datasets and aren’t prioritizing data sharing.

It is crazy to see the DEP say they will release data sets in over a year with no intermediary datasets. Additionally, it is crazy to read that the FDNY received the following public comment request from the Department of Health.

“Hello! I work for the NYC Health Department (DOHMH) and we are interested in using the EMS Dispatch Data that is provided on NYC Open Data. However, we would really like more detail to the data, if possible. Who should we contact to discuss this? Thank you!”

According to the accompanying FDNY spreadsheet, this request was denied. Did FDNY use this opportunity to engage with the DOHMH? What happened to such a promising opportunity for collaboration? Did MODA come in and help with data sharing?

If we ever expect these practices to change within the next ten years, the Open Data team and its partners need to be taken seriously. Seriousness comes with resources and staff. From my understanding, MODA and the open data team need to be staffed up to their full complement and get a budget for internal and external engagement. We need this administration to hire a Chief Analytics Officer and to ensure that DOITT hires a replacement for Ralf Carvalho.

We love working with the open data team. We will donate as much time and energy as we can. With support from the Alfred P. Sloan Foundation, Fund for the City of New York, and Data & Society Research Institute, BetaNYC has been able to scrape up funds to support mutually beneficial programming. We are always looking to incorporate MODA’s objectives into our work and are very lucky to have them as a trusted partner. Our funding isn’t permanent, yet we see a permanent need for our collective work.

To be perfectly frank, the Mayor’s office needs to dedicate financial resources to MODA and its partners. With all the things MODA is called on to do, it is amazing they were able to produce this report.

Now is the time for Council to work with MODA, DOITT, and their partners to properly resource the city’s open data program. There needs to be money for internal education, internal/external engagement, and concrete roadmaps that link technology improvements to better data systems. MODA and the open data program need a budget for programs that support internal change.

Thank you, Noel Hidalgo

Observations from the three reporting spreadsheets:

DEP

Number of existing DEP data sets – 48
Number of datasets that say they have a data dictionary – 8
Note 1 – As of 5 Dec, 22 listed datasets were not accessible. The reporting spreadsheet should indicate if these are to be archived. There is no listing of where this data will live.
Note 2 – Two datasets are to be released on 31 Dec 2018. If this report was published in the summer of 2017, why isn’t there a preliminary release now?

FDNY

Number of existing FDNY data sets – 18
Number of FDNY data sets that will be archived – 11
Number remaining – 7
Number of remaining data sets that have data dictionaries – 3
Time to get data dictionaries – unknown
Note 1 – FDNY indicates which datasets will be archived.
Note 2 – FDNY’s D. Reports is blank of any legally mandated reports. This is odd.

DOB

Number of existing DOB data sets on the data portal – 20
Number of datasets that say they have a data dictionary – 20
Note 1 – Number of times “We do not have the resources to publish this to the Open Data portal at this time.” appears – 5
Note 2 – This report uses terms that aren’t explained. For example, what is an ibot?
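
The counts above can be turned into a rough comparison of data dictionary coverage across the three agencies. The following is a minimal sketch, not part of the audit itself: the figures are copied from the observations above, while the `counts` structure and the `coverage` helper are illustrative names. For FDNY, it assumes the 7 datasets remaining after archiving are the right denominator.

```python
# Counts copied from the spreadsheet observations above.
# The structure and helper names are illustrative, not from the report.
counts = {
    # 48 existing DEP datasets; 8 say they have a data dictionary.
    "DEP": {"datasets": 48, "with_dictionary": 8},
    # 18 existing FDNY datasets minus 11 to be archived leaves 7; 3 have dictionaries.
    "FDNY": {"datasets": 18 - 11, "with_dictionary": 3},
    # 20 existing DOB datasets; all 20 say they have a data dictionary.
    "DOB": {"datasets": 20, "with_dictionary": 20},
}


def coverage(agency):
    """Return the share of an agency's datasets that report a data dictionary."""
    c = counts[agency]
    return c["with_dictionary"] / c["datasets"]


for agency in counts:
    print(f"{agency}: {coverage(agency):.0%} of datasets report a data dictionary")
```

A standardized summary table in each annual report, as recommended earlier, would let the public make this kind of comparison without opening the spreadsheets.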

Testimony – NYC Open Data Examination and Verification Report 2018.PDF