BetaNYC’s Testimony to New York City Council on Open Data and Int. No. 1137 of 2018

To: NY City Council – Committee on Technology

From: Lindsay Poirier, Ph.D., Lab Manager

Re: 2018 Open Data Hearing and Testimony in support of Int. No. 1137-2018

Thursday, 18 October 2018

Chair Koo and members of the New York City Council Technology Committee,

My name is Lindsay Poirier, and I am the Lab Manager at BetaNYC, a civic technology organization with 4,000+ members that has spent several years conducting research, developing curriculum, and producing tools to support the equity and accessibility of the City’s Open Data resources. We work in partnership with Manhattan Borough President Gale A. Brewer, the CUNY Service Corps, and FCNY to improve digital and data literacy and train a new generation of civic technologists.[1] Over the past several years, we have collaborated closely with MODA and with DOITT’s Open Data Team, and we are grateful for the opportunity to work with Adrienne Schmoeker, MODA’s Director of Civic Engagement and Strategy, to ensure that our shared goal of “open data for all” is achieved. We are in preliminary talks with MODA about supporting each other’s efforts around NYC School of Data and Open Data Week 2019, which celebrates the 6th anniversary of the Open Data Law. We look forward to working with the City’s new Chief Analytics Officer to continue pursuing this aim.

For the past three years, BetaNYC has been conducting research into the information infrastructure supporting New York City community boards. More specifically, we have sought to understand how community boards are currently leveraging data resources as evidence to support their resolutions, and for what use cases they would like to have better access to open data resources. Through this research, we have been able to design tools and curriculum[2] that configure the City’s open data resources into dashboards, maps, and visualizations that are much more accessible to the public than raw data sources. In October, we published two reports that outline our research into the information infrastructure supporting community boards, summarize our findings, and offer recommendations to community boards, civic technologists, city agency representatives, and elected officials.[3] We have included an executive summary of these reports with our written testimony, and both can be downloaded in full from our website. Our testimony today is largely informed by this research.

Int. No. 1137

First and foremost, we fully support Int. No. 1137. Although we would prefer that MODA be an independent agency, we are excited to see its powers written into the City Charter. In particular, we are excited to see MODA become the steward of an open-source analytics library that can increase visibility into how agencies develop and use algorithms. If properly implemented, this could help advance other initiatives we support, such as open algorithms.

Feedback on the 2018 Open Data for All Report

The report reflects the hard work and dedication of the City’s Open Data Team and demonstrates that the Team is working to make open data more useful and accessible to the public. Most notably:

  1. The Team has published 629 new datasets, bringing the total number of datasets on the Portal to 2,154. We believe the Team should be given the resources it needs to manage these datasets as their number continues to grow.
  2. The Team engaged 1,800+ New Yorkers at events during Open Data Week 2018 and hosted 3 sold-out events in 2018. This demonstrates its efforts to engage the public in topics related to open data and to advance data literacy for all.
  3. The Team is working to identify, research, and highlight real-world use cases for open data and to design projects around these use cases. This demonstrates its commitment to user-centered design.

BetaNYC believes that the implementation of the Open Data Law could be strengthened in the following areas:

  1. While 80% of datasets eligible for the geospatial standard have been geocoded, some critical datasets are not in compliance with Local Law 108 of 2015. BetaNYC understands that the City’s Open Data Team is working under incredible constraints. The Team is currently managing over 2,000 datasets, each requiring regular quality assurance and documentation and most requiring georeferencing. Meanwhile, both MODA and DOITT have been operating without key leadership figures for periods ranging from several months to years. For many on this Team, managing the City’s data assets is just one component of their job description. Budgetary resources should be allocated so that the Open Data Team can prioritize quality assurance and bring existing data assets into compliance with more recent amendments to the Open Data Law.
  2. While 89% of datasets have data dictionaries, many are only sparsely documented. This makes it difficult for the public to interpret what different categories mean, and it also opens up the possibility that the public will interpret the data incorrectly and draw inappropriate conclusions. BetaNYC supports the Open Data Team’s Metadata for All initiative, which advocates adding thick narrative descriptions of the contents of each dataset published on the Open Data Portal to its documentation. We believe this effort will require considerable time and resources. It will include meeting with the data producers for each dataset at each agency to document key terms and concepts, and translating this subject-matter expertise into terms the public can understand. The initiative should be funded adequately.
  3. Community boards have described wanting access to certain information that is not currently on the Open Data Portal. In some cases no agency collects the data (e.g., vacant storefronts). In others, the data is not in an accessible format (e.g., rent-stabilized units), or it has not yet been published on the Portal. BetaNYC has submitted requests to the Open Data Team for a few of these datasets. In one case, we learned that the data would not be published for a year and a half; in another, we learned that the dataset had not yet been scheduled for release. We hope to start productive conversations about how to ensure that data that already exists, and that the community has identified as a priority, can be published in a timely manner.
  4. While agencies have committed to 230+ forms of civic engagement around Open Data, we hope to see resources allocated to allow for more meaningful forms of engagement. Currently, 5 of 70 agencies have committed to hosting focus groups with users of the data, 4 of 70 agencies have committed to producing tools and sharing them to the projects library, and 1 of 70 agencies has committed to producing curriculum on their data resources. User engagement is essential to ensure that the data is structured to meet diverse needs and that jargon is properly explained in data documentation. However, we also recognize that Open Data Coordinators are strapped for time and resources. To make broader civic engagement possible, we believe that every agency should have an Open Data Team, with diverse technical/subject-matter expertise and representing diverse offices within the agency, that can collaborate to support data quality assurance, documentation, public engagement, and tool-building. Funding should be allocated to support this.
  5. There should be more opportunities for collaboration between Open Data Coordinators at different agencies. Often, the most important data insights do not emerge from analyzing and visualizing one dataset produced by one agency. They emerge from integrating data from multiple datasets. However, because the City’s data resources are often produced in silos, it can be extremely difficult to combine multiple datasets into a single view. Each city agency has its own way of identifying businesses, restaurants, buildings, and lots, and its datasets typically reference these features using only the agency’s own identifiers. For example, BetaNYC has tried to design maps of potentially vacant storefronts throughout the City. These maps integrate several datasets (from DCP, DOHMH, DCA, and the State Division of Licensing Services) that report the locations of commercial units and active business licenses. However, each licensing dataset references businesses with a different set of identifiers, which has made this close to impossible. Coordinating efforts across agencies could highlight opportunities to link information across datasets.
  6. Local Law 251 of 2017 required not only that DOITT review the technical standards manual every two years, but also that it establish a method through which the public can comment on the manual. There are many areas where technical standards can be improved. Agencies often geocode addresses differently, use different terms or naming conventions to refer to the same concept, or use different stylistic conventions for filling in standard data values. For example, in the 311 Service Request dataset the ‘Community Board’ column is formatted as “01 MANHATTAN”, whereas in DOB’s building permit datasets the community board column is formatted as “101”. While agencies understand these nuances, they can be very confusing for users. Users may draw their own conclusions about why terms are classified differently or values are entered differently across datasets. By promoting interagency coordination around data quality and release efforts, DOITT could more readily identify mismatched schemas and stylistic conventions across datasets. It could then use this feedback to strengthen the technical standards manual so that data can be linked across datasets, while also helping the public develop a civic vocabulary. We would like to work with DOITT to host events and solicit broad public feedback on the technical standards manual.
  7. Future releases of the Open Data for All report should include a headcount of MODA, positions filled, positions available, and the annual budget.
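To illustrate the reconciliation work that item 6 currently leaves to users, here is a minimal sketch in Python. It converts a 311-style community board label such as “01 MANHATTAN” into the three-digit form used in the DOB datasets, such as “101”. It assumes the three-digit form is the standard NYC borough code (1 Manhattan, 2 Bronx, 3 Brooklyn, 4 Queens, 5 Staten Island) followed by a two-digit district number. The function name and mapping are hypothetical and illustrative only; they are not part of any City tool or API.

```python
# Hypothetical mapping, assuming the standard NYC borough codes.
BOROUGH_CODES = {
    "MANHATTAN": 1,
    "BRONX": 2,
    "BROOKLYN": 3,
    "QUEENS": 4,
    "STATEN ISLAND": 5,
}


def normalize_community_board(value: str) -> str:
    """Hypothetical helper: convert a 311-style label such as '01 MANHATTAN'
    into a three-digit borough-plus-district code such as '101'.

    Values already in three-digit form are returned unchanged. Anything
    else (e.g. an 'Unspecified' entry) raises ValueError rather than
    guessing.
    """
    value = value.strip().upper()
    if len(value) == 3 and value.isdigit():
        return value  # already in the DOB-style format
    # Split on the first space only, so 'STATEN ISLAND' stays intact.
    district, _, borough = value.partition(" ")
    if not district.isdigit() or borough not in BOROUGH_CODES:
        raise ValueError(f"Unrecognized community board value: {value!r}")
    # Borough digit followed by a zero-padded two-digit district number.
    return f"{BOROUGH_CODES[borough]}{int(district):02d}"


if __name__ == "__main__":
    for raw in ["01 MANHATTAN", "101", "12 STATEN ISLAND"]:
        print(raw, "->", normalize_community_board(raw))
```

Each user who combines these datasets has to write and maintain some version of this logic. A shared convention in the technical standards manual would remove that step.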

Thank you for your time,

Lindsay Poirier, Ph.D., Lab Manager, BetaNYC

[1] In September 2018, we welcomed our fifth class of Civic Innovation Fellows. In 2018, we are training 12 undergraduate fellows, representing 6 different CUNY schools, in open data and civic technology. In total, we have trained 52 fellows. Our alumni now hold positions at community district offices, major technology consulting firms, the U.S. State Department, and academic institutions.

[2] We have developed a 311 dashboard called BoardStat, a tool to support liquor-license application processing called SLAM, a tool to monitor rent-stabilization and tenant harassment called Tenants Map, and a tool to monitor after-hours construction permits called AHV Dashboard.

[3] Goldman, Emily, Noel Hidalgo, and Lindsay Poirier. 2018. “BetaNYC’s Civic Innovation Fellows Community Board Technology Needs Report 2018.” BetaNYC. https://beta.nyc/publications/betanycs-civic-innovation-fellows-community-board-technology-needs-report-2018/ (available for download at: https://bit.ly/2RUcc5t); Poirier, Lindsay, Noel Hidalgo, and Emily Goldman. 2018. “Data Design Challenges and Opportunities for NYC Community Boards.” BetaNYC. https://beta.nyc/publications/data-design-challenges-and-opportunities-for-nyc-community-boards/ (available for download at: https://bit.ly/2RPMsHs).