Through a Fellowship Associate’s Eyes: What PyData NYC Brings to Data Science in 2023

How it all began…

Last month I was awarded an incredible opportunity to attend my first conference, PyData NYC 2023, as a Diversity Scholar. PyData NYC 2023 was organized under the nonprofit charity NumFOCUS, whose educational program PyData serves as a hub for a global community of data scientists, engineers, developers, and data analysis tool enthusiasts.

I was awarded my scholarship through the PyData NYC Diversity Scholarship program, which is dedicated to enhancing diversity and fostering awareness of opportunities in scientific computing, software engineering, and data analytics. This initiative strives to include underrepresented groups who might not otherwise have the means to attend such events. 

 NYC streets beyond the glass windows of the Microsoft Conference Center.

The Nuts and Bolts 

PyData NYC 2023 was hosted at the Microsoft Conference Center overlooking the bustling streets of Times Square. The depth of knowledge among the speakers was evident in their presentations and discussions, which applied data analysis tools to a wide variety of fields, including neuroscience, business, drug discovery, environmental science, and even the secondhand fashion market. This broadened my awareness of data science as a field that cuts across disciplines. I also learned about tools and technical jargon that I had not known coming into the conference.

The productive energy of the conference matched the buzzing working climate of New York City. Although we were given the conference schedule well in advance, it was difficult for me to choose which sessions to attend as multiple exciting ones were occurring at the same time. There were tutorials on tools such as Quarto, Matplotlib and Ibis and presentations of current research such as IBM’s use of generative AI to understand carbon sequestration. Everything seemed to spark my interest as I was unfamiliar with most topics as a beginner-level data user.

Keynote Speakers that Stuck with Me 

Martha Norrick, New York City’s Chief Analytics Officer, opened with a warm welcome to data users of all backgrounds and levels. She spoke vibrantly about the importance and rise of civic technology in New York City, which will be showcased and celebrated during Open Data Week this coming March, and encouraged folks to attend.

Soumith Chintala, the co-creator of PyTorch and an engineer at Meta with a focus on Artificial Intelligence (AI), took over in a more comedic style. Laughter roared across the room after his first remark. He used AI-generated memes to captivate us while delivering a hefty presentation on whether AI is useful in data science. He concluded with an answer most could agree on: AI can accelerate your workflow, but it cannot automate you. Soumith’s future plans include using AI to build a robot that can do household chores at the level of a five-year-old.

NVIDIA scientist Michelle Gill inspired both women in STEM and participants from non-traditional backgrounds with her transition from structural biologist to machine learning scientist. She leads a team building BioNeMo, a framework for the development and use of deep learning models for drug discovery. J.J. Allaire, founder and CEO of Posit (formerly RStudio), ended the conference with an elaborate keynote on Quarto, his new scientific publishing project that works with Jupyter, and on Posit’s recent and future work in the open-source PyData ecosystem.

Workshops Galore

Soumith Chintala’s presentation including a meme of a cat to portray eloquence bias in Large Language Models (LLMs). 

I learned more and more about the applications of data science with each session. IBM introduced the concept of automating satellite data processing to quantify carbon sequestration, even when the available data is sparse. As an environmental science student concentrating in hydrology and climate, I was inspired by IBM’s work combining remote sensing and AI to address environmental problems. I also had the pleasure of attending Mars Lee’s presentation, which showcased a comic book she created to introduce the documentation of NumPy, a library for the Python programming language. Her success opened my eyes to using art and design in collaboration with data science. Other workshops included tutorials on how to use specific Python libraries such as Dask and Matplotlib.

 Mars Lee’s presentation on her process of creating a comic on NumPy

Who else was there? 

When I wasn’t attending workshops, I networked with folks at the booths. I learned about annotation tools such as Prodigy that can help label text and images. As a BetaNYC Fellowship Associate, I help lead the Mapping for Equity on OpenStreetMap (OSM) project. Tools such as Prodigy can be used to annotate features in images captured of public spaces. This is similar to the work of Rapid, Meta’s AI tool used in cooperation with Mapillary, which helps us import features onto the OSM mapping platform.

I also spoke with other attendees, many of whom flew in from other parts of the world, and with the incredible committee that organized PyData NYC 2023. I felt enlightened learning how they interact with data through their work or education, and I loved learning what brought them to this conference. The organizing committee, made up of volunteers, had many good experiences to share from their time as previous Scholars and attendees, which prompted them to keep coming back as volunteers or speakers themselves. They were eager to advocate for maintaining connections with the PyData community.

The End 

Data science stickers handed out at PyData NYC 2023

In retrospect, the conference has left an indelible mark on my professional journey. I took away not only a ton of data science stickers, but also a wealth of knowledge, resources, and an expansive network. It inspired me to continue researching innovations in data science and its use in civic technology, environmental science, and art. I would like to explore the data analysis tools presented at the conference and learn to use them to improve my workflow and learn more about the world we build. I extend heartfelt gratitude for the chance to be part of this transformative experience.