This year's Love Data Week theme, “Where's the Data?”, invites researchers to explore the complete data lifecycle, from initial collection through long-term storage and preservation. Yale researchers can share and archive their research data through Yale Dataverse, a platform available to all Yale University faculty, staff, students, and affiliates. Hosted and managed by Yale Library, Yale Dataverse ensures that research produced at Yale remains accessible and discoverable to scholars worldwide.
10:00am - 12:00pm
In person - SHM L 111, Cushing/Whitney Medical Library, 333 Cedar Street
RSVP here.
In this session, participants will learn the benefits and basics of version control systems, including the key differences between git and GitHub. The workshop will cover remote repositories and their practical applications, then guide attendees through using git in the terminal to initialize repositories, add files, and commit changes. By the end of the session, participants will know how to publish their repositories to GitHub.com.
Come learn about the web scraping tool from The Bright Initiative, which provides a centralized source of scraped public web data and the ability to run custom scrapes, now including public queries on ChatGPT.
Ready to find your perfect dataset match? In this fast-paced session, you will be introduced to key biomedical and health sciences data resources where you can discover datasets for your research. We'll focus on getting you familiar with major repositories and Yale-affiliate resources, with tips for assessing whether a dataset might be right for your project. This is one of two Love Data Week classes covering research data resources. Each class focuses on different resources, so attending both will give you a broader view of the research data landscape.
WEDNESDAY, February 11th
Tidy Data with OpenRefine
10:00 AM – 12:00 PM
In person - Bass L06-A, Bass L06-B
RSVP here.
Do you want to learn how to organize and prepare data for analysis and visualization but lack coding skills? Join us for a comprehensive, hands-on workshop on OpenRefine. No prior coding experience is needed, but please download and install OpenRefine on your personal computer before attending. The software is available at openrefine.org/download.
THURSDAY, February 12th
Getting Started with Data Analysis & Visualization: Introduction to Python (Python 1)
10:00am - 12:00pm
In person - SHM L 111, Cushing/Whitney Medical Library, 333 Cedar Street
RSVP here.
Are you ready to take the first steps in using code to work with data? In this training, you’ll learn how to use Python, one of the most popular programming languages, to analyze and visualize data. This training welcomes first-time coders and Python beginners, so we’ll begin with Python programming fundamentals before moving on to the basics of data analysis and visualization. The training is hands-on, so come ready to code alongside the instructor. You will only need a Google (Gmail) account, as we will work in Google Colab (https://colab.research.google.com/).
Yale Dataverse Demo: How to Use Yale's Data Repository
3:00 PM – 4:00 PM
In person - Seminar Room S57 (Lower Level), Science Hill
RSVP here.
Do you need a repository for your research data and documentation in the final stages of your research project? Learn the basics of using Yale Dataverse, the university's open-source repository for archiving, sharing, and accessing research data. Yale Dataverse is open specifically to Yale-affiliated researchers and supports data sharing and access in response to funder requirements, federal mandates, publishing guidelines, and/or a scholarly commitment to open science and replication/reproducibility.
Are you conducting research while at Yale? Where does all your research data need to go, and how do you decide? This is the second of three workshops in the Research Data@YSM series, introducing how to set up your research data to maximize your successes and minimize your risks. Aligning your research data with emerging best practices can help you make effective use of research data resources both within and beyond Yale School of Medicine, turning project goals into reproducible results.
Data at Yale: Highlights
IPUMS
DISSC now administers access to the Restricted Full Count Data from IPUMS USA, which contains identified household- and individual-level data from 1790 to 1950. Researchers must submit a proposal directly to IPUMS before access can be granted. Contact dissc@yale.edu with any specific questions.
Numerator
The complete 2025 Numerator consumer purchase dataset has been released. Starting with this release, all deliveries are in Parquet format for improved performance and include six years of transaction history. New transformed versions of the data are also available to streamline analysis. Our documentation site has been updated with full details on these changes and links to additional Numerator resources.
External Data: Highlights
Roper Public Opinion Data
The Roper Center is a recognized leader in the collection, preservation, and dissemination of public opinion data. The Center holds a unique and extensive collection of public opinion data from the U.S. and around the world, including over 820,000 questions and 25,000 datasets dating back to the 1930s. Among the Center's assets are datasets and questions from the Time-Sharing Experiments for the Social Sciences (TESS). Other notable collections include pre-election polls, exit polls, and U.S. state-level polls, as well as data on social issues, finances and the economy, education, health, international affairs, social movements and change, and historical events. Data come from major survey organizations, including commercial and media pollsters as well as academic, nonprofit (e.g., Pew), and private-industry pollsters.
Data.gov
The Data.gov team recently announced a soft launch of an entirely revamped catalog interface at catalog-beta.data.gov, powered by a completely new metadata harvester application that collects, validates, and displays information from hundreds of federal agency sources.