Data Services Classes and Customized Trainings

Classes Available on Request or Customized Training for your Group

To request a class, please contact HSLS Data Services.

In this workshop, you will learn how to create a project using concept sets, datasets, and workbooks, and how to use code snippets to accomplish more advanced tasks. You'll explore the full potential of the All of Us Researcher Workbench and gain practical skills in building a project from start to finish.

In this workshop, we will introduce the notion of concept sets and their role in observational research with Electronic Health Record (EHR) data. Whether you're a new user of the All of Us Researcher Workbench or an experienced one, this workshop will provide valuable insights into leveraging the platform to meet your research needs.

 

If you’ve ever uploaded data or analysis code to a public repository, you may have been prompted to choose a license from dozens of options. But what’s the difference between the GNU and MIT licenses? What does Creative Commons actually do? And how do licenses interact with copyright and formal Data Use Agreements? This session will explain the basics of licensing and help participants choose an appropriate license for their openly accessible research products.

This workshop aims to empower researchers with the essential skills needed to use bioinformatics tools operated through the Command Line Interface. It will cover Unix/Linux shell navigation, FTP transfers, file and directory management, text editor functions, shell scripting, and data analysis with bioinformatics software.

This workshop will teach how to (1) access a remote Unix machine (the Center for Research Computing HTC Cluster), (2) use basic Unix commands for managing files and directories, (3) transfer files using FTP client software, and (4) access the open-source bioinformatics software packages installed on the HTC Cluster.

What is a Data Management Plan? This session will answer that question, as well as describe the steps to creating a DMP, tools that can help with DMP development, and post-award management issues. University of Pittsburgh-specific guidelines and support resources will also be shared.

After this session you will be able to answer: What are some common challenges with formatting data in spreadsheets and how can we avoid them? How can we carry out basic quality assurance in spreadsheets? How can we export data from spreadsheets in a way that is useful for downstream applications?
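
As a small illustration of the kind of basic quality assurance this session asks about, the sketch below checks a table that has already been exported from a spreadsheet as CSV. It is only a sketch under stated assumptions: the function name `check_table`, the sample columns, and the three checks (blank headers, duplicate headers, and rows with missing cells) are hypothetical choices for illustration, not the session's own materials.

```python
import csv
import io

def check_table(csv_text: str) -> list[str]:
    """Return human-readable problems found in CSV text exported from a spreadsheet."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    problems = []
    header = [h.strip() for h in rows[0]]
    # Every column needs a non-blank, unique name for downstream tools.
    if any(h == "" for h in header):
        problems.append("blank column header")
    if len(set(header)) != len(header):
        problems.append("duplicate column header")
    # Row numbers count the header as row 1, matching what a spreadsheet shows.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(f"row {row_no}: expected {len(header)} cells, found {len(row)}")
        elif any(cell.strip() == "" for cell in row):
            problems.append(f"row {row_no}: empty cell")
    return problems

# Hypothetical sample data: row 3 has an empty cell, row 4 is missing a column.
sample = "id,weight_kg,visit_date\n1,70.5,2024-01-15\n2,,2024-01-16\n3,68.0\n"
for problem in check_table(sample):
    print(problem)
```

Running a check like this before sharing a file catches structural problems early, when they are still easy to fix in the original spreadsheet.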

Many funders, publishers, and institutions require researchers to make their research data public, but practical challenges can act as a barrier to sharing data, especially in the health sciences. This hands-on workshop will guide participants through the data sharing process, from initial study design to data deposit. Exercises will prompt participants to think through issues of data documentation, reuse value, and promotion of their own research projects.

This hands-on workshop will cover the creation of data visualizations using the ggplot2 package in R. Upon completion, participants will be able to create various graphical summaries of data, describe the “grammar” of ggplot2 functions, and build custom visualizations with ggplot2.
This is a flipped class and the third part of a three-part series: Introduction to R, Data Wrangling in R, and Data Visualization in R. Upon registration, you will receive links to workshop materials (PowerPoint slides, lecture videos, and practice exercises) that you can view on your own schedule. During the in-person hands-on session, you will learn how to solve the exercise problems.

This flipped class covers more advanced topics in R programming for data analysis and is the second part of a three-part series: Introduction to R, Data Wrangling in R, and Data Visualization in R. Upon registration, you will receive links to workshop materials (PowerPoint slides, lecture videos, and practice exercises) that you can view on your own schedule. During the in-person hands-on session, you will learn how to solve the exercise problems.

NIH expects full transparency in reporting experimental details so that others may reproduce and extend the findings. This session will discuss ways to report experimental details including: open dissemination of methodology protocols, pre-registration of study protocols, and publication of registered reports.

Microsoft Excel is a commonly used program to record and store datasets with headings, rows, and columns. In this class, we will explore data with sorting and filtering functions, and transform data into summary tables. You will work through data examples to create pivot tables, apply conditional formatting, and prepare your figures for use in other programs.

OpenRefine is a powerful, free, open-source tool for working with messy tabular data. It runs offline in a web browser and allows for reproducibility in data cleaning. This is a hands-on in-person workshop.

The FAIR Data Principles are a set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. In this session, we will review these principles, discuss the challenges of data sharing, and offer practical tips for how sharing can be integrated into a researcher's workflow.

“What's in a name?” When you create a new file, do you give much thought to the name you save it as? This class focuses on best practices for naming files so that they are easily found, understood, and sharable in the future.
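
To make the idea concrete, here is a minimal sketch of one possible naming scheme, written in Python. The convention shown (ISO date first, lowercase words joined by hyphens, fields joined by underscores, and a zero-padded version number) is an assumption chosen for illustration, not necessarily the scheme taught in the class, and `make_filename` is a hypothetical helper.

```python
import re
from datetime import date

def slug(text: str) -> str:
    """Lowercase the text and collapse every run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def make_filename(project: str, description: str, when: date, version: int, ext: str) -> str:
    """Build a sortable, space-free name: date_project_description_vNN.ext."""
    extension = ext.lstrip(".").lower()
    return f"{when.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{extension}"

# Hypothetical example values.
print(make_filename("Heart Study", "Baseline Survey (final)", date(2024, 3, 5), 2, ".CSV"))
# prints 2024-03-05_heart-study_baseline-survey-final_v02.csv
```

Putting the date first in ISO format means an alphabetical sort is also a chronological sort, and avoiding spaces and special characters keeps names safe to use on the command line.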

Most of us know that the US Census Bureau conducts the Decennial Census, but did you know it is also responsible for conducting censuses of local governments and the US economy, as well as surveys of small areas, business owners, income, and populations? Much of this data is available down to the neighborhood level. It is an invaluable resource for learning about a community, especially as you plan the development of an intervention or grant proposal. Learn how to navigate the Census Bureau effectively to gather key data as a first step to learning about a community.

In this hands-on workshop, learn how to manage your work with the version control system Git. Git helps keep your files safe from accidental deletion, tracks who made what change when, and lets multiple people work on the same project without overwriting each other's work. We'll cover using Git from the Unix shell and through GitHub online. No previous experience with the command line is necessary, although some basic knowledge is recommended.

Are you curious about version control and project management with GitHub, but haven't had an opportunity to test out all of its features? Attend this hands-on workshop to work through solo and group exercises that will let you explore tasks like cloning a collaborator's project, making and investigating file changes, and resolving conflicts between versions. Please register for a free github.com account before the beginning of class.

Join us for a one-hour class on the All of Us Researcher Workbench - a cloud-based platform that provides researchers with access to data generated by the All of Us Research Program (AoURP). Led by the National Institutes of Health, AoURP is a longitudinal cohort study to advance precision medicine and improve human health by partnering with one million or more diverse participants across the United States.
This is a flipped class; links to PowerPoint slides, lecture videos, and practice exercises that you can view on your own schedule are available upon registration. During the session, you will learn how to solve the exercise problems.

This class will cover the basics of R programming for data analysis and graphics. This is a flipped class; links to PowerPoint slides, lecture videos, and practice exercises that you can view on your own schedule are available upon registration. During the class, you will learn how to solve the exercise problems.

Learn the fundamentals of keeping your data secure and organized through an introduction to the core areas of data management: planning for data management, storage and organization, file documentation, data preservation, and data sharing.

Need to find a dataset to act as a control for your study? Or do you want to reuse open access data? This class will offer tips for locating and citing data and include hands-on exercises to explore directories of data repositories and data journals.

The US Centers for Disease Control and Prevention is the leader of public health in the US. It administers multiple surveys, gathers vital statistics, tracks infectious diseases, and much more, all the while making this data readily available to the public. Like all federal agencies, though, it is a complex organization, which is reflected in its website. During this class, we will explore key data initiatives of the CDC, focusing on publicly available data sites that allow you to find data through their interfaces.

OSF (Open Science Framework) is a free platform for hosting and collaborating on datasets, documentation, and research results. In this workshop, we will tour OSF, discuss the principles of open science in our fields, and work through a sample project together. Participants will learn:
- How to discover and download data through OSF
- How to manage collaborators and permissions for a research project
- Best practices for organizing OSF projects, including using version control
- How to use OSF to make your research more FAIR (Findable, Accessible, Interoperable, and Reusable)
Note: OSF also has a sub-site, OSF Registries, for pre-registering studies and systematic reviews. This class will not discuss preregistration in detail, but the underlying website architecture is the same.

The NIH Policy for Data Management and Sharing went into effect on January 25, 2023 and requires NIH-funded researchers to prospectively submit a plan outlining how scientific data from their research will be managed and shared. The policy includes an expectation that researchers will maximize their data sharing within ethical, legal, or technical constraints, and explicitly encourages researchers to incorporate data sharing via deposit into a public repository into their standard research process. After taking this class, learners should be able to:
- Write a basic README file, user manual, or data dictionary to help people understand their data
- Create documentation templates to streamline the data deposit process
- Understand license and use restrictions in repositories
- Feel confident in taking the next steps towards submitting their Data Management and Sharing Plan, or complying with their already-approved Plan

The NIH Policy for Data Management and Sharing (DMS Policy) went into effect on January 25, 2023 and requires NIH-funded researchers to prospectively submit a plan outlining how scientific data from their research will be managed and shared. This session will cover the plan’s elements and allowable costs as well as tools to help with your own plan creation.

The NIH Policy for Data Management and Sharing went into effect on January 25, 2023 and requires NIH-funded researchers to prospectively submit a plan outlining how scientific data from their research will be managed and shared. The policy includes an expectation that researchers will maximize their data sharing within ethical, legal, or technical constraints, and explicitly encourages researchers to incorporate data sharing via deposit into a public repository into their standard research process. After taking this class, learners should be able to:
- Define a "repository," and articulate the differences between discipline-specific and generalist repositories
- Select at least one repository that could be a good match for their data
- Identify any potential barriers to openly sharing their data, such as protected health information (PHI) that cannot be deidentified and stay meaningful
- Feel confident in taking the next steps towards submitting their Data Management and Sharing Plan

This workshop will cover the more advanced topics in R programming for data analysis and graphics. Upon completion, participants will be able to import and export data, merge datasets, transform data, and create basic summaries of data.

Have you ever clicked “I agree” without reading an app’s terms and conditions? Given a browser extension permission to read and change data on all sites you visit? The aggregation of data collected by apps and websites can be used by advertisers, law enforcement, and propagators of misinformation to predict and shape human behavior. In this hour-long class, you will learn who is collecting your data and what they may be doing with it, and you will learn about options to increase your privacy and control over your own data.

There are thousands of federal, state, and local government sites that link the public to their data. Like much of the internet, it is easy to get lost trying to find data useful to your research unless you know where to go. This class is designed to introduce participants to commonly used measures of social justice through publicly available data sites. We will begin by exploring data sites that focus on social justice issues, such as income, education, pollution, housing, and healthy/risky behaviors.

Need to find a dataset to act as a control for your study? Or do you want to reuse open access data? This workshop offers tips for locating and citing data, and includes hands-on exercises to explore directories of data repositories and data journals.

NIH's Policy for Data Management and Sharing (DMS Policy), which went into effect on January 25, 2023, requires NIH-funded researchers to prospectively submit a plan outlining how scientific data from their research will be managed and shared. While broad sharing may not be feasible for all datasets, steps can still be taken to share information about the dataset, including a description and instructions for restricted access.

Do you want to track and organize your projects more efficiently, especially in a remote or distributed environment? Are you writing code or manuscripts with others and need to know who did what, when? In this class, learn the basics of version control and how it helps keep your work safe and reliable. Then we'll dive into GitHub to see how it tracks the changes you or your collaborators make to uploaded files, and how that can help make your research more reproducible.

Discussion of health literacy and the fight against health misinformation often centers around fact-checking or debunking written materials. However, identifying misleading visualizations and imagery is also a vital skill for navigating the current health information landscape: the same data may be presented in many different ways to convey drastically different impressions. This interactive session will increase your confidence in analyzing visual information and empower you to pass that knowledge along to your communities. Deceptive imagery types covered will include graphs and charts, manipulated images in scientific publications, and AI-generated imagery.

Data management and sharing plans (DMSPs) are short documents submitted with a grant proposal that outline how a research team will organize, store, preserve, and share their research data during and after data collection. If you are applying to the NIH for funding, you may have encountered DMSPs for the first time as a requirement of the new NIH Policy for Data Management and Sharing, which requires all researchers to submit a plan with their proposal as of January 2023. Come to this workshop to learn how DMP Tool, a free online resource, can help you write your DMSP through interactive templates and customized guidance.

Have you ever seen a README for a piece of software? It's a simple text document that tells you who made a program, what it does, and how to run it. Learn how to write a great README for your code, data, or even file organization system.
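
For a sense of what such a document can contain, the sketch below assembles a plain-text README from the three pieces named above: who made it, what it does, and how to run it. The function `build_readme`, the section wording, and the example project are all hypothetical and shown only for illustration. A real README can be written by hand in any text editor.

```python
def build_readme(title: str, authors: list[str], purpose: str, how_to_run: list[str]) -> str:
    """Return plain-text README contents with who, what, and how sections."""
    lines = [title, "=" * len(title), ""]
    # Who made it: one bullet per author or contact.
    lines += ["Who made this:", *[f"  - {author}" for author in authors], ""]
    # What it does: a one- or two-sentence summary.
    lines += ["What it does:", f"  {purpose}", ""]
    # How to run it: numbered steps, in order.
    lines += ["How to run it:", *[f"  {i}. {step}" for i, step in enumerate(how_to_run, start=1)]]
    return "\n".join(lines) + "\n"

# Hypothetical example project.
readme_text = build_readme(
    title="Survey Cleaner",
    authors=["A. Researcher (a.researcher@example.org)"],
    purpose="Removes duplicate rows from exported survey files.",
    how_to_run=["Install Python 3.9 or later", "Run: python clean.py survey.csv"],
)
print(readme_text)
```

The same three sections work equally well for a dataset or a folder structure: swap the run instructions for a description of the files and how they relate to each other.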

Did you know that for each minute of planning at the beginning of a project, you will save yourself roughly 10 minutes of headache later? This session will provide practical tips for organizing, naming, documenting, storing and preserving your data.