Data Services Classes and Customized Trainings

Classes Available on Request or Customized Training for your Group

To request a class, please contact HSLS Data Services.

If you’ve ever uploaded data or analysis code to a public repository, you may have been prompted to choose a license from dozens of options. But what’s the difference between the GNU and MIT licenses? What does Creative Commons actually do? And how do licenses interact with copyright and formal Data Use Agreements? This session will explain the basics of licensing and help participants choose an appropriate license for their openly accessible research products.

This workshop will teach how to (1) access a remote Unix machine (Center for Research Computing HTC Cluster), (2) use basic Unix commands for managing files and directories, (3) transfer files using FTP client software, and (4) access open-source bioinformatics software packages installed on the HTC Cluster.

What is a Data Management Plan? This session will answer that question, as well as describe the steps to creating a DMP, tools that can help with DMP development, and post-award management issues. University of Pittsburgh-specific guidelines and support resources will also be shared.

After this session you will be able to answer: What are some common challenges with formatting data in spreadsheets and how can we avoid them? How can we carry out basic quality assurance in spreadsheets? How can we export data from spreadsheets in a way that is useful for downstream applications?

Many funders, publishers, and institutions require researchers to make their research data public, but practical challenges can act as a barrier to sharing data, especially in the health sciences. This hands-on workshop will guide participants through the data sharing process, from initial study design to data deposit. Exercises will prompt participants to think through issues of data documentation, reuse value, and promotion of their own research projects.

This hands-on workshop will cover the creation of data visualizations using the ggplot2 package in R. Upon completion, participants will be able to create various graphical summaries of data, describe the “grammar” of ggplot2 functions, and build custom visualizations with ggplot2.
This is a flipped class and the third part of a three-part series: Introduction to R, Data Wrangling in R, and Data Visualization in R. Upon registration, you will receive links to workshop materials (PowerPoint slides, lecture videos, and practice exercises) that you can view on your own schedule. During the in-person hands-on session, you will learn how to solve the exercise problems.

This is a flipped class covering more advanced topics in R programming for data analysis and the second part of a three-part series: Introduction to R, Data Wrangling in R, and Data Visualization in R. Upon registration, you will receive links to workshop materials (PowerPoint slides, lecture videos, and practice exercises) that you can view on your own schedule. During the in-person hands-on session, you will learn how to solve the exercise problems.

One of the biggest strengths of the All of Us dataset is its survey data. This workshop will provide details about the survey data present in All of Us and show how to use that data within a project, including how to link it to concept sets and EHR datasets.

NIH expects full transparency in reporting experimental details so that others may reproduce and extend the findings. This session will discuss ways to report experimental details including: open dissemination of methodology protocols, pre-registration of study protocols, and publication of registered reports.

Microsoft Excel is a commonly used program to record and store datasets with headings, rows, and columns. In this class, we will explore data with sorting and filtering functions, and transform data into summary tables. You will work through data examples to create pivot tables, apply conditional formatting, and prepare your figures for use in other programs.

OpenRefine (formerly Google Refine) is a powerful, free, open-source tool for working with messy tabular data. It runs offline in a web browser and allows for reproducibility in data cleaning. This hands-on workshop will walk participants through how to create a new project; explore the data through sorting, filtering, and faceting functions; complete basic data cleaning, such as splitting or combining cells and clustering to find and fix inconsistent data entries; and create JSON scripts.

The FAIR Data Principles are a set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. In this session, we will review these principles, discuss the challenges of data sharing, and offer practical tips for how sharing can be integrated into a researcher's workflow.

“What's in a name?” When you create a new file, do you give much thought to the name you save it as? This class focuses on best practices for naming files so that they are easily found, understood, and sharable in the future.

Most of us know that the US Census Bureau conducts the Decennial Census, but did you know it is also responsible for conducting censuses of local governments and the US economy, as well as surveys of small areas, business owners, income, and populations? Much of this data is available down to the neighborhood level. It is an invaluable resource for learning about a community, especially as you plan the development of an intervention or grant proposal. Learn how to navigate the Census Bureau effectively to gather key data as a first step to learning about a community.

In this hands-on workshop, learn how to manage your work with the version control system Git. Git helps keep your files safe from accidental deletion, tracks who made what change when, and lets multiple people work on the same project without overwriting each other's work. We'll cover using Git from the Unix shell and through GitHub online. No previous experience with the command line is necessary, although some basic knowledge is recommended.

Are you curious about version control and project management with GitHub, but haven't had an opportunity to test out all of its features? Attend this hands-on workshop to work through solo and group exercises that will let you explore tasks like cloning a collaborator's project, making and investigating file changes, and resolving conflicts between versions. Please register for a free github.com account before the beginning of class.

The All of Us Research Program (AoURP), led by the National Institutes of Health, is a longitudinal cohort study aimed at advancing precision medicine and improving human health.

This is a flipped class; links to PowerPoint slides, lecture videos, and practice exercises that you can view on your schedule are available upon registration. During the session, you will learn how to solve the exercise problems.
This class will cover the basics of R programming for data analysis and graphics. This is a flipped class; links to PowerPoint slides, lecture videos, and practice exercises that you can view on your schedule are available upon registration. During the class, you will learn how to solve the exercise problems.
Learn the fundamentals of keeping your data secure and organized through brief introductions to the core areas of data management: file storage and organization, file documentation, data preservation, and data publication and/or data sharing.

Need to find a dataset to act as a control for your study? Or do you want to reuse open access data? This class will offer tips for locating and citing data and include hands-on exercises to explore directories of data repositories and data journals.

The US Centers for Disease Control and Prevention is the leader of public health in the US. It administers multiple surveys, gathers vital statistics, and tracks infectious diseases, and much more, all the while making this data readily available to the public. Like all Federal agencies, though, it is a complex organization which is reflected in their Web site. During this class, we will explore key data initiatives of the CDC, focusing on publicly available data sites that allow you to find data through their interfaces.

OSF (Open Science Framework) is a free platform for hosting and collaborating on datasets, documentation, and research results. In this workshop, we will tour OSF, discuss the principles of open science in our fields, and work through a sample project together. Participants will learn:
- How to discover and download data through OSF
- How to manage collaborators and permissions for a research project
- Best practices for organizing OSF projects, including using version control
- How to use OSF to make your research more FAIR (Findable, Accessible, Interoperable, and Reusable)
Note: OSF also has a sub-site, OSF Registries, for pre-registering studies and systematic reviews. This class will not discuss preregistration in detail, but the underlying website architecture is the same.
This is one of three sessions on NIH's new Policy for Data Management and Sharing (DMS Policy), which goes into effect January 25, 2023. This Policy will require NIH-funded researchers to prospectively submit a plan outlining how scientific data from their research will be managed and shared. The Policy includes an expectation that researchers will maximize their data sharing within ethical, legal, or technical constraints, and explicitly encourages researchers to incorporate data sharing via deposit into a public repository into their standard research process. (subject matter: secure and ethical data use) This session will cover the data deposit process and explore these topics:
- How to write a good README, user manual, or data dictionary
- Assigning metadata to your data files to make them discoverable on the web
- Creating documentation templates to streamline the deposit process
- Understanding licenses and use restrictions in repositories
NIH has a new policy going into effect on January 25, 2023 that will require NIH-funded researchers to prospectively submit a plan outlining how scientific data from their research will be managed and shared.  This session will cover the plan’s elements and allowable costs as well as tools to help with your own plan creation.
NIH's new Policy for Data Management and Sharing (DMS Policy), which goes into effect January 25, 2023, will require NIH-funded researchers to prospectively submit a plan outlining how scientific data from their research will be managed and shared. The policy includes an expectation that researchers will maximize their data sharing within ethical, legal, or technical constraints, and explicitly encourages researchers to incorporate data sharing via deposit into a public repository into their standard research process. This session will cover the "where" of data sharing through the following topics:
- What is a repository? Repositories vs. data storage vs. data backup
- Choosing the right repository for your discipline, data format, and legal/ethical considerations
- Understanding potential costs for data sharing (and how to include them in a grant application)
- Evaluating the robustness of a repository
This workshop will cover more advanced topics in R programming for data analysis and graphics. Upon completion, participants will be able to import and export data, merge datasets, transform data, and create basic summaries of data.

There are thousands of federal, state, and local government sites that link the public to their data. Like much of the internet, it is easy to get lost trying to find data useful to your research unless you know where to go. This class is designed to introduce participants to commonly used measures of social justice through publicly available data sites. We will begin by exploring data sites that focus on social justice issues, such as income, education, pollution, housing, and healthy/risky behaviors.

Need to find a dataset to act as a control for your study? Or do you want to reuse open access data? This workshop offers tips for locating and citing data, and includes hands-on exercises to explore directories of data repositories and data journals.

All of Us data includes data from electronic health records (EHR) for hundreds of thousands of people. This workshop will introduce the fundamentals of working with terminologies important to research using EHR data. The workshop will cover the construction of concept sets, important terminologies (ICD, SNOMED, LOINC, and RxNorm), and how to use them to subset patient data from the All of Us dataset. 

NIH's new Policy for Data Management and Sharing (DMS Policy), which goes into effect January 25, 2023, will require NIH-funded researchers to prospectively submit a plan outlining how scientific data from their research will be managed and shared. While broad sharing may not be feasible for all datasets, steps can still be taken to share information about the dataset, including a description and instructions for restricted access.
Jupyter Notebooks interweave code, data, and text into an executable "notebook" that can be published or shared as a self-contained object. They can support interactive data science and scientific computing across disciplines and programming languages (including Python, R, and Julia) and promote open science and transparency in research. Additionally, Jupyter Notebooks can be used to teach programming and computational literacy. This panel discussion aims to highlight Jupyter use in research and teaching across campus and foster conversations among Pitt community members who are interested in this topic. It will begin with a practical overview of available Jupyter resources from the Center for Research Computing. Each panelist will then discuss their specific use of Jupyter as outlined below. A question and answer period will follow the panel discussion. As part of Open Access Week (#OAweek), this event is hosted by the project team of the Pitt Seed grant Cultivating a Data Science Learning Community. All who are interested in data science topics are encouraged to attend! This session will be recorded and shared only with attendees.

Do you want to track and organize your projects more efficiently, especially in a remote or distributed environment? Are you writing code or manuscripts with others and need to know who did what, when? In this class, learn the basics of version control and how it helps keep your work safe and reliable. Then we'll dive into GitHub to see how it tracks the changes you or your collaborators make to uploaded files, and how that can help make your research more reproducible.

Discussion of health literacy and the fight against health misinformation often centers around fact-checking or debunking written materials. However, identifying misleading visualizations and imagery is also a vital skill for navigating the current health information landscape: the same data may be presented in many different ways to convey drastically different impressions. This interactive session will increase your confidence in analyzing visual information and empower you to pass that knowledge along to your communities. Deceptive imagery types covered will include graphs and charts, manipulated images in scientific publications, and AI-generated imagery.
Have you ever seen a README for a piece of software? It's a simple text document that tells you who made a program, what it does, and how to run it. Learn how to write a great README for your code, data, or even file organization system.

Did you know that for each minute of planning at the beginning of a project, you will save yourself roughly 10 minutes of headache later? This session will provide practical tips for organizing, naming, documenting, storing and preserving your data.

Scheduled Data Services Classes