Data Services Classes and Customized Trainings

Classes Available on Request or Customized Training for your Group

To request a class, please contact HSLS Data Services.

If you’ve ever uploaded data or analysis code to a public repository, you may have been prompted to choose a license from dozens of options. But what’s the difference between the GNU and MIT licenses? What does Creative Commons actually do? And how do licenses interact with copyright and formal Data Use Agreements? This session will explain the basics of licensing and help participants choose an appropriate license for their openly accessible research products.

During the class, participants will learn how to: (1) access Center for Research Computing (CRC) clusters via a Unix/Linux shell, (2) transfer files to CRC using FileZilla software, (3) create and manage files and directories, (4) edit files using the vi text editor, and (5) access CRC-installed open-source bioinformatics software such as FastQC, STAR, and HISAT.

This will be a flipped class – a prerecorded video and a hands-on workshop – which will cover the basics of command-line skills for data analysis. During this class, participants will learn how to:

  • access Center for Research Computing (CRC) clusters via a Unix/Linux shell
  • transfer files to CRC using FileZilla software
  • create and manage files and directories
  • edit files using the vi text editor
  • access CRC-installed open-source bioinformatics software such as FastQC, STAR, and HISAT

Have questions about getting started with the command line? Register to receive access to a pre-recorded video covering the basics of command-line skills for data analysis and a practice dataset. Topics covered include: accessing Pitt Center for Research Computing clusters via a Unix/Linux shell, transferring files using FileZilla, creating and managing files and directories, editing files using the vi text editor, and accessing open-source bioinformatics software. Attend the scheduled session to get answers to your questions and hear from others.

What is a Data Management Plan? This session will answer that question, as well as describe the steps to creating a DMP, tools that can help with DMP development, and post-award management issues. University of Pittsburgh-specific guidelines and support resources will also be shared.

This flipped class covers more advanced topics in R programming for data analysis. It is the second part of a three-part series: Introduction to R, Data Management in R, and Data Visualization in R. Upon registration, you will receive links to workshop materials (PowerPoint slides, lecture videos, and practice exercises) that you can view on your schedule. During the class, you will learn how to solve the exercise problems.

Have questions about Data Management Plans? Stop by anytime during our 3-hour Zoom office hours to discuss best practices for data management and data sharing plans, or to explore how to use the DMPTool. It's not too early to start preparing for the new 2023 NIH policy for data management and sharing!

After this session you will be able to answer: What are some common challenges with formatting data in spreadsheets, and how can we avoid them? How can we carry out basic quality assurance in spreadsheets? How can we export data from spreadsheets in a way that is useful for downstream applications?

Many funders, publishers, and institutions require researchers to make their research data public, but practical challenges can act as a barrier to sharing data, especially in the health sciences. This hands-on workshop will guide participants through the data sharing process, from initial study design to data deposit. Exercises will prompt participants to think through issues of data documentation, reuse value, and promotion of their own research projects.

This hands-on workshop will cover creating data visualizations using the ggplot2 package in R. Upon completion, participants will be able to: create various graphical summaries of data, describe the “grammar” of ggplot2 functions, and build custom visualizations with ggplot2.

This workshop will focus on LabArchives, the Electronic Research Notebook selected by the University of Pittsburgh. We will cover how to get started using it, including planning strategies, access, lab notebook creation and organization, adding and editing entries, linking, and sharing data.

Microsoft Excel is a commonly used program to record and store datasets with headings, rows, and columns. In this class, we will explore data with sorting and filtering functions, and transform data into summary tables. You will work through data examples to create pivot tables, apply conditional formatting, and prepare your figures for use in other programs.

OpenRefine (formerly Google Refine) is a powerful, free, open-source tool for working with messy tabular data. It runs offline in a web browser and allows for reproducibility in data cleaning. This hands-on workshop will walk participants through how to create a new project; explore the data through sorting, filtering, and faceting functions; complete basic data cleaning, such as splitting or combining cells and clustering to find and fix inconsistent data entries; and create JSON scripts.
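To give a feel for what clustering does with inconsistent entries, here is a minimal Python sketch of a key-collision ("fingerprint") approach. It is a simplified approximation written for illustration, not OpenRefine's own code, and the function names and sample values are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Return a normalized key so that near-duplicate entries collide."""
    # Strip accents, surrounding whitespace, and case differences.
    text = unicodedata.normalize("NFKD", value)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.strip().lower()
    # Drop ASCII punctuation, then sort the unique tokens so word order is ignored.
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values that share a fingerprint; keep only groups with several variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


# Hypothetical messy column of institution names.
entries = ["Univ. of Pittsburgh", "univ of pittsburgh", "Pittsburgh, Univ of", "Carnegie Mellon"]
clusters = cluster(entries)
print(clusters)
```

Each cluster lists spellings that probably refer to the same thing, so a reviewer can decide which single value to keep.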

The FAIR Data Principles are a set of guiding principles to make data Findable, Accessible, Interoperable, and Reusable. In this session, we will review these principles, discuss the challenges of data sharing, and offer practical tips for how sharing can be integrated into a researcher's workflow.

“What's in a name?” When you create a new file, do you give much thought to the name you save it as? This class focuses on best practices for naming files so that they are easily found, understood, and sharable in the future.
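As one illustration of a naming convention, the Python sketch below builds file names that sort by date and avoid spaces and special characters. The specific pattern (ISO date, project, description, zero-padded version) and the helper name make_filename are assumptions chosen for this example, not rules prescribed by the class.

```python
import re
from datetime import date


def make_filename(project: str, description: str, when: date, version: int, ext: str) -> str:
    """Build a name like 2023-01-09_sleep-study_raw-survey-data_v02.csv."""

    def slug(text: str) -> str:
        # Lowercase and replace runs of anything other than a-z or 0-9 with one hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

    # An ISO 8601 date first means an alphabetical sort is also a chronological sort.
    return f"{when.isoformat()}_{slug(project)}_{slug(description)}_v{version:02d}.{ext.lstrip('.')}"


# Hypothetical project and file description.
name = make_filename("Sleep Study", "Raw survey data", date(2023, 1, 9), 2, "csv")
print(name)
```

Underscores separate the fields and hyphens separate words within a field, so the name stays readable and is easy to split apart later.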

In this hands-on workshop, learn how to manage your work with the version control system Git. Git helps keep your files safe from accidental deletion, tracks who made what change when, and lets multiple people work on the same project without overwriting each other's work. We'll cover using Git from the Unix shell and through GitHub online. No previous experience with the command line is necessary, although some basic knowledge is recommended.

Are you curious about version control and project management with GitHub, but haven't had an opportunity to test out all of its features? Attend this hands-on workshop to work through solo and group exercises that will let you explore tasks like cloning a collaborator's project, making and investigating file changes, and resolving conflicts between versions. Please register for a free github.com account before the beginning of class.

The Pitt Data Catalog (PDC) is a platform to help University of Pittsburgh health sciences researchers share and discover datasets, software, and code. This workshop will highlight how the catalog can be used to increase visibility of your research outputs and act as a low-barrier way to make data discoverable without the need to deposit into a data repository.

This is the first part of a three-part series: Introduction to R, Data Management in R, and Data Visualization in R. This is a flipped class covering the basics of R programming for data analysis and graphics. Upon registration, you will receive links to workshop materials (PowerPoint slides, lecture videos, and practice exercises) that you can view on your schedule. During the class, you will learn how to solve the exercise problems.

In this class, learn the fundamentals of keeping your data secure and organized through brief introductions to the core areas of data management: file storage and organization, file documentation, data preservation, and data publication and/or data sharing. This class is intended for graduate students and researchers who are working on long-term research projects, or for anyone who wants to make sure their personal files are safe for the long term.

You've collected your data. Now what? In this class we will learn how to use Tableau to demonstrate the significance of your data.

Need to find a dataset to act as a control for your study? Or do you want to reuse open access data? This class will offer tips for locating and citing data and include hands-on exercises to explore directories of data repositories and data journals.

This workshop will provide an introduction to mapping and analyzing geographic data using Tableau.

The US Centers for Disease Control and Prevention (CDC) is the nation's leading public health agency. It administers multiple surveys, gathers vital statistics, tracks infectious diseases, and much more, all the while making this data readily available to the public. Like all federal agencies, though, it is a complex organization, and that complexity is reflected in its website. During this class, we will explore key data initiatives of the CDC, focusing on publicly available data sites that allow you to find data through their interfaces.

OSF (Open Science Framework) is a free platform for hosting and collaborating on datasets, documentation, and research results. In this workshop, we will tour OSF, discuss the principles of open science in our fields, and work through a sample project together. Participants will learn:

  • how to discover and download data through OSF
  • how to manage collaborators and permissions for a research project
  • best practices for organizing OSF projects, including using version control
  • how to use OSF to make your research more FAIR (Findable, Accessible, Interoperable, and Reusable)

Note: OSF also has a sub-site, OSF Registries, for pre-registering studies and systematic reviews. This class will not discuss preregistration in detail, but the underlying website architecture is the same.

Do you have data that require bioinformatics analysis? Are you concerned about scientific rigor and reproducibility? Come learn about the “4 C’s” available to Pitt researchers: Core facilities, Collaboration with bioinformaticians, Coding, and Commercially licensed tools. Make an informed decision on the best option(s) for your data needs.

This workshop will cover the more advanced topics in R programming for data analysis and graphics. Upon completion participants will be able to:

There are thousands of federal, state, and local government sites that link the public to their data. As with much of the internet, it is easy to get lost trying to find data useful to your research unless you know where to go. This class is designed to introduce participants to commonly used measures of social justice through publicly available data sites. We will begin by exploring data sites that focus on social justice issues, such as income, education, pollution, housing, and healthy/risky behaviors.

Need to find a dataset to act as a control for your study? Or do you want to reuse open access data? This workshop offers tips for locating and citing data, and includes hands-on exercises to explore directories of data repositories and data journals.

Most of us know that the US Census Bureau conducts the Decennial Census, but did you know it is also responsible for conducting censuses of local governments and the US economy, as well as surveys of small areas, business owners, income, and populations? Much of this data is even available down to the neighborhood level. It is an invaluable resource for learning about a community, especially as you plan the development of an intervention or grant proposal. Learn how to navigate the Census Bureau effectively to gather key data as a first step to learning about a community.

Do you want to track and organize your projects more efficiently, especially in a remote or distributed environment? Are you writing code or manuscripts with others and need to know who did what, when? In this class, learn the basics of version control and how it helps keep your work safe and reliable. Then we'll dive into GitHub to see how it tracks the changes you or your collaborators make to uploaded files, and how that can help make your research more reproducible.

Have you ever seen a README for a piece of software? It's a simple text document that tells you who made a program, what it does, and how to run it. Learn how to write a great README for your code, data, or even file organization system.
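As a rough illustration of the three things a README covers (who made it, what it does, and how to run it), the Python sketch below assembles a plain-text README from those pieces. The section headings, the build_readme helper, and the sample project are hypothetical choices for this example and may differ from the template used in class.

```python
def build_readme(title: str, authors: list, description: str, usage: str) -> str:
    """Assemble a minimal plain-text README from its basic parts."""
    lines = [
        title,
        "=" * len(title),  # underline the title to match its length
        "",
        "Authors: " + ", ".join(authors),
        "",
        "What it does",
        "------------",
        description,
        "",
        "How to run it",
        "-------------",
        usage,
        "",
    ]
    return "\n".join(lines)


# Hypothetical project details.
readme = build_readme(
    title="Survey Cleaning Scripts",
    authors=["A. Researcher", "B. Collaborator"],
    description="Cleans raw survey exports and writes an analysis-ready CSV file.",
    usage="python clean_survey.py raw_export.csv cleaned.csv",
)
print(readme)
```

The same three headings work for a dataset or a folder structure; only the content under "How to run it" changes, for example to "How to open these files".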

Did you know that for each minute of planning at the beginning of a project, you will save yourself roughly 10 minutes of headache later? This session will provide practical tips for organizing, naming, documenting, storing and preserving your data.

Scheduled Data Services Classes