Institute of Biotechnology
Workshops
Practical Linux Examples in Bioinformatics
March 4 and 6, 2019
This workshop will deal with processing of large bioinformatics data files using Linux tools.
When working with genomics or transcriptomics data, we often need to process large text data files that are too big to open in, for example, Excel. In this workshop, we will demonstrate how to use Linux tools such as awk, sed, cut, paste, sort, and uniq to filter, transform, and analyze such files. We will also introduce bedtools, a software package designed for efficient processing of large genomics interval files. Some of the commonly used bioinformatics data file formats, including GFF, BED, and FASTQ, will be covered in the examples. We will also illustrate how to process multiple files simultaneously using multiple CPU cores.
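As a rough illustration of the kind of task described above (not taken from the workshop materials), the sketch below combines cut, sort, uniq, and awk to summarize a GFF file, and uses xargs to spread work over several CPU cores. The function names are hypothetical, and the code assumes a tab-delimited GFF with the feature type in column 3, 1-based start and end in columns 4 and 5, strand in column 7, and attributes in column 9.

```shell
# Hypothetical helper: count how often each feature type (GFF column 3)
# occurs, skipping comment lines that start with '#'.
# Output: feature<TAB>count, most frequent first.
count_gff_features() {
    grep -v '^#' "$1" | cut -f 3 | sort | uniq -c |
        sort -k1,1nr -k2,2 | awk '{ print $2 "\t" $1 }'
}

# Hypothetical helper: convert GFF "gene" records to BED.
# GFF coordinates are 1-based and inclusive; BED starts are 0-based,
# so the start is shifted by one and the end is kept.
# The raw attribute column is used as the BED name (an assumption).
gff_genes_to_bed() {
    awk -F '\t' 'BEGIN { OFS = "\t" }
        !/^#/ && $3 == "gene" { print $1, $4 - 1, $5, $9, ".", $7 }' "$1"
}

# Hypothetical helper: count lines in several files, running up to
# $1 awk processes at a time (one file per process).
# Output: filename<TAB>line_count, sorted by filename.
# Assumes every file is non-empty.
count_lines_parallel() {
    jobs=$1
    shift
    printf '%s\0' "$@" |
        xargs -0 -n 1 -P "$jobs" awk 'END { print FILENAME "\t" NR }' |
        sort
}
```

For example, `count_gff_features genes.gff` prints one line per feature type, and `count_lines_parallel 4 *.fastq` counts lines in all FASTQ files in the current directory using up to four processes (the file names here are placeholders).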
The presented material will be illustrated by hands-on exercises hosted on dedicated workstations of the BioHPC Lab. No programming skills are required. However, all participants should have basic knowledge of the Linux command-line environment, for example, as introduced in our two previous workshops: "Introduction to BioHPC Lab" and "Linux for Biologists" (lecture slides are available on the workshop web pages).
The BioHPC Lab workstations used for the workshop will be accessed remotely using the Secure Shell (ssh) protocol. To participate in the exercises, please bring your own laptop with an ssh client installed. Mac and Linux laptops come with native ssh clients, so no extra installation is needed. For Windows, the recommended ssh client is PuTTY; please install it prior to the workshop. To run Linux programs with a graphical interface displayed on your laptop, you should also install RealVNC viewer. To transfer files between your laptop and a Linux machine, you will need an sftp client, such as FileZilla on Windows (Mac and Linux laptops come with native sftp clients, so no extra installation is needed). However, neither RealVNC viewer nor FileZilla is essential for the workshop. For links to the client software mentioned above, instructions, and more information on access to BioHPC machines, please refer to the following document: http://cbsu.tc.cornell.edu/lab/doc/Remote_access.pdf, especially points 1 and 2.2-2.4.
This workshop is divided into paired sessions: a lecture/presentation session (Monday) followed by a hands-on session (Wednesday). This arrangement will allow plenty of time for hands-on training.
Access to BioHPC Lab workstations requires a Lab account. If you do not yet have an account on the BioHPC Lab system, we will create one for you after you register for the workshop. We will also assign a machine for you to work on during and after the workshop.
Please do not make any machine reservations for the workshop!
Machine allocations will be posted here.
Exercise handout:
http://cbsu.tc.cornell.edu/doc/linux_examples_exercises_v3.pdf
Workshop Outline
Session 1
Mar 4, 2019, 3:30 PM - 5:00 PM, 655 Rhodes Hall
Lecture:
http://biohpc.cornell.edu/doc/linux_examples_slides_v4.pdf
Exercises:
https://biohpc.cornell.edu/doc/linux_examples_exercises_v3.pdf
Session 2
Mar 6, 2019, 3:30 PM - 5:00 PM, 655 Rhodes Hall
Hands-on session.