Docker - Exercises

The commands we provided might contain hidden characters. If a copy-pasted command does not run, use this page to clean up the command line: https://biohpc.cornell.edu/clean.aspx

Part 1. Examine the libraries of "samtools".

The two copies of samtools were compiled for different Linux systems. "samtools-1.15.1" was compiled for CentOS 7, and "samtools-1.15.1-r" was compiled for Rocky 9. The file "/etc/os-release" tells you what Linux version is on the server. The "ldd" command tells you what libraries samtools needs. Do you see any libraries missing for either copy of samtools?

cat /etc/os-release

ldd /programs/samtools-1.15.1/bin/samtools

ldd /programs/samtools-1.15.1-r/bin/samtools

The "ldd" command shows that "libcrypto.so.1.1" is missing for samtools-1.15.1-r. The following "ldconfig -p" command tells you what version of libcrypto is present on your current system:

ldconfig -p |grep libcrypto
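
If you only want to see the missing libraries, you can filter the "ldd" output for the text "not found". The following is a minimal sketch; "missing_libs" is a hypothetical helper name, not something installed on the server, and the demo uses /bin/sh only because it exists on practically every Linux system.

```shell
# Print only the shared libraries that ldd cannot resolve for a binary.
# missing_libs is a hypothetical helper name used for this sketch.
missing_libs() {
    # ldd marks an unresolved library as "=> not found".
    # "|| true" keeps the exit status 0 when nothing is missing.
    ldd "$1" 2>/dev/null | grep 'not found' || true
}

# Demo on a binary that should have all of its libraries available.
missing_libs /bin/sh
```

To apply it to the exercise, pass one of the samtools paths from above, e.g. "missing_libs /programs/samtools-1.15.1-r/bin/samtools".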

 

Part 2. Create a Docker image interactively

2.1 Pull a Docker image for Ubuntu:20.04, and start an interactive container

docker1 pull ubuntu:20.04

docker1 images

docker1 run -dit ubuntu:20.04

docker1 ps -a

 

2.2 Start a shell inside the Docker container, and install some software in it

2.2.1 Identify your container ID. Replace "xxxxxxxxx" in the following command with your container ID. (The last column of the result table should contain your user ID.)

docker1 ps -a

docker1 exec -it xxxxxxxxx bash

 

2.2.2 Once you are inside the container, try to install some software.

You will use "apt" (or "apt-get") to install "samtools". "apt" is the software installation tool of the Ubuntu package management system. "tzdata" handles timezone management and is required by samtools. Setting "DEBIAN_FRONTEND" and "TZ" before installing keeps "tzdata" from asking you for a timezone. The "-y" option keeps "apt" from prompting you for confirmation.

apt update

export DEBIAN_FRONTEND=noninteractive
export TZ=America/New_York
apt install -y tzdata

apt install -y samtools

 

2.2.3 If you feel like it, install some more software into the Docker container. Otherwise, skip this step and move on to step 2.2.4.

Here you will download the "bedtools2" source code and compile it with GCC.

You will need to install two more packages: "wget" and "build-essential". The "make -j4" command runs up to 4 compile jobs in parallel, so it can use 4 CPU cores.

apt install -y wget build-essential 

pwd

wget https://github.com/arq5x/bedtools2/releases/download/v2.30.0/bedtools-2.30.0.tar.gz 

tar xvfz bedtools-2.30.0.tar.gz

cd bedtools2

make -j4

It could take several iterations to correct all the compilation errors. You would see error messages such as "zlib library missing" and "python command not found". You can Google an error message to find the solution. Recent Ubuntu releases come with "python3" but no "python" command, so you need to create a "python" symbolic link.
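
Before re-running "make", it can help to check which of the required commands are actually on the PATH. The following is a minimal sketch; "missing_cmds" is a hypothetical helper name, and the list of commands is only an assumption based on the tools this exercise mentions.

```shell
# Report which of the given commands are not on the PATH.
# missing_cmds is a hypothetical helper name used only for this sketch.
missing_cmds() {
    for cmd in "$@"; do
        # "command -v" succeeds only if the shell can find the command.
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Tools mentioned in this exercise; anything printed here is missing.
missing_cmds gcc make python
```

This check covers commands only. Missing development libraries such as zlib still show up as compilation errors.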

#missing libraries
apt install -y zlib1g-dev libbz2-dev liblzma-dev

ln -s /usr/bin/python3 /usr/bin/python

Then continue with the installation.

make -j4

make install

cd ..

#the installation source files can be deleted to reduce the image size
rm -fr bedtools2

 

2.2.4. Test running the software you just installed

which samtools

samtools

which bedtools

bedtools -h

 

2.2.5. Run "exit" to leave the container shell. The container should still be running. You can check this with the "docker1 ps -a" command.

exit

docker1 ps -a

 

2.3. Commit your container into a new image

At this point, you have installed some software in your container, and you want to commit the container into a new image.

The docker1 command automatically prefixes your user ID to the image name. The new image is named "biohpc_$USER/myapp".

docker1 commit xxxxxxxxx myapp

docker1 images
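
The naming rule described above can be written as a small helper, which is handy when a script needs the full image name. This is only a sketch; "full_image_name" is a hypothetical name, and the rule itself is taken from the description in this step.

```shell
# Build the full image name that docker1 is described as assigning:
# "biohpc_<user ID>/<short name>". full_image_name is a hypothetical helper.
full_image_name() {
    # Fall back to "id -un" in case USER is not set in the environment.
    printf 'biohpc_%s/%s\n' "${USER:-$(id -un)}" "$1"
}

full_image_name myapp
```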

 

Save the new Docker image into a tar file

cd /workdir/$USER

docker1 save -o myapp.tar biohpc_$USER/myapp

 

2.4. Clean up

#remove all your containers
docker1 clean all

#remove your image
docker1 rmi biohpc_$USER/myapp

#list the existing images and containers
docker1 ps -a

docker1 images

 

2.5. Load your saved tar file as a Docker image

docker1 load -i myapp.tar

docker1 images

#rename the image from "myapp" to "mysam"
docker1 tag biohpc_$USER/myapp mysam
docker1 images

docker1 rmi biohpc_$USER/myapp
docker1 images

 

Part 3. Run the software in Docker

Run "samtools" installed in Docker, and convert the provided test.sam file into a bam file.

cp /shared_data/qisun/test.sam /workdir/$USER/  

 

So, how do you access the file "/workdir/$USER/test.sam" from inside the Docker container?

Docker1 automatically mounts "/workdir/$USER" as "/workdir" inside the container. In the container, you can access the files under the directory "/workdir/".
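
The mapping between the two paths can be sketched as a small function that turns a host path into the path seen inside the container. "to_container_path" is a hypothetical helper name, and the function assumes only the automatic "/workdir/$USER" mount described above.

```shell
# Translate a host path under /workdir/$USER into the path seen inside
# the container. to_container_path is a hypothetical helper name.
to_container_path() {
    user="${USER:-$(id -un)}"
    case "$1" in
        "/workdir/$user"/*)
            # Strip the "/workdir/<user>/" prefix and re-root under /workdir.
            rel="${1#"/workdir/$user/"}"
            printf '/workdir/%s\n' "$rel"
            ;;
        "/workdir/$user")
            printf '/workdir\n'
            ;;
        *)
            echo "not under /workdir/$user: $1" >&2
            return 1
            ;;
    esac
}

to_container_path "/workdir/${USER:-$(id -un)}/test.sam"
```

The demo line prints "/workdir/test.sam", which is the path to use in commands that run inside the container.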

Optionally, you can use the docker "-v" option to mount more directories. For security reasons, "docker1" only allows directories under "/workdir/$USER/" or "/local/storage" to be mounted.

When you use the "-v" option with docker1, make sure that the source directory always ends with "/", e.g. "/workdir/$USER/mydata/"
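
If you build the "-v" argument in a script, a helper like the following sketch can add the trailing "/" when it is missing. "with_trailing_slash" is a hypothetical name, and the "mydata" directory is the example from the sentence above.

```shell
# Make sure a directory path ends with "/".
# with_trailing_slash is a hypothetical helper name.
with_trailing_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;    # already ends with "/"
        *)  printf '%s/\n' "$1" ;;   # append the missing "/"
    esac
}

with_trailing_slash "/workdir/${USER:-$(id -un)}/mydata"
```

The printed path can then be used as the source side of "-v". This sketch assumes that docker1 accepts the standard Docker "source:destination" form for that option.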

docker1 run --rm biohpc_$USER/mysam samtools

docker1 run --rm biohpc_$USER/mysam samtools view -b -o /workdir/test.bam /workdir/test.sam

#the output file test.bam is owned by root; run the following commands to claim ownership of the file
ls -l /workdir/$USER/test.bam

docker1 claim /workdir/$USER/test.bam

ls -l /workdir/$USER/test.bam
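
Instead of reading the "ls -l" output by eye, a script can compare the owner of the file with your own user ID. The following is a minimal sketch; "owned_by_me" is a hypothetical helper name, "stat -c %u" is the GNU coreutils form, and the demo uses a temporary file rather than test.bam.

```shell
# Succeed if the given file is owned by the current user.
# owned_by_me is a hypothetical helper name; it compares numeric user IDs.
owned_by_me() {
    [ "$(stat -c %u "$1")" = "$(id -u)" ]
}

# Demo on a temporary file, which is created by the current user.
tmpfile=$(mktemp)
owned_by_me "$tmpfile" && echo "owned by current user"
rm -f "$tmpfile"
```

For the exercise, "owned_by_me /workdir/$USER/test.bam" should fail before "docker1 claim" and succeed after it.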

 

Part 4. Create an RStudio web server

4.1 Start RStudio. Because four people share the same server, each person should pick a different port number between 8009 and 8039. These instructions use port 8029.
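
A quick way to confirm that the number you picked is inside the allowed range is sketched below. "valid_port" is a hypothetical helper name, and the range is the one given in this step.

```shell
# Check that a port number is within the range given for this exercise.
# valid_port is a hypothetical helper name.
valid_port() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$1" -ge 8009 ] && [ "$1" -le 8039 ]
}

valid_port 8029 && echo "8029 is in range"
```

The function only checks the range. It does not tell you whether another user on the server has already taken the same port.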

#create a directory (any name) to serve as the RStudio session directory
mkdir /workdir/$USER/rstudio_dir

#start the Docker container from the rocker/rstudio image.
#Replace 8029 with a different number between 8009 and 8039.
docker1 run -d -p 8029:8787 rocker/rstudio 

#get the container ID
docker1 ps

#add your BioHPC user ID and primary group ID into Docker container
#replace xxxxxxx below with your container ID
export MYCID=xxxxxxx 

docker1 exec $MYCID groupadd -g `id -g` `id -g -n $USER`

docker1 exec $MYCID useradd -m -u `id -u` -g `id -g` -d /home/$USER $USER
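
The backquoted "id" commands in the two lines above are expanded by your shell on the host before docker1 runs, so the container receives your host user ID and group ID. The following sketch prints the expanded commands without running anything in a container; "preview_account_cmds" is a hypothetical helper name.

```shell
# Print the account commands with the host-side values filled in,
# without running anything in a container.
# preview_account_cmds is a hypothetical helper name.
preview_account_cmds() {
    user="${USER:-$(id -un)}"
    # $(...) is the modern equivalent of the backquotes used above.
    echo "groupadd -g $(id -g) $(id -gn)"
    echo "useradd -m -u $(id -u) -g $(id -g) -d /home/$user $user"
}

preview_account_cmds
```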

#set a password for your user account within the container
#this is the password you will use later to log in to RStudio
docker1 exec -it $MYCID passwd $USER

#make $USER a sudo user in the container
docker1 exec $MYCID usermod -aG sudo $USER

#make a symbolic link /home/$USER/.local pointing to /workdir/$USER/rstudio_dir (seen as /workdir/rstudio_dir inside the container)
docker1 exec --user $USER $MYCID ln -s /workdir/rstudio_dir /home/$USER/.local

 

If you are on the Cornell network (or connected to the VPN), open a web browser and go to http://cbsuxxxx.biohpc.cornell.edu:8029 (replace cbsuxxxx with your assigned machine name, and 8029 with the port number you picked). Log in with your BioHPC user name and the password you just set.