ai-rnaseq-handson

From ChatGPT to AI Agents: Practical RNA-seq Workflows with Claude Code

Hands-on Project

Part 1. Set Up the Project Directory

1.1 Set up your Claude account (if needed)

If you have not set up your Claude account on a BioHPC server, follow the instructions here.

https://biohpc.cornell.edu/doc/setup_account_claude.html

1.2 Connect to your assigned server

Find your assigned server on this page:

https://biohpc.cornell.edu/ww/machines.aspx?i=169

On Cornell campus or using Cornell VPN:


ssh your_user_id@cbsuxxxxxx.biohpc.cornell.edu

Off campus (without VPN):


ssh your_user_id@cbsulogin.biohpc.cornell.edu

ssh cbsuxxxxxx

1.3 Prepare your working directory and data


mkdir -p /workdir/$USER

cp -r /shared_data/RNAseq/exercise1 /workdir/$USER/

cd /workdir/$USER/exercise1

ls -l

You should see:

FASTQ files (6 samples):
- ERR458493.fastq.gz
- ERR458494.fastq.gz
- ERR458495.fastq.gz
- ERR458500.fastq.gz
- ERR458501.fastq.gz
- ERR458502.fastq.gz
Metadata file:
- sampleMeta.txt
Reference files:
- R64.fa
- R64.gtf
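
A quick way to confirm the copy is complete is to test for each expected file by name. The sketch below is only an illustration: the function name check_inputs is hypothetical, and the file list is taken from the listing above.

```shell
# Report which of the expected workshop files are missing or empty.
# check_inputs is a hypothetical helper name, not part of any BioHPC tool.
check_inputs() {
  local dir="${1:-.}" missing=0 f
  for f in ERR458493.fastq.gz ERR458494.fastq.gz ERR458495.fastq.gz \
           ERR458500.fastq.gz ERR458501.fastq.gz ERR458502.fastq.gz \
           sampleMeta.txt R64.fa R64.gtf; do
    # -s is true only when the file exists and is not empty
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $f"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ] && echo "all 9 input files found"
  # The return status is the number of missing files (0 means success)
  return "$missing"
}
```

Run it as `check_inputs /workdir/$USER/exercise1` after the copy has finished.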

1.4 Copy and inspect the guardrail files


cp /programs/ai_pipelines/AGENTS.md /home/$USER/.claude/CLAUDE.md
cp /programs/ai_pipelines/rnaseq/*.md /workdir/$USER/exercise1/

cat /home/$USER/.claude/CLAUDE.md
cat /workdir/$USER/exercise1/rnaseq.md
cat /workdir/$USER/exercise1/rnaseq_genomics_core.md

Descriptions:

CLAUDE.md: System prompt provided by the Bioinformatics Facility, with usage guidelines.
rnaseq.md: nf-core RNA-seq workflow protocol (modifiable; e.g., CPU usage).
rnaseq_genomics_core.md: Protocol matching Cornell Genomics Facility standards.

You may use either RNA-seq protocol for this project. If you want to run both protocols, run them in two separate sessions.

Part 2. Run Claude Code

2.1 Start Claude Code


cd /workdir/$USER/exercise1

claude

If your Claude account has not been linked, you will be prompted to connect it.

Use number keys or arrow keys + Enter to navigate prompts.

2.2 Default file access policy

By default, Claude Code always asks for permission before modifying or deleting files. If this default policy has been changed, you can find the setting in the file ~/.claude/settings.json.

2.3 Generate the RNA-seq pipeline script


read rnaseq.md

Create a script run_rnaseq.sh to run RNA-seq data analysis using 10 CPU cores. Output directory: results.

⏱ Runtime: ~2 minutes. Claude will generate a script.

Exit Claude:

Press Ctrl + C, or
Type /exit

To resume later:


claude --continue

2.4 Run the RNA-seq data analysis script

Inspect the script and the formatted sample file:


cat run_rnaseq.sh

cat samplesheet.csv
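
Beyond reading the sample sheet by eye, it can help to confirm that every FASTQ path it lists actually exists. The sketch below assumes the sheet is comma-separated with a header row that contains a fastq_1 column (the layout used by nf-core/rnaseq) and that paths contain no spaces. The function name check_samplesheet is hypothetical, and the sheet written for the Genomics Facility protocol may look different.

```shell
# Check that every path in the fastq_1 column of a sample sheet exists.
# Assumptions: comma-separated, unquoted fields, header row, no spaces in paths.
check_samplesheet() {
  local sheet="${1:-samplesheet.csv}" col bad=0 path
  # Locate the fastq_1 column by name in the header row
  col=$(head -n 1 "$sheet" | tr ',' '\n' | grep -n -x 'fastq_1' | cut -d: -f1)
  [ -n "$col" ] || { echo "no fastq_1 column in $sheet"; return 2; }
  # Walk the data rows and test each listed file
  for path in $(tail -n +2 "$sheet" | cut -d, -f"$col"); do
    [ -s "$path" ] || { echo "not found: $path"; bad=1; }
  done
  return "$bad"
}
```

Relative paths are resolved against the current directory, so run `check_samplesheet samplesheet.csv` from /workdir/$USER/exercise1.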

Then run


./run_rnaseq.sh

⏱ Runtime: ~5–10 minutes (small training dataset)
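
If you want a record of the run to show Claude later, one option is to syntax-check the generated script and then run it with all output saved to a log file. This is a minimal sketch. The function name run_logged and the default log name run_rnaseq.log are hypothetical choices, not something the protocol files require.

```shell
# Syntax-check a generated script, then run it with all output saved to a log.
run_logged() {
  local script="$1" log="${2:-run_rnaseq.log}" status
  # bash -n parses the script without executing any command in it
  bash -n "$script" || { echo "syntax error in $script"; return 2; }
  # Capture both stdout and stderr in the log file
  bash "$script" > "$log" 2>&1
  status=$?
  echo "finished with exit status $status; output saved in $log"
  return "$status"
}
```

For example, `run_logged run_rnaseq.sh` runs the script in the foreground. You can follow progress from a second terminal with `tail -f run_rnaseq.log`.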

Optional (skip computation and copy the pre-made results):


cp -r /shared_data/RNAseq/exercise1_results /workdir/$USER/exercise1/results

Pre-made results with Genomics Facility protocol: /shared_data/RNAseq/exercise1_results

2.5 Verify results

Resume Claude:


cd /workdir/$USER/exercise1
claude --continue

Ask:


Can you check the output in the results directory?

Other useful prompts:


What files should I check?

What should I do next?

You can download HTML reports using FileZilla and view them locally.
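
To see which HTML reports exist before opening FileZilla, you can list them from the command line. The sketch below assumes the output directory is named results, as requested in the prompt in section 2.3. The function name list_reports is hypothetical.

```shell
# List every HTML report under an output directory, sorted by path.
list_reports() {
  local dir="${1:-results}"
  find "$dir" -type f -name '*.html' | sort
}
```

Running `list_reports results` from the project directory prints one path per line, which you can then locate in FileZilla.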

2.6 Downstream data analysis

Example tasks:


Identify differentially expressed genes.

Make a PCA plot of the samples, mu in blue and wt in red. Use triangles for mu and circles for wt.

If Claude creates but does not run a script:


Run this script for me.

Plots are saved as .png files. You can:

Download via FileZilla
View in VS Code (recommended)

2.7 Functional over-representation analysis (ORA)

ORA requires a GO annotation file. For most model organisms, GO annotation files are available online.

If the GO annotation file is not available, generate it in two steps:

Ask the agent to create a protein fasta file:


Create a protein sequence fasta file using the genome fasta and the gtf file

Use BLAST2GO on BioHPC to generate the GO annotation: https://biohpc.cornell.edu/lab/userguide.aspx?a=software&i=73#c
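
For reference, the first step can also be done by hand. The sketch below uses gffread, whose -g option takes the genome FASTA and whose -y option writes the protein translation of each transcript's CDS. It assumes gffread is installed and on your PATH, which this guide does not confirm. The function name and the output file name R64.proteins.fa are hypothetical, and the agent may choose a different tool.

```shell
# Translate the CDS features of an annotation into a protein FASTA with gffread.
make_protein_fasta() {
  local genome="${1:-R64.fa}" gtf="${2:-R64.gtf}" out="${3:-R64.proteins.fa}" f
  # Stop early if either input is missing or empty
  for f in "$genome" "$gtf"; do
    [ -s "$f" ] || { echo "input not found: $f"; return 2; }
  done
  # Assumption: gffread is available on this server
  command -v gffread >/dev/null 2>&1 || { echo "gffread is not on PATH"; return 3; }
  # -g: genome FASTA; -y: output file for translated CDS sequences
  gffread -g "$genome" -y "$out" "$gtf"
}
```

Called with no arguments from the project directory, it reads R64.fa and R64.gtf and writes R64.proteins.fa.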

For this workshop, use the pre-made GO annotation file: R64.go.txt


Run functional over-representation analysis using topGO with R64.go.txt. Use R version 4.4.3.

Alternative:


Use clusterProfiler for ORA.

Adjust thresholds:


Redo topGO using genes with log2 fold change > 2.
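
To check how many genes pass such a threshold before rerunning the enrichment, you can filter the differential expression table directly. The sketch below assumes a tab-separated table with a header row and a column named log2FoldChange (the name DESeq2 uses). The table Claude writes may use a different format or column name, and the function name filter_lfc is hypothetical.

```shell
# Print the header plus rows whose log2FoldChange is greater than a threshold.
filter_lfc() {
  local table="$1" threshold="${2:-2}"
  awk -F'\t' -v t="$threshold" '
    NR == 1 {
      # Find the log2FoldChange column by name
      for (i = 1; i <= NF; i++) if ($i == "log2FoldChange") col = i
      if (!col) { print "no log2FoldChange column" > "/dev/stderr"; exit 2 }
      print; next
    }
    # Skip missing values, keep rows strictly above the threshold
    $col != "NA" && $col + 0 > t + 0
  ' "$table"
}
```

This follows the prompt literally and keeps only up-regulated genes. To keep strong changes in both directions, compare the absolute value of the column instead.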

Part 3. Documentation

Documentation is essential in research.

Example prompts:


Organize all generated scripts into a scripts directory.

Add a README file describing each script.

Write a project summary including software versions for manuscript use.
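
For comparison, the first prompt amounts to something like the following when done by hand. This is only a sketch: the function name organize_scripts is hypothetical, and the set of extensions (.sh, .R, .py) is an assumption about what the agent generated in your session.

```shell
# Move generated scripts from a project directory into a scripts/ subdirectory.
organize_scripts() {
  local dir="${1:-.}" f
  mkdir -p "$dir/scripts"
  for f in "$dir"/*.sh "$dir"/*.R "$dir"/*.py; do
    # Skip patterns that matched no file
    [ -f "$f" ] || continue
    mv "$f" "$dir/scripts/"
    echo "moved $(basename "$f")"
  done
}
```

Scripts that refer to other files by relative path may need those paths updated after the move.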

Part 4. Using VS Code (optional)

VS Code improves visualization and workflow.

Setup instructions: https://biohpc.cornell.edu/doc/setup_account_claude.html

VS Code layout:

File explorer (left): Open files and images
AI Agent (right): Manage agent sessions
Editor (center): View/edit files
Terminal (bottom): Run commands.

Running Claude in VS Code

Option 1: Terminal


claude

Option 2: Agent panel (GUI)

Key concepts:

Switching agents:

At the top of the panel, you can select which Agent to use for your project:

Codex: OpenAI agent

Claude: Anthropic agent

Chat: Microsoft Copilot agent

Project directory = folder opened in VS Code

Open with: Ctrl + Shift + P → Open Folder

Session management:

Resume previous session (default)
Start new session via “New Chat” button (upper right corner)

Try this example


Make a PCA plot of the samples, wt in blue and mu in red.

The .png will appear in the file explorer—double-click to view.