ai-rnaseq-handson

From ChatGPT to AI Agents: Practical RNA-seq Workflows with Codex CLI

Hands-on Project

Part 1. Set Up the Project Directory

1.1 Set up your ChatGPT account for Codex (if needed)

If you have not set up your ChatGPT account for Codex on a BioHPC server, follow the instructions here.

https://biohpc.cornell.edu/doc/setup_codex_account.html

1.2 Connect to your assigned server.

Find your assigned server on this page:

https://biohpc.cornell.edu/ww/machines.aspx?i=169

On Cornell campus or using Cornell VPN:


ssh your_user_id@cbsuxxxxxx.biohpc.cornell.edu

Off campus (without VPN), log in to the login node first, then connect to your assigned server from there:


ssh your_user_id@cbsulogin.biohpc.cornell.edu

ssh cbsuxxxxxx

1.3 Prepare your working directory and data


mkdir -p /workdir/$USER

cp -r /shared_data/RNAseq/exercise1 /workdir/$USER/

cd /workdir/$USER/exercise1

ls -l

You should see:

FASTQ files (6 samples):
- ERR458493.fastq.gz
- ERR458494.fastq.gz
- ERR458495.fastq.gz
- ERR458500.fastq.gz
- ERR458501.fastq.gz
- ERR458502.fastq.gz
Metadata file:
- sampleMeta.txt
Reference files:
- R64.fa
- R64.gtf
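
If you would rather check the listing automatically than by eye, the following is a minimal sketch. The function name check_inputs is hypothetical (it is not part of the workshop material); the file names are the nine listed above.

```shell
# Sketch: confirm that every file the exercise expects is present.
# check_inputs is a hypothetical helper, not a BioHPC or Codex command.
check_inputs() {
    local dir="${1:-.}"
    local missing=0
    local f
    for f in ERR458493.fastq.gz ERR458494.fastq.gz ERR458495.fastq.gz \
             ERR458500.fastq.gz ERR458501.fastq.gz ERR458502.fastq.gz \
             sampleMeta.txt R64.fa R64.gtf; do
        if [ ! -e "$dir/$f" ]; then
            echo "MISSING: $f"
            missing=$((missing + 1))
        fi
    done
    if [ "$missing" -eq 0 ]; then
        echo "All 9 expected files found in $dir"
    fi
    # The exit status is the number of missing files (0 means success).
    return "$missing"
}
```

After pasting the function into your shell, run check_inputs /workdir/$USER/exercise1. Any missing file is printed on its own line.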

1.4 Copy and inspect the guardrail files


cp /programs/ai_pipelines/AGENTS.md /home/$USER/.codex
cp /programs/ai_pipelines/rnaseq/*.md /workdir/$USER/exercise1/

cat /home/$USER/.codex/AGENTS.md
cat /workdir/$USER/exercise1/rnaseq.md
cat /workdir/$USER/exercise1/rnaseq_genomics_core.md

Descriptions:

AGENTS.md: System prompt provided by the Bioinformatics Facility, with usage guidelines.
rnaseq.md: nf-core RNA-seq workflow protocol (modifiable; e.g., CPU usage).
rnaseq_genomics_core.md: Protocol matching Cornell Genomics Facility standards.

You may use either RNA-seq protocol for this project. If you want to run both, run them in two separate sessions.

Part 2. Run Codex

2.1 Start Codex


cd /workdir/$USER/exercise1

codex

If your ChatGPT account has not been linked, you will be prompted to connect it.

Use number keys or arrow keys + Enter to navigate prompts.

2.2 Make sure Codex is in default sandbox mode.

Run "/permissions" in Codex and make sure "1 Default" is selected (highlighted in blue). If it is not, press "1" to select it.


/permissions

2.3 Generate the RNA-seq pipeline script


load rnaseq.md

Create a script run_rnaseq.sh to run RNA-seq data analysis using 10 CPU cores. Output directory: results.

⏱ Runtime: ~2 minutes. Codex will generate a script.

Exit Codex:

Press Ctrl + C, or
Type /exit

To resume later:


codex resume --last

2.4 Run the RNA-seq data analysis script

Inspect the script and the formatted sample file:


cat run_rnaseq.sh

cat samplesheet.csv
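
Before launching the run, a quick consistency check of the sample sheet can catch path typos early. The sketch below assumes the sheet follows the nf-core/rnaseq layout (header sample,fastq_1,fastq_2,strandedness) and has no quoted fields. If you used the Genomics Facility protocol, or Codex produced a different layout, the header test will need adjusting. check_samplesheet is a hypothetical helper name.

```shell
# Sketch: sanity-check a sample sheet before running the pipeline.
# Assumes the nf-core/rnaseq header; check_samplesheet is a hypothetical helper.
check_samplesheet() {
    local sheet="${1:-samplesheet.csv}"
    local header missing
    header=$(head -n 1 "$sheet" | tr -d '\r')
    if [ "$header" != "sample,fastq_1,fastq_2,strandedness" ]; then
        echo "Unexpected header: $header"
        return 1
    fi
    # Column 2 holds the first FASTQ path; collect any path that does not exist.
    # Relative paths are resolved against the current directory.
    missing=$(tail -n +2 "$sheet" | tr -d '\r' | cut -d, -f2 |
        while IFS= read -r fq || [ -n "$fq" ]; do
            if [ -n "$fq" ] && [ ! -e "$fq" ]; then
                echo "$fq"
            fi
        done)
    if [ -n "$missing" ]; then
        echo "FASTQ files not found:"
        echo "$missing"
        return 1
    fi
    echo "Samplesheet header and FASTQ paths look consistent"
}
```

Run it from the project directory as check_samplesheet samplesheet.csv. It only checks the header and the first FASTQ column, so it does not replace reading the file yourself.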

Then run the script:


./run_rnaseq.sh

⏱ Runtime: ~5–10 minutes (small training dataset)

Optional (skip computation and copy the pre-made results):


cp -r /shared_data/RNAseq/exercise1_results /workdir/$USER/exercise1/results

Pre-made results with Genomics Facility protocol: /shared_data/RNAseq/exercise1_results

2.5 Verify results

Resume Codex:


cd /workdir/$USER/exercise1
codex resume --last

Ask:


Can you check the output in the results directory?

Other useful prompts:


What files should I check?

What should I do next?

You can download HTML reports using FileZilla and view them locally.
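
To see which HTML reports exist before opening FileZilla, a minimal sketch is shown below. list_html_reports is a hypothetical helper name, and the exact report locations depend on the protocol you ran, so the function simply searches the whole output directory.

```shell
# Sketch: list every HTML report under the output directory, sorted by path.
# list_html_reports is a hypothetical helper; "results" matches the output
# directory named in the prompt in section 2.3.
list_html_reports() {
    find "${1:-results}" -type f -name '*.html' | sort
}
```

Run list_html_reports from /workdir/$USER/exercise1, then download the listed files.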

2.6 Downstream data analysis

Example tasks:


Identify differentially expressed genes.

Make a PCA plot of the samples, with mu in blue and wt in red. Use triangles for mu and circles for wt.

If Codex creates but does not run a script:


Run this script for me.

Plots are saved as .png files. You can:

Download via FileZilla
View in VS Code (recommended)

2.7 Functional over-representation analysis (ORA)

ORA requires a GO annotation file. For most model organisms, GO annotation files are available online.

If the GO annotation file is not available, generate it in two steps:

Ask the agent to create a protein FASTA file:


Create a protein sequence fasta file using the genome fasta and the gtf file

Use BLAST2GO on BioHPC to generate the GO annotation: https://biohpc.cornell.edu/lab/userguide.aspx?a=software&i=73#c

For this workshop, use the pre-made GO annotation file: R64.go.txt


Run functional over-representation analysis using topGO with R64.go.txt. Use R version 4.4.3.

Alternative:


Use clusterProfiler for ORA.

Adjust thresholds:


Redo topGO using genes with log2 fold change > 2.
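
To see which genes such a threshold selects, you can filter the differential expression table yourself. The sketch below makes several assumptions: the table is tab-separated, has a header row, keeps gene IDs in the first column, and names the fold-change column log2FoldChange (the DESeq2 convention). The table Codex writes in your session may differ. filter_up_genes is a hypothetical helper name.

```shell
# Sketch: print gene IDs whose log2 fold change exceeds a cutoff.
# Assumes a tab-separated table with a header, gene IDs in column 1, and a
# column named log2FoldChange; filter_up_genes is a hypothetical helper.
filter_up_genes() {
    # $1: differential expression table; $2: cutoff (default 2)
    awk -F'\t' -v cutoff="${2:-2}" '
        NR == 1 {
            # Locate the fold-change column by name rather than by position.
            for (i = 1; i <= NF; i++) if ($i == "log2FoldChange") col = i
            if (!col) { print "no log2FoldChange column" > "/dev/stderr"; exit 1 }
            next
        }
        # Skip NA values; keep rows strictly above the cutoff.
        $col != "NA" && $col + 0 > cutoff + 0 { print $1 }
    ' "$1"
}
```

As written, the filter keeps only up-regulated genes, matching the wording of the prompt. Selecting strongly down-regulated genes as well would require comparing the absolute value instead.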

Part 3. Documentation

Documentation is essential in research.

Example prompts:


Organize all generated scripts into a scripts directory.

Add a README file describing each script.

Write a project summary including software versions for manuscript use.
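
For comparison, the first prompt corresponds roughly to the manual steps sketched below. organize_scripts is a hypothetical helper name, and the extensions (.sh, .R, .py) are an assumption about what Codex generated in your session.

```shell
# Sketch: move loose scripts from a project directory into scripts/.
# organize_scripts is a hypothetical helper; the extension list is an assumption.
organize_scripts() {
    local dir="${1:-.}"
    local f
    mkdir -p "$dir/scripts"
    for f in "$dir"/*.sh "$dir"/*.R "$dir"/*.py; do
        # An unmatched pattern stays literal, so move only real files.
        if [ -f "$f" ]; then
            mv "$f" "$dir/scripts/"
        fi
    done
    # Show what ended up in scripts/.
    ls "$dir/scripts"
}
```

Scripts that refer to input or output files by relative path may need those paths updated after the move, which is one reason to let the agent do the reorganization and check the result.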

Part 4. Using VS Code (optional)

VS Code improves visualization and workflow.

Setup instructions: https://biohpc.cornell.edu/doc/setup_codex_account.html

VS Code layout:

File explorer (left): Open files and images
AI Agent (right): Manage Codex sessions
Editor (center): View/edit files
Terminal (bottom): Run commands

Running Codex in VS Code

Option 1: Terminal


codex

Option 2: Agent panel (GUI)

Key concepts:

Switching agents:

At the top of the panel, you can select which Agent to use for your project:

Codex: OpenAI agent

Claude: Anthropic agent

Chat: Microsoft Copilot agent

Project directory:

The project directory is the folder opened in VS Code. To open one, press Ctrl + Shift + P and choose Open Folder.

Session management:

Resume previous session (default)
Start new session via “New Chat” button (upper right corner)

Try this example


Make a PCA plot of the samples, wt in blue and mu in red.

The .png file will appear in the file explorer; double-click it to view.