1.1 Set up your ChatGPT account for Codex (if needed)
If you have not set up your ChatGPT account for Codex on a BioHPC server, follow the instructions here.
https://biohpc.cornell.edu/doc/setup_codex_account.html
1.2 Connect to your assigned server.
Find your assigned server on this page:
https://biohpc.cornell.edu/ww/machines.aspx?i=169
On Cornell campus or using Cornell VPN:
ssh your_user_id@cbsuxxxxxx.biohpc.cornell.edu
Off campus (without VPN):
ssh your_user_id@cbsulogin.biohpc.cornell.edu
ssh cbsuxxxxxx
1.3 Prepare your working directory and data
mkdir /workdir/$USER
cp -r /shared_data/RNAseq/exercise1 /workdir/$USER/
cd /workdir/$USER/exercise1
ls -l
You should see:
FASTQ files (6 samples):
ERR458493.fastq.gz
ERR458494.fastq.gz
ERR458495.fastq.gz
ERR458500.fastq.gz
ERR458501.fastq.gz
ERR458502.fastq.gz
Metadata file
sampleMeta.txt
Reference files:
R64.fa
R64.gtf
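The listing above can also be checked with a few lines of shell instead of by eye. This is a minimal sketch, not part of the workshop materials: the function name `check_exercise_files` is hypothetical, and the file names are copied from the list above.

```shell
# Hypothetical helper: report any expected exercise file missing from a directory.
check_exercise_files() {
    local dir="${1:-.}"
    local missing=0
    local f
    for f in ERR458493.fastq.gz ERR458494.fastq.gz ERR458495.fastq.gz \
             ERR458500.fastq.gz ERR458501.fastq.gz ERR458502.fastq.gz \
             sampleMeta.txt R64.fa R64.gtf; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    # Exit status 0 only when every file was found
    return "$missing"
}
```

For example, `check_exercise_files "/workdir/$USER/exercise1" && echo "all files present"` prints the confirmation only when nothing is missing.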
1.4 Inspect the two guardrail files.
cp /programs/ai_pipelines/AGENTS.md /home/$USER/.codex
cp /programs/ai_pipelines/rnaseq/*.md /workdir/$USER/exercise1/
cat /home/$USER/.codex/AGENTS.md
cat /workdir/$USER/exercise1/rnaseq.md
cat /workdir/$USER/exercise1/rnaseq_genomics_core.md
Descriptions:
AGENTS.md: System prompt provided from the Bioinformatics Facility with usage guidelines.
rnaseq.md: nf-core RNA-seq workflow protocol (modifiable; e.g., CPU usage).
rnaseq_genomics_core.md: Protocol matching Cornell Genomics Facility standards.
You may use either RNA-seq protocol for this project. If you want to run both protocols, run them in two separate Codex sessions.
2.1 Start Codex
cd /workdir/$USER/exercise1
codex
If your ChatGPT account has not been linked, you will be prompted to connect it.
Use number keys or arrow keys + Enter to navigate prompts.
2.2 Make sure Codex is in default sandbox mode.
Run "/permissions" in Codex and make sure "1 Default" is selected (shown in blue). If it is not, press "1" to select it.
/permissions
2.3 Generate the RNA-seq pipeline script
load rnaseq.md
Create a script run_rnaseq.sh to run rna-seq data analysis using 10 CPU cores. Output directory: results.
⏱ Runtime: ~2 minutes. Codex will generate a script.
Exit Codex:
Press Ctrl + C, or
Type /exit
To resume later:
codex resume --last
2.4 Run the RNA-seq data analysis script
Inspect the script and the formatted sample file:
cat run_rnaseq.sh
cat samplesheet.csv
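Beyond reading the sample file, it can help to confirm that every FASTQ path it lists actually exists. The sketch below assumes a layout that Codex may or may not produce: a header row, comma-separated fields, and the FASTQ path in column 2 (the nf-core/rnaseq style `sample,fastq_1,fastq_2,strandedness`). The function name `check_samplesheet` is hypothetical. Adjust the column if your file differs.

```shell
# Hypothetical helper: confirm that every FASTQ path in a samplesheet exists.
# Assumes a header row and the FASTQ path in column 2; relative paths are
# resolved against the current directory.
check_samplesheet() {
    local sheet="$1"
    local status=0
    local sample fastq rest _header
    {
        read -r _header || true               # discard the header row
        while IFS=, read -r sample fastq rest || [ -n "$sample" ]; do
            if [ -z "$sample" ]; then
                continue                      # skip blank lines
            fi
            if [ ! -e "$fastq" ]; then
                echo "missing FASTQ for $sample: $fastq"
                status=1
            fi
        done
    } < "$sheet"
    return "$status"
}
```

Run it from the exercise directory, for example `check_samplesheet samplesheet.csv && echo "samplesheet OK"`.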
Then run
./run_rnaseq.sh
⏱ Runtime: ~5–10 minutes (small training dataset)
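Two things commonly go wrong at this step: the generated script lacks execute permission, or its output scrolls past before it can be read. The following sketch handles both, assuming the script is named `run_rnaseq.sh` as requested in 2.3. The function name `run_logged` and the log file name are hypothetical.

```shell
# Hypothetical helper: make a script executable, run it, and keep a log.
run_logged() {
    local script="$1"
    local log="${2:-run.log}"
    chmod u+x "$script"
    case "$script" in
        */*) ;;                       # already has a path component
        *)   script="./$script" ;;    # bare name: run from the current directory
    esac
    # Capture stdout and stderr in the log; the function's exit status is
    # the exit status of the script.
    "$script" > "$log" 2>&1
}
```

A typical call would be `run_logged run_rnaseq.sh run_rnaseq.log`, with `tail -f run_rnaseq.log` in a second terminal to follow progress.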
Optional (skip computation and copy the pre-made results):
cp -r /shared_data/RNAseq/exercise1_results /workdir/$USER/exercise1/results
Pre-made results with Genomics Facility protocol: /shared_data/RNAseq/exercise1_results
2.5 Verify results
Resume Codex:
cd /workdir/$USER/exercise1
codex resume --last
Ask:
Can you check the output in the results directory?
Other useful prompts:
What files should I check?
What should I do next?
You can download HTML reports using FileZilla and view them locally.
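To find out which HTML reports exist before opening FileZilla, a short `find` call is enough. This sketch assumes the output directory is `results`, as requested in 2.3. The function name `list_html_reports` is hypothetical.

```shell
# Hypothetical helper: list HTML reports under a results directory, sorted by path.
list_html_reports() {
    local dir="${1:-results}"
    find "$dir" -type f -name '*.html' | sort
}
```

For example, `list_html_reports /workdir/$USER/exercise1/results` prints one report path per line.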
2.6 Downstream data analysis
Example tasks:
Identify differentially expressed genes.
Make a PCA plot of the samples, with mu in blue and wt in red. Use triangles for mu and circles for wt.
If Codex creates but does not run a script:
Run this script for me.
Plots are saved as .png files. You can:
Download via FileZilla
View in VS Code (recommended)
2.7 Functional over-representation analysis (ORA)
ORA requires a GO annotation file. For most model organisms, GO annotation files are available online.
If the GO annotation file is not available, generate it in two steps:
Ask the agent to create a protein fasta file:
Create a protein sequence fasta file using the genome fasta and the gtf file
Use BLAST2GO on BioHPC to generate the GO annotation: https://biohpc.cornell.edu/lab/userguide.aspx?a=software&i=73#c
For this workshop, use the pre-made GO annotation file: R64.go.txt
Run functional over-representation analysis using topGO with R64.go.txt. Use R version 4.4.3.
Alternative:
Use clusterProfiler for ORA.
Adjust thresholds:
Redo topGO using genes with log2 fold change > 2.
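To see which genes such a threshold would select, the differential expression table can be filtered directly in the shell. This is only a sketch under stated assumptions: the table is tab-delimited with a header row, and the fold-change column is named `log2FoldChange` (the DESeq2 convention, which the script Codex writes may not follow). The function name `filter_by_lfc` is hypothetical. It applies the threshold exactly as worded in the prompt (strictly greater than 2, so up-regulated genes only) and does not filter on adjusted p-value.

```shell
# Hypothetical helper: keep the header plus rows whose log2FoldChange exceeds a threshold.
filter_by_lfc() {
    local table="$1"
    local threshold="${2:-2}"
    awk -F '\t' -v thr="$threshold" '
        NR == 1 {
            # Locate the log2FoldChange column by name (assumed column name)
            for (i = 1; i <= NF; i++) if ($i == "log2FoldChange") col = i
            if (!col) { print "no log2FoldChange column" > "/dev/stderr"; exit 1 }
            print
            next
        }
        # Skip missing values (NA); keep rows strictly above the threshold
        $col != "NA" && $col + 0 > thr + 0
    ' "$table"
}
```

With a hypothetical file name, `filter_by_lfc de_results.tsv 2 | cut -f1` lists the header field followed by the selected gene identifiers, assuming they are in the first column.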
Documentation is essential in research.
Example prompts:
Organize all generated scripts into a scripts directory.
Add a README file describing each script.
Write a project summary including software versions for manuscript use.
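Software versions can also be recorded by hand, which gives you something to compare against the summary the agent writes. The sketch below assumes each tool accepts a `--version` flag, which is not true of every program, so treat a strange first line as a cue to check that tool manually. The function name `record_versions` and the output file name are hypothetical.

```shell
# Hypothetical helper: record the first line of `--version` output for each tool.
record_versions() {
    local out="$1"
    shift
    local tool
    : > "$out"                                   # start with an empty file
    for tool in "$@"; do
        if command -v "$tool" > /dev/null 2>&1; then
            # Tab-separated: tool name, then the first line it prints
            printf '%s\t%s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)" >> "$out"
        else
            printf '%s\tnot found\n' "$tool" >> "$out"
        fi
    done
}
```

An illustrative call is `record_versions versions.txt R python3`. Replace the tool list with whatever your pipeline actually used.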
VS Code improves visualization and workflow.
Setup instructions: https://biohpc.cornell.edu/doc/setup_codex_account.html
VS Code layout:
File explorer (left): Open files and images
AI Agent (right): Manage Codex sessions
Editor (center): View/edit files
Terminal (bottom): Run commands.
Running Codex in VS Code
Option 1: Terminal
codex
Option 2: Agent panel (GUI)
Key concepts:
Switching agents:
At the top of the panel, you can select which Agent to use for your project:
Codex: OpenAI agent
Claude: Anthropic agent
Chat: Microsoft Copilot agent
Project directory = folder opened in VS Code
Open with: Ctrl + Shift + P → Open Folder
Session management:
Resume previous session (default)
Start new session via “New Chat” button (upper right corner)
Make a PCA plot of the samples, wt in blue and mu in red.
The .png file will appear in the file explorer; double-click it to view.