Build Script Manual
This document gives instructions on how to create your own build scripts for CondaTainer.
Naming Conventions
The file path must follow the naming convention below to be recognized by CondaTainer:
build_scripts/<name_conversion> (should not include the .sh suffix)
Where <name_conversion> is defined as:
OS
Apptainer definition files for distro-level system tools.
Format:
<distro>/<name>
Structure:
distro: The base OS distribution (e.g., ubuntu24).
name: The tool name (e.g., igv, r4.4.3).
Example:
ubuntu24/igv
Apps
Apps not available as conda packages, or specific versions not in conda.
Single version per file:
Format:
<name>/<version>
Example:
cellranger/9.0.1
PL template (multiple versions, one file):
Format:
<name> (a single file with #PL: and #TARGET: headers)
Example:
cytoscape → expands to cytoscape/3.10.3, cytoscape/3.10.4, etc.
Use this when the install logic is identical across versions and only the download URL changes.
Data
Any data, including genome reference indexes.
Format:
<assembly|project>/<datatype>/<version>
Structure:
assembly/project: The genome assembly or project name (e.g., grch38).
datatype: The type of data (e.g., gtf-gencode).
version: The release or build version (e.g., 47).
Example:
grch38/gtf-gencode/47
Example
cellranger/9.0.1 (App)
grch38/cellranger/2024-A (Data)
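The naming rules above can be sketched in a few lines of bash. This is only an illustration: `script_path_for` and `split_data_name` are hypothetical helper names, not part of CondaTainer, and the validation shown is an assumption about what a reasonable check might look like.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not CondaTainer functions).

# Map a name such as "cellranger/9.0.1" to its expected script path.
script_path_for() {
    local name="$1"
    case "$name" in
        *.sh)
            echo "error: name must not include the .sh suffix: $name" >&2
            return 1 ;;
        ""|/*|*/)
            echo "error: invalid name: '$name'" >&2
            return 1 ;;
    esac
    printf 'build_scripts/%s\n' "$name"
}

# Split a data name into assembly, datatype, and version.
# Everything after the second slash is kept as the version part.
split_data_name() {
    local name="$1"
    local assembly="${name%%/*}"
    local rest="${name#*/}"
    local datatype="${rest%%/*}"
    local version="${rest#*/}"
    printf '%s %s %s\n' "$assembly" "$datatype" "$version"
}

script_path_for "cellranger/9.0.1"
split_data_name "grch38/gtf-gencode/47"
```

Keeping the whole tail as the version part lets longer data names, such as one that embeds an app version, pass through unchanged.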
Available Variables
| Variable | Description |
|---|---|
| | Number of CPUs (from script directives, or |
| | Memory per task in MB (from script directives, or |
| | Memory per task in GB (integer) |
| | app: name; data/ref: assembly/datatype |
| | apps: app version; ref: data version |
| | Target installation directory (managed by CondaTainer) |
| | Temporary working directory (managed by CondaTainer) |
tmp_dir:
For OS .def: the intermediate .sif will be created in the CondaTainer tmp directory, or next to the target dir.
For app module: use scheduler tmp / $TMP / /tmp.
For data module: use the CondaTainer tmp directory.
For external script: determined by the #TYPE: tag; if app (default), use tmp; if data, use the target dir.
If CNT_TMPDIR is set, it overrides all tmp behaviors.
| Function | Description |
|---|---|
| | Print message to stderr with current time |
| | Decompress gz file using pigz or gunzip |
| | Extract tar.gz using pigz if available |
| | Decompress gz file and pipe to stdout using pigz or gunzip |
Headers
Headers are special comments at the beginning of build scripts that provide metadata and instructions for CondaTainer.
Example Header: star/2.7.11b/gencode47-101
#!/usr/bin/bash
#DEP:grch38/genome/gencode
#DEP:grch38/gtf-gencode/22
#DEP:star/2.7.11b
#WHATIS:STAR grch38 GENCODE22 index for read length 101
#URL:https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf
#ENV:STAR_INDEX_DIR=$app_root
#ENVNOTE:STAR index for grch38 GENCODE v22 with read length 101
#SBATCH --cpus-per-task=16
#SBATCH --mem=42G
#SBATCH --time=2:00:00
#SBATCH --job-name=star-index
#SBATCH --output=%x-%j.log
install() {
...
}
WhatIs and URL
#WHATIS: and #URL: lines are used to replace the modulefile's {WHATIS} and {HELP} placeholders.
They are used by ModGen when generating modulefiles, and displayed by CondaTainer during interactive template resolution (condatainer create).
Set Dependencies
#DEP: lines specify dependencies that must be installed before building the current overlay.
When CondaTainer processes the build script, it will ensure that all specified dependencies are available and load them in the same order as listed.
Basic (exact version):
#DEP:samtools/1.21
Requires exactly samtools/1.21. If not installed, builds it.
With version constraint:
#DEP:samtools/1.22.1>=1.10
Preferred version: 1.22.1, built if no compatible version is installed.
Minimum version (>=1.10): any installed version in the range [1.10, 1.22.1] is accepted.
Upper bound: the preferred version acts as an implicit upper bound. A version higher than 1.22.1 (e.g., 2.0) will not be used and the preferred version will be built instead.
> is also supported for a strict lower bound:
#DEP:samtools/1.22.1>1.10
Version format: partial versions are accepted, such as 1, 1.10, or 1.22.1. Missing components are treated as 0 (so 1.10 matches 1.10.0, 1.10.3, etc.).
When multiple installed versions satisfy the constraint, the latest one is used.
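The selection rules above can be sketched as follows. This is an illustrative approximation rather than CondaTainer's actual resolver: the function names `version_cmp`, `satisfies`, and `pick_version` are hypothetical, and the sketch assumes purely numeric dotted versions (a suffix such as the "b" in 2.7.11b is not handled).

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not CondaTainer functions).

# Compare two dotted numeric versions; missing components count as 0.
# Prints -1, 0, or 1.
version_cmp() {
    local IFS=.
    local -a a=($1) b=($2)
    local i n=$(( ${#a[@]} > ${#b[@]} ? ${#a[@]} : ${#b[@]} ))
    for (( i = 0; i < n; i++ )); do
        local x=$(( 10#${a[i]:-0} )) y=$(( 10#${b[i]:-0} ))
        if (( x < y )); then printf '%s\n' -1; return 0; fi
        if (( x > y )); then printf '%s\n' 1; return 0; fi
    done
    printf '%s\n' 0
}

# Succeed if <installed> satisfies "<preferred><op><min>" where op is >= or >.
satisfies() {
    local installed="$1" preferred="$2" op="$3" min="$4"
    # Implicit upper bound: nothing above the preferred version is accepted.
    [[ $(version_cmp "$installed" "$preferred") -le 0 ]] || return 1
    case "$op" in
        ">=") [[ $(version_cmp "$installed" "$min") -ge 0 ]] ;;
        ">")  [[ $(version_cmp "$installed" "$min") -gt 0 ]] ;;
        *)    return 2 ;;
    esac
}

# Print the latest installed version that satisfies the constraint,
# or the preferred version (which would then be built) if none does.
pick_version() {
    local preferred="$1" op="$2" min="$3" best="" v
    shift 3
    for v in "$@"; do
        satisfies "$v" "$preferred" "$op" "$min" || continue
        if [[ -z $best || $(version_cmp "$v" "$best") -gt 0 ]]; then
            best="$v"
        fi
    done
    printf '%s\n' "${best:-$preferred}"
}

# Constraint "#DEP:samtools/1.22.1>=1.10" with four installed versions.
pick_version 1.22.1 ">=" 1.10 1.9 1.10.3 1.21 2.0
```

In the call at the end, 1.9 falls below the minimum and 2.0 exceeds the implicit upper bound, so the choice is made between 1.10.3 and 1.21.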
Scheduler Parameters
Scheduler directive lines (#SBATCH, #PBS, or #BSUB) allow you to specify job parameters for the build process. HTCondor uses native .sub submit files instead of in-script directives.
If a supported scheduler is available, CondaTainer will submit the build job with the specified parameters.
Slurm Example:
#SBATCH --cpus-per-task=16
#SBATCH --mem=42G
#SBATCH --time=2:00:00
#SBATCH --job-name=star-index
#SBATCH --output=%x-%j.log
--cpus-per-task, --mem, and --time should be set according to the expected resource requirements.
--nodes, --ntasks: must not be set (always single-task). A writable overlay (.img) can only be mounted by one process at a time.
--output: will always be overwritten to point to the logs directory.
PBS Example:
#PBS -l select=1:ncpus=16:mem=42gb
#PBS -l walltime=2:00:00
#PBS -N star-index
LSF Example:
#BSUB -n 16
#BSUB -M 43008
#BSUB -W 2:00
#BSUB -J star-index
For LSF, CondaTainer will add -R "span[hosts=1]" to ensure all CPUs are allocated on the same node.
Note: Certain Slurm flags are not supported and will cause the build to fail:
--topology-plugin, --switches, --gpus-per-socket, --sockets-per-node, --cores-per-socket, --threads-per-core, --ntasks-per-socket, --ntasks-per-core, --distribution. Remove these from your build script.
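Before submitting, it can be handy to scan a build script for the flags listed above. The following is a minimal sketch under stated assumptions: `check_sbatch_flags` is a hypothetical name, only the long option forms are matched (short forms such as -N or -n are not detected), and --nodes and --ntasks are included because the notes above say they must not be set.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check (not a CondaTainer function).
# Returns non-zero if the script contains a directive the manual disallows.
check_sbatch_flags() {
    local script="$1" flag status=0
    local -a banned=(
        --topology-plugin --switches --gpus-per-socket --sockets-per-node
        --cores-per-socket --threads-per-core --ntasks-per-socket
        --ntasks-per-core --distribution --nodes --ntasks
    )
    for flag in "${banned[@]}"; do
        # The flag must be followed by "=", whitespace, or end of line,
        # so "--ntasks" does not match "--ntasks-per-node".
        if grep -Eq "^#SBATCH[[:space:]].*${flag}(=|[[:space:]]|\$)" "$script"; then
            echo "unsupported directive in $script: $flag" >&2
            status=1
        fi
    done
    return "$status"
}
```

A typical call would be `check_sbatch_flags path/to/script || exit 1`, where the path is whatever build script is about to be submitted.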
Type Tag
#TYPE: controls temporary build path behavior for external .sh/.bash builds.
#TYPE:app (default): build in scratch tmp (utils.GetTmpDir(), affected by CNT_TMPDIR).
#TYPE:data: build alongside the target prefix directory.
If CNT_TMPDIR is set, it overrides both #TYPE:app and #TYPE:data behavior and forces scratch tmp resolution under CNT_TMPDIR/cnt-$USER.
Accepted aliases (case-insensitive):
App aliases: app, env, tool, conda, small
Data aliases: data, ref, large
Examples:
#TYPE:app
#TYPE:data
#TYPE:ref
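As an illustration of the alias rules, the sketch below normalizes a #TYPE: value to either app or data. `normalize_type` is a hypothetical name, and treating an empty value as app is an assumption based on app being described as the default.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a CondaTainer function).
# Accepts either a bare value ("ref") or a full header line ("#TYPE:ref").
normalize_type() {
    local raw="${1#\#TYPE:}" value
    # Lowercase and drop whitespace so matching is case-insensitive.
    value="$(printf '%s' "$raw" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')"
    case "$value" in
        ""|app|env|tool|conda|small) echo app ;;
        data|ref|large)              echo data ;;
        *)
            echo "unknown #TYPE: value: $raw" >&2
            return 1 ;;
    esac
}

normalize_type "#TYPE:REF"
normalize_type "conda"
```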
Auto-Update Tag
#AUTOUPDATE: opts a script into automatic version maintenance. A CI workflow runs twice every month, fetches the latest versions from the specified source, and rewrites the version list in place.
#AUTOUPDATE:{key}:{source}:{identifier}[>={min}][<{max}|<={max}]
{key} must match an existing #PL:, #DEP:, or (in helpers) #VALUE: header in the same file. The target type is detected automatically:
| Header matched | Behavior |
|---|---|
| | Rewrites the full version list (all versions ≥ min) |
| | Rewrites the pinned version to latest only; preserves |
Supported sources:
| Source | Format | Example |
|---|---|---|
| | | |
| | | |
| | | |
| | | |
The docker source requires a full capture-group regex as the tag pattern; the first capture group is extracted as the version string.
Examples:
# PL template: full version list, all 3.9+
#PL:cytoscape_version:3.9.0,3.9.1,3.10.0,3.10.3,3.10.4
#AUTOUPDATE:cytoscape_version:github:cytoscape/cytoscape>=3.9.0
#TARGET:cytoscape/{cytoscape_version}
# DEP pin: latest samtools; min constraint preserved from the #DEP: line
#DEP:samtools/1.23.1>=1.10
#AUTOUPDATE:samtools:bioconda:samtools
# DEP pin with upper bound: stay on openjdk 17.x, never upgrade to 18+
#DEP:openjdk/17.0.12>=17
#AUTOUPDATE:openjdk:bioconda:openjdk>=17<18
Place #AUTOUPDATE: immediately after the #PL: or #DEP: line it manages.
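To make the #AUTOUPDATE: syntax concrete, here is a rough sketch that splits a line into its fields with a bash regular expression. `parse_autoupdate` is a hypothetical name and this is not how the CI workflow is known to parse the tag. The sketch assumes the identifier contains no `<` or `>` characters, which may not hold for docker regex patterns.

```bash
#!/usr/bin/env bash
# Hypothetical parser (not part of CondaTainer).
# Prints key, source, identifier, min, max_op, and max, one per line.
parse_autoupdate() {
    local line="$1"
    # key : source : identifier [>=min] [<max | <=max]
    local re='^#AUTOUPDATE:([^:]+):([^:]+):([^<>]+)(>=([^<>=]+))?((<=?)([^<>=]+))?$'
    [[ $line =~ $re ]] || return 1
    printf 'key=%s\nsource=%s\nidentifier=%s\nmin=%s\nmax_op=%s\nmax=%s\n' \
        "${BASH_REMATCH[1]:-}" "${BASH_REMATCH[2]:-}" "${BASH_REMATCH[3]:-}" \
        "${BASH_REMATCH[5]:-}" "${BASH_REMATCH[7]:-}" "${BASH_REMATCH[8]:-}"
}

parse_autoupdate '#AUTOUPDATE:openjdk:bioconda:openjdk>=17<18'
```

Optional parts that are absent come back as empty values, so a line without bounds still parses.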
Environment Variables
#ENV: lines define environment variables to be set when the overlay is loaded.
#ENVNOTE: lines provide descriptions for the environment variables, which will be included in the modulefile help text and the CondaTainer .env file.
#ENVNOTE: must directly follow its corresponding #ENV: line.
Only one #ENVNOTE: line is allowed per #ENV: variable.
$app_root is a special placeholder that will be replaced with the actual installation path of the overlay when loaded.
Example:
#ENV:CELLRANGER_REF_DIR=$app_root
#ENVNOTE:cellranger reference dir
#ENV:GENOME_FASTA=$app_root/fasta/genome.fa
#ENVNOTE:genome fasta
#ENV:ANNOTATION_GTF_GZ=$app_root/genes/genes.gtf.gz
#ENVNOTE:10X modified gtf
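The placeholder substitution can be pictured with a small sketch. `render_env` is a hypothetical name and the output format is an assumption chosen for readability; it is not the format CondaTainer writes to its .env file. The replacement is purely textual and nothing in the header is evaluated.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a CondaTainer function).
# Prints NAME=value for each #ENV: line with $app_root replaced by <root>,
# followed by "# note" when an #ENVNOTE: line directly follows it.
render_env() {
    local script="$1" root="$2" line prev="" assignment
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == "#ENV:"* ]]; then
            assignment="${line#\#ENV:}"
            printf '%s\n' "${assignment//\$app_root/$root}"
        elif [[ $line == "#ENVNOTE:"* && $prev == "#ENV:"* ]]; then
            printf '# %s\n' "${line#\#ENVNOTE:}"
        fi
        prev="$line"
    done < "$script"
}
```

An #ENVNOTE: line that does not directly follow an #ENV: line is skipped here, mirroring the placement rule described above.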
ENV Naming Guidelines
For common data: genome fasta, gtf, etc., use standard variable names like GENOME_FASTA, ANNOTATION_GTF_GZ.
If the file is compressed, add the _GZ suffix, e.g. ANNOTATION_GTF for uncompressed gtf, ANNOTATION_GTF_GZ for gzipped gtf.
For tool-specific references, use the tool name as a prefix.
If the index is a directory, use the _DIR suffix.
If the index is a file prefix, use the _PREFIX suffix.
If the index is a specific file, use an appropriate suffix based on file type.
Examples:
CELLRANGER_REF_DIR for Cellranger references.
STAR_INDEX_DIR for STAR indices.
BOWTIE2_PREFIX for Bowtie2 indices.
BWA_MEM2_FASTA for BWA-MEM2 genome fasta with bwa-mem2 indices.
Interactive Tag
The #INTERACTIVE:<Prompt> tag indicates that the build script requires input from the user during execution.
It is common for apps that need license agreement acceptance or custom configuration.
When CondaTainer encounters this tag, it will prompt the user with the specified <Prompt> message before building. It will take the user input and pass it to the build script during execution.
You can use \n to add new lines in the prompt message.
Example:
#!/usr/bin/bash
#WHATIS:10X Genomics Single Cell Software Suite
#URL:https://www.10xgenomics.com/support/software/cell-ranger/downloads/previous-versions
#INTERACTIVE:⚠️ 10X links only valid for one day. Please go to the link below and get tar.gz link.\nhttps://www.10xgenomics.com/support/software/cell-ranger/downloads/previous-versions
Example: cellranger/9.0.1
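To preview how a prompt with \n sequences would read, something like the following could be used. `show_interactive_prompt` is a hypothetical name, and this only approximates the rendering; how CondaTainer itself displays the prompt and passes the answer to the script is not shown here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a CondaTainer function).
# Prints the prompt from the first #INTERACTIVE: line of a script,
# turning each literal backslash-n pair into a real newline.
show_interactive_prompt() {
    local script="$1" line nl=$'\n'
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == "#INTERACTIVE:"* ]]; then
            line=${line#\#INTERACTIVE:}
            line=${line//\\n/$nl}
            printf '%s\n' "$line"
            return 0
        fi
    done < "$script"
    return 1
}
```

The function returns non-zero when the script has no #INTERACTIVE: line, which makes it easy to tell interactive scripts apart from the rest.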
Apps
Do not try to manually download apps that are already available via conda-forge or bioconda.
Also, I don't recommend compiling apps from source unless absolutely necessary.
HPC systems often lack required build tools or dependencies unless you load specific modules.
To maximize compatibility (CondaTainer), it's better to rely on pre-compiled packages.
Template: build-template-apps
Tips
You can use tar_xf_pigz and pigz_or_gunzip functions to speed up decompression of large files if pigz is available on your system.
If the app requires specific environment variables to function properly, make sure to add them using #ENV: and #ENVNOTE: tags. e.g. orad/2.7.0
Examples
Data
Data often require downloading large files from external sources.
Indices may need to be built using specific versions of software.
If indices are version dependent, ensure the app version is included in the name. e.g. grch38/star/2.7.11b/gencode47-101
If indices require building, ensure you have the scheduler parameters (#SBATCH, #PBS, or #BSUB) set appropriately to allocate sufficient resources.
Always add environment variables using #ENV: and #ENVNOTE: to help users locate the reference data.
Template: build-template-ref
Tips
You can use tar_xf_pigz and pigz_or_gunzip functions to speed up decompression of large files if pigz is available on your system.
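For readers curious about what a pigz-with-fallback helper might look like, here is a minimal sketch. It is not the source of the built-in functions: `gunzip_to_stdout` is a hypothetical name, and the optional thread-count argument is an assumption added for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not CondaTainer's implementation).
# Decompress a .gz file to stdout, using pigz when it is on PATH.
gunzip_to_stdout() {
    local file="$1" threads="${2:-1}"
    if command -v pigz >/dev/null 2>&1; then
        pigz -dc -p "$threads" "$file"
    else
        gunzip -c "$file"
    fi
}
```

Because the output goes to stdout, it can be piped straight into another tool, for example `gunzip_to_stdout genes.gtf.gz 4 | head`, where genes.gtf.gz stands in for any gzipped file.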