Processing DAGs

These DAGs run after the IFS download DAGs and before the model run. They verify that all three IFS files (analysis, analysis-00, forecast) are present and then invoke the IFS-to-NEMO pre-processing script that converts ECMWF fields into the format expected by the ocean model.


ifs_process (Lucia)

Property   Value
Schedule   15 08 * * * (08:15 UTC)
Retries    80 x 10 min (default), 3 x - for prepare_ifs only
File       LUCIA/process_ifs.py

check_ifs_an   ─┐
check_ifs_an00 ─┼─>> prepare_ifs
check_ifs_fc   ─┤
check_free     ─┘

All four check tasks run in parallel and must all succeed before prepare_ifs starts.

check_ifs_an / check_ifs_an00 / check_ifs_fc are SSHOperators that verify the expected IFS files exist on Lucia's GPFS filesystem:

# check_ifs_an
ssh frontal "test -f /gpfs/.../data/IFS/Analysis/\
  {{ get_date(-1,0) }}-ECMWF---AM0100-MEDATL-b{{ get_date(0,0) }}_an-fv13.00.nc"

# check_ifs_fc
ssh frontal "test -f /gpfs/.../data/IFS/Forecast/\
  {{ get_date(0,0) }}_{{ get_date(10,0) }}-ECMWF---AM0100-MEDATL-b{{ get_date(0,0) }}_fc00-fv13.00.nc"

test -f exits with code 0 if the path exists and is a regular file, and with code 1 otherwise. Airflow treats a non-zero exit code as a task failure, which triggers a retry.
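The same check can be mirrored in Python when debugging a missing file by hand. This is a minimal sketch, not the DAG's code: analysis_filename and check_file are hypothetical helpers, and it assumes that get_date(-1,0) is the day before the run date, that get_date(0,0) is the run date, and that both are formatted as YYYYMMDD. None of these assumptions is confirmed by the source.

```python
from datetime import date, timedelta
from pathlib import Path


def analysis_filename(run_date: date) -> str:
    """Build the analysis file name for a run date.

    Assumption: get_date(-1,0) is the previous day and get_date(0,0)
    is the run date, both rendered as YYYYMMDD.
    """
    previous_day = (run_date - timedelta(days=1)).strftime("%Y%m%d")
    bulletin = run_date.strftime("%Y%m%d")
    return f"{previous_day}-ECMWF---AM0100-MEDATL-b{bulletin}_an-fv13.00.nc"


def check_file(path: Path) -> int:
    """Return 0 if path is an existing regular file, else 1 (like test -f)."""
    return 0 if path.is_file() else 1


if __name__ == "__main__":
    name = analysis_filename(date(2025, 1, 15))
    print(name)
    # The directory below is a placeholder, so this prints 1 (file missing).
    print(check_file(Path("/nonexistent-ifs-dir") / name))
```

A return value of 1 corresponds to the case in which the Airflow task fails and is retried.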

check_free verifies that no previous V2025_an Slurm jobs are already running on Lucia, preventing the IFS preparation from starting while a leftover job is consuming cluster resources:

ssh frontal "count=\$(squeue -u lvdbk | grep V2025_an | wc -l); echo Jobs found: \$count; exit \$count"

If count > 0 the exit code is non-zero, triggering a retry. This is an important guard: if the previous day's run got stuck in the queue, the new day's IFS preparation would otherwise start and overwrite the namelist files that the stuck job is using.
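The counting logic is easy to check offline. The sketch below re-implements it in Python under stated assumptions: the squeue text is invented sample output, and count_matching_jobs and shell_exit_status are hypothetical helper names. It also shows one edge case of exit $count: bash keeps only the low 8 bits of the status, so a count of exactly 256 would be reported as 0.

```python
def count_matching_jobs(squeue_output: str, pattern: str = "V2025_an") -> int:
    """Count lines containing pattern, like `grep V2025_an | wc -l`."""
    return sum(1 for line in squeue_output.splitlines() if pattern in line)


def shell_exit_status(count: int) -> int:
    """Status that bash reports for `exit <count>` (low 8 bits only)."""
    return count % 256


# Invented sample output for illustration; real squeue columns may differ.
SAMPLE = """\
JOBID PARTITION     NAME     USER ST  TIME NODES
  101     batch V2025_an    lvdbk  R  1:02     4
  102     batch other_jb    lvdbk PD  0:00     1
  103     batch V2025_an    lvdbk PD  0:00     4
"""

if __name__ == "__main__":
    count = count_matching_jobs(SAMPLE)
    print(f"Jobs found: {count}")
    print(f"exit status: {shell_exit_status(count)}")
```

A guard that exits with 1 whenever the count is above zero would avoid the wrap-around, although with realistic queue sizes the difference is unlikely to matter.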

prepare_ifs runs the conversion script on Lucia:

ssh frontal "nohup /gpfs/.../nemo4.2.0/cfgs/BSFS_BIO/NRT_V2025/Ecmwf/prepare_ifs_for_nemo.sh"

This script reads the three IFS NetCDF files and produces the ecmwf_{date}.nc files that NEMO expects. The model_lucia_run DAG later checks for the output file ecmwf_{{ get_date(10,2) }}.nc before it submits the Slurm job.


ifs_process_local (LOCAL)

Property   Value
Schedule   15 08 * * * (08:15 UTC)
Retries    80 x 10 min
File       LOCAL/process_ifs.py

check_ifs_an   ─┐
check_ifs_an00 ─┼─>> prepare_ifs
check_ifs_fc   ─┘

This DAG uses the same three-file check pattern as the Lucia version, but the checks run locally inside the Docker worker container as BashOperator tasks:

# check_ifs_an
test -f /opt/airflow/marines_data/data/IFS/Analysis/\
  {{ get_date(-1,0) }}-ECMWF---AM0100-MEDATL-b{{ get_date(0,0) }}_an-fv13.00.nc

Files are in the locally mounted /opt/airflow/marines_data/data/IFS/ directory, populated by the LOCAL IFS download DAGs.

prepare_ifs runs the same conversion script but from inside the Docker container:

'/opt/airflow/marines_data/data/IFS/prepare_ifs_for_nemo.sh'

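Note the quotes around the path. BashOperator lists .sh and .bash as template file extensions, so Airflow treats a bash_command string that ends in .sh as the path of a Jinja2 template file to load, and the task fails with a TemplateNotFound error when no such template is found. Ending the string with a quote or a trailing space avoids this. The output feeds the GCP pipeline's forcing preparation step.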
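Airflow decides whether bash_command names a template file by looking at how the string ends, so the effect of the quoting can be seen without running Airflow. The sketch below copies only that suffix test: TEMPLATE_EXT mirrors the template_ext value of BashOperator in Airflow 2.x, while loads_template_file is a hypothetical helper and not an Airflow function.

```python
# Mirrors BashOperator.template_ext in Airflow 2.x (assumed unchanged here).
TEMPLATE_EXT = (".sh", ".bash")


def loads_template_file(bash_command: str) -> bool:
    """True if Airflow would try to load bash_command as a template file."""
    return bash_command.endswith(TEMPLATE_EXT)


SCRIPT = "/opt/airflow/marines_data/data/IFS/prepare_ifs_for_nemo.sh"

if __name__ == "__main__":
    # Bare path, quoted path, and path with a trailing space.
    for command in (SCRIPT, f"'{SCRIPT}'", SCRIPT + " "):
        print(repr(command), loads_template_file(command))
```

Only the bare path ends in .sh. The quoted form and the trailing-space form both end in a different character, so Airflow passes them to bash as a command instead of looking for a template file.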

No check_free in the LOCAL version

The local version does not check for a running Slurm job because it operates on the GCP pipeline, not on Lucia's Slurm cluster. The GCP job is managed differently.