Deployment and Configuration


Host layout

Everything lives under /home/airflow on the marines server host:

/home/airflow/
├── Dockerfile              # Custom image definition
├── docker-compose.yml      # Stack definition
├── .env                    # Runtime secrets
├── dags/                   # DAG Python files (mounted into containers)
│   ├── LUCIA/              # DAGs that operate on Lucia HPC via SSH
│   ├── LOCAL/              # DAGs that run locally in Docker
│   └── GoogleCloud/        # DAGs that use GCP
├── plugins/                # Custom Airflow plugins (currently empty)
└── logs/                   # Task logs (purged after 30 days)

Volume mounts

The x-airflow-common anchor in docker-compose.yml mounts these directories into every Airflow container:

| Host path | Container path | Purpose |
|---|---|---|
| ./dags | /opt/airflow/dags | DAG files; Airflow scans this directory continuously |
| ./logs | /opt/airflow/logs | Task execution logs |
| ./config | /opt/airflow/config | Airflow config overrides |
| ./plugins | /opt/airflow/plugins | Custom operators and hooks |
| /home | /opt/airflow/users_home/ | Host user home directories (for SSH keys used by shell scripts) |
| /mnt/md0 | /opt/airflow/marines_data | Model data disk: inputs, outputs, forcings |

The data disk (/mnt/md0)

/mnt/md0 is the primary data disk on the marines server. Inside containers it appears as /opt/airflow/marines_data. All DAG paths that reference model data use the container path. The main subdirectories:

/opt/airflow/marines_data/
├── data/
│   ├── IFS/
│   │   ├── Analysis/       # ECMWF IFS analysis files (downloaded locally)
│   │   └── Forecast/       # ECMWF IFS forecast files (downloaded locally)
│   ├── CAMS/               # CAMS atmospheric deposition files
│   └── RIVERS/
│       ├── EFAS/NRT/       # EFAS river discharge data
│       └── NIHWM/          # NIHWM river data
├── backupPU/               # Working directory for the GCP backup model run
│   └── forcing/            # Forcing files prepared for the GCP model
├── NRT_V2025/
│   └── out/                # Model output from Lucia (transferred by postprocess DAG)
└── bin/                    # Utilities (REBUILD tool, etc.)
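
Because DAG code must use container paths while operators on the host think in host paths, a small translation helper can reduce mistakes. The following is a minimal sketch, not code taken from the DAGs: the function name to_container_path and the MOUNTS mapping are hypothetical, and only the two absolute mounts from the volume table (/mnt/md0 and /home) are included.

```python
from pathlib import PurePosixPath

# Host-to-container mounts copied from the volume table above.
# The relative mounts (./dags, ./logs, ...) are left out of this sketch.
MOUNTS = {
    PurePosixPath("/mnt/md0"): PurePosixPath("/opt/airflow/marines_data"),
    PurePosixPath("/home"): PurePosixPath("/opt/airflow/users_home"),
}


def to_container_path(host_path: str) -> str:
    """Translate an absolute host path to the path seen inside a container."""
    path = PurePosixPath(host_path)
    for host_root, container_root in MOUNTS.items():
        try:
            relative = path.relative_to(host_root)
        except ValueError:
            continue  # not under this mount, try the next one
        return str(container_root / relative)
    raise ValueError(f"{host_path} is not under a mounted directory")


print(to_container_path("/mnt/md0/data/IFS/Forecast"))
print(to_container_path("/home/luc/.ssh/id_rsa"))
```

Paths outside the two mounts raise ValueError instead of being passed through, so a typo in a host path fails early.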

Host user home directories

The /home directory is mounted so that shell scripts and SSH operators can access SSH keys stored in user home directories on the host. For example, the download_MAR_local DAG uses:

ssh -i /opt/airflow/users_home/luc/.ssh/id_rsa lvdbulcke@139.165.67.19

Environment configuration

.env file

The .env file at /home/airflow/.env provides secrets that the compose file reads at startup. Do not commit this file to version control.

example.env
AIRFLOW_PROJ_DIR=.
AIRFLOW_UID=50000
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=<admin-ui-password>
FERNET_KEY="<base64-encoded-32-byte-key>"

| Variable | Purpose |
|---|---|
| AIRFLOW_PROJ_DIR | Base directory for dags/, logs/ etc., relative to the compose file. Set to . |
| AIRFLOW_UID | User ID used by airflow-init when creating directories. Ignored in practice because containers run as root. |
| _AIRFLOW_WWW_USER_USERNAME | Username for the Airflow web UI admin account |
| _AIRFLOW_WWW_USER_PASSWORD | Password for the web UI admin account |
| FERNET_KEY | 32-byte URL-safe base64 key used to encrypt Connection passwords stored in Postgres. If you lose this key, all stored Connection passwords become unreadable. |
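
If a new FERNET_KEY is ever needed (for example when setting up a fresh deployment), it can be generated with the Python standard library alone. This is a sketch under the assumption that a Fernet key is simply 32 random bytes in URL-safe base64, which is the same format that Fernet.generate_key() from the cryptography package returns. The helper names generate_fernet_key and is_valid_fernet_key are hypothetical.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return 32 random bytes encoded as URL-safe base64 (44 characters)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """Check that a key decodes to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(key.encode("ascii"))) == 32
    except (ValueError, UnicodeEncodeError):
        return False


if __name__ == "__main__":
    # Paste the printed value into .env as FERNET_KEY="...".
    print(generate_fernet_key())
```

Generating a new key does not re-encrypt existing Connections; passwords stored under the old key stay unreadable unless the old key is still available.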

Secrets: Airflow Variables and Connections

Sensitive credentials are never hardcoded in DAG files. They are stored in one of two places, both backed by the Postgres metadata database.

Airflow Variables

Key-value pairs used by DAG Python code at runtime:

from airflow.models import Variable

username = Variable.get("copernicus_username")
password = Variable.get("copernicus_password")

Variables currently in use:

| Variable name | Used by | Purpose |
|---|---|---|
| copernicus_username | count_files_MDS | Copernicus Marine Service username |
| copernicus_password | count_files_MDS | Copernicus Marine Service password |
| submitted_job | model_lucia_run writes; model_lucia_postprocess reads | Slurm job ID of the most recently submitted model run |
| last_processing_date | model_lucia_run writes | Logical date of the last model run |
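
The submitted_job row describes a handoff between two DAGs: one writes the Slurm job ID, the other reads it later. The sketch below illustrates that pattern only; the actual code in model_lucia_run and model_lucia_postprocess is not shown in this document. The helper names and the InMemoryStore class are hypothetical, and the store parameter exists so the helpers can be tried without an Airflow installation.

```python
def record_submitted_job(job_id, store=None):
    """Save the Slurm job ID so that a later DAG can find it."""
    if store is None:
        # Default to the real Airflow Variable backend.
        from airflow.models import Variable as store
    store.set("submitted_job", str(job_id))


def read_submitted_job(store=None):
    """Return the job ID written by record_submitted_job()."""
    if store is None:
        from airflow.models import Variable as store
    return store.get("submitted_job")


class InMemoryStore:
    """Hypothetical stand-in for Variable, for use outside Airflow."""

    _values = {}

    @classmethod
    def set(cls, key, value):
        cls._values[key] = value

    @classmethod
    def get(cls, key):
        return cls._values[key]


record_submitted_job(123456, store=InMemoryStore)
print(read_submitted_job(store=InMemoryStore))
```

Inside a DAG the store argument would be omitted, so both helpers go through Variable.set and Variable.get. The job ID is stored as a string, so the reader converts it if it needs a number.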

Variables are managed through the Airflow UI under Admin > Variables or with the CLI:

docker exec -it airflow-airflow-scheduler-1 airflow variables set copernicus_username "myuser"

Airflow Connections

Connection objects store host, port, login, and password for external systems. The password is encrypted with the Fernet key. Connections are accessed via a conn_id string:

from airflow.hooks.base import BaseHook

conn = BaseHook.get_connection("lucia_gateway_luc")
# conn.host, conn.login, conn.password are decrypted automatically
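
A common follow-up is to turn a Connection into the parameters an SSH or SFTP client needs. The sketch below shows one way to do that and is not taken from the DAGs: the function name ssh_params is hypothetical, the fallback to port 22 is an assumption for connections that leave the port empty, and the get_connection parameter is there only so the helper can be exercised without Airflow.

```python
def ssh_params(conn_id, get_connection=None):
    """Return host, port and username for a connection, leaving the password out."""
    if get_connection is None:
        # Default to the real Airflow lookup.
        from airflow.hooks.base import BaseHook
        get_connection = BaseHook.get_connection
    conn = get_connection(conn_id)
    return {
        "host": conn.host,
        "port": conn.port or 22,  # assumed default when no port is stored
        "username": conn.login,
    }
```

Keeping the password out of the returned dictionary means the result can be logged safely; code that needs the password reads conn.password directly at the point of use.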

Connections currently in use:

| Connection ID | Type | Used by | Purpose |
|---|---|---|---|
| lucia_gateway_luc | SSH | LUCIA DAGs, model_lucia_postprocess | SSH gateway into the Lucia HPC cluster |
| sftp_CMCC | SFTP | download_local_ifs_analysis, download_local_ifs_an00, download_local_ifs_fc | Primary SFTP server at CMCC for IFS files |
| sftp_CMCC_backup | SFTP | download_local_ifs_fc | Backup SFTP (AWS) for IFS forecast files |
| seamod | SFTP | download_local_ifs_analysis, download_local_ifs_an00, download_local_ifs_fc | Secondary server where downloaded IFS files are backed up |

Connections are managed under Admin > Connections in the UI.


Log retention

Logs are kept for 30 days (AIRFLOW__LOG_RETENTION_DAYS=30). The scheduler runs a log cleanup job automatically. Logs are written to ./logs/ on the host and are readable without entering a container:

ls /home/airflow/logs/dag_id=compress_results/
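
To check which log files have passed the 30-day window, for instance when verifying that cleanup is working, the host directory can be scanned directly. This is a read-only sketch and not the cleanup job itself: the function name logs_older_than is hypothetical, and it assumes task logs end in .log and that the file modification time is a fair proxy for age.

```python
import time
from pathlib import Path


def logs_older_than(log_root, days=30, now=None):
    """Return log files under log_root last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # seconds per day
    return sorted(
        p for p in Path(log_root).rglob("*.log")
        if p.is_file() and p.stat().st_mtime < cutoff
    )


if __name__ == "__main__":
    # Lists candidates only; nothing is deleted.
    for path in logs_older_than("/home/airflow/logs"):
        print(path)
```

An empty result on a deployment that has been running for more than 30 days suggests the cleanup is active; a long list suggests it is not.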