Deployment and Configuration¶
Host layout¶
Everything lives under /home/airflow on the marines server host:
/home/airflow/
├── Dockerfile # Custom image definition
├── docker-compose.yml # Stack definition
├── .env # Runtime secrets
├── dags/ # DAG Python files (mounted into containers)
│ ├── LUCIA/ # DAGs that operate on Lucia HPC via SSH
│ ├── LOCAL/ # DAGs that run locally in Docker
│ └── GoogleCloud/ # DAGs that use GCP
├── plugins/ # Custom Airflow plugins (currently empty)
├── logs/ # Task logs (purged every month)
Volume mounts¶
The x-airflow-common anchor in docker-compose.yml mounts these directories into every Airflow container:
| Host path | Container path | Purpose |
|---|---|---|
| ./dags | /opt/airflow/dags | DAG files; Airflow scans this directory continuously |
| ./logs | /opt/airflow/logs | Task execution logs |
| ./config | /opt/airflow/config | Airflow config overrides |
| ./plugins | /opt/airflow/plugins | Custom operators and hooks |
| /home | /opt/airflow/users_home/ | Host user home directories (for SSH keys used by shell scripts) |
| /mnt/md0 | /opt/airflow/marines_data | Model data disk: inputs, outputs, forcings |
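The mount table can also be read as a lookup from host paths to container paths. The following is a minimal sketch, assuming the relative host paths resolve against /home/airflow (where docker-compose.yml lives); `MOUNTS` and `to_container_path` are illustrative names and not part of the stack.

```python
from pathlib import PurePosixPath

# Mounts copied from the table above. Relative host paths (./dags, ./logs, ...)
# are assumed to resolve against the directory holding docker-compose.yml.
COMPOSE_DIR = PurePosixPath("/home/airflow")
MOUNTS = {
    COMPOSE_DIR / "dags": PurePosixPath("/opt/airflow/dags"),
    COMPOSE_DIR / "logs": PurePosixPath("/opt/airflow/logs"),
    COMPOSE_DIR / "config": PurePosixPath("/opt/airflow/config"),
    COMPOSE_DIR / "plugins": PurePosixPath("/opt/airflow/plugins"),
    PurePosixPath("/home"): PurePosixPath("/opt/airflow/users_home"),
    PurePosixPath("/mnt/md0"): PurePosixPath("/opt/airflow/marines_data"),
}


def to_container_path(host_path: str) -> PurePosixPath:
    """Translate a host path to the path seen inside an Airflow container."""
    path = PurePosixPath(host_path)
    # Try the longest host prefix first so /home/airflow/dags wins over /home.
    for host_root in sorted(MOUNTS, key=lambda p: len(p.parts), reverse=True):
        try:
            relative = path.relative_to(host_root)
        except ValueError:
            continue  # not under this mount, try the next one
        return MOUNTS[host_root] / relative
    raise ValueError(f"{host_path} is not under any mounted directory")
```

Because /home is mounted as a whole, a path such as /home/airflow/dags is reachable in two ways inside a container; the sketch prefers the more specific mount.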
The data disk (/mnt/md0)¶
/mnt/md0 is the primary data disk on the marines server. Inside containers it appears as /opt/airflow/marines_data. All DAG paths that reference model data use the container path. The main subdirectories:
/opt/airflow/marines_data/
├── data/
│   ├── IFS/
│   │   ├── Analysis/   # ECMWF IFS analysis files (downloaded locally)
│   │   └── Forecast/   # ECMWF IFS forecast files (downloaded locally)
│   ├── CAMS/           # CAMS atmospheric deposition files
│   └── RIVERS/
│       ├── EFAS/NRT/   # EFAS river discharge data
│       └── NIHWM/      # NIHWM river data
├── backupPU/           # Working directory for the GCP backup model run
│   └── forcing/        # Forcing files prepared for the GCP model
├── NRT_V2025/
│   └── out/            # Model output from Lucia (transferred by postprocess DAG)
└── bin/                # Utilities (REBUILD tool, etc.)
Host user home directories¶
The /home directory is mounted so that shell scripts and SSH operators can access SSH keys stored in user home directories on the host. For example, the download_MAR_local DAG uses:
Environment configuration¶
.env file¶
The .env file at /home/airflow/.env provides the settings and secrets that the compose file reads at startup. Do not commit this file to version control.
AIRFLOW_PROJ_DIR=.
AIRFLOW_UID=50000
_AIRFLOW_WWW_USER_USERNAME=airflow
_AIRFLOW_WWW_USER_PASSWORD=<admin-ui-password>
FERNET_KEY="<base64-encoded-32-byte-key>"
| Variable | Purpose |
|---|---|
| AIRFLOW_PROJ_DIR | Base directory for dags/, logs/ etc. relative to compose file. Set to . |
| AIRFLOW_UID | User ID used by airflow-init when creating directories. Ignored in practice because containers run as root |
| _AIRFLOW_WWW_USER_USERNAME | Username for the Airflow web UI admin account |
| _AIRFLOW_WWW_USER_PASSWORD | Password for the web UI admin account |
| FERNET_KEY | 32-byte URL-safe base64 key used to encrypt Connection passwords stored in Postgres. If you lose this key, all stored Connection passwords become unreadable |
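A value in the FERNET_KEY format can be produced with the standard library alone. The sketch below encodes 32 random bytes as URL-safe base64, which is the same construction used by `Fernet.generate_key()` in the `cryptography` package. The function names are illustrative. Generating a new key for an existing deployment would make the stored Connection passwords unreadable, so this only applies to a fresh setup.

```python
import base64
import os


def generate_fernet_key() -> str:
    """Return 32 random bytes encoded as URL-safe base64 (44 characters)."""
    return base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")


def is_valid_fernet_key(key: str) -> bool:
    """Check that a key decodes from URL-safe base64 to exactly 32 bytes."""
    try:
        return len(base64.urlsafe_b64decode(key.encode("ascii"))) == 32
    except (ValueError, UnicodeEncodeError):
        # binascii.Error (bad padding) is a subclass of ValueError
        return False


if __name__ == "__main__":
    print(f'FERNET_KEY="{generate_fernet_key()}"')
```

The validity check is useful before starting the stack, since a malformed key is otherwise only noticed when Airflow first tries to decrypt a Connection.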
Secrets: Airflow Variables and Connections¶
Sensitive credentials are never hardcoded in DAG files. They are stored in one of two places, both backed by the Postgres metadata database.
Airflow Variables¶
Key-value pairs used by DAG Python code at runtime:
from airflow.models import Variable
username = Variable.get("copernicus_username")
password = Variable.get("copernicus_password")
Variables currently in use:
| Variable name | Used by | Purpose |
|---|---|---|
| copernicus_username | count_files_MDS | Copernicus Marine Service username |
| copernicus_password | count_files_MDS | Copernicus Marine Service password |
| submitted_job | model_lucia_run writes; model_lucia_postprocess reads | Slurm job ID of the most recently submitted model run |
| last_processing_date | model_lucia_run writes | Logical date of the last model run |
Variables are managed through the Airflow UI under Admin > Variables or with the CLI:
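When the CLI calls need to be scripted, the command lines can be assembled in Python. This is a sketch that only builds and prints the commands; it assumes the standard `airflow variables get|set|delete` subcommands and that the CLI is reached through `docker compose exec`. The service name `airflow-scheduler` and the value `example-user` are placeholders, not taken from this deployment.

```python
import shlex
from typing import Optional

# Hypothetical Compose service name; check docker-compose.yml for the real one.
SERVICE = "airflow-scheduler"


def variables_command(action: str, key: str, value: Optional[str] = None) -> list[str]:
    """Build a `docker compose exec ... airflow variables <action>` command."""
    if action not in {"get", "set", "delete"}:
        raise ValueError(f"unsupported action: {action}")
    if (action == "set") != (value is not None):
        raise ValueError("a value is required for 'set' and only for 'set'")
    cmd = ["docker", "compose", "exec", SERVICE, "airflow", "variables", action, key]
    if value is not None:
        cmd.append(value)
    return cmd


# Print shell-quoted command lines; pass the list to subprocess.run to execute.
print(shlex.join(variables_command("get", "submitted_job")))
print(shlex.join(variables_command("set", "copernicus_username", "example-user")))
```

Building the command as a list and quoting it only for display avoids shell-injection problems when a value contains spaces or special characters.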
Airflow Connections¶
Connection objects store host, port, login, and password for external systems. The password is encrypted with the Fernet key. Connections are accessed via a conn_id string:
from airflow.hooks.base import BaseHook
conn = BaseHook.get_connection("lucia_gateway_luc")
# conn.host, conn.login, conn.password are decrypted automatically
Connections currently in use:
| Connection ID | Type | Used by | Purpose |
|---|---|---|---|
| lucia_gateway_luc | SSH | LUCIA DAGs, model_lucia_postprocess | SSH gateway into the Lucia HPC cluster |
| sftp_CMCC | SFTP | download_local_ifs_analysis, download_local_ifs_an00, download_local_ifs_fc | Primary SFTP server at CMCC for IFS files |
| sftp_CMCC_backup | SFTP | download_local_ifs_fc | Backup SFTP (AWS) for IFS forecast files |
| seamod | SFTP | download_local_ifs_analysis, download_local_ifs_an00, download_local_ifs_fc | Secondary server where downloaded IFS files are backed up |
Connections are managed under Admin > Connections in the UI.
Log retention¶
Logs are kept for 30 days (AIRFLOW__LOG_RETENTION_DAYS=30). The scheduler runs a log cleanup job automatically. Logs are written to ./logs/ on the host and are readable without entering a container:
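A small host-side sketch for finding recently written task logs. It assumes only that task logs are files ending in `.log` somewhere under the logs directory; the exact subdirectory layout depends on the Airflow version and is not assumed here. `recent_logs` is an illustrative helper, not part of the stack.

```python
import time
from pathlib import Path


def recent_logs(log_root: str, max_age_hours: float = 24.0) -> list[Path]:
    """Return *.log files under log_root modified within max_age_hours, newest first."""
    root = Path(log_root)
    if not root.is_dir():
        return []
    cutoff = time.time() - max_age_hours * 3600
    found = [
        p for p in root.rglob("*.log")
        if p.is_file() and p.stat().st_mtime >= cutoff
    ]
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


# Example, run on the host:
#   for path in recent_logs("/home/airflow/logs")[:10]:
#       print(path)
```

With the 30-day retention described above, a `max_age_hours` beyond 720 should not return additional files.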