Usage

Quickstart

The following code would write a run directory based on the contents of a yaml file:

import fv3config

with open("config.yml", "r") as f:
    config = fv3config.load(f)
fv3config.write_run_directory(config, './rundir')

config is a configuration dictionary which contains namelists, input data specifications, and other options, as described further below. It can be edited just like any dictionary. Namelists are specified as sub-dictionaries. An example C12 configuration dictionary is in the tests directory of this package.

A run directory based on a configuration can be written using fv3config.write_run_directory().

Shell Usage

This module installs a command line interface write_run_directory that can be used to write the run directory from a shell. For example, if the file config.yaml contains a yaml-encoded configuration dictionary

write_run_directory config.yaml rundir

will write an FV3 run directory to the path rundir.

Two additional command line interfaces are useful for modifying configuration dictionaries in order to use them for restart runs:

enable_restart config.yaml /path/to/initial/conditions

and to provision the necessary files required for a nudged run:

update_config_for_nudging config.yaml

Both of these shell commands will modify the given configuration dictionary in place.

This module also installs a command line interface fv3run, which is further detailed below.

Data Caching

fv3config can write files from local or remote locations. When remote locations are used, the package first downloads the data to a local cache directory.

If the FV3CONFIG_CACHE_DIR environment variable is set, the package will download and store data into $(FV3CONFIG_CACHE_DIR)/fv3config-cache. If unset, by default the package will use the “user cache” directory for the user’s operating system.

The download location can be retrieved using fv3config.get_cache_dir(), and set manually using fv3config.set_cache_dir(). Note that the “fv3config-cache” subdirectory will be appended to the cache directory you set. If the target is set to a directory that already contains the archive download, it will automatically start using those files. Conversely, if the target is set to an empty directory, it will be necessary to re-download the cache files.

It’s unlikely, but do not set the cache directory to a location that already contains a “fv3config-cache” subdirectory with unrelated files, or the cache files will not download until you call fv3config.refresh_downloaded_data() (which will delete any files in the subdirectory).

Automatic caching of remote files can be disabled using the fv3config.do_remote_caching() routine.

Configuration

The config dictionary must have at least the following items:

Key Type Description
namelist dict Model namelist
experiment_name str Name of experiment to use in output
diag_table str or DiagTable location of diag_table file, or one of (“default”, “grid_spec”, “no_output”), or DiagTable object
data_table str location of data_table file, or “default”
initial_conditions str location of directory containing initial conditions data
forcing str location of directory containing forcing data
orographic_forcing str location of directory containing orographic data

Paths to files or directories on the local filesystem must be given as absolute paths. If paths are given that begin with gs:// then fv3config will attempt to download the specified file or files from Google Cloud Storage. For this functionality, gcsfs must be installed and authorized to download from the specified bucket.

The namelist item is special in that it is explicitly stored in the config dictionary. For the fv3gfs model, individual namelists are specified for various components of the model. As an example, the vertical resolution can be accessed via config['namelist']['fv_core_nml']['npz'].

The diag_table can be either be a tag or path to a file, or it can explicitly represent the desired output diagnostics with a DiagTable object. See a more complete description of this object below.

By default, fv3config attempts to automatically select the field_table file to use for the model based on the selected microphysics scheme in the namelist. This supports Zhao-Carr or GFDL microphysics. If the user provides a field_table key indicating a filename in the configuration dictionary, that file will be used instead.

Note

The Han and Bretherton (2019) TKE-EDMF boundary layer scheme requires an additional tracer to be defined in the field_table for TKE. This scheme is currently not supported by default in fv3config; however for the time being one can supply a custom field_table for this purpose.

Some helper functions exist for editing and retrieving information from configuration dictionaries, like fv3config.get_run_duration() and fv3config.set_run_duration(). See the API Reference for more details.

Specifying individual files

More fine-grained control of the files that are written to the run-directory is possible using the “asset” representation of run-directory files. An asset is a dictionary that knows about one files’s source location/filename, target filename, target location within the run directory and whether that file is copied or linked. Asset dicts can be generated with the helper function fv3config.get_asset_dict(). For example:

>>> get_asset_dict('/path/to/filedir/', 'sample_file.nc', target_location='INPUT/')
{'source_location': '/path/to/filedir/',
'source_name': 'sample_file.nc',
'target_location': 'INPUT/',
'target_name': 'sample_file.nc',
'copy_method': 'copy'}

One can also add specify the asset as a python bytes object that will be written to the desired location using fv3config.get_bytes_asset_dict(). For example:

>>> get_bytes_asset_dict(b"hello_world", "hello.txt", target_location=".")

This is useful for storing small files in the configuration dictionary, without needing to deploy them to an external storage system.

One can set config['initial_conditions'] or config['forcing'] to a list of assets in order to specify every initial condition or forcing file individually.

One can use a directory to specify the initial conditions or forcing files and replace only a subset of the files within the that directory with the optional config['patch_files'] item. All assets defined in config['patch_files'] will overwrite any files specified in the initial conditions or forcing if they have the same target location and name.

DiagTable configuration

The diag_table specifies the diagnostics to be output by the Fortran model. See documentation for the string representation of the diag_table here. The fv3config package defines a python representation of this object, DiagTable, which can be used to explicitly represent the diag_table within an fv3config configuration dictionary.

The DiagTable object can be initialized from a dict (here serialized as YAML) as follows. Suppose the following is saved within sample_diag_table.yaml:

name: example_diag_table
base_time: 2000-01-01 00:00:00
file_configs:
- name: physics_diagnostics
  frequency: 1
  frequency_units: hours
  field_configs:
  - field_name: totprcpb_ave
    module_name: gfs_phys
    output_name: surface_precipitation_rate
  - field_name: ULWRFtoa
    module_name: gfs_phys
    output_name: upward_longwave_radiative_flux_at_toa

Then a DiagTable object can be initialized as:

>>> import fv3config
>>> import yaml
>>> with open("sample_diag_table.yaml") as f:
        diag_table_dict = yaml.safe_load(f)
>>> diag_table = fv3config.DiagTable.from_dict(diag_table_dict)
>>> print(diag_table)  # will output diag_table in format expected by Fortran model
example_diag_table
2000 1 1 0 0 0

"physics_diagnostics", 1, "hours", 1, "hours", "time"

"gfs_phys", "totprcpb_ave", "surface_precipitation_rate", "physics_diagnostics", "all", "none", "none", 2
"gfs_phys", "ULWRFtoa", "upward_longwave_radiative_flux_at_toa", "physics_diagnostics", "all", "none", "none", 2

The same DiagTable can also be initialized programmatically as follows:

>>> import fv3config
>>> import datetime
>>> diag_table = fv3config.DiagTable(
        name="example_diag_table",
        base_time=datetime.datetime(2000, 1, 1),
        file_configs=[
            fv3config.DiagFileConfig(
                name="physics_diagnostics",
                frequency=1,
                frequency_units="hours",
                field_configs=[
                    fv3config.DiagFieldConfig(
                        "gfs_phys",
                        "totprcb_ave",
                        "surface_precipitation_rate"
                    ),
                    fv3config.DiagFieldConfig(
                        "gfs_phys",
                        "ULWRFtoa",
                        "upward_longwave_radiative_flux_at_toa"
                    ),
                ]
            )
        ]
    )

String representations of the diag_table (i.e. those expected by the Fortran model) can be parsed with the fv3config.DiagTable.from_str() method.

If serializing an fv3config configuration object to yaml it is recommended to use fv3config.dump(). This method will convert any DiagTable instances to dicts (using fv3config.DiagTable.asdict()), which can be safely serialized.

Running the model with fv3run

fv3config provides a tool for running the python-wrapped model called fv3run. For example, you can run the default configuration using first:

$ docker pull us.gcr.io/vcm-ml/fv3gfs-python

to acquire the docker image for the python wrapper, followed by a call to fv3config.run_docker():

>>> import fv3config
>>> with open("config.yml", "r") as f:
>>>     config = fv3config.load(f)
>>> fv3config.run_docker(config, 'outdir', docker_image='us.gcr.io/vcm-ml/fv3gfs-python')

If the fv3gfs-python package is installed natively, the model could be run using fv3config.run_native():

>>> fv3config.run_native(config, 'outdir')

The python config can be passed as either a configuration dictionary, or the name of a yaml file. There is also a bash interface for running from yaml configuration.

$ fv3run --help
usage: fv3run [-h] [--runfile RUNFILE] [--dockerimage DOCKERIMAGE]
              [--keyfile KEYFILE]
              config outdir

Run the FV3GFS model. Will use google cloud storage key at
$GOOGLE_APPLICATION_CREDENTIALS by default.

positional arguments:
  config                location of fv3config yaml file
  outdir                location to copy final run directory, used as run
                        directory if local


optional arguments:
  -h, --help            show this help message and exit
  --runfile RUNFILE     location of python script to execute with mpirun
  --dockerimage DOCKERIMAGE
                        if passed, execute inside a docker image with the
                        given name
  --keyfile KEYFILE     google cloud storage key to use for cloud copy
                        commands
  --kubernetes          if given, ignore --keyfile and output a yaml
                        kubernetes config to stdout instead of submitting a
                        run

The only required inputs are config, which specifies a yaml file containing the fv3config run directory configuration, and a final location to copy the run directory. A keyfile can be specified to authenticate Google cloud storage for any data sources located in Google cloud buckets, or the key is taken from an environment variable by default. If dockerimage is specified, the command will run inside a Docker container based on the given image name. This assumes the fv3config package and fv3gfs python wrapper are installed inside the container, along with any dependencies.

The python interface is very similar to the command-line interface, but is split into separate functions based on where the model is being run.

Customizing the model execution

The runfile is the python script that will be executed by mpi, which typically imports the fv3gfs module, and then performs some time stepping. The default behavior is to use a pre-packaged runfile which reproduces the behavior of Fortran model identically. For additional, flexibility a custom runfile can be specified as an argument to all the run_ functions.

The environmental variable FV3CONFIG_DEFAULT_RUNFILE can be used to override the default runfile. If set, this variable should contain the path of the runfile.

Note

When using run_docker or run_kubernetes, the value of FV3CONFIG_DEFAULT_RUNFILE and the file it points to will be read inside the docker image where execution occurs. It will have no effect if set on the host system outside of the docker image.

Submitting a Kubernetes job

A python interface fv3config.run_kubernetes() is provided for submitting fv3run jobs to Kubernetes. Here’s an example for submitting a job based on a config dictionary stored in Google cloud storage:

import gcsfs
import fv3config

config_location = 'gs://my_bucket/fv3config.yml'
outdir = 'gs://my_bucket/rundir'
docker_image = 'us.gcr.io/vcm-ml/fv3gfs-python'

fv3config.run_kubernetes(
    config_location,
    outdir,
    docker_image,
    gcp_secret='gcp-key'  # replace with your kubernetes secret
                          # containing gcp key in key.json
)

The gcp key is generally necessary to gain permissions to read and write from google cloud storage buckets. In the unlikely case that you are writing to a public bucket, it can be ommitted.

From the command line, fv3run can be used to create a yaml file to submit for a kubernetes job. To create the file, add the --kubernetes flag to fv3run and pipe the result to a file. For example:

$ fv3run gs://bucket/config.yml gs://bucket/outdir –dockerimage us.gcr.io/vcm-ml/fv3gfs-python:latest –kubernetes > kubeconfig.yml

The resulting file can be submitted using

$ kubectl apply -f kubeconfig.yml

You can also modify the yaml file before submitting the job, for example to request more than one processor or a different amount of memory.

Restart runs

The required namelist settings for a restart run (as opposed to a run initialized from an observational analysis) can be applied to a configuration dictionary as follows:

config = enable_restart(config, initial_conditions)

Nudging

The fv3gfs model contains a module for nudging the state of the atmosphere towards GFS analysis. Two public functions are provided to ease the configuration of nudging runs.

Given the run duration and start date, fv3config.get_nudging_assets() returns a list of fv3config assets corresponding to the GFS analysis files required. Given an fv3config object, fv3config.update_config_for_nudging() will add the necessary assets and namelist options for a nudging run. This function requires that the fv3config object contains a gfs_analysis_data entry with corresponding url and filename_pattern items.