
Another thought on the Earth System Model challenge

I recently had several discussions with friends on various topics, such as whether improving one component within an Earth System Model (ESM) necessarily improves the overall performance.

In the end, I realized that the problem we were discussing is actually very interesting, and yet challenging.

To put the question into another perspective, consider the following scenario: you want to buy a computer, and you are offered three options:
  1. A low-frequency CPU + 32 GB RAM + unknown motherboard;
  2. A high-frequency CPU, e.g., an i7 + 1 GB RAM + unknown motherboard;
  3. An average-frequency CPU, e.g., an i5 + 16 GB RAM + unknown motherboard.
Which one will you pick for daily use?
I also want to point out the "unknown motherboard": what if the motherboard does not even support the CPU or the RAM frequency?

Let's switch gears back to ESMs. Do ESMs have a similar issue?
To run an ESM simulation, you have multiple choices of land model, ocean model, and atmosphere model. Then you have a choice of spatial and temporal resolution. The list goes on.

Each individual model usually has hundreds of components. If you take a look at the Community Land Model, for example, a long list of processes is simulated in detail.

If you zoom out, you will see that some processes operate at the millisecond level while the ESM runs at an hourly or monthly temporal resolution. Some processes, such as soil biogeochemistry, are very local, while our simulation's spatial resolution is 1.0 degree by 1.0 degree.
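To get a feel for the size of this gap, here is a rough back-of-the-envelope sketch in Python. It assumes a spherical Earth with a mean radius of 6371 km, and the function names are hypothetical, made up only for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical-Earth assumption


def cell_size_km(resolution_deg, latitude_deg=0.0):
    """Approximate north-south and east-west extent of one grid cell in km."""
    km_per_degree = 2.0 * math.pi * EARTH_RADIUS_KM / 360.0
    north_south = resolution_deg * km_per_degree
    # East-west extent shrinks with the cosine of latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


def timescale_ratio(model_step_s, process_s):
    """How many process-scale intervals fit into one model time step."""
    return model_step_s / process_s


ns_km, ew_km = cell_size_km(1.0, latitude_deg=0.0)
print(f"1.0 degree cell at the equator: about {ns_km:.0f} km x {ew_km:.0f} km")
print(f"Hourly step vs. millisecond process: {timescale_ratio(3600.0, 1e-3):.1e}x")
```

So a single 1.0 degree cell at the equator is on the order of 100 km across, and one hourly step spans millions of millisecond-scale events. Everything below those scales has to be represented indirectly.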

Isn't that similar to our struggle with computer shopping?

Great efforts have been made to close this gap, but there is still a long way to go. For example, we set up benchmarks to study how different models perform under different scenarios.

The challenge, however, is that we don't know the role of model structure in model performance. A lot of the time, we are trying to upgrade the i7 to an i9 while ignoring that we only have 1 GB RAM, not to mention that we seldom consider whether the motherboard is the issue.

This concept is also well known in management: a bucket can only hold as much water as its shortest plank allows. While profiling our code can tell us which process consumes most of the computing time, it won't tell us which process is the one limiting the model performance.
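The bucket picture can be written down as a toy calculation. The sketch below is only an illustration under strong assumptions: the skill scores and runtime shares are invented numbers, and a real ESM's performance is certainly not a simple minimum over its components. The function names are hypothetical.

```python
def overall_skill(component_skill):
    """Toy 'shortest plank' rule: the system is only as good as its weakest part."""
    return min(component_skill.values())


def limiting_component(component_skill):
    """Name of the component with the lowest skill score."""
    return min(component_skill, key=component_skill.get)


# Invented numbers, for illustration only.
skill = {"atmosphere": 0.90, "ocean": 0.80, "land": 0.40}
runtime_share = {"atmosphere": 0.60, "ocean": 0.30, "land": 0.10}

# What a profiler would point at: the most expensive component.
most_expensive = max(runtime_share, key=runtime_share.get)

# Improving an already strong component does not raise the toy overall skill.
upgraded_strong = dict(skill, atmosphere=0.95)
# Improving the weakest component does.
upgraded_weak = dict(skill, land=0.70)

print("Most expensive component:", most_expensive)
print("Limiting component:", limiting_component(skill))
print("Baseline overall skill:", overall_skill(skill))
print("After improving atmosphere:", overall_skill(upgraded_strong))
print("After improving land:", overall_skill(upgraded_weak))
```

In this toy setup the profiler points at the atmosphere, but the land component is the plank that sets the water level.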

With the whole community moving forward nearly in parallel, ESM modelers should be aware of this challenge and should not blindly throw every new piece of progress into the system, because in the end, it is the shortest plank that holds us back.

