Skip to main content


Showing posts from 2016

High Performance Computing: ParaFly

In my recent posts I shared some first hand experience of parallel computing using OpenMP.
While OpenMP is supported by many programming languages, there are still a few does not. So here I am sharing another approach to create a parallel computing job.

The utility I will use is called "ParaFly". There are some information you can read here.
Basically, ParaFly can be used to run a list of command simultaneously. This approach will be particularly useful for some types of jobs in which tasks are independent with each other (such as for loop) but take a long time to run.

In my case, I was using the IDL library to process a huge amount to spatial datasets. I will use this job as an example to show how it is done.

Organically, I have to call a routine:
PRO project48, extension\_file = ef, \$
    filename\_mapinfo, \$
    missing\_value, \$
    o\_pixel\_size, \$
    prefix\_in, $
    prefix\_out = po, \$
    workspace\_in, \$
    workspace\_out, \$
    year\_end, \$

High Performance Computing: OpenMP and Anusplin

This is a live example from the project I am working right now. Good luck if you are following!
In order to speed my Anusplin program, I decided to use OpenMP. Anusplin is a package for climate data preparation.

My program was written in C++ as usual. C++ itself is fast, but Anusplin program takes a long time to run for approximately 50K+ simulations. Previously I set up some system pause (0.1second) between each run. Let's see what OpenMP can do.

There are many guide of OpenMP online.
First you need to know what's the GCC version you get, such as
gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/aai/gcc-5.2.0/configure --prefix=/apps/rhel6/gcc/5.2.0 --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --disable-checking --disable-multilib --with-system-zlib --enable-__cxa…

Ecosystem modeling: how to deal with the time-variant datasets?

Ecosystem modeling usually requires a variety of data to run a complete simulation. These data usually include both time-variant and time-invariant datasets. For example, we usually consider Digital Elevation Model (DEM) as time-invariant data because surface topography is relatively stable for a given period of time unless extreme events such as earthquake occur.

However, most other driving datasets are actually time-variant. For example, climate data (temperature, precipitation.etc.) are constantly changing at any given time.

To date, there is yet no accurate definition whether a data should be defined as time-variant or time-invariant. And we always have to make some assumptions to simplify our models.

There are several reasons behind this and how they may be improved in future study.
First, using time-variant data requires more data. For example, for an ecosystem model at daily time step, daily climate data are also required. While it may be relatively easy to retrieve climate dat…

High Performance Computing: A library of utilities

A brief look into all the advanced modules we can do on a typical HPC cluster.
I will gradually introduce what I have been using over the time to advance my research.

--------------------------- /opt/modules/modulefiles ---------------------------
   anaconda/2.0.1-py27                                 (D)
   ansys/16.2                                          (D)
   ase/                                      (D)
   atk/13.8.0                                          (D)
   atlas/3.11.30_gcc-4.7.2                             (D)
   autodesk/2015                                       (D)

IDL programming: How to design a simple program

For scientists and engineers, our scripts/programs are usually very straightforward and simple. Even some of us are modelers, our models should be complex but the source code are still manageable.
However, the re-usability of these simple programs are still necessary because a lot of our work must be done over and over again. For example, we need to plot/map some data in various scenarios. And we wants the plot/map program can be used for various types of inputs.

Specially, for IDL programming, there are a few common blocks that should be considered in a program.

Compiler option; this step is usually just for compiling setup, you can also specify it the start-up file.Error control; some of the most important error information;Global variable; this step should be used to define the global variable, such as the extension of a file.Keyword check;Local variable; this step should be project variable, such as data range.Workspace; workspace usually refer to the directory all operations will …

Spatial datasets operations: When the sky is failing

Not all raster can be re-projected to another projection, at least some tools don't allow us to.

For example, the AMSR-E/Aqua L3 Global Snow Water Equivalent EASE-Grids cannot be directly re-projected to UTM-6N within ENVI 5.1.
Please refer this website for projection information.

 Fortunately, ArcMap can do a better job in this scenario.
You can use the project tool just like any other operations.

In order to re-project lot of data using ArcObject, I have updated a ArcTool, which is basically a batch mode data operation toolkit. For this task, the screenshot is as follow:

Below is the result:

Beneath the hook is the GP tool, some sample scripts are as follow: ///================================================ public static object ProjectRaster(string sFileIn, string sFileOut, string sNewProject, double dCellSize)         {             ESRI.ArcGIS.Geoprocessor.Geoprocessor gp = new ESRI.ArcGIS.Geoprocessor.Geoprocessor(); …

Spatial datasets operations: From MODIS to model ready binary workflow

I have written a couple of articles regarding the potential issues during raster operations. Here I have put down some typical workflow for these operations.
For study area smaller than one MODIS granule, following these steps:

The common method that will work for you: Convert one HDF file to GeoTIFF using ArcMap, save this file as A.tif.Convert all HDF files to GeoTIFF using IDL and A.tif, call routine "hdf2tiff". In this step, you should set all unwanted pixel to missing value, such as -9999.Project one GeoTIFF A.tif to B.dat using ArcMap;Project all GeoTIFF to ENVI files using B.dat, call routine "project48";Define the missing value for all the ENVI files from step 4, call routine "define_envi_missing_value";Prepare the shapefile mask for the study area, C.shp using Arcmap, C should has the same projection as the target projection;Extract all ENVI files using the C.shp shapefile , call routine "extract50".(Updated January 15 2017). 
Since in s…

Spatial datasets operations: From MODIS HDF to Binary file

MODIS products are widely used in Earth Science, here I present some typical demos and tryouts to deal with MODIS datasets.
All the MODIS products datasets are distributed in HDF format. To get familiar with HDF, you can go to: or

MODIS land surface products usually have a different map projection than most of our projects. Here are some information of this projection: Therefore, a re-project is needed to convert HDF datasets to our projects.
There are quite a few tools available to operate on HDF files, but most of them are not efficient enough. You have to repeat the same set of operations. Therefore, using IDL or ArcObject based script/program is desirable.
The first step is to extract the datasets from HDF, and we also need to define the projection as exactly the same with HDF. Below is a very simple way to do so. You can use ArcMap to open one sin…

Groundwater hydrology modeling: the relation among stream, groundwater and surface runoff

Stream discharge is an important component in hydrology. Yet stream discharge is commonly studied in surface hydrology. In recent decades, it has been recognized that groundwater also interact with stream water throughout the year.

In groundwater modeling, stream routing is explicitly modeled as a groundwater boundary condition.
In general, several data are required for a successful simulation. Here I will only discuss a little bit related to water itself, to say, what kind of sink or source of the water in stream routing should be considered.

First, a stream segment/reach may received upstream discharge.

Second, a stream can also receive direct precipitation from the sky. But this term is usually omitted some the total amount may not be significant.

Third, a stream may also lose water from evaporation. Similar to direct precipitation, this term is usually omitted as well.

Forth, stream also receive later flow, it can be surface runoff and subsurface runoff (soil zone inter-flow). Man…

Surface water hydrology: build the relationship between river gage height and discharge

Both gage height and discharge rate are important factors in surface water hydrology.
For most of the time they are closely coupled.
When one of them is missing, we can always estimate the other using some relation algorithms.
To date, there a few commonly used relation algorithms. For example:

Here are more materials:

Based on these relations, a simple test was run to illustrate and estimate the discharge based on gage height.

Figure 1. Gage height-discharge relation curve. The blue crosses are observed gage height-discharge data. The red crosses are calculated after "parameter estimation" using curve fitting method.

Numerical simulation: unit system

Unit is always one of the most important factors in science. Most of the time, a number without unit is useless.
For numerical simulation, a good design of unit system can be critical.
Recently, I have written a short script/program to prepare climate data for my groundwater/surface water hydrology simulation. I downloaded some data from the National Climatic Data Center(NCDC) separately, and I did NOT notice the changes in unit system. In the end, I have to do everything over again.

So the question is: what kind of unit system should we use to improve our efficiency, and how do we actually implement it?

First of all, we need to identify the unit system used in the data obtained from third party. This is usually done through reading the meta data in the documentation. Never directly use the data without reading the documentation!
A lot of time we don't usually have choices of the unit system provided. In this case, the best practice might be stick with the original unit system.


Groundwater hydrology modeling: Why it is so difficult to simulate the groundwater dynamics in mountain area

First, in mountain area, there is generally less groundwater observations, which makes it's difficult to define the boundary condition. Without observations, the initial head can only be assumed or estimated using a steady state simulation.
Second, groundwater recharge in mountain area is very low. This is because some mountain surface has no soil or a shallow soil layer. And the permeability of rock fractures is very low. As a result, infiltration from the surface/subsurface to groundwater systems is extremely low.
Third, the hydraulic priorities of the fractured and unfractured rock material are also very low, which means the groundwater system is very sensitive.
Beside, topography effects from mountain area usually require high spatial resolution grid cell in the numerical simulation.

Ecosystem modeling: challenges in water-carbon cycling in cold region numerical simulation

Numerical simulation of water and carbon cycling in cold region usually encounter some unique challenges which must be taken into consideration in climate change study.

First, the presence of snow, glacier and permafrost has direct influence on the water cycling. Snow hydrology has been long studied and I try to avoid discussing it here. Glacier dynamics have also drawn a lot of attentions in recent decades due to the warming climate.
Permafrost also plays an important role, which has not yet been fully understood and investigated. Permafrost and its upper soil layer (Active Layer) have seasonal dynamics in water/ice content and temperature. As temperature continues to increase, the thawing permafrost also releases the ice to the surface layer, which means the active layer thickness is expected to increase in the near future.
Other than that, unlike that in temperate regions, permafrost acts as a confining layer between surface water and groundwater, which means the groundwater and su…

Ecosystem modeling: a question of chicken or eggs?

Ecosystem modeling generally include three conceptual components: inputs, algorithm and outputs.
For example, we usually need precipitation to estimate surface runoff.
On the other hand, sometimes some outputs are also considered as inputs for other models. For example, vegetation dynamics also change the surface albedo and therefore the incoming radiation.
In another word, the feedback among different processes are often too complex that we generally have to ignore some feedback processes.

The reason is that our computer simulation MUST have a starting point and a sequence of algorithms in order to run the simulation.
For some well-designed models, the differences between different starting points are not significant when using the numerical approach. However, it is a best practice to put all the algorithms in the right orders.

For example, in surface hydrology, the generally order of the water flow is like this: precipitation->interception->snow->infiltration->overland r…

Ecosystem modeling: creating hierarchical class dependency maps

Having a visual class dependency map could significantly help you to better understand the structure of a complex ecosystem model. For example, a group in University of Tennessee published a web-based structure tree of the Community Land Model.

From the perspective of model development, a dependency map also increase your productivity and performance.

There are many approach to produce such a map. For example, if you are using Visual Studio IDE in Windows environment, you can try the built-in Architecturetool. And it will generate beautiful maps.

If you prefer to use open source alternatives, you can try Doxygen. It may be already installed in many Linux distributions. You can also use the Window version.

Below is an example my current ecosystem model structure:

Doxygen also has lots of other features you can try out.

Ecosystem modeling: a look up table for system state variables

A list of variables has been established to describe the ecosystem system state. Specially, these variables are from different process-based algorithms.

ModuleVariableDescriptionRadiationShortwaveIncoming shortwave radiationPARPhotosynthetically active radiationEvapotranspirationPotential ETTotal potential evapotranspirationInterceptioncanopy depthDepth of rain and snow in canopycanopy ETET from the canopyNet rainNet rain from canopyNet snowNet snow from canopySnowSCASnow cover areaSnow albedoSnow albedoSnow ETSnow sublimationSnowmeltSWESnowpack water equivalentSWE temperatureTemperature of the snowpackInfiltrationInfiltrationSurface runoffSurface runoffSurface runoff upslopeSurface runoff from upslopeSoilDunnian runoffDunnian runoff upslopeInterflowInterflow upslopeSoil moistureSoil to groundwaterGroundwaterGroundwater inflowGroundwater outflowGroundwater upslope

Surface water hydrology modeling: scaling issue in Precipitation Runoff Modeling System(PRMS)

Grid-based HRU network for stream routing in PRMS could be convenient for several reasons.
Apparently the ideal data structure as matrix could be one of them.But it is not always the case if the scaling issue is not well handled.
Below is an example how the scale issue could be a potential problem for the simulation framework.
First, I would like to introduce the CRT. Cascade Routing Tool (CRT) is a computer application for watershed models that include the coupled Groundwater and Surface-water FLOW model GSFLOW and the Precipitation-Runoff Modeling System (PRMS). More information could be found at (
The example run result is listed as:
Read the instruction of CRT and you will easily understand the legend/label of the above result.The stream line datasets used here is vector and the grid resolutions in horizontal are both 100 meters.
The next step, the subbasin was delineated using stream line and DEM datasets. Note that in these operations, the resoluti…

Groundwater hydrology modeling: relation between steady state and transient simulation in MODFLOW simulation

Standard groundwater modeling usually uses the three-dimensional Richard's equation to describe the groundwater flow. Solution to the Richard equation usually requires numerical methods such as finite-difference equation used in USGS MODFLOW.
To solute the array of equations within MODFLOW, an initial head is required regardless of steady state (SS) or transient (TR) simulation.
As the MODFLOW manual states that in a steady state simulation, the storage terms are ignored since storage of each grid cell is constant with time.
One of the common problems to run the MODFLOW is  that we don't have the initial head data for the simulation. Therefore it is usually suggested that a SS simulation could be run at first to provide the head for the TR simulation. This is because SS is independent with initial head, but TR isn't.
However, whether placing the SS simulation ahead of TR simulations is doubtful in many practices! Here are the explanations:
First, the SS flow equation shoul…

Surface water hydrology modeling: deal with different types of precipitation

In surface water hydrology, precipitation is one of the most important components.
However, within most hydrology models, it is unclear of how precipitation is actually represented.
For example, in a typical water cycle illustration from Wiki, precipitation is described as
Here is the question, what form does precipitation actually take when it falls to land surface? Water can be in either liquid (water, rain), solid (ice, snow) or gas (water vapor) forms. How about precipitation? Surely most of time precipitation is either rain of snow. In some cases, it takes form in hail, etc.
Since the physical proprieties of water and snow are significantly different, it is necessary to distinguish them within surface water hydrology models. In some scenarios, rain and snow may co-exist in a mixed precipitation event. In this case, surface water hydrology models need to deal with both of them. The difficulty is how to manage the two-phase mass and energy balance. A complete comparison of how diffe…

Scientific writing: how to prepare a good image figure

"A picture is worth a thousand words", from Wikipedia, unveils the importance of figures in writing.
There are many types of figures or pictures, and in the field of Earth Science, image figure is one of the most important types.
By image figure, I mean a figure composed of matrix, such as
Link of the image.

(A selfie is also an image, but it is not commonly seen in scientific writing. Although the above figure is taken from the website, it is publication-ready in my opinion.)

So what makes a good  image figure? The figure above actually gives a good example, and we can see what are the principle elements in this figure.
A clear and concise titleAn appropriate map projectionA beautiful color bar and data presentationAdequate description (data source, etc.) Even without supplementary material, I believe most readers can understand this figure without difficulties.

So where are the challenges?
Usually we can easily prepare the title, projection and description with care. The ch…

IDL programming: control variables in IDL across routines

IDL, unlike C++, has its own approach and principle when handling variables across routines. These features are often very different with the variable scope within C++.
I will try to answer several key questions and prove them using simple demos.
Question 1: Will the value of a variable be changed after it’s passed into a function and changed inside?
PRO variable_change
PRINT,'Before the operation, a is ',a
PRINT,'After the operation, a is ',a
FUNCTION plus,a,b
And the IDL Console output:
Before the operation, a is 1.00000
After the operation, a is 3.00000
If the value of a variable has been changed inside a routine, it will remains changed outside. Question 2: What if this variable is not even passed into any routine, but the routine has a variable wh…

Integrated groundwater and surface water modeling: the appropriate way to prepare the best DEM

One the most important inputs for groundwater and surface water hydrology modeling is the Digital Elevation Model (DEM).

Though DEM can be obtained through many approaches, the final DEM varies significantly. Therefore, DEM can seldom  be directly used in most modeling work. Instead, we often need to adjust the DEM so that it can meet the requirements of the hydrology modeling.

The adjustment, however, often involves series of operations. These operations can be carried out using different approaches in different orders. The result could be quite different. The important part is that most of them are unusable. So the question is, what is most promising way to prepare the DEM?

Here I have provided an example work flow, which may be suitable for most tasks. Then I will explain in details the purpose of this step and what tools we can make use of. In the end I will discuss why it should be done in this order.
Download the DEM datasets of the same format for the study area;Mosaic these DEM…