

Showing posts from October, 2016

High Performance Computing: ParaFly

In my recent posts I shared some first-hand experience of parallel computing using OpenMP.
OpenMP is supported by several programming languages (C, C++ and Fortran), but many others do not support it. So here I am sharing another approach to creating a parallel computing job.

The utility I will use is called "ParaFly". You can read more about it here.
Basically, ParaFly runs a list of commands simultaneously. This approach is particularly useful for jobs whose tasks are independent of each other (such as the iterations of a for loop) but take a long time to run.

In my case, I was using the IDL library to process a huge number of spatial datasets. I will use this job as an example to show how it is done.

Originally, I had to call the following IDL routine:
PRO project48, extension_file = ef, $
    filename_mapinfo, $
    missing_value, $
    o_pixel_size, $
    prefix_in, $
    prefix_out = po, $
    workspace_in, $
    workspace_out, $
    year_end, $

High Performance Computing: OpenMP and Anusplin

This is a live example from the project I am working on right now. Good luck if you are following along!
In order to speed up my Anusplin program, I decided to use OpenMP. Anusplin is a package for preparing (interpolating) climate data.

My program was written in C++ as usual. C++ itself is fast, but my Anusplin program takes a long time to run because it performs approximately 50K+ simulations. Previously, I also inserted a system pause (0.1 second) between runs. Let's see what OpenMP can do.

There are many OpenMP guides online.
First, you need to know which GCC version you have, for example:
gcc -v
Using built-in specs.
Target: x86_64-unknown-linux-gnu
Configured with: /tmp/aai/gcc-5.2.0/configure --prefix=/apps/rhel6/gcc/5.2.0 --enable-languages=c,c++,fortran --enable-shared --enable-threads=posix --disable-checking --disable-multilib --with-system-zlib --enable-__cxa…

Ecosystem modeling: how to deal with the time-variant datasets?

Ecosystem modeling usually requires a variety of data to run a complete simulation. These data usually include both time-variant and time-invariant datasets. For example, we usually consider a Digital Elevation Model (DEM) as time-invariant data, because surface topography is relatively stable over a given period of time unless extreme events such as earthquakes occur.

However, most other driving datasets are actually time-variant. For example, climate data (temperature, precipitation, etc.) change constantly over time.

To date, there is no precise rule for whether a dataset should be treated as time-variant or time-invariant, so we always have to make some assumptions to simplify our models.

There are several reasons behind this, and each may be addressed in future studies.
First, using time-variant data requires more data. For example, an ecosystem model running at a daily time step also requires daily climate data. While it may be relatively easy to retrieve climate dat…