High Performance Computing: ParaFly

In my recent posts I shared some first-hand experience with parallel computing using OpenMP.
While OpenMP is available for C, C++, and Fortran, many other languages do not support it. So here I am sharing another approach to creating a parallel computing job.

The utility I will use is called "ParaFly". Some information about it is available here.
Basically, ParaFly can be used to run a list of commands simultaneously. This approach is particularly useful for jobs whose tasks are independent of each other (such as the iterations of a for loop) but take a long time to run.
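To make the idea concrete, here is a minimal Python sketch of "run a list of independent commands, a few at a time". It is not how ParaFly is implemented, only an illustration of the concept; the helper names `run_one` and `run_commands` are made up for this example.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_one(command: str) -> int:
    """Run one shell command and return its exit code."""
    return subprocess.run(command, shell=True).returncode


def run_commands(commands: list[str], max_workers: int) -> list[int]:
    """Run independent commands, at most max_workers at a time.

    Exit codes are returned in the same order as the input commands.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Three stand-in tasks; in the real job each line would be an "idl -e ..." call.
    demo = ['echo "task 1"', 'echo "task 2"', 'echo "task 3"']
    print(run_commands(demo, max_workers=2))
```

Because each command is a separate process, the tasks must not depend on each other's results, which is exactly the situation described above.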

In my case, I was using the IDL library to process a huge amount of spatial data. I will use this job as an example to show how it is done.

Originally, I had to call a routine:
PRO project48, extension_file = ef, $
    filename_mapinfo, $
    missing_value, $
    o_pixel_size, $
    prefix_in, $
    prefix_out = po, $
    workspace_in, $
    workspace_out, $
    year_end  ;; any remaining parameters are removed here

to re-project a list of raster image files from 1980 to 2015.
IDL is known to be inefficient in for loops, just like MATLAB.

And it is not easy to parallelize IDL on a Linux HPC. You can call IDL from C/C++ with OpenMP enabled, if that is possible, but then you have to write an additional program to do the work. You can also use the GDAL library to write a projection function, and then you can use OpenMP freely, but that certainly requires some effort.

However, with ParaFly, we can do the work within a few minutes if we are lucky.
First, we need to add a routine that calls the routine above but accepts only one parameter, the time (year), because all the other parameters remain the same.
For example:
PRO project48_tmax, year
  ;; process a single year, so each call is an independent task
  year_start = year
  year_end = year
  ;; some other lines are removed here
  project48, extension_file = file_extension, $
    filename_mapinfo, $
    missing_value, $
    o_pixel_size, $
    prefix_in, $
    workspace_in, $
    workspace_out, $
    year_end  ;; any remaining arguments are removed here
  PRINT, 'Finished!'
END

Then the routine is wrapped and ready for ParaFly.

You may want to write another wrapper to generate the ParaFly command file, for example:
PRO prepare_parafly_files
  year_start = 1980
  year_end = 2015
  ;; some other lines are removed here (they open the output file and assign lun)
  FOR year = year_start, year_end, 1 DO BEGIN
     year_str = STRING(year, format = '(i04)')
     ;; build one shell command per year, e.g. idl -e "project48_tmax, 1980"
     str = 'idl -e '+ '"' +'project48_tmax, ' $
           + year_str + '"'
     PRINTF, lun, str
  ENDFOR
  FREE_LUN, lun
END

Then the work is ready to run: the generated file lists one command per year, and ParaFly runs those commands in parallel.
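If you prefer not to generate the command file from IDL, the same file can be sketched in a few lines of Python. This is only a sketch under assumptions: the file name `commands.txt` and the helper names are hypothetical, and the `-c` (command file) and `-CPU` (number of parallel processes) options are the ones I understand ParaFly to accept, so please check the usage message of your own installation.

```python
from pathlib import Path


def build_commands(year_start: int, year_end: int) -> list[str]:
    """Return one 'idl -e' command per year, matching the IDL wrapper above."""
    return [
        f'idl -e "project48_tmax, {year:04d}"'
        for year in range(year_start, year_end + 1)
    ]


def write_command_file(path: Path, commands: list[str]) -> None:
    """Write one command per line, which is the layout ParaFly reads."""
    path.write_text("\n".join(commands) + "\n")


def main() -> None:
    command_file = Path("commands.txt")  # hypothetical file name
    write_command_file(command_file, build_commands(1980, 2015))
    # Assumed invocation; verify the options against your ParaFly version.
    print(" ".join(["ParaFly", "-c", str(command_file), "-CPU", "10"]))


if __name__ == "__main__":
    main()
```

The printed line is the command you would then put in your job script, with the core count matching what you requested from the scheduler.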
Theoretically, you can request a whole node with all of its cores in one job, and the speedup is then roughly equal to the number of cores, because the tasks are independent. For example, if I request 10 cores, the job can run up to 10 times faster.
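The "up to" matters when the number of tasks is not a multiple of the number of cores. Here is a small back-of-the-envelope sketch, assuming every year takes the same time to process and ignoring start-up and I/O overhead (both are simplifications, and `ideal_speedup` is a made-up helper name):

```python
import math


def ideal_speedup(n_tasks: int, n_cores: int) -> float:
    """Speedup over a serial run for equal-length, independent tasks."""
    # Tasks are processed in rounds of at most n_cores tasks each.
    rounds = math.ceil(n_tasks / n_cores)
    return n_tasks / rounds


n_years = 2015 - 1980 + 1  # 36 yearly tasks
for cores in (4, 10, 12, 36):
    print(cores, ideal_speedup(n_years, cores))
```

With 36 yearly tasks, 10 cores need 4 rounds, so the ideal speedup is 36 / 4 = 9 rather than 10, while 12 cores need only 3 rounds and reach the full factor of 12.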

