Spatial datasets operations: mask raster using region of interest

Climate change studies often require extracting spatial datasets from a larger domain.
In this article, I will briefly discuss some potential issues and solutions.

In the most common scenario, we need to extract part of a raster file using a polygon shapefile. I will use this case as the example.

In a typical desktop application such as ArcMap or ENVI, this is usually done with a tool called "clip" or "extract by mask", or with a region of interest (ROI).

Before any analysis is done, it is best practice to project all datasets into the same projection.

If you are lucky, the polygon you use lines up with the raster grid perfectly. This rarely happens unless you created the shapefile using a "fishnet" or a similar approach.

What if luck is not with you? The algorithm within these tools will usually make a best estimate of each value based on its location. Nearest-neighbor resampling is commonly used to calculate the value, although other methods are possible. But what about the location of the output data? Since the output will also be a raster with the same resolution, the best solution is for the output raster to line up with the input raster grid perfectly.
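To make the alignment idea concrete, here is a minimal sketch in plain Python (not ENVI) of snapping a polygon's bounding box to the cells of the input raster. The function name and its parameters are hypothetical, and it assumes a north-up raster with square cells whose origin is the upper-left corner. Note that it expands the box outward to whole cells with floor/ceil, which is one possible choice; rounding to the nearest cell is another.

```python
import math


def snap_extent_to_grid(xmin, ymin, xmax, ymax, x_origin, y_origin, cell_size):
    """Expand a map-coordinate bounding box outward to whole raster cells.

    (x_origin, y_origin) is the upper-left corner of the input raster
    (assumption: north-up raster, square cells).
    Returns (col_min, row_min, ncol, nrow, ul_x, ul_y), where ul_x/ul_y is
    the upper-left map coordinate of the snapped subset.
    """
    # Columns increase to the east
    col_min = math.floor((xmin - x_origin) / cell_size)
    col_max = math.ceil((xmax - x_origin) / cell_size)   # exclusive
    # Rows increase to the south, so the top row comes from ymax
    row_min = math.floor((y_origin - ymax) / cell_size)
    row_max = math.ceil((y_origin - ymin) / cell_size)   # exclusive

    ul_x = x_origin + col_min * cell_size
    ul_y = y_origin - row_min * cell_size
    return col_min, row_min, col_max - col_min, row_max - row_min, ul_x, ul_y


# Raster with upper-left corner (0, 100) and 10 m cells;
# the polygon's bounding box is x: 12..47, y: 33..68.
print(snap_extent_to_grid(12, 33, 47, 68, 0, 100, 10))
# (1, 3, 4, 4, 10, 70)
```

Because the subset starts on a cell boundary of the input raster, every output cell coincides with exactly one input cell and no resampling is needed.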

Another issue is efficiency, because we usually have more than one raster to extract from. Naturally, this can be automated with a script. Since ArcGIS is not well supported on Linux, it is natural to use IDL/ENVI for the task. In ENVI, the concept of an ROI is used to extract from a raster. However, there is a catch: an ROI and its input raster have a one-to-one relationship, which means that you cannot use a single ROI to extract from all the rasters. The good news is that we can create the ROI dynamically from the shapefile using the API.

There are some articles online stating that you can open a shapefile directly and store it as an FID, which can then be used as the mask. This is NOT going to work. Instead, you need to create a shapefile object and retrieve all the boundary vertices; with those you can get all the values needed. The last step is to save the data with its spatial reference.
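For readers without ENVI, the idea of turning boundary vertices into a mask can be sketched in plain Python. This is only an illustration under my own assumptions: a cell counts as inside when its center falls inside the polygon (ENVI's ROI rule may differ), the polygon is a single ring without holes, and all function names are hypothetical.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: True if (x, y) lies inside the polygon ring."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_mask(vertices, x_origin, y_origin, cell_size, ncol, nrow):
    """Build a 0/1 mask on a north-up grid by testing each cell center."""
    mask = []
    for row in range(nrow):
        yc = y_origin - (row + 0.5) * cell_size
        mask.append([
            1 if point_in_polygon(x_origin + (col + 0.5) * cell_size, yc, vertices) else 0
            for col in range(ncol)
        ])
    return mask


def apply_mask(data, mask, missing_value):
    """Keep values where the mask is 1, write missing_value elsewhere."""
    return [
        [v if m else missing_value for v, m in zip(data_row, mask_row)]
        for data_row, mask_row in zip(data, mask)
    ]


# 4 x 4 raster with upper-left corner (0, 4) and 1-unit cells
square = [(1, 1), (3, 1), (3, 3), (1, 3)]
mask = polygon_mask(square, 0, 4, 1, 4, 4)
data = [[r * 4 + c for c in range(4)] for r in range(4)]
masked = apply_mask(data, mask, -9999)
```

The same mask can be reused for every raster that shares the grid, which is the main reason to build it once from the vertices.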

Some incomplete demo code in IDL is listed here:

    ;; fid_in, ns_in, nl_in, pos, ncol, nrow, missing_value and map_info
    ;; are defined elsewhere (not shown)
    ;; Read the shapefile
    oshp = OBJ_NEW('IDLffShape', shapefile_in)
    oshp -> GetProperty, N_ENTITIES = n_ent, ATTRIBUTE_INFO = attr_info, $
                         N_ATTRIBUTES = n_attr, ENTITY_TYPE = ent_type
    roi_shp = LONARR(n_ent)
    FOR ishp = 0, n_ent - 1 DO BEGIN
       entity = oshp -> GetEntity(ishp)
       ;; Only handle polygons (shape type 5)
       IF entity.SHAPE_TYPE EQ 5 THEN BEGIN
          record = *(entity.VERTICES)
          ;; Convert map coordinates to file (pixel) coordinates of the input raster
          ENVI_CONVERT_FILE_COORDINATES, fid_in, xf, yf, record[0, *], record[1, *]
          ;; Create the ROI on the input raster grid
          roi_shp[ishp] = ENVI_CREATE_ROI(ns = ns_in, nl = nl_in)
          ENVI_DEFINE_ROI, roi_shp[ishp], /polygon, xpts = REFORM(xf), ypts = REFORM(yf)
          IF ishp EQ 0 THEN BEGIN
             ;; Round to the nearest pixel so the subset starts on the input grid
             xmin = ROUND(MIN(xf))
             ymin = ROUND(MIN(yf))
          ENDIF ELSE BEGIN
             ;; There should be only one polygon in most cases; clean up and stop otherwise
             oshp -> DestroyEntity, entity
             OBJ_DESTROY, oshp
             RETURN
          ENDELSE
       ENDIF
       oshp -> DestroyEntity, entity
    ENDFOR
    OBJ_DESTROY, oshp
    ;; Build the mask from the ROI
    ENVI_MASK_DOIT, AND_OR = 1, /IN_MEMORY, ROI_IDS = roi_shp, $
                    ns = ns_in, nl = nl_in, /inside, $
                    r_fid = fid_mask
    ;; Define the output subset: [-1, first sample, last sample, first line, last line]
    dims_mask = [-1, xmin, (xmin + ncol - 1), ymin, (ymin + nrow - 1)]
    m_pos = [0]
    ;; !slash is assumed to be a user-defined system variable holding the path separator
    filename_out = year_out + !slash + prefix_out + year_str + day_str + envi_extension
    ;; Subset the input raster, apply the mask, and keep the result in memory
    ENVI_MASK_APPLY_DOIT, FID = fid_in, POS = pos, DIMS = dims_mask, $
                          M_FID = fid_mask, M_POS = m_pos, VALUE = missing_value, /in_memory, $
                          R_FID = fid_out
    ENVI_FILE_QUERY, fid_out, ns = ns_out, nl = nl_out, nb = nb_out, bname = bname_out, dims = dims_out
    data = ENVI_GET_DATA(fid = fid_out, dims = dims_out, pos = pos)
    ;; Write the output with the pre-defined spatial reference
    ENVI_WRITE_ENVI_FILE, FLOAT(data), map_info = map_info, out_name = filename_out, $
                          nb = nb_out, ns = ncol, nl = nrow, OUT_DT = 4


Feel free to try it out and give me feedback.

