
Scientific writing: how to prepare a good image figure

"A picture is worth a thousand words" (see Wikipedia): the adage underlines the importance of figures in writing.
There are many types of figures, and in Earth Science the image figure is one of the most important.
By image figure, I mean a figure that renders a matrix (raster) of values, such as
Link of the image.

(A selfie is also an image, but it is rarely seen in scientific writing. Although the above figure is taken from a website, it is publication-ready in my opinion.)

So what makes a good image figure? The figure above is a good example, and we can use it to identify the principal elements:
  • A clear and concise title
  • An appropriate map projection
  • A beautiful color bar and data presentation
  • Adequate description (data source, etc.)
Even without supplementary material, I believe most readers can understand this figure without difficulty.

So where are the challenges?
With some care, the title, projection, and description are usually easy to prepare. The challenging part is the color presentation of the matrix data.

For example, if the above image were rendered in black and white, I think all of us would be disappointed (this is not photography).
How about another color scale, such as one running from green to yellow? Probably not a good idea.
By now the point should be clear: colors matter.

We all know that nearly every color can be decomposed into red, green, and blue components. With 8 bits per channel, that gives more than 16 million colors (256^3 = 16,777,216). In practice, we seldom need that many.
A common way to choose the colors for the data is a color table, also called a color lookup table, in which only a limited number of colors (usually fewer than one thousand) are used for mapping.
There are several approaches to selecting these colors. Either way, we can produce much nicer color tables, such as this:

Link of the image.
The above color table contains exactly 200 colors. If the matrix contains no more than 200 unique values, each value can be given its own color and the image is displayed without any loss.
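To make the lookup idea concrete, here is a minimal sketch using NumPy. The blue-to-red ramp and the helper names `make_color_table` and `apply_color_table` are my own assumptions for illustration; they are not the color table shown in the figure above.

```python
import numpy as np

def make_color_table(n_colors=200):
    """Build an (n_colors, 3) uint8 RGB table fading from blue to red.

    The blue-to-red ramp is an illustrative assumption only.
    """
    t = np.linspace(0.0, 1.0, n_colors)
    red = np.round(255 * t)
    green = np.zeros(n_colors)
    blue = np.round(255 * (1.0 - t))
    return np.stack([red, green, blue], axis=1).astype(np.uint8)

def apply_color_table(indices, table):
    """Turn an integer index matrix into an RGB image by table lookup."""
    return table[indices]

table = make_color_table(200)
# A toy 2 x 3 matrix whose values are already valid table indices (0..199).
matrix = np.array([[0, 50, 100], [150, 199, 0]])
rgb = apply_color_table(matrix, table)  # shape (2, 3, 3): rows, columns, RGB
```

The lookup itself is a single indexing operation; all the real work is in deciding which index each data value should receive.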

However, most of the time our data has far too many unique values. A stretch method is then used to scale the data before mapping it to colors. There are several stretch methods for different purposes. ArcGIS Map has some detailed explanation of these methods here.

Most of the time a linear stretch will work, but sometimes it fails because of the data distribution: a few extreme values can compress everything else into a handful of colors.
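A linear stretch can be sketched as follows, again with NumPy. The function name `linear_stretch` and the sample values are assumptions chosen for illustration; the second example shows how a single outlier pushes the remaining values into one color.

```python
import numpy as np

def linear_stretch(data, n_colors=200, vmin=None, vmax=None):
    """Scale data linearly from [vmin, vmax] to color indices 0..n_colors-1."""
    data = np.asarray(data, dtype=float)
    vmin = data.min() if vmin is None else vmin
    vmax = data.max() if vmax is None else vmax
    if vmax <= vmin:
        # Constant data: everything maps to the first color.
        return np.zeros(data.shape, dtype=int)
    scaled = np.clip((data - vmin) / (vmax - vmin), 0.0, 1.0)
    return np.round(scaled * (n_colors - 1)).astype(int)

# Evenly spread data uses the whole color table.
even = np.array([[0.0, 2.5], [7.5, 10.0]])
even_idx = linear_stretch(even)

# One outlier (1000) forces the values 1, 2 and 3 into the same index.
skewed = np.array([1.0, 2.0, 3.0, 1000.0])
skewed_idx = linear_stretch(skewed)
```

In the skewed case, the three small values all land on index 0, so their differences become invisible on the map.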

However, when the data is stretched, we need to be careful with the color table, because the values read off the color table must be stretched in the same way. With a nonlinear stretch, it is no longer intuitive to guess a value from the map, since the colors are not evenly spaced in data units.
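As an illustration of this point, the sketch below uses a logarithmic stretch, which is only one assumed example of a nonlinear stretch. The helper `index_to_value` (a hypothetical name) inverts the stretch so that color bar ticks can be labeled in data units.

```python
import numpy as np

def log_stretch(data, vmin, vmax, n_colors=200):
    """Map positive data to color indices on a logarithmic scale."""
    data = np.clip(np.asarray(data, dtype=float), vmin, vmax)
    scaled = np.log(data / vmin) / np.log(vmax / vmin)
    return np.round(scaled * (n_colors - 1)).astype(int)

def index_to_value(index, vmin, vmax, n_colors=200):
    """Invert the log stretch: the data value a color index stands for."""
    frac = np.asarray(index, dtype=float) / (n_colors - 1)
    return vmin * (vmax / vmin) ** frac

skewed = np.array([1.0, 2.0, 3.0, 1000.0])
idx = log_stretch(skewed, vmin=1.0, vmax=1000.0)

# Evenly spaced positions on the color bar are NOT evenly spaced in data units.
tick_indices = np.array([0, 66, 133, 199])
tick_values = index_to_value(tick_indices, vmin=1.0, vmax=1000.0)
```

The small values now receive distinct colors, but the tick labels on the color bar grow roughly tenfold from one tick to the next, which is exactly what makes the map harder to read by eye.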

This is where classification comes into play. If we can group the matrix data into a few classes, it becomes straightforward to read the value range from the color. Besides, classification means that we only need a few colors, usually fewer than 20.

There are also quite a few methods of classification.
I will leave this work to the readers.
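As a starting point for that exercise, here is a minimal sketch of two common schemes, equal interval and quantile, built on `np.digitize`. The function names and the sample data are assumptions for illustration, and other schemes exist.

```python
import numpy as np

def equal_interval_breaks(data, n_classes):
    """Interior class breaks that split the data range into equal widths."""
    lo, hi = float(np.min(data)), float(np.max(data))
    return np.linspace(lo, hi, n_classes + 1)[1:-1]

def quantile_breaks(data, n_classes):
    """Interior class breaks that put roughly equal counts in each class."""
    probs = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]
    return np.quantile(data, probs)

def classify(data, breaks):
    """Assign each value a class number from 0 to len(breaks)."""
    return np.digitize(data, breaks)

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 1000.0])
equal_classes = classify(data, equal_interval_breaks(data, 4))
quantile_classes = classify(data, quantile_breaks(data, 4))
```

With this skewed sample, equal-interval breaks leave two of the four classes empty, while quantile breaks spread the values evenly across the classes. The price is that quantile class ranges have very different widths, so the legend has to state each range explicitly.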

To be continued...

