
Scientific writing: how to prepare a good image figure

The adage "A picture is worth a thousand words" (see Wikipedia) captures the importance of figures in writing.
There are many types of figures, and in Earth Science the image figure is one of the most important.
By image figure, I mean a figure that renders a matrix (raster) of values, such as
https://www.ncdc.noaa.gov/sotc/service/global/map-blended-mntp/201601.gif

(A selfie is also an image, but it is rarely seen in scientific writing. Although the figure above is taken from a website, it is publication-ready in my opinion.)

So what makes a good image figure? The figure above is a good example, and we can pick out its principal elements:
  • A clear and concise title
  • An appropriate map projection
  • A beautiful color bar and data presentation
  • Adequate description (data source, etc.)
Even without supplementary material, I believe most readers can understand this figure without difficulty.

So where are the challenges?
With a little care, the title, projection and description are usually easy to prepare. The challenging part is the color presentation of the matrix data.

For example, if the image above were rendered in black and white, I think all of us would be disappointed (this is not photography).
How about another color scale, such as one running from green to yellow? Probably not a good idea.
By now the point should be clear: colors matter.

We all know that nearly any color can be decomposed into red, green and blue components, and with 8 bits per channel there are more than 16 million (256^3) colors available. But in real life, we seldom need that many.
A common approach to choosing colors for the data is the color table, or color look-up table, in which only a limited number of colors (usually fewer than one thousand) are used for the mapping.
There are several approaches to selecting these colors. Either way, we can produce much nicer color tables, such as this one:

Link of the image.
The color table above contains exactly 200 colors. If the matrix contained no more than 200 unique values, each value could be given its own color and the image would be displayed perfectly.
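As a rough illustration of how a color table works, the sketch below builds a 200-entry table and uses it to color a small matrix. The blue-to-red endpoints, the function names and the sample matrix are all assumptions made for this example, and plain Python lists are used only to keep it free of dependencies.

```python
def build_color_table(start_rgb, end_rgb, n_colors):
    """Return n_colors RGB tuples interpolated linearly between two colors.

    n_colors must be at least 2.
    """
    table = []
    for i in range(n_colors):
        t = i / (n_colors - 1)  # 0.0 at the first entry, 1.0 at the last
        table.append(tuple(round(s + t * (e - s))
                           for s, e in zip(start_rgb, end_rgb)))
    return table


def apply_color_table(index_matrix, table):
    """Replace every integer index in the matrix with its RGB entry."""
    return [[table[index] for index in row] for row in index_matrix]


# Hypothetical endpoints: blue for the lowest index, red for the highest.
color_table = build_color_table((0, 0, 255), (255, 0, 0), 200)

# A tiny 2 x 3 matrix whose values are already indices in the range 0..199.
index_matrix = [[0, 100, 199],
                [50, 150, 199]]
image = apply_color_table(index_matrix, color_table)
```

Each cell of `image` is now an RGB tuple looked up from the table, so the matrix can only ever show the 200 colors the table contains.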


However, most of the time our data has far too many unique values. A stretch is then used to scale the data before mapping it to colors. There are several stretch methods for different purposes, and the ArcGIS documentation explains them in detail.

Most of the time a linear stretch works, but sometimes it fails because of the data distribution, for example when a few extreme values dominate the range.
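A minimal sketch of a linear (min-max) stretch is given below, assuming a 200-color table. The function name and both sample matrices are made up for illustration; the second matrix includes a single outlier to show how the distribution can defeat a linear stretch.

```python
def linear_stretch(matrix, n_colors):
    """Scale values linearly so the minimum maps to index 0 and the
    maximum maps to index n_colors - 1."""
    flat = [value for row in matrix for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Constant data: every cell gets the first color.
        return [[0 for _ in row] for row in matrix]
    scale = (n_colors - 1) / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in matrix]


# Evenly spread data: the indices span the whole color table.
even = [[0.0, 2.0, 4.0],
        [6.0, 8.0, 10.0]]
even_idx = linear_stretch(even, 200)

# One outlier (1000.0) stretches the range, so the other five values
# collapse onto the first two colors of the table.
skewed = [[0.0, 1.0, 2.0],
          [3.0, 4.0, 1000.0]]
skewed_idx = linear_stretch(skewed, 200)
```

In the skewed case almost the entire color table is spent on the empty range between 4 and 1000, which is why a different stretch may be needed for such data.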

However, once the data is stretched, we need to be careful with the color table, because reading a value back from a color requires applying the same stretch to the color bar. If the stretch is not linear, equal steps along the color bar no longer correspond to equal steps in value, so guessing a value from the map is no longer intuitive.
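To make this concrete, here is a sketch that uses a logarithmic stretch as one example of a non-linear stretch (the post does not say which stretch is meant). The data range of 1 to 1000 and the table size of 201, chosen so that index 100 is the exact middle of the color bar, are assumptions for illustration.

```python
import math


def log_stretch(value, vmin, vmax, n_colors):
    """Map a positive value to a color index using a logarithmic stretch."""
    span = math.log10(vmax) - math.log10(vmin)
    t = (math.log10(value) - math.log10(vmin)) / span
    return round(t * (n_colors - 1))


def index_to_value(index, vmin, vmax, n_colors):
    """Invert log_stretch: return the data value a color index stands for."""
    span = math.log10(vmax) - math.log10(vmin)
    t = index / (n_colors - 1)
    return 10 ** (math.log10(vmin) + t * span)


# Hypothetical data range and color table size.
VMIN, VMAX, N_COLORS = 1.0, 1000.0, 201

# Labels for the bottom, middle and top of the color bar.
tick_indices = [0, 100, 200]
tick_labels = [index_to_value(i, VMIN, VMAX, N_COLORS) for i in tick_indices]
```

With these assumptions the color in the middle of the bar stands for a value of about 31.6, not the 500.5 a reader would guess by assuming a linear scale, so the color bar labels have to be computed through the inverse of the stretch.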

This is where classification comes into play. If we classify the matrix data into a few classes, it becomes straightforward to read the value range from a color. In addition, classification means we only need a few colors, usually fewer than 20.
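As a sketch of the idea, the example below uses equal-interval breaks, which is only one of many possible classification methods and is picked here for simplicity. The function names, the range of -4 to 4 and the sample values are assumptions for illustration.

```python
from bisect import bisect_right


def equal_interval_breaks(vmin, vmax, n_classes):
    """Return the n_classes - 1 interior breaks that split [vmin, vmax] evenly."""
    width = (vmax - vmin) / n_classes
    return [vmin + width * k for k in range(1, n_classes)]


def classify(matrix, breaks):
    """Replace each value with the index of the class it falls in.

    A value equal to a break is assigned to the upper class.
    """
    return [[bisect_right(breaks, value) for value in row] for row in matrix]


def legend_labels(vmin, vmax, breaks):
    """Build one 'low to high' legend label per class."""
    edges = [vmin] + breaks + [vmax]
    return [f"{lo:g} to {hi:g}" for lo, hi in zip(edges, edges[1:])]


# Hypothetical values split into 4 classes over the range -4 to 4.
data = [[-3.5, -0.5, 0.5],
        [1.5, 3.9, -1.9]]
breaks = equal_interval_breaks(-4.0, 4.0, 4)
classes = classify(data, breaks)
labels = legend_labels(-4.0, 4.0, breaks)
```

Each class index needs only one color, and the matching entry in `labels` states the value range that color represents, so the legend can be read directly.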

There are also quite a few classification methods.
I will leave exploring them to the reader.

To be continued...












