### Evaluate the stream networks from watershed delineation

Recently I asked a question on the GIS Exchange site:
https://gis.stackexchange.com/questions/315910/quantitatively-evaluating-quality-of-watershed-delineation-stream-line-results

I also asked the question on Research Gate:
https://www.researchgate.net/post/How_to_quantitatively_evaluate_the_quality_of_the_watershed_delineation_stream_line_results

The reason is that we have developed a watershed delineation model and we need to evaluate whether our model performs better or not than the previous method.

So we set out trying to find ways to evaluate the results.
The first thing as a watershed hydrologist will usually do is to look at the stream segments. If they match up with actual stream lines then it means the model is not bad.

However, both our method and the previous method can produce similar stream segment results. The the question is how can we say which one is better than the other one.

So we did some research online, and most publications used visual results as proof. Basically,  we literally "see" the differences between different models.

Then there are also a few papers include slope, length for comparison, which still does not actually show which one is better effectively.

Among them, there is one paper proposed that by calculating distance between simulated stream and actual stream, we can determine how close the model is away from the truth. And I think this might be the closest we can find.

However, the idea of distance between two line vectors is not as intuitive as it looks.

So we kept researching.
Then we realized that if we place two stream line vectors together, we can get a polygon created by them. And if the two vectors are very close, then the area of the polygon should be small.
This is illustrated by the figure:
Figure 1. The stream line vector with the true stream lines.

The new question becomes:
How can we calculate the area of this "intersected area"?
There is no existing tool can do this task in GIS. Because we are intersecting polylines and we need to create multiple polygons. It would be possible if we have the mathematical formula of each line and the locations of each intersection. By any means, it is a challenge in precise GIS.

So instead of thinking in the vector domain, we decided to try the raster domain. Luckily, we can easily convert these vectors into raster and do raster calculation. TLDR, the new question is similar to calculating the area of a closed domain.

And there are several related questions. For example this one:
https://stackoverflow.com/questions/21816562/finding-holes-in-2d-point-sets

If you have ever used Photoshop, you probably have used the magic tool, which actually solves the problem we have here.

In one sentence, in order to calculate the area of the closed holes, we need to fill the rest of the raster.
Since I did not use any advanced fill algorithm at the time of solving the problem, a brutal algorithm was used. Basically it fills the raster from boundary towards inside. And it took a long time to run.

Figure 2. The fill algorithm. The x-axis is step, and the y-axis the remaining pixel count.

Of course, if flood filling algorithm will significantly reduce the computational time.
And the result shows that our method performs better under coarse resolution.

The take away message: I was surprised that a watershed delineation evaluation process will lead to flood filling algorithm, which is already used in the depression filling process!

For those who is not familiar with flood filling algorithm, here is more information:

Let me know if you have any question.

### Spatial datasets operations: mask raster using region of interest

Climate change related studies usually involve spatial datasets extraction from a larger domain.
In this article, I will briefly discuss some potential issues and solutions.

In the most common scenario, we need to extract a raster file using a polygon based shapefile. And I will focus as an example.

In a typical desktop application such as ArcMap or ENVI, this is usually done with a tool called clip or extract using mask or ROI.

Before any analysis can be done, it is the best practice to project all datasets into the same projection.

If you are lucky enough, you may find that the polygon you will use actually matches up with the raster grid perfectly. But it rarely happens unless you created the shapefile using "fishnet" or other approaches.

What if luck is not with you? The algorithm within these tool usually will make the best estimate of the value based on the location. The nearest re-sample, but not limited to, will be used to calculate the value. But what about the outp…

### Numerical simulation: ode/pde solver and spin-up

For Earth Science model development, I inevitably have to deal with ODE and PDE equations. I also have come across some discussion related to this topic, i.e.,

https://www.researchgate.net/post/What_does_one_mean_by_Model_Spin_Up_Time

In an attempt to answer this question, as well as redefine the problem I am dealing with, I decided to organize some materials to illustrate our current state on this topic.

Models are essentially equations. In Earth Science, these equations are usually ODE or PDE. So I want to discuss this from a mathematical perspective.

Ideally, we want to solve these ODE/PDE with initial condition (IC) and boundary condition (BC) using various numerical methods.
https://en.wikipedia.org/wiki/Initial_value_problem
https://en.wikipedia.org/wiki/Boundary_value_problem

Because of the nature of geology, everything is similar to its neighbors. So we can construct a system of equations which may have multiple equation for each single grid cell. Now we have an array of equation…

### Lessons I have learnt during E3SM development

I have been involved with the E3SM development since I joined PNNL as a postdoc. Over the course of time, I have learnt a lot from the E3SM model. I also found many issues within the model, which reflects lots of similar struggles in the lifespan of software engineering.

Here I list a few major ones that we all dislike but they are around in almost every project we have worked on.

Excessive usage of existing framework even it is not meant to Working in a large project means that you should NOT re-invent the wheels if they are already there. But more often, developers tend to use existing data types and functions even when they were not designed to do so. The reason is simple: it is easier to use existing ones than to create new ones. For example, in E3SM, there was not a data type to transfer data between river and land. Instead, developers use the data type designed for atmosphere and land to do the job. While it is ok to do so, it added unnecessary confusion for future development a…