Be careful what you wish for: Error measures for regression using R

Almost 10 years ago I was working with evolutionary strategies for tuning neural networks for time series prediction when I became curious about error measures and their effect on the final forecast. In general, evolutionary algorithms use a fitness function that is based on an error measure. The objective is to obtain better individuals by minimizing (or maximizing) the fitness function. Thus, to determine which model is “the best”, the performance of a trained model is evaluated against one or more criteria (e.g. an error measure). However, the relation between the lowest error and the “best model” is complex, and the error measure should be chosen according to the desired goal (i.e. forecasting averages, forecasting extremes, deviance measures, relative errors, etc.).

There is a journal paper that describes the error measures used in the ‘qualV’ package, which implements several quantitative validation methods. The paper (which is very interesting, by the way) also has some examples of how the error measures change when dealing with noise, shifts, nonlinear scaling, etc.

The objective of this post is just to show the problem and to raise awareness of it when picking the best model based on error alone. Minimizing one error measure does not guarantee minimizing all the others, and it can even lead to a Pareto front. Here I am using some of the functions described in the paper and, for simplicity, I am comparing only 4 error measures: mean absolute error (MAE), root-mean-square error (RMSE), correlation coefficient (r) and mean absolute percentage error (MAPE). Each error measure captures a distinct characteristic of the time series and each of them has strong and weak points. I am using R version 3.3.2 (2016-10-31) on Ubuntu 16.04.
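To make the four measures concrete, here is a minimal base R sketch of their usual textbook definitions. The function names (`mae`, `rmse`, `mape`, `pearson_r`) are made up for this illustration and are not the qualV API; the qualV implementations used later in the post may differ in details such as argument handling.

```r
# Base R sketches of the four measures (o = observed, p = predicted).
# Function names are illustrative only; they are not part of qualV.
mae       <- function(o, p) mean(abs(o - p))
rmse      <- function(o, p) sqrt(mean((o - p)^2))
mape      <- function(o, p) 100 * mean(abs((o - p) / o))  # breaks down if any o is 0
pearson_r <- function(o, p) cor(o, p)

# Tiny made-up example
o <- c(1, 2, 4)
p <- c(1, 3, 2)
c(MAE = mae(o, p), RMSE = rmse(o, p), r = pearson_r(o, p), MAPE = mape(o, p))
```

Note that MAE and RMSE are in the units of the series, MAPE is relative to the observed values, and r only looks at how the two series co-vary.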

Case (i): Adding noise

Let's say the original signal is given by:
y = a\sin(\pi b x) + s.

With x in [0,1], a=1.5, b=2, and s=0.75:

x = seq(0,1,by=.005)
ysignal = 1.5*sin(pi*2*x)+0.75
plot(x,ysignal,main = "Main signal")


Let's change the original signal and check how this affects the final result. If the “forecast” is the same as the signal then all the errors should be 0 (and the correlation 1). Applying some noise to the signal, s=(0.75+noise), where the noise is drawn from a Gaussian distribution with mean=0 and standard deviation=0.2, and comparing with the original signal we get:

library(qualV)
n = 0.2 #noise level
noise = rnorm(length(x),sd=n)
ynoise = ysignal+noise
par(mfrow=c(1,2))
range.yy <- range(c(ysignal,ynoise))
plot(x,ysignal,type='l',main = "Adding noise"); lines(x,ynoise,col=2)
plot(ynoise,ysignal,ylim=range.yy,xlim=range.yy,main = "Signal vs Forecast") 


round(MAE(ysignal,ynoise),2)
## [1] 0.16
round(RMSE(ysignal,ynoise),2)
## [1] 0.21
round(cor(ysignal,ynoise),2)
## [1] 0.98
round(MAPE(ysignal,ynoise),2)
## [1] 40.65

Case (ii): Shifting the signal

Let's apply a shift to the values of the original signal. With s=0.95 we have:

yshift = ysignal+0.2


round(MAE(ysignal,yshift),2)
## [1] 0.2
round(RMSE(ysignal,yshift),2)
## [1] 0.2
round(cor(ysignal,yshift),2)
## [1] 1
round(MAPE(ysignal,yshift),2)
## [1] 60.95
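The large MAPE is worth a closer look: the absolute error is 0.2 at every point, yet the percentage error is not. A rough sketch of why, assuming the usual definition 100·|o − p|/|o| for each point (an assumption here, not taken from the qualV source): the signal crosses zero, so a few points with |o| close to 0 contribute very large terms to the mean.

```r
# Rebuild the signal and the shifted "forecast" from the post
x <- seq(0, 1, by = 0.005)
ysignal <- 1.5 * sin(pi * 2 * x) + 0.75
yshift <- ysignal + 0.2

# Per-point absolute percentage error (assumed definition, see text)
ape <- 100 * abs(ysignal - yshift) / abs(ysignal)

mean(abs(ysignal - yshift))            # absolute error: 0.2 everywhere
summary(ape)                           # percentage error: far from uniform
x[order(ape, decreasing = TRUE)[1:4]]  # x positions of the largest terms
```

The largest terms sit next to the zero crossings of the signal, which is why MAPE can be misleading for series that take values near zero.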

Case (iii): Shift + rescale

Let's apply a shift and also rescale the values of the original signal. Multiplying the signal by 0.8 and adding 0.2 (which corresponds to a=1.2 and s=0.8) we have:

yresshift = 0.8*ysignal+0.2


round(MAE(ysignal,yresshift),2)
## [1] 0.19
round(RMSE(ysignal,yresshift),2)
## [1] 0.22
round(cor(ysignal,yresshift),2)
## [1] 1
round(MAPE(ysignal,yresshift),2)
## [1] 61.66
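The correlation of 1 in cases (ii) and (iii) is no accident: Pearson's r is unchanged by any shift and any positive rescaling of the forecast, so on its own it says nothing about bias or amplitude. A small sketch of this, where `ybad` is a hypothetical forecast invented for the illustration:

```r
x <- seq(0, 1, by = 0.005)
ysignal <- 1.5 * sin(pi * 2 * x) + 0.75

# Positive rescale plus shift, as in case (iii): correlation stays at 1
yresshift <- 0.8 * ysignal + 0.2
cor(ysignal, yresshift)

# Hypothetical, much worse forecast: half the amplitude, offset by 2
ybad <- 0.5 * ysignal + 2
cor(ysignal, ybad)                  # still 1 (up to floating point)
mean(abs(ysignal - ybad))           # but the mean absolute error is large

# A negative scale factor flips the sign of the correlation
cor(ysignal, -0.8 * ysignal + 0.2)
```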

Case (iv): Changing the frequency

In this case let's slightly vary the frequency of the original signal, making b=2.11:

yfreq = 1.5*sin(pi*2.11*x)+0.75


round(MAE(ysignal,yfreq),2)
## [1] 0.17
round(RMSE(ysignal,yfreq),2)
## [1] 0.22
round(cor(ysignal,yfreq),2)
## [1] 0.98
round(MAPE(ysignal,yfreq),2)
## [1] 89.33

Each case has the original series (in black) and the possible “forecast” (in red). I also plotted the original series (signal) versus the forecast. Which case would you pick as the best forecast? What is your assumption?
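One way to explore the question is to put the four cases side by side and rank them under each measure. The sketch below uses base R definitions rather than qualV, an arbitrary seed for the noise case (so the noise row will not reproduce the numbers above exactly), and a simple dominance check to list the cases that no other case beats on every measure. Treat it as an illustration under those assumptions, not as the method from the paper.

```r
set.seed(1)  # arbitrary seed, only so the noise case is repeatable
x <- seq(0, 1, by = 0.005)
ysignal <- 1.5 * sin(pi * 2 * x) + 0.75

forecasts <- list(
  noise   = ysignal + rnorm(length(x), sd = 0.2),
  shift   = ysignal + 0.2,
  rescale = 0.8 * ysignal + 0.2,
  freq    = 1.5 * sin(pi * 2.11 * x) + 0.75
)

# One row per case, one column per measure
errors <- t(sapply(forecasts, function(p) c(
  MAE  = mean(abs(ysignal - p)),
  RMSE = sqrt(mean((ysignal - p)^2)),
  r    = cor(ysignal, p),
  MAPE = 100 * mean(abs((ysignal - p) / ysignal))
)))
round(errors, 2)

# Make every column "lower is better"; rounding 1 - r avoids ranking
# cases on floating-point noise when the correlation is essentially 1
loss <- errors
loss[, "r"] <- round(1 - errors[, "r"], 10)
ranks <- apply(loss, 2, rank)
ranks

# A case is dominated if another case is at least as good on every
# measure and strictly better on at least one
dominated <- sapply(seq_len(nrow(loss)), function(i) {
  any(sapply(seq_len(nrow(loss))[-i], function(j) {
    all(loss[j, ] <= loss[i, ]) && any(loss[j, ] < loss[i, ])
  }))
})
rownames(loss)[!dominated]  # candidates on the Pareto front
```

If the ranks differ from column to column, then “the best forecast” depends on which measure you decided to trust beforehand.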

How is Artificial Intelligence Helping the Environment?

Recently, one of the most intelligent men alive, the theoretical physicist Stephen Hawking, said in an interview: “The development of full artificial intelligence could spell the end of the human race.” Should we be afraid of artificial intelligence (AI) because machines could take over from the human race (as in The Terminator)? From the point of view of environmental science, AI has so far been really helpful. First, what is AI? There are different definitions, but one of my favourites is:

[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning…
(Bellman 1978)

Thus, it is basically the use of machines (e.g. computers) to solve problems and to help with complex decisions. There is a branch of AI called machine learning (ML). It is a scientific discipline that studies computational algorithms that can learn from data. One application of ML algorithms is to use particular data to perform classification and numerical regression. But how is ML helping the environment?

Satellites (remote sensing) generate thousands of images every day, covering the globe in different frequency bands (infrared, microwave, visible to the human eye, etc.) and at different times and scales. But how useful can those images be? One example is detecting phytoplankton in the ocean. Phytoplankton are important components that sustain the aquatic food web, and their importance goes beyond being food for krill. Accurate estimates of chlorophyll concentrations (and consequently of phytoplankton) are essential for estimating primary productivity, biomass, etc.

Image Credit: NASA


It is possible to use satellites to detect the presence of phytoplankton in the ocean thanks to the concentration of chlorophyll in the surface water. Each frequency channel has a purpose. For example, at certain wavelengths sand reflects more energy than green vegetation, while at other wavelengths it absorbs more (reflects less) energy. However, even with satellites this is not an easy task. Aerosol concentrations can affect the ocean colour seen from the satellite. Further complications arise when there are also suspended sediments and/or dissolved organic matter from decayed vegetation in the water. In addition, coastal water quality gradually degrades with increased pollution and human activities. To overcome those problems, ML algorithms such as artificial neural networks and support vector machines are used to automatically classify (separating chlorophyll from aerosols, pollution, etc.) and detect the presence of phytoplankton in the ocean.

Credit: NASA

It is also possible to use remote sensing to classify and detect land cover, applying the same algorithms to identify and classify different types of vegetation, including forests, dead trees in the forest, forest fires, the portion of trees regenerated after forest fires, etc. With accurate information it is possible to avoid more deforestation, track urbanization, mitigate diseases, understand and control ecosystems, support planning, etc.


These are only small samples of how AI is used in environmental sciences. There are so many contributions that it is unfair to give only a few examples.

More about:

http://earthobservatory.nasa.gov/Features/RemoteSensing/remote.php

Sources:

Hsieh, W. (2009). Machine Learning Methods in the Environmental Sciences. Cambridge University Press. DOI: 10.1017/CBO9780511627217

Keiner, L., & Yan, X. (1998). A Neural Network Model for Estimating Sea Surface Chlorophyll and Sediments from Thematic Mapper Imagery. Remote Sensing of Environment, 66 (2), 153-165. DOI: 10.1016/S0034-4257(98)00054-6

Schiller, H., & Doerffer, R. (2005). Improved determination of coastal water constituent concentrations from MERIS data. IEEE Transactions on Geoscience and Remote Sensing, 43 (7), 1585-1591. DOI: 10.1109/TGRS.2005.848410

Dash, J., Mathur, A., Foody, G., Curran, P., Chipman, J., & Lillesand, T. (2007). Land cover classification using multi‐temporal MERIS vegetation indices. International Journal of Remote Sensing, 28 (6), 1137-1159. DOI: 10.1080/01431160600784259

Changes in the Water Cycle Expected with Climate Change. Are We Doomed?

Everybody knows we are evolving as human beings. Is this true? When I see how clean water has been handled, I have some questions. More than a billion people across the globe don’t have access to safe water. Every day 3900 children die as a result of insufficient or unclean water supplies. The situation can only get worse as water gets ever more scarce. A world without clean water: how many times have I heard that? Humankind is polluting, wasting, diverting, pumping, and degrading the clean water that we have. On top of that, water has been privatized. Why? Because it is becoming rare, and only what is rare is valuable! The rampant over-development of agriculture, housing and industry increases the demand for fresh water well beyond the finite supply, resulting in the desertification of the earth. There are companies now saying: why don’t we bottle it, mine it, divert it, sell it, commodify it? Corporate giants force developing countries to privatize their water supply for profit. Wall Street investors target desalination and mass bulk water export schemes. Corrupt governments use water for economic and political gain. Military control of water emerges and a new geopolitical map and power structure forms, setting the stage for world water wars. The following two documentaries show how the problem is affecting countries around the world. It is interesting how the two documentaries cover the same topic and complement each other.


So, why can’t we be friends with nature? Is there any hope? According to this recent paper from Nature Climate Change:

Adaptation of water resources management will help communities adjust to changes in the water cycle expected with climate change, but it can’t be fixed by innovations alone.

The paper talks about the Pangani River, where the Tanzania Electric Supply Company has three hydropower plants. There, climate change is affecting the water cycle, changing precipitation amounts and drought duration, which is altering the way farmers, pastoralists and Tanzania’s energy company manage water. All over the world new techniques and planning approaches have been developed. Urban and rural development plans are (sometimes) moving away from large, static projects by combining sustainable approaches from engineering and ecology.

For the Pangani River, leaders adjusted water allocation policies to the changing needs of the communities. Still, they made water availability for ecosystems a main priority by maintaining at least a minimum flow of water to wetlands, riparian forests and mangroves, to provide water for wildlife including fish, plants for medicinal use, timber and fruits, for example. Then, as the region’s population swelled, water uses for urban centres were balanced with the needs of subsistence farmers, pastoralists and the Tanzanian energy company. That same kind of flexibility is the hallmark of the new thinking on water management, which replaces reliance on large, long-lived concrete infrastructure, often built all at once and designed based on historical climatic conditions.

It makes sense. Rather than isolating water management issues within a single field, such as engineering or hydrology, the team that solves these problems should include economists, hydrologists, policymakers and engineers. Proposed solutions include redesigning water treatment plants so that they can accommodate extreme rainfall, and adding city orchards and grassed bio-swales (which resemble marshy depressions in the land) to slow the flow of storm water from sidewalks. They will act as green sponges all over the city. Thus the water gets soaked up, which avoids pumping every time it rains.

Another good example comes from Japan, where it is possible to be sustainable (of course I am not talking about the Japanese nuclear power stations). Over centuries they reshaped the land so that people and nature could remain in harmony. For the Japanese, it is important enough that they have a special word for it, satoyama: villages where mountains give way to plains. The satoyama landscape is a system in which agricultural practices and natural resource management techniques are used to optimize the benefits derived from local ecosystems. In the Satoyama villages, each home has a built-in pool or water tank that lies partly inside, partly outside its walls. A continuous stream of spring water is piped right into a basin, so fresh water is always available. People rinse out pots in the tank and clean their freshly picked vegetables. If they simply poured the food scraps back into the water, they would risk polluting the whole village supply. However, carp do the washing up there, scouring out even the greasy or burnt pans. Cleaned up by the carp, the tank water eventually rejoins the channel. This documentary talks about the Satoyama villages:

In the Satoyama villages the products obtained (including food and fuel) help safeguard the community against poverty, but without degrading the land, water or other resources. Of course documentaries are biased towards the ideas they want to show, but can you spot the difference? Also, is water public or private? Am I saying no more bottled water? Am I saying everybody should live in a Satoyama village? No. However, a balance must exist between extraction and use. We need to reinvent ourselves.

Do you have anything to say? I’d like to hear your opinion.

More information:

http://satoyama-initiative.org/

http://onlinelearning.unu.edu/en/the-satoyama-initiative/


Journal References:
Palmer, L. (2014). The next water cycle. Nature Climate Change, 4 (11), 949-950. DOI: 10.1038/nclimate2420

Takeuchi, K., Brown, R. D., Washitani, I., Tsunekawa, A., & Yokohari, M. (2003). Satoyama: The Traditional Rural Landscape of Japan. Springer Japan. DOI: 10.1007/978-4-431-67861-8

Weather Forecasting: Is it better to toss a coin?

Why is weather forecasting always wrong? Have you asked yourself the same question? Have you cursed the weather forecast when you were expecting a sunny day and then it rained? Does this picture look familiar?


Once I was in Toronto when the forecast for the next day was a blizzard. Basically, lots of snow. When that day finally arrived we had half of the expected snow. A friend of mine said: “It is the government. They say more snow will fall than is actually expected, to scare people.” Well, at that time my knowledge about weather and atmospheric science was minimal. I did not know what to think. Was my friend right? Is the government really doing this? What is the real reason behind it? Are the people responsible for the weather forecast incompetent? A few years ago I gave a seminar with a brief explanation of how hard it is to predict the weather. Unfortunately the slides do not come with detailed information. Thus, to answer some questions about weather forecasting I will write a series of (not consecutive) posts explaining why weather is so hard to predict. In addition, I will try to give an overview of how it is predicted. I will add the posts under the category “Weather Forecasting”.

Explaining the whole weather forecasting problem is really hard, almost impossible. For example:

Despite the detailed knowledge about precipitation including the complete hydrological cycle (evaporation, water vapour, convection, condensation, clouds, soil moisture, groundwater and the origin of rivers), predicting precipitation accurately is still one of the most difficult tasks in meteorology (Kuligowski & Barros, 1998)

I know the paper is old but the problem persists. Even in 2014 precipitation is still a major forecasting challenge. Some of the reasons are:

  • The chaotic nature of the atmosphere and the complexity of the processes that are involved in precipitation
  • The difficulties of precipitation measurements including problems with rain gauges, radar and satellites
  • The limited temporal and spatial scales of numerical weather prediction (NWP) models

My goal is to provide information about the most important parts of weather forecasting. Of course, if I am missing something by the end of the posts, please let me know. I hope the posts will be enough for anyone to see that the scientists are doing a really good job and are really hard-working people, and that when they miss it is not because of conspiracy or incompetence. It is because the problem is really hard.

 

Journal references:

Kuligowski, R., & Barros, A. (1998). Localized Precipitation Forecasts from a Numerical Weather Prediction Model Using Artificial Neural Networks. Weather and Forecasting, 13 (4), 1194-1204. DOI: 10.1175/1520-0434(1998)0132.0.CO;2

Can We Really Count on Plants to Slow Down Global Warming?

The idea is simple. Fact 1: Plants reduce CO2 in the atmosphere through photosynthesis. Fact 2: Increasing CO2 in the atmosphere stimulates plant growth. Thus fact 1 + fact 2 is the perfect scenario. If there is more CO2 in the atmosphere and plants are growing more because of that, the solution to global warming is to plant more trees, right? Well, not really. There is a missing piece called the carbon cycle.


Phantom power? Call the Ghostbusters

One of the hot topics of the moment is global warming. In 2007 the Intergovernmental Panel on Climate Change (IPCC) reported that scientists were more than 90% certain that most of global warming was being caused by increasing concentrations of greenhouse gases produced by human activities. Fossil fuel burning has produced about three-quarters of the increase in CO2 from human activity over the past 20 years. Coal burning was responsible for 43% of the total emissions, oil 34%, and gas 18%. Steam generators in large power plants also burn considerable amounts of fossil fuels and therefore emit large amounts of CO2 into the atmosphere. Thus, the need for more energy means the need for more fossil fuel burning (the ideal scenario is renewable energy, but that is another post).

Phantom power (also called standby power, vampire power, vampire draw, phantom load, or leaking electricity) is what happens when electronic devices are left plugged in, using a significant amount of power. Some devices cannot be turned ‘off’ without being unplugged, while others continue to draw power while not performing their primary purpose. It’s costing you money. It’s also costing our planet even more with wasteful carbon emissions.

The Sound of The Arctic Ice Death Spiral

Much of the Arctic Ocean is covered by sea ice, which varies in extent and thickness seasonally. Arctic sea ice extent shrinks during the summer and grows during the winter every year (reaching its maximum in March and its minimum in September). However, a net loss of sea ice has been observed in recent decades. For example, the average ice extent for March 2014 was the fifth lowest for the month in the satellite record, which supports the idea of sea ice decline.

Monthly March ice extent for 1979 to 2014 shows a decline of 2.6% per decade relative to the 1981 to 2010 average. Credit: National Snow and Ice Data Center

In 2012, Peter Wadhams published a paper talking about how fast sea ice decline is happening:

Arctic sea ice extent had been shrinking at a relatively modest rate of 3-4% per decade (annually averaged) but after 1996 this speeded up to 10% per decade and in summer 2007 there was a massive collapse of ice extent to a new record minimum of only 4.1 million km2. Thickness has been falling at a more rapid rate (43% in the 25 years from the early 1970s to late 1990s) with a specially rapid loss of mass from pressure ridges. The summer 2007 event may have arisen from an interaction between the long-term retreat and more rapid thinning rates.

But what exactly is the Arctic ice death spiral?

Can We Hear The Sound of Our Warming Planet?

Does it sound weird to you? Great, you are not alone. I had the same reaction, but people from the University of Minnesota combined music and environmental data.

It is not news that our planet is warming. OK, let's stand on the shoulders of giants (I hate when someone uses those quotes and I don’t know where they came from). In 2010 J. Hansen and his fellow scientists published an article in Reviews of Geophysics. They used the Goddard Institute for Space Studies (GISS) (NASA) analysis of global surface temperature change. Yes, there are different groups of scientists doing different analyses of global temperatures. So when you hear someone saying: “scientists said that average global temperature is rising”, it is a specific group using some specific data (they do not always work together). However, these analyses are not totally independent because they must use much the same information (same satellites, meteorological stations, etc.). Roughly speaking, they complement each other.

Could climate change increase the price of airfare tickets (consequently tour costs)?

It is really funny how things work on the internet nowadays. You start looking for something and you end up reading unexpected things. I was writing and reading about what turbulence is and how to avoid it (last two posts) when I found an interesting recent (2013) paper talking about the possible intensification of turbulence due to climate change.

The paper is quite interesting. They define turbulence in an elegant way:

…turbulence when they encounter vertical airflow that varies on horizontal length scales greater than, but roughly equal to, the size of the plane.

The Australian Transport Safety Bureau (ATSB) suggested that cases of turbulence had risen and that incidents doubled over the three-month period between October and December last year, compared to the previous quarter. Also, moderate-or-greater upper-level turbulence has been found to increase over the period 1994–2005 in pilot reports in the United States.