# Snowfall Inspired video

Continuing my series of posts about making music inspired by videos/images I previously recorded of weather/climate events. This episode occurred when an unusual amount of snow fell during a short period of time this last winter. The term unusual is used because this episode was unusual by current standards. The same amount of snow used to fall in the past, and back then that was a normal event.

I called this video Snow – Spock’s finger as a joking nod to the album Snow by the progressive rock band Spock’s Beard.

# I Believe In Gnomes, Santa Claus And The Weather Man

It’s been a while since my last post. Basically the struggle of any artist, be happy or make money!

Anyway, I have been experimenting and making music inspired by videos/images I previously recorded of weather/climate events. It is more or less a little side project that will help me when creating the videos for the climate change prog rock opera.

I made this video when a series of storms hit the place where I live.

The reason I called this song I Believe In Gnomes, Santa Claus And The Weather Man is that sometimes I have the feeling that people believe more in gnomes than in the weather man's forecast. Forecasters are not that bad and they do a really good job most of the time. Weather forecasting is hard!

# Think Python AND R and not just PYTHON OR R: Creating vectors

In this post I will talk a little bit about how python and R work with vectors. I am using python 2.7.13 and R 3.4.3, both 64-bit on Ubuntu 16.04. I am also using the free book A Hands-On Introduction to Using Python in the Atmospheric and Oceanic Sciences by prof. Johnny Lin as a guide.

## Creating vectors

The first thing about a vector is how to create one. There are several ways to create a vector in R. For example, if one needs to create a logical vector:

a = vector(mode = "logical", length = 5)
b = c(FALSE, FALSE, FALSE, FALSE, FALSE)
c = rep(FALSE,5)

## [1] FALSE FALSE FALSE FALSE FALSE
## [1] FALSE FALSE FALSE FALSE FALSE
## [1] FALSE FALSE FALSE FALSE FALSE


or a numerical vector:

a = vector(mode = "numeric", length = 5)
b = c(0, 0, 0, 0, 0)
c = rep(x=0, 5)

## [1] 0 0 0 0 0
## [1] 0 0 0 0 0
## [1] 0 0 0 0 0


The vector() function produces a vector of the given length and mode. The c() function is a generic function which combines its arguments. The rep() function replicates the values in x and also returns a vector.

In python we can use the numpy package, which has lots of methods to create and manipulate arrays/vectors. After importing numpy we can generate similar vectors in python:

import numpy as np
a = np.array([False,False,False,False])
b = np.full(4, False, bool)
print a
print b

## [False False False False]
## [False False False False]


or a numerical vector:

import numpy as np
a = np.array([0,0,0,0])
b = np.full(4, 0, float)
print a
print b

## [0 0 0 0]
## [ 0.  0.  0.  0.]


Why are they different? Remember dynamic typing? Vector a is a vector of integers and b is a vector of floats. The attribute dtype gives the data type of the array’s elements:

c = np.array([0.0,0.0,0.0,0.0])
print a.dtype
print b.dtype
print c
print c.dtype

## int64
## float64
## [ 0.  0.  0.  0.]
## float64


## Indexing Vectors

One major difference between python and R is how they address the element in the vector. In python the element addresses start with zero, so the first element of vector a is a[0], the second is a[1], etc.

import numpy as np
d = np.array(range(1,5))
print d
print d[0], d[3]

## [1 2 3 4]
## 1 4


In R the element addresses follow the ordinal value, thus starting from one. Consequently the first element of vector a is a[1], the second is a[2], etc.

d = seq(4)
print(d)
cat(d[1], d[4], sep=" ")

## [1] 1 2 3 4
## 1 4
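
The off-by-one difference between the two conventions can be captured in a pair of tiny helpers. This is only a sketch: the function names are hypothetical (they are not part of python, numpy or R), and it is written for Python 3 rather than the python 2.7 used in this post.

```python
# Hypothetical helpers for translating positions between the two conventions.

def r_to_python_index(i):
    """Convert a 1-based (R-style) position to a 0-based (python-style) index."""
    if i < 1:
        raise ValueError("R positions start at 1")
    return i - 1


def python_to_r_index(i):
    """Convert a 0-based (python-style) index to a 1-based (R-style) position."""
    if i < 0:
        raise ValueError("negative python indices count from the end")
    return i + 1


d = [1, 2, 3, 4]
first = d[r_to_python_index(1)]   # same element as d[1] in R
last = d[r_to_python_index(4)]    # same element as d[4] in R
print(first, last)                # 1 4
```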


## Be careful!!!!

Python and R both have a function called range(), but the two do different things. In python 2, range() returns a list containing an arithmetic progression of integers. range(i, j) returns $[i, i+1, i+2,\ldots , j-1]$ and the default is i=0.

f = range(5)
g = range(2,5)
print f
print g

## [0, 1, 2, 3, 4]
## [2, 3, 4]


In R the functions similar to python's range() are seq(), seq_along() and seq_len() (please check the R documentation to see the differences between them), which generate regular sequences. However, the default starting value is 1.

f = seq(5)
g = seq(2,5)
print(f)
print(g)

## [1] 1 2 3 4 5
## [1] 2 3 4 5


The method range() in R returns a vector containing the minimum and maximum of all the given arguments.

range(f)
range(5)

## [1] 1 5
## [1] 5 5
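
To make the contrast concrete, the two R behaviours can be imitated in python. The helpers below are hypothetical (r_seq and r_range are names made up for this illustration), they only cover increasing integer sequences, and the code assumes Python 3.

```python
def r_seq(start, stop=None):
    """Mimic R's seq(n) and seq(from, to) for increasing integer sequences."""
    if stop is None:
        start, stop = 1, start            # seq(5) counts from 1, not from 0
    return list(range(start, stop + 1))   # the upper bound is included


def r_range(values):
    """Mimic R's range(): the minimum and maximum of the values."""
    values = list(values)
    return [min(values), max(values)]


f = r_seq(5)
g = r_seq(2, 5)
print(f)            # [1, 2, 3, 4, 5]
print(g)            # [2, 3, 4, 5]
print(r_range(f))   # [1, 5]
```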


# Think Python AND R and not just PYTHON OR R: From NULL to String types

Continuing the saga of types in python and R, in this short post I will talk briefly about the NULL object (or NoneType in python) and how python and R handle the String type. I am using python 2.7.13 and R 3.3.3, both 64-bit on Ubuntu 16.04. I am also using the free book A Hands-On Introduction to Using Python in the Atmospheric and Oceanic Sciences by prof. Johnny Lin as a guide.

## The NULL/None type

The NoneType in python and the NULL object in R are basically used to signify the absence of a value. NULL (or None) is often returned by expressions and functions whose value is undefined. Because variables are dynamically typed, objects with value NULL can be changed by replacement operators and will be coerced to the type of the right-hand side. This object is also a good way to “safely” initialize a parameter, as long as you make sure to set the variable to a real value later. For example, suppose a variable is initialized to NULL (or None in python) and the code later tries to do an operation with it before it has been reassigned to a real value. Python will raise an error (a TypeError for arithmetic), while R usually returns an empty result (NULL + 1 gives numeric(0)), which makes the mistake visible instead of silently using a wrong number.
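
On the python side, the situation described above looks roughly like this. It is a minimal sketch, assuming Python 3; the variable names are only for illustration.

```python
total = None   # placeholder: no real value yet

try:
    total = total + 1   # arithmetic on None is not defined
    failed = False
except TypeError:
    failed = True       # the interpreter refuses the operation

print(failed)           # True

total = 0               # reassign to a real value before using it
total = total + 1
print(total)            # 1
```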

In addition, because NULL (None) is a special type (object), there is a dedicated way to test whether an object is NULL (or None). Using R:

a = NULL
a == NULL
is.null(a)

## logical(0)
## [1] TRUE


And using python:

a = None
print a == None
print a is None

## True
## True


In python it is possible to use the common relational operator == but it is not recommended. One should use is instead of ==. More information about why here and an example here.
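
One reason for that advice is that == can be redefined by a class, while is cannot. The class below is purely hypothetical and exists only to show the effect (sketch assumes Python 3):

```python
class AlwaysEqual(object):
    """Hypothetical class whose == answers True for any comparison."""

    def __eq__(self, other):
        return True


x = AlwaysEqual()
equals_none = (x == None)   # True: the custom __eq__ decides the answer
is_none = (x is None)       # False: x is not the None object
print(equals_none, is_none)
```

The is test compares object identity, so it cannot be fooled by a custom __eq__.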

## String variables

In python or R string variables (character vectors) are created by setting text in either paired single or double quotes.
Python uses the operator (+) to join strings together:

a = "Hello"
b = "World"
print a
print b
print a + b

## Hello
## World
## HelloWorld


However, R does not use the same operator. There are different ways to concatenate strings in R, and one of the simplest is to use the functions paste() (which by default inserts a space character between the strings, but this can be easily changed) or paste0().

paste(a,b)
paste0(a,b)

## [1] "Hello World"
## [1] "HelloWorld"
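
For python users who like the R style, a rough analogue can be built on str.join(). The two functions below are hypothetical helpers, not part of python, and they handle single strings only (R's paste() is vectorized, which this sketch does not try to reproduce). The code assumes Python 3.

```python
def paste(*parts, sep=" "):
    """Rough analogue of R's paste() for single strings; sep defaults to a space."""
    return sep.join(str(p) for p in parts)


def paste0(*parts):
    """Rough analogue of R's paste0(): no separator between the parts."""
    return paste(*parts, sep="")


a = "Hello"
b = "World"
print(paste(a, b))            # Hello World
print(paste0(a, b))           # HelloWorld
print(paste(a, b, sep="-"))   # Hello-World
```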


# Think Python AND R and not just PYTHON OR R: More about types

In the last post I talked a little bit about the two most used programming languages for machine learning (python and R) and how they handle operators and types. In this post I will extend the discussion of types in python and R, in particular logical operators and Booleans. Here I am using python 2.7.13 and R 3.3.3, both 64-bit on Ubuntu 16.04, and I am using the free book A Hands-On Introduction to Using Python in the Atmospheric and Oceanic Sciences by prof. Johnny Lin as a guide.

## Logical operators

The logical operators are <, <=, >, >=, == for exact equality and != for inequality (valid for both languages). However there are some differences between python and R when comparing logical expressions. In R if test1 and test2 are logical expressions, then test1 & test2 is their intersection (“and”), test1 | test2 is their union (“or”), and !test1 is the negation of test1. Thus:

a = TRUE
b = FALSE
a & b
a | b
!a

## [1] FALSE
## [1] TRUE
## [1] FALSE


In python test1 and test2 is their intersection (“and”), test1 or test2 is their union (“or”), and not test1 is the negation of test1.

a = True
b = False
print(a and b)
print(a or b)
print(not a)

## False
## True
## False


## Boolean variables

Python and R are case sensitive, so capitalization matters!!!! Therefore, TRUE != True. R also allows the use of T and F but it is not recommended. From the R documentation:

The elements of a logical vector can have the values TRUE, FALSE, and NA (for “not available”). The first two are often abbreviated as T and F, respectively. Note however that T and F are just variables which are set to TRUE and FALSE by default, but are not reserved words and hence can be overwritten by the user. Hence, you should always use TRUE and FALSE.

Thus doing a simple example in R:

c = T
a == c

## [1] TRUE


And this is why it is NOT recommended:

T = 10
a == T

## [1] FALSE


In some languages, the integer value zero is considered FALSE (or False) and the integer value one is considered TRUE (or True). From the R documentation:

Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors, FALSE becoming 0 and TRUE becoming 1.

Doing a simple example in R:

a == TRUE
b == FALSE
10 + a
10 + b

## [1] TRUE
## [1] TRUE
## [1] 11
## [1] 10


The Python version I am using here (2.7.13) follows the same convention:

print(1 == a)
print(0 == b)
print(10 + a)
print(10 + b)

## True
## True
## 11
## 10
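
The reason the arithmetic works in python is that bool is a subclass of int. A short sketch (assuming Python 3, where this is still the case):

```python
a = True
b = False

print(isinstance(a, int))   # True: bool is a subclass of int
print(a + b + a)            # 2

flags = [True, False, True, True]
print(sum(flags))           # 3: summing counts the True values
```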


How about the operators? Let's use the same rule with R:

a & 1
a & 0
1 & a
2 & a
0 & a

## [1] TRUE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] FALSE


And using python:

print(a and 1)
print(a and 0)
print(1 and a)
print(2 and a)
print(0 and a)

## 1
## 0
## True
## True
## 0


Remember that Python and R are both dynamically typed but sometimes handle variables in different ways? R's & always returns a logical value, while python's and returns one of its operands. You can click here and see how python handles truth value testing.
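
The python results above follow from the fact that and/or return one of their operands instead of a Boolean. A small sketch (assuming Python 3) that makes the rule visible:

```python
a = True

print(a and 1)         # 1: a is truthy, so the second operand is returned
print(0 and a)         # 0: 0 is falsy, so evaluation stops there
print(2 or a)          # 2: the first truthy operand wins
print(bool(a and 1))   # True: bool() collapses the result to a Boolean
```

Wrapping the expression in bool() gives the same kind of answer that R's & returns.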

## Similarities not so similar

As a final remark I’d like to mention the not-so-similar similarities of the operators & and | in R and python. Yes, python also has the same operators, but they are the bitwise logical operators, which is slightly different from what we are doing here. For example:

print(a & 1)
print(a & 0)
print(1 & a)
print(2 & a)
print(0 & a)

## 1
## 0
## 1
## 0
## 0


In R the bitwise operators are bitwAnd() and bitwOr(), but this is another post. Click here for more information about bitwise operators in python and R.
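
To see why 2 & a gives 0 in python while 2 and a gives True, it helps to look at the bits. A minimal sketch, assuming Python 3:

```python
a = True   # behaves as the integer 1 in bitwise operations

print(format(2, "02b"), format(int(a), "02b"))   # 10 01
print(2 & a)     # 0: 10 AND 01 share no set bit
print(2 and a)   # True: both operands are truthy
print(3 & a)     # 1: 11 AND 01 share the lowest bit
```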

# Think Python AND R and not just PYTHON OR R: basic operators could generate different results

Nowadays, probably the two most used programming languages for machine learning are python and R. Both have advantages and disadvantages. With tools like rpy2 or Jupyter with the IRKernel, it is possible to integrate R and python into a single application and make them “talk” with each other. However, it is important to know how they work individually before the connection of these two programming languages. I will try to show some of the similarities and differences between the commands, functions and environment. For example, both languages could have very similar commands but these commands could lead to different results.

There are hundreds of books about python and R in different flavours: basic, advanced, applied, how-to, free, paid, master, ninja, etc. Because I do a lot of programming applied to real world scenarios, I made the randomly biased decision to use, as an initial guide, a free book called A Hands-On Introduction to Using Python in the Atmospheric and Oceanic Sciences by prof. Johnny Lin (the main idea is to reproduce portable machine learning code, so I will change the reference later). Thus I will follow some examples from this book and give some insights about the python/R relation.

Nevertheless, it is imperative to know what version of the programming language one is using. Commands, types and syntax can change between versions. Here I am using python 2.7.13 and R 3.2.2, both 64-bit on Ubuntu 16.04.

## Basic operators

In R, the elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a power. The only difference to python is the exponentiation operator which is **.

## Basic variables

Python and R are dynamically typed, meaning that variables take on the type of whatever they are set to when they are assigned. Additionally, the variable’s type can be changed (at run time) without changing the variable name. Let's start with two of the most important basic types: integer and float (called double in R). Here it is possible to see the first few differences between the languages. The integer type from the R documentation:

Integer vectors exist so that data can be passed to C or Fortran code which expects them, and so that (small) integer data can be represented exactly and compactly. Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.

There are two integer types in python 2: plain and long. From the python documentation:

Plain integers (also just called integers) are implemented using long in C, which gives them at least 32 bits of precision (sys.maxint is always set to the maximum plain integer value for the current platform, the minimum value is -sys.maxint – 1). Long integers have unlimited precision.

How about the float (or double) types? In both programming languages, how double (or float) precision numbers are handled ultimately comes down to the CPU/FPU and compiler (i.e. the machine on which your program is running).

OK, let's try a simple example. This example is the same as Example 4 in chapter 3 of our guide book. Let's say we have the following variables:

a = 3.5
b = -2.1
c = 3
d = 4


If we run the operators described above in python we have:

print(a*b) #case 1
print(a*d) #case 2
print(b+c) #case 3
print(a/c) #case 4
print(c/d) #case 5

## -7.35
## 14.0
## 0.9
## 1.16666666667
## 0


Repeating the same steps for R we obtain:

## [1] -7.35
## [1] 14
## [1] 0.9
## [1] 1.166667
## [1] 0.75


In cases 2, 4 and 5 the printed results are different. In case 4, the difference is related to how the float/double is displayed, so a difference in the number of printed digits is expected. However, it does not mean that R has less precision than python in this example. It could be only the way R shows the variable to the user. Yes, unfortunately R's output can mislead you. For example, in case 2 the numbers are technically the same but they are shown in a different way. The fact that R shows $14$ instead of $14.0$ does not mean that the value is an integer and not a double. Let’s use the functions is.integer() and is.double() to check the type of the result in case 2.

is.integer(a*d)
is.double(a*d)

## [1] FALSE
## [1] TRUE


Remember the “dynamically typed” part? The programming language automatically decides what type a variable is based on the value or operation. Again from the python documentation:

Python fully supports mixed arithmetic: when a binary arithmetic operator has operands of different numeric types, the operand with the “narrower” type is widened to that of the other, where plain integer is narrower than long integer is narrower than floating point is narrower than complex. Comparisons between numbers of mixed type use the same rule.

That explains why in case 2 we have a float. You can check using the function isinstance().

print(isinstance( a*d, ( int, long ) ))
print(isinstance( a*d, ( float ) ))

## False
## True


How about case 5? Case 5 is a little bit more interesting. For python we are dealing with 2 integers, thus the result should be integer. That is why we have $0$ because python does integer division and returns only the quotient.

print(isinstance( c/d, ( int, long ) ))
print(isinstance( c/d, ( float ) ))

## True
## False


How about R? Here is the reason (from the documentation):

For most purposes the user will not be concerned if the “numbers” in a numeric vector are integers, reals or even complex. Internally calculations are done as double precision real numbers, or double precision complex numbers if the input data are complex.

Thus, it is important to keep this in mind because you can have different results.
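
If you want python to give the same answer as R in case 5, one operand has to be a float. The sketch below assumes Python 3, where / already performs true division, so the python 2 behaviour shown earlier is reproduced with the floor division operator //.

```python
a, c, d = 3.5, 3, 4

print(c // d)                 # 0: floor division, what python 2.7 gives for c/d
print(float(c) / d)           # 0.75: matches R once one operand is a float
print(type(a * d).__name__)   # float: the integer operand is widened
```

In python 2.7 the same effect can be obtained for a whole file with from __future__ import division.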

## Similarities not so similar

As a final remark I’d like to mention the not-so-similar similarities of the operator ^ in R and python. Yes, python also has the same operator, but it is the bitwise XOR operator, which is different from exponentiation. Click here for more information about bitwise operators in python.
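
A two-line check shows how easily this can go wrong when translating R code to python (sketch, assuming Python 3):

```python
print(2 ** 3)     # 8: exponentiation, what R writes as 2^3
print(2 ^ 3)      # 1: bitwise XOR, binary 10 XOR 11 = 01
print(pow(2, 3))  # 8: the built-in function is an alternative to **
```

Python does not complain about 2 ^ 3, so the mistake only shows up as a wrong number.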

# Be careful what you wish for: Error measures for regression using R

Almost 10 years ago I was working with evolutionary strategies for tuning neural networks for time series prediction when I became curious about error measures and their effects on the final forecast. In general, evolutionary algorithms use a fitness function that is based on an error measure. The objective is to get better individual(s) by minimizing (or maximizing) the fitness function. Thus, to determine which model is “the best”, the performance of a trained model is evaluated against one or more criteria (e.g. an error measure). However, the relation between the lowest error and the “best model” is complex, and the error measure should be chosen according to the desired goal (i.e. forecasting averages, forecasting extremes, deviance measures, relative errors, etc.).

There is a journal paper which gives the description of the errors used in the ‘qualV’ package. This package has several implementations of quantitative validation methods. The paper (which is very interesting by the way), also has some examples of how the final results of the errors measures change when dealing with noise, shifts, nonlinear scaling, etc.

The objective of this post is just to show the problem and raise awareness of it when choosing the best model based on error only. Sometimes the minimization of one error measure does not guarantee the minimization of all other error measures, and it could even lead to a Pareto front. Here I am using some of the functions described in the paper and, for simplicity, I am comparing only 4 error measures: mean absolute error (MAE), root-mean-square error (RMSE), correlation coefficient (r) and mean absolute percentage error (MAPE). Each error measure captures a distinct characteristic of the time series and each of them has strong and weak points. I am using R version 3.3.2 (2016-10-31) on Ubuntu 16.04.

Let's say the original signal is given by the function:
$y=a\sin(\pi x b)+s.$

Using x=[0,1], a=1.5, b=2, and s=0.75 then:

x = seq(0,1,by=.005)
ysignal = 1.5*sin(pi*2*x)+0.75
plot(x,ysignal,main = "Main signal")


## Case (i): Adding noise

We can now change the original signal and check how this affects the final result. If the “forecast” is the same as the signal then all the errors should be 0. Thus, applying some noise to the signal, s=(0.75+noise), where the noise is drawn from a Gaussian distribution with mean 0 and standard deviation 0.2, and comparing with the original signal we get:

library(qualV)
n = 0.2 #noise level
noise = rnorm(length(x),sd=n)
ynoise = ysignal+noise
par(mfrow=c(1,2))
range.yy <- range(c(ysignal,ynoise))
plot(ynoise,ysignal,ylim=range.yy,xlim=range.yy,main = "Signal vs Forecast") 

round(MAE(ysignal,ynoise),2)
## [1] 0.16
round(RMSE(ysignal,ynoise),2)
## [1] 0.21
round(cor(ysignal,ynoise),2)
## [1] 0.98
round(MAPE(ysignal,ynoise),2)
## [1] 40.65


## Case (ii): Shifting the signal

Let's apply a shift to the values of the original signal. With s=0.95 we have:

yshift = ysignal+0.2


round(MAE(ysignal,yshift),2)
## [1] 0.2
round(RMSE(ysignal,yshift),2)
## [1] 0.2
round(cor(ysignal,yshift),2)
## [1] 1
round(MAPE(ysignal,yshift),2)
## [1] 60.95
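
The MAE and RMSE values for the shifted signal can be checked by hand: every residual equals the shift, so both measures are exactly 0.2, and the correlation is 1 because a constant offset does not change the shape. The python sketch below rebuilds the signal from this post and uses textbook formulas rather than the qualV functions (an assumption, as is dividing by the observed values in MAPE). It assumes Python 3.

```python
import math

# Rebuild the signal from this post: y = 1.5*sin(2*pi*x) + 0.75 on x in [0, 1].
x = [i * 0.005 for i in range(201)]
ysignal = [1.5 * math.sin(2 * math.pi * xi) + 0.75 for xi in x]
yshift = [y + 0.2 for y in ysignal]

n = len(ysignal)
residuals = [o - p for o, p in zip(ysignal, yshift)]   # every residual is -0.2

mae = sum(abs(r) for r in residuals) / n
rmse = math.sqrt(sum(r * r for r in residuals) / n)
# MAPE divides each residual by the observed value, so points where the
# signal is close to zero dominate the average.
mape = 100.0 * sum(abs(r / o) for r, o in zip(residuals, ysignal)) / n

print(round(mae, 2), round(rmse, 2))   # 0.2 0.2
print(round(mape, 2))
```

The signal crosses zero twice in this interval, which is why a small constant shift turns into a large percentage error.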


## Case (iii): shift + rescale

Let's apply a shift and also rescale the values of the original signal. Multiplying the whole signal by 0.8 and then adding 0.2 (which is equivalent to a=1.2 and s=0.8) we have:

yresshift = 0.8*ysignal+0.2


round(MAE(ysignal,yresshift),2)
## [1] 0.19
round(RMSE(ysignal,yresshift),2)
## [1] 0.22
round(cor(ysignal,yresshift),2)
## [1] 1
round(MAPE(ysignal,yresshift),2)
## [1] 61.66


## Case (iv): Changing the frequency

In this case let's slightly vary the frequency of the original signal, making b=2.11:

yfreq = 1.5*sin(pi*2.11*x)+0.75


round(MAE(ysignal,yfreq),2)
## [1] 0.17
round(RMSE(ysignal,yfreq),2)
## [1] 0.22
round(cor(ysignal,yfreq),2)
## [1] 0.98
round(MAPE(ysignal,yfreq),2)
## [1] 89.33


Each case has the original series (in black) and the possible “forecast” (in red). I also plotted the original series (signal) versus the residual series. Which case would you pick as the best forecast? What is your assumption?

# Installing R package gputools and cuda 8.0 on Ubuntu 16.04

This is a quick tutorial on how to install the R package ‘gputools’ version 1.1 using R version 3.3.2 (2016-10-31) and cuda 8.0 on Ubuntu 16.04. Most of these versions are new, so I searched the internet and could not find a tutorial for this combination. However, most of this tutorial is based on this page, which is for ‘gputools’ version 0.28 and cuda 7.0 on Ubuntu 15.04. In the end I just changed a few lines.

I have tested it on an ASUS ROG G752VM with an NVIDIA GeForce GTX 965M graphics card. The instructions assume you have the necessary CUDA compatible hardware support. In my case I also installed the NVIDIA driver 367.57 first. My computer was new so I did not have any nvidia driver or compatibility issues. However, I strongly recommend looking on the internet for how to remove the old drivers first, before installing the new ones (things like sudo apt-get purge nvidia-cuda*).

## Installing CUDA 8.0

First, to install CUDA 8.0 we can do:

wget -O cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64.deb https://developer.nvidia.com/compute/cuda/8.0/prod/local_installers/cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64-deb
sudo dpkg -i cuda-repo-ubuntu1604-8-0-local_8.0.44-1_amd64.deb
sudo apt-get update
sudo apt-get install cuda

## Environment Variables

I tried to install the gputools package without adding the variables to the environment and I got an error related to nvcc. Thus, as part of the CUDA environment, we should add the nvcc compiler to the path in the .bashrc file of your home folder.

export CUDA_HOME=/usr/local/cuda-8.0
export LD_LIBRARY_PATH=${CUDA_HOME}/lib64
PATH=${CUDA_HOME}/bin:${PATH}
PATH=${CUDA_HOME}/bin/nvcc:${PATH}
export PATH

## Installing gputools version 1.1

The fastest way to install gputools if you are using R version 3.3.2 is:

install.packages('gputools')

Now my tutorial differs a bit more from the tutorial I mentioned before. I received the message:

rinterface.cu:1:14: fatal error: R.h: No such file or directory
#include

So we have to check where the R header directory is located. First let's locate the file R.h:

locate \/R.h

## /usr/share/R/include/R.h

The next step is to tell gputools where R.h is located. Thus it is necessary to change a line in the source package. First download and extract the source package:

wget http://cran.r-project.org/src/contrib/gputools_1.1.tar.gz
tar -zxvf gputools_1.1.tar.gz

Look into the folder you just extracted, then open the file configure in your favourite Ubuntu editor and replace the string R_INCLUDE="${R_HOME}/include" with R_INCLUDE="/usr/share/R/include" (which is the location of my R.h file).

The two final steps are to compress the modified source code

tar -czvf gputools_1.1_new.tar.gz gputools

and install the modified package

install.packages("~/gputools_1.1_new.tar.gz", repos = NULL, type = "source")

I had lots of warning messages but no error.

## Testing performance

Now we can try some simple benchmarks and see how much time the CPU and the GPU each take. First a small matrix multiplication:

library(gputools)

magnitude <- 10
dimA <- 2*magnitude;dimB <- 3*magnitude;dimC <- 4*magnitude
matA <- matrix(runif(dimA*dimB), dimA, dimB)
matB <- matrix(runif(dimB*dimC), dimB, dimC)

system.time(matA%*%matB);
##    user  system elapsed
##   0.000   0.000   0.001
system.time(gpuMatMult(matA, matB))
##    user  system elapsed
##   0.076   0.140   0.215

then using larger matrices:

magnitude <- 1000
dimA <- 2*magnitude;dimB <- 3*magnitude;dimC <- 4*magnitude
matA <- matrix(runif(dimA*dimB), dimA, dimB)
matB <- matrix(runif(dimB*dimC), dimB, dimC)

system.time(matA%*%matB);
##    user  system elapsed
##  15.552   0.028  15.579
system.time(gpuMatMult(matA, matB))
##    user  system elapsed
##   0.792   0.124   0.914
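
To put the larger benchmark in perspective, the reported elapsed times can be turned into a speed-up factor and a rough throughput estimate. The sketch below only does arithmetic on the numbers printed above. The 2*m*k*n operation count is the standard estimate for a dense matrix product, not something measured here, and the code assumes Python 3.

```python
# Elapsed times (seconds) copied from the larger benchmark above.
cpu_elapsed = 15.579
gpu_elapsed = 0.914

# Multiplying an (m x k) matrix by a (k x n) matrix takes about 2*m*k*n
# floating-point operations (one multiply and one add per term).
m, k, n = 2000, 3000, 4000
flops = 2 * m * k * n

speedup = cpu_elapsed / gpu_elapsed
cpu_gflops = flops / cpu_elapsed / 1e9
gpu_gflops = flops / gpu_elapsed / 1e9

print(round(speedup, 1))      # 17.0
print(round(cpu_gflops, 1))   # estimated GFLOP/s on the CPU
print(round(gpu_gflops, 1))   # estimated GFLOP/s on the GPU
```

For the small matrices the GPU was slower (0.215 s against 0.001 s), so the fixed cost of calling the GPU only pays off once the matrices are large.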

# How I am Making Music about Climate Change

People often ask me how I am making my music about climate change. What are the foundations of your work? What exactly are you talking and writing about? I will try to clarify a little today.

In my music I have two main aspects, the physical aspect and the human/social aspect. The physical aspect is what is happening or what will happen physically to the environment. This is what I am using to write the music. Musical notes, chords, changes and rhythms are my artistic interpretation of the physical aspects.

The second aspect is the human/social factor, or what is happening or will happen to us, humans: how we are reacting to those changes, what is changing now for us, what is not changing, etc. This is what the lyrics are about. Books like Tropic of Chaos give some insights into what is happening right now to us humans and into the social aspects of climate change. This book talks about how climate change is acting on humans’ social aspects in Africa, the Americas and parts of Europe and Asia. It also gives some historical background and some possible future scenarios (some of them more chaotic than others).

Now, the recordings are on full speed and it won’t take long to release the first song. I hope you enjoy it.