Think Python AND R and not just PYTHON OR R: More about types

In the last post I talked a little about the two most used programming languages for machine learning (Python and R) and how they handle operators and types. In this post I will extend the discussion of types in Python and R, in particular logical operators and Boolean variables. Here I am using Python 2.7.13 and R 3.3.3, both 64-bit on Ubuntu 16.04, and I am using the free book A Hands-On Introduction to Using Python in the Atmospheric and Oceanic Sciences by Prof. Johnny Lin as a guide.

Logical operators

The comparison operators are <, <=, >, >=, == for exact equality and != for inequality (valid for both languages). However, there are some differences between Python and R when combining logical expressions. In R, if test1 and test2 are logical expressions, then test1 & test2 is their intersection (“and”), test1 | test2 is their union (“or”), and !test1 is the negation of test1. Thus:

a = TRUE
b = FALSE
a & b
a | b
!a
## [1] FALSE
## [1] TRUE
## [1] FALSE

In Python, test1 and test2 is their intersection (“and”), test1 or test2 is their union (“or”), and not test1 is the negation of test1.

a = True
b = False
print(a and b)
print(a or b)
print(not a)
## False
## True
## False

Boolean variables

Python and R are case sensitive, so capitalization matters! Therefore, TRUE != True: R only accepts TRUE and FALSE, and Python only accepts True and False. R also allows the use of T and F, but this is not recommended. From the R documentation:

The elements of a logical vector can have the values TRUE, FALSE, and NA (for “not available”). The first two are often abbreviated as T and F, respectively. Note however that T and F are just variables which are set to TRUE and FALSE by default, but are not reserved words and hence can be overwritten by the user. Hence, you should always use TRUE and FALSE.

Thus doing a simple example in R:

c = T
a == c
## [1] TRUE

And this is why it is NOT recommended:

T = 10
a == T
## [1] FALSE

In some languages, the integer value zero is considered FALSE (or False) and the integer value one is considered TRUE (or True). From the R documentation:

Logical vectors may be used in ordinary arithmetic, in which case they are coerced into numeric vectors, FALSE becoming 0 and TRUE becoming 1.

Doing a simple example in R:

a == TRUE
b == FALSE
10 + a
10 + b
## [1] TRUE
## [1] TRUE
## [1] 11
## [1] 10

The Python version I am using here (2.7.13) follows the same convention:

print(1 == a)
print(0 == b)
print(10 + a)
print(10 + b)
## True
## True
## 11
## 10
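
One way to see why this works in Python: bool is a subclass of int, so True and False take part in arithmetic as 1 and 0. The lines below are only a minimal sketch of that idea. The variable names are mine, and the code is written so that it should run on Python 2.7 and on Python 3.

```python
# bool is a subclass of int, so True and False act as 1 and 0 in arithmetic.
a = True
b = False

bool_is_int = issubclass(bool, int)  # True
as_integers = (int(a), int(b))       # (1, 0)
total = 10 + a                       # 11
count_true = sum([a, b, a])          # 2, because each True counts as 1

print(bool_is_int)
print(as_integers)
print(total)
print(count_true)
```

The sum() trick is a common way to count how many conditions are true, much like sum() on a logical vector in R.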

How about the operators? Let's apply the same rule in R:

a & 1
a & 0
1 & a
2 & a
0 & a
## [1] TRUE
## [1] FALSE
## [1] TRUE
## [1] TRUE
## [1] FALSE

And using python:

print(a and 1)
print(a and 0)
print(1 and a)
print(2 and a)
print(0 and a)
## 1
## 0
## True
## True
## 0

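Remember that Python and R are both dynamically typed, but they sometimes handle values in different ways. In R, & coerces both operands to logical and always returns TRUE or FALSE. In Python, and returns one of its operands unchanged, which is why some of the results above are 1 and 0 instead of True and False. The Python documentation on truth value testing describes which values count as true and which as false.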
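
The Python results above can be reproduced with a small sketch. The helper and_like() is a hypothetical function written only for illustration (it is not part of Python), and the r_style list is my approximation of what R does by converting both operands to Booleans first.

```python
a = True

def and_like(x, y):
    """Mimic `x and y`: return x if it is falsy, otherwise return y.

    Hypothetical helper for illustration; unlike the real operator it
    always evaluates both arguments.
    """
    return x if not x else y

pairs = [(a, 1), (a, 0), (1, a), (2, a), (0, a)]

python_style = [x and y for x, y in pairs]         # operands returned as-is
helper_style = [and_like(x, y) for x, y in pairs]  # same values via the helper
r_style = [bool(x) and bool(y) for x, y in pairs]  # always a Boolean, as in R

print(python_style)  # [1, 0, True, True, 0]
print(r_style)       # [True, False, True, True, False]
```

Wrapping the operands (or the result) in bool() is the simplest way to get R-like TRUE/FALSE answers from Python's and.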

Similarities not so similar

As a final remark, I'd like to mention a similarity that is not so similar: the operators & and | in R and Python. Yes, Python has the same operators, but there they are the bitwise operators, which is slightly different from what we are doing here. For example:

print(a & 1)
print(a & 0)
print(1 & a)
print(2 & a)
print(0 & a)
## 1
## 0
## 1
## 0
## 0

In R the bitwise operators are the functions bitwAnd() and bitwOr(), but that is a topic for another post. See the Python and R documentation for more information about bitwise operators.
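
To see where the 0 in print(2 & a) comes from, it helps to write the operands in binary. This is only an illustrative sketch; the variable names are my own.

```python
a = True

# Write both operands in binary: 2 is '10' and True (1) is '01'.
two_bits = format(2, '02b')     # '10'
a_bits = format(int(a), '02b')  # '01'

bitwise_and = 2 & a    # 0: no bit position is set in both operands
bitwise_or = 2 | a     # 3: binary '11'
logical_and = 2 and a  # True: 2 is truthy, so `and` returns the second operand

print(two_bits)
print(a_bits)
print(bitwise_and)
print(bitwise_or)
print(logical_and)
```

So & compares the operands bit by bit, while and only asks whether each operand is truthy.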

If you have any question, suggestion or opinion about this post please feel free to write a comment below.

Think Python AND R and not just PYTHON OR R: basic operators could generate different results

Nowadays, probably the two most used programming languages for machine learning are Python and R. Both have advantages and disadvantages. With tools like rpy2 or Jupyter with the IRKernel, it is possible to integrate R and Python into a single application and make them “talk” to each other. However, it is important to know how they work individually before connecting them. I will try to show some of the similarities and differences between their commands, functions and environments. For example, both languages can have very similar commands that lead to different results.

There are hundreds of books about Python and R, in different flavours: basic, advanced, applied, how-to, free, paid, master, ninja, etc. Because I do a lot of programming applied to real-world scenarios, I made a biased choice and decided to use, as an initial guide, a free book called A Hands-On Introduction to Using Python in the Atmospheric and Oceanic Sciences by Prof. Johnny Lin (the main idea is to produce portable machine learning code, so I will change the reference later). Thus I will follow some examples from this book and give some insights about the Python/R relation.

Nevertheless, it is imperative to know which version of the programming language one is using. Commands, types and syntax can change between versions. Here I am using Python 2.7.13 and R 3.2.2, both 64-bit on Ubuntu 16.04.

Basic operators

In R, the elementary arithmetic operators are the usual +, -, *, / and ^ for raising to a power. The only difference in Python is the exponentiation operator, which is **.

Basic variables

Python and R are dynamically typed, meaning that variables take on the type of whatever they are set to when they are assigned. Additionally, a variable's type can be changed (at run time) without changing the variable name. Let's start with two of the most important basic types: integer and float (called double in R). Here it is possible to see the first few differences between the languages. The integer type, from the R documentation:

Integer vectors exist so that data can be passed to C or Fortran code which expects them, and so that (small) integer data can be represented exactly and compactly. Note that current implementations of R use 32-bit integers for integer vectors, so the range of representable integers is restricted to about +/-2*10^9: doubles can hold much larger integers exactly.

There are two integer types in Python 2: plain and long. From the Python documentation:

Plain integers (also just called integers) are implemented using long in C, which gives them at least 32 bits of precision (sys.maxint is always set to the maximum plain integer value for the current platform, the minimum value is -sys.maxint – 1). Long integers have unlimited precision.
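
A small sketch of what these limits mean in practice. It assumes the 32-bit range quoted above for R integers. In Python 2 a plain integer that gets too large is promoted to a long automatically, and Python 3 has a single integer type without a size limit, so the same lines run on both. The constant names are my own.

```python
R_INT_MAX = 2**31 - 1          # 2147483647, largest 32-bit signed integer
past_r_limit = R_INT_MAX + 1   # 2147483648, still an exact integer in Python
huge = 2**100                  # far beyond 64 bits, still exact

# Doubles represent every integer exactly only up to 2**53.
LIMIT = 2**53
double_is_exact = float(LIMIT - 1) + 1.0 == float(LIMIT)  # True
double_loses_one = float(LIMIT) + 1.0 == float(LIMIT)     # True: the +1 is lost

print(past_r_limit)
print(huge.bit_length())  # number of bits needed to store 2**100
print(double_is_exact)
print(double_loses_one)
```

The last two lines illustrate the remark in the R documentation: doubles can hold integers larger than 2*10^9 exactly, but only up to a point.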

How about the float (or double) types? In both languages, how double-precision (float) numbers are handled ultimately comes down to the CPU/FPU and the compiler, i.e. to the machine on which your program is running.

OK, let's try a simple example. This example is the same as Example 4 in chapter 3 of our guide book. Let's say we have the following variables:

a = 3.5
b = -2.1
c = 3
d = 4

If we run the operators described above in python we have:

print(a*b) #case 1
print(a*d) #case 2
print(b+c) #case 3
print(a/c) #case 4
print(c/d) #case 5
## -7.35
## 14.0
## 0.9
## 1.16666666667
## 0

Repeating the same steps for R we obtain:

## [1] -7.35
## [1] 14
## [1] 0.9
## [1] 1.166667
## [1] 0.75

In cases 2, 4 and 5 we got different results. In case 4, the difference is related to how the float/double value is displayed, so the two outputs show a different number of digits. However, this does not mean that R has less precision than Python in this example. It is only the way R shows the value to the user. Yes, unfortunately R can mislead you. For example, in case 2 the numbers are technically the same, but they are shown in a different way. The fact that R shows 14 instead of 14.0 does not mean that the value is an integer and not a double. Let's use the functions is.integer() and is.double() to check the type of the result in case 2.

## [1] FALSE
## [1] TRUE
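
The display effect can be imitated in Python with format(). This sketch assumes R's default of 7 significant digits for printing and the 12 significant digits that Python 2 uses when printing a float. The variable names are my own.

```python
value = 3.5 / 3                         # case 4

seven_digits = format(value, '.7g')     # '1.166667', like the R output
twelve_digits = format(value, '.12g')   # '1.16666666667', like Python 2's print
whole = format(14.0, '.7g')             # '14': trailing zeros are dropped

print(seven_digits)
print(twelve_digits)
print(whole)
print(isinstance(14.0, float))  # True: the value is still a float
```

The stored value is the same in every case. Only the number of digits shown changes.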

Remember “dynamically typed”? The programming language automatically decides the type of a result based on the operation and the values involved. Again from the Python documentation:

Python fully supports mixed arithmetic: when a binary arithmetic operator has operands of different numeric types, the operand with the “narrower” type is widened to that of the other, where plain integer is narrower than long integer is narrower than floating point is narrower than complex. Comparisons between numbers of mixed type use the same rule.

That explains why in case 2 we have a float. You can check using the function isinstance().

print(isinstance( a*d, ( int, long ) ))
print(isinstance( a*d, ( float ) ))
## False
## True

How about case 5? Case 5 is a little bit more interesting. In Python 2 we are dealing with two integers, so the result is also an integer. That is why we get 0: Python 2 does integer division and returns only the quotient.

print(isinstance( c/d, ( int, long ) ))
print(isinstance( c/d, ( float ) ))
## True
## False

How about R? Here is the reason (from the documentation):

For most purposes the user will not be concerned if the “numbers” in a numeric vector are integers, reals or even complex. Internally calculations are done as double precision real numbers, or double precision complex numbers if the input data are complex.

Thus, it is important to keep this in mind because you can have different results.
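
If you want the same answer from Python regardless of version, you can be explicit about which division you mean. A minimal sketch, using the same c and d as above (the result names are my own):

```python
c = 3
d = 4

floor_div = c // d         # 0: integer (floor) division
true_div = c / float(d)    # 0.75: converting one operand forces float division
negative_floor = -c // d   # -1: floor division rounds towards minus infinity

print(floor_div)
print(true_div)
print(negative_floor)
```

In Python 2 you can also put from __future__ import division at the top of a file so that / behaves as it does in R. The R counterpart of // is the %/% operator.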

Similarities not so similar

As a final remark, I'd like to mention a similarity that is not so similar: the operator ^ in R and Python. Yes, Python has the same operator, but there it is the bitwise XOR operator, which is different from exponentiation. See the Python documentation for more information about bitwise operators.
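
A short sketch of the trap, with variable names of my own choosing:

```python
xor_result = 3 ^ 2         # 1: binary 11 XOR 10 = 01
power_result = 3 ** 2      # 9: this is what 3 ^ 2 means in R
builtin_power = pow(3, 2)  # 9: the built-in function gives the same result

# XOR is only defined for integers, so a float operand raises an error.
x = 3.5
try:
    x ^ 2
    float_xor_allowed = True
except TypeError:
    float_xor_allowed = False

print(xor_result)
print(power_result)
print(builtin_power)
print(float_xor_allowed)
```

With integers the mistake is silent, because 3 ^ 2 simply returns a wrong-looking number. With floats Python at least stops with a TypeError.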

If you have any question, suggestion or opinion about this post please feel free to write a comment below.

How is Artificial Intelligence Helping the Environment?

Recently, one of the most intelligent men alive, the theoretical physicist Stephen Hawking, said in an interview: “The development of full artificial intelligence could spell the end of the human race.” Should we be afraid of artificial intelligence (AI) because the machines could take over from the human race (as in The Terminator)? From the point of view of environmental science, AI has so far been really helpful. First, what is AI? There are different definitions, but one of my favourites is:

[The automation of] activities that we associate with human thinking, activities such as decision-making, problem solving, learning…
(Bellman 1978)

Thus, it is basically the use of machines (e.g. computers) to solve problems and to help with complex decisions. There is a branch of AI called machine learning (ML). It is a scientific discipline that studies computational algorithms that can learn from data. One application of ML algorithms is to use some particular data to perform classification and numerical regression. But how is ML helping the environment?

Satellites (remote sensing) generate thousands of images every day, covering the globe in different frequency bands (infrared, microwave, visible to the human eye, etc.) and at different times and scales. But how useful can those images be? One example is the detection of phytoplankton in the ocean. Phytoplankton are important components that sustain the aquatic food web. Their importance goes beyond being food for krill. Accurate estimates of chlorophyll concentrations (and consequently of phytoplankton) are essential for estimating primary productivity, biomass, etc.

Image Credit: NASA


It is possible to use satellites to detect the presence of phytoplankton in the ocean through the concentration of chlorophyll in the surface water. Each frequency channel has a purpose. For example, at certain wavelengths sand reflects more energy than green vegetation, while at other wavelengths it absorbs more (reflects less) energy. However, even with satellites this is not an easy task. Aerosol concentrations can affect the ocean colour viewed from the satellite. Further complications arise when there are also suspended sediments and/or dissolved organic matter from decayed vegetation in the water. In addition, coastal water quality gradually degrades because of increased pollution and human activities. To overcome those problems, ML algorithms such as artificial neural networks and support vector machines are used to automatically classify the signal (separating chlorophyll from aerosols, pollution, etc.) and detect the presence of phytoplankton in the ocean.

Image Credit: NASA

It is also possible to use remote sensing to classify and detect land cover, applying the same algorithms to identify and classify different types of vegetation, including forests, dead trees in the forest, forest fires, the portion of trees regenerated after forest fires, etc. With accurate information it is possible to avoid more deforestation, track urbanization, mitigate diseases, understand and control ecosystems, support planning, etc.


These are only a small sample of how AI is used in the environmental sciences. There are so many contributions that it is unfair to give only a few examples of the topic.

More about:


Hsieh, W. (2009). Machine Learning Methods in the Environmental Sciences. Cambridge University Press. DOI: 10.1017/CBO9780511627217

Keiner, L., & Yan, X. (1998). A Neural Network Model for Estimating Sea Surface Chlorophyll and Sediments from Thematic Mapper Imagery. Remote Sensing of Environment, 66 (2), 153-165. DOI: 10.1016/S0034-4257(98)00054-6

Schiller, H., & Doerffer, R. (2005). Improved determination of coastal water constituent concentrations from MERIS data. IEEE Transactions on Geoscience and Remote Sensing, 43 (7), 1585-1591. DOI: 10.1109/TGRS.2005.848410

Dash, J., Mathur, A., Foody, G., Curran, P., Chipman, J., & Lillesand, T. (2007). Land cover classification using multi‐temporal MERIS vegetation indices. International Journal of Remote Sensing, 28 (6), 1137-1159. DOI: 10.1080/01431160600784259