Python vs R

This tutorial summarizes some of the main differences between R and Python. It is meant to help you avoid some of the potential pitfalls if you are coming from an R programming background.

In Python, indexing starts at 0, so the first element of a list is selected by the 0-th index.

In [1]:
lst = ["A", "B", 3.45]
lst[0]
Out[1]:
'A'

R:

lst <- list("A", "B", 3.45)

lst[[1]]  # double brackets extract the element itself; lst[1] returns a one-element list

Output: "A"

Unlike R, Python excludes the ending index of a slice: lst[0:2] returns the elements at indices 0 and 1, but not the one at index 2.

In [2]:
lst[0:1]
Out[2]:
['A']
In [3]:
lst[0:2]
Out[3]:
['A', 'B']
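
The half-open rule can be sketched as follows, reusing the list from above. The variable names are only for illustration. A slice lst[i:j] covers indices i through j-1, so it holds j - i elements, and adjacent slices rejoin without overlap.

```python
lst = ["A", "B", 3.45]

# lst[i:j] includes index i and excludes index j.
first_two = lst[0:2]       # elements at indices 0 and 1
last_one = lst[2:3]        # element at index 2 only

# Omitted bounds default to the start and the end of the list.
head = lst[:2]
tail = lst[2:]

print(first_two)           # ['A', 'B']
print(len(lst[0:2]))       # 2, i.e. 2 - 0
print(head + tail == lst)  # True: the two halves rejoin without overlap
```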

In R, you use curly braces {} to delimit a block of code. Python has no braces for this purpose; indentation alone defines the block.

R:

printString <- function(x,y) {

print("Hello!")  # indenting this line is not necessary

}

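In Python, the line that introduces an indented block (def, if, elif, else, for, while, and so on) must end with a colon. We suggest that you use 4 spaces per indentation level. Tabs also work, but Python 3 rejects code that mixes tabs and spaces inconsistently, so pick one and stick with it.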

In [4]:
def printInput(name):
    if type(name) is str:
        print("String: Hello " + name + '!')
    elif type(name) is int or type(name) is float:
        print("Numeric: Hello " + str(name) + '!')
    else:
        print("We don't greet strangers!")

printInput("world")
printInput(123)
printInput(1.45)
printInput(None)
String: Hello world!
Numeric: Hello 123!
Numeric: Hello 1.45!
We don't greet strangers!
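
Because indentation is part of the syntax, getting it wrong is an error rather than a style issue. The sketch below uses the built-in compile() on two small source strings (bad_source and good_source are made-up names for this illustration) to show that an unindented function body is rejected before any code runs.

```python
# Source text for a function whose body is NOT indented (illustrative only).
bad_source = "def greet():\nprint('Hello!')\n"

try:
    compile(bad_source, "<example>", "exec")
    outcome = "compiled"
except IndentationError as err:
    outcome = "IndentationError"
    print("Python refused the code:", err.msg)

# The same function with a 4-space indented body compiles without complaint.
good_source = "def greet():\n    print('Hello!')\n"
compile(good_source, "<example>", "exec")
```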

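In Python, a function receives a reference to the object you pass in, not a copy of it. R, by contrast, behaves as if arguments were passed by value: modifying an argument inside a function does not affect the caller's variable.

If a Python function modifies a mutable argument (such as a list) in place, the change is also visible to the caller. This can result in hard-to-find bugs if you don't pay attention. On the other hand, avoiding the copy makes function calls faster and more memory efficient.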

In [5]:
x = [1, 2, 3]
In [6]:
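def lst(x):
    y = x        # y is a second name for the same list, not a copy
    y.append(4)  # modifies the list in place (append itself returns None)
    return y
In [7]:
lst(x)
Out[7]:
[1, 2, 3, 4]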
In [8]:
# Variable x globally changed too!
x
Out[8]:
[1, 2, 3, 4]

The issue is that assigning the list x to another name, such as y, does not create a copy of x. Rather, y becomes a second reference to the same list. When the list is modified in place through either name, the change is visible through both, because both point to the same address in memory. To resolve this, explicitly tell Python to create a copy of x and call it y. The two lists are then independent of each other.

In [9]:
x = [1, 2, 3]
def lst(x):
    y = list(x)  # create a copy of x and not a reference
    y.append(4)  # change the copy
    return y
In [10]:
lst(x)
Out[10]:
[1, 2, 3, 4]
In [11]:
# Variable x is still [1, 2, 3]
x
Out[11]:
[1, 2, 3]
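
It may help to separate two things a function can do with an argument: rebind the local name, or modify the object in place. The sketch below uses two made-up functions, rebind and mutate, to show that only in-place modification reaches the caller.

```python
def rebind(values):
    # Assignment makes the local name point at a NEW list;
    # the caller's list is untouched.
    values = values + [4]
    return values

def mutate(values):
    # append changes the list object itself, which the caller also sees.
    values.append(4)
    return values

x = [1, 2, 3]
rebound = rebind(x)
print(x)             # [1, 2, 3]

mutated = mutate(x)
print(x)             # [1, 2, 3, 4]
print(mutated is x)  # True: the function returned the very same object
```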

Assignment is not always what you think.

In [12]:
a1 = [1,1]
a2 = [1,1]
In [13]:
# This does not create a copy: both a1 and b refer to the
# same list in the computer's memory
b = a1
In [14]:
b[0] = 'boo!'
In [15]:
print(a1)
['boo!', 1]

If you want a real copy, do either one of the below:

In [16]:
c = list(a2)
# OR
import copy
c = copy.copy(a2)
# If the list contains other lists and you want those copied
# as well, make a deep copy instead:
# c = copy.deepcopy(a2)
c
Out[16]:
[1, 1]
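
The difference between a shallow and a deep copy only shows up with nested containers. Here is a small sketch; the variable names nested, shallow, and deep are chosen for this example.

```python
import copy

nested = [[1, 1], [2, 2]]

shallow = copy.copy(nested)   # new outer list, but the same inner lists
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0][0] = 'boo!'

print(shallow[0])  # ['boo!', 1]  (the inner list is shared)
print(deep[0])     # [1, 1]       (fully independent)
```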

How to check if two variables point to the same address in the memory:

In [17]:
b is a1
Out[17]:
True
In [18]:
c is a2
Out[18]:
False

Tricky! How to check if two variables have the same value:

In [19]:
c == a2
Out[19]:
True

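Python shares references rather than copying for memory efficiency. Assigning a new value to a name, however, only rebinds that name, and immutable types such as numbers and strings cannot be changed in place, so they behave as you would expect: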

In [20]:
a = 1
b = a
b = 'boo!'
print(a)
1
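
The contrast between immutable and mutable types is easiest to see with the += operator. This is a minimal sketch with illustrative names (a_list, b_list): on an int, += produces a new object, while on a list it extends the shared object in place.

```python
a = 1
b = a
b += 1             # ints are immutable: b is rebound to a new object, 2
print(a, b)        # 1 2

a_list = [1]
b_list = a_list
b_list += [2]      # lists are mutable: += extends the shared list in place
print(a_list)      # [1, 2]

b_list = ['boo!']  # plain assignment rebinds b_list only
print(a_list)      # [1, 2]
```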

In R, to perform exponentiation, you can use either the caret symbol or double asterisk. In Python, you can only use double asterisk because ^ is bitwise XOR in Python.

So, here is $2^3$ (notice how you can embed LaTeX code inside a notebook):

R:

Input: 2**3

Output: 8

Input: 2^3

Output: 8
In [21]:
2**3
Out[21]:
8
In [22]:
2^3
Out[22]:
1
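
To see why 2^3 gives 1 in Python, it helps to look at the operands in binary: XOR keeps only the bits that differ. A short sketch follows; xor_result and power_result are names introduced for this example.

```python
print(bin(2), bin(3))  # 0b10 0b11

xor_result = 2 ^ 3     # compares bit by bit: 0b10 XOR 0b11
print(bin(xor_result)) # 0b1: only the lowest bit differs
print(xor_result)      # 1

power_result = 2 ** 3
print(power_result)    # 8
print(pow(2, 3))       # 8, the built-in function equivalent to **
```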

In R, you can usually use a dot when naming variables and functions. In Python, the dot is an operator that accesses the methods and attributes of objects and modules, so it cannot be part of a name. Use underscores instead (for example, my_integer_variable).

R:

my.integer.variable <- 5
In [23]:
a = [1,2,3]
print(a)

a.append(4)
print(a)
[1, 2, 3]
[1, 2, 3, 4]
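
As a sketch of what happens if you carry the R naming habit over, the example below runs an R-style dotted assignment through exec() in an empty namespace. Python reads it as attribute access on a name called my, which does not exist, so it fails.

```python
my_integer_variable = 5  # snake_case is the usual Python convention

try:
    # Python parses this as "set attribute 'variable' on my.integer",
    # so it first looks up the name 'my', which is not defined here.
    exec("my.integer.variable = 5", {})
    result = "assigned"
except NameError as err:
    result = "NameError"
    print(err)
```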

In R, reshaping fills data column-wise (column-major order) by default. NumPy, the standard array library for Python, fills row-wise (row-major order) by default. This can cause subtle bugs that are hard to catch.

R:

matrix(0:9, nrow=2, ncol=5)
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    2    4    6    8
[2,]    1    3    5    7    9
In [24]:
import numpy as np
np.arange(10).reshape(2, 5)
Out[24]:
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])

However, you can make NumPy reshape column-wise by passing order='F' (Fortran-style, column-major order) to reshape.

In [25]:
import numpy as np
np.arange(10).reshape(2, 5, order='F')
Out[25]:
array([[0, 2, 4, 6, 8],
       [1, 3, 5, 7, 9]])
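
To make the two fill orders concrete without relying on NumPy, here is a pure-Python sketch. The helper functions reshape_row_major and reshape_col_major are hypothetical and exist only for this illustration; they are not part of NumPy or R.

```python
def reshape_row_major(values, nrow, ncol):
    # Fill row by row, like NumPy's default order='C'.
    return [[values[r * ncol + c] for c in range(ncol)] for r in range(nrow)]

def reshape_col_major(values, nrow, ncol):
    # Fill column by column, like R's matrix() or NumPy's order='F'.
    return [[values[c * nrow + r] for c in range(ncol)] for r in range(nrow)]

values = list(range(10))
print(reshape_row_major(values, 2, 5))  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
print(reshape_col_major(values, 2, 5))  # [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```

The only difference is the index formula: row-major steps through values one row at a time, while column-major steps through it one column at a time.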

www.featureranking.com