# Notebook 1: Python Overview

## Motivations

Spark provides multiple *Application Programming Interfaces* (APIs), i.e. interfaces that allow the user to interact with the application. The main APIs are the [Scala](http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.package) and [Java](http://spark.apache.org/docs/latest/api/java/index.html) APIs, as Spark is implemented in Scala and runs on the Java Virtual Machine (JVM).
Since version 0.7.0, a [Python API](http://spark.apache.org/docs/latest/api/python/index.html), also known as PySpark, has been available. An [R API](http://spark.apache.org/docs/latest/api/R/index.html) was released with version 1.4.0. During this course, you will be using Spark 2.3.

Throughout this course we will use the Python API for the following reasons:
- The R API is still too young and limited to be relied on. Besides, R can quickly become a living hell when using immature libraries.
- Many of you are wannabe data scientists, and Python is a must-know language in the data industry.
- The Scala and Java APIs would have been quite hard to learn given the length of the course and your current programming skills.
- Python is easy to learn, and even easier if you are already familiar with R.

The goal of this session is to teach you (or remind you of) the syntax of basic operations, control structures and declarations in Python that will be useful when using Spark. Keep in mind that we do not have a lot of time, and that by the end of the lab you should be able to create functions and classes and to manipulate them. If you can't, the rest of the course will be hard to follow. Don't hesitate to ask for explanations and/or more exercises if you don't feel confident enough at the end of the lab.


*This introduction relies on [Learn Python in Y minutes](https://learnxinyminutes.com/docs/python/)*

## Introduction

Python is a high-level, general-purpose, interpreted language. It is designed to be concise and readable, which makes it a very pleasant language to work with.

## 1. Primitive Datatypes and Operators
### Points: 1 pt
Read section 1 of [Learn Python in Y Minutes](https://learnxinyminutes.com/docs/python/) (if you already know Python, you can skip this step). Then replace `???` in the following cells with your code to answer the questions. To get started, please run the following cell.

In [5]:
# Run this cell: it creates a `test` object (a unittest.TestCase instance)
# whose assert methods will allow you to check your code
# for some questions.
import unittest

test = unittest.TestCase()

Compute 4 + 8

In [None]:
???

Compute 4 * 8

In [None]:
???

Compute 4 / 8 (using the regular division operation, not integer division)

In [None]:
???

Check if the variable `foo` is None:

In [None]:
foo = None
foo ??? None  # => True

## 2. Variables and Collections
### Points: 2 pts
### Bonus: 2 pts
Same as before, read the corresponding section, and answer the questions below.

From now on, when you are asked to print something, please use the `print()` function.
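A brief sketch of `print()` in Python 3, with arbitrary variable names: it accepts several arguments, separates them with a space by default, and combines naturally with string formatting.

```python
# In Python 3, print is a function: its arguments go inside parentheses
name = "Arthur"
quest = "the Holy Grail"

# Several arguments are printed separated by a space (the default)
print("Name:", name)                # Name: Arthur

# The sep keyword argument changes the separator
print("a", "b", "c", sep="-")       # a-b-c

# str.format() builds a string, which print() then displays
message = "{0} seeks {1}".format(name, quest)
print(message)                      # Arthur seeks the Holy Grail
```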

Declare a variable containing a float of your choice and print it.

In [None]:
???

In [None]:
# Create a list containing strings and store it in a variable
bar = ???
# Append a new string to this list
???
# Append an integer to this list and print it
???
print(bar)

Note that modifications on list objects are performed in place, i.e.

    li = [1, 2, 3]
    li.append(4)
    li  # => [1, 2, 3, 4]
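Because `append` works in place, it returns `None` instead of a new list, and every name referring to that list sees the change. A small illustration with arbitrary names:

```python
li = [1, 2, 3]
alias = li              # alias and li refer to the same list object

result = li.append(4)   # append modifies the list in place...
print(result)           # None: ...and does not return a new list
print(alias)            # [1, 2, 3, 4]: the change is visible through alias

# list() creates an independent (shallow) copy
li_copy = list(li)
li_copy.append(5)
print(li)               # [1, 2, 3, 4]
print(li_copy)          # [1, 2, 3, 4, 5]
```

If you need to keep the original list untouched, work on a copy as in the last lines.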

In [None]:
# Mixing types inside a list object can be a bad idea depending on the situation.
# Remove the integer you just inserted in the list and print it
???
print(bar)

In [None]:
# Print the second element of the list
print(???)

You can access list elements in reverse order, e.g.

    li[-1]  # returns the last element of the list
    li[-2]  # returns the second last element of the list
    
and so on...
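Negative indices can also be used in slices. A quick illustration on a throwaway list:

```python
li = [10, 20, 30, 40, 50]

print(li[-1])    # 50: the last element
print(li[-2])    # 40: the second-to-last element
# Negative indices also work in slices (the end index is excluded)
print(li[-3:])   # [30, 40, 50]: the last three elements
print(li[1:-1])  # [20, 30, 40]: everything except the first and last elements
```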

In [None]:
# Extend your list with new_list and print it
new_list = ["We", "are", "the", "knights", "who", "say", "Ni", "!"]
???
print(bar)

In [None]:
# Replace "Ni" with "Ekke Ekke Ekke Ekke Ptang Zoo Boing" in the list and print it
???
print(bar)

In [None]:
# Compute the length of the list and print it
???

What is the difference between lists and tuples?


*short* answer here

In [None]:
# Create a dictionary containing the following mapping:
# "one" : 1
# "two" : 2
# etc. until you reach "five" : 5
baz = ???

In [None]:
# Check if the key "four" is contained in the dict.
# If it is, print the associated value
???

First, make each letter in `gibberish` appear only once, using a collection.

In [None]:
gibberish = list("fqfgsrhrfeqluihjgrshioprqoqeionfvnorfiqeo")

# Answer here

*BONUS*: +2 pts Find all the letters that appear exactly once in `gibberish` and store them in `unique_letters`. Your answer must fit in one line.

In [None]:
# Hint: this import will help you with the bonus.

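from collections import Counter

# Store the letters that appear exactly once in unique_letters
unique_letters = ???  # should fit in one line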

In [None]:
test.assertEqual(set(unique_letters), set('vjlup'), 'unique letters')

You should now be able to answer the following problem using dictionaries, lists and sets. Imagine you owe money to your friends because you forgot your credit card last time you went out for drinks. You want to remember how much you owe to each of them in order to refund them later. Which data structure would be useful to store this information? Use this data structure and fill it in with some debt data in the cell below:

In [None]:
debts = ???

Another party night with more people, yet you forgot your credit card again... You meet new friends who buy you drinks. Create the same data structure as above with different data, i.e. include friends that were not here during the first party.

In [None]:
debts_2 = ???

Count the number of new friends you made that second night. Print the names of the friends who bought you drinks during the second party, but not during the first.

In [None]:
new_friends = ??? # should fit in one line
nb_new_friends = ??? # should fit in one line
print(new_friends)

## 3. Control flow
### Points: 3 pts
Same as before, read the corresponding section, and answer the questions below.
You can skip the paragraph on exceptions for now.

In [None]:
# Code the following:
# if you have made more than 5 friends that second night, 
# print "Yay! I'm super popular!", else, print "Duh..."
???

In [None]:
# Now, thank each new friend iteratively, i.e.
# print "Thanks <name of the friend>!" using loops and string formatting (cf. section 1)
???

In [None]:
# Sum all the numbers from 0 to 15 (inclusive) using what we've seen so far (i.e. without the sum() function)
sum_to_fifteen = 0
???
    
test.assertEqual(sum_to_fifteen, 120)

In [None]:
# Note: you can break a loop with the break statement
for i in range(136):
    print(i)
    if i >= 2:
        break

In [None]:
# The enumerate function is very useful when you need both the index and the value while looping:
for i, value in enumerate(["a", "b", "c"]):
    print(value, i)

__Q__: Find a Python function that allows you to iterate over two collections at the same time, stopping when the shorter collection is exhausted.

## 4. Functions

### Points: 5.5 pts
Things are becoming more interesting. Read section 4. It's OK if you don't get the args/kwargs part, but make sure you understand basic function declarations and anonymous (lambda) function declarations. Higher-order functions such as `map` and `filter`, which will be covered during the next lab, make heavy use of lambda functions.
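For reference, here is a compact sketch of the declaration forms mentioned above, using hypothetical function names: a parameter with a default value, `*args` (extra positional arguments gathered in a tuple) and `**kwargs` (extra keyword arguments gathered in a dict), plus the lambda equivalent of a simple function.

```python
def greet(name, greeting="Hello"):
    """greeting has a default value, so the caller may omit it."""
    return "{0}, {1}!".format(greeting, name)


def collect(*args, **kwargs):
    """args gathers extra positional arguments in a tuple,
    kwargs gathers extra keyword arguments in a dict."""
    return args, kwargs


print(greet("Lancelot"))                 # Hello, Lancelot!
print(greet("Lancelot", greeting="Hi"))  # Hi, Lancelot!
print(collect(1, 2, colour="blue"))      # ((1, 2), {'colour': 'blue'})

# The same greet function written as an anonymous (lambda) function
greet_lambda = lambda name, greeting="Hello": "{0}, {1}!".format(greeting, name)
print(greet_lambda("Galahad"))           # Hello, Galahad!
```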

Write a Python function that checks whether a given string is a palindrome. Note: a palindrome is a word, phrase, or sequence that reads the same backward and forward (ignoring spaces), e.g. "madam" or "nurses run". Hint: strings are sequences of characters and support indexing, e.g.

    a = "abcdef"
    a[2]  # => 'c'
    
If needed, here are [some tips about string manipulation](http://www.pythonforbeginners.com/basics/string-manipulation-in-python).
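Strings behave like read-only sequences: you can index them, slice them and take their length, but you cannot change a character in place. A short illustration:

```python
a = "abcdef"

print(a[2])        # c
print(len(a))      # 6
print(a[1:3])      # bc: slicing works on strings too
print(a.upper())   # ABCDEF: string methods return a new string

# Strings are immutable: assigning to an index raises a TypeError
try:
    a[0] = "z"
except TypeError:
    print("strings cannot be modified in place")

# Build a new string instead
b = "z" + a[1:]
print(b)           # zbcdef
```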

In [None]:
def isPalindrome(string_input):
    ???

test.assertEqual(isPalindrome('aza'), True, "Simple palindrome") 
test.assertEqual(isPalindrome('nurses run'), True, "Palindrome containing a space") 
test.assertEqual(isPalindrome('palindrome'), False, "Not a palindrome") 

Write a Python function to check whether a string is a pangram. Note: pangrams are words or sentences containing every letter of the alphabet at least once. For example: "The quick brown fox jumps over the lazy dog".

[Hint](https://docs.python.org/2/library/stdtypes.html#set-types-set-frozenset)

In [None]:
import string

# In this function, the "alphabet" argument has a default value, string.ascii_lowercase,
# which contains all the ASCII letters in lowercase.
def ispangram(string_input, alphabet=string.ascii_lowercase):  
    ??? 

    

test.assertEqual(ispangram('The quick brown fox jumps over the lazy dog'), True, "Pangram")
test.assertEqual(ispangram('The quick red fox jumps over the lazy dog'), False, "Not a pangram")

### Python lambda expressions

When evaluated, a lambda expression returns an anonymous function, i.e. a function that is not bound to any name (hence "anonymous"). However, the function can still be assigned to a variable. Lambda expressions are particularly useful when you need to pass a simple function to another function. To create a lambda function, use the following syntax, where the body must be a single expression:

    lambda argument1, argument2, argument3, etc. : body_of_the_function

For example, a function which takes a number and returns its square would be

    lambda x: x**2
    
A function that takes two numbers and returns their sum:

    lambda x, y: x + y
    
`lambda` creates a function and returns it, while `def` creates a function and binds it to a name. The function created by `lambda` automatically returns the value of its single expression, so no `return` statement is needed, which reduces the amount of code to write.
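As an illustration of passing a lambda to another function, the built-in `sorted` accepts a `key` function that computes a sort key for each element. The word list below is arbitrary:

```python
words = ["shrubbery", "Ni", "knights", "herring"]

# Sort by word length; sorted() is stable, so ties keep their original order
by_length = sorted(words, key=lambda w: len(w))
print(by_length)      # ['Ni', 'knights', 'herring', 'shrubbery']

# Case-insensitive alphabetical order
by_name = sorted(words, key=lambda w: w.lower())
print(by_name)        # ['herring', 'knights', 'Ni', 'shrubbery']

# Without a key, uppercase letters sort before lowercase ones
print(sorted(words))  # ['Ni', 'herring', 'knights', 'shrubbery']
```

Here `key=len` would work just as well, since `len` is already a function; lambdas are handy when no ready-made function fits.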

Here are some additional references that explain lambdas: [Lambda Functions](http://www.secnetix.de/olli/Python/lambda_functions.hawk), [Lambda Tutorial](https://pythonconquerstheuniverse.wordpress.com/2011/08/29/lambda_tutorial/), and [Python Functions](http://www.bogotobogo.com/python/python_functions_lambda.php).

Here is an example:

In [None]:
# Function declaration using def
def add_s(x):
    return x + 's'

print(type(add_s))
print(add_s)
print(add_s('dog'))

In [None]:
# Same function declared as a lambda
add_s_lambda = lambda x: x + 's'
print(type(add_s_lambda))
print(add_s_lambda)  # Note that the function shows its name as <lambda>
print(add_s_lambda('dog'))

In [None]:
# Code a function using a lambda expression which takes
# a number and returns this number multiplied by two.
multiply_by_two = lambda ???
print(multiply_by_two(5))

test.assertEqual(multiply_by_two(10), 20, 'incorrect definition for multiply_by_two')

Observe the behavior of the following code:

In [None]:
def add(x, y):
    """Add two values"""
    return x + y

def sub(x, y):
    """Subtract y from x"""
    return x - y

functions = [add, sub]
print(functions[0](1, 2))
print(functions[1](3, 4))

Code the same functionality, using lambda expressions:

In [7]:
lambda_functions = [lambda ??? ,  lambda ???]

test.assertEqual(lambda_functions[0](1, 2), 3, 'add lambda_function')
test.assertEqual(lambda_functions[1](3, 4), -1, 'sub lambda_function')

In [11]:
# Example:
add_two_1 = lambda x, y: (x[0] + y[0], x[1] + y[1])
add_two_2 = lambda x0, x1, y0, y1: (x0 + y0, x1 + y1)
print('add_two_1((1,2), (3,4)) = {0}'.format(add_two_1((1,2), (3,4))))
print('add_two_2(1, 2, 3, 4) = {0}'.format(add_two_2(1, 2, 3, 4)))

add_two_1((1,2), (3,4)) = (4, 6)
add_two_2(1, 2, 3, 4) = (4, 6)
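In Python 3, a tuple can also be unpacked into separate positional arguments at the call site with the `*` operator, which lets you feed tuples to a function written in the `add_two_2` style. A minimal sketch (it redefines `add_two_2` so that it runs on its own; using several `*` in one call requires Python 3.5 or later):

```python
add_two_2 = lambda x0, x1, y0, y1: (x0 + y0, x1 + y1)

point_a = (1, 2)
point_b = (3, 4)

# *point_a expands to the arguments 1, 2 and *point_b to 3, 4
result = add_two_2(*point_a, *point_b)
print(result)  # (4, 6)
```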


In [None]:
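# A lambda can take several positional arguments (this one sums them).
# This throwaway definition of reverse2 is overwritten in the exercise below.
reverse2 = lambda x0, x1, x2: x0 + x1 + x2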

In [None]:
reverse2(1, 2, 3)

In [None]:
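# Use both syntaxes to create a function that reverses the order of three values
# E.g. (1, 2, 3) => (3, 2, 1)
# Syntax 1: the function takes a single tuple
reverse1 = lambda x: ???
# Syntax 2: the function takes three separate arguments.
# Python 2 allowed `lambda (x0, x1, x2): ...`, but tuple parameters were removed
# in Python 3 (PEP 3113), so the tuple is unpacked with * when calling instead.
reverse2 = lambda x0, x1, x2: ???

test.assertEqual(reverse1((1, 2, 3)), (3, 2, 1), 'reverse order, syntax 1')
test.assertEqual(reverse2(*(1, 2, 3)), (3, 2, 1), 'reverse order, syntax 2')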

Lambda expressions allow you to reduce the size of your code, but they are limited to simple logic: their body must be a single expression. The following Python keywords introduce statements that cannot be used in a lambda expression: `assert`, `pass`, `del`, `return`, `yield`, `raise`, `break`, `continue`, `import` and `global`. Assignment statements (`=`) and augmented assignment statements (e.g. `+=`) cannot be used either. (In Python 3, `print()` and `exec()` are functions, so they *can* be called inside a lambda.) If more complex logic is necessary, use `def` instead of `lambda`.
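A small sketch of where the boundary lies, with illustrative names: a conditional *expression* is accepted in a lambda, whereas a statement such as `return` is rejected by the parser (checked here with the built-in `compile`), so logic needing several statements goes in a `def`.

```python
# A conditional *expression* is allowed in a lambda body
sign = lambda x: "negative" if x < 0 else "non-negative"
print(sign(-3))  # negative

# A statement such as return is rejected by the parser
try:
    compile("lambda x: return x", "<example>", "eval")
    return_allowed = True
except SyntaxError:
    return_allowed = False
print(return_allowed)  # False


# Logic that needs several statements goes in a def
def describe_number(x):
    if x == 0:
        return "zero"
    label = sign(x)  # an assignment statement, which a lambda cannot contain
    return label
```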

## 5. Classes
### Points: 4.5 pts

Classes allow you to create objects. Object-Oriented Programming (OOP) can be a very powerful paradigm. If done well, OOP allows you to improve the modularity and reusability of your code, but that's the subject of a whole other course.
Here is a *very* short introduction to it.

By convention, class names are written in CapWords (upper camel case), e.g. `MyBeautifulClass`, while variable and function names are written in snake case, e.g. `my_variable`, `my_very_complex_function`.

Classes contain methods (i.e. functions owned by the class) and attributes (i.e. variables owned by the class). 
When you define a class, the first thing to do is usually to define a special method, the constructor. In Python, the constructor is called `__init__`: it is called automatically each time a new instance of the class is created, and it initializes that instance. Example:

    class MyClass:
    
        def __init__(self, first_attribute, second_attribute):
            self.first_attribute = first_attribute
            self.second_attribute = second_attribute
            
This class has two attributes and one special method, the constructor. To create an instance of this class, simply call it like a function:

    instance_example = MyClass(1, "foo")
    
Then, the attributes can easily be accessed:

    instance_example.first_attribute  # => 1
    instance_example.second_attribute  # => "foo"

In [None]:
# Run this example
class MyClass:
    
    def __init__(self, first_attribute, second_attribute):
        self.first_attribute = first_attribute
        self.second_attribute = second_attribute
            
instance_example = MyClass(1, "foo") 
print(instance_example.first_attribute)
instance_example.__init__(3,4)  # In real life, it is rare to reinit an object.
print(instance_example.first_attribute)

`self` denotes the object itself. When you declare a method, you have to declare `self` as its first parameter:

    class MyClass:

        def __init__(self, first_attribute, second_attribute):
            self.first_attribute = first_attribute
            self.second_attribute = second_attribute

        def method_baz(self):
            print("Hello! I'm a method! I have two attributes, initialized with values %s, %s" % (self.first_attribute, self.second_attribute))

Indeed, when we call
    
    instance_example = MyClass(1, "foo") 
    instance_example.method_baz()
    
the `self` object is implicitly passed to `method_baz` as its first argument. Think of the method call as the following function call:

    MyClass.method_baz(instance_example)
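You can check this equivalence directly: calling a method on an instance, or calling it on the class with the instance passed explicitly, gives the same result. A minimal sketch with a hypothetical `Greeter` class:

```python
class Greeter:

    def __init__(self, name):
        self.name = name

    def greet(self):
        return "Hello, I'm {0}!".format(self.name)


g = Greeter("Robin")

via_instance = g.greet()          # self is passed implicitly
via_class = Greeter.greet(g)      # self is passed explicitly
print(via_instance)               # Hello, I'm Robin!
print(via_instance == via_class)  # True
```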

In [None]:
# Run this example
class MyClass:
    
    def __init__(self, first_attribute, second_attribute):
        self.first_attribute = first_attribute
        self.second_attribute = second_attribute
    
    def class_method(self):
        print("Hello! I'm a method! My class has two attributes, of value {0}, {1}".format(self.first_attribute, self.second_attribute))
            
instance_example = MyClass(1, "foo") 
# Call to a regular (instance) method
instance_example.class_method()

Now, the tricky part. You can declare **static** methods, i.e. methods that don't need to access the data contained in `self` to work properly. Such methods do not take a `self` parameter, as they do not use any instance data. They are declared with the `@staticmethod` decorator:

In [14]:
# Run this example
class MyClass:
    
    def __init__(self, first_attribute, second_attribute):
        self.first_attribute = first_attribute
        self.second_attribute = second_attribute
    
    def class_method(self):
        print("Hello! I'm a method! My class has two attributes, of value {0}, {1}".format(self.first_attribute, self.second_attribute))
         
    @staticmethod
    def static_method():
        print("I'm a static method!")
            
instance_example = MyClass(1, "foo") 
# Call to a regular (instance) method
instance_example.class_method()
# Call to a static method
instance_example.static_method()

Hello! I'm a method! My class has two attributes, of value 1, foo
I'm a static method!


In [15]:
# Call to a static method without instantiating the class
MyClass.static_method()

I'm a static method!


In [16]:
# Call to a regular (instance) method without an instance: raises an error
MyClass.class_method()

TypeError: class_method() missing 1 required positional argument: 'self'

You can also define *class attributes* directly in the class body, without passing them to the constructor:

In [None]:
# Run this example
class MyClass:
    
    default_attribute = 42
    
    def __init__(self, first_attribute, second_attribute):
        self.first_attribute = first_attribute
        self.second_attribute = second_attribute
    
    def method_baz(self):
        print("Hello! I'm a method! I have two attributes, initialized with values %s, %s"%(self.first_attribute, self.second_attribute))
        
    @staticmethod
    def static_method():
        print("I'm a static method!")
            
instance_example = MyClass(1, "foo") 
print(instance_example.default_attribute)
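`default_attribute` is a *class attribute*: it is stored on the class and shared by all instances, whereas attributes set through `self` in `__init__` belong to each instance. A short sketch with a hypothetical `Knight` class shows the consequence: assigning the name through an instance creates an instance attribute that shadows the class attribute for that instance only.

```python
class Knight:
    n_legs = 2  # class attribute: stored on the class, shared by all instances

    def __init__(self, name):
        self.name = name  # instance attribute: specific to each object


arthur = Knight("Arthur")
lancelot = Knight("Lancelot")
print(arthur.n_legs, lancelot.n_legs, Knight.n_legs)  # 2 2 2

# Assigning through an instance creates an instance attribute
# that shadows the class attribute for that instance only
arthur.n_legs = 1
print(arthur.n_legs, lancelot.n_legs, Knight.n_legs)  # 1 2 2

# Changing the class attribute affects instances that do not shadow it
Knight.n_legs = 3
print(arthur.n_legs, lancelot.n_legs)  # 1 3
```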

In [None]:
# Write a Python class named Rectangle which is
# constructed from a length and a width
# and has two methods:
# - "rectangle_area", which computes the area of the rectangle.
# - "rectangle_perimeter", which computes the perimeter of the rectangle.
#
# The Rectangle class should have a class attribute n_edges equal to 4,
# which should not be initialized by the __init__ constructor.
#
# Declare a static method "talk" that returns "Do you like rectangles?" when called

class Rectangle: 
    
    ???


new_rectangle = Rectangle(12, 10)
test.assertEqual(new_rectangle.rectangle_area(), 120, "rectangle_area method")
test.assertEqual(new_rectangle.rectangle_perimeter(), 44, "rectangle_perimeter method")
test.assertEqual(Rectangle.n_edges, 4, "constant attribute")
test.assertEqual(Rectangle.talk(), "Do you like rectangles?", "Rectangle talk static method")

Congratulations, you've reached the end of this notebook. =)