Introduction to Apache Spark



2. Apache Spark and RDDs

SparkContext object

Your driver program communicates with the Spark cluster through a SparkContext object, which represents the connection to that cluster. To create a SparkContext object:


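from pyspark import SparkConf, SparkContext

appName = "MyApp"    # placeholder: the application name shown in the Spark UI
master = "local[*]"  # placeholder: local mode on all cores, or a cluster URL
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
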

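When using the Virtual Machine notebooks or any Spark interactive shell, a SparkContext is created for you automatically, so you do not need to build one yourself (in the PySpark shell it is available as the variable sc).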

groupByKey
reduceByKey