Saturday, 26 September 2015

Installation of VM image for Jupyter IPython Notebook for PySpark

Steps to install the VM image for the Jupyter IPython Notebook for PySpark:

VirtualBox 4.3.28 (or later).
Check that VirtualBox is installed on your machine with the command:
vboxmanage --version
If it is not installed, install it with the following command:
sudo apt-get install virtualbox

Vagrant 1.7.2 (or later).
Check that Vagrant is installed on your machine with the command:
vagrant --version
If it is not installed, install it with the following command:
sudo apt-get install vagrant



Create a file named Vagrantfile in an empty directory of your choice with the following contents:

# -*- mode: ruby -*-
# vi: set ft=ruby :

ipythonPort = 8001                 # IPython port to forward (also set in IPython notebook config)

Vagrant.configure(2) do |config|
  config.ssh.insert_key = true
  config.vm.define "sparkvm" do |master|
    master.vm.box = "sparkmooc/base"
    master.vm.box_url = "https://atlas.hashicorp.com/sparkmooc/boxes/base/versions/0.0.7.1/providers/virtualbox.box"
    master.vm.box_download_insecure = true
    master.vm.boot_timeout = 900
    master.vm.network :forwarded_port, host: ipythonPort, guest: ipythonPort, auto_correct: true   # IPython port (set in notebook config)
    master.vm.network :forwarded_port, host: 4040, guest: 4040, auto_correct: true                 # Spark UI (Driver)
    master.vm.hostname = "sparkvm"
    master.vm.usable_port_range = 4040..4090

    master.vm.provider :virtualbox do |v|
      v.name = master.vm.hostname.to_s
    end
  end
end


Then run the command vagrant up from that directory.

Once the VM is running, to access the notebook, open a web browser to "http://localhost:8001/" (on Windows and Mac) or "http://127.0.0.1:8001/" (on Linux).


Spark and Pyspark-Cassandra connector installation

Install Java on an Ubuntu machine with the following commands:
$ sudo apt-add-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
To check that the Java installation was successful:
$ java -version
It shows the installed Java version:
java version "1.7.0_72"
Java(TM) SE Runtime Environment (build 1.7.0_72-b14)
Java HotSpot(TM) 64-Bit Server VM (build 24.72-b04, mixed mode)
The next step is to install Scala. Download Scala, extract the downloaded file to some location (for example /usr/local/src), and set the PATH variable:
$ wget http://www.scala-lang.org/files/archive/scala-2.10.4.tgz
$ sudo mkdir /usr/local/src/scala
$ sudo tar xvf scala-2.10.4.tgz -C /usr/local/src/scala/
$ vi ~/.bashrc
Add the following at the end of the file:
export SCALA_HOME=/usr/local/src/scala/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
Reload .bashrc:
$ . ~/.bashrc
To check that Scala is installed successfully:
$ scala -version
It shows the installed Scala version:
Scala code runner version 2.10.4 -- Copyright 2002-2013, LAMP/EPFL
Or just type scala to enter the Scala interactive shell:
$ scala
scala>
The next step is to install git:
$ sudo apt-get install git
Finally, download the Spark distribution:
$ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.1.tgz
$ tar xvf spark-1.4.1.tgz
Once Spark is downloaded, change into the extracted directory and build Spark with the following command:
$ cd spark-1.4.1
$ sudo build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
Clone pyspark-cassandra from TargetHolding into the directory of your choice; it is built using sbt:
$ sudo git clone https://github.com/TargetHolding/pyspark-cassandra.git
Building:
PySpark Cassandra is compiled with sbt, so install sbt first:
$ sudo apt-get install sbt
Go to the pyspark-cassandra directory and compile:
$ sbt compile
The package can be published locally with:
$ sbt spPublishLocal
Both a Java/JVM library and a Python library are required to use PySpark Cassandra. They can be built with:
$ make dist
This creates:
1) a fat jar with the Spark Cassandra Connector and additional classes for bridging Spark and PySpark for Cassandra data, at:
target/pyspark_cassandra-<version>.jar
2) a Python source distribution at:
target/pyspark_cassandra_<version>-<python version>.egg
Commands to start a Spark cluster and connect to it from pyspark (run from the Spark directory):
$ export SPARK_MASTER_IP=127.0.0.1
$ ./sbin/start-master.sh
Now you can start a worker. It will start in the foreground:
$ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077
Then, in another terminal, start pyspark. PYSPARK_ROOT must point to the directory containing the pyspark_cassandra jar and egg, and host-ip is the address of your Cassandra host:
$ ./bin/pyspark --jars ${PYSPARK_ROOT}/pyspark_cassandra-0.1.5.jar --driver-class-path ${PYSPARK_ROOT}/pyspark_cassandra-0.1.5.jar --py-files ${PYSPARK_ROOT}/pyspark_cassandra-0.1.5-py2.7.egg --conf spark.cassandra.connection.host=host-ip --master spark://127.0.0.1:7077

If you have problems installing Scala, follow these steps.
This was done on Ubuntu 15.04 but should work the same on 14.04.
1) Remove the following lines from your .bashrc:
export SCALA_HOME=/usr/local/src/scala/scala-2.10.4
export PATH=$SCALA_HOME/bin:$PATH
2) Remove and reinstall Scala:
sudo rm -rf /usr/local/src/scala
# The following line is only needed if you installed Scala another way; if so, remove the leading #
# sudo apt-get remove scala-library scala
wget http://www.scala-lang.org/files/archive/scala-2.11.7.deb
sudo dpkg -i scala-2.11.7.deb
sudo apt-get update
sudo apt-get install scala

Wednesday, 22 July 2015

Creating Cluster of Vagrant Machines


CLUSTER OF VAGRANT MACHINES

Setting up a cluster with one vagrant file:

Check that VirtualBox is installed on your machine with the command:
vboxmanage --version
If it is not installed, install it with the following command:
sudo apt-get install virtualbox

Check that Vagrant is installed on your machine with the command:
vagrant --version
If it is not installed, install it with the following command:
sudo apt-get install vagrant

Create a file named Vagrantfile in an empty directory of your choice with the following contents:

Vagrant.configure("2") do |config|
    # Number of nodes to provision
    numNodes = 4
    # IP address base for the private network (the node number is appended: 192.168.56.101, .102, ...)
    ipAddrPrefix = "192.168.56.10"

    # Amount of RAM (in MB) for each node
    config.vm.provider "virtualbox" do |v|
        v.customize ["modifyvm", :id, "--memory", 1024]
    end

    # Download the initial box from this url
    config.vm.box_url = "http://files.vagrantup.com/precise64.box"

    # Provision Config for each of the nodes
    1.upto(numNodes) do |num|
        nodeName = ("node" + num.to_s).to_sym
        config.vm.define nodeName do |node|
            node.vm.box = "precise64"
            node.vm.network :private_network, ip: ipAddrPrefix + num.to_s
            node.vm.provider "virtualbox" do |v|
                v.name = "Couchbase Server Node " + num.to_s
            end
        end
    end

end


Then run the command vagrant up to bring up the cluster. This will set up a 4-node cluster for you.

Monday, 13 July 2015

D3 Pie-Chart

    Hola!
    This tutorial covers the basics of creating a pie chart using D3.js. New to D3? No worries. We will give you a brief introduction to D3 too.
    So, what is D3?
    D3 (Data-Driven Documents) is a JavaScript library for producing dynamic and interactive data visualizations in web browsers using the widely implemented SVG, HTML5, and CSS standards.
    The D3.js library is embedded within an HTML webpage and uses pre-built JavaScript functions to select elements, create SVG objects, style them, or add transitions and dynamic effects. In short, it lets you dynamically manipulate the properties and attributes of your HTML document elements, and it can create and manipulate SVG elements as well.
    We can easily bind a large data set to SVG objects using simple D3.js functions. Pie charts are built using SVG paths. The SVG path is a more advanced shape than circles and rectangles, since it uses path commands to create any arbitrary shape we want.
    The data can be in various formats, most commonly JSON, CSV, and GeoJSON, but if required, JavaScript functions can be written to read other data formats as well.
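As a rough illustration of reading another format, here is a minimal sketch in plain JavaScript. The pipe-delimited format and the function name parsePipeDelimited are made up for this example and are not part of D3; the function simply turns each line into the same kind of object the pie chart below works with.

```javascript
// Hypothetical format (an assumption for this example): one record per line,
// fields separated by "|", e.g. "10|rgb(0,154,205)".
// Plain JavaScript only; this is not a D3 API.
function parsePipeDelimited(text) {
  return text
    .split('\n')
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return line.length > 0; }) // skip blank lines
    .map(function (line) {
      var fields = line.split('|');
      return { count: parseInt(fields[0], 10), color: fields[1] };
    });
}

var rows = parsePipeDelimited('10|rgb(0,154,205)\n20|rgb(139,119,101)\n');
console.log(rows.length);   // 2
console.log(rows[1].count); // 20
```

The resulting array has the same shape as the JSON data used later, so it could be handed to the same drawing code.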
    For more information on this, you can visit this link: D3-intro
    By now you should have an idea of what D3 is all about. Here we describe how to create a pie chart using D3.

    pie-chart-1
    As you can see in the above figure, a pie chart is composed of multiple arc-like paths with different fill colours. D3.js provides helper functions to draw arcs. Arcs are drawn using 4 main parameters: startAngle, endAngle, innerRadius and outerRadius. In the case of pie charts, the innerRadius is zero.
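To make the link between these parameters and SVG path commands concrete, here is a minimal sketch in plain JavaScript of what an arc helper has to produce for a pie slice (innerRadius of zero). The function name slicePath is hypothetical and this is not D3's own implementation; it only assumes D3's convention that angles are in radians, measured clockwise from 12 o'clock.

```javascript
// Hypothetical helper (not D3's implementation): build the SVG path "d" string
// for one pie slice with innerRadius = 0.
// Angles are in radians, clockwise from 12 o'clock.
function slicePath(startAngle, endAngle, outerRadius) {
  // Round to 2 decimals; String(-0) is "0", so tiny negatives print cleanly.
  function fmt(v) {
    return String(Math.round(v * 100) / 100);
  }
  // Point on the circle for a given angle; the SVG y axis points down.
  function point(angle) {
    return fmt(outerRadius * Math.sin(angle)) + ',' +
           fmt(-outerRadius * Math.cos(angle));
  }
  // The large-arc flag must be 1 when the slice spans more than half the circle.
  var largeArc = (endAngle - startAngle) > Math.PI ? 1 : 0;
  return 'M0,0' +                                   // start at the centre
    ' L' + point(startAngle) +                      // line out to the rim
    ' A' + outerRadius + ',' + outerRadius +        // arc along the rim
    ' 0 ' + largeArc + ',1 ' + point(endAngle) +    // sweep flag 1 = clockwise
    ' Z';                                           // close back to the centre
}

// A quarter slice from 12 o'clock to 3 o'clock with radius 150
console.log(slicePath(0, Math.PI / 2, 150));
// M0,0 L0,-150 A150,150 0 0,1 150,0 Z
```

The sketch does not handle a single slice covering the full circle, where the start and end points coincide.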
    To draw an arc, first add the following SVG element:
    var width = 550;
    var height = 350;
    var color = d3.scale.category20b();  //builtin range of colors
    var svg = d3.select('#pie_chart').append('svg').attr('width', width).attr('height', height).append('g').attr('transform', 'translate(' + (width / 2) +',' + (height / 2) + ')');
    Now draw an arc using (radius is the outer radius of the pie, set in the full code in Step 3):
    var arc = d3.svg.arc().outerRadius(radius);
    Next, we create a pie element of D3 using (pie_data holds the value for each slice and is computed in the full code in Step 3):
    var pie = d3.layout.pie().value(function(d,i) { return  pie_data[i]; }).sort(null);
    We now create the path and append it to our SVG element using .append('path'), bind the pie data to it, and assign the arc to its "d" attribute. To append arcs dynamically based on our data, we select the paths, bind our data to the selection, and append new paths accordingly.
    var path = svg.selectAll('path')
    .data(pie(data))
    .enter()
    .append('path')
    .attr('d', arc)
    .attr('fill', function(d, i) {
    return data[i].color;
    });
    where data is:
    [
    { "count": 10,"color":"rgb(0,154,205)" },
    { "count": 20 ,"color":"rgb(139,119,101)"},
    { "count": 30,"color":"rgb(255,140,0)" },
    { "count": 40,"color":"rgb(127,255,0)" }
    ]
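To see roughly what the pie element does with this data, here is a minimal sketch in plain JavaScript (no D3 needed) that turns the counts into start and end angles. The function name pieAngles is hypothetical; it mimics the idea behind the layout, not D3's actual code.

```javascript
// Hypothetical helper: mimic the start/end angles (in radians) that a pie
// layout assigns to each datum, in input order (like .sort(null)).
function pieAngles(data) {
  var total = data.reduce(function (sum, d) { return sum + d.count; }, 0);
  var angle = 0;
  return data.map(function (d) {
    var startAngle = angle;
    angle += (d.count / total) * 2 * Math.PI; // share of the full circle
    return { count: d.count, startAngle: startAngle, endAngle: angle };
  });
}

var sampleData = [
  { "count": 10, "color": "rgb(0,154,205)" },
  { "count": 20, "color": "rgb(139,119,101)" },
  { "count": 30, "color": "rgb(255,140,0)" },
  { "count": 40, "color": "rgb(127,255,0)" }
];

pieAngles(sampleData).forEach(function (slice) {
  var degrees = Math.round((slice.endAngle - slice.startAngle) * 180 / Math.PI);
  console.log(slice.count + ' -> ' + degrees + ' degrees');
});
// 10 -> 36 degrees
// 20 -> 72 degrees
// 30 -> 108 degrees
// 40 -> 144 degrees
```

Only the proportions matter here, so raw counts and percentages give the same angles.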
    Now that you understand the basics, you can jump right into the code:
    Step 1:
    To create a pie chart, we first need the libraries, loaded with script tags inside the HTML <head></head> tag:
    <script src="http://code.jquery.com/jquery-2.1.4.min.js"></script> <!-- jQuery -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/d3/3.5.5/d3.min.js" charset="utf-8"></script> <!-- D3 -->
    Next, define a <div id="pie_chart"></div> tag.
    Step 2: 
    You must have a sample_data.json file in the directory where your HTML file is present.
    sample_data.json
    [
    { "count": 10,"color":"rgb(0,154,205)" },
    { "count": 20 ,"color":"rgb(139,119,101)"},
    { "count": 30,"color":"rgb(255,140,0)" },
    { "count": 40,"color":"rgb(127,255,0)" }
    ]
    Step 3:
    Lastly, we define a <script> tag. We will write the actual code here to render the pie chart.
    <script>
    var width = 550;          //width
    var height = 350;        //height
    var radius = 300/ 2;   //radius of the pie-chart
    var color = d3.scale.category20b();    //builtin range of colors
    var svg = d3.select('#pie_chart')        //create the SVG element inside the #pie_chart <div>
    .append('svg')
    .attr('width', width) //set the width and height of our visualization
    .attr('height', height) // attributes of the <svg> tag
    .append('g')              //create a group to hold our pie chart
    .attr('transform', 'translate(' + (width / 2) +
    ',' + (height / 2) + ')');//move the center of the pie chart from 0, 0 to specified value
    var total=0;
    d3.json("sample_data.json", function(error, data) {
    if (error) throw error; // stop if sample_data.json could not be loaded
    for(var a=0;a<data.length;a++){
    total=total+parseInt(data[a].count); // simple logic to calculate total of data count value
    console.log(total);
    }
    var pie_data=[];
    for( var a=0;a<data.length;a++){ // simple logic to calculate percentage data for the pie
    pie_data[a]=(data[a].count/total)*100;
    }
    var arc = d3.svg.arc().outerRadius(radius);
    // creating arc element.
    var pie = d3.layout.pie()
    .value(function(d,i) { return pie_data[i]; })
    .sort(null);
    //Given a list of values, it will create an arc data for us
    //we must explicitly specify it to access the value of each element in our data array
    var path = svg.selectAll('path')
    .data(pie(data))
    .enter()
    .append('path')
    .attr('d', arc)
    .attr('fill', function(d, i) {
    return data[i].color;
    });
    //set the color for each slice to be chosen, from the color defined in sample_data.json
    //this creates the actual SVG path using the associated data (pie) with the arc drawing function
    });
    </script>
    Here's the output!

    pie-chart-2