Videos uploaded by user “Melvin L”
Creating a Power BI report in under 5 mins
 
05:20
Quick, fast-paced demo showing how you can build an interactive dashboard using Power BI Desktop in under 5 minutes.
Download the free Power BI Desktop app from https://powerbi.microsoft.com/en-us/desktop/
Uses sample data from http://go.microsoft.com/fwlink/?LinkID=521962
Views: 92556 Melvin L
Install Apache Spark 2.0 - Quick Setup
 
08:34
This video covers how you can install Apache Spark 2.0 using the prebuilt package.
INSTALL SPARK 2.0 (using prebuilt packages)
------------------
Prereq: JDK is installed
Step 1: Download Apache Spark 2.0 and extract it
Step 2: Add to path (SPARK_HOME, PATH)
ln -s spark-2.0.1-bin-hadoop2.7 spark
nano ~/.bashrc
export SPARK_HOME=/home/yourname/spark
export PATH=$PATH:$SPARK_HOME/bin
Save and exit, then reload with: source ~/.bashrc
Step 3: Verify Spark is working
-------------------------------------------------------
Resources:
Download Spark: http://spark.apache.org/downloads.html
Install Apache Spark on Ubuntu from source code (step-by-step guide): https://www.youtube.com/watch?v=eQ0nPdfVfc0
Views: 14379 Melvin L
Support Vector Machines (SVM) Overview and Demo using R
 
16:57
Quick overview and example demos of Support Vector Machines (SVM) using R. This getting-started video covers the basics of the SVM machine learning algorithm and then finishes with a quick demo.
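A quick way to experiment with the same idea outside the video: the sketch below trains an SVM classifier in Python with scikit-learn rather than the R code demonstrated in the video, and the dataset and parameters are placeholders.

from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy dataset standing in for the data used in the video
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train an SVM classifier with an RBF kernel and check accuracy
model = SVC(kernel="rbf", C=1.0)
model.fit(X_train, y_train)
print("Accuracy:", model.score(X_test, y_test))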
Views: 52916 Melvin L
Apache Parquet & Apache Spark
 
13:43
- Overview of Apache Parquet and the key benefits of using it
- Demo of using Apache Spark with Apache Parquet
Views: 14439 Melvin L
Setup Elasticsearch, Logstash and Kibana (ELK Stack) using Docker Containers - Step by Step Tutorial
 
15:48
Setup Elasticsearch, Logstash and Kibana (ELK) using Docker containers. In this video you can see a step-by-step tutorial on how you can use Docker to set up the ELK stack.
Scope: Using 3 different Docker images (official Elastic Docker images)
- Step 1: Set up an Elasticsearch container and verify Elasticsearch is working
docker run -d -p 9200:9200 -p 9300:9300 -it -h elasticsearch --name elasticsearch elasticsearch
curl http://localhost:9200/
- Step 2: Set up a Kibana container
https://www.elastic.co/guide/en/logstash/current/config-examples.html
docker run -d -p 5601:5601 -h kibana --name kibana --link elasticsearch:elasticsearch kibana
curl http://localhost:9200/_cat/indices
- Step 3: Create a Logstash config file and use it to set up a Logstash container
docker run -h logstash --name logstash --link elasticsearch:elasticsearch -it --rm -v "$PWD":/config-dir logstash -f /config-dir/logstash.conf
curl http://localhost:9200/_cat/indices
docker run -d -p 9500:9500 -h logstash2 --name logstash2 --link elasticsearch:elasticsearch --rm -v "$PWD":/config-dir logstash -f /config-dir/logstash2.conf
Views: 50196 Melvin L
Building a REST API using Python and Flask | Flask-RESTful
 
15:21
Overview and demo: create a REST API quickly using Python and Flask. The video covers two options:
Option 1: Using only Flask
Option 2: Using the Flask-RESTful extension
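As a rough sketch of Option 2, here is a minimal Flask-RESTful service; the resource name and route are placeholders rather than the exact API built in the video.

from flask import Flask
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)

# A tiny read-only resource; the route and payload are illustrative only
class Hello(Resource):
    def get(self):
        return {"message": "hello"}

api.add_resource(Hello, "/hello")

if __name__ == "__main__":
    app.run(debug=True)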
Views: 30640 Melvin L
Running R Scripts in PowerBI
 
14:58
Overview of the Nov 2015 Power BI capability to run R scripts from within Power BI. The video demonstrates simple examples of how to return data.frames from R and consume them in Power BI. We can then use the resulting data to create rich interactive dashboards. A more detailed overview of the web page scraping in R can be found in my other video: https://www.youtube.com/watch?v=tuNuxCjBU3U
Views: 23602 Melvin L
Simple web scraping using R and rvest library – 3 lines of code
 
06:51
Simple demo to illustrate how you can scrape web page content in R using the rvest library.
Views: 13471 Melvin L
Quantmod R package
 
11:41
Overview of the Quantmod R package to retrieve stock data and display charts. The video covers basic commands in the Quantmod package that can be used to pull financial data and then display it on charts, along with technical indicators and other charting parameters to filter the data and change the charting theme.
Views: 20170 Melvin L
Install R and R Studio on Ubuntu in 5 mins
 
04:45
Getting started with R on Ubuntu? Then this video is for you. The video covers how you can install R and RStudio on the Ubuntu operating system. The following commands need to be run to install R:
sudo apt-get update
sudo apt-get install r-base
Views: 18234 Melvin L
Principal Component Analysis (PCA) in R
 
12:41
Video covers:
- Overview of Principal Component Analysis (PCA) and why to use PCA as part of your machine learning toolset
- Using the princomp function in R to do PCA
- Visually understanding PCA
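For comparison, here is a minimal PCA sketch in Python with scikit-learn; the video itself uses R's princomp, and the toy data below is a placeholder.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy data standing in for the dataset used in the video
X = np.random.rand(100, 5)

# Standardise, then project onto the first two principal components
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)
print("Explained variance ratio:", pca.explained_variance_ratio_)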
Views: 65913 Melvin L
Correlation in R
 
08:55
Demo covers how you can use the correlation functions in R and uses R's rich visualisation to see and understand correlation.
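As a rough Python counterpart to the R demo (the video works entirely in R), a pairwise correlation matrix can be computed with pandas; the data frame here is a placeholder.

import numpy as np
import pandas as pd

# Toy data frame; the video uses its own dataset in R
df = pd.DataFrame(np.random.rand(50, 3), columns=["a", "b", "c"])

# Pairwise Pearson correlation matrix, similar in spirit to R's cor()
print(df.corr(method="pearson"))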
Views: 9967 Melvin L
Apache Parquet: Parquet file internals and inspecting Parquet file structure
 
24:38
In this video we will look at the internal structure of the Apache Parquet storage format and will use parquet-tools to inspect the contents of the file. Apache Parquet is a columnar storage format available in the Hadoop ecosystem.
Related videos:
Creating Parquet files using Apache Spark: https://youtu.be/-ra0pGUw7fo
Parquet vs Avro: https://youtu.be/sLuHzdMGFNA
Views: 11272 Melvin L
Using Gephi to visualise and understand communities
 
07:32
Quick video to show how you can use Gephi (a free graph visualisation tool) to quickly visualise communities in a network, and use some of Gephi's features such as modularity, node ranking, layout (Force Atlas 2) and filtering to analyse the network and detect communities, influencers and information brokers.
Views: 7209 Melvin L
Apache Spark - Setting up your development environment
 
18:07
Video covers setting up your development environment for developing standalone Apache Spark applications:
- Setting up Scala IDE and using the Maven project type to build Scala applications
- Using sbt to build Spark applications
Views: 32239 Melvin L
R and OpenNLP for Natural Language Processing (NLP) - Part 1
 
12:20
Overview and demo of using the Apache OpenNLP library in R to perform basic Natural Language Processing (NLP) tasks such as string tokenizing, word tokenizing and Parts of Speech (POS) tagging. This is a getting-started guide covering demos of OpenNLP coding in R.
Views: 19125 Melvin L
Using Tensorflow with Docker (Demo) | Tensorflow + Jupyter + Docker
 
08:24
Tensorflow with Docker
Overview:
- Run Tensorflow and Jupyter Notebooks
- Docker + TensorFlow
Other resources:
Install Docker on Ubuntu: https://www.youtube.com/watch?v=cVoR9rY31EQ
Commands:
docker run -it -p 8888:8888 tensorflow/tensorflow
docker run -it --rm --name tf -v ~/mywork:/notebooks -p 8888:8888 -p 6006:6006 tensorflow/tensorflow
(Note: you may have to change permissions on your host computer using chmod -R etc.)
docker exec -it tf tensorboard --logdir tf_logs/
Views: 7585 Melvin L
R Visualisations within Power BI (using R and Power BI)
 
26:23
Video covers the latest update from Microsoft Power BI which supports adding R visualisations within Power BI. The video covers the following:
- Overview of R integration with Power BI
- Possible scenarios where you can use R and Power BI together
- Samples of surfacing/integrating simple R plots in Power BI
- Samples of how you can use ggplot2 and other advanced R visualisations in Power BI
- Example of using more advanced R statistical visualisation within Power BI and creating interactive dashboards
- Example of running forecasting in R and visualising it in Power BI
Views: 41032 Melvin L
Getting Started with Cloudera Quick Start Docker Image
 
06:20
Video outlines the steps to download and run the Cloudera Quick Start Docker image: http://blog.cloudera.com/blog/2015/12/docker-is-the-new-quickstart-option-for-apache-hadoop-and-cloudera/
STEP 1: docker pull cloudera/quickstart:latest
STEP 2: docker run --hostname=quickstart.cloudera --privileged=true -t -i -p 8888:8888 -p 80:80 cloudera/quickstart /usr/bin/docker-quickstart
STEP 3: Browse Hue at localhost:8888 and optionally start additional services
For a Docker quick start and guide refer to my earlier video: https://youtu.be/-tHeJbGP8e0
Views: 8501 Melvin L
Getting Started with AWS S3 CLI
 
17:26
Getting Started with the AWS S3 CLI. The video will cover the following:
Step 1: Install the AWS CLI (sudo pip install awscli)
Pre-req: Python 2 version 2.6.5+ or Python 3 version 3.3+
Pre-req: pip is installed
- curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
- python get-pip.py
Step 2: Configure AWS credentials (aws configure)
Step 3: Run AWS S3 CLI commands (most common/useful commands)
- Create/remove a bucket
- Copy and sync files to S3
- Browse an S3 bucket
- Others: view file content, recursive operations, bucket size
aws s3 ls s3://bucket/folder --recursive | awk 'BEGIN {total=0}{total+=$3}END{print total/1024/1024" MB"}'
Views: 16611 Melvin L
Using Spark and Hive - PART 1: Spark as ETL tool
 
07:00
Working with Spark and Hive
Part 1: Scenario - Spark as an ETL tool; write to a Parquet file using Spark
Part 2: SparkSQL to query data from Hive; read Hive table data from Spark
- Create an external table
- Query the data from Hive
- Add new data
- Query the data from Hive again
Code for the demo:
case class Person(name: String, age: Int, sex: String)
val data = Seq(Person("Jack", 25, "M"), Person("Jill", 25, "F"), Person("Jess", 24, "F"))
val df = data.toDF()
import org.apache.spark.sql.SaveMode
df.select("name", "age", "sex").write.mode(SaveMode.Append).format("parquet").save("/tmp/person")
// Add new data
val data = Seq(Person("John", 25, "M"))
val df = data.toDF()
df.select("name", "age", "sex").write.mode(SaveMode.Append).format("parquet").save("/tmp/person")
CREATE EXTERNAL TABLE person (name String, age Int, sex String) STORED AS PARQUET LOCATION '/tmp/person'
Views: 17412 Melvin L
Parquet vs Avro
 
13:28
In this video we will cover the pros and cons of two popular file formats used in the Hadoop ecosystem, namely Apache Parquet and Apache Avro.
Agenda:
- Where these formats are used
- Similarities
- Key considerations when choosing: read vs write characteristics, tooling, schema evolution
- General guidelines
- Scenarios to keep data in both Parquet and Avro
Avro is a row-based storage format for Hadoop; it is more than a serialisation framework, it is also an IPC framework. Parquet is a column-based storage format for Hadoop. Both are highly optimised (compared with plain text), both are self-describing, and both use compression. If your use case typically scans or retrieves all of the fields in a row in each query, Avro is usually the best choice. If your dataset has many columns, and your use case typically involves working with a subset of those columns rather than entire records, Parquet is optimised for that kind of work. Finally, the video covers cases where you may use both file formats.
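To make the column-pruning point concrete, here is a small Python sketch using pandas with pyarrow (neither is mentioned in the video); reading only the columns you need is exactly where a columnar format such as Parquet pays off.

import pandas as pd

# Toy dataset with several columns
df = pd.DataFrame({
    "name": ["Jack", "Jill", "Jess"],
    "age": [25, 25, 24],
    "sex": ["M", "F", "F"],
})

# Write to Parquet (requires pyarrow or fastparquet to be installed)
df.to_parquet("/tmp/people.parquet")

# Read back only the columns we need; with a columnar format this
# avoids scanning the other columns entirely
subset = pd.read_parquet("/tmp/people.parquet", columns=["name", "age"])
print(subset)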
Views: 15071 Melvin L
Decision Trees vs Random Forest - R Demo
 
05:54
Video compares the performance of Random Forest vs Decision Trees. The sample code is in R. The video builds on top of previous videos:
a) Random Forest - https://youtu.be/wHnKpykaFR4
b) Decision Tree Classification - https://youtu.be/JFJIQ0_2ijg
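For a quick feel of the comparison, the sketch below contrasts a single decision tree with a random forest using Python's scikit-learn; the video's own code is in R, and the dataset here is a placeholder.

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compare a single decision tree with a random forest via cross-validation
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print("Decision tree:", cross_val_score(tree, X, y, cv=5).mean())
print("Random forest:", cross_val_score(forest, X, y, cv=5).mean())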
Views: 6800 Melvin L
Simple Web Scraping using R
 
08:57
Simple example of using R to extract structured content from web pages. There are several options and libraries that can be considered: if your web page has data in HTML tables you can use readHTMLTable; however, in this example the web page doesn't use HTML tables, so we use a straightforward XPath technique to extract the page content. In the end we turn the content from the web pages into a data frame in R.
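An equivalent XPath approach in Python, using requests and lxml rather than the R code from the video; the URL and XPath expression are placeholders.

import requests
from lxml import html

# Fetch a page and extract content via XPath
resp = requests.get("https://example.com/")
tree = html.fromstring(resp.content)

# Pull out the text of every h1 element; a real scraper would target
# the specific nodes that hold the data of interest
headings = tree.xpath("//h1/text()")
print(headings)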
Views: 33198 Melvin L
NLTK - Basic Text Analytics
 
14:12
Natural Language Processing (NLP) using NLTK and Python to perform basic text analytics such as word and sentence tokenizing, Parts of Speech (POS) tagging and extracting named entities. Video covers: word and sentence tokenizers, POS tagging, named entities.
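A minimal sketch of the same NLTK calls; the sample text is a placeholder and the corpora downloads are only needed once.

import nltk

# One-off downloads of the resources used below (skip if already installed)
nltk.download("punkt")
nltk.download("averaged_perceptron_tagger")
nltk.download("maxent_ne_chunker")
nltk.download("words")

text = "Apache Spark was presented in London. The talk covered tokenizing and POS tagging."

sentences = nltk.sent_tokenize(text)       # sentence tokenizing
words = nltk.word_tokenize(sentences[0])   # word tokenizing
tagged = nltk.pos_tag(words)               # Parts of Speech tagging
entities = nltk.ne_chunk(tagged)           # named entity extraction

print(tagged)
print(entities)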
Views: 19678 Melvin L
SparkR Setup
 
12:40
Video covers how you can start using SparkR and how you can run SparkR-based applications from RStudio.
Agenda:
- Environment setup
- Demo: sparkR (shell) and RStudio
Views: 16150 Melvin L
PostgreSQL and Docker - getting started
 
09:15
Tutorial and demo to show you how you can start using Postgres Docker containers.
Video agenda: PostgreSQL with Docker
AGENDA
-------
1: Create a Postgres Docker container
docker run --name demo -e POSTGRES_PASSWORD=password1 -d postgres
2: Connect and run some queries
docker exec -it demo psql -U postgres
CREATE DATABASE demo_db1;
\c demo_db1
CREATE TABLE demo_t(something int);
INSERT INTO demo_t (something) VALUES (1);
3: Automate - run scripts using the Docker CLI and run SQL scripts from your host/dev machine
docker run --name demo -v "$PWD"/:/opt/demo/ -e POSTGRES_PASSWORD=password1 -d postgres
docker exec -it demo psql -U postgres -c "CREATE DATABASE demo_db2"
docker exec -it demo psql -U postgres -f /opt/demo/script_demo1.sql
Views: 9765 Melvin L
Deploying Flask Application on AWS Elastic Beanstalk
 
25:53
Deploy a Python Flask application (e.g. a REST API) on AWS Elastic Beanstalk. Video covers:
- Brief overview of AWS Elastic Beanstalk
- Step 1: Create a new Elastic Beanstalk app (AWS web console)
- Step 2: Upload the Flask application to Elastic Beanstalk (not using the eb CLI)
Related videos: Building a REST API using Python and Flask - https://youtu.be/s_ht4AKnWZg
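As a rough idea of what gets uploaded, a minimal application.py is sketched below; Elastic Beanstalk's Python platform conventionally looks for a WSGI callable named application, though the exact file layout used in the video may differ.

from flask import Flask, jsonify

# Elastic Beanstalk's Python platform expects a module-level WSGI callable
# named "application" by default
application = Flask(__name__)

@application.route("/")
def index():
    return jsonify(status="ok")

if __name__ == "__main__":
    application.run()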
Views: 3460 Melvin L
Hive vs Impala - Comparing Apache Hive vs Apache Impala
 
26:22
Comparison of two popular SQL-on-Hadoop technologies: Apache Hive and Apache Impala. In the video, we review some of the architectural design differences between the two and discuss the pros and cons of Cloudera Impala vs Hive. Finally, we explore scenarios where you can leverage the strengths of Hive and Impala and use them together in hybrid scenarios.
Views: 11098 Melvin L
Decision Tree Classification in R
 
19:21
This video covers how you can use the rpart library in R to build decision trees for classification. The video provides a brief overview of decision trees and then shows a demo of using rpart to create decision tree models, visualise them and predict using the decision tree model.
Views: 65760 Melvin L
Spark: Reading and Writing to Parquet Storage Format
 
11:28
Spark: Reading and Writing to Parquet Format
--------------------------------------------------------------------------
- Using the Spark DataFrame save capability
- Code/approach works on both local HDD and HDFS environments
Related video: Introduction to Apache Spark and Parquet - https://www.youtube.com/watch?v=itm0TINmK9k
Code for demo:
case class Person(name: String, age: Int, sex: String)
val data = Seq(Person("Jack", 25, "M"), Person("Jill", 25, "F"), Person("Jess", 24, "F"))
val df = data.toDF()
import org.apache.spark.sql.SaveMode
df.select("name", "age", "sex").write.mode(SaveMode.Append).format("parquet").save("/tmp/person")
df.select("name", "age", "sex").write.partitionBy("sex").mode(SaveMode.Append).format("parquet").save("/tmp/person_partitioned/")
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val dfPerson = sqlContext.read.parquet("/tmp/person")
Views: 7463 Melvin L
Connecting to Ubuntu using Windows RDP
 
08:57
Short video explains how you can use the Windows Remote Desktop Protocol to connect to Ubuntu 14 by using the xrdp service in Linux. Also covers the steps to install xfce as the desktop environment, which works around the grey-screen issue you get when connecting to the Unity desktop.
Commands to use:
sudo apt-get update
sudo apt-get install ubuntu-desktop
sudo apt-get install xrdp
sudo apt-get update
sudo apt-get install xfce4
(add xfce4-session to the .xsession file)
sudo service xrdp restart
Views: 59580 Melvin L
Using R and Elasticsearch together
 
09:57
Overview and demo of using R with the Elastic (Elasticsearch) API. We cover both calling the APIs directly with JSON serialisation and using the elastic package.
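For the part about calling the APIs directly, here is a minimal sketch using Python's requests library rather than R; it assumes a local Elasticsearch node on the default port 9200.

import requests

# Basic cluster info, equivalent to curling http://localhost:9200/
print(requests.get("http://localhost:9200/").json())

# A simple URI search across all indices; the query string is a placeholder
hits = requests.get("http://localhost:9200/_search", params={"q": "demo"}).json()
print(hits["hits"]["total"])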
Views: 2712 Melvin L
Installing Elasticsearch Step by Step
 
08:35
This is a getting started guide to Elasticsearch which focuses on installing Elasticsearch on a development machine.
Views: 15670 Melvin L
Apache Spark - Introduction to Spark Shell and SparkUI
 
23:59
Introduction to Spark Shell and SparkUI
Agenda:
- Get familiar with the environment/tools
- Spark Shell (Scala): understanding how to run commands, SparkContext, running basic commands, understanding RDD data lineage
- SparkUI: inferring details of Jobs, Stages and Tasks
Views: 17017 Melvin L
Quick introduction to Apache Spark
 
13:45
Quick introduction and getting started video covering Apache Spark. This is a quick introduction to the fundamental concepts and building blocks that make up Apache Spark. Video covers the following:
- Introduction to Apache Spark (i.e. what is Spark?)
- Spark abstractions & concepts (RDD, DAG, SparkContext, Transformations, Actions)
- Spark components (Driver, Worker, Executor, Cluster Manager)
For further details see http://spark.apache.org/
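To illustrate the RDD, transformation and action concepts listed above, here is a minimal PySpark word count; it assumes a local Spark installation with pyspark available and is not the demo from the video.

from pyspark import SparkContext

sc = SparkContext("local[*]", "wordcount")

# Transformations are lazy: they only build up the RDD lineage (DAG)
lines = sc.parallelize(["spark makes big data simple", "spark runs on a cluster"])
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

# An action (collect) triggers the actual execution
print(counts.collect())
sc.stop()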
Views: 141903 Melvin L
Scrapy - Overview and Demo (web crawling and scraping)
 
13:11
If you're getting started with Scrapy or want to understand what Scrapy can do for you, then this video is for you. The Scrapy tutorial video covers the following:
- What is Scrapy
- Why use Scrapy, and alternatives to Scrapy
- Architecture, components & performance
- Quick demo
Setup on Ubuntu:
apt-get update
apt-get install -y python python-pip python-dev libxml2-dev libxslt-dev libffi-dev libssl-dev
pip install lxml && pip install pyopenssl && pip install Scrapy
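A minimal spider to show the shape of a Scrapy project; the spider name, start URL and selectors are placeholders, not necessarily what the video's demo crawls.

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/"]

    def parse(self, response):
        # Yield one item per quote block on the page
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").extract_first(),
                "author": quote.css("small.author::text").extract_first(),
            }

Run it with: scrapy runspider quotes_spider.py -o quotes.json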
Views: 56375 Melvin L
R and OpenNLP for Natural Language Processing (NLP) - Part 2
 
15:22
Part 2 of the OpenNLP and R series, focusing on entity extraction and Named Entity Recognition. Overview and demo of using the Apache OpenNLP library in R to perform basic Natural Language Processing (NLP) tasks such as string tokenizing, word tokenizing and Parts of Speech (POS) tagging. This is a getting-started guide covering demos of OpenNLP coding in R.
Views: 8356 Melvin L
Jupyter Notebook using Docker for Data Science (Demo)
 
07:56
A simple guide to running a Docker container for using Jupyter Notebook for data science work. The notebook supports R, Python and Julia and comes bundled with the most commonly used libraries. The video covers details on how you can add additional packages/libraries for both R and Python.
Command to run the container:
docker run -it --rm --name ds -p 8888:8888 jupyter/datascience-notebook
Views: 5066 Melvin L
R data visualisation using googleVis library (Google Charts API)
 
08:10
Overview of using the Google Charts API within R via the googleVis library. Pros and cons of using the library, plus demos highlighting the best cases for using googleVis.
Views: 4684 Melvin L
Getting Started with Kafka - Concepts & Simple Demo
 
10:32
Overview and Getting Started with Kafka highlighting the key concepts & simple demo to get started
GETTING STARTED WITH KAFKA
Setting up Kafka
----------------
Start the server
----------------
bin/zookeeper-server-start.sh config/zookeeper.properties
bin/kafka-server-start.sh config/server.properties
Create a topic
--------------
bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
See the topics
--------------
bin/kafka-topics.sh --list --zookeeper localhost:2181
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic test
Create messages
---------------
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test
Start a consumer
----------------
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning
Views: 5588 Melvin L
Testing the Power BI and Cortana Integration
 
05:15
Overview and demo of the Power BI and Cortana integration. Showcases how you can set up a Power BI dataset for Cortana integration and then test the integration via the Power BI-Cortana testing URL.
Views: 3804 Melvin L
Spark - Setting up a Spark Dev Environment using SBT and Eclipse
 
26:44
Setting up an Apache Spark development environment using SBT and Eclipse
Spark: Setting up Dev Environment (Spark: SBT + Eclipse)
--------------------------------------------------
STEP 1 (Prereq): Install/setup software
- Install Scala: http://www.scala-lang.org/download/
- Install SBT: http://www.scala-sbt.org/download.html
- Install Eclipse (Scala IDE): http://scala-ide.org/download/sdk.html
- Install Spark
STEP 2: Create the folder structure and add a build.sbt file
mkdir -p src/{main,test}/{java,resources,scala}
mkdir lib project target
touch build.sbt
touch project/plugins.sbt
Create plugins.sbt in the project folder (for sbt-eclipse and sbt-assembly):
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.3")
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")
STEP 3: Run sbt, then sbt eclipse
STEP 4: Import the project into the workspace and add code for Spark
Views: 10134 Melvin L
Random Forest Overview and Demo in R
 
16:31
Random Forest overview and demo in R (for classification). See previous videos.
- What: an ensemble learning method for classification and regression that operates by constructing a multitude of decision trees
- Why use Random Forest: reasonably fast and very easy to use; handles sparse/missing data well; overcomes problems with overfitting
- How: tree bagging (random sampling with replacement), random subsets of the features, voting
- Demo using the randomForest library
Views: 30375 Melvin L
Use AWS Lambda to start and stop AWS EC2 instances
 
13:51
Use AWS Lambda and Amazon CloudWatch to start and stop EC2 instances based on a custom schedule (e.g. working days/working hours only). See how you can reduce AWS EC2 running costs by scheduling automatic shutdown and startup.
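A minimal sketch of such a Lambda function using boto3; the instance IDs, region and event shape are placeholders, and the video's actual implementation may differ.

import boto3

# Placeholder instance IDs and region; in practice these would come from
# environment variables or instance tags
INSTANCE_IDS = ["i-0123456789abcdef0"]
ec2 = boto3.client("ec2", region_name="us-east-1")

def lambda_handler(event, context):
    # In this sketch the CloudWatch schedule rule passes an "action" field
    action = event.get("action", "stop")
    if action == "start":
        ec2.start_instances(InstanceIds=INSTANCE_IDS)
    else:
        ec2.stop_instances(InstanceIds=INSTANCE_IDS)
    return {"action": action, "instances": INSTANCE_IDS}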
Views: 2382 Melvin L
Rattle - Data Mining in R
 
25:47
Overview of using Rattle, a GUI data mining tool in R. The overview covers some of the basic operations that can be performed in Rattle, such as loading data, exploring the data and applying some of the data mining algorithms to the data, all without actually having to type any R code.
Views: 33976 Melvin L
R using Quantmod and Highcharts to visualise stock data
 
05:57
Video covers how you can use R to display stock data using Quantmod and Highcharts. The approach covers how you can use Quantmod to retrieve stock data and visualise it via charts, and how to use the Highcharter R package to bind the data retrieved by Quantmod and display it in interactive Highcharts charts.
For more details on Quantmod see https://youtu.be/xeflRR5RKFw
For more details on Highcharter see https://youtu.be/of8ras0Bl8Q
Views: 1598 Melvin L
R with Highcharts visualisations using Highcharter library
 
08:23
Overview and demo of using Highcharts visualisations directly within R using the Highcharter library.
Resources:
* http://jkunst.com/highcharter
* http://www.highcharts.com/demo
Views: 2513 Melvin L
Conda Environments with Jupyter Notebook Kernels
 
04:06
Quick video/how-to guide shows you how you can integrate Jupyter Notebooks with Anaconda/conda. The integration is enabled by installing the nb_conda package. The video describes the various steps.
Views: 723 Melvin L
Docker quick start and useful commands
 
12:18
***********************************
Docker Quick Start/Useful Commands
***********************************
Docker get images
-----------------
docker images
docker search [imagename]
docker pull [imagename:tag]
List containers
---------------
docker ps
docker ps -a
Run/exec/inspect
----------------
docker run -it [imagename:tag] /bin/bash
docker run --name [name] -it [imagename:tag] /bin/bash
docker inspect [containername]
Ctrl-p Ctrl-q - detach and leave the container running
docker exec -it [containername] /bin/bash
Copy files
----------
docker cp foo.txt [containername]:/foo.txt
docker cp [containername]:/foo.txt foo.txt
Remove containers/images
------------------------
docker rm [container]
docker rmi [image]
Stop and remove all containers
------------------------------
docker stop $(docker ps -a -q)
docker rm $(docker ps -a -q)
Views: 2533 Melvin L