Dividing data into training and testing dataset in R

In machine learning one often needs to divide a data set into two parts, namely training and testing datasets. While you can’t directly use the “sample” command in R, there is a simple workaround for this. Essentially, use the “sample” command … Continue reading
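
The excerpt is cut off before the workaround is spelled out, so the snippet below is only a minimal sketch of one common way to do this, not necessarily the approach the full post takes. It draws random row indices with “sample” and uses them to index the data frame. The built-in iris data, the 70/30 ratio, and the variable names are all assumptions made for illustration.

```r
# Minimal sketch: split a data frame into training and testing sets.
# Assumptions (not from the original post): iris data, 70/30 split.
set.seed(42)                 # fix the seed so the split is reproducible
data <- iris                 # built-in example data set
train_fraction <- 0.7        # assumed share of rows used for training

n <- nrow(data)
# Draw row indices without replacement; sample(n, k) samples from 1:n
train_idx <- sample(n, size = round(train_fraction * n))

train <- data[train_idx, ]   # rows selected for training
test  <- data[-train_idx, ]  # all remaining rows go to testing
```

Sampling indices rather than rows keeps the two sets disjoint, since the negative index drops exactly the rows that went into training.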

Visualizing Confusion Matrix using heatmap in R

In this post I show how to visualize a confusion matrix using a heatmap. The heatmap is drawn in R using ggplot2. Continue reading
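
As a rough illustration of the idea, here is a minimal sketch of a confusion matrix heatmap with ggplot2. The class labels and predictions are made up for the example, and the colours and names are my own assumptions rather than what the full post uses.

```r
library(ggplot2)

# Hypothetical labels, invented purely for illustration
actual    <- factor(c("a", "a", "b", "b", "b", "c", "c", "a", "c", "b"))
predicted <- factor(c("a", "b", "b", "b", "c", "c", "c", "a", "a", "b"))

# Cross-tabulate and convert to long format:
# one row per (Actual, Predicted) pair with its count in Freq
cm <- as.data.frame(table(Actual = actual, Predicted = predicted))

# One tile per cell, shaded by count, with the count printed on top
p <- ggplot(cm, aes(x = Predicted, y = Actual, fill = Freq)) +
  geom_tile(color = "white") +
  geom_text(aes(label = Freq)) +
  scale_fill_gradient(low = "white", high = "steelblue") +
  labs(title = "Confusion matrix", fill = "Count")
print(p)
```

The long format matters here because ggplot2 expects one row per tile, which is what as.data.frame gives when applied to a table.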

Easy-Hadoop, Rapid Hadoop Programming

In today’s digital world, the biggest challenge is how to deal with large volumes of data. Apache Hadoop provides one solution to this problem. It is open source software for distributed, scalable computing. Hadoop is usually deployed in … Continue reading

Notes: Linear Regression

Note: This post is more a set of my running notes than an authoritative guide on the subject. I find blogging a good way to develop a thorough understanding, and it keeps the notes easy to access, hence I am putting them here. You … Continue reading