
Dividing data into training and testing datasets in R


In machine learning, one often needs to divide the data into two different sets: a training dataset and a testing dataset. You can't use the sample() command directly on a data frame for this, because calling sample() on a data.frame samples its columns, not its rows. There is a simple workaround, though: use sample() to randomly select a set of row indices, and then use those indices to divide the data frame into training and testing datasets. Below is some sample code for doing this. In the code below, I use 20% of the data for testing and the remaining 80% for training.

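# let's say we have a variable `data` of type data.frame
class(data)
# [1] "data.frame"

# randomly pick 20% of the row indices; floor() keeps the size a whole number
indexes <- sample(seq_len(nrow(data)), size = floor(0.2 * nrow(data)))
test <- data[indexes, ]    # the comma makes this select rows, not columns
train <- data[-indexes, ]  # every row that was not picked for testing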
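
For reference, here is a minimal self-contained sketch of the same index-based 80/20 split, wrapped in a reusable function and run on R's built-in iris data frame (150 rows) as a stand-in for `data`. The function name split_train_test() and its test_frac and seed arguments are my own hypothetical choices, not part of base R.

```r
# Hypothetical helper (not a base R function): split a data frame's rows
# into random training and testing sets using sampled row indices.
split_train_test <- function(df, test_frac = 0.2, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)  # optional: reproducible split (resets the global RNG state)
  n <- nrow(df)
  n_test <- floor(test_frac * n)      # whole number of test rows
  test_idx <- sample(seq_len(n), size = n_test)  # row indices drawn without replacement
  is_test <- seq_len(n) %in% test_idx # logical mask: TRUE for rows chosen for testing
  list(
    train = df[!is_test, , drop = FALSE],  # all rows not drawn
    test  = df[is_test, , drop = FALSE]    # the randomly drawn rows
  )
}

parts <- split_train_test(iris, test_frac = 0.2, seed = 42)
nrow(parts$train)  # 120
nrow(parts$test)   # 30
```

Using a logical mask instead of negative indices sidesteps one R gotcha: if the sampled index vector is empty (for example, a very small data frame), data[-indexes, ] selects no rows at all rather than all of them, whereas !is_test still keeps every row for training.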


This entry was posted in Data Mining, Programming, Uncategorized.
