machine learning - R Random Forest Unsupervised
I'm trying to understand random forest by implementing it in unsupervised mode to detect outliers.
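In unsupervised mode there are no labels to learn. Following Breiman and Cutler's description of the method, the forest instead invents a second, synthetic class by sampling each column independently from its own marginal distribution, then learns to tell the real rows from the synthetic ones. Below is a minimal base-R sketch of that construction step only. The toy data frame is a hypothetical stand-in modelled on the question's CSV (the column names `mins.after.midnight`, `amount`, `category` are assumed), and `make_synthetic()` is an illustrative helper, not a package function.

```r
# Hypothetical toy data standing in for the question's CSV (columns assumed)
set.seed(1)
n <- 200
cats <- c("grocery", "restaurant", "clothing store")
real <- data.frame(
  mins.after.midnight = sample(0:1439, n, replace = TRUE),
  amount = round(rlnorm(n, meanlog = 4, sdlog = 0.5), 2),
  category = factor(sample(cats, n, replace = TRUE), levels = cats)
)

# Hypothetical helper: resample every column independently from its own values,
# which keeps each marginal distribution but breaks the links between columns.
make_synthetic <- function(df) {
  as.data.frame(lapply(df, function(col) col[sample.int(length(col), replace = TRUE)]))
}

synthetic <- make_synthetic(real)

# Stack real and synthetic rows and label them; a classifier trained on
# (combined, label) is the "real vs. synthetic" problem described above.
combined <- rbind(real, synthetic)
label <- factor(rep(c("real", "synthetic"), each = n))
```

Because each column is shuffled on its own, the synthetic rows match the real ones column by column but not jointly. A forest that separates the two classes therefore has to pick up the joint structure of the real data, and that structure is what its proximities reflect.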
Here is the dataset I'm using:
Dataset: https://gist.github.com/k2xl/5cd9a048ae153275f9c7
If you look closely, there is one row with these values:
xktveqax 570 12980.5 clothing store
The amount is far larger than any of the other values, so I expect this row to be detected in the random forest output.
    library(randomForest)
    library(ggplot2)

    data_set <- read.csv("~/path/anomaly-sample.csv", header = TRUE, as.is = TRUE)
    data_set$category = factor(data_set$category)
    train_all = data_set
    test_all = train_all
    #test_all = data_set[0:200,]

    # No y argument, so randomForest runs in unsupervised mode;
    # [,-1] drops the first (ID-like) column such as "xktveqax"
    rf <- randomForest(train_all[,-1], importance = TRUE, mtry = 3, norm.votes = FALSE)
    print(rf)
    predictions <- rf$votes
    # Point size is driven by the second column of the vote matrix
    qplot(test_all$mins.after.midnight, test_all$amount, size = predictions[,2])
    results <- cbind(test_all, predictions)
    results <- results[sort.list(results[,5]), ]

What I'm trying to do is graph the outliers as big circles to demonstrate their unusualness. Am I doing this right?
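One common way to get a per-row outlier score from an unsupervised forest is the proximity-based outlyingness measure that the randomForest package exposes through `outlier()`. For each row it takes n divided by the row's sum of squared proximities, then centres the result by the median and scales it by the MAD. The sketch below assumes the randomForest package is installed. Because the gist CSV is not reproduced here, it uses a hypothetical simulated data frame with one planted extreme amount (12980.5). It is not guaranteed to rank the planted row first for every seed.

```r
library(randomForest)

set.seed(42)
n <- 200
cats <- c("grocery", "restaurant", "clothing store")
# Hypothetical stand-in data with one planted extreme amount in the last row
x <- data.frame(
  mins.after.midnight = sample(0:1439, n, replace = TRUE),
  amount = round(rlnorm(n, meanlog = 4, sdlog = 0.5), 2),
  category = factor(sample(cats, n, replace = TRUE), levels = cats)
)
x$amount[n] <- 12980.5
x$category[n] <- "clothing store"

# No y: unsupervised mode. proximity = TRUE keeps the n x n proximity matrix.
rf <- randomForest(x, ntree = 500, proximity = TRUE)

# Outlyingness: n / sum of squared proximities per row, centred by the median
# and scaled by the MAD; with cls left NULL all rows are treated as one class.
scores <- outlier(rf$proximity)

# Larger score = more outlying, so map it to larger circles (cex from 1 to 4)
size <- 1 + 3 * (scores - min(scores)) / diff(range(scores))
plot(x$mins.after.midnight, x$amount, cex = size, log = "y",
     xlab = "mins.after.midnight", ylab = "amount")

# Rows ranked from most to least outlying
head(x[order(scores, decreasing = TRUE), ], 5)
```

In this sketch the score comes from the proximity matrix rather than from `rf$votes`, and sorting by `scores` takes the place of the `sort.list()` step in the question.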