r - Select the most dissimilar individual using cluster analysis
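I want to cluster my data into 5 clusters, and then I need to select the 50 individuals that are most dissimilar in the data, in proportion to cluster size. That means if cluster 1 contains 100, cluster 2 contains 200, cluster 3 contains 400, cluster 4 contains 200, and cluster 5 contains 100, I have to select 5 from the first cluster + 10 from the second + 20 from the third + 10 from the fourth + 5 from the fifth.
Data example: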
mydata<-matrix(nrow=100,ncol=10,rnorm(1000, mean = 0, sd = 1))
What I did until now was clustering the data, ranking the individuals within each cluster, exporting to Excel, and going on from there … This has become a problem since my data has become big.
I would appreciate any help or suggestion on how to apply the previous in R.
I'm not sure if this is what you are searching for, but maybe it helps:
mydata <- matrix(nrow = 100, ncol = 10, rnorm(1000, mean = 0, sd = 1))
rownames(mydata) <- paste0("id", 1:100)  # ids for identification

# cluster the objects and calculate the dissimilarity matrix
cl <- cutree(hclust(sim <- dist(mydata, diag = TRUE, upper = TRUE)), 5)

# combine the results, taking the row sums as aggregate dissimilarity
res <- data.frame(id = rownames(mydata), cluster = cl, dis_sim = rowSums(as.matrix(sim)))

# order, lowest overall dissimilarity first
res <- res[order(res$dis_sim), ]

# split the object by cluster
reslist <- split(res, f = res$cluster)

## takes the 3 items with the highest overall dissimilarity in each cluster
lapply(reslist, tail, n = 3)

## returns the ids with the highest overall dissimilarity, top 20% of each cluster
lapply(reslist, function(x, p) tail(x, round(nrow(x) * p)), p = 0.2)
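If a fixed total (50 in the question) is needed rather than a fixed percentage per cluster, the same idea can be wrapped in a small function. This is only a sketch: select_dissimilar and its arguments k and n_total are hypothetical names made up for illustration, "dissimilarity" is again the sum of distances to all other individuals, and the per-cluster counts are rounded, so the total may be off by one or two for awkward cluster sizes.

```r
# Hypothetical helper (not from any package): pick about `n_total` individuals,
# allocated to the clusters in proportion to cluster size, taking the rows
# with the highest overall dissimilarity inside each cluster.
select_dissimilar <- function(x, k = 5, n_total = 50) {
  ids <- if (is.null(rownames(x))) seq_len(nrow(x)) else rownames(x)
  d   <- dist(x)
  cl  <- cutree(hclust(d), k)
  res <- data.frame(id = ids, cluster = cl, dis_sim = rowSums(as.matrix(d)))

  # proportional allocation, e.g. sizes 100/200/400/200/100 -> 5/10/20/10/5
  sizes <- table(cl)
  n_per <- setNames(round(n_total * as.vector(sizes) / length(cl)), names(sizes))

  picked <- lapply(split(res, res$cluster), function(g) {
    g <- g[order(g$dis_sim, decreasing = TRUE), ]   # most dissimilar first
    head(g, n_per[[as.character(g$cluster[1])]])
  })
  do.call(rbind, picked)
}

# usage on simulated data with 1000 individuals
set.seed(1)
mydata <- matrix(rnorm(1000 * 10), nrow = 1000, ncol = 10)
rownames(mydata) <- paste0("id", 1:1000)
sel <- select_dissimilar(mydata, k = 5, n_total = 50)
table(sel$cluster)   # number of selected individuals per cluster
```

Everything stays in R, so the export to Excel is not needed; sel$id gives the selected ids.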