r - Select the most dissimilar individual using cluster analysis


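I want to cluster my data into 5 clusters, and then I need to select the 50 individuals with the most dissimilar relationship from all the data. That means if cluster 1 contains 100, cluster 2 contains 200, cluster 3 contains 400, cluster 4 contains 200, and cluster 5 contains 100, I have to select 5 from the first cluster + 10 from the second cluster + 20 from the third + 10 from the fourth + 5 from the fifth.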
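The allocation described above is proportional: each cluster contributes the same share of the 50 selected individuals as it holds of all individuals. A minimal sketch of that arithmetic, where `cluster_sizes`, `n_select` and `quota` are names made up for illustration:

```r
# Cluster sizes from the question (hypothetical variable names).
cluster_sizes <- c(c1 = 100, c2 = 200, c3 = 400, c4 = 200, c5 = 100)
n_select <- 50

# Each cluster contributes in proportion to its share of all individuals.
quota <- n_select * cluster_sizes / sum(cluster_sizes)
quota
```

With these sizes the quotas come out as 5, 10, 20, 10 and 5. With other sizes the quotas are generally not whole numbers and need rounding.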

Data example:

     mydata<-matrix(nrow=100,ncol=10,rnorm(1000, mean = 0, sd = 1)) 

What I did until now was cluster the data, rank the individuals within each cluster, export to Excel, and go on from there … This has become a problem since my data has become big.

I would appreciate any help or suggestion on how to apply the previous steps in R.

I'm not sure if this is what you are searching for, but maybe it helps:

     mydata <- matrix(nrow = 100, ncol = 10, rnorm(1000, mean = 0, sd = 1))
     rownames(mydata) <- paste0("id", 1:100) # id for identification

     # cluster objects and calculate dissimilarity matrix
     cl <- cutree(hclust(
       sim <- dist(mydata, diag = TRUE, upper = TRUE)), 5)

     # combine results, take the row sum to aggregate dissimilarity
     res <- data.frame(id = rownames(mydata),
                       cluster = cl, dis_sim = rowSums(as.matrix(sim)))
     # order, lowest overall dissimilarity first
     res <- res[order(res$dis_sim), ]

     # split object by cluster
     reslist <- split(res, f = res$cluster)

     ## takes the 3 items with the highest overall dissimilarity per cluster
     lapply(reslist, tail, n = 3)

     ## returns the ids with the highest overall dissimilarity, top 20% per cluster
     lapply(reslist, function(x, p) tail(x, round(nrow(x) * p)), p = 0.2)
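To get a fixed total (50 in the question) rather than a fixed percentage, the same idea can be combined with a per-cluster quota. This is only a sketch under some assumptions: `n_select`, `sizes`, `quota` and `picked` are names introduced here, `set.seed()` is there only for reproducibility, and "dissimilarity" is still the row sum of the distance matrix as above.

```r
set.seed(1)  # only so the sketch is reproducible
mydata <- matrix(rnorm(1000), nrow = 100, ncol = 10)
rownames(mydata) <- paste0("id", 1:100)

sim <- dist(mydata)
cl  <- cutree(hclust(sim), k = 5)
res <- data.frame(id = rownames(mydata), cluster = cl,
                  dis_sim = rowSums(as.matrix(sim)))

n_select <- 50
sizes <- table(res$cluster)
# Proportional quota per cluster; rounding can make the total
# differ slightly from n_select.
quota <- setNames(round(n_select * as.vector(sizes) / nrow(res)),
                  names(sizes))

# Within each cluster, keep the rows with the highest dis_sim,
# as many as that cluster's quota allows.
picked <- do.call(rbind, lapply(split(res, res$cluster), function(x) {
  k <- as.character(x$cluster[1])
  head(x[order(x$dis_sim, decreasing = TRUE), ], quota[[k]])
}))
```

`picked$id` then holds the selected ids. Because each quota is rounded separately, `nrow(picked)` equals `sum(quota)`, which may be one or two off from `n_select`. If the total has to be exact, the quotas need adjusting by hand.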
