Plot rows of numbers that fall within a defined proximity to other rows in R -


i have dataframe of numbers (genetics data) on different chromosomes can considered factors separate numbers on. looks (with adjacent columns containing sample info each position):

awk '{print $2 "\t"   $3}' log_values | head  chr   start    sample1     sample2 1   102447376  0.46957632  0.38415043 1   102447536  0.49194950  0.30094824 1   102447366  0.49874880 -0.17675325 2   102447366 -0.01910729  0.20264680 1   108332063 -0.03295081  0.07738970 1   109472445  0.02216355 -0.02495788 

what want make series of plots taking values other columns in file. instead of plotting 1 each row (which represent results in different region and/or different sample), want draw plots covering ranges if there values in start column close enough each other. start, plot made if there 3 values in start column within 1000 of each other. is, 1000 b c inclusive b <= 1000 , b c <= 1000 c not have <= 1000. in code below, 1000 "cnv_size". "flanking_size" variable zoom plot out bit can give context.

taking sample values, rows 1 2 , 3 highlighted 1 plot sample1. these sample numbers log2ratios want plot significant ones. define above 0.4 or below -0.6. means same 3 rows not yield plot sample 2.

the fourth row not included chr column number/factor different. that's separate plot each column showing values in rows meet condition. can have more plot per sample each set of regions meets criterion plotted in samples. if doesn't make sense, perhaps ineffective attempt below explain i'm waffling about.

pdf("all_cnvs_closeup.pdf")  cnv_size <- 1000 # bp  flanking_size <- 1000 # bp  #for(chr in 1:24){ for(chr in 1:1){  #for(array in 1:24) { for(array in 1:4) {   dat <- subset(file, file$chr == chr )   dat <- subset(dat, dat[,array+6] > 0.4 | dat[,array+6] < -0.6)   if(length(dat$start) > 1 ) {   dat <- dat[with(dat, order(start)), ]  x=dat$start[2:length(dat$start)]-dat$start[1:(length(dat$start)-1)] cnv <- 1 while(cnv <= length(x)) {  for(i in cnv:length(x) ) {     if(x[i] >= cnv_size) {         plot_title <- paste(sample_info$sample.id[array], files[array],   sep = "   ")          plot(dat$start, -dat[,array+6], main = plot_title , ylim = c(-2,2),      xlim = c(dat$start[cnv] - flanking_size , dat$start[i ] + flanking_size) , xlab = chr, ylab = "log2 ratio")          abline(h = 0.4, col="blue")          abline(h = 0, col="red")          abline(h = -0.6, col="blue")      break     } # if(x[i] >= cnv_size) {           #if(x[i] < cnv_size) <- + 1 }   # for(i in cnv:length(x) ) {  cnv <-   } # while(x[cnv] <= length(x)) {    } # if(length(dat$start) > 1 ) {    } # for(array in 1:24) { } # for(chr in 1:24){     dev.off() 

you write loop accumulates indices given criteria, plot each window:

# assuming dataframe called snps. getchrwindows <- function(snps, windowsize) {   curwindow <- 1   windows <- list()   for(i in 2:nrow(snps)) {     # if pair on same chromosome , within window, add     if(snps$chr[i-1] == snps$chr[i] &&         (snps$start[i] - snps$start[i-1]) <= windowsize) {       trycatch({         windows[[curwindow]]       }, error = function(e) {         windows[[curwindow]] <- c(i-1)           }, {         windows[[curwindow]] <- c(windows[[curwindow]], i)       })     } else {       # if there existing window, create new one.       trycatch({         windows[[curwindow]]       }, error = function(e) {         curwindow <- curwindow + 1       })     }   }   return(windows) } 

now can list of windows in data.frame, , plot each one:

windows <- getchrwindows(snps, 1000) (i in seq_along(windows)) {   # plot snps[windows[[i]],] using plotting code. } 

Comments

Popular posts from this blog

java.util.scanner - How to read and add only numbers to array from a text file -

rewrite - Trouble with Wordpress multiple custom querystrings -