bioinformatics - How to set a for-loop in R


I am a biologist and have little knowledge of programming. I have a series of files (FASTA format) to which I need to apply an R package.

The contents of each file are as follows:

file_1.fasta

>>ttbk2_hsap ,(ck1/ttbk) msgggeqldilsvgilvkerwkvlrkiggggfgeiydaldmltrenvalkvesaqqpkqvlkmevavlkklqgkdhvcrfigcgrndrfnyvvmqlqgrnladlrrsqsrgtft 

file_2.fasta

>>ttbk2_hsap ,(ck1/ttbk) msgggeqldilsvgilvkerwkvlrkiggggfgeiydaldmltrenvalkvesaqqpkqvlkmevavlkklqgkdhvcrfigcgrndrfnyvvmqlqgrnladlrrsqsrgtft 

The package (protr in R) works like this:

x = readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
extractAAC(x)

Is there a possibility to set a for-loop around the lines above so that it reads multiple files and gives the output in one file?

If possible, please give me an idea or an example of how to set up a for-loop in R.

It is possible to do this. The strategy I would use is to write a function that encapsulates what you want to do with each FASTA file:

library(protr)

# fasta: a string giving the path of the FASTA file to be read.
read_and_extract <- function(fasta) {
    seq <- readFASTA(fasta)[[1]]
    return(extractAAC(seq))
}

This wrapper function allows you to read a FASTA file and extract the amino acid composition in one fell swoop. In order to loop over the files, you need to be in the same directory as the FASTA files.

setwd("path/to/files") 

Using the dir command, you can get all of the names of the files that exist in the directory.

fasta_files <- dir(pattern = "[.]fasta$") 

Note that the pattern argument tells the computer to read only files whose names end in ".fasta".
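The pattern is a regular expression rather than a shell wildcard, so it can be checked on its own before touching any files. The following is a small sketch using base R's grepl; the file names in candidates are made up for illustration and do not come from the question.

```r
# Hypothetical file names, used only to show what the pattern accepts.
candidates <- c("file_1.fasta", "file_2.fasta", "notes.txt",
                "file_3.fasta.bak", "myfasta")

# "[.]" matches a literal dot and "$" anchors the match at the end of the name.
matches <- grepl("[.]fasta$", candidates)

# Keep only the names that end in ".fasta".
selected <- candidates[matches]
print(selected)
```

Only names that end in a literal ".fasta" are kept, so backup copies such as "file_3.fasta.bak" are skipped.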

Now you can perform the loop using the vapply function (see the note below for details):

aa_comp <- vapply(fasta_files, read_and_extract, rep(pi, 20)) 

This will produce a matrix with the columns being each FASTA file and the rows being each amino acid. You can save this as a simple CSV file:

write.csv(aa_comp, file = "amino_acid_composition.csv") 
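Since the question asked for a for-loop specifically, the same result can also be sketched with an explicit loop. This is only an illustration under some assumptions: to keep it runnable without protr installed, it uses a hypothetical stand-in called toy_composition in place of read_and_extract, and it writes two tiny example files to a temporary directory instead of reading real data. With protr available, read_and_extract could be dropped into the loop body instead.

```r
# The 20 standard amino acids (one-letter codes).
amino_acids <- c("A", "R", "N", "D", "C", "E", "Q", "G", "H", "I",
                 "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")

# Hypothetical stand-in for read_and_extract(): returns the fraction of
# each standard amino acid in the sequence stored in one file.
toy_composition <- function(path) {
  lines <- readLines(path)
  # Drop header lines (those starting with ">") and join the rest.
  sequence <- paste(lines[!grepl("^>", lines)], collapse = "")
  residues <- strsplit(toupper(sequence), "")[[1]]
  counts <- table(factor(residues, levels = amino_acids))
  setNames(as.numeric(counts) / length(residues), amino_acids)
}

# Two small example files in a temporary directory (made-up sequences).
example_dir <- tempfile("fasta_demo")
dir.create(example_dir)
writeLines(c(">seq1", "MSGGGEQLD"), file.path(example_dir, "file_1.fasta"))
writeLines(c(">seq2", "AAKK"), file.path(example_dir, "file_2.fasta"))

# full.names = TRUE returns complete paths, so setwd() is not needed here.
fasta_files <- dir(example_dir, pattern = "[.]fasta$", full.names = TRUE)

# Pre-allocate the result: one row per amino acid, one column per file.
aa_comp <- matrix(NA_real_,
                  nrow = length(amino_acids),
                  ncol = length(fasta_files),
                  dimnames = list(amino_acids, basename(fasta_files)))

# The explicit loop: fill one column per file.
for (i in seq_along(fasta_files)) {
  aa_comp[, i] <- toy_composition(fasta_files[i])
}

# All results end up in a single output file.
write.csv(aa_comp, file = file.path(example_dir, "amino_acid_composition.csv"))
```

Pre-allocating the matrix and filling it column by column avoids growing an object inside the loop, which is the usual reason for loops in R feel slow.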

Details of vapply

The vapply function is a fancy (and sometimes faster) way of doing for loops in R. It looks a bit confusing at first, but it works well if you know what your output will be. Let's look at the arguments:

> vapply(argument1, argument2, argument3)

  • argument1: the vector to be looped over (fasta_files)
  • argument2: the function to apply to each element of the vector (read_and_extract)
  • argument3: a template of the expected output (rep(pi, 20))

The last argument is the hardest to grasp initially, but it's a representative vector of our expected output. In this case, the documentation for extractAAC says that it returns a numeric vector of length 20. The command rep(pi, 20) tells R to replicate the number pi 20 times, giving a numeric vector of length 20. Only the type and length of this template matter, not its values.

There are more generalized versions of vapply that can return output of any type. See help("vapply") for details on those.
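To make the role of the third argument concrete, here is a minimal base R sketch. The function squares_and_cubes is a made-up toy, not part of protr; it stands in for any function that always returns a numeric vector of known length.

```r
# Toy function: always returns a numeric vector of length 2.
squares_and_cubes <- function(x) c(square = x^2, cube = x^3)

# numeric(2) is a template meaning "each call returns 2 numbers".
# Its values are ignored, so rep(pi, 2) would work equally well.
result <- vapply(1:3, squares_and_cubes, numeric(2))
# result is a 2 x 3 matrix: one column per input element.

# If the template does not match what the function returns, vapply stops
# with an error instead of silently returning something unexpected.
mismatch <- tryCatch(
  vapply(1:3, squares_and_cubes, numeric(5)),
  error = function(e) "error"
)

# lapply is a more general relative: it returns a list, so the function
# may return output of any type or length.
as_list <- lapply(1:3, squares_and_cubes)
```

The strict check is the main reason to prefer vapply when the shape of the output is known in advance: a file that produces an unexpected result fails loudly rather than corrupting the final table.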

