bioinformatics - How to set up a for-loop in R
I am a biologist and have little knowledge of programming. I have a series of files (FASTA format) to which I need to apply an R package.
The contents of each file are as follows:
file_1.fasta
>>ttbk2_hsap ,(ck1/ttbk) msgggeqldilsvgilvkerwkvlrkiggggfgeiydaldmltrenvalkvesaqqpkqvlkmevavlkklqgkdhvcrfigcgrndrfnyvvmqlqgrnladlrrsqsrgtft
file_2.fasta
>>ttbk2_hsap ,(ck1/ttbk) msgggeqldilsvgilvkerwkvlrkiggggfgeiydaldmltrenvalkvesaqqpkqvlkmevavlkklqgkdhvcrfigcgrndrfnyvvmqlqgrnladlrrsqsrgtft
The package (protr in R) works like this:
x = readFASTA(system.file('protseq/P00750.fasta', package = 'protr'))[[1]]
extractAAC(x)
Is it possible to set up a for-loop around the lines above so that it reads multiple files and gives the output in one file?
If so, please give me an idea or an example of how to set up a for-loop in R.
It is possible to do this. One strategy is to write a function that encapsulates what you want to do with each FASTA file:
# fasta is a string that gives the path of the FASTA file to be read.
read_and_extract <- function(fasta){
  seq <- readFASTA(fasta)[[1]]   # first sequence in the file
  return(extractAAC(seq))        # its amino acid composition
}
This wrapper function allows you to read a FASTA file and extract its amino acid composition in one fell swoop. In order to loop over the files, you need to be in the same directory as the FASTA files.
setwd("path/to/files")
Using the dir command, you can get the names of all the files that exist in that directory.
fasta_files <- dir(pattern = "[.]fasta$")
Note that the pattern argument tells R to list only the files whose names end in ".fasta".
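To see what the pattern does and does not match, here is a small sketch that applies the same regular expression to a few made-up file names with grepl. The names in candidates are hypothetical and only serve as an illustration.

```r
# Hypothetical file names, used only to illustrate the pattern.
candidates <- c("file_1.fasta", "file_2.fasta", "notes.txt",
                "old.fasta.bak", "myfasta")

# "[.]" matches a literal dot and "$" anchors the match at the end of the name.
matches <- grepl("[.]fasta$", candidates)

# Keep only the names that dir(pattern = "[.]fasta$") would also return.
candidates[matches]
```

Only the first two names pass: "old.fasta.bak" does not end in ".fasta", and "myfasta" has no dot before "fasta".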
Now you can perform the loop using the vapply function (see the note below for details):
aa_comp <- vapply(fasta_files, read_and_extract, rep(pi, 20))
This will produce a matrix with one column per FASTA file and one row per amino acid. You can save it as a simple CSV file:
write.csv(aa_comp, file = "amino_acid_composition.csv")
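Since the question asked for a for-loop specifically, here is a rough sketch of the same workflow written as an explicit loop. So that it runs without protr installed, it uses a hypothetical base-R stand-in, read_and_count, instead of read_and_extract, and it writes two made-up demo files to a temporary directory. With protr loaded you would call read_and_extract inside the loop instead.

```r
# The 20 standard amino acids, one letter each.
aa_letters <- strsplit("ARNDCEQGHILKMFPSTWYV", "")[[1]]

# Hypothetical stand-in for read_and_extract(): reads a single-record FASTA
# file with base R and returns the fraction of each standard amino acid.
read_and_count <- function(path) {
  lines <- readLines(path)
  sequence <- paste(lines[!startsWith(lines, ">")], collapse = "")
  residues <- strsplit(toupper(sequence), "")[[1]]
  counts <- table(factor(residues, levels = aa_letters))
  setNames(as.numeric(counts) / length(residues), aa_letters)
}

# Two small demo files with made-up sequences, in a temporary directory.
demo_dir <- tempfile("fasta_demo")
dir.create(demo_dir)
writeLines(c(">demo_1", "MSGGGEQLDI"), file.path(demo_dir, "file_1.fasta"))
writeLines(c(">demo_2", "AAAAKKKKLL"), file.path(demo_dir, "file_2.fasta"))

fasta_files <- dir(demo_dir, pattern = "[.]fasta$", full.names = TRUE)

# Preallocate a 20 x n matrix, then fill one column per file.
aa_comp <- matrix(NA_real_,
                  nrow = length(aa_letters), ncol = length(fasta_files),
                  dimnames = list(aa_letters, basename(fasta_files)))
for (i in seq_along(fasta_files)) {
  aa_comp[, i] <- read_and_count(fasta_files[i])
}

# One output file for all inputs.
out_file <- file.path(demo_dir, "amino_acid_composition.csv")
write.csv(aa_comp, file = out_file)
```

Preallocating the matrix before the loop avoids growing an object on every iteration, which is the usual reason for-loops get slow in R.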
Details of vapply
The vapply function is a fancy (and sometimes faster) way to do for loops in R. It looks a bit confusing at first, but it works well if you know what your output will be. Let's look at the arguments:
> vapply(argument1, argument2, argument3)
- argument1: the vector to be looped over (fasta_files)
- argument2: the function to apply to each element of the vector (read_and_extract)
- argument3: the expected output (rep(pi, 20))
The last argument is the hardest to grasp initially, but it's just a representative vector of our expected output. In this case, the documentation for extractAAC says that it returns a numeric vector of length 20. The command rep(pi, 20) tells R to replicate the number pi 20 times, giving a numeric vector of length 20. Only the type and length of this template matter, not its values.
There are more generalized versions of vapply (such as sapply and lapply) that can return output of any type. See help("vapply") for details on those.
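As a small illustration of how the third argument acts as a template, here is a toy sketch that does not involve protr. The function three_stats and the list inputs are made up for this example.

```r
# A toy function that always returns a numeric vector of length 3.
three_stats <- function(x) c(min(x), mean(x), max(x))

# Hypothetical inputs: a named list with two numeric vectors.
inputs <- list(a = c(1, 2, 3), b = c(10, 20, 60))

# numeric(3) is a length-3 numeric template, just as rep(pi, 3) would be.
ok <- vapply(inputs, three_stats, numeric(3))
# ok is a matrix with 3 rows (one per statistic) and 2 columns (one per input).

# A template of the wrong length makes vapply stop with an error
# rather than quietly returning something unexpected.
bad <- tryCatch(vapply(inputs, three_stats, numeric(2)),
                error = function(e) "error")
```

This strictness is the main practical difference from sapply: if one file ever produced a result of a different length, vapply would fail right away instead of changing the shape of the output.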