Understanding gcc optimization -O3 on a multicore system
I'm comparing serial versus parallel implementations of some code on a quad-core processor. One of the things I'd like to understand and measure is how the serial code performs when running on a single core.
When I compile the serial code, I use gcc's -O3 option, and at first I noticed that the serial code wasn't doing too shabbily. However, one thing I noticed is that when a compute-intensive process is running on one of the other cores, the serial version's performance drops.
Here are some numbers:
Total time elapsed: 1s, 233ms <- serial code running
Total time elapsed: 1s, 238ms <- serial code running
Total time elapsed: 2s, 128ms <- serial code run with other code running on another core
Total time elapsed: 2s, 220ms <- serial code run with other code running on another core
I'm guessing there may be background processes running on one of the four cores. Still, as best I can gather, running two processes on a quad-core processor shouldn't saturate all four cores.
What I'm wondering is whether there is any reason to believe that some step in the -O3 process allows the code to take advantage of the quad-core setup, or, perhaps more precisely, why the supposed "serial version" performs better when the other cores are available. I tried to understand the gcc documentation and gathered that there are some references to threading. I don't fully understand them, and I was wondering if someone could help me understand precisely how -O3 might or might not take advantage of more than one core.
For what it's worth, I'm using an Intel(R) Core(TM) i7-3820 CPU @ 3.60GHz and running Linux Mint 13.
Thanks.
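-O3 does nothing to take advantage of more than one core.
What you are seeing are the effects of shared resources on the processor: memory bandwidth and cache. Code running on another core competes with your serial program for those resources, so the serial program slows down even though it still executes on a single core.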
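One way to check this on your own machine is to time a loop that does almost nothing except stream through memory, first alone and then while a second copy runs on another core. The following is only a minimal sketch, not your actual code: the names `stream_sum`, `now_seconds`, and `time_sweeps` are made up for illustration, and it assumes a POSIX system where `clock_gettime` is available.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime under strict -std=c11 */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical kernel: sum n doubles. For a buffer much larger than the
 * last-level cache, the loop is limited by memory bandwidth, not by the ALU. */
double stream_sum(const double *buf, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += buf[i];
    return total;
}

/* Wall-clock time in seconds from the monotonic clock. */
double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Allocate n doubles, fill them with 1.0, and time `passes` sweeps over them.
 * Returns elapsed seconds, or -1.0 if the allocation fails. The sum is stored
 * in *checksum so the compiler cannot discard the loop as dead code. */
double time_sweeps(size_t n, int passes, double *checksum)
{
    assert(checksum != NULL);

    double *buf = malloc(n * sizeof *buf);
    if (buf == NULL)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        buf[i] = 1.0;

    double sum = 0.0;
    double start = now_seconds();
    for (int p = 0; p < passes; p++)
        sum += stream_sum(buf, n);
    double elapsed = now_seconds() - start;

    free(buf);
    *checksum = sum;
    return elapsed;
}
```

To try it, call `time_sweeps` from a small `main` with a buffer of a few hundred megabytes (for example `32u * 1024 * 1024` doubles and 10 passes), print the result, and build with `gcc -O3`. Running the binary under `taskset -c 0` pins it to one logical CPU; starting a second copy under `taskset -c 2` at the same time should show whether the first copy slows down even though each process has a core to itself. The exact numbers will depend on the machine, so treat this as a diagnostic rather than a benchmark.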