parallel processing - Moving elements between arrays in a CUDA kernel


I am stuck on something simple and need an opinion. I have a simple CUDA kernel that copies elements between two arrays (there is a reason I want to do it this way):

__global__ void kernelexample(float* A, float* B, float* C, int rows, int cols) {
    int r = blockIdx.y * blockDim.y + threadIdx.y; // vertical (row) index
    int c = blockIdx.x * blockDim.x + threadIdx.x; // horizontal (column) index

    if (r < rows && c < cols) {
        // row-major order
        C[c + r * cols] = A[c + r * cols];
    }
    //__syncthreads();
}

I am getting unsatisfactory results. Any suggestions, please?

The kernel is called like this:

int numelements = rows * cols;
int threadsperblock = 256;
int blockspergrid = ceil((double) numelements / threadsperblock);
kernelexample<<<blockspergrid, threadsperblock>>>(d_a, d_b, d_c, rows, cols);

Updated (after Eric's help):

int numelements = rows * cols;
int threadsperblock = 32; // talonmies' comment
int blockspergrid = ceil((double) numelements / threadsperblock);
dim3 dimblock(threadsperblock, threadsperblock);
dim3 dimgrid(blockspergrid, blockspergrid);
kernelexample<<<dimblock, dimblock>>>(d_a, d_b, d_c, rows, cols);

For example, given the matrix

a =[ 0   1 2   1 0   2 0   0 2   0 0   1 2   1 2   2 2   2 0   0 2   1 2   2 3   1 2   2 2   2    ] 

the returned matrix c is

c = [  0   1 2   1 0   2 0   0 2   0 0   1 2   1 2   2 2   2 0   0 2   1 2   2 3   1 2   2 2   2 ] 

C/C++ uses 0-based indexing by default.
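To make the indexing point concrete, here is a minimal host-side C++ sketch (plain C++, not CUDA) of the row-major index arithmetic the kernel uses. The helper names linearIndex and inBounds are hypothetical and only for illustration.

```cpp
#include <cassert>

// Row-major linear index of element (r, c) in a matrix with `cols` columns.
// Hypothetical helper, mirroring the kernel's expression c + r*cols.
int linearIndex(int r, int c, int cols) {
    assert(cols > 0);
    return c + r * cols;
}

// True when (r, c) addresses a real element of a rows x cols matrix.
// Note the strict '<': valid rows are 0 .. rows-1, valid columns 0 .. cols-1.
bool inBounds(int r, int c, int rows, int cols) {
    return r >= 0 && r < rows && c >= 0 && c < cols;
}
```

For a 3 x 4 matrix (12 elements, valid indices 0 to 11), linearIndex(2, 3, 4) is 11, the last valid element, while linearIndex(3, 0, 4) is already 12, one past the end. A check written with <= would let r == rows and c == cols through, and the thread would then read and write outside the arrays.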

Try the following changes:

1) Change the bounds check from

 if ( r <= rows && c <= cols) { 

to

if ( r < rows && c < cols) { 

2) Delete __syncthreads(), since the threads don't share any data.

3) Change the block and grid settings from 1-D to 2-D, since the kernel uses both .x and .y.

4) Remove the unused second pointer parameter (float* B) if you don't use it.

These changes should solve the problem.
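To illustrate point 3, the following is a rough CPU emulation in plain C++ (not CUDA) of how the kernel's index arithmetic behaves under a 1-D versus a 2-D launch. Dim2, ceilDiv, gridFor and emulateCopy are hypothetical names made up for this sketch; Dim2 is only a stand-in for CUDA's dim3, and the loops visit the (block, thread) pairs one after another instead of in parallel.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-in for CUDA's dim3, restricted to two dimensions (assumption).
struct Dim2 { int x; int y; };

// Integer ceiling division: smallest n such that n * block >= total.
int ceilDiv(int total, int block) {
    assert(total >= 0 && block > 0);
    return (total + block - 1) / block;
}

// Grid size that covers a rows x cols matrix with the given 2-D block:
// x spans the columns, y spans the rows, as in the kernel.
Dim2 gridFor(int rows, int cols, Dim2 block) {
    return Dim2{ceilDiv(cols, block.x), ceilDiv(rows, block.y)};
}

// CPU emulation of the copy kernel. Visits every (block, thread) pair,
// applies the same index arithmetic and bounds check as the kernel,
// and returns how many elements of A were copied into C.
int emulateCopy(const std::vector<float>& A, std::vector<float>& C,
                int rows, int cols, Dim2 grid, Dim2 block) {
    int copied = 0;
    for (int by = 0; by < grid.y; ++by)
        for (int bx = 0; bx < grid.x; ++bx)
            for (int ty = 0; ty < block.y; ++ty)
                for (int tx = 0; tx < block.x; ++tx) {
                    int r = by * block.y + ty;  // blockIdx.y * blockDim.y + threadIdx.y
                    int c = bx * block.x + tx;  // blockIdx.x * blockDim.x + threadIdx.x
                    if (r < rows && c < cols) {
                        C[c + r * cols] = A[c + r * cols];  // row-major order
                        ++copied;
                    }
                }
    return copied;
}
```

With a hypothetical 15 x 2 matrix, a 1-D launch (grid {1, 1}, block {256, 1}) always has r == 0, so only the 2 elements of the first row are copied. A 2-D launch with block {16, 16} and grid gridFor(15, 2, block) copies all 30. In actual CUDA code the same grid computation would be written as dim3 dimGrid((cols + dimBlock.x - 1) / dimBlock.x, (rows + dimBlock.y - 1) / dimBlock.y), with the grid passed as the first launch argument and the block as the second.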

For more information, see the copy() kernel in the following file of the CUDA sample code:

$CUDA_HOME/samples/6_Advanced/transpose/transpose.cu
