Working Notes: a commonplace notebook for recording & exploring ideas.

Programming Massively Parallel Processors

tags
['AI', 'GPU']
author
W Hwu

Ch1 Introduction

Ch2 Heterogeneous data parallel computing

Ch3 Multidimensional grids and data

Because I always get lost about this: each element of P is the dot product of a row of M with a column of N.

			N xxVxxxxxx
			  xxVxxxxxx
			  xxVxxxxxx
			  xxVxxxxxx
			  xxVxxxxxx
			  xxVxxxxxx
			  xxVxxxxxx
				.
M yyyyyyy   P xxxxxxxxx
  yyyyyyy     xxxxxxxxx
  yyyyyyy     xxxxxxxxx
  >>>>>>>.....xx#xxxxxx
  yyyyyyy     xxxxxxxxx

Ch4 Compute architecture & scheduling

int devCount;
cudaGetDeviceCount(&devCount);
// devCount == 1 on this machine

Device 0:
prop.name = NVIDIA GeForce GTX 1650 Ti
prop.totalGlobalMem = 3899326464      // bytes (~3.9 GB)
prop.sharedMemPerBlock = 49152        // bytes (48 KiB)
prop.regsPerBlock = 65536
prop.clockRate = 1485000              // kHz (~1.49 GHz)
prop.warpSize = 32
prop.maxThreadsPerBlock = 1024
prop.multiProcessorCount = 16
prop.integrated = 0
prop.major = 7                        // compute capability 7.5 (Turing)
prop.minor = 5
prop.maxThreadsDim[0] = 1024
prop.maxThreadsDim[1] = 1024
prop.maxThreadsDim[2] = 64
prop.maxGridSize[0] = 2147483647
prop.maxGridSize[1] = 65535
prop.maxGridSize[2] = 65535

Ch5 Memory Architecture & Data Locality

Computational
Throughput
 ^
 |
 |   Peak bandwidth * FLOP/B
 |         x
 |        x
 |       x              Peak throughput (GFLOPs)
 |      x........................
 |     x^
 |    x ^
 |   x  ^
 |  x   ^
 | x    ^
 |x     ^
 x---------------------------------------------------->
   Computational intensity

// call: the third <<<>>> argument is the dynamic shared-memory size in bytes
kernel<<<dimGrid, dimBlock, sharedMemBytes>>>(/* args */);

// inside kernel: one dynamically sized shared array
extern __shared__ float Mds_Nds[];

Ch6 Performance Considerations
