Multiplication of 2 matrix using shared memory and CUDA in order to study times and speedup.




CUDASharedMemoryMatrixMul

Matrix structure used:
We use .bin files containing raw numbers; the first two values are the number of rows and the number of columns, respectively, followed by the matrix data.

To make creating these matrices as easy as possible, a .cpp file is included. It generates two matrices: the first filled with random numbers at a given size, and the second an identity matrix, also of a given size.

Kernel:
This kernel was used to study computation times for different matrix sizes.
The multiplication is performed as shown in the following image:
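A shared-memory kernel of this kind is usually written with the classic tiled scheme: each thread block stages square tiles of both inputs in shared memory, so each global-memory element is read once per tile instead of once per output element. A sketch, assuming square row-major matrices and a tile width of 16 (the names and tile size are illustrative, not the repository's actual code):

```cuda
#define TILE 16

// C = A * B, all square n x n, row-major.
__global__ void matMulShared(const float* A, const float* B, float* C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float acc = 0.0f;

    // Walk across the tiles covering the shared dimension.
    for (int t = 0; t < (n + TILE - 1) / TILE; ++t) {
        // Each thread copies one element of the A tile and one of the
        // B tile from global memory into fast on-chip shared memory,
        // padding with zeros past the matrix edge.
        As[threadIdx.y][threadIdx.x] =
            (row < n && t * TILE + threadIdx.x < n)
                ? A[row * n + t * TILE + threadIdx.x] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] =
            (t * TILE + threadIdx.y < n && col < n)
                ? B[(t * TILE + threadIdx.y) * n + col] : 0.0f;
        __syncthreads();  // wait until the whole tile is loaded

        // Accumulate the partial dot product for this tile.
        for (int k = 0; k < TILE; ++k)
            acc += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();  // wait before the tiles are overwritten
    }

    if (row < n && col < n)
        C[row * n + col] = acc;
}
```

The two `__syncthreads()` barriers are what make the shared-memory staging safe: the first ensures the tile is fully loaded before any thread reads it, the second ensures all threads are done reading before the next iteration overwrites it.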

Multiplying two matrices of size 10000x10000, we obtained the following results:

Sequential multiplication with a simple FOR loop:
- Unable to compute

8-thread static-division multiplication:
- 1595.099 sec

CUDA with shared memory:
- 18.914 sec

With CUDA we obtained a speedup of 84.334302 (1595.099 / 18.914) compared with the static-division version.

The results were obtained with an Intel Xeon CPU and an NVIDIA GTX 560 GPU.
