To compile a single-file CUDA program:
    nvcc -o prog prog.cu
To compile a multi-file CUDA program:
    nvcc -c prog1.cu
    nvcc -c prog2.cu
    ...
    nvcc -c progN.cc
    nvcc -o prog prog1.o prog2.o ... progN.o
To run the compiled program:
    prog [optionally with arguments]
(CUDA programs have the file extension .cu)
All CUDA programs must include the cuda.h header file:
    #include <cuda.h>
However, the CUDA compiler (nvcc) includes this header file automatically, so you do not have to include it yourself.
Compile and run the following hello1.cu CUDA program:
#include <stdio.h>     // C programming header file
#include <unistd.h>    // C programming header file
                       // cuda.h is automatically included by nvcc...

/* ------------------------------------
   Your first kernel (= GPU function)
   ------------------------------------ */
__global__ void hello( )
{
    printf("Hello World !\n");   // You don't see this msg...
}

int main()
{
    hello<<< 1, 4 >>>( );        // launch kernel
    printf("I am the CPU: Hello World ! \n");
}
DEMO: /home/cs355001/demo/CUDA/1-intro/hello1.cu -- nvcc hello1.cu
We can see the message when the C main program waits for 1 sec before exiting:
#include <stdio.h>     // C programming header file
#include <unistd.h>    // C programming header file (declares sleep)
                       // cuda.h is automatically included by nvcc...

/* ------------------------------------
   Your first kernel (= GPU function)
   ------------------------------------ */
__global__ void hello( )
{
    printf("Hello World !\n");   // Now you see this msg, because main waits
}

int main()
{
    hello<<< 1, 4 >>>( );        // launch kernel
    printf("I am the CPU: Hello World ! \n");
    sleep(1);                    // wait 1 sec so the GPU output can arrive
}
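From this experiment, you can see that the CPU and the GPU work at the same time: the kernel launch returns immediately, and the CPU continues with its next statement while the GPU runs the kernel.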
CPU:                              GPU:
------------------                -------------------
hello<<< 1, 4 >>>( )    --->      runs the kernel function "hello"
        |                         using a grid consisting of:
        |                           1 thread block and
        |                           4 threads in block
        V
CPU continues
execution with
next statement          --->      exit....
That is why the first version of hello1.cu does not print the messages from the GPU threads: the main program terminates before the kernel's print messages are received.
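Waiting with sleep(1) only helps if the kernel happens to finish within that second. A more dependable alternative, sketched here and not part of the original demo, is the CUDA runtime call cudaDeviceSynchronize(), which blocks the CPU until all previously launched GPU work has completed. The helper name wait_for_gpu and the error reporting around it are assumptions added for illustration.

```cuda
#include <stdio.h>

__global__ void hello( )
{
    printf("Hello World !\n");   // executed once by every GPU thread
}

// Hypothetical helper: wait until the GPU is done; returns 0 on success.
int wait_for_gpu( )
{
    cudaError_t err = cudaDeviceSynchronize();   // blocks until the kernel finishes
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main()
{
    hello<<< 1, 4 >>>( );        // asynchronous launch: returns immediately
    printf("I am the CPU: Hello World ! \n");
    return wait_for_gpu( );      // replaces sleep(1)
}
```

Unlike sleep(1), this waits exactly as long as the kernel needs, so the GPU messages appear whether the kernel takes a microsecond or a minute.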
QUIZ: Why did we see 4 lines of messages from the kernel function?
CPU:                              GPU:
------------------                -------------------
hello<<< 1, 4 >>>( )    --->      runs the kernel function "hello"
        |                         using a grid consisting of:
        |                           1 thread block and
        |                           4 threads in block
        V
sleep(1);
We see 4 lines of messages printed by the kernel function
Because a total of 4 threads (1 block × 4 threads per block) execute the same kernel function.
Experiment: change the launch code to: <<< 2, 4 >>>
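For this experiment, a sketch like the following may help to see which thread printed which line. It assumes the built-in variables blockIdx.x and threadIdx.x (standard CUDA, though not introduced in these notes so far) and waits with cudaDeviceSynchronize() rather than sleep(1). The helper total_threads is a hypothetical name added for illustration.

```cuda
#include <stdio.h>

// Hypothetical helper: number of GPU threads created by <<< blocks, threads_per_block >>>
int total_threads(int blocks, int threads_per_block)
{
    return blocks * threads_per_block;
}

__global__ void hello( )
{
    // Every thread runs this same function; the built-in indices tell them apart
    printf("Hello World ! from block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

int main()
{
    int blocks = 2, threads_per_block = 4;

    hello<<< blocks, threads_per_block >>>( );   // launch kernel
    printf("I am the CPU: expecting %d lines from the GPU\n",
           total_threads(blocks, threads_per_block));

    cudaDeviceSynchronize();     // wait for the kernel so its output is printed
    return 0;
}
```

With <<< 2, 4 >>> the launch creates 2 × 4 = 8 threads, so the kernel prints eight lines. The order in which the two blocks print is not guaranteed.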