Reference and limitations for CUDA Fortran support

IBM® XL Fortran for Linux, V15.1.6 supports a commonly used subset of CUDA Fortran. For more information about the language extensions introduced by CUDA Fortran, see the CUDA Fortran Programming Guide and Reference manual downloadable from http://www.pgroup.com/doc/pgicudaforug.pdf.

The following CUDA Fortran features are not supported in IBM XL Fortran for Linux, V15.1.6:

Calling reduction intrinsic functions, such as sum, maxval, and minval, on the host with device actual arguments
Conditional sentinels for CUDA Fortran (!@CUF)
CUF kernel directives
Data transfer using the following CUDA Runtime APIs:
- cudaMemcpyFromSymbol
- cudaMemcpyFromSymbolAsync
- cudaMemcpyToSymbol
- cudaMemcpyToSymbolAsync
- cudaMemset
Note: You can use assignment, cudaMemcpy, or cudaMemcpyAsync instead. XL Fortran allows device global and constant module variables to appear as arguments to cudaMemcpy and cudaMemcpyAsync.
Data transfer using the following CUDA Runtime APIs:
- cudaMalloc3D when the first argument is a rank 3 allocatable array
- cudaMemcpyPeer
- cudaMemcpyPeerAsync
- cudaMemcpy2D
- cudaMemcpy2DAsync
- cudaMemcpy3DAsync
- cudaMemset2D
Dynamic parallelism
Procedure definitions or interfaces that have the attributes(host, device) prefix
Note: To work around this, make a copy of the procedure, and give one copy the attributes(host) prefix and the other copy the attributes(device) prefix. You must not define the two procedures in the same compilation unit.
Pointers with the texture attribute
Note: The compiler automatically utilizes the texture cache for passing dummy arguments when appropriate.
The curand module
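
As an illustration of the note on the unsupported symbol-copy and cudaMemset APIs, the following minimal sketch moves data to device global and constant module variables by using assignment and cudaMemcpy. The module name coeffs and the variable names are hypothetical, and the sketch assumes that the source is compiled with CUDA Fortran support enabled.

```fortran
module coeffs
  use cudafor
  implicit none
  real, constant :: c_dev(4)   ! constant-memory module variable (hypothetical)
  real, device   :: g_dev(4)   ! device global module variable (hypothetical)
end module coeffs

program copy_demo
  use cudafor
  use coeffs
  implicit none
  real    :: h(4), back(4)
  integer :: istat

  h = [1.0, 2.0, 3.0, 4.0]

  ! Instead of cudaMemcpyToSymbol: plain assignment, host to device.
  c_dev = h

  ! Instead of cudaMemset: assign a scalar to the whole device array.
  g_dev = 0.0

  ! cudaMemcpy with a device module variable as the destination.
  ! The count is given in array elements, not bytes.
  istat = cudaMemcpy(g_dev, h, size(h), cudaMemcpyHostToDevice)
  if (istat /= cudaSuccess) print *, cudaGetErrorString(istat)

  ! Instead of cudaMemcpyFromSymbol: assignment, device to host.
  back = g_dev
  print *, back
end program copy_demo
```

Assignment is usually the shortest form; cudaMemcpy is useful when you want to check the returned status, as the sketch does.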
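
The workaround for the attributes(host, device) prefix can be sketched as two copies of the same procedure, each in its own source file. The file names, module names, and the function poly are hypothetical, and this is only one possible layout under the assumption that each file is compiled separately.

```fortran
! File: poly_host.f90 (hypothetical name): host-only copy
module poly_host
  implicit none
contains
  attributes(host) function poly(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 2.0 * x * x + 1.0
  end function poly
end module poly_host

! File: poly_device.f90 (hypothetical name): device-only copy,
! kept in a separate compilation unit from poly_host.f90
module poly_device
  implicit none
contains
  attributes(device) function poly(x) result(y)
    real, intent(in) :: x
    real :: y
    y = 2.0 * x * x + 1.0
  end function poly
end module poly_device
```

Host code would then use poly_host and device code would use poly_device. Because both copies share the name poly, a scope that needs both modules has to rename one of them in its USE statement.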

The following limitations apply to IBM XL Fortran for Linux, V15.1.6:

You can use CUDA Fortran with IBM XL Fortran for Linux, V15.1.6 only if the CUDA Toolkit 9.0 or 9.1 is installed and the compiler is configured with the location of the toolkit.
- If you install the compiler after you install the toolkit, the compiler detects the location of the toolkit and no action is required.
- If you install the toolkit after you install the compiler, reconfigure the compiler as described in Configuring IBM XL Fortran for Linux.
Note: To install the CUDA Toolkit, use the Package Manager installation. The Runfile installation is currently not supported on Power® processors. For instructions about Package Manager installation, see the NVIDIA CUDA Installation Guide for Linux.
You can use PRINT and list-directed WRITE statements inside target regions and procedures to transfer data to the standard output unit. However, items of derived type are not supported in output statements inside target regions and procedures.
To debug an OpenMP application, compile it with -g -qsmp=noopt.
Because of a limitation in NVVM, IBM XL Fortran for Linux, V15.1.6 cannot generate debug information when GPU runtime inlining is enabled. Specifying -qsmp=noopt turns off GPU runtime inlining, which is enabled at all other optimization levels.
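
As an illustration of the PRINT limitation in this list, the following minimal sketch prints intrinsic-type values from a kernel. The names print_demo, show, and a_d are hypothetical, and the sketch assumes that a global procedure counts as a procedure in the sense used above.

```fortran
module print_demo
  use cudafor
  implicit none
contains
  attributes(global) subroutine show(a, n)
    integer, value :: n
    real :: a(n)
    integer :: i
    i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
    ! Only intrinsic-type items appear in the output list;
    ! a derived-type item is not supported here.
    if (i <= n) print *, 'i =', i, ' a(i) =', a(i)
  end subroutine show
end module print_demo

program main
  use cudafor
  use print_demo
  implicit none
  integer, parameter :: n = 4
  real, device :: a_d(n)
  integer :: istat

  a_d = 1.5                         ! initialize device data by assignment
  call show<<<1, n>>>(a_d, n)       ! one block of n threads
  istat = cudaDeviceSynchronize()   ! wait for the kernel to finish
end program main
```

To print the contents of a derived-type object from device code, list its intrinsic-type components individually in the output list.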