public class JCuda extends Object
Modifier and Type | Field and Description |
---|---|
static int |
cudaArrayColorAttachment
Must be set in cudaExternalMemoryGetMappedMipmappedArray if the
mipmapped array is used as a color target in a graphics API
|
static int |
cudaArrayCubemap
Must be set in cudaMalloc3DArray to create a cubemap CUDA array
|
static int |
cudaArrayDefault
Default CUDA array allocation flag
|
static int |
cudaArrayLayered
Must be set in cudaMalloc3DArray to create a layered CUDA array
|
static int |
cudaArraySurfaceLoadStore
Must be set in cudaMallocArray or cudaMalloc3DArray in order
to bind surfaces to the CUDA array
|
static int |
cudaArrayTextureGather
Must be set in cudaMallocArray or cudaMalloc3DArray in order to
perform texture gather operations on the CUDA array
|
static int |
cudaCooperativeLaunchMultiDeviceNoPostSync
If set, any subsequent work pushed in a stream that participated in a
call to ::cudaLaunchCooperativeKernelMultiDevice will only wait for the
kernel launched on the GPU corresponding to that stream to complete
before it begins execution.
|
static int |
cudaCooperativeLaunchMultiDeviceNoPreSync
If set, each kernel launched as part of
::cudaLaunchCooperativeKernelMultiDevice only waits for prior work in the
stream corresponding to that GPU to complete before the kernel begins
execution.
|
static int |
cudaCpuDeviceId
Device id that represents the CPU
|
static int |
cudaDeviceBlockingSync
Deprecated.
As of CUDA 4.0 and replaced by cudaDeviceScheduleBlockingSync
|
static int |
cudaDeviceLmemResizeToMax
Device flag - Keep local memory allocation after launch
|
static int |
cudaDeviceMapHost
Device flag - Support mapped pinned allocations
|
static int |
cudaDeviceMask
Device flags mask
|
static int |
cudaDeviceScheduleAuto
Device flag - Automatic scheduling
|
static int |
cudaDeviceScheduleBlockingSync
Device flag - Use blocking synchronization
|
static int |
cudaDeviceScheduleMask
Device schedule flags mask
|
static int |
cudaDeviceScheduleSpin
Device flag - Spin default scheduling
|
static int |
cudaDeviceScheduleYield
Device flag - Yield default scheduling
|
static int |
cudaEventBlockingSync
Event uses blocking synchronization
|
static int |
cudaEventDefault
Default event flag
|
static int |
cudaEventDisableTiming
Event will not record timing data
|
static int |
cudaEventInterprocess
Event is suitable for interprocess use. cudaEventDisableTiming must be set
|
static int |
cudaHostAllocDefault
Default page-locked allocation flag
|
static int |
cudaHostAllocMapped
Map allocation into device space
|
static int |
cudaHostAllocPortable
Pinned memory accessible by all CUDA contexts
|
static int |
cudaHostAllocWriteCombined
Write-combined memory
|
static int |
cudaHostRegisterDefault
Default host memory registration flag
|
static int |
cudaHostRegisterIoMemory
Memory-mapped I/O space
|
static int |
cudaHostRegisterMapped
Map registered memory into device space
|
static int |
cudaHostRegisterPortable
Pinned memory accessible by all CUDA contexts
|
static int |
cudaInvalidDeviceId
Device id that represents an invalid device
|
static int |
cudaIpcMemLazyEnablePeerAccess
Automatically enable peer access between remote devices as needed
|
static int |
cudaMemAttachGlobal
Memory can be accessed by any stream on any device
|
static int |
cudaMemAttachHost
Memory cannot be accessed by any stream on any device
|
static int |
cudaMemAttachSingle
Memory can only be accessed by a single stream on the associated device
|
static int |
cudaOccupancyDefault
Default behavior
|
static int |
cudaOccupancyDisableCachingOverride
Assume global caching is enabled and cannot be automatically turned off
|
static int |
cudaPeerAccessDefault
Default peer addressing enable flag
|
static int |
CUDART_VERSION
CUDA runtime version
|
static int |
cudaStreamCallbackBlocking
Deprecated.
This flag was only present in CUDA 5.0.25 (release candidate)
and may be removed (or added again) in future releases
|
static int |
cudaStreamCallbackNonblocking
Deprecated.
This flag was only present in CUDA 5.0.25 (release candidate)
and may be removed (or added again) in future releases
|
static int |
cudaStreamDefault
Default stream flag
|
static cudaStream_t |
cudaStreamLegacy
Stream handle that can be passed as a cudaStream_t to use an implicit stream
with legacy synchronization behavior.
|
static int |
cudaStreamNonBlocking
Stream does not synchronize with stream 0 (the NULL stream)
|
static cudaStream_t |
cudaStreamPerThread
Stream handle that can be passed as a cudaStream_t to use an implicit stream
with per-thread synchronization behavior.
|
static int |
cudaSurfaceType1D
cudaSurfaceType1D
|
static int |
cudaSurfaceType1DLayered
cudaSurfaceType1DLayered
|
static int |
cudaSurfaceType2D
cudaSurfaceType2D
|
static int |
cudaSurfaceType2DLayered
cudaSurfaceType2DLayered
|
static int |
cudaSurfaceType3D
cudaSurfaceType3D
|
static int |
cudaSurfaceTypeCubemap
cudaSurfaceTypeCubemap
|
static int |
cudaSurfaceTypeCubemapLayered
cudaSurfaceTypeCubemapLayered
|
static int |
cudaTextureType1D
cudaTextureType1D
|
static int |
cudaTextureType1DLayered
cudaTextureType1DLayered
|
static int |
cudaTextureType2D
cudaTextureType2D
|
static int |
cudaTextureType2DLayered
cudaTextureType2DLayered
|
static int |
cudaTextureType3D
cudaTextureType3D
|
static int |
cudaTextureTypeCubemap
cudaTextureTypeCubemap
|
static int |
cudaTextureTypeCubemapLayered
cudaTextureTypeCubemapLayered
|
Modifier and Type | Method and Description |
---|---|
static int |
cudaArrayGetInfo(cudaChannelFormatDesc desc,
cudaExtent extent,
int[] flags,
cudaArray array)
Gets info about the specified cudaArray.
|
static int |
cudaBindSurfaceToArray(surfaceReference surfref,
cudaArray array,
cudaChannelFormatDesc desc)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaBindTexture(long[] offset,
textureReference texref,
Pointer devPtr,
cudaChannelFormatDesc desc,
long size)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaBindTexture2D(long[] offset,
textureReference texref,
Pointer devPtr,
cudaChannelFormatDesc desc,
long width,
long height,
long pitch)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaBindTextureToArray(textureReference texref,
cudaArray array,
cudaChannelFormatDesc desc)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaBindTextureToMipmappedArray(textureReference texref,
cudaMipmappedArray mipmappedArray,
cudaChannelFormatDesc desc)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaChooseDevice(int[] device,
cudaDeviceProp prop)
Select compute-device which best matches criteria.
|
static int |
cudaConfigureCall(dim3 gridDim,
dim3 blockDim,
long sharedMem,
cudaStream_t stream)
Deprecated.
This function is deprecated as of CUDA 7.0
|
static cudaChannelFormatDesc |
cudaCreateChannelDesc(int x,
int y,
int z,
int w,
int cudaChannelFormatKind_f)
template < class T > cudaChannelFormatDesc cudaCreateChannelDesc ( void ) [inline]
[C++ API] Returns a channel descriptor using the specified format: format f and the number of bits of each component x, y, z, and w.
|
static int |
cudaCreateSurfaceObject(cudaSurfaceObject pSurfObject,
cudaResourceDesc pResDesc)
Creates a surface object.
|
static int |
cudaCreateTextureObject(cudaTextureObject pTexObject,
cudaResourceDesc pResDesc,
cudaTextureDesc pTexDesc,
cudaResourceViewDesc pResViewDesc)
Creates a texture object.
|
static int |
cudaCtxResetPersistingL2Cache()
Resets all persisting lines in cache to normal status.
|
static int |
cudaDestroySurfaceObject(cudaSurfaceObject surfObject)
Destroys a surface object.
|
static int |
cudaDestroyTextureObject(cudaTextureObject texObject)
Destroys a texture object.
|
static int |
cudaDeviceCanAccessPeer(int[] canAccessPeer,
int device,
int peerDevice)
Queries if a device may directly access a peer device's memory.
|
static int |
cudaDeviceDisablePeerAccess(int peerDevice)
Disables direct access to memory allocations on a peer device.
|
static int |
cudaDeviceEnablePeerAccess(int peerDevice,
int flags)
Enables direct access to memory allocations on a peer device.
|
static int |
cudaDeviceGetAttribute(int[] value,
int cudaDeviceAttr_attr,
int device)
Returns information about the device.
|
static int |
cudaDeviceGetByPCIBusId(int[] device,
String pciBusId)
Returns a handle to a compute device.
|
static int |
cudaDeviceGetCacheConfig(int[] pCacheConfig)
Returns the preferred cache configuration for the current device.
|
static int |
cudaDeviceGetLimit(long[] pValue,
int limit)
Returns resource limits.
|
static int |
cudaDeviceGetP2PAttribute(int[] value,
int attr,
int srcDevice,
int dstDevice)
Queries attributes of the link between two devices.
|
static int |
cudaDeviceGetPCIBusId(String[] pciBusId,
int len,
int device)
Returns a PCI Bus Id string for the device.
|
static int |
cudaDeviceGetSharedMemConfig(int[] pConfig)
Returns the shared memory configuration for the current device.
|
static int |
cudaDeviceGetStreamPriorityRange(int[] leastPriority,
int[] greatestPriority) |
static int |
cudaDeviceReset()
Destroy all allocations and reset all state on the current device in the current process.
|
static int |
cudaDeviceSetCacheConfig(int cacheConfig)
Sets the preferred cache configuration for the current device.
|
static int |
cudaDeviceSetLimit(int limit,
long value)
Set resource limits.
|
static int |
cudaDeviceSetSharedMemConfig(int config)
Sets the shared memory configuration for the current device.
|
static int |
cudaDeviceSynchronize()
Wait for compute device to finish.
|
static int |
cudaDriverGetVersion(int[] driverVersion)
Returns the CUDA driver version.
|
static int |
cudaEventCreate(cudaEvent_t event)
cudaError_t cudaEventCreate ( cudaEvent_t* event, unsigned int flags )
[C++ API] Creates an event object with the specified flags.
|
static int |
cudaEventCreateWithFlags(cudaEvent_t event,
int flags)
Creates an event object with the specified flags.
|
static int |
cudaEventDestroy(cudaEvent_t event)
Destroys an event object.
|
static int |
cudaEventElapsedTime(float[] ms,
cudaEvent_t start,
cudaEvent_t end)
Computes the elapsed time between events.
|
static int |
cudaEventQuery(cudaEvent_t event)
Queries an event's status.
|
static int |
cudaEventRecord(cudaEvent_t event,
cudaStream_t stream)
Records an event.
|
static int |
cudaEventSynchronize(cudaEvent_t event)
Waits for an event to complete.
|
static int |
cudaFree(Pointer devPtr)
Frees memory on the device.
|
static int |
cudaFreeArray(cudaArray array)
Frees an array on the device.
|
static int |
cudaFreeHost(Pointer ptr)
Frees page-locked memory.
|
static int |
cudaFreeMipmappedArray(cudaMipmappedArray mipmappedArray)
Frees a mipmapped array on the device.
|
static int |
cudaFuncGetAttributes(cudaFuncAttributes attr,
String func)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaGetChannelDesc(cudaChannelFormatDesc desc,
cudaArray array)
Get the channel descriptor of an array.
|
static int |
cudaGetDevice(int[] device)
Returns which device is currently being used.
|
static int |
cudaGetDeviceCount(int[] count)
Returns the number of compute-capable devices.
|
static int |
cudaGetDeviceFlags(int[] flags)
Gets the flags for the current device.
|
static int |
cudaGetDeviceProperties(cudaDeviceProp prop,
int device)
Returns information about the compute-device.
|
static String |
cudaGetErrorName(int error)
Returns the string representation of an error code enum name: a string containing the name of the error code in the enum, or NULL if the error code is not valid.
|
static String |
cudaGetErrorString(int error)
Returns the message string from an error code.
|
static int |
cudaGetLastError()
Returns the last error from a runtime call.
|
static int |
cudaGetMipmappedArrayLevel(cudaArray levelArray,
cudaMipmappedArray mipmappedArray,
int level)
Gets a mipmap level of a CUDA mipmapped array.
|
static int |
cudaGetSurfaceObjectResourceDesc(cudaResourceDesc pResDesc,
cudaSurfaceObject surfObject)
Returns the resource descriptor for the surface object specified by surfObject.
|
static int |
cudaGetSurfaceReference(surfaceReference surfref,
String symbol)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaGetSymbolAddress(Pointer devPtr,
String symbol)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaGetSymbolSize(long[] size,
String symbol)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaGetTextureAlignmentOffset(long[] offset,
textureReference texref)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaGetTextureObjectResourceDesc(cudaResourceDesc pResDesc,
cudaTextureObject texObject)
Returns a texture object's resource descriptor.
|
static int |
cudaGetTextureObjectResourceViewDesc(cudaResourceViewDesc pResViewDesc,
cudaTextureObject texObject)
Returns a texture object's resource view descriptor.
|
static int |
cudaGetTextureObjectTextureDesc(cudaTextureDesc pTexDesc,
cudaTextureObject texObject)
Returns a texture object's texture descriptor.
|
static int |
cudaGetTextureReference(textureReference texref,
String symbol)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaGLGetDevices(int[] pCudaDeviceCount,
int[] pCudaDevices,
int cudaDeviceCount,
int cudaGLDeviceList_deviceList)
Gets the CUDA devices associated with the current OpenGL context.
|
static int |
cudaGLMapBufferObject(Pointer devPtr,
int bufObj)
Deprecated.
Deprecated as of CUDA 3.0
|
static int |
cudaGLMapBufferObjectAsync(Pointer devPtr,
int bufObj,
cudaStream_t stream)
Deprecated.
Deprecated as of CUDA 3.0
|
static int |
cudaGLRegisterBufferObject(int bufObj)
Deprecated.
Deprecated as of CUDA 3.0
|
static int |
cudaGLSetBufferObjectMapFlags(int bufObj,
int flags)
Deprecated.
Deprecated as of CUDA 3.0
|
static int |
cudaGLSetGLDevice(int device)
Deprecated.
Deprecated as of CUDA 5.0
|
static int |
cudaGLUnmapBufferObject(int bufObj)
Deprecated.
Deprecated as of CUDA 3.0
|
static int |
cudaGLUnmapBufferObjectAsync(int bufObj,
cudaStream_t stream)
Deprecated.
Deprecated as of CUDA 3.0
|
static int |
cudaGLUnregisterBufferObject(int bufObj)
Deprecated.
Deprecated as of CUDA 3.0
|
static int |
cudaGraphicsGLRegisterBuffer(cudaGraphicsResource resource,
int buffer,
int Flags)
Registers an OpenGL buffer object.
|
static int |
cudaGraphicsGLRegisterImage(cudaGraphicsResource resource,
int image,
int target,
int Flags)
Register an OpenGL texture or renderbuffer object.
|
static int |
cudaGraphicsMapResources(int count,
cudaGraphicsResource[] resources,
cudaStream_t stream)
Map graphics resources for access by CUDA.
|
static int |
cudaGraphicsResourceGetMappedMipmappedArray(cudaMipmappedArray mipmappedArray,
cudaGraphicsResource resource)
Get a mipmapped array through which to access a mapped graphics resource.
|
static int |
cudaGraphicsResourceGetMappedPointer(Pointer devPtr,
long[] size,
cudaGraphicsResource resource)
Get a device pointer through which to access a mapped graphics resource.
|
static int |
cudaGraphicsResourceSetMapFlags(cudaGraphicsResource resource,
int flags)
Set usage flags for mapping a graphics resource.
|
static int |
cudaGraphicsSubResourceGetMappedArray(cudaArray arrayPtr,
cudaGraphicsResource resource,
int arrayIndex,
int mipLevel)
Get an array through which to access a subresource of a mapped graphics resource.
|
static int |
cudaGraphicsUnmapResources(int count,
cudaGraphicsResource[] resources,
cudaStream_t stream)
Unmap graphics resources.
|
static int |
cudaGraphicsUnregisterResource(cudaGraphicsResource resource)
Unregisters a graphics resource for access by CUDA.
|
static int |
cudaHostAlloc(Pointer ptr,
long size,
int flags)
Allocates page-locked memory on the host.
|
static int |
cudaHostGetDevicePointer(Pointer pDevice,
Pointer pHost,
int flags)
Passes back device pointer of mapped host memory allocated by cudaHostAlloc or registered by cudaHostRegister.
|
static int |
cudaHostRegister(Pointer ptr,
long size,
int flags)
Registers an existing host memory range for use by CUDA.
|
static int |
cudaHostUnregister(Pointer ptr)
Unregisters a memory range that was registered with cudaHostRegister.
|
static int |
cudaIpcCloseMemHandle(Pointer devPtr)
Close memory mapped with cudaIpcOpenMemHandle.
|
static int |
cudaIpcGetEventHandle(cudaIpcEventHandle handle,
cudaEvent_t event)
Gets an interprocess handle for a previously allocated event.
|
static int |
cudaIpcGetMemHandle(cudaIpcMemHandle handle,
Pointer devPtr)
cudaError_t cudaIpcGetMemHandle ( cudaIpcMemHandle_t* handle, void* devPtr )
Gets an interprocess memory handle for an existing device memory allocation. Takes a pointer to the base of an existing device memory allocation created with cudaMalloc and exports it for use in another process.
|
static int |
cudaIpcOpenEventHandle(cudaEvent_t event,
cudaIpcEventHandle handle)
Opens an interprocess event handle for use in the current process.
|
static int |
cudaIpcOpenMemHandle(Pointer devPtr,
cudaIpcMemHandle handle,
int flags)
cudaError_t cudaIpcOpenMemHandle ( void** devPtr, cudaIpcMemHandle_t handle, unsigned int flags )
Opens an interprocess memory handle exported from another process and returns a device pointer usable in the local process.
|
static int |
cudaLaunch(String symbol)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaLaunchHostFunc(cudaStream_t stream,
cudaHostFn fn,
Object userData)
Enqueues a host function to run in a stream.
|
static int |
cudaMalloc(Pointer devPtr,
long size)
Allocate memory on the device.
|
static int |
cudaMalloc3D(cudaPitchedPtr pitchDevPtr,
cudaExtent extent)
Allocates logical 1D, 2D, or 3D memory objects on the device.
|
static int |
cudaMalloc3DArray(cudaArray arrayPtr,
cudaChannelFormatDesc desc,
cudaExtent extent)
Allocate an array on the device.
|
static int |
cudaMalloc3DArray(cudaArray arrayPtr,
cudaChannelFormatDesc desc,
cudaExtent extent,
int flags)
Allocate an array on the device.
|
static int |
cudaMallocArray(cudaArray array,
cudaChannelFormatDesc desc,
long width,
long height)
Allocate an array on the device.
|
static int |
cudaMallocArray(cudaArray array,
cudaChannelFormatDesc desc,
long width,
long height,
int flags)
Allocate an array on the device.
|
static int |
cudaMallocHost(Pointer ptr,
long size)
cudaError_t cudaMallocHost ( void** ptr, size_t size, unsigned int flags )
[C++ API] Allocates page-locked memory on the host: size bytes of host memory that is page-locked and accessible to the device.
|
static int |
cudaMallocManaged(Pointer devPtr,
long size,
int flags)
__host__ cudaError_t cudaMallocManaged ( void** devPtr, size_t size, unsigned int flags = cudaMemAttachGlobal )
Allocates memory that will be automatically managed by the Unified Memory system.
|
static int |
cudaMallocMipmappedArray(cudaMipmappedArray mipmappedArray,
cudaChannelFormatDesc desc,
cudaExtent extent,
int numLevels,
int flags)
Allocate a mipmapped array on the device.
|
static int |
cudaMallocPitch(Pointer devPtr,
long[] pitch,
long width,
long height)
Allocates pitched memory on the device.
|
static int |
cudaMemAdvise(Pointer devPtr,
long count,
int advice,
int device)
Advises the Unified Memory subsystem about the usage pattern for the memory range starting at devPtr with a size of count bytes.
|
static int |
cudaMemcpy(Pointer dst,
Pointer src,
long count,
int cudaMemcpyKind_kind)
Copies data between host and device.
|
static int |
cudaMemcpy2D(Pointer dst,
long dpitch,
Pointer src,
long spitch,
long width,
long height,
int cudaMemcpyKind_kind)
Copies data between host and device.
|
static int |
cudaMemcpy2DArrayToArray(cudaArray dst,
long wOffsetDst,
long hOffsetDst,
cudaArray src,
long wOffsetSrc,
long hOffsetSrc,
long width,
long height,
int cudaMemcpyKind_kind)
Copies data between host and device.
|
static int |
cudaMemcpy2DAsync(Pointer dst,
long dpitch,
Pointer src,
long spitch,
long width,
long height,
int cudaMemcpyKind_kind,
cudaStream_t stream)
Copies data between host and device.
|
static int |
cudaMemcpy2DFromArray(Pointer dst,
long dpitch,
cudaArray src,
long wOffset,
long hOffset,
long width,
long height,
int cudaMemcpyKind_kind)
Copies data between host and device.
|
static int |
cudaMemcpy2DFromArrayAsync(Pointer dst,
long dpitch,
cudaArray src,
long wOffset,
long hOffset,
long width,
long height,
int cudaMemcpyKind_kind,
cudaStream_t stream)
Copies data between host and device.
|
static int |
cudaMemcpy2DToArray(cudaArray dst,
long wOffset,
long hOffset,
Pointer src,
long spitch,
long width,
long height,
int cudaMemcpyKind_kind)
Copies data between host and device.
|
static int |
cudaMemcpy2DToArrayAsync(cudaArray dst,
long wOffset,
long hOffset,
Pointer src,
long spitch,
long width,
long height,
int cudaMemcpyKind_kind,
cudaStream_t stream)
Copies data between host and device.
|
static int |
cudaMemcpy3D(cudaMemcpy3DParms p)
Copies data between 3D objects.
|
static int |
cudaMemcpy3DAsync(cudaMemcpy3DParms p,
cudaStream_t stream)
Copies data between 3D objects.
|
static int |
cudaMemcpy3DPeer(cudaMemcpy3DPeerParms p)
Copies memory between devices.
|
static int |
cudaMemcpy3DPeerAsync(cudaMemcpy3DPeerParms p,
cudaStream_t stream)
Copies memory between devices asynchronously.
|
static int |
cudaMemcpyArrayToArray(cudaArray dst,
long wOffsetDst,
long hOffsetDst,
cudaArray src,
long wOffsetSrc,
long hOffsetSrc,
long count,
int cudaMemcpyKind_kind)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaMemcpyAsync(Pointer dst,
Pointer src,
long count,
int cudaMemcpyKind_kind,
cudaStream_t stream)
Copies data between host and device.
|
static int |
cudaMemcpyFromArray(Pointer dst,
cudaArray src,
long wOffset,
long hOffset,
long count,
int cudaMemcpyKind_kind)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaMemcpyFromArrayAsync(Pointer dst,
cudaArray src,
long wOffset,
long hOffset,
long count,
int cudaMemcpyKind_kind,
cudaStream_t stream)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaMemcpyFromSymbol(Pointer dst,
String symbol,
long count,
long offset,
int cudaMemcpyKind_kind)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaMemcpyFromSymbolAsync(Pointer dst,
String symbol,
long count,
long offset,
int cudaMemcpyKind_kind,
cudaStream_t stream)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaMemcpyPeer(Pointer dst,
int dstDevice,
Pointer src,
int srcDevice,
long count)
Copies memory between two devices.
|
static int |
cudaMemcpyPeerAsync(Pointer dst,
int dstDevice,
Pointer src,
int srcDevice,
long count,
cudaStream_t stream)
Copies memory between two devices asynchronously.
|
static int |
cudaMemcpyToArray(cudaArray dst,
long wOffset,
long hOffset,
Pointer src,
long count,
int cudaMemcpyKind_kind)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaMemcpyToArrayAsync(cudaArray dst,
long wOffset,
long hOffset,
Pointer src,
long count,
int cudaMemcpyKind_kind,
cudaStream_t stream)
Deprecated.
Deprecated as of CUDA 10.1
|
static int |
cudaMemcpyToSymbol(String symbol,
Pointer src,
long count,
long offset,
int cudaMemcpyKind_kind)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaMemcpyToSymbolAsync(String symbol,
Pointer src,
long count,
long offset,
int cudaMemcpyKind_kind,
cudaStream_t stream)
Deprecated.
This function is no longer supported as of CUDA 5.0
|
static int |
cudaMemGetInfo(long[] free,
long[] total)
Gets free and total device memory.
|
static int |
cudaMemPrefetchAsync(Pointer devPtr,
long count,
int dstDevice,
cudaStream_t stream)
Prefetches memory to the specified destination device.
|
static int |
cudaMemRangeGetAttribute(Pointer data,
long dataSize,
int attribute,
Pointer devPtr,
long count)
Query an attribute of a given memory range.
|
static int |
cudaMemRangeGetAttributes(Pointer[] data,
long[] dataSizes,
int[] attributes,
long numAttributes,
Pointer devPtr,
long count)
Query attributes of a given memory range.
|
static int |
cudaMemset(Pointer mem,
int c,
long count)
Initializes or sets device memory to a value.
|
static int |
cudaMemset2D(Pointer mem,
long pitch,
int c,
long width,
long height)
Initializes or sets device memory to a value.
|
static int |
cudaMemset2DAsync(Pointer devPtr,
long pitch,
int value,
long width,
long height,
cudaStream_t stream)
Initializes or sets device memory to a value.
|
static int |
cudaMemset3D(cudaPitchedPtr pitchDevPtr,
int value,
cudaExtent extent)
Initializes or sets device memory to a value.
|
static int |
cudaMemset3DAsync(cudaPitchedPtr pitchedDevPtr,
int value,
cudaExtent extent,
cudaStream_t stream)
Initializes or sets device memory to a value.
|
static int |
cudaMemsetAsync(Pointer devPtr,
int value,
long count,
cudaStream_t stream)
Initializes or sets device memory to a value.
|
static int |
cudaPeekAtLastError()
Returns the last error from a runtime call.
|
static int |
cudaPointerGetAttributes(cudaPointerAttributes attributes,
Pointer ptr)
Returns attributes about a specified pointer.
|
static int |
cudaProfilerInitialize(String configFile,
String outputFile,
int outputMode)
Deprecated.
As of CUDA 11.0
|
static int |
cudaProfilerStart()
Enable profiling.
|
static int |
cudaProfilerStop()
Disable profiling.
|
static int |
cudaRuntimeGetVersion(int[] runtimeVersion)
Returns the CUDA Runtime version.
|
static int |
cudaSetDevice(int device)
Set device to be used for GPU executions.
|
static int |
cudaSetDeviceFlags(int flags)
Sets flags to be used for device executions.
|
static int |
cudaSetupArgument(Pointer arg,
long size,
long offset)
Deprecated.
This function is deprecated as of CUDA 7.0
|
static int |
cudaSetValidDevices(int[] device_arr,
int len)
Set a list of devices that can be used for CUDA.
|
static int |
cudaStreamAddCallback(cudaStream_t stream,
cudaStreamCallback callback,
Object userData,
int flags)
Add a callback to a compute stream.
|
static int |
cudaStreamAttachMemAsync(cudaStream_t stream,
Pointer devPtr,
long length,
int flags) |
static int |
cudaStreamCopyAttributes(cudaStream_t dst,
cudaStream_t src)
Copies attributes from source stream to destination stream.
|
static int |
cudaStreamCreate(cudaStream_t stream)
Create an asynchronous stream.
|
static int |
cudaStreamCreateWithFlags(cudaStream_t pStream,
int flags)
Create an asynchronous stream.
|
static int |
cudaStreamCreateWithPriority(cudaStream_t pStream,
int flags,
int priority) |
static int |
cudaStreamDestroy(cudaStream_t stream)
Destroys and cleans up an asynchronous stream.
|
static int |
cudaStreamGetAttribute(cudaStream_t hStream,
int attr,
cudaStreamAttrValue value_out)
Queries stream attribute.
|
static int |
cudaStreamGetFlags(cudaStream_t hStream,
int[] flags) |
static int |
cudaStreamGetPriority(cudaStream_t hStream,
int[] priority) |
static int |
cudaStreamQuery(cudaStream_t stream)
Queries an asynchronous stream for completion status.
|
static int |
cudaStreamSetAttribute(cudaStream_t hStream,
int attr,
cudaStreamAttrValue value)
Sets stream attribute.
|
static int |
cudaStreamSynchronize(cudaStream_t stream)
Waits for stream tasks to complete.
|
static int |
cudaStreamWaitEvent(cudaStream_t stream,
cudaEvent_t event,
int flags)
Make a compute stream wait on an event.
|
static int |
cudaThreadExit()
Deprecated.
Deprecated in CUDA
|
static int |
cudaThreadGetCacheConfig(int[] pCacheConfig)
Deprecated.
Deprecated in CUDA
|
static int |
cudaThreadGetLimit(long[] pValue,
int limit)
Deprecated.
Deprecated in CUDA
|
static int |
cudaThreadSetCacheConfig(int cacheConfig)
Deprecated.
Deprecated in CUDA
|
static int |
cudaThreadSetLimit(int limit,
long value)
Deprecated.
Deprecated in CUDA
|
static int |
cudaThreadSynchronize()
Deprecated.
Deprecated in CUDA
|
static int |
cudaUnbindTexture(textureReference texref)
Deprecated.
Deprecated as of CUDA 10.1
|
static String |
getJCudaVersion()
Returns an unspecified string that will be appended to native
library names for disambiguation
|
static void |
initialize()
Initializes the native library.
|
static void |
setExceptionsEnabled(boolean enabled)
Enables or disables exceptions.
|
static void |
setLogLevel(LogLevel logLevel)
Set the specified log level for the JCuda runtime library.
|
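The version values listed above (CUDART_VERSION, and the results of cudaRuntimeGetVersion and cudaDriverGetVersion) are plain integers. The sketch below shows one way to turn such a value into a readable "major.minor" string. It assumes the encoding used by the CUDA headers, 1000 * major + 10 * minor, which this page does not state; the class and method names are hypothetical and not part of JCuda.

```java
// Hypothetical helper, not part of JCuda.
// Assumption: versions are encoded as 1000 * major + 10 * minor,
// as in the CUDA headers (for example, 11020 for CUDA 11.2).
class CudaVersionSketch {

    // Extracts the major version from an encoded version number.
    static int major(int version) {
        return version / 1000;
    }

    // Extracts the minor version from an encoded version number.
    static int minor(int version) {
        return (version % 1000) / 10;
    }

    // Formats an encoded version number as "major.minor".
    static String format(int version) {
        return major(version) + "." + minor(version);
    }

    public static void main(String[] args) {
        // In real code the argument would be JCuda.CUDART_VERSION or the
        // value written by cudaRuntimeGetVersion(int[]) into element 0.
        System.out.println(format(11020));
    }
}
```

With JCuda on the classpath, the same helper could be applied to the value that cudaRuntimeGetVersion writes into its int[] argument.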
public static final int CUDART_VERSION
public static final int cudaHostAllocDefault
public static final int cudaHostAllocPortable
public static final int cudaHostAllocMapped
public static final int cudaHostAllocWriteCombined
public static final int cudaHostRegisterDefault
public static final int cudaHostRegisterPortable
public static final int cudaHostRegisterMapped
public static final int cudaHostRegisterIoMemory
public static final int cudaPeerAccessDefault
public static final int cudaStreamDefault
public static final int cudaStreamNonBlocking
public static final int cudaEventDefault
public static final int cudaEventBlockingSync
public static final int cudaEventDisableTiming
public static final int cudaEventInterprocess
public static final int cudaDeviceScheduleAuto
public static final int cudaDeviceScheduleSpin
public static final int cudaDeviceScheduleYield
public static final int cudaDeviceScheduleBlockingSync
@Deprecated public static final int cudaDeviceBlockingSync
public static final int cudaDeviceScheduleMask
public static final int cudaDeviceMapHost
public static final int cudaDeviceLmemResizeToMax
public static final int cudaDeviceMask
public static final int cudaArrayDefault
public static final int cudaArrayLayered
public static final int cudaArraySurfaceLoadStore
public static final int cudaArrayCubemap
public static final int cudaArrayTextureGather
public static final int cudaArrayColorAttachment
public static final int cudaIpcMemLazyEnablePeerAccess
@Deprecated public static final int cudaStreamCallbackNonblocking
@Deprecated public static final int cudaStreamCallbackBlocking
public static final int cudaSurfaceType1D
public static final int cudaSurfaceType2D
public static final int cudaSurfaceType3D
public static final int cudaSurfaceTypeCubemap
public static final int cudaSurfaceType1DLayered
public static final int cudaSurfaceType2DLayered
public static final int cudaSurfaceTypeCubemapLayered
public static final int cudaTextureType1D
public static final int cudaTextureType2D
public static final int cudaTextureType3D
public static final int cudaTextureTypeCubemap
public static final int cudaTextureType1DLayered
public static final int cudaTextureType2DLayered
public static final int cudaTextureTypeCubemapLayered
public static final int cudaMemAttachGlobal
public static final int cudaMemAttachHost
public static final int cudaMemAttachSingle
public static final int cudaOccupancyDefault
public static final int cudaOccupancyDisableCachingOverride
public static final int cudaCpuDeviceId
public static final int cudaInvalidDeviceId
public static final int cudaCooperativeLaunchMultiDeviceNoPreSync
public static final int cudaCooperativeLaunchMultiDeviceNoPostSync
public static cudaStream_t cudaStreamLegacy
public static cudaStream_t cudaStreamPerThread
public static String getJCudaVersion()
public static void initialize()
public static void setLogLevel(LogLevel logLevel)
logLevel
- The log level to use.
public static void setExceptionsEnabled(boolean enabled)
enabled
- Whether exceptions are enabled
public static int cudaGetDeviceCount(int[] count)
cudaError_t cudaGetDeviceCount ( int* count )
Returns the number of compute-capable devices. Returns in *count the number of devices with compute capability greater than or equal to 1.0 that are available for execution. If there is no such device, then cudaGetDeviceCount() will return cudaErrorNoDevice. If no driver can be loaded to determine whether any such devices exist, then cudaGetDeviceCount() will return cudaErrorInsufficientDriver.
Note that this function may also return error codes from previous, asynchronous launches.
count
- Returns the number of devices with compute capability greater than or equal to 1.0
See Also: cudaGetDevice(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaSetDevice(int device)
cudaError_t cudaSetDevice ( int device )
Set device to be used for GPU executions. Sets device as the current device for the calling host thread.
Any device memory subsequently allocated from this host thread using cudaMalloc(), cudaMallocPitch() or cudaMallocArray() will be physically resident on device. Any host memory allocated from this host thread using cudaMallocHost() or cudaHostAlloc() or cudaHostRegister() will have its lifetime associated with device. Any streams or events created from this host thread will be associated with device. Any kernels launched from this host thread using the <<<>>> operator or cudaLaunch() will be executed on device.
This call may be made from any host thread, to any device, and at any time. This function will do no synchronization with the previous or new device, and should be considered a very low overhead call.
Note that this function may also return error codes from previous, asynchronous launches.
device
- Device on which the active host thread should execute the device code.
See Also: cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaSetDeviceFlags(int flags)
cudaError_t cudaSetDeviceFlags ( unsigned int flags )
Sets flags to be used for device executions. Records flags as the flags to use when initializing the current device. If no device has been made current to the calling thread then flags will be applied to the initialization of any device initialized by the calling host thread, unless that device has had its initialization flags set explicitly by this or any host thread.
If the current device has been set and that device has already been initialized then this call will fail with the error cudaErrorSetOnActiveProcess. In this case it is necessary to reset device using cudaDeviceReset() before the device's initialization flags may be set.
The two LSBs of the flags parameter can be used to control how the CPU thread interacts with the OS scheduler when waiting for results from the device.
cudaDeviceScheduleAuto: The default value if the flags parameter is zero, uses a heuristic based on the number of active CUDA contexts in the process C and the number of logical processors in the system P. If C > P, then CUDA will yield to other OS threads when waiting for the device, otherwise CUDA will not yield while waiting for results and actively spin on the processor.
cudaDeviceScheduleSpin: Instruct CUDA to actively spin when waiting for results from the device. This can decrease latency when waiting for the device, but may lower the performance of CPU threads if they are performing work in parallel with the CUDA thread.
cudaDeviceScheduleYield: Instruct CUDA to yield its thread when waiting for results from the device. This can increase latency when waiting for the device, but can increase the performance of CPU threads performing work in parallel with the device.
cudaDeviceScheduleBlockingSync: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.
cudaDeviceBlockingSync: Instruct CUDA to block the CPU thread on a synchronization primitive when waiting for the device to finish work.
Deprecated: This flag was deprecated as of CUDA 4.0 and replaced with cudaDeviceScheduleBlockingSync.
cudaDeviceMapHost: This flag must be set in order to allocate pinned host memory that is accessible to the device. If this flag is not set, cudaHostGetDevicePointer() will always return a failure code.
cudaDeviceLmemResizeToMax: Instruct CUDA to not reduce local memory after resizing local memory for a kernel. This can prevent thrashing by local memory allocations when launching many kernels with high local memory usage at the cost of potentially increased memory usage.
flags
- Parameters for device operation
See Also: cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaSetDevice(int), cudaSetValidDevices(int[], int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
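The flag description above can be illustrated without a GPU. The following is a minimal sketch, not JCuda code: the class name and constants are hypothetical local stand-ins, and their numeric values are assumed to mirror the CUDA headers. Real code would pass the JCuda.cudaDevice* constants to cudaSetDeviceFlags(int). The sketch shows how flags combine with bitwise OR, how the scheduling bits can be masked out, and how the cudaDeviceScheduleAuto heuristic (yield if C > P, otherwise spin) reads as code.

```java
/** Sketch of how device flag bits combine; does not call into CUDA. */
final class DeviceFlagsSketch {
    // Hypothetical local stand-ins; values assumed to mirror the CUDA headers.
    static final int SCHEDULE_AUTO = 0x00;
    static final int SCHEDULE_SPIN = 0x01;
    static final int SCHEDULE_YIELD = 0x02;
    static final int SCHEDULE_BLOCKING_SYNC = 0x04;
    static final int SCHEDULE_MASK = 0x07;
    static final int MAP_HOST = 0x08;
    static final int LMEM_RESIZE_TO_MAX = 0x10;

    /** Extracts the scheduling policy bits from a combined flags value. */
    static int schedulePolicy(int flags) {
        return flags & SCHEDULE_MASK;
    }

    /** Models the documented auto heuristic: yield if C > P, otherwise spin. */
    static int resolveAuto(int activeContexts, int logicalProcessors) {
        return activeContexts > logicalProcessors ? SCHEDULE_YIELD : SCHEDULE_SPIN;
    }

    public static void main(String[] args) {
        int flags = SCHEDULE_YIELD | MAP_HOST;
        System.out.println("policy bits: " + schedulePolicy(flags));
        System.out.println("map host set: " + ((flags & MAP_HOST) != 0));
        System.out.println("auto resolves to: " + resolveAuto(2, 8));
    }
}
```

Keeping the scheduling policy and the independent option bits (map host, local memory resize) in one int is why a mask is needed to read the policy back from a value returned by cudaGetDeviceFlags(int[]).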
public static int cudaGetDeviceFlags(int[] flags)
flags
- Pointer to store the device flags
See Also: cudaGetDevice(int[]), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaSetDevice(int), cudaSetDeviceFlags(int)
public static int cudaSetValidDevices(int[] device_arr, int len)
cudaError_t cudaSetValidDevices ( int* device_arr, int len )
Set a list of devices that can be used for CUDA. Sets a list of devices for CUDA execution in priority order using device_arr. The parameter len specifies the number of elements in the list. CUDA will try devices from the list sequentially until it finds one that works. If this function is not called, or if it is called with a len of 0, then CUDA will go back to its default behavior of trying devices sequentially from a default list containing all of the available CUDA devices in the system. If a specified device ID in the list does not exist, this function will return cudaErrorInvalidDevice. If len is not 0 and device_arr is NULL or if len exceeds the number of devices in the system, then cudaErrorInvalidValue is returned.
Note that this function may also return error codes from previous, asynchronous launches.
device_arr
- List of devices to try
len
- Number of devices in specified list
See Also: cudaGetDeviceCount(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaSetDeviceFlags(int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
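The argument rules stated for cudaSetValidDevices can be summarized as a small pure-Java model. This is only a sketch: the class, enum, and method names are hypothetical, the order of the checks is an assumption (the description does not specify which error wins when several apply), and treating a negative len or a len larger than the Java array as invalid is also an assumption.

```java
/** Models the documented argument checks of cudaSetValidDevices; no CUDA calls. */
final class ValidDevicesSketch {
    enum Result { SUCCESS, INVALID_VALUE, INVALID_DEVICE }

    static Result check(int[] deviceArr, int len, int deviceCount) {
        if (len == 0) {
            return Result.SUCCESS; // CUDA falls back to its default device list
        }
        // Documented: NULL list with non-zero len, or len above the device count.
        if (deviceArr == null || len > deviceCount) {
            return Result.INVALID_VALUE;
        }
        // Assumption: a negative len or one beyond the array is also invalid.
        if (len < 0 || len > deviceArr.length) {
            return Result.INVALID_VALUE;
        }
        for (int i = 0; i < len; i++) {
            int id = deviceArr[i];
            if (id < 0 || id >= deviceCount) {
                return Result.INVALID_DEVICE; // listed device does not exist
            }
        }
        return Result.SUCCESS;
    }

    public static void main(String[] args) {
        System.out.println(check(new int[] {1, 0}, 2, 2));
        System.out.println(check(new int[] {1, 5}, 2, 4));
    }
}
```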
public static int cudaGetDevice(int[] device)
cudaError_t cudaGetDevice ( int* device )
Returns which device is currently being used. Returns in *device the current device for the calling host thread.
Note that this function may also return error codes from previous, asynchronous launches.
device
- Returns the device on which the active host thread executes the device code.
See Also: cudaGetDeviceCount(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaGetDeviceProperties(cudaDeviceProp prop, int device)
cudaError_t cudaGetDeviceProperties ( cudaDeviceProp* prop, int device )
Returns information about the compute-device. Returns in *prop the properties of the device identified by device. The cudaDeviceProp structure is defined as:
struct cudaDeviceProp {
    char name[256];
    size_t totalGlobalMem;
    size_t sharedMemPerBlock;
    int regsPerBlock;
    int warpSize;
    size_t memPitch;
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
    int maxGridSize[3];
    int clockRate;
    size_t totalConstMem;
    int major;
    int minor;
    size_t textureAlignment;
    size_t texturePitchAlignment;
    int deviceOverlap;
    int multiProcessorCount;
    int kernelExecTimeoutEnabled;
    int integrated;
    int canMapHostMemory;
    int computeMode;
    int maxTexture1D;
    int maxTexture1DMipmap;
    int maxTexture1DLinear;
    int maxTexture2D[2];
    int maxTexture2DMipmap[2];
    int maxTexture2DLinear[3];
    int maxTexture2DGather[2];
    int maxTexture3D[3];
    int maxTextureCubemap;
    int maxTexture1DLayered[2];
    int maxTexture2DLayered[3];
    int maxTextureCubemapLayered[2];
    int maxSurface1D;
    int maxSurface2D[2];
    int maxSurface3D[3];
    int maxSurface1DLayered[2];
    int maxSurface2DLayered[3];
    int maxSurfaceCubemap;
    int maxSurfaceCubemapLayered[2];
    size_t surfaceAlignment;
    int concurrentKernels;
    int ECCEnabled;
    int pciBusID;
    int pciDeviceID;
    int pciDomainID;
    int tccDriver;
    int asyncEngineCount;
    int unifiedAddressing;
    int memoryClockRate;
    int memoryBusWidth;
    int l2CacheSize;
    int maxThreadsPerMultiProcessor;
}
where:
name[256] is an ASCII string identifying the device;
totalGlobalMem is the total amount of global memory available on the device in bytes;
sharedMemPerBlock is the maximum amount of shared memory available to a thread block in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor;
regsPerBlock is the maximum number of 32-bit registers available to a thread block; this number is shared by all thread blocks simultaneously resident on a multiprocessor;
warpSize is the warp size in threads;
memPitch is the maximum pitch in bytes allowed by the memory copy functions that involve memory regions allocated through cudaMallocPitch();
maxThreadsPerBlock is the maximum number of threads per block;
maxThreadsDim[3] contains the maximum size of each dimension of a block;
maxGridSize[3] contains the maximum size of each dimension of a grid;
clockRate is the clock frequency in kilohertz;
totalConstMem is the total amount of constant memory available on the device in bytes;
major, minor are the major and minor revision numbers defining the device's compute capability;
textureAlignment is the alignment requirement; texture base addresses that are aligned to textureAlignment bytes do not need an offset applied to texture fetches;
texturePitchAlignment is the pitch alignment requirement for 2D texture references that are bound to pitched memory;
deviceOverlap is 1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not. Deprecated, use instead asyncEngineCount.
multiProcessorCount is the number of multiprocessors on the device;
kernelExecTimeoutEnabled is 1 if there is a run time limit for kernels executed on the device, or 0 if not.
integrated is 1 if the device is an integrated (motherboard) GPU and 0 if it is a discrete (card) component.
canMapHostMemory is 1 if the device can map host memory into the CUDA address space for use with cudaHostAlloc()/cudaHostGetDevicePointer(), or 0 if not;
computeMode is the compute mode that the device is currently in. Available modes are as follows:
cudaComputeModeDefault: Default mode - Device is not restricted and multiple threads can use cudaSetDevice() with this device.
cudaComputeModeExclusive: Compute-exclusive mode - Only one thread will be able to use cudaSetDevice() with this device.
cudaComputeModeProhibited: Compute-prohibited mode - No threads can use cudaSetDevice() with this device.
cudaComputeModeExclusiveProcess: Compute-exclusive-process mode - Many threads in one process will be able to use cudaSetDevice() with this device.
If cudaSetDevice() is called on an already occupied device with computeMode cudaComputeModeExclusive, cudaErrorDeviceAlreadyInUse will be immediately returned indicating the device cannot be used. When an occupied exclusive mode device is chosen with cudaSetDevice, all subsequent non-device management runtime functions will return cudaErrorDevicesUnavailable.
maxTexture1D is the maximum 1D texture size.
maxTexture1DMipmap is the maximum 1D mipmapped texture size.
maxTexture1DLinear is the maximum 1D texture size for textures bound to linear memory.
maxTexture2D[2] contains the maximum 2D texture dimensions.
maxTexture2DMipmap[2] contains the maximum 2D mipmapped texture dimensions.
maxTexture2DLinear[3] contains the maximum 2D texture dimensions for 2D textures bound to pitch linear memory.
maxTexture2DGather[2] contains the maximum 2D texture dimensions if texture gather operations have to be performed.
maxTexture3D[3] contains the maximum 3D texture dimensions.
maxTextureCubemap is the maximum cubemap texture width or height.
maxTexture1DLayered[2] contains the maximum 1D layered texture dimensions.
maxTexture2DLayered[3] contains the maximum 2D layered texture dimensions.
maxTextureCubemapLayered[2] contains the maximum cubemap layered texture dimensions.
maxSurface1D is the maximum 1D surface size.
maxSurface2D[2] contains the maximum 2D surface dimensions.
maxSurface3D[3] contains the maximum 3D surface dimensions.
maxSurface1DLayered[2] contains the maximum 1D layered surface dimensions.
maxSurface2DLayered[3] contains the maximum 2D layered surface dimensions.
maxSurfaceCubemap is the maximum cubemap surface width or height.
maxSurfaceCubemapLayered[2] contains the maximum cubemap layered surface dimensions.
surfaceAlignment specifies the alignment requirements for surfaces.
concurrentKernels is 1 if the device supports executing multiple kernels within the same context simultaneously, or 0 if not. It is not guaranteed that multiple kernels will be resident on the device concurrently so this feature should not be relied upon for correctness;
ECCEnabled is 1 if the device has ECC support turned on, or 0 if not.
pciBusID is the PCI bus identifier of the device.
pciDeviceID is the PCI device (sometimes called slot) identifier of the device.
pciDomainID is the PCI domain identifier of the device.
tccDriver is 1 if the device is using a TCC driver or 0 if not.
asyncEngineCount is 1 when the device can concurrently copy memory between host and device while executing a kernel. It is 2 when the device can concurrently copy memory between host and device in both directions and execute a kernel at the same time. It is 0 if neither of these is supported.
unifiedAddressing is 1 if the device shares a unified address space with the host and 0 otherwise.
memoryClockRate is the peak memory clock frequency in kilohertz.
memoryBusWidth is the memory bus width in bits.
l2CacheSize is the L2 cache size in bytes.
maxThreadsPerMultiProcessor is the maximum number of resident threads per multiprocessor.
prop
- Properties for the specified device
device
- Device number to get properties for
See Also: cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaSetDevice(int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp), cudaDeviceGetAttribute(int[], int, int)
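Several fields above are reported in raw units (kilohertz, bits), so a short worked conversion may help. The sketch below is plain Java arithmetic on values a caller would read from a cudaDeviceProp instance. The class and method names are hypothetical, and the factor of 2 in the bandwidth estimate assumes double-data-rate memory, which the field descriptions do not state.

```java
/** Unit conversions for values read from cudaDeviceProp; no CUDA calls. */
final class DevicePropMath {
    /** Converts a clock rate in kilohertz (clockRate, memoryClockRate) to gigahertz. */
    static double kiloHertzToGigaHertz(int kiloHertz) {
        return kiloHertz / 1.0e6;
    }

    /**
     * Estimates theoretical peak memory bandwidth in bytes per second.
     * Assumption: double data rate memory, i.e. two transfers per clock.
     */
    static double peakBandwidthBytesPerSecond(int memoryClockRateKHz, int memoryBusWidthBits) {
        double clockHz = memoryClockRateKHz * 1000.0;
        double bytesPerTransfer = memoryBusWidthBits / 8.0;
        return 2.0 * clockHz * bytesPerTransfer;
    }

    public static void main(String[] args) {
        // Illustrative inputs only, not measurements from a real device.
        System.out.println(kiloHertzToGigaHertz(1_500_000) + " GHz");
        System.out.println(peakBandwidthBytesPerSecond(1_000_000, 256) / 1.0e9 + " GB/s");
    }
}
```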
public static int cudaDeviceGetAttribute(int[] value, int cudaDeviceAttr_attr, int device)
cudaError_t cudaDeviceGetAttribute ( int* value, cudaDeviceAttr attr, int device )
Returns information about the device. Returns in *value the integer value of the attribute attr on device device. The supported attributes are:
cudaDevAttrMaxThreadsPerBlock: Maximum number of threads per block;
cudaDevAttrMaxBlockDimX: Maximum x-dimension of a block;
cudaDevAttrMaxBlockDimY: Maximum y-dimension of a block;
cudaDevAttrMaxBlockDimZ: Maximum z-dimension of a block;
cudaDevAttrMaxGridDimX: Maximum x-dimension of a grid;
cudaDevAttrMaxGridDimY: Maximum y-dimension of a grid;
cudaDevAttrMaxGridDimZ: Maximum z-dimension of a grid;
cudaDevAttrMaxSharedMemoryPerBlock: Maximum amount of shared memory available to a thread block in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor;
cudaDevAttrTotalConstantMemory: Memory available on device for __constant__ variables in a CUDA C kernel in bytes;
cudaDevAttrWarpSize: Warp size in threads;
cudaDevAttrMaxPitch: Maximum pitch in bytes allowed by the memory copy functions that involve memory regions allocated through cudaMallocPitch();
cudaDevAttrMaxTexture1DWidth: Maximum 1D texture width;
cudaDevAttrMaxTexture1DLinearWidth: Maximum width for a 1D texture bound to linear memory;
cudaDevAttrMaxTexture1DMipmappedWidth: Maximum mipmapped 1D texture width;
cudaDevAttrMaxTexture2DWidth: Maximum 2D texture width;
cudaDevAttrMaxTexture2DHeight: Maximum 2D texture height;
cudaDevAttrMaxTexture2DLinearWidth: Maximum width for a 2D texture bound to linear memory;
cudaDevAttrMaxTexture2DLinearHeight: Maximum height for a 2D texture bound to linear memory;
cudaDevAttrMaxTexture2DLinearPitch: Maximum pitch in bytes for a 2D texture bound to linear memory;
cudaDevAttrMaxTexture2DMipmappedWidth: Maximum mipmapped 2D texture width;
cudaDevAttrMaxTexture2DMipmappedHeight: Maximum mipmapped 2D texture height;
cudaDevAttrMaxTexture3DWidth: Maximum 3D texture width;
cudaDevAttrMaxTexture3DHeight: Maximum 3D texture height;
cudaDevAttrMaxTexture3DDepth: Maximum 3D texture depth;
cudaDevAttrMaxTexture3DWidthAlt: Alternate maximum 3D texture width, 0 if no alternate maximum 3D texture size is supported;
cudaDevAttrMaxTexture3DHeightAlt: Alternate maximum 3D texture height, 0 if no alternate maximum 3D texture size is supported;
cudaDevAttrMaxTexture3DDepthAlt: Alternate maximum 3D texture depth, 0 if no alternate maximum 3D texture size is supported;
cudaDevAttrMaxTextureCubemapWidth: Maximum cubemap texture width or height;
cudaDevAttrMaxTexture1DLayeredWidth: Maximum 1D layered texture width;
cudaDevAttrMaxTexture1DLayeredLayers: Maximum layers in a 1D layered texture;
cudaDevAttrMaxTexture2DLayeredWidth: Maximum 2D layered texture width;
cudaDevAttrMaxTexture2DLayeredHeight: Maximum 2D layered texture height;
cudaDevAttrMaxTexture2DLayeredLayers: Maximum layers in a 2D layered texture;
cudaDevAttrMaxTextureCubemapLayeredWidth: Maximum cubemap layered texture width or height;
cudaDevAttrMaxTextureCubemapLayeredLayers: Maximum layers in a cubemap layered texture;
cudaDevAttrMaxSurface1DWidth: Maximum 1D surface width;
cudaDevAttrMaxSurface2DWidth: Maximum 2D surface width;
cudaDevAttrMaxSurface2DHeight: Maximum 2D surface height;
cudaDevAttrMaxSurface3DWidth: Maximum 3D surface width;
cudaDevAttrMaxSurface3DHeight: Maximum 3D surface height;
cudaDevAttrMaxSurface3DDepth: Maximum 3D surface depth;
cudaDevAttrMaxSurface1DLayeredWidth: Maximum 1D layered surface width;
cudaDevAttrMaxSurface1DLayeredLayers: Maximum layers in a 1D layered surface;
cudaDevAttrMaxSurface2DLayeredWidth: Maximum 2D layered surface width;
cudaDevAttrMaxSurface2DLayeredHeight: Maximum 2D layered surface height;
cudaDevAttrMaxSurface2DLayeredLayers: Maximum layers in a 2D layered surface;
cudaDevAttrMaxSurfaceCubemapWidth: Maximum cubemap surface width;
cudaDevAttrMaxSurfaceCubemapLayeredWidth: Maximum cubemap layered surface width;
cudaDevAttrMaxSurfaceCubemapLayeredLayers: Maximum layers in a cubemap layered surface;
cudaDevAttrMaxRegistersPerBlock: Maximum number of 32-bit registers available to a thread block; this number is shared by all thread blocks simultaneously resident on a multiprocessor;
cudaDevAttrClockRate: Peak clock frequency in kilohertz;
cudaDevAttrTextureAlignment: Alignment requirement; texture base addresses aligned to textureAlign bytes do not need an offset applied to texture fetches;
cudaDevAttrTexturePitchAlignment: Pitch alignment requirement for 2D texture references bound to pitched memory;
cudaDevAttrGpuOverlap: 1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not;
cudaDevAttrMultiProcessorCount: Number of multiprocessors on the device;
cudaDevAttrKernelExecTimeout: 1 if there is a run time limit for kernels executed on the device, or 0 if not;
cudaDevAttrIntegrated: 1 if the device is integrated with the memory subsystem, or 0 if not;
cudaDevAttrCanMapHostMemory: 1 if the device can map host memory into the CUDA address space, or 0 if not;
cudaDevAttrComputeMode: Compute mode that the device is currently in. Available modes are as follows:
cudaComputeModeDefault: Default mode - Device is not restricted and multiple threads can use cudaSetDevice() with this device.
cudaComputeModeExclusive: Compute-exclusive mode - Only one thread will be able to use cudaSetDevice() with this device.
cudaComputeModeProhibited: Compute-prohibited mode - No threads can use cudaSetDevice() with this device.
cudaComputeModeExclusiveProcess: Compute-exclusive-process mode - Many threads in one process will be able to use cudaSetDevice() with this device.
cudaDevAttrConcurrentKernels: 1 if the device supports executing multiple kernels within the same context simultaneously, or 0 if not. It is not guaranteed that multiple kernels will be resident on the device concurrently so this feature should not be relied upon for correctness;
cudaDevAttrEccEnabled: 1 if error correction is enabled on the device, 0 if error correction is disabled or not supported by the device;
cudaDevAttrPciBusId: PCI bus identifier of the device;
cudaDevAttrPciDeviceId: PCI device (also known as slot) identifier of the device;
cudaDevAttrTccDriver: 1 if the device is using a TCC driver. TCC is only available on Tesla hardware running Windows Vista or later;
cudaDevAttrMemoryClockRate: Peak memory clock frequency in kilohertz;
cudaDevAttrGlobalMemoryBusWidth: Global memory bus width in bits;
cudaDevAttrL2CacheSize: Size of L2 cache in bytes. 0 if the device doesn't have L2 cache;
cudaDevAttrMaxThreadsPerMultiProcessor: Maximum resident threads per multiprocessor;
cudaDevAttrUnifiedAddressing: 1 if the device shares a unified address space with the host, or 0 if not;
cudaDevAttrComputeCapabilityMajor: Major compute capability version number;
cudaDevAttrComputeCapabilityMinor: Minor compute capability version number;
Note that this function may also return error codes from previous, asynchronous launches.
value
- Returned device attribute value
attr
- Device attribute to query
device
- Device number to query
See Also: cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaSetDevice(int), cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int)
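A common use of the cudaDevAttrComputeCapabilityMajor and cudaDevAttrComputeCapabilityMinor attributes is gating a feature on a minimum compute capability. The comparison itself is sketched below in plain Java. The class and method names are hypothetical, and the two input values are assumed to have been read beforehand with two cudaDeviceGetAttribute calls.

```java
/** Compares a device's compute capability against a required minimum. */
final class ComputeCapability {
    /** Returns true if major.minor is at least requiredMajor.requiredMinor. */
    static boolean isAtLeast(int major, int minor, int requiredMajor, int requiredMinor) {
        if (major != requiredMajor) {
            return major > requiredMajor; // the major version decides first
        }
        return minor >= requiredMinor;    // same major: compare minor versions
    }

    public static void main(String[] args) {
        // Illustrative values only.
        System.out.println(isAtLeast(7, 5, 7, 0));
        System.out.println(isAtLeast(6, 1, 7, 0));
    }
}
```

Comparing the major version before the minor one avoids the mistake of treating 3.5 as newer than 5.0 because of its larger minor number.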
public static int cudaDeviceGetP2PAttribute(int[] value, int attr, int srcDevice, int dstDevice)
value
- Returned value of the requested attribute
attr
- The requested attribute of the link between srcDevice and dstDevice.
srcDevice
- The source device of the target link.
dstDevice
- The destination device of the target link.
See Also: JCuda#cudaCtxEnablePeerAccess, JCuda#cudaCtxDisablePeerAccess, JCuda#cudaCtxCanAccessPeer
public static int cudaChooseDevice(int[] device, cudaDeviceProp prop)
cudaError_t cudaChooseDevice ( int* device, const cudaDeviceProp* prop )
Select compute-device which best matches criteria. Returns in *device the device which has properties that best match *prop.
Note that this function may also return error codes from previous, asynchronous launches.
device
- Device with best match
prop
- Desired device properties
See Also: cudaGetDeviceCount(int[]), cudaGetDevice(int[]), cudaSetDevice(int), cudaGetDeviceProperties(jcuda.runtime.cudaDeviceProp, int)
public static int cudaMalloc3D(cudaPitchedPtr pitchDevPtr, cudaExtent extent)
cudaError_t cudaMalloc3D ( cudaPitchedPtr* pitchedDevPtr, cudaExtent extent )
Allocates logical 1D, 2D, or 3D memory objects on the device. Allocates at least width * height * depth bytes of linear memory on the device and returns a cudaPitchedPtr in which ptr is a pointer to the allocated memory. The function may pad the allocation to ensure hardware alignment requirements are met. The pitch returned in the pitch field of pitchedDevPtr is the width in bytes of the allocation.
The returned cudaPitchedPtr contains additional fields xsize and ysize, the logical width and height of the allocation, which are equivalent to the width and height extent parameters provided by the programmer during allocation.
For allocations of 2D and 3D objects, it is highly recommended that programmers perform allocations using cudaMalloc3D() or cudaMallocPitch(). Due to alignment restrictions in the hardware, this is especially true if the application will be performing memory copies involving 2D or 3D objects (whether linear memory or CUDA arrays).
Note that this function may also return error codes from previous, asynchronous launches.
pitchedDevPtr
- Pointer to allocated pitched device memory
extent
- Requested allocation size (width field in bytes)
See Also: cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaMemcpy3D(jcuda.runtime.cudaMemcpy3DParms), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent), cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent), cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer), cudaHostAlloc(jcuda.Pointer, long, int), cudaPitchedPtr, cudaExtent
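Because rows may be padded, element addresses in a cudaMalloc3D allocation are computed from the pitch rather than from the logical width. The sketch below shows the usual slice-and-row arithmetic as plain Java. The class and method names are hypothetical; pitch and ysize stand for the corresponding cudaPitchedPtr fields, and elementSize is an assumed caller-supplied size in bytes.

```java
/** Byte-offset arithmetic for pitched 3D allocations; no CUDA calls. */
final class PitchedOffset {
    /**
     * Returns the byte offset of element (x, y, z) from the base pointer.
     * pitch is the padded row width in bytes; ysize is the logical height in rows.
     */
    static long byteOffset(long pitch, long ysize, long x, long y, long z, long elementSize) {
        long slicePitch = pitch * ysize; // bytes occupied by one 2D slice
        return z * slicePitch + y * pitch + x * elementSize;
    }

    public static void main(String[] args) {
        // Illustrative values: 512-byte pitch, 4 rows per slice, 4-byte elements.
        System.out.println(byteOffset(512, 4, 3, 2, 1, 4));
    }
}
```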
public static int cudaMalloc3DArray(cudaArray arrayPtr, cudaChannelFormatDesc desc, cudaExtent extent)
cudaError_t cudaMalloc3DArray ( cudaArray_t* array, const cudaChannelFormatDesc* desc, cudaExtent extent, unsigned int flags = 0 )
Allocate an array on the device. Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *array.
The cudaChannelFormatDesc is defined as:
struct cudaChannelFormatDesc { int x, y, z, w; enum cudaChannelFormatKind f; };
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.
cudaMalloc3DArray() can allocate the following:
A 1D array is allocated if the height and depth extents are both zero.
A 2D array is allocated if only the depth extent is zero.
A 3D array is allocated if all three extents are non-zero.
A 1D layered CUDA array is allocated if only the height extent is zero and the cudaArrayLayered flag is set. Each layer is a 1D array. The number of layers is determined by the depth extent.
A 2D layered CUDA array is allocated if all three extents are non-zero and the cudaArrayLayered flag is set. Each layer is a 2D array. The number of layers is determined by the depth extent.
A cubemap CUDA array is allocated if all three extents are non-zero and the cudaArrayCubemap flag is set. Width must be equal to height, and depth must be six. A cubemap is a special type of 2D layered CUDA array, where the six layers represent the six faces of a cube. The order of the six layers in memory is the same as that listed in cudaGraphicsCubeFace.
A cubemap layered CUDA array is allocated if all three extents are non-zero and both the cudaArrayCubemap and cudaArrayLayered flags are set. Width must be equal to height, and depth must be a multiple of six. A cubemap layered CUDA array is a special type of 2D layered CUDA array that consists of a collection of cubemaps. The first six layers represent the first cubemap, the next six layers form the second cubemap, and so on.
The flags parameter enables different options to be specified that affect the allocation, as follows.
cudaArrayDefault: This flag's value is defined to be 0 and provides default array allocation
cudaArrayLayered: Allocates a layered CUDA array, with the depth extent indicating the number of layers
cudaArrayCubemap: Allocates a cubemap CUDA array. Width must be equal to height, and depth must be six. If the cudaArrayLayered flag is also set, depth must be a multiple of six.
cudaArraySurfaceLoadStore: Allocates a CUDA array that could be read from or written to using a surface reference.
cudaArrayTextureGather: This flag indicates that texture gather operations will be performed on the CUDA array. Texture gather can only be performed on 2D CUDA arrays.
The width, height and depth extents must meet certain size requirements as listed in the following table. All values are specified in elements.
Note that 2D CUDA arrays have different size requirements if the cudaArrayTextureGather flag is set. In that case, the valid range for (width, height, depth) is ((1,maxTexture2DGather[0]), (1,maxTexture2DGather[1]), 0).
CUDA array type | Valid extents that must always be met {(width range in elements), (height range), (depth range)} | Valid extents with cudaArraySurfaceLoadStore set {(width range in elements), (height range), (depth range)} |
---|---|---|
1D | { (1,maxTexture1D), 0, 0 } | { (1,maxSurface1D), 0, 0 } |
2D | { (1,maxTexture2D[0]), (1,maxTexture2D[1]), 0 } | { (1,maxSurface2D[0]), (1,maxSurface2D[1]), 0 } |
3D | { (1,maxTexture3D[0]), (1,maxTexture3D[1]), (1,maxTexture3D[2]) } | { (1,maxSurface3D[0]), (1,maxSurface3D[1]), (1,maxSurface3D[2]) } |
1D Layered | { (1,maxTexture1DLayered[0]), 0, (1,maxTexture1DLayered[1]) } | { (1,maxSurface1DLayered[0]), 0, (1,maxSurface1DLayered[1]) } |
2D Layered | { (1,maxTexture2DLayered[0]), (1,maxTexture2DLayered[1]), (1,maxTexture2DLayered[2]) } | { (1,maxSurface2DLayered[0]), (1,maxSurface2DLayered[1]), (1,maxSurface2DLayered[2]) } |
Cubemap | { (1,maxTextureCubemap), (1,maxTextureCubemap), 6 } | { (1,maxSurfaceCubemap), (1,maxSurfaceCubemap), 6 } |
Cubemap Layered | { (1,maxTextureCubemapLayered[0]), (1,maxTextureCubemapLayered[0]), (1,maxTextureCubemapLayered[1]) } | { (1,maxSurfaceCubemapLayered[0]), (1,maxSurfaceCubemapLayered[0]), (1,maxSurfaceCubemapLayered[1]) } |
Note that this function may also return error codes from previous, asynchronous launches.
array
- Pointer to allocated array in device memory
desc
- Requested channel format
extent
- Requested allocation size (width field in elements)
flags
- Flags for extensions
See Also: cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer), cudaHostAlloc(jcuda.Pointer, long, int), cudaExtent
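The rules above for which kind of array cudaMalloc3DArray allocates depend only on the three extents and two flags, so they can be restated as a decision function. The following is a sketch in plain Java, not the library's logic: the class name, method name, and returned labels are hypothetical, the two flag values are assumed to mirror cudaArrayLayered and cudaArrayCubemap in the CUDA headers, and combinations the description does not cover are labeled "invalid" as an assumption.

```java
/** Restates the documented extent/flag rules of cudaMalloc3DArray; no CUDA calls. */
final class ArrayKindSketch {
    // Hypothetical stand-ins; values assumed to mirror the CUDA headers.
    static final int LAYERED = 0x01; // cudaArrayLayered
    static final int CUBEMAP = 0x04; // cudaArrayCubemap

    static String classify(long width, long height, long depth, int flags) {
        boolean layered = (flags & LAYERED) != 0;
        boolean cubemap = (flags & CUBEMAP) != 0;
        if (width <= 0 || height < 0 || depth < 0) {
            return "invalid";
        }
        if (cubemap) {
            // Width must equal height; depth is six, or a multiple of six if layered.
            if (height != width || depth == 0) {
                return "invalid";
            }
            if (layered) {
                return depth % 6 == 0 ? "cubemap layered" : "invalid";
            }
            return depth == 6 ? "cubemap" : "invalid";
        }
        if (layered) {
            // The depth extent gives the number of layers.
            if (depth == 0) {
                return "invalid";
            }
            return height == 0 ? "1D layered" : "2D layered";
        }
        if (height == 0 && depth == 0) {
            return "1D";
        }
        if (depth == 0) {
            return "2D";
        }
        return height == 0 ? "invalid" : "3D";
    }

    public static void main(String[] args) {
        System.out.println(classify(64, 0, 8, LAYERED));
        System.out.println(classify(64, 64, 12, CUBEMAP | LAYERED));
    }
}
```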
public static int cudaMalloc3DArray(cudaArray arrayPtr, cudaChannelFormatDesc desc, cudaExtent extent, int flags)
cudaError_t cudaMalloc3DArray ( cudaArray_t* array, const cudaChannelFormatDesc* desc, cudaExtent extent, unsigned int flags = 0 )
Allocate an array on the device. Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *array.
The cudaChannelFormatDesc is defined as:
struct cudaChannelFormatDesc { int x, y, z, w; enum cudaChannelFormatKind f; };
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.
cudaMalloc3DArray() can allocate the following:
A 1D array is allocated if the height and depth extents are both zero.
A 2D array is allocated if only the depth extent is zero.
A 3D array is allocated if all three extents are non-zero.
A 1D layered CUDA array is allocated if only the height extent is zero and the cudaArrayLayered flag is set. Each layer is a 1D array. The number of layers is determined by the depth extent.
A 2D layered CUDA array is allocated if all three extents are non-zero and the cudaArrayLayered flag is set. Each layer is a 2D array. The number of layers is determined by the depth extent.
A cubemap CUDA array is allocated if all three extents are non-zero and the cudaArrayCubemap flag is set. Width must be equal to height, and depth must be six. A cubemap is a special type of 2D layered CUDA array, where the six layers represent the six faces of a cube. The order of the six layers in memory is the same as that listed in cudaGraphicsCubeFace.
A cubemap layered CUDA array is allocated if all three extents are non-zero and both the cudaArrayCubemap and cudaArrayLayered flags are set. Width must be equal to height, and depth must be a multiple of six. A cubemap layered CUDA array is a special type of 2D layered CUDA array that consists of a collection of cubemaps. The first six layers represent the first cubemap, the next six layers form the second cubemap, and so on.
The flags parameter enables different options to be specified that affect the allocation, as follows.
cudaArrayDefault: This flag's value is defined to be 0 and provides default array allocation
cudaArrayLayered: Allocates a layered CUDA array, with the depth extent indicating the number of layers
cudaArrayCubemap: Allocates a cubemap CUDA array. Width must be equal to height, and depth must be six. If the cudaArrayLayered flag is also set, depth must be a multiple of six.
cudaArraySurfaceLoadStore: Allocates a CUDA array that could be read from or written to using a surface reference.
cudaArrayTextureGather: This flag indicates that texture gather operations will be performed on the CUDA array. Texture gather can only be performed on 2D CUDA arrays.
The width, height and depth extents must meet certain size requirements as listed in the following table. All values are specified in elements.
Note that 2D CUDA arrays have different size requirements if the cudaArrayTextureGather flag is set. In that case, the valid range for (width, height, depth) is ((1,maxTexture2DGather[0]), (1,maxTexture2DGather[1]), 0).
CUDA array type | Valid extents that must always be met {(width range in elements), (height range), (depth range)} | Valid extents with cudaArraySurfaceLoadStore set {(width range in elements), (height range), (depth range)} |
---|---|---|
1D | { (1,maxTexture1D), 0, 0 } | { (1,maxSurface1D), 0, 0 } |
2D | { (1,maxTexture2D[0]), (1,maxTexture2D[1]), 0 } | { (1,maxSurface2D[0]), (1,maxSurface2D[1]), 0 } |
3D | { (1,maxTexture3D[0]), (1,maxTexture3D[1]), (1,maxTexture3D[2]) } | { (1,maxSurface3D[0]), (1,maxSurface3D[1]), (1,maxSurface3D[2]) } |
1D Layered | { (1,maxTexture1DLayered[0]), 0, (1,maxTexture1DLayered[1]) } | { (1,maxSurface1DLayered[0]), 0, (1,maxSurface1DLayered[1]) } |
2D Layered | { (1,maxTexture2DLayered[0]), (1,maxTexture2DLayered[1]), (1,maxTexture2DLayered[2]) } | { (1,maxSurface2DLayered[0]), (1,maxSurface2DLayered[1]), (1,maxSurface2DLayered[2]) } |
Cubemap | { (1,maxTextureCubemap), (1,maxTextureCubemap), 6 } | { (1,maxSurfaceCubemap), (1,maxSurfaceCubemap), 6 } |
Cubemap Layered | { (1,maxTextureCubemapLayered[0]), (1,maxTextureCubemapLayered[0]), (1,maxTextureCubemapLayered[1]) } | { (1,maxSurfaceCubemapLayered[0]), (1,maxSurfaceCubemapLayered[0]), (1,maxSurfaceCubemapLayered[1]) } |
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
array - Pointer to allocated array in device memory
desc - Requested channel format
extent - Requested allocation size (width field in elements)
flags - Flags for extensions
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer), cudaHostAlloc(jcuda.Pointer, long, int), cudaExtent
public static int cudaMallocMipmappedArray(cudaMipmappedArray mipmappedArray, cudaChannelFormatDesc desc, cudaExtent extent, int numLevels, int flags)
cudaError_t cudaMallocMipmappedArray ( cudaMipmappedArray_t* mipmappedArray, const cudaChannelFormatDesc* desc, cudaExtent extent, unsigned int numLevels, unsigned int flags = 0 )
Allocate a mipmapped array on the device. Allocates a CUDA mipmapped array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA mipmapped array in *mipmappedArray. numLevels specifies the number of mipmap levels to be allocated. This value is clamped to the range [1, 1 + floor(log2(max(width, height, depth)))].
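The clamping range for numLevels can be worked through with integer arithmetic. The sketch below is a host-side illustration only; the class and method names are hypothetical and nothing here calls into CUDA.

```java
final class MipmapLevels {
    /** Upper bound from the text: 1 + floor(log2(max(width, height, depth))). */
    static int maxLevels(long width, long height, long depth) {
        long largest = Math.max(width, Math.max(height, depth));
        if (largest < 1) {
            throw new IllegalArgumentException("at least one extent must be positive");
        }
        // For a positive long, 63 - numberOfLeadingZeros(v) equals floor(log2(v)).
        return 1 + (63 - Long.numberOfLeadingZeros(largest));
    }

    /** Clamps a requested level count to the range [1, maxLevels]. */
    static int clampLevels(int requested, long width, long height, long depth) {
        return Math.max(1, Math.min(requested, maxLevels(width, height, depth)));
    }

    public static void main(String[] args) {
        // A 256 x 128 2D array supports at most 1 + log2(256) = 9 levels.
        System.out.println(clampLevels(20, 256, 128, 0));
        // A request of 0 levels is raised to the minimum of 1.
        System.out.println(clampLevels(0, 256, 128, 0));
    }
}
```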
The cudaChannelFormatDesc is defined as:
struct cudaChannelFormatDesc { int x, y, z, w; enum cudaChannelFormatKind f; };
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.
cudaMallocMipmappedArray() can allocate the following:
A 1D mipmapped array is allocated if the height and depth extents are both zero.
A 2D mipmapped array is allocated if only the depth extent is zero.
A 3D mipmapped array is allocated if all three extents are non-zero.
A 1D layered CUDA mipmapped array is allocated if only the height extent is zero and the cudaArrayLayered flag is set. Each layer is a 1D mipmapped array. The number of layers is determined by the depth extent.
A 2D layered CUDA mipmapped array is allocated if all three extents are non-zero and the cudaArrayLayered flag is set. Each layer is a 2D mipmapped array. The number of layers is determined by the depth extent.
A cubemap CUDA mipmapped array is allocated if all three extents are non-zero and the cudaArrayCubemap flag is set. Width must be equal to height, and depth must be six. The order of the six layers in memory is the same as that listed in cudaGraphicsCubeFace.
A cubemap layered CUDA mipmapped array is allocated if all three extents are non-zero and both the cudaArrayCubemap and cudaArrayLayered flags are set. Width must be equal to height, and depth must be a multiple of six. A cubemap layered CUDA mipmapped array is a special type of 2D layered CUDA mipmapped array that consists of a collection of cubemap mipmapped arrays. The first six layers represent the first cubemap mipmapped array, the next six layers form the second cubemap mipmapped array, and so on.
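The rules above can be summarized as a small decision function over the three extents and the flags. This is a sketch under stated assumptions: the class is hypothetical, the two flag constants are local stand-ins whose values are assumed to match the CUDA headers (real code would use the JCuda constants cudaArrayLayered and cudaArrayCubemap), and combinations the text does not describe are simply reported as "unsupported" without claiming how the runtime reacts to them.

```java
final class MipmappedArrayKind {
    // Local stand-ins for the CUDA flags; values assumed to match the CUDA headers.
    static final int LAYERED = 0x01;
    static final int CUBEMAP = 0x04;

    /** Names the array type that the listed rules select for the given extents and flags. */
    static String classify(long width, long height, long depth, int flags) {
        boolean layered = (flags & LAYERED) != 0;
        boolean cubemap = (flags & CUBEMAP) != 0;
        if (width == 0) {
            return "unsupported";
        }
        if (cubemap) {
            // Cubemaps need all extents non-zero and square faces.
            if (height == 0 || depth == 0 || width != height) {
                return "unsupported";
            }
            if (layered) {
                return depth % 6 == 0 ? "cubemap layered" : "unsupported";
            }
            return depth == 6 ? "cubemap" : "unsupported";
        }
        if (layered) {
            // The depth extent carries the number of layers.
            if (depth == 0) {
                return "unsupported";
            }
            return height == 0 ? "1D layered" : "2D layered";
        }
        if (height == 0 && depth == 0) {
            return "1D";
        }
        if (depth == 0) {
            return "2D";
        }
        return height == 0 ? "unsupported" : "3D";
    }

    public static void main(String[] args) {
        System.out.println(classify(64, 0, 0, 0));
        System.out.println(classify(64, 64, 12, CUBEMAP | LAYERED));
    }
}
```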
The flags parameter enables different options to be specified that affect the allocation, as follows.
cudaArrayDefault: This flag's value is defined to be 0 and provides default mipmapped array allocation
cudaArrayLayered: Allocates a layered CUDA mipmapped array, with the depth extent indicating the number of layers
cudaArrayCubemap: Allocates a cubemap CUDA mipmapped array. Width must be equal to height, and depth must be six. If the cudaArrayLayered flag is also set, depth must be a multiple of six.
cudaArraySurfaceLoadStore: This flag indicates that individual mipmap levels of the CUDA mipmapped array will be read from or written to using a surface reference.
cudaArrayTextureGather: This flag indicates that texture gather operations will be performed on the CUDA array. Texture gather can only be performed on 2D CUDA mipmapped arrays, and the gather operations are performed only on the most detailed mipmap level.
The width, height and depth extents must meet certain size requirements as listed in the following table. All values are specified in elements.
CUDA array type | Valid extents {(width range in elements), (height range), (depth range)} |
---|---|
1D | { (1,maxTexture1DMipmap), 0, 0 } |
2D | { (1,maxTexture2DMipmap[0]), (1,maxTexture2DMipmap[1]), 0 } |
3D | { (1,maxTexture3D[0]), (1,maxTexture3D[1]), (1,maxTexture3D[2]) } |
1D Layered | { (1,maxTexture1DLayered[0]), 0, (1,maxTexture1DLayered[1]) } |
2D Layered | { (1,maxTexture2DLayered[0]), (1,maxTexture2DLayered[1]), (1,maxTexture2DLayered[2]) } |
Cubemap | { (1,maxTextureCubemap), (1,maxTextureCubemap), 6 } |
Cubemap Layered | { (1,maxTextureCubemapLayered[0]), (1,maxTextureCubemapLayered[0]), (1,maxTextureCubemapLayered[1]) } |
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
mipmappedArray - Pointer to allocated mipmapped array in device memory
desc - Requested channel format
extent - Requested allocation size (width field in elements)
numLevels - Number of mipmap levels to allocate
flags - Flags for extensions
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer), cudaHostAlloc(jcuda.Pointer, long, int), cudaExtent
public static int cudaGetMipmappedArrayLevel(cudaArray levelArray, cudaMipmappedArray mipmappedArray, int level)
cudaError_t cudaGetMipmappedArrayLevel ( cudaArray_t* levelArray, cudaMipmappedArray_const_t mipmappedArray, unsigned int level )
Gets a mipmap level of a CUDA mipmapped array. Returns in *levelArray a CUDA array that represents a single mipmap level of the CUDA mipmapped array mipmappedArray.
If level is greater than the maximum number of levels in this mipmapped array, cudaErrorInvalidValue is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
levelArray - Returned mipmap level CUDA array
mipmappedArray - CUDA mipmapped array
level - Mipmap level
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaMalloc(jcuda.Pointer, long), cudaMallocPitch(jcuda.Pointer, long[], long, long), cudaFree(jcuda.Pointer), cudaFreeArray(jcuda.runtime.cudaArray), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer), cudaHostAlloc(jcuda.Pointer, long, int), cudaExtent
public static int cudaMemset3D(cudaPitchedPtr pitchDevPtr, int value, cudaExtent extent)
cudaError_t cudaMemset3D ( cudaPitchedPtr pitchedDevPtr, int value, cudaExtent extent )
Initializes or sets device memory to a value. Initializes each element of a 3D array to the specified value value. The object to initialize is defined by pitchedDevPtr. The pitch field of pitchedDevPtr is the width in memory in bytes of the 3D array pointed to by pitchedDevPtr, including any padding added to the end of each row. The xsize field specifies the logical width of each row in bytes, while the ysize field specifies the height of each 2D slice in rows.
The extents of the initialized region are specified as a width in bytes, a height in rows, and a depth in slices.
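How pitch, ysize, and the extent interact can be modeled on the host with a plain byte array. The sketch below is an illustration only and does not call CUDA: the class and method are hypothetical, and it assumes the usual pitched layout in which rows are pitch bytes apart and slices are pitch * ysize bytes apart.

```java
import java.util.Arrays;

final class PitchedMemsetSketch {
    /**
     * Fills a widthBytes x heightRows x depthSlices region of a pitched buffer.
     * Padding bytes beyond widthBytes in each row are left untouched.
     */
    static void memset3D(byte[] mem, int pitch, int ysize, int value,
                         int widthBytes, int heightRows, int depthSlices) {
        for (int z = 0; z < depthSlices; z++) {
            for (int y = 0; y < heightRows; y++) {
                // Start of row y in slice z, assuming a slice stride of pitch * ysize.
                int rowStart = (z * ysize + y) * pitch;
                Arrays.fill(mem, rowStart, rowStart + widthBytes, (byte) value);
            }
        }
    }

    public static void main(String[] args) {
        int pitch = 64, ysize = 4, zsize = 2;
        byte[] mem = new byte[pitch * ysize * zsize];
        // Set a region 10 bytes wide, 2 rows high, and 2 slices deep.
        memset3D(mem, pitch, ysize, 0xFF, 10, 2, 2);
        System.out.println("first byte of slice 1: " + (mem[pitch * ysize] & 0xFF));
    }
}
```

Only the low byte of value is written to each position, which matches the parameter description of a value set for each byte.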
Extents with width greater than or equal to the xsize of pitchedDevPtr may perform significantly faster than extents narrower than the xsize. Secondarily, extents with height equal to the ysize of pitchedDevPtr will perform faster than when the height is shorter than the ysize.
This function performs fastest when the pitchedDevPtr has been allocated by cudaMalloc3D().
Note that this function is asynchronous with respect to the host unless pitchedDevPtr refers to pinned host memory.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
pitchedDevPtr - Pointer to pitched device memory
value - Value to set for each byte of specified memory
extent - Size parameters for where to set device memory (width field in bytes)
See Also:
cudaMemset(jcuda.Pointer, int, long), cudaMemset2D(jcuda.Pointer, long, int, long, long), cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t), cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t), cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t), cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaPitchedPtr, cudaExtent
public static int cudaMemsetAsync(Pointer devPtr, int value, long count, cudaStream_t stream)
cudaError_t cudaMemsetAsync ( void* devPtr, int value, size_t count, cudaStream_t stream = 0 )
Initializes or sets device memory to a value. Fills the first count bytes of the memory area pointed to by devPtr with the constant byte value value.
cudaMemsetAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated to a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
devPtr - Pointer to device memory
value - Value to set for each byte of specified memory
count - Size in bytes to set
stream - Stream identifier
See Also:
cudaMemset(jcuda.Pointer, int, long), cudaMemset2D(jcuda.Pointer, long, int, long, long), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent), cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t), cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t)
public static int cudaMemset2DAsync(Pointer devPtr, long pitch, int value, long width, long height, cudaStream_t stream)
cudaError_t cudaMemset2DAsync ( void* devPtr, size_t pitch, int value, size_t width, size_t height, cudaStream_t stream = 0 )
Initializes or sets device memory to a value. Sets a matrix (height rows of width bytes each) pointed to by devPtr to the specified value. pitch is the width in bytes of the 2D array pointed to by devPtr, including any padding added to the end of each row. This function performs fastest when the pitch is one that has been passed back by cudaMallocPitch().
cudaMemset2DAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated to a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
devPtr - Pointer to 2D device memory
pitch - Pitch in bytes of 2D device memory
value - Value to set for each byte of specified memory
width - Width of matrix set (columns in bytes)
height - Height of matrix set (rows)
stream - Stream identifier
See Also:
cudaMemset(jcuda.Pointer, int, long), cudaMemset2D(jcuda.Pointer, long, int, long, long), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent), cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t), cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t)
public static int cudaMemset3DAsync(cudaPitchedPtr pitchedDevPtr, int value, cudaExtent extent, cudaStream_t stream)
cudaError_t cudaMemset3DAsync ( cudaPitchedPtr pitchedDevPtr, int value, cudaExtent extent, cudaStream_t stream = 0 )
Initializes or sets device memory to a value. Initializes each element of a 3D array to the specified value value. The object to initialize is defined by pitchedDevPtr. The pitch field of pitchedDevPtr is the width in memory in bytes of the 3D array pointed to by pitchedDevPtr, including any padding added to the end of each row. The xsize field specifies the logical width of each row in bytes, while the ysize field specifies the height of each 2D slice in rows.
The extents of the initialized region are specified as a width in bytes, a height in rows, and a depth in slices.
Extents with width greater than or equal to the xsize of pitchedDevPtr may perform significantly faster than extents narrower than the xsize. Secondarily, extents with height equal to the ysize of pitchedDevPtr will perform faster than when the height is shorter than the ysize.
This function performs fastest when the pitchedDevPtr has been allocated by cudaMalloc3D().
cudaMemset3DAsync() is asynchronous with respect to the host, so the call may return before the memset is complete. The operation can optionally be associated to a stream by passing a non-zero stream argument. If stream is non-zero, the operation may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
pitchedDevPtr - Pointer to pitched device memory
value - Value to set for each byte of specified memory
extent - Size parameters for where to set device memory (width field in bytes)
stream - Stream identifier
See Also:
cudaMemset(jcuda.Pointer, int, long), cudaMemset2D(jcuda.Pointer, long, int, long, long), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent), cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t), cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t), cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaPitchedPtr, cudaExtent
public static int cudaMemcpy3D(cudaMemcpy3DParms p)
cudaError_t cudaMemcpy3D ( const cudaMemcpy3DParms* p )
Copies data between 3D objects.
struct cudaExtent { size_t width; size_t height; size_t depth; };
struct cudaExtent make_cudaExtent(size_t w, size_t h, size_t d);
struct cudaPos { size_t x; size_t y; size_t z; };
struct cudaPos make_cudaPos(size_t x, size_t y, size_t z);
struct cudaMemcpy3DParms {
    cudaArray_t srcArray;
    struct cudaPos srcPos;
    struct cudaPitchedPtr srcPtr;
    cudaArray_t dstArray;
    struct cudaPos dstPos;
    struct cudaPitchedPtr dstPtr;
    struct cudaExtent extent;
    enum cudaMemcpyKind kind;
};
cudaMemcpy3D() copies data between two 3D objects. The source and destination objects may be in either host memory, device memory, or a CUDA array. The source, destination, extent, and kind of copy performed are specified by the cudaMemcpy3DParms struct, which should be initialized to zero before use:
cudaMemcpy3DParms myParms = {0};
The struct passed to cudaMemcpy3D() must specify one of srcArray or srcPtr and one of dstArray or dstPtr. Passing more than one non-zero source or destination will cause cudaMemcpy3D() to return an error.
The srcPos and dstPos fields are optional offsets into the source and destination objects and are defined in units of each object's elements. The element for a host or device pointer is assumed to be unsigned char. For CUDA arrays, positions must be in the range [0, 2048) for any dimension.
The extent field defines the dimensions of the transferred area in elements. If a CUDA array is participating in the copy, the extent is defined in terms of that array's elements. If no CUDA array is participating in the copy then the extents are defined in elements of unsigned char.
The kind field defines the direction of the copy. It must be one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice.
If the source and destination are both arrays, cudaMemcpy3D() will return an error if they do not have the same element size.
The source and destination object may not overlap. If overlapping source and destination objects are specified, undefined behavior will result.
The source object must lie entirely within the region defined by srcPos and extent. The destination object must lie entirely within the region defined by dstPos and extent.
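The roles of srcPos, dstPos, and extent can be illustrated with a host-only model of a copy between two pitched buffers. This is a sketch, not the JCuda API: the Pitched class and copy3D method are hypothetical, positions and extents are in bytes (the case where no CUDA array participates), and the bounds check assumes that the copied region has to fit inside both buffers.

```java
final class Copy3DSketch {
    /** Hypothetical host-side stand-in for pitched 3D memory; not a JCuda type. */
    static final class Pitched {
        final byte[] data;
        final int pitch, xsize, ysize, zsize;

        Pitched(int pitch, int xsize, int ysize, int zsize) {
            if (pitch < xsize) {
                throw new IllegalArgumentException("pitch must be at least xsize");
            }
            this.pitch = pitch;
            this.xsize = xsize;
            this.ysize = ysize;
            this.zsize = zsize;
            this.data = new byte[pitch * ysize * zsize];
        }

        /** Byte offset of (x, y, z), assuming a slice stride of pitch * ysize. */
        int index(int x, int y, int z) {
            return (z * ysize + y) * pitch + x;
        }
    }

    /** Rejects regions that would reach outside the buffer. */
    static void checkRegion(Pitched p, int[] pos, int[] extent) {
        if (pos[0] + extent[0] > p.xsize
                || pos[1] + extent[1] > p.ysize
                || pos[2] + extent[2] > p.zsize) {
            throw new IllegalArgumentException("region exceeds object bounds");
        }
    }

    /** Copies an extent-sized box from srcPos in src to dstPos in dst, row by row. */
    static void copy3D(Pitched src, int[] srcPos, Pitched dst, int[] dstPos, int[] extent) {
        checkRegion(src, srcPos, extent);
        checkRegion(dst, dstPos, extent);
        for (int z = 0; z < extent[2]; z++) {
            for (int y = 0; y < extent[1]; y++) {
                System.arraycopy(
                    src.data, src.index(srcPos[0], srcPos[1] + y, srcPos[2] + z),
                    dst.data, dst.index(dstPos[0], dstPos[1] + y, dstPos[2] + z),
                    extent[0]);
            }
        }
    }

    public static void main(String[] args) {
        Pitched src = new Pitched(16, 8, 4, 2);
        Pitched dst = new Pitched(8, 4, 2, 2);
        for (int i = 0; i < src.data.length; i++) {
            src.data[i] = (byte) i;
        }
        // Copy a 3 x 2 x 2 box starting at (2, 1, 0) in src to the origin of dst.
        copy3D(src, new int[] {2, 1, 0}, dst, new int[] {0, 0, 0}, new int[] {3, 2, 2});
        System.out.println("dst[0,0,0] = " + dst.data[dst.index(0, 0, 0)]);
    }
}
```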
cudaMemcpy3D() returns an error if the pitch of srcPtr or dstPtr exceeds the maximum allowed. The pitch of a cudaPitchedPtr allocated with cudaMalloc3D() will always be valid.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
Parameters:
p - 3D memory copy parameters
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent), cudaMemcpy3DAsync(jcuda.runtime.cudaMemcpy3DParms, jcuda.runtime.cudaStream_t), cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int), cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t), cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t), cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t), cudaExtent, cudaPos
public static int cudaMemcpy3DPeer(cudaMemcpy3DPeerParms p)
cudaError_t cudaMemcpy3DPeer ( const cudaMemcpy3DPeerParms* p )
Copies memory between devices. Perform a 3D memory copy according to the parameters specified in p. See the definition of the cudaMemcpy3DPeerParms structure for documentation of its parameters.
Note that this function is synchronous with respect to the host only if the source or destination of the transfer is host memory. Note also that this copy is serialized with respect to all pending and future asynchronous work in the current device, the copy's source device, and the copy's destination device (use cudaMemcpy3DPeerAsync to avoid this synchronization).
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
Parameters:
p - Parameters for the memory copy
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpyPeer(jcuda.Pointer, int, jcuda.Pointer, int, long), cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyPeerAsync(jcuda.Pointer, int, jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t), cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy3DAsync(cudaMemcpy3DParms p, cudaStream_t stream)
cudaError_t cudaMemcpy3DAsync ( const cudaMemcpy3DParms* p, cudaStream_t stream = 0 )
Copies data between 3D objects.
struct cudaExtent { size_t width; size_t height; size_t depth; };
struct cudaExtent make_cudaExtent(size_t w, size_t h, size_t d);
struct cudaPos { size_t x; size_t y; size_t z; };
struct cudaPos make_cudaPos(size_t x, size_t y, size_t z);
struct cudaMemcpy3DParms {
    cudaArray_t srcArray;
    struct cudaPos srcPos;
    struct cudaPitchedPtr srcPtr;
    cudaArray_t dstArray;
    struct cudaPos dstPos;
    struct cudaPitchedPtr dstPtr;
    struct cudaExtent extent;
    enum cudaMemcpyKind kind;
};
cudaMemcpy3DAsync() copies data between two 3D objects. The source and destination objects may be in either host memory, device memory, or a CUDA array. The source, destination, extent, and kind of copy performed are specified by the cudaMemcpy3DParms struct, which should be initialized to zero before use:
cudaMemcpy3DParms myParms = {0};
The struct passed to cudaMemcpy3DAsync() must specify one of srcArray or srcPtr and one of dstArray or dstPtr. Passing more than one non-zero source or destination will cause cudaMemcpy3DAsync() to return an error.
The srcPos and dstPos fields are optional offsets into the source and destination objects and are defined in units of each object's elements. The element for a host or device pointer is assumed to be unsigned char. For CUDA arrays, positions must be in the range [0, 2048) for any dimension.
The extent field defines the dimensions of the transferred area in elements. If a CUDA array is participating in the copy, the extent is defined in terms of that array's elements. If no CUDA array is participating in the copy then the extents are defined in elements of unsigned char.
The kind field defines the direction of the copy. It must be one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice.
If the source and destination are both arrays, cudaMemcpy3DAsync() will return an error if they do not have the same element size.
The source and destination object may not overlap. If overlapping source and destination objects are specified, undefined behavior will result.
The source object must lie entirely within the region defined by srcPos and extent. The destination object must lie entirely within the region defined by dstPos and extent.
cudaMemcpy3DAsync() returns an error if the pitch of srcPtr or dstPtr exceeds the maximum allowed. The pitch of a cudaPitchedPtr allocated with cudaMalloc3D() will always be valid.
cudaMemcpy3DAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated to a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
Parameters:
p - 3D memory copy parameters
stream - Stream identifier
See Also:
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent), cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent), cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent), cudaMemcpy3D(jcuda.runtime.cudaMemcpy3DParms), cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int), cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int), cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int), cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int), cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int), cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int), cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int), cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t), cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t), cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t), cudaExtent, cudaPos
public static int cudaMemcpy3DPeerAsync(cudaMemcpy3DPeerParms p, cudaStream_t stream)
cudaError_t cudaMemcpy3DPeerAsync ( const cudaMemcpy3DPeerParms* p, cudaStream_t stream = 0 )
Copies memory between devices asynchronously. Perform a 3D memory copy according to the parameters specified in p. See the definition of the cudaMemcpy3DPeerParms structure for documentation of its parameters.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
Parameters:
p - Parameters for the memory copy
stream - Stream identifier
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int), cudaMemcpyPeer(jcuda.Pointer, int, jcuda.Pointer, int, long), cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t), cudaMemcpyPeerAsync(jcuda.Pointer, int, jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t), cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
public static int cudaMemGetInfo(long[] free, long[] total)
cudaError_t cudaMemGetInfo ( size_t* free, size_t* total )
Gets free and total device memory. Returns in *free and *total respectively, the free and total amount of memory available for allocation by the device in bytes.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
free - Returned free memory in bytes
total - Returned total memory in bytes
public static int cudaArrayGetInfo(cudaChannelFormatDesc desc, cudaExtent extent, int[] flags, cudaArray array)
cudaError_t cudaArrayGetInfo ( cudaChannelFormatDesc* desc, cudaExtent* extent, unsigned int* flags, cudaArray_t array )
Gets info about the specified cudaArray. Returns in *desc, *extent and *flags respectively, the type, shape and flags of array.
Any of *desc, *extent and *flags may be specified as NULL.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
desc - Returned array type
extent - Returned array shape. 2D arrays will have depth of zero
flags - Returned array flags
array - The cudaArray to get info for
public static int cudaHostAlloc(Pointer ptr, long size, int flags)
cudaError_t cudaHostAlloc ( void** pHost, size_t size, unsigned int flags )
Allocates page-locked memory on the host. Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.
The flags parameter enables different options to be specified that affect the allocation, as follows.
cudaHostAllocDefault: This flag's value is defined to be 0 and causes cudaHostAlloc() to emulate cudaMallocHost().
cudaHostAllocPortable: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
cudaHostAllocMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer().
cudaHostAllocWriteCombined: Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the device via mapped pinned memory or host->device transfers.
All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.
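Because the flags are orthogonal, they can be combined with a bitwise OR and tested individually with a bitwise AND. The sketch below only demonstrates that bit arithmetic: the class is hypothetical, and the constants are local stand-ins whose values are assumed to match the CUDA headers. Code that uses JCuda would pass the library's own cudaHostAllocPortable, cudaHostAllocMapped, and cudaHostAllocWriteCombined constants instead.

```java
final class HostAllocFlagsSketch {
    // Local stand-ins; values assumed to match the CUDA header definitions.
    static final int DEFAULT = 0x00;
    static final int PORTABLE = 0x01;
    static final int MAPPED = 0x02;
    static final int WRITE_COMBINED = 0x04;

    /** Lists the options that are set in a combined flags value. */
    static String describe(int flags) {
        if (flags == DEFAULT) {
            return "default";
        }
        StringBuilder sb = new StringBuilder();
        if ((flags & PORTABLE) != 0) sb.append("portable ");
        if ((flags & MAPPED) != 0) sb.append("mapped ");
        if ((flags & WRITE_COMBINED) != 0) sb.append("write-combined ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Any subset of the three options may be requested together.
        int flags = PORTABLE | MAPPED | WRITE_COMBINED;
        System.out.println(describe(flags));
        System.out.println(describe(DEFAULT));
    }
}
```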
cudaSetDeviceFlags() must have been called with the cudaDeviceMapHost flag in order for the cudaHostAllocMapped flag to have any effect.
The cudaHostAllocMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostAllocPortable flag.
Memory allocated by this function must be freed with cudaFreeHost().
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
pHost - Device pointer to allocated memory
size - Requested allocation size in bytes
flags - Requested properties of allocated memory
See Also:
cudaSetDeviceFlags(int), cudaMallocHost(jcuda.Pointer, long), cudaFreeHost(jcuda.Pointer)
public static int cudaHostRegister(Pointer ptr, long size, int flags)
cudaError_t cudaHostRegister ( void* ptr, size_t size, unsigned int flags )
Registers an existing host memory range for use by CUDA. Page-locks the memory range specified by ptr and size and maps it for the device(s) as specified by flags. This memory range also is added to the same tracking mechanism as cudaHostAlloc() to automatically accelerate calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory that has not been registered. Page-locking excessive amounts of memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to register staging areas for data exchange between host and device.
The flags parameter enables different options to be specified that affect the allocation, as follows.
cudaHostRegisterPortable: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
cudaHostRegisterMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer(). This feature is available only on GPUs with compute capability greater than or equal to 1.1.
cudaHostRegisterIoMemory: The passed memory pointer is treated as pointing to some memory-mapped I/O space, e.g. belonging to a third-party PCIe device, and it will be marked as non cache-coherent and contiguous.
All of these flags are orthogonal to one another: a developer may page-lock memory that is portable or mapped with no restrictions.
The CUDA context must have been created with the cudaDeviceMapHost flag in order for the cudaHostRegisterMapped flag to have any effect.
The cudaHostRegisterMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostRegisterPortable flag.
The memory page-locked by this function must be unregistered with cudaHostUnregister().
Note that this function may also return error codes from previous, asynchronous launches.
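Because the flags are orthogonal, they are combined with a bitwise OR before being passed as the flags argument. The sketch below illustrates this with local stand-in constants so that it runs without a GPU. The numeric values are an assumption based on the CUDA runtime headers, and the describe helper is hypothetical; real code should pass the JCuda.cudaHostRegister* fields instead.

```java
public class HostRegisterFlagsSketch {
    // Stand-ins for the JCuda.cudaHostRegister* fields. The values are an
    // assumption based on the CUDA runtime headers, not taken from this page.
    public static final int PORTABLE = 0x01;
    public static final int MAPPED = 0x02;
    public static final int IO_MEMORY = 0x04;

    /** Hypothetical helper: names the options enabled in a flags value. */
    public static String describe(int flags) {
        StringBuilder sb = new StringBuilder();
        if ((flags & PORTABLE) != 0) sb.append("portable ");
        if ((flags & MAPPED) != 0) sb.append("mapped ");
        if ((flags & IO_MEMORY) != 0) sb.append("io-memory ");
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        // Each flag occupies its own bit, so any combination can be requested.
        int flags = PORTABLE | MAPPED;
        System.out.println(describe(flags));
    }
}
```

With the real bindings, the combined value would be passed as the third argument of cudaHostRegister.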
Parameters:
ptr - Host pointer to memory to page-lock
size - Size in bytes of the address range to page-lock
flags - Flags for allocation request
See Also:
cudaHostUnregister(jcuda.Pointer),
JCuda#cudaHostGetFlags,
cudaHostGetDevicePointer(jcuda.Pointer, jcuda.Pointer, int)
public static int cudaHostUnregister(Pointer ptr)
cudaError_t cudaHostUnregister ( void* ptr )
Unregisters a memory range that was registered with cudaHostRegister. Unmaps the memory range whose base address is specified by ptr, and makes it pageable again.
The base address must be the same one specified to cudaHostRegister().
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
ptr - Host pointer to memory to unregister
See Also:
cudaHostUnregister(jcuda.Pointer)
public static int cudaHostGetDevicePointer(Pointer pDevice, Pointer pHost, int flags)
cudaError_t cudaHostGetDevicePointer ( void** pDevice, void* pHost, unsigned int flags )
Passes back device pointer of mapped host memory allocated by cudaHostAlloc or registered by cudaHostRegister. Passes back the device pointer corresponding to the mapped, pinned host buffer allocated by cudaHostAlloc() or registered by cudaHostRegister().
cudaHostGetDevicePointer() will fail if the cudaDeviceMapHost flag was not specified before deferred context creation occurred, or if called on a device that does not support mapped, pinned memory.
flags is reserved for future releases. For now, it must be set to 0.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
pDevice - Returned device pointer for mapped memory
pHost - Requested host pointer mapping
flags - Flags for extensions (must be 0 for now)
See Also:
cudaSetDeviceFlags(int),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMallocManaged(Pointer devPtr, long size, int flags)
__host__ cudaError_t cudaMallocManaged ( void** devPtr, size_t size, unsigned int flags = cudaMemAttachGlobal )
Allocates size bytes of managed memory on the device and returns in *devPtr a pointer to the allocated memory. If the device doesn't support allocating managed memory, cudaErrorNotSupported is returned. Support for managed memory can be queried using the device attribute cudaDevAttrManagedMemory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. If size is 0, cudaMallocManaged returns cudaErrorInvalidValue. The pointer is valid on the CPU and on all GPUs in the system that support managed memory. All accesses to this pointer must obey the Unified Memory programming model.
flags specifies the default stream association for this allocation. flags must be one of cudaMemAttachGlobal or cudaMemAttachHost. The default value for flags is cudaMemAttachGlobal. If cudaMemAttachGlobal is specified, then this memory is accessible from any stream on any device. If cudaMemAttachHost is specified, then the allocation is created with initial visibility restricted to host access only; an explicit call to cudaStreamAttachMemAsync will be required to enable access on the device.
If the association is later changed via cudaStreamAttachMemAsync to a single stream, the default association, as specified during cudaMallocManaged, is restored when that stream is destroyed. For __managed__ variables, the default association is always cudaMemAttachGlobal. Note that destroying a stream is an asynchronous operation, and as a result, the change to default association won't happen until all work in the stream has completed.
Memory allocated with cudaMallocManaged should be released with cudaFree.
On a multi-GPU system with peer-to-peer support, where multiple GPUs support managed memory, the physical storage is created on the GPU which is active at the time cudaMallocManaged is called. All other GPUs will reference the data at reduced bandwidth via peer mappings over the PCIe bus. The Unified Memory management system does not migrate memory between GPUs.
On a multi-GPU system where multiple GPUs support managed memory, but not all pairs of such GPUs have peer-to-peer support between them, the physical storage is created in 'zero-copy' or system memory. All GPUs will reference the data at reduced bandwidth over the PCIe bus. In these circumstances, use of the environment variable, CUDA_VISIBLE_DEVICES, is recommended to restrict CUDA to only use those GPUs that have peer-to-peer support. Alternatively, users can also set CUDA_MANAGED_FORCE_DEVICE_ALLOC to a non-zero value to force the driver to always use device memory for physical storage. When this environment variable is set to a non-zero value, all devices used in that process that support managed memory have to be peer-to-peer compatible with each other. The error cudaErrorInvalidDevice will be returned if a device that supports managed memory is used and it is not peer-to-peer compatible with any of the other managed memory supporting devices that were previously used in that process, even if cudaDeviceReset has been called on those devices. These environment variables are described in the CUDA programming guide under the "CUDA environment variables" section.
Parameters:
devPtr - The device pointer
size - The size in bytes
flags - The flags
See Also:
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaHostAlloc(jcuda.Pointer, long, int),
cudaDeviceGetAttribute(int[], int, int),
cudaStreamAttachMemAsync(jcuda.runtime.cudaStream_t, jcuda.Pointer, long, int)
public static int cudaMalloc(Pointer devPtr, long size)
cudaError_t cudaMalloc ( void** devPtr, size_t size )
Allocate memory on the device. Allocates size bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The allocated memory is suitably aligned for any kind of variable. The memory is not cleared. cudaMalloc() returns cudaErrorMemoryAllocation in case of failure.
Parameters:
devPtr - Pointer to allocated device memory
size - Requested allocation size in bytes
See Also:
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMallocHost(Pointer ptr, long size)
cudaError_t cudaMallocHost ( void** ptr, size_t size, unsigned int flags )
[C++ API] Allocates page-locked memory on the host. Allocates size bytes of host memory that is page-locked and accessible to the device. The driver tracks the virtual memory ranges allocated with this function and automatically accelerates calls to functions such as cudaMemcpy(). Since the memory can be accessed directly by the device, it can be read or written with much higher bandwidth than pageable memory obtained with functions such as malloc(). Allocating excessive amounts of pinned memory may degrade system performance, since it reduces the amount of memory available to the system for paging. As a result, this function is best used sparingly to allocate staging areas for data exchange between host and device.
The flags parameter enables different options to be specified that affect the allocation, as follows.
cudaHostAllocDefault: This flag's value is defined to be 0.
cudaHostAllocPortable: The memory returned by this call will be considered as pinned memory by all CUDA contexts, not just the one that performed the allocation.
cudaHostAllocMapped: Maps the allocation into the CUDA address space. The device pointer to the memory may be obtained by calling cudaHostGetDevicePointer().
cudaHostAllocWriteCombined: Allocates the memory as write-combined (WC). WC memory can be transferred across the PCI Express bus more quickly on some system configurations, but cannot be read efficiently by most CPUs. WC memory is a good option for buffers that will be written by the CPU and read by the device via mapped pinned memory or host->device transfers.
All of these flags are orthogonal to one another: a developer may allocate memory that is portable, mapped and/or write-combined with no restrictions.
cudaSetDeviceFlags() must have been called with the cudaDeviceMapHost flag in order for the cudaHostAllocMapped flag to have any effect.
The cudaHostAllocMapped flag may be specified on CUDA contexts for devices that do not support mapped pinned memory. The failure is deferred to cudaHostGetDevicePointer() because the memory may be mapped into other CUDA contexts via the cudaHostAllocPortable flag.
Memory allocated by this function must be freed with cudaFreeHost().
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
ptr - Pointer to allocated host memory
size - Requested allocation size in bytes
flags - Requested properties of allocated memory
See Also:
cudaSetDeviceFlags(int),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMallocPitch(Pointer devPtr, long[] pitch, long width, long height)
cudaError_t cudaMallocPitch ( void** devPtr, size_t* pitch, size_t width, size_t height )
Allocates pitched memory on the device. Allocates at least width (in bytes) * height bytes of linear memory on the device and returns in *devPtr a pointer to the allocated memory. The function may pad the allocation to ensure that corresponding pointers in any given row will continue to meet the alignment requirements for coalescing as the address is updated from row to row. The pitch returned in *pitch by cudaMallocPitch() is the width in bytes of the allocation. The intended usage of pitch is as a separate parameter of the allocation, used to compute addresses within the 2D array. Given the row and column of an array element of type T, the address is computed as:
T* pElement = (T*)((char*)BaseAddress + Row * pitch) + Column;
For allocations of 2D arrays, it is recommended that programmers consider performing pitch allocations using cudaMallocPitch(). Due to pitch alignment restrictions in the hardware, this is especially true if the application will be performing 2D memory copies between different regions of device memory (whether linear memory or CUDA arrays).
Note that this function may also return error codes from previous, asynchronous launches.
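The address formula above can be restated in Java as a byte-offset computation. This is a minimal sketch that runs without a GPU: the elementOffset helper and the sample pitch of 512 bytes are hypothetical, and in a real program the pitch would be the value returned in pitch[0] by cudaMallocPitch.

```java
public class PitchAddressSketch {
    /**
     * Byte offset of the element at (row, column) from the base address,
     * mirroring (T*)((char*)BaseAddress + Row * pitch) + Column
     * for an element type T of elementSize bytes.
     */
    public static long elementOffset(long pitch, long row, long column, long elementSize) {
        return row * pitch + column * elementSize;
    }

    public static void main(String[] args) {
        long pitch = 512;      // hypothetical pitch in bytes
        long elementSize = 4;  // e.g. a 32-bit float
        // Row 2 starts at 2 * 512 bytes; column 3 adds 3 * 4 bytes.
        System.out.println(elementOffset(pitch, 2, 3, elementSize)); // prints 1036
    }
}
```

Note that rows are spaced by the pitch, not by width, so indexing with the unpadded row width would read the wrong elements whenever the allocation was padded.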
Parameters:
devPtr - Pointer to allocated pitched device memory
pitch - Pitch for allocation
width - Requested pitched allocation width (in bytes)
height - Requested pitched allocation height
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMallocArray(cudaArray array, cudaChannelFormatDesc desc, long width, long height)
cudaError_t cudaMallocArray ( cudaArray_t* array, const cudaChannelFormatDesc* desc, size_t width, size_t height = 0, unsigned int flags = 0 )
Allocate an array on the device. Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *array.
The cudaChannelFormatDesc is defined as:
struct cudaChannelFormatDesc { int x, y, z, w; enum cudaChannelFormatKind f; };
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.
The flags parameter enables different options to be specified that affect the allocation, as follows.
cudaArrayDefault: This flag's value is defined to be 0 and provides default array allocation
cudaArraySurfaceLoadStore: Allocates an array that can be read from or written to using a surface reference
cudaArrayTextureGather: This flag indicates that texture gather operations will be performed on the array.
width and height must meet certain size requirements. See cudaMalloc3DArray() for more details.
Note that this function may also return error codes from previous, asynchronous launches.
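As an illustration of how the channel format determines the element size, the sketch below computes bytes per element from the x, y, z, w fields. It assumes those fields give the number of bits in each channel, which this page does not state; the helper names are hypothetical and the code runs without a GPU.

```java
public class ChannelFormatSketch {
    /**
     * Bytes per array element, assuming x, y, z and w hold the number of
     * bits in each channel (an assumption, not stated in the text above).
     */
    public static int bytesPerElement(int x, int y, int z, int w) {
        int bits = x + y + z + w;
        if (bits <= 0 || bits % 8 != 0) {
            throw new IllegalArgumentException("total bits must be a positive multiple of 8");
        }
        return bits / 8;
    }

    /** Unpadded size in bytes of one row of width elements. */
    public static long rowBytes(long width, int x, int y, int z, int w) {
        return width * bytesPerElement(x, y, z, w);
    }

    public static void main(String[] args) {
        // Single-channel 32-bit format, e.g. one float per element.
        System.out.println(bytesPerElement(32, 0, 0, 0));
        // Four-channel 32-bit format, 256 elements per row.
        System.out.println(rowBytes(256, 32, 32, 32, 32));
    }
}
```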
Parameters:
array - Pointer to allocated array in device memory
desc - Requested channel format
width - Requested array allocation width
height - Requested array allocation height
flags - Requested properties of allocated array
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMallocArray(cudaArray array, cudaChannelFormatDesc desc, long width, long height, int flags)
cudaError_t cudaMallocArray ( cudaArray_t* array, const cudaChannelFormatDesc* desc, size_t width, size_t height = 0, unsigned int flags = 0 )
Allocate an array on the device. Allocates a CUDA array according to the cudaChannelFormatDesc structure desc and returns a handle to the new CUDA array in *array.
The cudaChannelFormatDesc is defined as:
struct cudaChannelFormatDesc { int x, y, z, w; enum cudaChannelFormatKind f; };
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.
The flags parameter enables different options to be specified that affect the allocation, as follows.
cudaArrayDefault: This flag's value is defined to be 0 and provides default array allocation
cudaArraySurfaceLoadStore: Allocates an array that can be read from or written to using a surface reference
cudaArrayTextureGather: This flag indicates that texture gather operations will be performed on the array.
width and height must meet certain size requirements. See cudaMalloc3DArray() for more details.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
array - Pointer to allocated array in device memory
desc - Requested channel format
width - Requested array allocation width
height - Requested array allocation height
flags - Requested properties of allocated array
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaFree(Pointer devPtr)
cudaError_t cudaFree ( void* devPtr )
Frees memory on the device. Frees the memory space pointed to by devPtr, which must have been returned by a previous call to cudaMalloc() or cudaMallocPitch(). Otherwise, or if cudaFree(devPtr) has already been called before, an error is returned. If devPtr is 0, no operation is performed. cudaFree() returns cudaErrorInvalidDevicePointer in case of failure.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
devPtr - Device pointer to memory to free
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaFreeHost(Pointer ptr)
cudaError_t cudaFreeHost ( void* ptr )
Frees page-locked memory. Frees the memory space pointed to by ptr, which must have been returned by a previous call to cudaMallocHost() or cudaHostAlloc().
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
ptr - Pointer to memory to free
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaFreeArray(jcuda.runtime.cudaArray),
cudaMallocHost(jcuda.Pointer, long),
cudaMalloc3D(jcuda.runtime.cudaPitchedPtr, jcuda.runtime.cudaExtent),
cudaMalloc3DArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaExtent),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaFreeArray(cudaArray array)
cudaError_t cudaFreeArray ( cudaArray_t array )
Frees an array on the device. Frees the CUDA array array, which must have been returned by a previous call to cudaMallocArray(). If cudaFreeArray(array) has already been called before, cudaErrorInvalidValue is returned. If array is 0, no operation is performed.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
array - Pointer to array to free
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaFreeMipmappedArray(cudaMipmappedArray mipmappedArray)
cudaError_t cudaFreeMipmappedArray ( cudaMipmappedArray_t mipmappedArray )
Frees a mipmapped array on the device. Frees the CUDA mipmapped array mipmappedArray, which must have been returned by a previous call to cudaMallocMipmappedArray(). If cudaFreeMipmappedArray(mipmappedArray) has already been called before, cudaErrorInvalidValue is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
mipmappedArray - Pointer to mipmapped array to free
See Also:
cudaMalloc(jcuda.Pointer, long),
cudaMallocPitch(jcuda.Pointer, long[], long, long),
cudaFree(jcuda.Pointer),
cudaMallocArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc, long, long),
cudaMallocHost(jcuda.Pointer, long),
cudaFreeHost(jcuda.Pointer),
cudaHostAlloc(jcuda.Pointer, long, int)
public static int cudaMemcpy(Pointer dst, Pointer src, long count, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy ( void* dst, const void* src, size_t count, cudaMemcpyKind kind )
Copies data between host and device. Copies count bytes from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. The memory areas may not overlap. Calling cudaMemcpy() with dst and src pointers that do not match the direction of the copy results in undefined behavior.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
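Since the source and destination areas may not overlap, a caller that copies within a single buffer may want to check its byte ranges before issuing the copy. The following is a minimal sketch of such a check; the overlaps helper is hypothetical, works on byte offsets relative to a common base rather than on JCuda Pointer objects, and runs without a GPU.

```java
public class OverlapCheckSketch {
    /**
     * Returns true if the half-open byte ranges
     * [dstOffset, dstOffset + count) and [srcOffset, srcOffset + count)
     * share at least one byte.
     */
    public static boolean overlaps(long dstOffset, long srcOffset, long count) {
        return count > 0
                && dstOffset < srcOffset + count
                && srcOffset < dstOffset + count;
    }

    public static void main(String[] args) {
        // Adjacent ranges [0, 100) and [100, 200) do not overlap.
        System.out.println(overlaps(0, 100, 100));
        // Ranges [50, 150) and [0, 100) share bytes 50..99.
        System.out.println(overlaps(50, 0, 100));
    }
}
```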
Parameters:
dst - Destination memory address
src - Source memory address
count - Size in bytes to copy
kind - Type of transfer
See Also:
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyPeer(Pointer dst, int dstDevice, Pointer src, int srcDevice, long count)
cudaError_t cudaMemcpyPeer ( void* dst, int dstDevice, const void* src, int srcDevice, size_t count )
Copies memory between two devices. Copies memory from one device to memory on another device. dst is the base device pointer of the destination memory and dstDevice is the destination device. src is the base device pointer of the source memory and srcDevice is the source device. count specifies the number of bytes to copy.
Note that this function is asynchronous with respect to the host, but serialized with respect to all pending and future asynchronous work in the current device, srcDevice, and dstDevice (use cudaMemcpyPeerAsync to avoid this synchronization).
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
Parameters:
dst - Destination device pointer
dstDevice - Destination device
src - Source device pointer
srcDevice - Source device
count - Size of memory copy in bytes
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyPeerAsync(jcuda.Pointer, int, jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t),
cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyToArray(cudaArray dst, long wOffset, long hOffset, Pointer src, long count, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpyToArray ( cudaArray_t dst, size_t wOffset, size_t hOffset, const void* src, size_t count, cudaMemcpyKind kind )
Copies data between host and device. Copies count bytes from the memory area pointed to by src to the CUDA array dst starting at the upper left corner (wOffset, hOffset), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
Parameters:
dst - Destination memory address
wOffset - Destination starting X offset
hOffset - Destination starting Y offset
src - Source memory address
count - Size in bytes to copy
kind - Type of transfer
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyFromArray(Pointer dst, cudaArray src, long wOffset, long hOffset, long count, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpyFromArray ( void* dst, cudaArray_const_t src, size_t wOffset, size_t hOffset, size_t count, cudaMemcpyKind kind )
Copies data between host and device. Copies count bytes from the CUDA array src starting at the upper left corner (wOffset, hOffset) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
Parameters:
dst - Destination memory address
src - Source memory address
wOffset - Source starting X offset
hOffset - Source starting Y offset
count - Size in bytes to copy
kind - Type of transfer
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyArrayToArray(cudaArray dst, long wOffsetDst, long hOffsetDst, cudaArray src, long wOffsetSrc, long hOffsetSrc, long count, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpyArrayToArray ( cudaArray_t dst, size_t wOffsetDst, size_t hOffsetDst, cudaArray_const_t src, size_t wOffsetSrc, size_t hOffsetSrc, size_t count, cudaMemcpyKind kind = cudaMemcpyDeviceToDevice )
Copies data between host and device. Copies count bytes from the CUDA array src starting at the upper left corner (wOffsetSrc, hOffsetSrc) to the CUDA array dst starting at the upper left corner (wOffsetDst, hOffsetDst) where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
dst - Destination memory address
wOffsetDst - Destination starting X offset
hOffsetDst - Destination starting Y offset
src - Source memory address
wOffsetSrc - Source starting X offset
hOffsetSrc - Source starting Y offset
count - Size in bytes to copy
kind - Type of transfer
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2D(Pointer dst, long dpitch, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy2D ( void* dst, size_t dpitch, const void* src, size_t spitch, size_t width, size_t height, cudaMemcpyKind kind )
Copies data between host and device. Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch and spitch are the widths in memory in bytes of the 2D arrays pointed to by dst and src, including any padding added to the end of each row. The memory areas may not overlap. width must not exceed either dpitch or spitch. Calling cudaMemcpy2D() with dst and src pointers that do not match the direction of the copy results in undefined behavior. cudaMemcpy2D() returns an error if dpitch or spitch exceeds the maximum allowed.
Note that this function may also return error codes from previous, asynchronous launches.
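The role of the two pitches can be sketched without a GPU. The following is a plain-Java model of the copy pattern described above, not a JCuda call: the class PitchedCopySketch and its copy2D helper are hypothetical names, and byte arrays stand in for the memory areas behind dst and src.

```java
import java.util.Arrays;

public class PitchedCopySketch {
    /**
     * Hypothetical model of the cudaMemcpy2D copy pattern: copies `height`
     * rows of `width` bytes, advancing each buffer by its own pitch per row.
     */
    static void copy2D(byte[] dst, int dpitch, byte[] src, int spitch,
                       int width, int height) {
        // Constraint stated in the description: width fits inside both pitches.
        if (width > dpitch || width > spitch) {
            throw new IllegalArgumentException("width must not exceed dpitch or spitch");
        }
        for (int row = 0; row < height; row++) {
            System.arraycopy(src, row * spitch, dst, row * dpitch, width);
        }
    }

    public static void main(String[] args) {
        int width = 3, height = 2;
        int spitch = 4; // each source row has 1 byte of padding
        int dpitch = 5; // each destination row has 2 bytes of padding
        byte[] src = {1, 2, 3, 0, 4, 5, 6, 0};
        byte[] dst = new byte[dpitch * height];
        copy2D(dst, dpitch, src, spitch, width, height);
        System.out.println(Arrays.toString(dst)); // [1, 2, 3, 0, 0, 4, 5, 6, 0, 0]
    }
}
```

Only the first width bytes of each row are transferred; the padding bytes of the destination are left untouched.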
Parameters:
dst - Destination memory address
dpitch - Pitch of destination memory
src - Source memory address
spitch - Pitch of source memory
width - Width of matrix transfer (columns in bytes)
height - Height of matrix transfer (rows)
kind - Type of transfer
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DToArray(cudaArray dst, long wOffset, long hOffset, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy2DToArray ( cudaArray_t dst, size_t wOffset, size_t hOffset, const void* src, size_t spitch, size_t width, size_t height, cudaMemcpyKind kind )
Copies data between host and device. Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the CUDA array dst starting at the upper left corner (wOffset, hOffset), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. spitch is the width in memory in bytes of the 2D array pointed to by src, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array dst. width must not exceed spitch. cudaMemcpy2DToArray() returns an error if spitch exceeds the maximum allowed.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
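The two size limits stated above can be checked on the host before issuing the copy. This is a minimal sketch under assumptions: ToArrayBoundsSketch and satisfiesStatedLimits are hypothetical names, all quantities are taken to be in bytes, and the width of the CUDA array is assumed to be known to the caller.

```java
public class ToArrayBoundsSketch {
    /**
     * Hypothetical pre-check for the limits named in the description:
     * width <= spitch and wOffset + width <= width of the CUDA array.
     * All arguments are assumed to be byte counts.
     */
    static boolean satisfiesStatedLimits(long arrayWidthBytes, long wOffset,
                                         long spitch, long width) {
        return width <= spitch && wOffset + width <= arrayWidthBytes;
    }

    public static void main(String[] args) {
        long arrayWidthBytes = 256; // assumed: 64 float elements per row
        System.out.println(satisfiesStatedLimits(arrayWidthBytes, 0, 256, 256));  // true
        System.out.println(satisfiesStatedLimits(arrayWidthBytes, 16, 256, 256)); // false
        System.out.println(satisfiesStatedLimits(arrayWidthBytes, 0, 128, 256));  // false
    }
}
```

The second call fails because the offset pushes the row past the array width; the third fails because the source pitch is smaller than the row being copied.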
Parameters:
dst - Destination memory address
wOffset - Destination starting X offset
hOffset - Destination starting Y offset
src - Source memory address
spitch - Pitch of source memory
width - Width of matrix transfer (columns in bytes)
height - Height of matrix transfer (rows)
kind - Type of transfer
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DFromArray(Pointer dst, long dpitch, cudaArray src, long wOffset, long hOffset, long width, long height, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy2DFromArray ( void* dst, size_t dpitch, cudaArray_const_t src, size_t wOffset, size_t hOffset, size_t width, size_t height, cudaMemcpyKind kind )
Copies data between host and device. Copies a matrix (height rows of width bytes each) from the CUDA array src starting at the upper left corner (wOffset, hOffset) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch is the width in memory in bytes of the 2D array pointed to by dst, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array src. width must not exceed dpitch. cudaMemcpy2DFromArray() returns an error if dpitch exceeds the maximum allowed.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
Parameters:
dst - Destination memory address
dpitch - Pitch of destination memory
src - Source memory address
wOffset - Source starting X offset
hOffset - Source starting Y offset
width - Width of matrix transfer (columns in bytes)
height - Height of matrix transfer (rows)
kind - Type of transfer
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DArrayToArray(cudaArray dst, long wOffsetDst, long hOffsetDst, cudaArray src, long wOffsetSrc, long hOffsetSrc, long width, long height, int cudaMemcpyKind_kind)
cudaError_t cudaMemcpy2DArrayToArray ( cudaArray_t dst, size_t wOffsetDst, size_t hOffsetDst, cudaArray_const_t src, size_t wOffsetSrc, size_t hOffsetSrc, size_t width, size_t height, cudaMemcpyKind kind = cudaMemcpyDeviceToDevice )
Copies data between host and device. Copies a matrix (height rows of width bytes each) from the CUDA array src starting at the upper left corner (wOffsetSrc, hOffsetSrc) to the CUDA array dst starting at the upper left corner (wOffsetDst, hOffsetDst), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. wOffsetDst + width must not exceed the width of the CUDA array dst. wOffsetSrc + width must not exceed the width of the CUDA array src.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
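How the two offset pairs select the source and destination rectangles can be sketched in plain Java. This is a hypothetical model, not a JCuda call: RectCopySketch and copyRect are illustrative names, and byte[][] values (one inner array per row) stand in for the two CUDA arrays.

```java
import java.util.Arrays;

public class RectCopySketch {
    /**
     * Hypothetical model of the rectangle copy: moves `height` rows of
     * `width` bytes from src at (wOffsetSrc, hOffsetSrc) to dst at
     * (wOffsetDst, hOffsetDst).
     */
    static void copyRect(byte[][] dst, int wOffsetDst, int hOffsetDst,
                         byte[][] src, int wOffsetSrc, int hOffsetSrc,
                         int width, int height) {
        // Width limits stated in the description.
        if (wOffsetDst + width > dst[0].length || wOffsetSrc + width > src[0].length) {
            throw new IllegalArgumentException("offset + width exceeds array width");
        }
        for (int row = 0; row < height; row++) {
            System.arraycopy(src[hOffsetSrc + row], wOffsetSrc,
                             dst[hOffsetDst + row], wOffsetDst, width);
        }
    }

    public static void main(String[] args) {
        byte[][] src = {{1, 2, 3, 4}, {5, 6, 7, 8}, {9, 10, 11, 12}};
        byte[][] dst = new byte[3][3];
        // Copy the 2x2 block starting at column 1, row 1 of src
        // into dst starting at column 0, row 1.
        copyRect(dst, 0, 1, src, 1, 1, 2, 2);
        System.out.println(Arrays.deepToString(dst)); // [[0, 0, 0], [6, 7, 0], [10, 11, 0]]
    }
}
```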
Parameters:
dst - Destination memory address
wOffsetDst - Destination starting X offset
hOffsetDst - Destination starting Y offset
src - Source memory address
wOffsetSrc - Source starting X offset
hOffsetSrc - Source starting Y offset
width - Width of matrix transfer (columns in bytes)
height - Height of matrix transfer (rows)
kind - Type of transfer
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyToSymbol(String symbol, Pointer src, long count, long offset, int cudaMemcpyKind_kind)
template < class T > cudaError_t cudaMemcpyToSymbol ( const T& symbol, const void* src, size_t count, size_t offset = 0, cudaMemcpyKind kind = cudaMemcpyHostToDevice ) [inline]
[C++ API] Copies data to the given symbol on the device. Copies count bytes from the memory area pointed to by src to the memory area offset bytes from the start of symbol symbol. The memory areas may not overlap. symbol is a variable that resides in global or constant memory space. kind can be either cudaMemcpyHostToDevice or cudaMemcpyDeviceToDevice.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
Use of a string naming a variable as the symbol parameter was deprecated in CUDA 4.1 and removed in CUDA 5.0.
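The meaning of count and offset can be modelled in plain Java. This is a hypothetical sketch, not a JCuda call: SymbolCopySketch and copyToSymbol are illustrative names, a byte array stands in for the storage behind the symbol, and the bounds check is an assumption of the sketch rather than documented runtime behavior.

```java
import java.util.Arrays;

public class SymbolCopySketch {
    /**
     * Hypothetical model: writes `count` bytes from the start of `src`
     * into `symbol`, beginning `offset` bytes from the start of the symbol.
     */
    static void copyToSymbol(byte[] symbol, byte[] src, int count, int offset) {
        // Assumed check; the description does not say how overruns are reported.
        if (count < 0 || offset < 0 || count > src.length
                || offset + count > symbol.length) {
            throw new IndexOutOfBoundsException("copy range outside symbol or source");
        }
        System.arraycopy(src, 0, symbol, offset, count);
    }

    public static void main(String[] args) {
        byte[] symbol = new byte[8];      // stand-in for an 8-byte device variable
        byte[] src = {10, 20, 30};
        copyToSymbol(symbol, src, 3, 4);  // bytes 4..6 of the symbol are overwritten
        System.out.println(Arrays.toString(symbol)); // [0, 0, 0, 0, 10, 20, 30, 0]
    }
}
```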
Parameters:
symbol - Device symbol address or reference
src - Source memory address
count - Size in bytes to copy
offset - Offset from start of symbol in bytes
kind - Type of transfer
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyFromSymbol(Pointer dst, String symbol, long count, long offset, int cudaMemcpyKind_kind)
template < class T > cudaError_t cudaMemcpyFromSymbol ( void* dst, const T& symbol, size_t count, size_t offset = 0, cudaMemcpyKind kind = cudaMemcpyDeviceToHost ) [inline]
[C++ API] Copies data from the given symbol on the device. Copies count bytes from the memory area offset bytes from the start of symbol symbol to the memory area pointed to by dst. The memory areas may not overlap. symbol is a variable that resides in global or constant memory space. kind can be either cudaMemcpyDeviceToHost or cudaMemcpyDeviceToDevice.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits synchronous behavior for most use cases.
Use of a string naming a variable as the symbol parameter was deprecated in CUDA 4.1 and removed in CUDA 5.0.
Parameters:
dst - Destination memory address
symbol - Device symbol address or reference
count - Size in bytes to copy
offset - Offset from start of symbol in bytes
kind - Type of transfer
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyAsync(Pointer dst, Pointer src, long count, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpyAsync ( void* dst, const void* src, size_t count, cudaMemcpyKind kind, cudaStream_t stream = 0 )
Copies data between host and device. Copies count bytes from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. The memory areas may not overlap. Calling cudaMemcpyAsync() with dst and src pointers that do not match the direction of the copy results in undefined behavior.
cudaMemcpyAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and the stream is non-zero, the copy may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
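The consequence of "the call may return before the copy is complete" can be sketched with standard Java concurrency. This is a hypothetical model, not JCuda code: AsyncCopySketch and copyAsync are illustrative names, a single-threaded executor stands in for a stream, and join() plays the role that a stream synchronization such as cudaStreamSynchronize would play in a real program.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncCopySketch {
    /**
     * Hypothetical model of a stream-ordered copy: the work is queued on
     * `stream` and the method returns without waiting for it to finish.
     */
    static CompletableFuture<Void> copyAsync(byte[] dst, byte[] src, int count,
                                             ExecutorService stream) {
        return CompletableFuture.runAsync(
                () -> System.arraycopy(src, 0, dst, 0, count), stream);
    }

    public static void main(String[] args) {
        // One worker thread: queued operations run in submission order.
        ExecutorService stream = Executors.newSingleThreadExecutor();
        byte[] src = {7, 8, 9};
        byte[] dst = new byte[3];

        CompletableFuture<Void> pending = copyAsync(dst, src, src.length, stream);
        // dst must not be read here: the copy may still be in flight.
        pending.join(); // wait for the queued copy before touching dst
        System.out.println(dst[0] + " " + dst[1] + " " + dst[2]); // 7 8 9
        stream.shutdown();
    }
}
```

The point of the sketch is the ordering: the host reads the destination only after it has explicitly waited for the queued copy.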
Parameters:
dst - Destination memory address
src - Source memory address
count - Size in bytes to copy
kind - Type of transfer
stream - Stream identifier
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyPeerAsync(Pointer dst, int dstDevice, Pointer src, int srcDevice, long count, cudaStream_t stream)
cudaError_t cudaMemcpyPeerAsync ( void* dst, int dstDevice, const void* src, int srcDevice, size_t count, cudaStream_t stream = 0 )
Copies memory between two devices asynchronously. Copies memory from one device to memory on another device. dst is the base device pointer of the destination memory and dstDevice is the destination device. src is the base device pointer of the source memory and srcDevice is the source device. count specifies the number of bytes to copy.
Note that this function is asynchronous with respect to the host and all work in other streams and other devices.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
Parameters:
dst - Destination device pointer
dstDevice - Destination device
src - Source device pointer
srcDevice - Source device
count - Size of memory copy in bytes
stream - Stream identifier
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpyPeer(jcuda.Pointer, int, jcuda.Pointer, int, long),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyToArrayAsync(cudaArray dst, long wOffset, long hOffset, Pointer src, long count, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpyToArrayAsync ( cudaArray_t dst, size_t wOffset, size_t hOffset, const void* src, size_t count, cudaMemcpyKind kind, cudaStream_t stream = 0 )
Copies data between host and device. Copies count bytes from the memory area pointed to by src to the CUDA array dst starting at the upper left corner (wOffset, hOffset), where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.
cudaMemcpyToArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
Parameters:
dst - Destination memory address
wOffset - Destination starting X offset
hOffset - Destination starting Y offset
src - Source memory address
count - Size in bytes to copy
kind - Type of transfer
stream - Stream identifier
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyFromArrayAsync(Pointer dst, cudaArray src, long wOffset, long hOffset, long count, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpyFromArrayAsync ( void* dst, cudaArray_const_t src, size_t wOffset, size_t hOffset, size_t count, cudaMemcpyKind kind, cudaStream_t stream = 0 )
Copies data between host and device. Copies count bytes from the CUDA array src starting at the upper left corner (wOffset, hOffset) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy.
cudaMemcpyFromArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
Parameters:
dst - Destination memory address
src - Source memory address
wOffset - Source starting X offset
hOffset - Source starting Y offset
count - Size in bytes to copy
kind - Type of transfer
stream - Stream identifier
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DAsync(Pointer dst, long dpitch, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpy2DAsync ( void* dst, size_t dpitch, const void* src, size_t spitch, size_t width, size_t height, cudaMemcpyKind kind, cudaStream_t stream = 0 )
Copies data between host and device. Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch and spitch are the widths in memory in bytes of the 2D arrays pointed to by dst and src, including any padding added to the end of each row. The memory areas may not overlap. width must not exceed either dpitch or spitch. Calling cudaMemcpy2DAsync() with dst and src pointers that do not match the direction of the copy results in undefined behavior. cudaMemcpy2DAsync() returns an error if dpitch or spitch is greater than the maximum allowed.
cudaMemcpy2DAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated with a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
Parameters:
dst - Destination memory address
dpitch - Pitch of destination memory
src - Source memory address
spitch - Pitch of source memory
width - Width of matrix transfer (columns in bytes)
height - Height of matrix transfer (rows)
kind - Type of transfer
stream - Stream identifier
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int),
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int),
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int),
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int),
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int),
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int),
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int),
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t),
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DToArrayAsync(cudaArray dst, long wOffset, long hOffset, Pointer src, long spitch, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpy2DToArrayAsync ( cudaArray_t dst, size_t wOffset, size_t hOffset, const void* src, size_t spitch, size_t width, size_t height, cudaMemcpyKind kind, cudaStream_t stream = 0 )
Copies data between host and device. Copies a matrix (height rows of width bytes each) from the memory area pointed to by src to the CUDA array dst starting at the upper left corner (wOffset, hOffset) where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. spitch is the width in memory in bytes of the 2D array pointed to by src, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array dst. width must not exceed spitch. cudaMemcpy2DToArrayAsync() returns an error if spitch exceeds the maximum allowed.
cudaMemcpy2DToArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated to a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
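The pitch and offset rules for cudaMemcpy2DToArrayAsync can be made concrete with a host-only model. The sketch below does not call JCuda: the class PitchedCopySketch, its method, and the flat byte array standing in for the CUDA array are all assumptions made for illustration. The check on hOffset + height is also an assumption, added by analogy with the documented width constraint.

```java
import java.util.Arrays;

final class PitchedCopySketch {
    /**
     * Host-only model of a 2D copy into an array region (hypothetical helper,
     * not part of JCuda). dst stands in for a CUDA array that is
     * dstWidthBytes wide and dstHeight rows tall.
     */
    static void copy2DToArray(byte[] dst, int dstWidthBytes, int dstHeight,
                              int wOffset, int hOffset,
                              byte[] src, int spitch, int width, int height) {
        // Documented constraints: width <= spitch and wOffset + width <= array width.
        if (width > spitch) {
            throw new IllegalArgumentException("width must not exceed spitch");
        }
        if (wOffset + width > dstWidthBytes) {
            throw new IllegalArgumentException("wOffset + width exceeds array width");
        }
        // Assumed by analogy: the rows must also fit vertically.
        if (hOffset + height > dstHeight) {
            throw new IllegalArgumentException("hOffset + height exceeds array height");
        }
        for (int row = 0; row < height; row++) {
            // Each source row starts spitch bytes after the previous one; only
            // the first width bytes of a row are payload, the rest is padding.
            int srcStart = row * spitch;
            int dstStart = (hOffset + row) * dstWidthBytes + wOffset;
            System.arraycopy(src, srcStart, dst, dstStart, width);
        }
    }

    public static void main(String[] args) {
        byte[] src = new byte[2 * 8];   // 2 rows with a pitch of 8 bytes
        for (int i = 0; i < src.length; i++) {
            src[i] = (byte) (i + 1);
        }
        byte[] dst = new byte[6 * 4];   // 6 bytes wide, 4 rows
        copy2DToArray(dst, 6, 4, 2, 1, src, 8, 3, 2);
        System.out.println(Arrays.toString(dst));
    }
}
```

In the model, the five padding bytes at the end of each source row are skipped, which is the reason spitch and width are passed separately.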
dst
- Destination memory address
wOffset
- Destination starting X offset
hOffset
- Destination starting Y offset
src
- Source memory address
spitch
- Pitch of source memory
width
- Width of matrix transfer (columns in bytes)
height
- Height of matrix transfer (rows)
kind
- Type of transfer
stream
- Stream identifier
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpy2DFromArrayAsync(Pointer dst, long dpitch, cudaArray src, long wOffset, long hOffset, long width, long height, int cudaMemcpyKind_kind, cudaStream_t stream)
cudaError_t cudaMemcpy2DFromArrayAsync ( void* dst, size_t dpitch, cudaArray_const_t src, size_t wOffset, size_t hOffset, size_t width, size_t height, cudaMemcpyKind kind, cudaStream_t stream = 0 )
Copies data between host and device. Copies a matrix (height rows of width bytes each) from the CUDA array src starting at the upper left corner (wOffset, hOffset) to the memory area pointed to by dst, where kind is one of cudaMemcpyHostToHost, cudaMemcpyHostToDevice, cudaMemcpyDeviceToHost, or cudaMemcpyDeviceToDevice, and specifies the direction of the copy. dpitch is the width in memory in bytes of the 2D array pointed to by dst, including any padding added to the end of each row. wOffset + width must not exceed the width of the CUDA array src. width must not exceed dpitch. cudaMemcpy2DFromArrayAsync() returns an error if dpitch exceeds the maximum allowed.
cudaMemcpy2DFromArrayAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated to a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice or cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
dst
- Destination memory address
dpitch
- Pitch of destination memory
src
- Source memory address
wOffset
- Source starting X offset
hOffset
- Source starting Y offset
width
- Width of matrix transfer (columns in bytes)
height
- Height of matrix transfer (rows)
kind
- Type of transfer
stream
- Stream identifier
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyToSymbolAsync(String symbol, Pointer src, long count, long offset, int cudaMemcpyKind_kind, cudaStream_t stream)
template < class T > cudaError_t cudaMemcpyToSymbolAsync ( const T& symbol, const void* src, size_t count, size_t offset = 0, cudaMemcpyKind kind = cudaMemcpyHostToDevice, cudaStream_t stream = 0 ) [inline]
[C++ API] Copies data to the given symbol on the device. Copies count bytes from the memory area pointed to by src to the memory area offset bytes from the start of symbol symbol. The memory areas may not overlap. symbol is a variable that resides in global or constant memory space. kind can be either cudaMemcpyHostToDevice or cudaMemcpyDeviceToDevice.
cudaMemcpyToSymbolAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated to a stream by passing a non-zero stream argument. If kind is cudaMemcpyHostToDevice and stream is non-zero, the copy may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
Use of a string naming a variable as the symbol parameter was deprecated in CUDA 4.1 and removed in CUDA 5.0.
symbol
- Device symbol address or reference
src
- Source memory address
count
- Size in bytes to copy
offset
- Offset from start of symbol in bytes
kind
- Type of transfer
stream
- Stream identifier
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromSymbolAsync(jcuda.Pointer, java.lang.String, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemcpyFromSymbolAsync(Pointer dst, String symbol, long count, long offset, int cudaMemcpyKind_kind, cudaStream_t stream)
template < class T > cudaError_t cudaMemcpyFromSymbolAsync ( void* dst, const T& symbol, size_t count, size_t offset = 0, cudaMemcpyKind kind = cudaMemcpyDeviceToHost, cudaStream_t stream = 0 ) [inline]
[C++ API] Copies data from the given symbol on the device. Copies count bytes from the memory area offset bytes from the start of symbol symbol to the memory area pointed to by dst. The memory areas may not overlap. symbol is a variable that resides in global or constant memory space. kind can be either cudaMemcpyDeviceToHost or cudaMemcpyDeviceToDevice.
cudaMemcpyFromSymbolAsync() is asynchronous with respect to the host, so the call may return before the copy is complete. The copy can optionally be associated to a stream by passing a non-zero stream argument. If kind is cudaMemcpyDeviceToHost and stream is non-zero, the copy may overlap with operations in other streams.
Note that this function may also return error codes from previous, asynchronous launches.
This function exhibits asynchronous behavior for most use cases.
Use of a string naming a variable as the symbol parameter was deprecated in CUDA 4.1 and removed in CUDA 5.0.
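The two symbol-copy entries (cudaMemcpyToSymbolAsync and cudaMemcpyFromSymbolAsync) each accept only two of the four transfer kinds. The following sketch encodes that rule as a pre-flight check. The class and method names are hypothetical, and the numeric values mirror the CUDA cudaMemcpyKind enum; real JCuda code would use the constants from jcuda.runtime.cudaMemcpyKind instead. The range check on offset and count is an assumption implied by, but not spelled out in, the descriptions.

```java
final class SymbolCopyKindSketch {
    // Values mirror the CUDA cudaMemcpyKind enum (illustrative copies only).
    static final int HOST_TO_HOST = 0;
    static final int HOST_TO_DEVICE = 1;
    static final int DEVICE_TO_HOST = 2;
    static final int DEVICE_TO_DEVICE = 3;

    /** Kinds documented as valid when copying to a symbol. */
    static boolean isValidForToSymbol(int kind) {
        return kind == HOST_TO_DEVICE || kind == DEVICE_TO_DEVICE;
    }

    /** Kinds documented as valid when copying from a symbol. */
    static boolean isValidForFromSymbol(int kind) {
        return kind == DEVICE_TO_HOST || kind == DEVICE_TO_DEVICE;
    }

    /**
     * Assumed sanity check: the copied range [offset, offset + count) should
     * lie inside the symbol. Written to avoid overflow in offset + count.
     */
    static boolean fitsInSymbol(long symbolSize, long offset, long count) {
        return offset >= 0 && count >= 0
                && offset <= symbolSize
                && count <= symbolSize - offset;
    }
}
```

Running such a check on the host before the call gives an immediate, specific failure instead of an error code that may surface later from an asynchronous launch.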
dst
- Destination memory address
symbol
- Device symbol address or reference
count
- Size in bytes to copy
offset
- Offset from start of symbol in bytes
kind
- Type of transfer
stream
- Stream identifier
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
,
cudaMemcpy2D(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int)
,
cudaMemcpy2DToArray(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int)
,
cudaMemcpyFromArray(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DFromArray(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, int)
,
cudaMemcpy2DArrayToArray(jcuda.runtime.cudaArray, long, long, jcuda.runtime.cudaArray, long, long, long, long, int)
,
cudaMemcpyToSymbol(java.lang.String, jcuda.Pointer, long, long, int)
,
cudaMemcpyFromSymbol(jcuda.Pointer, java.lang.String, long, long, int)
,
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DAsync(jcuda.Pointer, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DToArrayAsync(jcuda.runtime.cudaArray, long, long, jcuda.Pointer, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyFromArrayAsync(jcuda.Pointer, jcuda.runtime.cudaArray, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpy2DFromArrayAsync(jcuda.Pointer, long, jcuda.runtime.cudaArray, long, long, long, long, int, jcuda.runtime.cudaStream_t)
,
cudaMemcpyToSymbolAsync(java.lang.String, jcuda.Pointer, long, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemset(Pointer mem, int c, long count)
cudaError_t cudaMemset ( void* devPtr, int value, size_t count )
Initializes or sets device memory to a value. Fills the first count bytes of the memory area pointed to by devPtr with the constant byte value value.
Note that this function is asynchronous with respect to the host unless devPtr refers to pinned host memory.
Note that this function may also return error codes from previous, asynchronous launches.
devPtr
- Pointer to device memory
value
- Value to set for each byte of specified memory
count
- Size in bytes to set
cudaMemset2D(jcuda.Pointer, long, int, long, long)
,
cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent)
,
cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t)
,
cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t)
,
cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t)
public static int cudaMemset2D(Pointer mem, long pitch, int c, long width, long height)
cudaError_t cudaMemset2D ( void* devPtr, size_t pitch, int value, size_t width, size_t height )
Initializes or sets device memory to a value. Sets to the specified value value a matrix (height rows of width bytes each) pointed to by devPtr. pitch is the width in bytes of the 2D array pointed to by devPtr, including any padding added to the end of each row. This function performs fastest when the pitch is one that has been passed back by cudaMallocPitch().
Note that this function is asynchronous with respect to the host unless devPtr refers to pinned host memory.
Note that this function may also return error codes from previous, asynchronous launches.
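Because cudaMemset and cudaMemset2D set a value per byte, filling an int buffer with the value 1 does not produce ints equal to 1. The host-only model below illustrates this and the role of pitch. It is a sketch under stated assumptions: the class Memset2DSketch is hypothetical, a byte array stands in for device memory, and the int argument is assumed to be reduced to its low 8 bits as in C memset.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class Memset2DSketch {
    /** Host-only model of a pitched 2D memset (hypothetical, not a JCuda call). */
    static void memset2D(byte[] mem, int pitch, int value, int width, int height) {
        if (width > pitch) {
            throw new IllegalArgumentException("width must not exceed pitch");
        }
        // Assumption: only the low 8 bits of value are used, as in C memset.
        byte fill = (byte) value;
        for (int row = 0; row < height; row++) {
            int start = row * pitch;
            // Only the first width bytes of each row are written;
            // the padding up to pitch is left untouched.
            Arrays.fill(mem, start, start + width, fill);
        }
    }

    public static void main(String[] args) {
        byte[] oneInt = new byte[4];
        memset2D(oneInt, 4, 1, 4, 1);
        // Every byte is 0x01, so the four bytes read back as 0x01010101.
        System.out.println(Integer.toHexString(ByteBuffer.wrap(oneInt).getInt()));
    }
}
```

To initialize multi-byte elements to a value other than zero or a repeated byte pattern, a small kernel or a host-side fill followed by a copy is the usual alternative.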
devPtr
- Pointer to 2D device memory
pitch
- Pitch in bytes of 2D device memory
value
- Value to set for each byte of specified memory
width
- Width of matrix set (columns in bytes)
height
- Height of matrix set (rows)
cudaMemset(jcuda.Pointer, int, long)
,
cudaMemset3D(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent)
,
cudaMemsetAsync(jcuda.Pointer, int, long, jcuda.runtime.cudaStream_t)
,
cudaMemset2DAsync(jcuda.Pointer, long, int, long, long, jcuda.runtime.cudaStream_t)
,
cudaMemset3DAsync(jcuda.runtime.cudaPitchedPtr, int, jcuda.runtime.cudaExtent, jcuda.runtime.cudaStream_t)
public static int cudaGetChannelDesc(cudaChannelFormatDesc desc, cudaArray array)
cudaError_t cudaGetChannelDesc ( cudaChannelFormatDesc* desc, cudaArray_const_t array )
Get the channel descriptor of an array. Returns in *desc the channel descriptor of the CUDA array array.
Note that this function may also return error codes from previous, asynchronous launches.
desc
- Channel format
array
- Memory array on device
cudaCreateChannelDesc(int, int, int, int, int)
,
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String)
,
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long)
,
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long)
,
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
,
cudaUnbindTexture(jcuda.runtime.textureReference)
,
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static cudaChannelFormatDesc cudaCreateChannelDesc(int x, int y, int z, int w, int cudaChannelFormatKind_f)
template < class T > cudaChannelFormatDesc cudaCreateChannelDesc ( void ) [inline]
[C++ API] Returns a channel descriptor using the specified format. Returns a channel descriptor with format f and number of bits of each component x, y, z, and w. The cudaChannelFormatDesc is defined as:
struct cudaChannelFormatDesc { int x, y, z, w; enum cudaChannelFormatKind f; };
where cudaChannelFormatKind is one of cudaChannelFormatKindSigned, cudaChannelFormatKindUnsigned, or cudaChannelFormatKindFloat.
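Since x, y, z, and w are bit counts rather than values, the element size of an array follows directly from the descriptor. The sketch below mirrors the struct in plain Java to show that arithmetic. ChannelDescSketch and bytesPerElement are hypothetical names, not JCuda API, and the kind values are copied from the CUDA cudaChannelFormatKind enum for illustration.

```java
final class ChannelDescSketch {
    // Values mirror the CUDA cudaChannelFormatKind enum (illustrative copies).
    static final int KIND_SIGNED = 0;
    static final int KIND_UNSIGNED = 1;
    static final int KIND_FLOAT = 2;

    final int x, y, z, w; // number of bits per component
    final int f;          // channel format kind

    ChannelDescSketch(int x, int y, int z, int w, int f) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
        this.f = f;
    }

    /** Size of one element in bytes, derived from the per-component bit counts. */
    int bytesPerElement() {
        return (x + y + z + w) / 8;
    }

    public static void main(String[] args) {
        // Four 32-bit float components, comparable to a float4 element.
        ChannelDescSketch float4 = new ChannelDescSketch(32, 32, 32, 32, KIND_FLOAT);
        // A single 32-bit float component.
        ChannelDescSketch float1 = new ChannelDescSketch(32, 0, 0, 0, KIND_FLOAT);
        System.out.println(float4.bytesPerElement() + " " + float1.bytesPerElement());
    }
}
```

With the real binding, the equivalent descriptor for a single float channel would be requested with 32 bits for x, 0 for the other components, and the float kind.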
x
- X component
y
- Y component
z
- Z component
w
- W component
f
- Channel format
cudaCreateChannelDesc(int, int, int, int, int)
,
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray)
,
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String)
,
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long)
,
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long)
,
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
,
cudaUnbindTexture(jcuda.runtime.textureReference)
,
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaGetLastError()
cudaError_t cudaGetLastError ( void )
Returns the last error from a runtime call. Returns the last error that has been produced by any of the runtime calls in the same host thread and resets it to cudaSuccess.
Note that this function may also return error codes from previous, asynchronous launches.
cudaPeekAtLastError()
,
cudaGetErrorString(int)
public static int cudaPeekAtLastError()
cudaError_t cudaPeekAtLastError ( void )
Returns the last error from a runtime call. Returns the last error that has been produced by any of the runtime calls in the same host thread. Note that this call does not reset the error to cudaSuccess like cudaGetLastError().
Note that this function may also return error codes from previous, asynchronous launches.
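The difference between cudaGetLastError and cudaPeekAtLastError is only whether the stored error is reset. A small host-only model makes the contrast explicit. LastErrorSketch is a hypothetical class that imitates the described per-thread behavior; it does not call the CUDA runtime, and the error code 2 used in the demo is an arbitrary non-zero placeholder.

```java
final class LastErrorSketch {
    static final int SUCCESS = 0; // same numeric value as cudaSuccess

    // The documentation describes the error as belonging to the host thread,
    // so the model keeps one slot per thread.
    private static final ThreadLocal<Integer> LAST =
            ThreadLocal.withInitial(() -> SUCCESS);

    /** Stands in for a runtime call that failed with the given code. */
    static void record(int error) {
        if (error != SUCCESS) {
            LAST.set(error);
        }
    }

    /** Reads the stored error without clearing it. */
    static int peekAtLastError() {
        return LAST.get();
    }

    /** Reads the stored error and resets it to SUCCESS. */
    static int getLastError() {
        int error = LAST.get();
        LAST.set(SUCCESS);
        return error;
    }

    public static void main(String[] args) {
        record(2);
        System.out.println(peekAtLastError()); // still stored afterwards
        System.out.println(getLastError());    // returned and cleared
        System.out.println(getLastError());    // now SUCCESS
    }
}
```

In practice this means a diagnostic helper that only wants to log the state should peek, while the code path that handles the failure should get.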
cudaGetLastError()
,
cudaGetErrorString(int)
public static String cudaGetErrorName(int error)
error
- Error code to convert to string
cudaGetErrorString(int)
,
cudaGetLastError()
,
cudaError
public static String cudaGetErrorString(int error)
const char* cudaGetErrorString ( cudaError_t error )
Returns the message string from an error code.
Returns a char* pointer to a NULL-terminated string.
cudaGetLastError()
,
cudaPeekAtLastError()
,
cudaError
public static int cudaStreamCreate(cudaStream_t stream)
cudaError_t cudaStreamCreate ( cudaStream_t* pStream )
Create an asynchronous stream. Creates a new asynchronous stream.
Note that this function may also return error codes from previous, asynchronous launches.
pStream
- Pointer to new stream identifier
cudaStreamCreate(jcuda.runtime.cudaStream_t)
,
cudaStreamCreateWithFlags(jcuda.runtime.cudaStream_t, int)
,
cudaStreamQuery(jcuda.runtime.cudaStream_t)
,
cudaStreamSynchronize(jcuda.runtime.cudaStream_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
,
cudaStreamAddCallback(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaStreamCallback, java.lang.Object, int)
,
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaStreamCreateWithFlags(cudaStream_t pStream, int flags)
cudaError_t cudaStreamCreateWithFlags ( cudaStream_t* pStream, unsigned int flags )
Create an asynchronous stream. Creates a new asynchronous stream. The flags argument determines the behaviors of the stream. Valid values for flags are:
cudaStreamDefault: Default stream creation flag.
cudaStreamNonBlocking: Specifies that work running in the created stream may run concurrently with work in stream 0 (the NULL stream), and that the created stream should perform no implicit synchronization with stream 0.
Note that this function may also return error codes from previous, asynchronous launches.
pStream
- Pointer to new stream identifier
flags
- Parameters for stream creation
cudaStreamCreate(jcuda.runtime.cudaStream_t)
,
cudaStreamQuery(jcuda.runtime.cudaStream_t)
,
cudaStreamSynchronize(jcuda.runtime.cudaStream_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
,
cudaStreamAddCallback(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaStreamCallback, java.lang.Object, int)
,
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaStreamCreateWithPriority(cudaStream_t pStream, int flags, int priority)
public static int cudaStreamGetPriority(cudaStream_t hStream, int[] priority)
public static int cudaStreamGetFlags(cudaStream_t hStream, int[] flags)
public static int cudaCtxResetPersistingL2Cache()
Resets all persisting lines in cache to normal status. Takes effect on function return.
Returns cudaSuccess.
Note that this function may also return error codes from previous, asynchronous launches.
cudaAccessPolicyWindow
public static int cudaStreamCopyAttributes(cudaStream_t dst, cudaStream_t src)
Copies attributes from source stream to destination stream. Copies attributes from source stream src to destination stream dst. Both streams must have the same context. For attributes see cudaStreamAttrID.
Returns cudaSuccess or cudaErrorNotSupported.
Note that this function may also return error codes from previous, asynchronous launches.
dst
- Destination stream
src
- Source stream
cudaAccessPolicyWindow
public static int cudaStreamGetAttribute(cudaStream_t hStream, int attr, cudaStreamAttrValue value_out)
Queries stream attribute. Queries attribute attr from hStream and stores it in the corresponding member of value_out.
Returns cudaSuccess, cudaErrorInvalidValue, or cudaErrorInvalidResourceHandle.
Note that this function may also return error codes from previous, asynchronous launches.
hStream
- Input parameter
attr
- Input parameter
value_out
- Output parameter
cudaAccessPolicyWindow
public static int cudaStreamSetAttribute(cudaStream_t hStream, int attr, cudaStreamAttrValue value)
Sets stream attribute. Sets attribute attr on hStream from the corresponding attribute of value. The updated attribute will be applied to subsequent work submitted to the stream. It will not affect previously submitted work.
Returns cudaSuccess, cudaErrorInvalidValue, or cudaErrorInvalidResourceHandle.
Note that this function may also return error codes from previous, asynchronous launches.
hStream
- Output parameter
attr
- Input parameter
value
- Input parameter
cudaAccessPolicyWindow
public static int cudaStreamDestroy(cudaStream_t stream)
cudaError_t cudaStreamDestroy ( cudaStream_t stream )
Destroys and cleans up an asynchronous stream. Destroys and cleans up the asynchronous stream specified by stream.
In case the device is still doing work in the stream stream when cudaStreamDestroy() is called, the function will return immediately and the resources associated with stream will be released automatically once the device has completed all work in stream.
Note that this function may also return error codes from previous, asynchronous launches.
stream
- Stream identifier
cudaStreamCreate(jcuda.runtime.cudaStream_t)
,
cudaStreamCreateWithFlags(jcuda.runtime.cudaStream_t, int)
,
cudaStreamQuery(jcuda.runtime.cudaStream_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
,
cudaStreamSynchronize(jcuda.runtime.cudaStream_t)
,
cudaStreamAddCallback(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaStreamCallback, java.lang.Object, int)
public static int cudaStreamWaitEvent(cudaStream_t stream, cudaEvent_t event, int flags)
cudaError_t cudaStreamWaitEvent ( cudaStream_t stream, cudaEvent_t event, unsigned int flags )
Make a compute stream wait on an event. Makes all future work submitted to stream wait until event reports completion before beginning execution. This synchronization will be performed efficiently on the device. The event event may be from a different context than stream, in which case this function will perform cross-device synchronization.
The stream stream will wait only for the completion of the most recent host call to cudaEventRecord() on event. Once this call has returned, any functions (including cudaEventRecord() and cudaEventDestroy()) may be called on event again, and the subsequent calls will not have any effect on stream.
If stream is NULL, any future work submitted in any stream will wait for event to complete before beginning execution. This effectively creates a barrier for all future work submitted to the device on this thread.
If cudaEventRecord() has not been called on event, this call acts as if the record has already completed, and so is a functional no-op.
Note that this function may also return error codes from previous, asynchronous launches.
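The rule that cudaStreamWaitEvent only observes the most recent cudaEventRecord at the time of the call can be pictured as taking a snapshot. The sketch below models just that bookkeeping with counters. EventWaitSketch and its nested classes are hypothetical and contain no synchronization or device work; they only show which record a wait refers to.

```java
final class EventWaitSketch {
    /** Stand-in for an event: counts how many times it has been recorded. */
    static final class Event {
        int recordCount = 0;

        void record() {
            recordCount++;
        }
    }

    /** Stand-in for a stream: remembers which record it was told to wait for. */
    static final class Stream {
        // 0 means there is nothing to wait for, matching the documented no-op
        // when the event has never been recorded.
        int waitingOnRecord = 0;

        void waitEvent(Event event) {
            // Snapshot taken at call time; later records do not change it.
            waitingOnRecord = event.recordCount;
        }
    }

    public static void main(String[] args) {
        Event event = new Event();
        Stream stream = new Stream();
        event.record();
        stream.waitEvent(event);
        event.record(); // re-recording afterwards leaves the wait untouched
        System.out.println(stream.waitingOnRecord + " of " + event.recordCount);
    }
}
```

The practical consequence is that an event can be reused for the next iteration of a loop as soon as the wait call has returned.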
stream
- Stream to wait
event
- Event to wait on
flags
- Parameters for the operation (must be 0)
cudaStreamCreate(jcuda.runtime.cudaStream_t)
,
cudaStreamCreateWithFlags(jcuda.runtime.cudaStream_t, int)
,
cudaStreamQuery(jcuda.runtime.cudaStream_t)
,
cudaStreamSynchronize(jcuda.runtime.cudaStream_t)
,
cudaStreamAddCallback(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaStreamCallback, java.lang.Object, int)
,
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaStreamAddCallback(cudaStream_t stream, cudaStreamCallback callback, Object userData, int flags)
cudaError_t cudaStreamAddCallback ( cudaStream_t stream, cudaStreamCallback_t callback, void* userData, unsigned int flags )
Add a callback to a compute stream. Adds a callback to be called on the host after all currently enqueued items in the stream have completed. For each cudaStreamAddCallback call, a callback will be executed exactly once. The callback will block later work in the stream until it is finished.
The callback may be passed cudaSuccess or an error code. In the event of a device error, all subsequently executed callbacks will receive an appropriate cudaError_t.
Callbacks must not make any CUDA API calls. Attempting to use CUDA APIs will result in cudaErrorNotPermitted. Callbacks must not perform any synchronization that may depend on outstanding device work or other callbacks that are not mandated to run earlier. Callbacks without a mandated order (in independent streams) execute in undefined order and may be serialized.
This API requires compute capability 1.1 or greater. See cudaDeviceGetAttribute or cudaGetDeviceProperties to query compute capability. Calling this API with an earlier compute version will return cudaErrorNotSupported.
Note that this function may also return error codes from previous, asynchronous launches.
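The ordering guarantee for cudaStreamAddCallback (the callback runs after everything already enqueued, and work enqueued afterwards waits for it) resembles an in-order task queue. The following sketch uses a single-threaded executor as a stand-in for one stream. It is an analogy only: StreamCallbackSketch and the task labels are hypothetical, and no CUDA calls are made.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class StreamCallbackSketch {
    /** Enqueues work, a callback, then more work, and returns the observed order. */
    static List<String> runInOrder() {
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        // One worker thread: tasks run strictly in submission order,
        // which is the property a single stream provides.
        ExecutorService stream = Executors.newSingleThreadExecutor();
        stream.execute(() -> log.add("copy"));
        stream.execute(() -> log.add("kernel"));
        // The callback is just another queue entry: it runs once, after the
        // entries before it, and delays the entries after it.
        stream.execute(() -> log.add("callback"));
        stream.execute(() -> log.add("later work"));
        stream.shutdown();
        try {
            if (!stream.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("queued work did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(runInOrder());
    }
}
```

Because the callback occupies a slot in the queue, a slow callback stalls the stream, which is one reason to keep callbacks short and free of CUDA API calls as the description requires.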
stream
- Stream to add callback to
callback
- The function to call once preceding stream operations are complete
userData
- User specified data to be passed to the callback function
flags
- Reserved for future use, must be 0
cudaStreamCreate(jcuda.runtime.cudaStream_t)
,
cudaStreamCreateWithFlags(jcuda.runtime.cudaStream_t, int)
,
cudaStreamQuery(jcuda.runtime.cudaStream_t)
,
cudaStreamSynchronize(jcuda.runtime.cudaStream_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
,
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaStreamSynchronize(cudaStream_t stream)
cudaError_t cudaStreamSynchronize ( cudaStream_t stream )
Waits for stream tasks to complete. Blocks until stream has completed all operations. If the cudaDeviceScheduleBlockingSync flag was set for this device, the host thread will block until the stream is finished with all of its tasks.
Note that this function may also return error codes from previous, asynchronous launches.
stream
- Stream identifier
cudaStreamCreate(jcuda.runtime.cudaStream_t)
,
cudaStreamCreateWithFlags(jcuda.runtime.cudaStream_t, int)
,
cudaStreamQuery(jcuda.runtime.cudaStream_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
,
cudaStreamAddCallback(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaStreamCallback, java.lang.Object, int)
,
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaStreamQuery(cudaStream_t stream)
cudaError_t cudaStreamQuery ( cudaStream_t stream )
Queries an asynchronous stream for completion status. Returns cudaSuccess if all operations in stream have completed, or cudaErrorNotReady if not.
Note that this function may also return error codes from previous, asynchronous launches.
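cudaStreamQuery and cudaStreamSynchronize answer the same question, one by polling and one by blocking. The sketch below contrasts the two with a host-only model built on java.util.concurrent. StreamQuerySketch is a hypothetical class; its boolean query result stands in for the cudaSuccess versus cudaErrorNotReady return codes, and a single worker thread stands in for the stream.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class StreamQuerySketch implements AutoCloseable {
    // Single worker thread: tasks complete in submission order, so the
    // state of the last task describes the whole queue.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private Future<?> last = null;

    void enqueue(Runnable task) {
        last = worker.submit(task);
    }

    /** Non-blocking check: true once everything enqueued so far has finished. */
    boolean query() {
        return last == null || last.isDone();
    }

    /** Blocks the caller until everything enqueued so far has finished. */
    void synchronize() {
        if (last == null) {
            return;
        }
        try {
            last.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    @Override
    public void close() {
        worker.shutdown();
    }
}
```

Polling suits a host thread that has other useful work to interleave; blocking is simpler when the result is needed before anything else can proceed.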
stream
- Stream identifier
cudaStreamCreate(jcuda.runtime.cudaStream_t)
,
cudaStreamCreateWithFlags(jcuda.runtime.cudaStream_t, int)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
,
cudaStreamSynchronize(jcuda.runtime.cudaStream_t)
,
cudaStreamAddCallback(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaStreamCallback, java.lang.Object, int)
,
cudaStreamDestroy(jcuda.runtime.cudaStream_t)
public static int cudaStreamAttachMemAsync(cudaStream_t stream, Pointer devPtr, long length, int flags)
public static int cudaEventCreate(cudaEvent_t event)
cudaError_t cudaEventCreate ( cudaEvent_t* event, unsigned int flags )
[C++ API] Creates an event object with the specified flags. Valid flags include:
cudaEventDefault: Default event creation flag.
cudaEventBlockingSync: Specifies that event should use blocking synchronization. A host thread that uses cudaEventSynchronize() to wait on an event created with this flag will block until the event actually completes.
cudaEventDisableTiming: Specifies that the created event does not need to record timing data. Events created with this flag specified and the cudaEventBlockingSync flag not specified will provide the best performance when used with cudaStreamWaitEvent() and cudaEventQuery().
Note that this function may also return error codes from previous, asynchronous launches.
event
- Newly created event
flags
- Flags for new event
cudaEventCreate(jcuda.runtime.cudaEvent_t)
,
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int)
,
cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t)
,
cudaEventQuery(jcuda.runtime.cudaEvent_t)
,
cudaEventSynchronize(jcuda.runtime.cudaEvent_t)
,
cudaEventDestroy(jcuda.runtime.cudaEvent_t)
,
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
public static int cudaEventCreateWithFlags(cudaEvent_t event, int flags)
cudaError_t cudaEventCreateWithFlags ( cudaEvent_t* event, unsigned int flags )
Creates an event object with the specified flags. Valid flags include:
cudaEventDefault: Default event creation flag.
cudaEventBlockingSync: Specifies that event should use blocking synchronization. A host thread that uses cudaEventSynchronize() to wait on an event created with this flag will block until the event actually completes.
cudaEventDisableTiming: Specifies that the created event does not need to record timing data. Events created with this flag specified and the cudaEventBlockingSync flag not specified will provide the best performance when used with cudaStreamWaitEvent() and cudaEventQuery().
cudaEventInterprocess: Specifies that the created event may be used as an interprocess event by cudaIpcGetEventHandle(). cudaEventInterprocess must be specified along with cudaEventDisableTiming.
Note that this function may also return error codes from previous, asynchronous launches.
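The flag rules above can be checked on the host before calling cudaEventCreateWithFlags. The following is a minimal sketch, not part of JCuda: the class name EventFlags and the method validate are hypothetical, and the constant values are local copies that are assumed to match the ones exposed by jcuda.runtime.JCuda.

```java
// Hypothetical helper, not part of JCuda. The constants are local copies of
// the CUDA runtime event flag values; they are assumed to match the fields
// of jcuda.runtime.JCuda and should be verified against the version in use.
final class EventFlags {
    static final int cudaEventDefault       = 0x00;
    static final int cudaEventBlockingSync  = 0x01;
    static final int cudaEventDisableTiming = 0x02;
    static final int cudaEventInterprocess  = 0x04;

    private EventFlags() {}

    /**
     * Checks the one combination rule stated in the documentation:
     * cudaEventInterprocess must be combined with cudaEventDisableTiming.
     * Returns null if the rule holds, otherwise a short reason.
     */
    static String validate(int flags) {
        boolean interprocess = (flags & cudaEventInterprocess) != 0;
        boolean timingDisabled = (flags & cudaEventDisableTiming) != 0;
        if (interprocess && !timingDisabled) {
            return "cudaEventInterprocess requires cudaEventDisableTiming";
        }
        return null;
    }
}
```

A caller would combine flags with the bitwise OR operator, for example EventFlags.cudaEventInterprocess | EventFlags.cudaEventDisableTiming, and pass the result on only if validate returns null.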
event
- Newly created event
flags
- Flags for new event
cudaEventCreate(jcuda.runtime.cudaEvent_t)
,
cudaEventSynchronize(jcuda.runtime.cudaEvent_t)
,
cudaEventDestroy(jcuda.runtime.cudaEvent_t)
,
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
public static int cudaEventRecord(cudaEvent_t event, cudaStream_t stream)
cudaError_t cudaEventRecord ( cudaEvent_t event, cudaStream_t stream = 0 )
Records an event. If stream is non-zero, the event is recorded after all preceding operations in stream have been completed; otherwise, it is recorded after all preceding operations in the CUDA context have been completed. Since the operation is asynchronous, cudaEventQuery() and/or cudaEventSynchronize() must be used to determine when the event has actually been recorded.
If cudaEventRecord() has previously been called on event, then this call will overwrite any existing state in event. Any subsequent calls which examine the status of event will only examine the completion of this most recent call to cudaEventRecord().
Note that this function may also return error codes from previous, asynchronous launches.
event
- Event to record
stream
- Stream in which to record event
cudaEventCreate(jcuda.runtime.cudaEvent_t)
,
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int)
,
cudaEventQuery(jcuda.runtime.cudaEvent_t)
,
cudaEventSynchronize(jcuda.runtime.cudaEvent_t)
,
cudaEventDestroy(jcuda.runtime.cudaEvent_t)
,
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
public static int cudaEventQuery(cudaEvent_t event)
cudaError_t cudaEventQuery ( cudaEvent_t event )
Queries an event's status. Query the status of all device work preceding the most recent call to cudaEventRecord() (in the appropriate compute streams, as specified by the arguments to cudaEventRecord()).
If this work has successfully been completed by the device, or if cudaEventRecord() has not been called on event, then cudaSuccess is returned. If this work has not yet been completed by the device then cudaErrorNotReady is returned.
Note that this function may also return error codes from previous, asynchronous launches.
event
- Event to query
cudaEventCreate(jcuda.runtime.cudaEvent_t)
,
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int)
,
cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t)
,
cudaEventSynchronize(jcuda.runtime.cudaEvent_t)
,
cudaEventDestroy(jcuda.runtime.cudaEvent_t)
,
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)
public static int cudaEventSynchronize(cudaEvent_t event)
cudaError_t cudaEventSynchronize ( cudaEvent_t event )
Waits for an event to complete. Wait until the completion of all device work preceding the most recent call to cudaEventRecord() (in the appropriate compute streams, as specified by the arguments to cudaEventRecord()).
If cudaEventRecord() has not been called on event, cudaSuccess is returned immediately.
Waiting for an event that was created with the cudaEventBlockingSync flag will cause the calling CPU thread to block until the event has been completed by the device. If the cudaEventBlockingSync flag has not been set, then the CPU thread will busy-wait until the event has been completed by the device.
Note that this function may also return error codes from previous, asynchronous launches.
event
- Event to wait for
cudaEventCreate(jcuda.runtime.cudaEvent_t)
,
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int)
,
cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t)
,
cudaEventQuery(jcuda.runtime.cudaEvent_t)
,
cudaEventDestroy(jcuda.runtime.cudaEvent_t)
,
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)
public static int cudaEventDestroy(cudaEvent_t event)
cudaError_t cudaEventDestroy ( cudaEvent_t event )
Destroys an event object. Destroys the event specified by event.
In case event has been recorded but has not yet been completed when cudaEventDestroy() is called, the function will return immediately and the resources associated with event will be released automatically once the device has completed event.
Note that this function may also return error codes from previous, asynchronous launches.
event
- Event to destroy
cudaEventCreate(jcuda.runtime.cudaEvent_t)
,
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int)
,
cudaEventQuery(jcuda.runtime.cudaEvent_t)
,
cudaEventSynchronize(jcuda.runtime.cudaEvent_t)
,
cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t)
,
cudaEventElapsedTime(float[], jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaEvent_t)
public static int cudaEventElapsedTime(float[] ms, cudaEvent_t start, cudaEvent_t end)
cudaError_t cudaEventElapsedTime ( float* ms, cudaEvent_t start, cudaEvent_t end )
Computes the elapsed time between events. Computes the elapsed time between two events (in milliseconds with a resolution of around 0.5 microseconds).
If either event was last recorded in a non-NULL stream, the resulting time may be greater than expected (even if both used the same stream handle). This happens because the cudaEventRecord() operation takes place asynchronously and there is no guarantee that the measured latency is actually just between the two events. Any number of other different stream operations could execute in between the two measured events, thus altering the timing in a significant way.
If cudaEventRecord() has not been called on either event, then cudaErrorInvalidResourceHandle is returned. If cudaEventRecord() has been called on both events but one or both of them has not yet been completed (that is, cudaEventQuery() would return cudaErrorNotReady on at least one of the events), cudaErrorNotReady is returned. If either event was created with the cudaEventDisableTiming flag, then this function will return cudaErrorInvalidResourceHandle.
Note that this function may also return error codes from previous, asynchronous launches.
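The return-code rules for cudaEventElapsedTime can be summarized as a small decision function. This is only a host-side model of the documented behavior, not a call into JCuda: the class ElapsedTimeRules, the enum Result, and the EventState snapshot type are hypothetical names introduced for illustration.

```java
// Hypothetical model of the documented return-code rules; it does not call
// into JCuda and does not use the numeric cudaError values.
final class ElapsedTimeRules {
    enum Result { SUCCESS, NOT_READY, INVALID_RESOURCE_HANDLE }

    /** Snapshot of what is known about one event (illustrative only). */
    static final class EventState {
        final boolean recorded;        // cudaEventRecord() was called on it
        final boolean completed;       // the device has completed the event
        final boolean timingDisabled;  // created with cudaEventDisableTiming

        EventState(boolean recorded, boolean completed, boolean timingDisabled) {
            this.recorded = recorded;
            this.completed = completed;
            this.timingDisabled = timingDisabled;
        }
    }

    private ElapsedTimeRules() {}

    /** Returns the result the documentation describes for the given states. */
    static Result expected(EventState start, EventState end) {
        // Events without timing data cannot be used for measurement.
        if (start.timingDisabled || end.timingDisabled) {
            return Result.INVALID_RESOURCE_HANDLE;
        }
        // Both events must have been recorded at least once.
        if (!start.recorded || !end.recorded) {
            return Result.INVALID_RESOURCE_HANDLE;
        }
        // Both recorded, but at least one is still pending on the device.
        if (!start.completed || !end.completed) {
            return Result.NOT_READY;
        }
        return Result.SUCCESS;
    }
}
```

In practice this suggests calling cudaEventSynchronize on the end event before cudaEventElapsedTime, so that the NOT_READY case does not arise.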
ms
- Time between start and end in ms
start
- Starting event
end
- Ending event
cudaEventCreate(jcuda.runtime.cudaEvent_t)
,
cudaEventCreateWithFlags(jcuda.runtime.cudaEvent_t, int)
,
cudaEventQuery(jcuda.runtime.cudaEvent_t)
,
cudaEventSynchronize(jcuda.runtime.cudaEvent_t)
,
cudaEventDestroy(jcuda.runtime.cudaEvent_t)
,
cudaEventRecord(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaStream_t)
public static int cudaDeviceReset()
cudaError_t cudaDeviceReset ( void )
Destroy all allocations and reset all state on the current device in the current process. Explicitly destroys and cleans up all resources associated with the current device in the current process. Any subsequent API call to this device will reinitialize the device.
Note that this function will reset the device immediately. It is the caller's responsibility to ensure that the device is not being accessed by any other host threads from the process when this function is called.
Note that this function may also return error codes from previous, asynchronous launches.
cudaDeviceSynchronize()
public static int cudaDeviceSynchronize()
cudaError_t cudaDeviceSynchronize ( void )
Wait for compute device to finish. Blocks until the device has completed all preceding requested tasks. cudaDeviceSynchronize() returns an error if one of the preceding tasks has failed. If the cudaDeviceScheduleBlockingSync flag was set for this device, the host thread will block until the device has finished its work.
Note that this function may also return error codes from previous, asynchronous launches.
cudaDeviceReset()
public static int cudaDeviceSetLimit(int limit, long value)
cudaError_t cudaDeviceSetLimit ( cudaLimit limit, size_t value )
Set resource limits. Setting limit to value is a request by the application to update the current limit maintained by the device. The driver is free to modify the requested value to meet h/w requirements (this could be clamping to minimum or maximum values, rounding up to nearest element size, etc). The application can use cudaDeviceGetLimit() to find out exactly what the limit has been set to.
Setting each cudaLimit has its own specific restrictions, so each is discussed here.
cudaLimitStackSize controls the stack size in bytes of each GPU thread. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitPrintfFifoSize controls the size in bytes of the shared FIFO used by the printf() and fprintf() device system calls. Setting cudaLimitPrintfFifoSize must be performed before launching any kernel that uses the printf() or fprintf() device system calls, otherwise cudaErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitMallocHeapSize controls the size in bytes of the heap used by the malloc() and free() device system calls. Setting cudaLimitMallocHeapSize must be performed before launching any kernel that uses the malloc() or free() device system calls, otherwise cudaErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitDevRuntimeSyncDepth controls the maximum nesting depth of a grid at which a thread can safely call cudaDeviceSynchronize(). Setting this limit must be performed before any launch of a kernel that uses the device runtime and calls cudaDeviceSynchronize() above the default sync depth, two levels of grids. Calls to cudaDeviceSynchronize() will fail with error code cudaErrorSyncDepthExceeded if the limitation is violated. This limit can be set smaller than the default or up to the maximum launch depth of 24. When setting this limit, keep in mind that additional levels of sync depth require the runtime to reserve large amounts of device memory which can no longer be used for user allocations. If these reservations of device memory fail, cudaDeviceSetLimit will return cudaErrorMemoryAllocation, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitDevRuntimePendingLaunchCount controls the maximum number of outstanding device runtime launches that can be made from the current device. A grid is outstanding from the point of launch up until the grid is known to have been completed. Device runtime launches which violate this limitation fail and return cudaErrorLaunchPendingCountExceeded when cudaGetLastError() is called after launch. If more pending launches than the default (2048 launches) are needed for a module using the device runtime, this limit can be increased. Keep in mind that being able to sustain additional pending launches will require the runtime to reserve larger amounts of device memory upfront which can no longer be used for allocations. If these reservations fail, cudaDeviceSetLimit will return cudaErrorMemoryAllocation, and the limit can be reset to a lower value. This limit is only applicable to devices of compute capability 3.5 and higher. Attempting to set this limit on devices of compute capability less than 3.5 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitMaxL2FetchGranularity controls the L2 cache fetch granularity. Values can range from 0B to 128B. This is purely a performance hint and it can be ignored or clamped depending on the platform.
Note that this function may also return error codes from previous, asynchronous launches.
limit
- Limit to set
value
- Size of limit
cudaDeviceGetLimit(long[], int)
public static int cudaDeviceGetLimit(long[] pValue, int limit)
cudaError_t cudaDeviceGetLimit ( size_t* pValue, cudaLimit limit )
Returns resource limits. Returns in *pValue the current size of limit. The supported cudaLimit values are:
cudaLimitStackSize: stack size in bytes of each GPU thread.
cudaLimitPrintfFifoSize: size in bytes of the shared FIFO used by the printf() and fprintf() device system calls.
cudaLimitMallocHeapSize: size in bytes of the heap used by the malloc() and free() device system calls.
cudaLimitDevRuntimeSyncDepth: maximum grid depth at which a thread can issue the device runtime call cudaDeviceSynchronize() to wait on child grid launches to complete.
cudaLimitDevRuntimePendingLaunchCount: maximum number of outstanding device runtime launches.
cudaLimitMaxL2FetchGranularity: L2 cache fetch granularity.
Note that this function may also return error codes from previous, asynchronous launches.
pValue
- Returned size of the limit
limit
- Limit to query
cudaDeviceSetLimit(int, long)
public static int cudaDeviceGetCacheConfig(int[] pCacheConfig)
cudaError_t cudaDeviceGetCacheConfig ( cudaFuncCache ** pCacheConfig )
Returns the preferred cache configuration for the current device. On devices where the L1 cache and shared memory use the same hardware resources, this returns through pCacheConfig the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.
This will return a pCacheConfig of cudaFuncCachePreferNone on devices where the size of the L1 cache and shared memory are fixed.
The supported cache configurations are:
cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory
Note that this function may also return error codes from previous, asynchronous launches.
pCacheConfig
- Returned cache configuration
cudaDeviceSetCacheConfig(int)
public static int cudaDeviceGetStreamPriorityRange(int[] leastPriority, int[] greatestPriority)
public static int cudaDeviceGetSharedMemConfig(int[] pConfig)
cudaError_t cudaDeviceGetSharedMemConfig ( cudaSharedMemConfig ** pConfig )
Returns the shared memory configuration for the current device. This function will return in pConfig the current size of shared memory banks on the current device. On devices with configurable shared memory banks, cudaDeviceSetSharedMemConfig can be used to change this setting, so that all subsequent kernel launches will by default use the new bank size. When cudaDeviceGetSharedMemConfig is called on devices without configurable shared memory, it will return the fixed bank size of the hardware.
The returned bank configurations can be either:
cudaSharedMemBankSizeFourByte - shared memory bank width is four bytes.
cudaSharedMemBankSizeEightByte - shared memory bank width is eight bytes.
Note that this function may also return error codes from previous, asynchronous launches.
pConfig
- Returned shared memory configuration
cudaDeviceSetCacheConfig(int)
,
cudaDeviceGetCacheConfig(int[])
,
cudaDeviceSetSharedMemConfig(int)
public static int cudaDeviceSetSharedMemConfig(int config)
cudaError_t cudaDeviceSetSharedMemConfig ( cudaSharedMemConfig config )
Sets the shared memory configuration for the current device. On devices with configurable shared memory banks, this function will set the shared memory bank size which is used for all subsequent kernel launches. Any per-function setting of shared memory set via cudaFuncSetSharedMemConfig will override the device wide setting.
Changing the shared memory configuration between launches may introduce a device side synchronization point.
Changing the shared memory bank size will not increase shared memory usage or affect occupancy of kernels, but may have major effects on performance. Larger bank sizes will allow for greater potential bandwidth to shared memory, but will change what kinds of accesses to shared memory will result in bank conflicts.
This function will do nothing on devices with fixed shared memory bank size.
The supported bank configurations are:
cudaSharedMemBankSizeDefault: set bank width to the device default (currently, four bytes)
cudaSharedMemBankSizeFourByte: set shared memory bank width to be four bytes natively.
cudaSharedMemBankSizeEightByte: set shared memory bank width to be eight bytes natively.
Note that this function may also return error codes from previous, asynchronous launches.
config
- Requested shared memory configuration
cudaDeviceSetCacheConfig(int)
,
cudaDeviceGetCacheConfig(int[])
,
cudaDeviceGetSharedMemConfig(int[])
public static int cudaDeviceSetCacheConfig(int cacheConfig)
cudaError_t cudaDeviceSetCacheConfig ( cudaFuncCache cacheConfig )
Sets the preferred cache configuration for the current device. On devices where the L1 cache and shared memory use the same hardware resources, this sets through cacheConfig the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via cudaFuncSetCacheConfig (C API or C++ API) will be preferred over this device-wide setting. Setting the device-wide cache configuration to cudaFuncCachePreferNone will cause subsequent kernel launches to prefer to not change the cache configuration unless required to launch the kernel.
This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.
Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.
The supported cache configurations are:
cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory
Note that this function may also return error codes from previous, asynchronous launches.
cacheConfig
- Requested cache configuration
cudaDeviceGetCacheConfig(int[])
,
cudaDeviceSetCacheConfig(int)
public static int cudaDeviceGetByPCIBusId(int[] device, String pciBusId)
cudaError_t cudaDeviceGetByPCIBusId ( int* device, char* pciBusId )
Returns a handle to a compute device. Returns in *device a device ordinal given a PCI bus ID string.
Note that this function may also return error codes from previous, asynchronous launches.
device
- Returned device ordinal
pciBusId
- String in one of the following forms: [domain]:[bus]:[device].[function], [domain]:[bus]:[device], or [bus]:[device].[function], where domain, bus, device, and function are all hexadecimal values
cudaDeviceGetPCIBusId(java.lang.String[], int, int)
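To make the three accepted string forms concrete, here is a minimal host-side parser sketch. It is not part of JCuda: the class PciBusId and its parse method are hypothetical, and treating an omitted domain or function as 0 is an assumption of this sketch rather than documented behavior.

```java
// Hypothetical helper, not part of JCuda: parses the three PCI bus ID forms
// listed for cudaDeviceGetByPCIBusId. All fields are hexadecimal in the input.
final class PciBusId {
    final int domain;
    final int bus;
    final int device;
    final int function;

    PciBusId(int domain, int bus, int device, int function) {
        this.domain = domain;
        this.bus = bus;
        this.device = device;
        this.function = function;
    }

    static PciBusId parse(String s) {
        String head = s;
        int function = 0;           // assumption: missing function means 0
        boolean hasFunction = false;
        int dot = s.indexOf('.');
        if (dot >= 0) {
            function = hex(s.substring(dot + 1));
            head = s.substring(0, dot);
            hasFunction = true;
        }
        String[] parts = head.split(":", -1);
        if (parts.length == 3) {
            // [domain]:[bus]:[device] with or without .[function]
            return new PciBusId(hex(parts[0]), hex(parts[1]), hex(parts[2]), function);
        }
        if (parts.length == 2 && hasFunction) {
            // [bus]:[device].[function]; assumption: missing domain means 0
            return new PciBusId(0, hex(parts[0]), hex(parts[1]), function);
        }
        throw new IllegalArgumentException("Unrecognized PCI bus ID: " + s);
    }

    private static int hex(String token) {
        // Throws NumberFormatException (an IllegalArgumentException) on bad input.
        return Integer.parseInt(token, 16);
    }
}
```

Validating the string this way before passing it to cudaDeviceGetByPCIBusId gives an early, descriptive failure for malformed input instead of a CUDA error code.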
public static int cudaDeviceGetPCIBusId(String[] pciBusId, int len, int device)
cudaError_t cudaDeviceGetPCIBusId ( char* pciBusId, int len, int device )
Returns a PCI Bus Id string for the device. Returns an ASCII string identifying the device specified by device in the NULL-terminated string pointed to by pciBusId. len specifies the maximum length of the string that may be returned.
Note that this function may also return error codes from previous, asynchronous launches.
pciBusId
- Returned identifier string for the device in the following format: [domain]:[bus]:[device].[function], where domain, bus, device, and function are all hexadecimal values. pciBusId should be large enough to store 13 characters including the NULL-terminator.
len
- Maximum length of string to store in pciBusId
device
- Device to get identifier string for
cudaDeviceGetByPCIBusId(int[], java.lang.String)
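The 13-character requirement follows from the layout of the returned string: 4 hexadecimal digits for the domain, 2 for the bus, 2 for the device, 1 for the function, 3 separators, and the NULL terminator. The sketch below reproduces that layout in plain Java. The class PciBusIdFormat is hypothetical, and the lowercase hex digits are an assumption; the case used by the CUDA runtime is not stated in this documentation.

```java
// Hypothetical helper, not part of JCuda: builds a string with the layout
// [domain]:[bus]:[device].[function] described for cudaDeviceGetPCIBusId.
final class PciBusIdFormat {
    private PciBusIdFormat() {}

    static String format(int domain, int bus, int device, int function) {
        // 4 + 1 + 2 + 1 + 2 + 1 + 1 = 12 visible characters for in-range values;
        // the C API additionally needs one byte for the NULL terminator.
        return String.format("%04x:%02x:%02x.%x", domain, bus, device, function);
    }
}
```

For example, format(0, 1, 0, 0) yields a 12-character string, which together with the terminator accounts for the 13 characters mentioned above.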
public static int cudaIpcGetEventHandle(cudaIpcEventHandle handle, cudaEvent_t event)
cudaError_t cudaIpcGetEventHandle ( cudaIpcEventHandle_t* handle, cudaEvent_t event )
Gets an interprocess handle for a previously allocated event. Takes as input a previously allocated event. This event must have been created with the cudaEventInterprocess and cudaEventDisableTiming flags set. This opaque handle may be copied into other processes and opened with cudaIpcOpenEventHandle to allow efficient hardware synchronization between GPU work in different processes.
After the event has been opened in the importing process, cudaEventRecord, cudaEventSynchronize, cudaStreamWaitEvent and cudaEventQuery may be used in either process. Performing operations on the imported event after the exported event has been freed with cudaEventDestroy will result in undefined behavior.
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
handle
- Pointer to a user allocated cudaIpcEventHandle in which to return the opaque event handle
event
- Event allocated with cudaEventInterprocess and cudaEventDisableTiming flags.
cudaEventCreate(jcuda.runtime.cudaEvent_t)
,
cudaEventDestroy(jcuda.runtime.cudaEvent_t)
,
cudaEventSynchronize(jcuda.runtime.cudaEvent_t)
,
cudaEventQuery(jcuda.runtime.cudaEvent_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
,
cudaIpcOpenEventHandle(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaIpcEventHandle)
,
cudaIpcGetMemHandle(jcuda.runtime.cudaIpcMemHandle, jcuda.Pointer)
,
cudaIpcOpenMemHandle(jcuda.Pointer, jcuda.runtime.cudaIpcMemHandle, int)
,
cudaIpcCloseMemHandle(jcuda.Pointer)
public static int cudaIpcOpenEventHandle(cudaEvent_t event, cudaIpcEventHandle handle)
cudaError_t cudaIpcOpenEventHandle ( cudaEvent_t* event, cudaIpcEventHandle_t handle )
Opens an interprocess event handle for use in the current process. Opens an interprocess event handle exported from another process with cudaIpcGetEventHandle. This function returns a cudaEvent_t that behaves like a locally created event with the cudaEventDisableTiming flag specified. This event must be freed with cudaEventDestroy.
Performing operations on the imported event after the exported event has been freed with cudaEventDestroy will result in undefined behavior.
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
event
- Returns the imported event
handle
- Interprocess handle to open
cudaEventCreate(jcuda.runtime.cudaEvent_t)
,
cudaEventDestroy(jcuda.runtime.cudaEvent_t)
,
cudaEventSynchronize(jcuda.runtime.cudaEvent_t)
,
cudaEventQuery(jcuda.runtime.cudaEvent_t)
,
cudaStreamWaitEvent(jcuda.runtime.cudaStream_t, jcuda.runtime.cudaEvent_t, int)
,
cudaIpcGetEventHandle(jcuda.runtime.cudaIpcEventHandle, jcuda.runtime.cudaEvent_t)
,
cudaIpcGetMemHandle(jcuda.runtime.cudaIpcMemHandle, jcuda.Pointer)
,
cudaIpcOpenMemHandle(jcuda.Pointer, jcuda.runtime.cudaIpcMemHandle, int)
,
cudaIpcCloseMemHandle(jcuda.Pointer)
public static int cudaIpcGetMemHandle(cudaIpcMemHandle handle, Pointer devPtr)
cudaError_t cudaIpcGetMemHandle ( cudaIpcMemHandle_t* handle, void* devPtr )
Gets an interprocess memory handle for an existing device memory allocation.
Takes a pointer to the base of an existing device memory allocation created with cudaMalloc and exports it for use in another process. This is a lightweight operation and may be called multiple times on an allocation without adverse effects.
If a region of memory is freed with cudaFree and a subsequent call to cudaMalloc returns memory with the same device address, cudaIpcGetMemHandle will return a unique handle for the new memory.
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
handle
- Pointer to user allocated cudaIpcMemHandle to return the handle in.
devPtr
- Base pointer to previously allocated device memory
cudaMalloc(jcuda.Pointer, long)
,
cudaFree(jcuda.Pointer)
,
cudaIpcGetEventHandle(jcuda.runtime.cudaIpcEventHandle, jcuda.runtime.cudaEvent_t)
,
cudaIpcOpenEventHandle(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaIpcEventHandle)
,
cudaIpcOpenMemHandle(jcuda.Pointer, jcuda.runtime.cudaIpcMemHandle, int)
,
cudaIpcCloseMemHandle(jcuda.Pointer)
public static int cudaIpcOpenMemHandle(Pointer devPtr, cudaIpcMemHandle handle, int flags)
cudaError_t cudaIpcOpenMemHandle ( void** devPtr, cudaIpcMemHandle_t handle, unsigned int flags )
Opens an interprocess memory handle exported from another process and returns a device pointer usable in the local process.
Maps memory exported from another process with cudaIpcGetMemHandle into the current device address space. For contexts on different devices cudaIpcOpenMemHandle can attempt to enable peer access between the devices as if the user called cudaDeviceEnablePeerAccess. This behavior is controlled by the cudaIpcMemLazyEnablePeerAccess flag. cudaDeviceCanAccessPeer can determine if a mapping is possible.
Contexts that may open cudaIpcMemHandles are restricted in the following way. cudaIpcMemHandles from each device in a given process may only be opened by one context per device per other process.
Memory returned from cudaIpcOpenMemHandle must be freed with cudaIpcCloseMemHandle.
Calling cudaFree on an exported memory region before calling cudaIpcCloseMemHandle in the importing context will result in undefined behavior.
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
devPtr
- Returned device pointer
handle
- cudaIpcMemHandle to open
flags
- Flags for this operation. Must be specified as cudaIpcMemLazyEnablePeerAccess
cudaMalloc(jcuda.Pointer, long)
,
cudaFree(jcuda.Pointer)
,
cudaIpcGetEventHandle(jcuda.runtime.cudaIpcEventHandle, jcuda.runtime.cudaEvent_t)
,
cudaIpcOpenEventHandle(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaIpcEventHandle)
,
cudaIpcGetMemHandle(jcuda.runtime.cudaIpcMemHandle, jcuda.Pointer)
,
cudaIpcCloseMemHandle(jcuda.Pointer)
,
cudaDeviceEnablePeerAccess(int, int)
,
cudaDeviceCanAccessPeer(int[], int, int)
public static int cudaIpcCloseMemHandle(Pointer devPtr)
cudaError_t cudaIpcCloseMemHandle ( void* devPtr )
Close memory mapped with cudaIpcOpenMemHandle. Unmaps memory returned by cudaIpcOpenMemHandle. The original allocation in the exporting process as well as imported mappings in other processes will be unaffected.
Any resources used to enable peer access will be freed if this is the last mapping using them.
IPC functionality is restricted to devices with support for unified addressing on Linux operating systems.
devPtr
- Device pointer returned by cudaIpcOpenMemHandle
cudaMalloc(jcuda.Pointer, long)
,
cudaFree(jcuda.Pointer)
,
cudaIpcGetEventHandle(jcuda.runtime.cudaIpcEventHandle, jcuda.runtime.cudaEvent_t)
,
cudaIpcOpenEventHandle(jcuda.runtime.cudaEvent_t, jcuda.runtime.cudaIpcEventHandle)
,
cudaIpcGetMemHandle(jcuda.runtime.cudaIpcMemHandle, jcuda.Pointer)
,
cudaIpcOpenMemHandle(jcuda.Pointer, jcuda.runtime.cudaIpcMemHandle, int)
@Deprecated public static int cudaThreadExit()
cudaError_t cudaThreadExit ( void )
Exit and clean up from CUDA launches. Deprecated: this function is deprecated because its name does not reflect its behavior. Its functionality is identical to the non-deprecated function cudaDeviceReset(), which should be used instead.
Explicitly destroys and cleans up all resources associated with the current device in the current process. Any subsequent API call to this device will reinitialize the device.
Note that this function will reset the device immediately. It is the caller's responsibility to ensure that the device is not being accessed by any other host threads from the process when this function is called.
Note that this function may also return error codes from previous, asynchronous launches.
cudaDeviceReset()
@Deprecated public static int cudaThreadSynchronize()
cudaError_t cudaThreadSynchronize ( void )
Wait for compute device to finish. Deprecated: this function is deprecated because its name does not reflect its behavior. Its functionality is similar to the non-deprecated function cudaDeviceSynchronize(), which should be used instead.
Blocks until the device has completed all preceding requested tasks. cudaThreadSynchronize() returns an error if one of the preceding tasks has failed. If the cudaDeviceScheduleBlockingSync flag was set for this device, the host thread will block until the device has finished its work.
Note that this function may also return error codes from previous, asynchronous launches.
cudaDeviceSynchronize()
@Deprecated public static int cudaThreadSetLimit(int limit, long value)
cudaError_t cudaThreadSetLimit ( cudaLimit limit, size_t value )
Set resource limits. Deprecated: this function is deprecated because its name does not reflect its behavior. Its functionality is identical to the non-deprecated function cudaDeviceSetLimit(), which should be used instead.
Setting limit to value is a request by the application to update the current limit maintained by the device. The driver is free to modify the requested value to meet h/w requirements (this could be clamping to minimum or maximum values, rounding up to nearest element size, etc). The application can use cudaThreadGetLimit() to find out exactly what the limit has been set to.
Setting each cudaLimit has its own specific restrictions, so each is discussed here.
cudaLimitStackSize controls the stack size of each GPU thread. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitPrintfFifoSize controls the size of the shared FIFO used by the printf() and fprintf() device system calls. Setting cudaLimitPrintfFifoSize must be performed before launching any kernel that uses the printf() or fprintf() device system calls, otherwise cudaErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
cudaLimitMallocHeapSize controls the size of the heap used by the malloc() and free() device system calls. Setting cudaLimitMallocHeapSize must be performed before launching any kernel that uses the malloc() or free() device system calls, otherwise cudaErrorInvalidValue will be returned. This limit is only applicable to devices of compute capability 2.0 and higher. Attempting to set this limit on devices of compute capability less than 2.0 will result in the error cudaErrorUnsupportedLimit being returned.
Note that this function may also return error codes from previous, asynchronous launches.
limit
- Limit to set
value
- Size in bytes of limit
cudaDeviceSetLimit(int, long)
@Deprecated public static int cudaThreadGetCacheConfig(int[] pCacheConfig)
cudaError_t cudaThreadGetCacheConfig ( cudaFuncCache ** pCacheConfig )
Returns the preferred cache configuration for the current device. Deprecated: this function is deprecated because its name does not reflect its behavior. Its functionality is identical to the non-deprecated function cudaDeviceGetCacheConfig(), which should be used instead.
On devices where the L1 cache and shared memory use the same hardware resources, this returns through pCacheConfig the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute functions.
This will return a pCacheConfig of cudaFuncCachePreferNone on devices where the size of the L1 cache and shared memory are fixed.
The supported cache configurations are:
cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory
Note that this function may also return error codes from previous, asynchronous launches.
pCacheConfig - Returned cache configuration
See Also:
cudaDeviceGetCacheConfig(int[])
@Deprecated public static int cudaThreadSetCacheConfig(int cacheConfig)
cudaError_t cudaThreadSetCacheConfig ( cudaFuncCache cacheConfig )
Sets the preferred cache configuration for the current device. Deprecated: this function is deprecated because its name does not reflect its behavior. Its functionality is identical to the non-deprecated function cudaDeviceSetCacheConfig(), which should be used instead.
On devices where the L1 cache and shared memory use the same hardware resources, this sets through cacheConfig the preferred cache configuration for the current device. This is only a preference. The runtime will use the requested configuration if possible, but it is free to choose a different configuration if required to execute the function. Any function preference set via cudaFuncSetCacheConfig (C API) or cudaFuncSetCacheConfig (C++ API) will be preferred over this device-wide setting. Setting the device-wide cache configuration to cudaFuncCachePreferNone will cause subsequent kernel launches to prefer to not change the cache configuration unless required to launch the kernel.
This setting does nothing on devices where the size of the L1 cache and shared memory are fixed.
Launching a kernel with a different preference than the most recent preference setting may insert a device-side synchronization point.
The supported cache configurations are:
cudaFuncCachePreferNone: no preference for shared memory or L1 (default)
cudaFuncCachePreferShared: prefer larger shared memory and smaller L1 cache
cudaFuncCachePreferL1: prefer larger L1 cache and smaller shared memory
Note that this function may also return error codes from previous, asynchronous launches.
cacheConfig - Requested cache configuration
See Also:
cudaDeviceSetCacheConfig(int)
@Deprecated public static int cudaThreadGetLimit(long[] pValue, int limit)
cudaError_t cudaThreadGetLimit ( size_t* pValue, cudaLimit limit )
Returns resource limits. Deprecated: this function is deprecated because its name does not reflect its behavior. Its functionality is identical to the non-deprecated function cudaDeviceGetLimit(), which should be used instead.
Returns in *pValue the current size of limit. The supported cudaLimit values are:
cudaLimitStackSize: stack size of each GPU thread;
cudaLimitPrintfFifoSize: size of the shared FIFO used by the printf() and fprintf() device system calls;
cudaLimitMallocHeapSize: size of the heap used by the malloc() and free() device system calls.
Note that this function may also return error codes from previous, asynchronous launches.
pValue - Returned size in bytes of limit
limit - Limit to query
See Also:
cudaDeviceGetLimit(long[], int)
public static int cudaGetSymbolAddress(Pointer devPtr, String symbol)
template < class T > cudaError_t cudaGetSymbolAddress ( void** devPtr, const T& symbol ) [inline]
[C++ API] Finds the address associated with a CUDA symbol. Returns in *devPtr the address of symbol symbol on the device. symbol must be a variable that resides in global or constant memory space. If symbol cannot be found, or if symbol is not declared in the global or constant memory space, *devPtr is unchanged and the error cudaErrorInvalidSymbol is returned.
Note that this function may also return error codes from previous, asynchronous launches.
devPtr - Return device pointer associated with symbol
symbol - Device symbol reference
See Also:
cudaGetSymbolAddress(jcuda.Pointer, java.lang.String)
cudaGetSymbolSize(long[], java.lang.String)
public static int cudaGetSymbolSize(long[] size, String symbol)
template < class T > cudaError_t cudaGetSymbolSize ( size_t* size, const T& symbol ) [inline]
[C++ API] Finds the size of the object associated with a CUDA symbol. Returns in *size the size of symbol symbol. symbol must be a variable that resides in global or constant memory space. If symbol cannot be found, or if symbol is not declared in global or constant memory space, *size is unchanged and the error cudaErrorInvalidSymbol is returned.
Note that this function may also return error codes from previous, asynchronous launches.
size - Size of object associated with symbol
symbol - Device symbol reference
See Also:
cudaGetSymbolAddress(jcuda.Pointer, java.lang.String)
cudaGetSymbolSize(long[], java.lang.String)
public static int cudaMemPrefetchAsync(Pointer devPtr, long count, int dstDevice, cudaStream_t stream)
devPtr - Pointer to be prefetched
count - Size in bytes
dstDevice - Destination device to prefetch to
stream - Stream to enqueue prefetch operation
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
cudaMemcpyPeer(jcuda.Pointer, int, jcuda.Pointer, int, long)
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
cudaMemAdvise(jcuda.Pointer, long, int, int)
public static int cudaMemAdvise(Pointer devPtr, long count, int advice, int device)
devPtr - Pointer to memory to set the advice for
count - Size in bytes of the memory range
advice - Advice to be applied for the specified memory range
device - Device to apply the advice for
See Also:
cudaMemcpy(jcuda.Pointer, jcuda.Pointer, long, int)
cudaMemcpyPeer(jcuda.Pointer, int, jcuda.Pointer, int, long)
cudaMemcpyAsync(jcuda.Pointer, jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
cudaMemcpy3DPeerAsync(jcuda.runtime.cudaMemcpy3DPeerParms, jcuda.runtime.cudaStream_t)
cudaMemPrefetchAsync(jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
public static int cudaMemRangeGetAttribute(Pointer data, long dataSize, int attribute, Pointer devPtr, long count)
data - A pointer to a memory location where the result of the attribute query will be written
dataSize - Size of data in bytes
attribute - The cudaMemRangeAttribute to query
devPtr - Start of the range to query
count - Size of the range to query
See Also:
cudaMemRangeGetAttributes(jcuda.Pointer[], long[], int[], long, jcuda.Pointer, long)
cudaMemPrefetchAsync(jcuda.Pointer, long, int, jcuda.runtime.cudaStream_t)
cudaMemAdvise(jcuda.Pointer, long, int, int)
public static int cudaMemRangeGetAttributes(Pointer[] data, long[] dataSizes, int[] attributes, long numAttributes, Pointer devPtr, long count)
See cudaMemRangeGetAttribute(jcuda.Pointer, long, int, jcuda.Pointer, long) for attribute descriptions and restrictions.
data - An array of pointers to memory locations where the result of each attribute query will be written
dataSizes - Array containing the sizes of each result
attributes - An array of cudaMemRangeAttribute to query (numAttributes and the number of attributes in this array should match)
numAttributes - Number of attributes to query
devPtr - Start of the range to query
count - Size of the range to query
See Also:
cudaMemRangeGetAttribute(jcuda.Pointer, long, int, jcuda.Pointer, long)
JCuda#cudaMemAdvise
cudaMemPrefetchAsync
public static int cudaBindTexture(long[] offset, textureReference texref, Pointer devPtr, cudaChannelFormatDesc desc, long size)
template < class T, int dim, enum cudaTextureReadMode readMode > cudaError_t cudaBindTexture ( size_t* offset, const texture < T, dim, readMode > & tex, const void* devPtr, const cudaChannelFormatDesc& desc, size_t size = UINT_MAX ) [inline]
[C++ API] Binds a memory area to a texture. Binds size bytes of the memory area pointed to by devPtr to texture reference tex. desc describes how the memory is interpreted when fetching values from the texture. The offset parameter is an optional byte offset as with the low-level cudaBindTexture() function. Any memory previously bound to tex is unbound.
Note that this function may also return error codes from previous, asynchronous launches.
offset - Offset in bytes
texref - Texture to bind
devPtr - Memory area on device
desc - Channel format
size - Size of the memory area pointed to by devPtr
See Also:
cudaCreateChannelDesc(int, int, int, int, int)
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray)
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String)
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long)
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long)
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
cudaUnbindTexture(jcuda.runtime.textureReference)
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaBindTexture2D(long[] offset, textureReference texref, Pointer devPtr, cudaChannelFormatDesc desc, long width, long height, long pitch)
template < class T, int dim, enum cudaTextureReadMode readMode > cudaError_t cudaBindTexture2D ( size_t* offset, const texture < T, dim, readMode > & tex, const void* devPtr, const cudaChannelFormatDesc& desc, size_t width, size_t height, size_t pitch ) [inline]
[C++ API] Binds a 2D memory area to a texture. Binds the 2D memory area pointed to by devPtr to the texture reference tex. The size of the area is constrained by width in texel units, height in texel units, and pitch in byte units. desc describes how the memory is interpreted when fetching values from the texture. Any memory previously bound to tex is unbound.
Since the hardware enforces an alignment requirement on texture base addresses, cudaBindTexture2D() returns in *offset a byte offset that must be applied to texture fetches in order to read from the desired memory. This offset must be divided by the texel size and passed to kernels that read from the texture so that it can be applied to the tex2D() function. If the device memory pointer was returned from cudaMalloc(), the offset is guaranteed to be 0 and NULL may be passed as the offset parameter.
Note that this function may also return error codes from previous, asynchronous launches.
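As a rough illustration of the offset handling described above, the following plain-Java sketch converts a byte offset into texel units. The TexelOffset class and its method names are hypothetical and not part of JCuda. It assumes that the texel size is the sum of the four channel bit widths of the channel format, divided by 8.

```java
/** Hypothetical helper for illustration only; not part of JCuda. */
final class TexelOffset {
    private TexelOffset() {}

    /**
     * Texel size in bytes, assuming x, y, z and w are the per-channel
     * bit widths of the channel format (assumption for this sketch).
     */
    static int texelSizeBytes(int x, int y, int z, int w) {
        return (x + y + z + w) / 8;
    }

    /**
     * Divides the byte offset reported when binding the texture by the
     * texel size, giving the value to pass to a kernel.
     */
    static long toTexels(long offsetBytes, int texelSizeBytes) {
        if (texelSizeBytes <= 0) {
            throw new IllegalArgumentException("texel size must be positive");
        }
        return offsetBytes / texelSizeBytes;
    }
}
```

For a four-channel 32-bit format, texelSizeBytes(32, 32, 32, 32) gives 16, so a byte offset of 256 corresponds to 16 texels. Memory obtained from cudaMalloc() has offset 0, in which case no adjustment is needed.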
offset - Offset in bytes
texref - Texture reference to bind
devPtr - 2D memory area on device
desc - Channel format
width - Width in texel units
height - Height in texel units
pitch - Pitch in bytes
See Also:
cudaCreateChannelDesc(int, int, int, int, int)
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray)
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String)
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long)
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long)
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
cudaUnbindTexture(jcuda.runtime.textureReference)
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaBindTextureToArray(textureReference texref, cudaArray array, cudaChannelFormatDesc desc)
template < class T, int dim, enum cudaTextureReadMode readMode > cudaError_t cudaBindTextureToArray ( const texture < T, dim, readMode > & tex, cudaArray_const_t array, const cudaChannelFormatDesc& desc ) [inline]
[C++ API] Binds an array to a texture. Binds the CUDA array array to the texture reference tex. desc describes how the memory is interpreted when fetching values from the texture. Any CUDA array previously bound to tex is unbound.
Note that this function may also return error codes from previous, asynchronous launches.
texref - Texture to bind
array - Memory array on device
desc - Channel format
See Also:
cudaCreateChannelDesc(int, int, int, int, int)
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray)
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String)
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long)
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long)
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
cudaUnbindTexture(jcuda.runtime.textureReference)
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaBindTextureToMipmappedArray(textureReference texref, cudaMipmappedArray mipmappedArray, cudaChannelFormatDesc desc)
template < class T, int dim, enum cudaTextureReadMode readMode > cudaError_t cudaBindTextureToMipmappedArray ( const texture < T, dim, readMode > & tex, cudaMipmappedArray_const_t mipmappedArray, const cudaChannelFormatDesc& desc ) [inline]
[C++ API] Binds a mipmapped array to a texture. Binds the CUDA mipmapped array mipmappedArray to the texture reference tex. desc describes how the memory is interpreted when fetching values from the texture. Any CUDA mipmapped array previously bound to tex is unbound.
Note that this function may also return error codes from previous, asynchronous launches.
texref - Texture to bind
mipmappedArray - Memory mipmapped array on device
desc - Channel format
See Also:
cudaCreateChannelDesc(int, int, int, int, int)
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray)
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String)
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long)
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long)
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
cudaUnbindTexture(jcuda.runtime.textureReference)
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaUnbindTexture(textureReference texref)
template < class T, int dim, enum cudaTextureReadMode readMode > cudaError_t cudaUnbindTexture ( const texture < T, dim, readMode > & tex ) [inline]
[C++ API] Unbinds a texture. Unbinds the texture bound to tex.
Note that this function may also return error codes from previous, asynchronous launches.
texref - Texture to unbind
See Also:
cudaCreateChannelDesc(int, int, int, int, int)
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray)
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String)
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long)
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long)
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
cudaUnbindTexture(jcuda.runtime.textureReference)
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaGetTextureAlignmentOffset(long[] offset, textureReference texref)
template < class T, int dim, enum cudaTextureReadMode readMode > cudaError_t cudaGetTextureAlignmentOffset ( size_t* offset, const texture < T, dim, readMode > & tex ) [inline]
[C++ API] Get the alignment offset of a texture. Returns in *offset the offset that was returned when texture reference tex was bound.
Note that this function may also return error codes from previous, asynchronous launches.
offset - Offset of texture reference in bytes
texref - Texture to get offset of
See Also:
cudaCreateChannelDesc(int, int, int, int, int)
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray)
cudaGetTextureReference(jcuda.runtime.textureReference, java.lang.String)
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long)
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long)
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
cudaUnbindTexture(jcuda.runtime.textureReference)
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
public static int cudaGetTextureReference(textureReference texref, String symbol)
cudaError_t cudaGetTextureReference ( const textureReference** texref, const void* symbol )
Get the texture reference associated with a symbol. Returns in *texref the structure associated to the texture reference defined by symbol symbol.
Note that this function may also return error codes from previous, asynchronous launches.
Use of a string naming a variable as the symbol parameter was removed in CUDA 5.0.
texref - Texture reference associated with symbol
symbol - Texture to get reference for
See Also:
cudaCreateChannelDesc(int, int, int, int, int)
cudaGetChannelDesc(jcuda.runtime.cudaChannelFormatDesc, jcuda.runtime.cudaArray)
cudaGetTextureAlignmentOffset(long[], jcuda.runtime.textureReference)
cudaBindTexture(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long)
cudaBindTexture2D(long[], jcuda.runtime.textureReference, jcuda.Pointer, jcuda.runtime.cudaChannelFormatDesc, long, long, long)
cudaBindTextureToArray(jcuda.runtime.textureReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
cudaUnbindTexture(jcuda.runtime.textureReference)
public static int cudaBindSurfaceToArray(surfaceReference surfref, cudaArray array, cudaChannelFormatDesc desc)
template < class T, int dim > cudaError_t cudaBindSurfaceToArray ( const surface < T, dim > & surf, cudaArray_const_t array, const cudaChannelFormatDesc& desc ) [inline]
[C++ API] Binds an array to a surface. Binds the CUDA array array to the surface reference surf. desc describes how the memory is interpreted when dealing with the surface. Any CUDA array previously bound to surf is unbound.
Note that this function may also return error codes from previous, asynchronous launches.
surfref - Surface to bind
array - Memory array on device
desc - Channel format
See Also:
cudaBindSurfaceToArray(jcuda.runtime.surfaceReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
public static int cudaGetSurfaceReference(surfaceReference surfref, String symbol)
cudaError_t cudaGetSurfaceReference ( const surfaceReference** surfref, const void* symbol )
Get the surface reference associated with a symbol. Returns in *surfref the structure associated to the surface reference defined by symbol symbol.
Note that this function may also return error codes from previous, asynchronous launches.
Use of a string naming a variable as the symbol parameter was removed in CUDA 5.0.
surfref - Surface reference associated with symbol
symbol - Surface to get reference for
See Also:
cudaBindSurfaceToArray(jcuda.runtime.surfaceReference, jcuda.runtime.cudaArray, jcuda.runtime.cudaChannelFormatDesc)
public static int cudaCreateTextureObject(cudaTextureObject pTexObject, cudaResourceDesc pResDesc, cudaTextureDesc pTexDesc, cudaResourceViewDesc pResViewDesc)
cudaError_t cudaCreateTextureObject ( cudaTextureObject_t* pTexObject, const cudaResourceDesc* pResDesc, const cudaTextureDesc* pTexDesc, const cudaResourceViewDesc* pResViewDesc )
Creates a texture object. Creates a texture object and returns it in pTexObject. pResDesc describes the data to texture from. pTexDesc describes how the data should be sampled. pResViewDesc is an optional argument that specifies an alternate format for the data described by pResDesc, and also describes the subresource region to restrict access to when texturing. pResViewDesc can only be specified if the type of resource is a CUDA array or a CUDA mipmapped array.
Texture objects are only supported on devices of compute capability 3.0 or higher.
The cudaResourceDesc structure is defined as:
    struct cudaResourceDesc {
        enum cudaResourceType resType;
        union {
            struct { cudaArray_t array; } array;
            struct { cudaMipmappedArray_t mipmap; } mipmap;
            struct { void *devPtr; struct cudaChannelFormatDesc desc; size_t sizeInBytes; } linear;
            struct { void *devPtr; struct cudaChannelFormatDesc desc; size_t width; size_t height; size_t pitchInBytes; } pitch2D;
        } res;
    };
where:
    enum cudaResourceType {
        cudaResourceTypeArray = 0x00,
        cudaResourceTypeMipmappedArray = 0x01,
        cudaResourceTypeLinear = 0x02,
        cudaResourceTypePitch2D = 0x03
    };
If cudaResourceDesc::resType is set to cudaResourceTypeArray, cudaResourceDesc::res::array::array must be set to a valid CUDA array handle.
If cudaResourceDesc::resType is set to cudaResourceTypeMipmappedArray, cudaResourceDesc::res::mipmap::mipmap must be set to a valid CUDA mipmapped array handle.
If cudaResourceDesc::resType is set to cudaResourceTypeLinear, cudaResourceDesc::res::linear::devPtr must be set to a valid device pointer, that is aligned to cudaDeviceProp::textureAlignment. cudaResourceDesc::res::linear::desc describes the format and the number of components per array element. cudaResourceDesc::res::linear::sizeInBytes specifies the size of the array in bytes. The total number of elements in the linear address range cannot exceed cudaDeviceProp::maxTexture1DLinear. The number of elements is computed as (sizeInBytes / sizeof(desc)).
If cudaResourceDesc::resType is set to cudaResourceTypePitch2D, cudaResourceDesc::res::pitch2D::devPtr must be set to a valid device pointer, that is aligned to cudaDeviceProp::textureAlignment. cudaResourceDesc::res::pitch2D::desc describes the format and the number of components per array element. cudaResourceDesc::res::pitch2D::width and cudaResourceDesc::res::pitch2D::height specify the width and height of the array in elements, and cannot exceed cudaDeviceProp::maxTexture2DLinear[0] and cudaDeviceProp::maxTexture2DLinear[1] respectively. cudaResourceDesc::res::pitch2D::pitchInBytes specifies the pitch between two rows in bytes and has to be aligned to cudaDeviceProp::texturePitchAlignment. Pitch cannot exceed cudaDeviceProp::maxTexture2DLinear[2].
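The size and alignment constraints for linear and pitch2D resources can be pre-checked on the host before creating a texture object. The following is a minimal plain-Java sketch of those checks; the ResourceLimits class and its method names are hypothetical and not part of JCuda. It assumes that sizeof(desc) means the size in bytes of one array element, and that the limit values are read from the device properties by the caller.

```java
/** Hypothetical helper for illustration only; not part of JCuda. */
final class ResourceLimits {
    private ResourceLimits() {}

    /** Number of elements in a linear range: sizeInBytes / element size. */
    static long elementCount(long sizeInBytes, int elementSizeBytes) {
        return sizeInBytes / elementSizeBytes;
    }

    /** True if the element count does not exceed maxTexture1DLinear. */
    static boolean fitsLinearLimit(long sizeInBytes, int elementSizeBytes,
                                   long maxTexture1DLinear) {
        return elementCount(sizeInBytes, elementSizeBytes) <= maxTexture1DLinear;
    }

    /**
     * True if width, height and pitch respect maxTexture2DLinear[0..2]
     * and the pitch is a multiple of texturePitchAlignment.
     */
    static boolean pitch2DIsValid(long width, long height, long pitchInBytes,
                                  long[] maxTexture2DLinear,
                                  long texturePitchAlignment) {
        return width <= maxTexture2DLinear[0]
            && height <= maxTexture2DLinear[1]
            && pitchInBytes <= maxTexture2DLinear[2]
            && pitchInBytes % texturePitchAlignment == 0;
    }
}
```

The limits passed to these methods are placeholders. Real values depend on the device and should come from the cudaDeviceProp fields named above.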
The cudaTextureDesc struct is defined as
    struct cudaTextureDesc {
        enum cudaTextureAddressMode addressMode[3];
        enum cudaTextureFilterMode filterMode;
        enum cudaTextureReadMode readMode;
        int sRGB;
        int normalizedCoords;
        unsigned int maxAnisotropy;
        enum cudaTextureFilterMode mipmapFilterMode;
        float mipmapLevelBias;
        float minMipmapLevelClamp;
        float maxMipmapLevelClamp;
    };
where
    enum cudaTextureAddressMode { cudaAddressModeWrap = 0, cudaAddressModeClamp = 1, cudaAddressModeMirror = 2, cudaAddressModeBorder = 3 };
This is ignored if cudaResourceDesc::resType is cudaResourceTypeLinear. Also, if cudaTextureDesc::normalizedCoords is set to zero, the only supported address mode is cudaAddressModeClamp.
    enum cudaTextureFilterMode { cudaFilterModePoint = 0, cudaFilterModeLinear = 1 };
This is ignored if cudaResourceDesc::resType is cudaResourceTypeLinear.
    enum cudaTextureReadMode { cudaReadModeElementType = 0, cudaReadModeNormalizedFloat = 1 };
Note that this applies only to 8-bit and 16-bit integer formats. A 32-bit integer format is not promoted, regardless of whether cudaTextureDesc::readMode is set to cudaReadModeNormalizedFloat.
cudaTextureDesc::sRGB specifies whether sRGB to linear conversion should be performed during texture fetch.
cudaTextureDesc::normalizedCoords specifies whether the texture coordinates will be normalized or not.
cudaTextureDesc::maxAnisotropy specifies the maximum anisotropy ratio to be used when doing anisotropic filtering. This value will be clamped to the range [1,16].
cudaTextureDesc::mipmapFilterMode specifies the filter mode when the calculated mipmap level lies between two defined mipmap levels.
cudaTextureDesc::mipmapLevelBias specifies the offset to be applied to the calculated mipmap level.
cudaTextureDesc::minMipmapLevelClamp specifies the lower end of the mipmap level range to clamp access to.
cudaTextureDesc::maxMipmapLevelClamp specifies the upper end of the mipmap level range to clamp access to.
The cudaResourceViewDesc struct is defined as
    struct cudaResourceViewDesc {
        enum cudaResourceViewFormat format;
        size_t width;
        size_t height;
        size_t depth;
        unsigned int firstMipmapLevel;
        unsigned int lastMipmapLevel;
        unsigned int firstLayer;
        unsigned int lastLayer;
    };
where:
cudaResourceViewDesc::format specifies how the data contained in the CUDA array or CUDA mipmapped array should be interpreted. Note that this can incur a change in size of the texture data. If the resource view format is a block compressed format, then the underlying CUDA array or CUDA mipmapped array has to have a 32-bit unsigned integer format with 2 or 4 channels, depending on the block compressed format. For example, BC1 and BC4 require the underlying CUDA array to have a 32-bit unsigned int format with 2 channels. The other BC formats require the underlying resource to have the same 32-bit unsigned int format but with 4 channels.
cudaResourceViewDesc::width specifies the new width of the texture data. If the resource view format is a block compressed format, this value has to be 4 times the original width of the resource. For non block compressed formats, this value has to be equal to that of the original resource.
cudaResourceViewDesc::height specifies the new height of the texture data. If the resource view format is a block compressed format, this value has to be 4 times the original height of the resource. For non block compressed formats, this value has to be equal to that of the original resource.
cudaResourceViewDesc::depth specifies the new depth of the texture data. This value has to be equal to that of the original resource.
cudaResourceViewDesc::firstMipmapLevel specifies the most detailed mipmap level. This will be the new mipmap level zero. For non-mipmapped resources, this value has to be zero. cudaTextureDesc::minMipmapLevelClamp and cudaTextureDesc::maxMipmapLevelClamp will be relative to this value. For example, if the firstMipmapLevel is set to 2, and a minMipmapLevelClamp of 1.2 is specified, then the actual minimum mipmap level clamp will be 3.2.
cudaResourceViewDesc::lastMipmapLevel specifies the least detailed mipmap level. For non-mipmapped resources, this value has to be zero.
cudaResourceViewDesc::firstLayer specifies the first layer index for layered textures. This will be the new layer zero. For non-layered resources, this value has to be zero.
cudaResourceViewDesc::lastLayer specifies the last layer index for layered textures. For non-layered resources, this value has to be zero.
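Two of the rules above are simple arithmetic: the view extent for block compressed formats and the mipmap clamp relative to firstMipmapLevel. The following plain-Java sketch works through them. The ResourceViewMath class and its method names are hypothetical and not part of JCuda, and the blockCompressed flag stands in for a check on the resource view format.

```java
/** Hypothetical helper for illustration only; not part of JCuda. */
final class ResourceViewMath {
    private ResourceViewMath() {}

    /**
     * Width or height required in the resource view: 4 times the
     * original extent for block compressed formats, otherwise equal.
     */
    static long viewExtent(long resourceExtent, boolean blockCompressed) {
        return blockCompressed ? resourceExtent * 4 : resourceExtent;
    }

    /**
     * Actual mipmap level clamp, given that the clamp values in the
     * texture descriptor are relative to firstMipmapLevel.
     */
    static float actualMipmapClamp(int firstMipmapLevel, float relativeClamp) {
        return firstMipmapLevel + relativeClamp;
    }
}
```

With firstMipmapLevel set to 2 and a minMipmapLevelClamp of 1.2, actualMipmapClamp(2, 1.2f) gives approximately 3.2, matching the example in the text. A resource that is 64 elements wide needs a view width of 256 for a block compressed format.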
pTexObject - Texture object to create
pResDesc - Resource descriptor
pTexDesc - Texture descriptor
pResViewDesc - Resource view descriptor
See Also:
cudaDestroyTextureObject(jcuda.runtime.cudaTextureObject)
public static int cudaDestroyTextureObject(cudaTextureObject texObject)
cudaError_t cudaDestroyTextureObject ( cudaTextureObject_t texObject )
Destroys a texture object. Destroys the texture object specified by texObject.
texObject - Texture object to destroy
See Also:
cudaCreateTextureObject(jcuda.runtime.cudaTextureObject, jcuda.runtime.cudaResourceDesc, jcuda.runtime.cudaTextureDesc, jcuda.runtime.cudaResourceViewDesc)
public static int cudaGetTextureObjectResourceDesc(cudaResourceDesc pResDesc, cudaTextureObject texObject)
cudaError_t cudaGetTextureObjectResourceDesc ( cudaResourceDesc* pResDesc, cudaTextureObject_t texObject )
Returns a texture object's resource descriptor. Returns the resource descriptor for the texture object specified by texObject.
pResDesc - Resource descriptor
texObject - Texture object
See Also:
cudaCreateTextureObject(jcuda.runtime.cudaTextureObject, jcuda.runtime.cudaResourceDesc, jcuda.runtime.cudaTextureDesc, jcuda.runtime.cudaResourceViewDesc)
public static int cudaGetTextureObjectTextureDesc(cudaTextureDesc pTexDesc, cudaTextureObject texObject)
cudaError_t cudaGetTextureObjectTextureDesc ( cudaTextureDesc* pTexDesc, cudaTextureObject_t texObject )
Returns a texture object's texture descriptor. Returns the texture descriptor for the texture object specified by texObject.
pTexDesc - Texture descriptor
texObject - Texture object
See Also:
cudaCreateTextureObject(jcuda.runtime.cudaTextureObject, jcuda.runtime.cudaResourceDesc, jcuda.runtime.cudaTextureDesc, jcuda.runtime.cudaResourceViewDesc)
public static int cudaGetTextureObjectResourceViewDesc(cudaResourceViewDesc pResViewDesc, cudaTextureObject texObject)
cudaError_t cudaGetTextureObjectResourceViewDesc ( cudaResourceViewDesc* pResViewDesc, cudaTextureObject_t texObject )
Returns a texture object's resource view descriptor. Returns the resource view descriptor for the texture object specified by texObject. If no resource view was specified, cudaErrorInvalidValue is returned.
pResViewDesc - Resource view descriptor
texObject - Texture object
See Also:
cudaCreateTextureObject(jcuda.runtime.cudaTextureObject, jcuda.runtime.cudaResourceDesc, jcuda.runtime.cudaTextureDesc, jcuda.runtime.cudaResourceViewDesc)
public static int cudaCreateSurfaceObject(cudaSurfaceObject pSurfObject, cudaResourceDesc pResDesc)
cudaError_t cudaCreateSurfaceObject ( cudaSurfaceObject_t* pSurfObject, const cudaResourceDesc* pResDesc )
Creates a surface object. Creates a surface object and returns it in pSurfObject. pResDesc describes the data to perform surface load/stores on. cudaResourceDesc::resType must be cudaResourceTypeArray and cudaResourceDesc::res::array::array must be set to a valid CUDA array handle.
Surface objects are only supported on devices of compute capability 3.0 or higher.
pSurfObject - Surface object to create
pResDesc - Resource descriptor
See Also:
cudaDestroySurfaceObject(jcuda.runtime.cudaSurfaceObject)
public static int cudaDestroySurfaceObject(cudaSurfaceObject surfObject)
cudaError_t cudaDestroySurfaceObject ( cudaSurfaceObject_t surfObject )
Destroys a surface object. Destroys the surface object specified by surfObject.
surfObject - Surface object to destroy
See Also:
cudaCreateSurfaceObject(jcuda.runtime.cudaSurfaceObject, jcuda.runtime.cudaResourceDesc)
public static int cudaGetSurfaceObjectResourceDesc(cudaResourceDesc pResDesc, cudaSurfaceObject surfObject)
cudaError_t cudaGetSurfaceObjectResourceDesc ( cudaResourceDesc* pResDesc, cudaSurfaceObject_t surfObject )
Returns a surface object's resource descriptor. Returns the resource descriptor for the surface object specified by surfObject.
pResDesc - Resource descriptor
surfObject - Surface object
See Also:
cudaCreateSurfaceObject(jcuda.runtime.cudaSurfaceObject, jcuda.runtime.cudaResourceDesc)
public static int cudaLaunchHostFunc(cudaStream_t stream, cudaHostFn fn, Object userData)
Parameters:
stream - Stream to enqueue function call in
fn - The function to call once preceding stream operations are complete
userData - User-specified data to be passed to the function
See Also:
JCuda#cudaStreamQuery,
JCuda#cudaStreamSynchronize,
JCuda#cudaStreamWaitEvent,
JCuda#cudaStreamDestroy,
JCuda#cudaMallocManaged,
JCuda#cudaStreamAttachMemAsync,
JCuda#cudaStreamAddCallback,
JCuda#cuLaunchHostFunc
@Deprecated public static int cudaConfigureCall(dim3 gridDim, dim3 blockDim, long sharedMem, cudaStream_t stream)
cudaError_t cudaConfigureCall ( dim3 gridDim, dim3 blockDim, size_t sharedMem = 0, cudaStream_t stream = 0 )
Configure a device-launch. Specifies the grid and block dimensions for the device call to be executed similar to the execution configuration syntax. cudaConfigureCall() is stack based. Each call pushes data on top of an execution stack. This data contains the dimension for the grid and thread blocks, together with any arguments for the call.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
gridDim - Grid dimensions
blockDim - Block dimensions
sharedMem - Shared memory
stream - Stream identifier
See Also:
cudaDeviceSetCacheConfig(int),
cudaFuncGetAttributes(jcuda.runtime.cudaFuncAttributes, java.lang.String),
cudaLaunch(java.lang.String),
JCuda#cudaSetDoubleForDevice,
JCuda#cudaSetDoubleForHost,
cudaSetupArgument(jcuda.Pointer, long, long)
@Deprecated public static int cudaSetupArgument(Pointer arg, long size, long offset)
template < class T > cudaError_t cudaSetupArgument ( T arg, size_t offset ) [inline]
[C++ API] Configure a device launch Pushes size bytes of the argument pointed to by arg at offset bytes from the start of the parameter passing area, which starts at offset 0. The arguments are stored in the top of the execution stack. cudaSetupArgument() must be preceded by a call to cudaConfigureCall().
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
arg - Argument to push for a kernel launch
size - Size of argument
offset - Offset in argument stack to push new arg
See Also:
cudaConfigureCall(jcuda.runtime.dim3, jcuda.runtime.dim3, long, jcuda.runtime.cudaStream_t),
cudaFuncGetAttributes(jcuda.runtime.cudaFuncAttributes, java.lang.String),
cudaLaunch(java.lang.String),
JCuda#cudaSetDoubleForDevice,
JCuda#cudaSetDoubleForHost,
cudaSetupArgument(jcuda.Pointer, long, long)
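The offset passed to cudaSetupArgument is a byte position within the parameter passing area, which starts at offset 0. The following plain-Java sketch shows one way such offsets might be computed for a sequence of arguments. It assumes that each argument is aligned to its own size and that sizes are powers of two; that alignment rule is an illustrative assumption, not something stated on this page. The class ArgumentOffsets and its methods are hypothetical and not part of JCuda.

```java
import java.util.Arrays;

// Hypothetical helper, not part of JCuda: computes byte offsets for a
// sequence of kernel arguments in the parameter passing area.
class ArgumentOffsets {

    // Rounds offset up to the next multiple of alignment.
    // The alignment is assumed to be a power of two.
    static long alignUp(long offset, long alignment) {
        return (offset + alignment - 1) & ~(alignment - 1);
    }

    // Returns the offset for each argument, assuming (for illustration only)
    // that every argument is aligned to its own size in bytes.
    static long[] offsetsFor(long[] sizes) {
        long[] offsets = new long[sizes.length];
        long next = 0; // the parameter passing area starts at offset 0
        for (int i = 0; i < sizes.length; i++) {
            next = alignUp(next, sizes[i]);
            offsets[i] = next;
            next += sizes[i];
        }
        return offsets;
    }

    public static void main(String[] args) {
        // An int (4 bytes), a pointer (8 bytes), and a float (4 bytes).
        long[] sizes = {4, 8, 4};
        System.out.println(Arrays.toString(offsetsFor(sizes)));
    }
}
```

Each computed value would be the offset argument of one cudaSetupArgument call, with the matching entry of sizes as the size argument.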
public static int cudaFuncGetAttributes(cudaFuncAttributes attr, String func)
template < class T > cudaError_t cudaFuncGetAttributes ( cudaFuncAttributes* attr, T* entry ) [inline]
[C++ API] Find out attributes for a given function This function obtains the attributes of a function specified via entry. The parameter entry must be a pointer to a function that executes on the device. The parameter specified by entry must be declared as a __global__ function. The fetched attributes are placed in attr. If the specified function does not exist, then cudaErrorInvalidDeviceFunction is returned.
Note that some function attributes such as maxThreadsPerBlock may vary based on the device that is currently being used.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
attr - Return pointer to function's attributes
func - Device function symbol
entry - Function to get attributes of
See Also:
cudaConfigureCall(jcuda.runtime.dim3, jcuda.runtime.dim3, long, jcuda.runtime.cudaStream_t),
cudaDeviceSetCacheConfig(int),
cudaFuncGetAttributes(jcuda.runtime.cudaFuncAttributes, java.lang.String),
cudaLaunch(java.lang.String),
JCuda#cudaSetDoubleForDevice,
JCuda#cudaSetDoubleForHost,
cudaSetupArgument(jcuda.Pointer, long, long)
public static int cudaLaunch(String symbol)
template < class T > cudaError_t cudaLaunch ( T* func ) [inline]
[C++ API] Launches a device function Launches the function entry on the device. The parameter entry must be a function that executes on the device. The parameter specified by entry must be declared as a __global__ function. cudaLaunch() must be preceded by a call to cudaConfigureCall() since it pops the data that was pushed by cudaConfigureCall() from the execution stack.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
func - Device function symbol
See Also:
cudaConfigureCall(jcuda.runtime.dim3, jcuda.runtime.dim3, long, jcuda.runtime.cudaStream_t),
cudaDeviceSetCacheConfig(int),
cudaFuncGetAttributes(jcuda.runtime.cudaFuncAttributes, java.lang.String),
cudaLaunch(java.lang.String),
JCuda#cudaSetDoubleForDevice,
JCuda#cudaSetDoubleForHost,
cudaSetupArgument(jcuda.Pointer, long, long),
cudaThreadGetCacheConfig(int[]),
cudaThreadSetCacheConfig(int)
@Deprecated public static int cudaGLSetGLDevice(int device)
cudaError_t cudaGLSetGLDevice ( int device )
Sets a CUDA device to use OpenGL interoperability. Deprecated: This function is deprecated as of CUDA 5.0 and should no longer be used. It is no longer necessary to associate a CUDA device with an OpenGL context in order to achieve maximum interoperability performance.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
device - Device to use for OpenGL interoperability
See Also:
cudaGraphicsGLRegisterBuffer(jcuda.runtime.cudaGraphicsResource, int, int),
cudaGraphicsGLRegisterImage(jcuda.runtime.cudaGraphicsResource, int, int, int)
public static int cudaGLGetDevices(int[] pCudaDeviceCount, int[] pCudaDevices, int cudaDeviceCount, int cudaGLDeviceList_deviceList)
cudaError_t cudaGLGetDevices ( unsigned int* pCudaDeviceCount, int* pCudaDevices, unsigned int cudaDeviceCount, cudaGLDeviceList deviceList )
Gets the CUDA devices associated with the current OpenGL context. Returns in *pCudaDeviceCount the number of CUDA-compatible devices corresponding to the current OpenGL context. Also returns in *pCudaDevices at most cudaDeviceCount of the CUDA-compatible devices corresponding to the current OpenGL context. If any of the GPUs being used by the current OpenGL context are not CUDA capable then the call will return cudaErrorNoDevice.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
pCudaDeviceCount - Returned number of CUDA devices corresponding to the current OpenGL context
pCudaDevices - Returned CUDA devices corresponding to the current OpenGL context
cudaDeviceCount - The size of the output device array pCudaDevices
deviceList - The set of devices to return. This set may be cudaGLDeviceListAll for all devices, cudaGLDeviceListCurrentFrame for the devices used to render the current frame (in SLI), or cudaGLDeviceListNextFrame for the devices used to render the next frame (in SLI).
See Also:
cudaGraphicsUnregisterResource(jcuda.runtime.cudaGraphicsResource),
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t),
cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int),
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)
public static int cudaGraphicsGLRegisterImage(cudaGraphicsResource resource, int image, int target, int Flags)
cudaError_t cudaGraphicsGLRegisterImage ( cudaGraphicsResource** resource, GLuint image, GLenum target, unsigned int flags )
Register an OpenGL texture or renderbuffer object. Registers the texture or renderbuffer object specified by image for access by CUDA. A handle to the registered object is returned as resource.
target must match the type of the object, and must be one of GL_TEXTURE_2D, GL_TEXTURE_RECTANGLE, GL_TEXTURE_CUBE_MAP, GL_TEXTURE_3D, GL_TEXTURE_2D_ARRAY, or GL_RENDERBUFFER.
The register flags flags specify the intended usage, as follows:
cudaGraphicsRegisterFlagsNone: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA. This is the default value.
cudaGraphicsRegisterFlagsReadOnly: Specifies that CUDA will not write to this resource.
cudaGraphicsRegisterFlagsWriteDiscard: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.
cudaGraphicsRegisterFlagsSurfaceLoadStore: Specifies that CUDA will bind this resource to a surface reference.
cudaGraphicsRegisterFlagsTextureGather: Specifies that CUDA will perform texture gather operations on this resource.
The following image formats are supported. For brevity, the list uses a shorthand in which each base name is combined with each suffix. For example, {GL_R, GL_RG} X {8, 16} expands to the following 4 formats: {GL_R8, GL_R16, GL_RG8, GL_RG16}.
GL_RED, GL_RG, GL_RGBA, GL_LUMINANCE, GL_ALPHA, GL_LUMINANCE_ALPHA, GL_INTENSITY
{GL_R, GL_RG, GL_RGBA} X {8, 16, 16F, 32F, 8UI, 16UI, 32UI, 8I, 16I, 32I}
{GL_LUMINANCE, GL_ALPHA, GL_LUMINANCE_ALPHA, GL_INTENSITY} X {8, 16, 16F_ARB, 32F_ARB, 8UI_EXT, 16UI_EXT, 32UI_EXT, 8I_EXT, 16I_EXT, 32I_EXT}
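The shorthand in the list above is a plain cross product of base names and suffixes. The following sketch spells that expansion out in Java. The class FormatExpansion is hypothetical and not part of JCuda, and it only builds name strings; it does not check that each name corresponds to an OpenGL constant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of JCuda: expands the "{bases} X {suffixes}"
// shorthand used in the format list into individual format names.
class FormatExpansion {

    // Combines every base name with every suffix, bases in the outer loop.
    static List<String> expand(String[] bases, String[] suffixes) {
        List<String> names = new ArrayList<>();
        for (String base : bases) {
            for (String suffix : suffixes) {
                names.add(base + suffix);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The example from the text: {GL_R, GL_RG} X {8, 16}
        System.out.println(expand(new String[] {"GL_R", "GL_RG"}, new String[] {"8", "16"}));
    }
}
```

Applied to the second line of the list, three base names and ten suffixes yield thirty format names.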
The following image classes are currently disallowed:
Textures with borders
Multisampled renderbuffers
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
resource - Pointer to the returned object handle
image - Name of texture or renderbuffer object to be registered
target - Identifies the type of object specified by image
flags - Register flags
See Also:
cudaGraphicsUnregisterResource(jcuda.runtime.cudaGraphicsResource),
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t),
cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int)
public static int cudaGraphicsGLRegisterBuffer(cudaGraphicsResource resource, int buffer, int Flags)
cudaError_t cudaGraphicsGLRegisterBuffer ( cudaGraphicsResource** resource, GLuint buffer, unsigned int flags )
Registers an OpenGL buffer object. Registers the buffer object specified by buffer for access by CUDA. A handle to the registered object is returned as resource. The register flags flags specify the intended usage, as follows:
cudaGraphicsRegisterFlagsNone: Specifies no hints about how this resource will be used. It is therefore assumed that this resource will be read from and written to by CUDA. This is the default value.
cudaGraphicsRegisterFlagsReadOnly: Specifies that CUDA will not write to this resource.
cudaGraphicsRegisterFlagsWriteDiscard: Specifies that CUDA will not read from this resource and will write over the entire contents of the resource, so none of the data previously stored in the resource will be preserved.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
resource - Pointer to the returned object handle
buffer - Name of buffer object to be registered
flags - Register flags
See Also:
cudaGraphicsUnregisterResource(jcuda.runtime.cudaGraphicsResource),
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t),
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)
@Deprecated public static int cudaGLRegisterBufferObject(int bufObj)
cudaError_t cudaGLRegisterBufferObject ( GLuint bufObj )
Registers a buffer object for access by CUDA. Deprecated: This function is deprecated as of CUDA 3.0. Registers the buffer object of ID bufObj for access by CUDA. This function must be called before CUDA can map the buffer object. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
bufObj - Buffer object ID to register
See Also:
cudaGraphicsGLRegisterBuffer(jcuda.runtime.cudaGraphicsResource, int, int)
@Deprecated public static int cudaGLMapBufferObject(Pointer devPtr, int bufObj)
cudaError_t cudaGLMapBufferObject ( void** devPtr, GLuint bufObj )
Maps a buffer object for access by CUDA. Deprecated: This function is deprecated as of CUDA 3.0. Maps the buffer object of ID bufObj into the address space of CUDA and returns in *devPtr the base pointer of the resulting mapping. The buffer must have previously been registered by calling cudaGLRegisterBufferObject(). While a buffer is mapped by CUDA, any OpenGL operation which references the buffer will result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
All streams in the current thread are synchronized with the current GL context.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
devPtr - Returned device pointer to CUDA object
bufObj - Buffer object ID to map
See Also:
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
@Deprecated public static int cudaGLUnmapBufferObject(int bufObj)
cudaError_t cudaGLUnmapBufferObject ( GLuint bufObj )
Unmaps a buffer object for access by CUDA. Deprecated: This function is deprecated as of CUDA 3.0. Unmaps the buffer object of ID bufObj for access by CUDA. When a buffer is unmapped, the base address returned by cudaGLMapBufferObject() is invalid and subsequent references to the address result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
All streams in the current thread are synchronized with the current GL context.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
bufObj - Buffer object to unmap
See Also:
cudaGraphicsUnmapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
@Deprecated public static int cudaGLUnregisterBufferObject(int bufObj)
cudaError_t cudaGLUnregisterBufferObject ( GLuint bufObj )
Unregisters a buffer object for access by CUDA. Deprecated: This function is deprecated as of CUDA 3.0. Unregisters the buffer object of ID bufObj for access by CUDA and releases any CUDA resources associated with the buffer. Once a buffer is unregistered, it may no longer be mapped by CUDA. The GL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
bufObj - Buffer object to unregister
See Also:
cudaGraphicsUnregisterResource(jcuda.runtime.cudaGraphicsResource)
@Deprecated public static int cudaGLSetBufferObjectMapFlags(int bufObj, int flags)
cudaError_t cudaGLSetBufferObjectMapFlags ( GLuint bufObj, unsigned int flags )
Set usage flags for mapping an OpenGL buffer. Deprecated: This function is deprecated as of CUDA 3.0. Set flags for mapping the OpenGL buffer bufObj.
Changes to flags will take effect the next time bufObj is mapped. The flags argument may be any of the following:
cudaGLMapFlagsNone: Specifies no hints about how this buffer will be used. It is therefore assumed that this buffer will be read from and written to by CUDA kernels. This is the default value.
cudaGLMapFlagsReadOnly: Specifies that CUDA kernels which access this buffer will not write to the buffer.
cudaGLMapFlagsWriteDiscard: Specifies that CUDA kernels which access this buffer will not read from the buffer and will write over the entire contents of the buffer, so none of the data previously stored in the buffer will be preserved.
If bufObj has not been registered for use with CUDA, then cudaErrorInvalidResourceHandle is returned. If bufObj is presently mapped for access by CUDA, then cudaErrorUnknown is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
bufObj - Registered buffer object to set flags for
flags - Parameters for buffer mapping
See Also:
cudaGraphicsResourceSetMapFlags(jcuda.runtime.cudaGraphicsResource, int)
@Deprecated public static int cudaGLMapBufferObjectAsync(Pointer devPtr, int bufObj, cudaStream_t stream)
cudaError_t cudaGLMapBufferObjectAsync ( void** devPtr, GLuint bufObj, cudaStream_t stream )
Maps a buffer object for access by CUDA. Deprecated: This function is deprecated as of CUDA 3.0. Maps the buffer object of ID bufObj into the address space of CUDA and returns in *devPtr the base pointer of the resulting mapping. The buffer must have previously been registered by calling cudaGLRegisterBufferObject(). While a buffer is mapped by CUDA, any OpenGL operation which references the buffer will result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
The stream stream is synchronized with the current GL context.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
devPtr - Returned device pointer to CUDA object
bufObj - Buffer object ID to map
stream - Stream to synchronize
See Also:
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
@Deprecated public static int cudaGLUnmapBufferObjectAsync(int bufObj, cudaStream_t stream)
cudaError_t cudaGLUnmapBufferObjectAsync ( GLuint bufObj, cudaStream_t stream )
Unmaps a buffer object for access by CUDA. Deprecated: This function is deprecated as of CUDA 3.0. Unmaps the buffer object of ID bufObj for access by CUDA. When a buffer is unmapped, the base address returned by cudaGLMapBufferObject() is invalid and subsequent references to the address result in undefined behavior. The OpenGL context used to create the buffer, or another context from the same share group, must be bound to the current thread when this is called.
The stream stream is synchronized with the current GL context.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
bufObj - Buffer object to unmap
stream - Stream to synchronize
See Also:
cudaGraphicsUnmapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
public static int cudaDriverGetVersion(int[] driverVersion)
cudaError_t cudaDriverGetVersion ( int* driverVersion )
Returns the CUDA driver version. Returns in *driverVersion the version number of the installed CUDA driver. If no driver is installed, then 0 is returned as the driver version (via driverVersion). This function automatically returns cudaErrorInvalidValue if the driverVersion argument is NULL.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
driverVersion - Returns the CUDA driver version.
See Also:
cudaRuntimeGetVersion(int[])
public static int cudaRuntimeGetVersion(int[] runtimeVersion)
cudaError_t cudaRuntimeGetVersion ( int* runtimeVersion )
Returns the CUDA Runtime version. Returns in *runtimeVersion the version number of the installed CUDA Runtime. This function automatically returns cudaErrorInvalidValue if the runtimeVersion argument is NULL.
Parameters:
runtimeVersion - Returns the CUDA Runtime version.
See Also:
cudaDriverGetVersion(int[])
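Both version functions write a single integer into the first element of the array passed to them. CUDA conventionally encodes versions as 1000 * major + 10 * minor, so that 11020 stands for 11.2; this encoding comes from the CUDA headers rather than from this page. The sketch below decodes such a value. The class CudaVersion is hypothetical and not part of JCuda, and the version value is hard-coded instead of being queried from a device.

```java
// Hypothetical helper, not part of JCuda: decodes the integer written by
// cudaDriverGetVersion or cudaRuntimeGetVersion into major and minor parts,
// assuming the conventional 1000 * major + 10 * minor encoding.
class CudaVersion {

    static int major(int version) {
        return version / 1000;
    }

    static int minor(int version) {
        return (version % 1000) / 10;
    }

    // Formats a driver version; a value of 0 means no driver is installed.
    static String describe(int version) {
        if (version == 0) {
            return "no driver installed";
        }
        return major(version) + "." + minor(version);
    }

    public static void main(String[] args) {
        // In a real program this array would be filled by
        // cudaDriverGetVersion(driverVersion); here the value is hard-coded.
        int[] driverVersion = {11020};
        System.out.println(describe(driverVersion[0]));
    }
}
```

Decoding both values this way makes it straightforward to compare the driver version against the runtime version before launching any work.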
public static int cudaPointerGetAttributes(cudaPointerAttributes attributes, Pointer ptr)
cudaError_t cudaPointerGetAttributes ( cudaPointerAttributes* attributes, const void* ptr )
Returns attributes about a specified pointer. Returns in *attributes the attributes of the pointer ptr.
The cudaPointerAttributes structure is defined as:
struct cudaPointerAttributes {
    enum cudaMemoryType memoryType;
    int device;
    void *devicePointer;
    void *hostPointer;
}
In this structure, the individual fields mean:
memoryType identifies the physical location of the memory associated with pointer ptr. It can be cudaMemoryTypeHost for host memory or cudaMemoryTypeDevice for device memory.
device is the device against which ptr was allocated. If ptr has memory type cudaMemoryTypeDevice then this identifies the device on which the memory referred to by ptr physically resides. If ptr has memory type cudaMemoryTypeHost then this identifies the device which was current when the allocation was made (and if that device is deinitialized then this allocation will vanish with that device's state).
devicePointer is the device pointer alias through which the memory referred to by ptr may be accessed on the current device. If the memory referred to by ptr cannot be accessed directly by the current device then this is NULL.
hostPointer is the host pointer alias through which the memory referred to by ptr may be accessed on the host. If the memory referred to by ptr cannot be accessed directly by the host then this is NULL.
Parameters:
attributes - Attributes for the specified pointer
ptr - Pointer to get attributes for
See Also:
cudaGetDeviceCount(int[]),
cudaGetDevice(int[]),
cudaSetDevice(int),
cudaChooseDevice(int[], jcuda.runtime.cudaDeviceProp)
public static int cudaDeviceCanAccessPeer(int[] canAccessPeer, int device, int peerDevice)
cudaError_t cudaDeviceCanAccessPeer ( int* canAccessPeer, int device, int peerDevice )
Queries if a device may directly access a peer device's memory. Returns in *canAccessPeer a value of 1 if device device is capable of directly accessing memory from peerDevice and 0 otherwise. If direct access of peerDevice from device is possible, then access may be enabled by calling cudaDeviceEnablePeerAccess().
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
canAccessPeer - Returned access capability
device - Device from which allocations on peerDevice are to be directly accessed.
peerDevice - Device on which the allocations to be directly accessed by device reside.
See Also:
cudaDeviceEnablePeerAccess(int, int),
cudaDeviceDisablePeerAccess(int)
public static int cudaDeviceEnablePeerAccess(int peerDevice, int flags)
cudaError_t cudaDeviceEnablePeerAccess ( int peerDevice, unsigned int flags )
Enables direct access to memory allocations on a peer device. On success, all allocations from peerDevice will immediately be accessible by the current device. They will remain accessible until access is explicitly disabled using cudaDeviceDisablePeerAccess() or either device is reset using cudaDeviceReset().
Note that access granted by this call is unidirectional and that in order to access memory on the current device from peerDevice, a separate symmetric call to cudaDeviceEnablePeerAccess() is required.
Each device can support a system-wide maximum of eight peer connections.
Peer access is not supported in 32 bit applications.
Returns cudaErrorInvalidDevice if cudaDeviceCanAccessPeer() indicates that the current device cannot directly access memory from peerDevice.
Returns cudaErrorPeerAccessAlreadyEnabled if direct access of peerDevice from the current device has already been enabled.
Returns cudaErrorInvalidValue if flags is not 0.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
peerDevice - Peer device to enable direct access to from the current device
flags - Reserved for future use and must be set to 0
See Also:
cudaDeviceCanAccessPeer(int[], int, int),
cudaDeviceDisablePeerAccess(int)
public static int cudaDeviceDisablePeerAccess(int peerDevice)
cudaError_t cudaDeviceDisablePeerAccess ( int peerDevice )
Disables direct access to memory allocations on a peer device. Returns cudaErrorPeerAccessNotEnabled if direct access to memory on peerDevice has not yet been enabled from the current device.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
peerDevice - Peer device to disable direct access to
See Also:
cudaDeviceCanAccessPeer(int[], int, int),
cudaDeviceEnablePeerAccess(int, int)
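Because peer access is unidirectional, mutual access between two devices takes one cudaDeviceEnablePeerAccess call from each side. The sketch below plans those calls from a capability matrix in plain Java. The class PeerAccessPlan is hypothetical and not part of JCuda, and the matrix is filled by hand; in a real program each entry would come from cudaDeviceCanAccessPeer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not part of JCuda: plans the cudaDeviceEnablePeerAccess
// calls needed so that every capable device pair has access in both directions.
class PeerAccessPlan {

    // canAccess[d][p] mirrors the value cudaDeviceCanAccessPeer would report
    // for device d and peer device p. Each returned pair {d, p} means: make d
    // the current device, then enable access to p with flags set to 0.
    static List<int[]> plan(boolean[][] canAccess) {
        List<int[]> calls = new ArrayList<>();
        for (int d = 0; d < canAccess.length; d++) {
            for (int p = 0; p < canAccess.length; p++) {
                if (d != p && canAccess[d][p]) {
                    calls.add(new int[] {d, p});
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // Devices 0 and 1 can reach each other; device 2 is isolated.
        boolean[][] canAccess = {
            {false, true, false},
            {true, false, false},
            {false, false, false}
        };
        for (int[] call : plan(canAccess)) {
            System.out.println("current device " + call[0] + " -> enable peer " + call[1]);
        }
    }
}
```

The plan does not account for the documented limit of eight peer connections per device; a caller with many devices would need to check that separately.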
public static int cudaGraphicsUnregisterResource(cudaGraphicsResource resource)
cudaError_t cudaGraphicsUnregisterResource ( cudaGraphicsResource_t resource )
Unregisters a graphics resource for access by CUDA. Unregisters the graphics resource resource so it is not accessible by CUDA unless registered again.
If resource is invalid then cudaErrorInvalidResourceHandle is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
resource - Resource to unregister
See Also:
JCuda#cudaGraphicsD3D9RegisterResource,
JCuda#cudaGraphicsD3D10RegisterResource,
JCuda#cudaGraphicsD3D11RegisterResource,
cudaGraphicsGLRegisterBuffer(jcuda.runtime.cudaGraphicsResource, int, int),
cudaGraphicsGLRegisterImage(jcuda.runtime.cudaGraphicsResource, int, int, int)
public static int cudaGraphicsResourceSetMapFlags(cudaGraphicsResource resource, int flags)
cudaError_t cudaGraphicsResourceSetMapFlags ( cudaGraphicsResource_t resource, unsigned int flags )
Set usage flags for mapping a graphics resource. Set flags for mapping the graphics resource resource.
Changes to flags will take effect the next time resource is mapped. The flags argument may be any of the following:
cudaGraphicsMapFlagsNone: Specifies no hints about how resource will be used. It is therefore assumed that CUDA may read from or write to resource.
cudaGraphicsMapFlagsReadOnly: Specifies that CUDA will not write to resource.
cudaGraphicsMapFlagsWriteDiscard: Specifies that CUDA will not read from resource and will write over the entire contents of resource, so none of the data previously stored in resource will be preserved.
If resource is presently mapped for access by CUDA then cudaErrorUnknown is returned. If flags is not one of the above values then cudaErrorInvalidValue is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
resource - Registered resource to set flags for
flags - Parameters for resource mapping
See Also:
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
public static int cudaGraphicsMapResources(int count, cudaGraphicsResource[] resources, cudaStream_t stream)
cudaError_t cudaGraphicsMapResources ( int count, cudaGraphicsResource_t* resources, cudaStream_t stream = 0 )
Map graphics resources for access by CUDA. Maps the count graphics resources in resources for access by CUDA.
The resources in resources may be accessed by CUDA until they are unmapped. The graphics API from which resources were registered should not access any resources while they are mapped by CUDA. If an application does so, the results are undefined.
This function provides the synchronization guarantee that any graphics calls issued before cudaGraphicsMapResources() will complete before any subsequent CUDA work issued in stream begins.
If resources contains any duplicate entries then cudaErrorInvalidResourceHandle is returned. If any of resources are presently mapped for access by CUDA then cudaErrorUnknown is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
count - Number of resources to map
resources - Resources to map for CUDA
stream - Stream for synchronization
See Also:
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource),
cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int),
cudaGraphicsUnmapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
public static int cudaGraphicsUnmapResources(int count, cudaGraphicsResource[] resources, cudaStream_t stream)
cudaError_t cudaGraphicsUnmapResources ( int count, cudaGraphicsResource_t* resources, cudaStream_t stream = 0 )
Unmap graphics resources. Unmaps the count graphics resources in resources.
Once unmapped, the resources in resources may not be accessed by CUDA until they are mapped again.
This function provides the synchronization guarantee that any CUDA work issued in stream before cudaGraphicsUnmapResources() will complete before any subsequently issued graphics work begins.
If resources contains any duplicate entries then cudaErrorInvalidResourceHandle is returned. If any of resources are not presently mapped for access by CUDA then cudaErrorUnknown is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
count - Number of resources to unmap
resources - Resources to unmap
stream - Stream for synchronization
See Also:
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t)
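The error rules documented for cudaGraphicsMapResources and cudaGraphicsUnmapResources can be summarized as a small state model. The sketch below is a plain-Java imitation of those rules, not the JCuda implementation: the class MapStateModel, its Result enum, and the use of int ids in place of cudaGraphicsResource handles are all hypothetical, and checking for duplicates before checking the mapped state is an assumption about ordering.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model, not part of JCuda: mimics the documented error rules
// of cudaGraphicsMapResources and cudaGraphicsUnmapResources.
class MapStateModel {

    // Stand-ins for cudaSuccess, cudaErrorInvalidResourceHandle, cudaErrorUnknown.
    enum Result { SUCCESS, INVALID_RESOURCE_HANDLE, UNKNOWN }

    // Ids of the resources that are currently mapped.
    private final Set<Integer> mapped = new HashSet<>();

    // True if the array contains the same resource id more than once.
    private static boolean hasDuplicates(int[] resources) {
        Set<Integer> seen = new HashSet<>();
        for (int r : resources) {
            if (!seen.add(r)) {
                return true;
            }
        }
        return false;
    }

    Result map(int[] resources) {
        if (hasDuplicates(resources)) {
            return Result.INVALID_RESOURCE_HANDLE;
        }
        for (int r : resources) {
            if (mapped.contains(r)) {
                return Result.UNKNOWN; // already mapped for access by CUDA
            }
        }
        for (int r : resources) {
            mapped.add(r);
        }
        return Result.SUCCESS;
    }

    Result unmap(int[] resources) {
        if (hasDuplicates(resources)) {
            return Result.INVALID_RESOURCE_HANDLE;
        }
        for (int r : resources) {
            if (!mapped.contains(r)) {
                return Result.UNKNOWN; // not presently mapped
            }
        }
        for (int r : resources) {
            mapped.remove(r);
        }
        return Result.SUCCESS;
    }

    public static void main(String[] args) {
        MapStateModel model = new MapStateModel();
        System.out.println(model.map(new int[] {1, 2}));   // both resources become mapped
        System.out.println(model.map(new int[] {2}));      // resource 2 is already mapped
        System.out.println(model.unmap(new int[] {1, 2})); // both resources are released
    }
}
```

A model like this can be handy in unit tests of code that wraps the interop calls, since it needs neither a GPU nor an OpenGL context.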
public static int cudaGraphicsResourceGetMappedPointer(Pointer devPtr, long[] size, cudaGraphicsResource resource)
cudaError_t cudaGraphicsResourceGetMappedPointer ( void** devPtr, size_t* size, cudaGraphicsResource_t resource )
Get a device pointer through which to access a mapped graphics resource. Returns in *devPtr a pointer through which the mapped graphics resource resource may be accessed. Returns in *size the size of the memory in bytes which may be accessed from that pointer. The value set in devPtr may change every time that resource is mapped.
If resource is not a buffer then it cannot be accessed via a pointer and cudaErrorUnknown is returned. If resource is not mapped then cudaErrorUnknown is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
devPtr - Returned pointer through which resource may be accessed
size - Returned size of the buffer accessible starting at *devPtr
resource - Mapped resource to access
See Also:
cudaGraphicsMapResources(int, jcuda.runtime.cudaGraphicsResource[], jcuda.runtime.cudaStream_t),
cudaGraphicsSubResourceGetMappedArray(jcuda.runtime.cudaArray, jcuda.runtime.cudaGraphicsResource, int, int)
public static int cudaGraphicsSubResourceGetMappedArray(cudaArray arrayPtr, cudaGraphicsResource resource, int arrayIndex, int mipLevel)
cudaError_t cudaGraphicsSubResourceGetMappedArray ( cudaArray_t* array, cudaGraphicsResource_t resource, unsigned int arrayIndex, unsigned int mipLevel )
Get an array through which to access a subresource of a mapped graphics resource. Returns in *array an array through which the subresource of the mapped graphics resource resource which corresponds to array index arrayIndex and mipmap level mipLevel may be accessed. The value set in array may change every time that resource is mapped.
If resource is not a texture then it cannot be accessed via an array and cudaErrorUnknown is returned. If arrayIndex is not a valid array index for resource then cudaErrorInvalidValue is returned. If mipLevel is not a valid mipmap level for resource then cudaErrorInvalidValue is returned. If resource is not mapped then cudaErrorUnknown is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
array - Returned array through which a subresource of resource may be accessed
resource - Mapped resource to access
arrayIndex - Array index for array textures or cubemap face index as defined by cudaGraphicsCubeFace for cubemap textures for the subresource to access
mipLevel - Mipmap level for the subresource to access
See Also:
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)
public static int cudaGraphicsResourceGetMappedMipmappedArray(cudaMipmappedArray mipmappedArray, cudaGraphicsResource resource)
cudaError_t cudaGraphicsResourceGetMappedMipmappedArray ( cudaMipmappedArray_t* mipmappedArray, cudaGraphicsResource_t resource )
Get a mipmapped array through which to access a mapped graphics resource. Returns in *mipmappedArray a mipmapped array through which the mapped graphics resource resource may be accessed. The value set in mipmappedArray may change every time that resource is mapped.
If resource is not a texture then it cannot be accessed via an array and cudaErrorUnknown is returned. If resource is not mapped then cudaErrorUnknown is returned.
Note that this function may also return error codes from previous, asynchronous launches.
Parameters:
mipmappedArray - Returned mipmapped array through which resource may be accessed
resource - Mapped resource to access
See Also:
cudaGraphicsResourceGetMappedPointer(jcuda.Pointer, long[], jcuda.runtime.cudaGraphicsResource)
public static int cudaProfilerInitialize(String configFile, String outputFile, int outputMode)
cudaError_t cudaProfilerInitialize ( const char* configFile, const char* outputFile, cudaOutputMode_t outputMode )
Initialize the CUDA profiler. With this API, the user can initialize the CUDA profiler by specifying the configuration file, the output file, and the output file format. It is generally used to profile different sets of counters by looping over the kernel launch. The configFile parameter selects the profiling options, including the profiler counters. Refer to the "Compute Command Line Profiler User Guide" for supported profiler options and counters.
Limitation: The CUDA profiler cannot be initialized with this API if another profiling tool is already active, as indicated by the cudaErrorProfilerDisabled return code.
Typical usage of the profiling APIs is as follows:
for each set of counters/options {
    cudaProfilerInitialize(); // Initialize profiling, set the counters/options in the config file
    ...
    cudaProfilerStart();
    // code to be profiled
    cudaProfilerStop();
    ...
    cudaProfilerStart();
    // code to be profiled
    cudaProfilerStop();
    ...
}
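The loop above can be sketched in Java as follows. This is a minimal illustration, not part of JCuda: the `Profiler` interface, the `profileEach` helper, and the file names are hypothetical, and the sketch assumes that a return value of 0 (cudaSuccess) means success and that exceptions are not enabled in the binding. The interface exists only so the control flow can run without a GPU; a real implementation would forward its three methods to cudaProfilerInitialize, cudaProfilerStart, and cudaProfilerStop.

```java
import java.util.Arrays;
import java.util.List;

final class ProfilerLoopSketch {

    /** Hypothetical abstraction over the three profiler entry points. */
    interface Profiler {
        int initialize(String configFile, String outputFile, int outputMode);
        int start();
        int stop();
    }

    /**
     * Runs the work once per config file, re-initializing the profiler each time.
     * Returns 0 on success, or the first non-zero status that was reported.
     */
    static int profileEach(Profiler profiler, List<String> configFiles,
                           String outputPrefix, int outputMode, Runnable work) {
        for (int i = 0; i < configFiles.size(); i++) {
            // One output file per counter set (naming scheme is an assumption).
            String outputFile = outputPrefix + i + ".log";
            int status = profiler.initialize(configFiles.get(i), outputFile, outputMode);
            if (status != 0) {
                return status;
            }
            status = profiler.start();
            if (status != 0) {
                return status;
            }
            try {
                work.run(); // code to be profiled
            } finally {
                // Always stop, even if the profiled code throws.
                status = profiler.stop();
            }
            if (status != 0) {
                return status;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // Stub that only logs the calls; replace with calls into the real binding.
        Profiler stub = new Profiler() {
            public int initialize(String configFile, String outputFile, int outputMode) {
                System.out.println("initialize " + configFile + " -> " + outputFile);
                return 0;
            }
            public int start() { System.out.println("start"); return 0; }
            public int stop() { System.out.println("stop"); return 0; }
        };
        // The output mode 0 is a placeholder; with the real binding, pass the
        // constant for cudaKeyValuePair or cudaCSV.
        int status = profileEach(stub, Arrays.asList("counters_a.cfg", "counters_b.cfg"),
                "profile_", 0, () -> System.out.println("  (profiled work)"));
        System.out.println("status = " + status);
    }
}
```

Stopping in a `finally` block keeps the start and stop calls paired even when the profiled code fails, so a later iteration does not begin with profiling still enabled.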
Note that this function may also return error codes from previous, asynchronous launches.
configFile
- Name of the config file that lists the counters/options for profiling.
outputFile
- Name of the outputFile where the profiling results will be stored.
outputMode
- outputMode, can be cudaKeyValuePair OR cudaCSV.
cudaProfilerStart(), cudaProfilerStop()
public static int cudaProfilerStart()
cudaError_t cudaProfilerStart ( void )
Enable profiling. Enables profile collection by the active profiling tool. If profiling is already enabled, then cudaProfilerStart() has no effect.
The cudaProfilerStart and cudaProfilerStop APIs control profiling granularity programmatically, so that only selected pieces of code are profiled.
Note that this function may also return error codes from previous, asynchronous launches.
cudaProfilerInitialize(java.lang.String, java.lang.String, int), cudaProfilerStop()
public static int cudaProfilerStop()
cudaError_t cudaProfilerStop ( void )
Disable profiling. Disables profile collection by the active profiling tool. If profiling is already disabled, then cudaProfilerStop() has no effect.
The cudaProfilerStart and cudaProfilerStop APIs control profiling granularity programmatically, so that only selected pieces of code are profiled.
Note that this function may also return error codes from previous, asynchronous launches.
cudaProfilerInitialize(java.lang.String, java.lang.String, int), cudaProfilerStart()
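Because cudaProfilerStart and cudaProfilerStop are meant to bracket selected pieces of code, one way to keep the two calls paired is a try-with-resources scope. The sketch below is an illustration only: `ProfiledRegion` is a hypothetical helper, not a JCuda class, and it assumes that both calls return 0 (cudaSuccess) on success. The start and stop functions are passed in as `IntSupplier` values so the helper runs without a GPU; with the binding on the classpath, method references such as `JCuda::cudaProfilerStart` and `JCuda::cudaProfilerStop` should fit these parameters, since both are static methods that take no arguments and return an int.

```java
import java.util.function.IntSupplier;

/** Hypothetical scope object: starts profiling on open, stops it on close. */
final class ProfiledRegion implements AutoCloseable {
    private final IntSupplier stop;

    private ProfiledRegion(IntSupplier stop) {
        this.stop = stop;
    }

    /** Calls start and returns a scope whose close() calls stop. */
    static ProfiledRegion open(IntSupplier start, IntSupplier stop) {
        int status = start.getAsInt();
        if (status != 0) {
            throw new IllegalStateException("profiler start failed with status " + status);
        }
        return new ProfiledRegion(stop);
    }

    @Override
    public void close() {
        int status = stop.getAsInt();
        if (status != 0) {
            throw new IllegalStateException("profiler stop failed with status " + status);
        }
    }

    public static void main(String[] args) {
        // Stubs stand in for the real start/stop calls.
        IntSupplier start = () -> { System.out.println("start"); return 0; };
        IntSupplier stop = () -> { System.out.println("stop"); return 0; };
        try (ProfiledRegion region = ProfiledRegion.open(start, stop)) {
            System.out.println("  (code to be profiled)");
        }
    }
}
```

Since starting an already enabled profiler and stopping an already disabled one have no effect, nesting such scopes does not stack: the innermost close disables collection for the enclosing scopes as well.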
Copyright © 2020. All rights reserved.