Public Member Functions | Protected Member Functions | List of all members
PxGpuDispatcher Class Referenceabstract

A GpuTask dispatcher. More...

#include <PxGpuDispatcher.h>

Public Member Functions

virtual void startSimulation ()=0
 Record the start of a simulation step. More...
 
virtual void startGroup ()=0
 Record the start of a GpuTask batch submission. More...
 
virtual void submitTask (PxTask &task)=0
 Submit a GpuTask for execution. More...
 
virtual void finishGroup ()=0
 Record the end of a GpuTask batch submission. More...
 
virtual void addCompletionPrereq (PxBaseTask &task)=0
 Add a CUDA completion prerequisite dependency to a task. More...
 
virtual PxCudaContextManagergetCudaContextManager ()=0
 Retrieve the PxCudaContextManager associated with this PxGpuDispatcher. More...
 
virtual void stopSimulation ()=0
 Record the end of a simulation frame. More...
 
virtual bool failureDetected () const =0
 Returns true if a CUDA call has returned a non-recoverable error. More...
 
virtual void forceFailureMode ()=0
 Force the PxGpuDispatcher into failure mode. More...
 
virtual void * getCurrentProfileBuffer () const =0
 Returns a pointer to the current in-use profile buffer. More...
 
virtual PxU16 registerKernelNames (const char **, PxU16 count)=0
 Register kernel names with PlatformAnalyzer. More...
 
virtual void launchCopyKernel (PxGpuCopyDesc *desc, PxU32 count, CUstream stream)=0
 Launch a copy kernel with arbitrary number of copy commands. More...
 
virtual PxBaseTaskgetPreLaunchTask ()=0
 Query pre launch task that runs before launching gpu kernels. More...
 
virtual void addPreLaunchDependent (PxBaseTask &dependent)=0
 Adds a gpu launch task that gets executed after the pre launch task. More...
 
virtual PxBaseTaskgetPostLaunchTask ()=0
 Query post launch task that runs after the gpu is done. More...
 
virtual void addPostLaunchDependent (PxBaseTask &dependent)=0
 Adds a task that gets executed after the post launch task. More...
 

Protected Member Functions

virtual ~PxGpuDispatcher ()
 protected destructor More...
 

Detailed Description

A GpuTask dispatcher.

A PxGpuDispatcher executes GpuTasks submitted by one or more TaskManagers (one or more scenes). It maintains a CPU worker thread which waits on GpuTask "groups" to be submitted. The submission API is explicitly sessioned so that GpuTasks are dispatched together as a group whenever possible to improve parallelism on the GPU.

A PxGpuDispatcher cannot be allocated ad-hoc, they are created as a result of creating a PxCudaContextManager. Every PxCudaContextManager has a PxGpuDispatcher instance that can be queried. In this way, each PxGpuDispatcher is tied to exactly one CUDA context.

A scene will use CPU fallback Tasks for GpuTasks if the PxTaskManager provided to it does not have a PxGpuDispatcher. For this reason, the PxGpuDispatcher must be assigned to the PxTaskManager before the PxTaskManager is given to a scene.

Multiple TaskManagers may safely share a single PxGpuDispatcher instance, thus enabling scenes to share a CUDA context.

Only failureDetected() is intended for use by the user. The rest of the PxGpuDispatcher public methods are reserved for internal use by only both TaskManagers and GpuTasks.

Constructor & Destructor Documentation

virtual PxGpuDispatcher::~PxGpuDispatcher ( )
inlineprotectedvirtual

protected destructor

GpuDispatchers are allocated and freed by their PxCudaContextManager.

Member Function Documentation

virtual void PxGpuDispatcher::addCompletionPrereq ( PxBaseTask task)
pure virtual

Add a CUDA completion prerequisite dependency to a task.

A GpuTask calls this function to add a prerequisite dependency on another task (usually a CpuTask) preventing that task from starting until all of the CUDA kernels and copies already launched have been completed. The PxGpuDispatcher will increment that task's reference count, blocking its execution, until the CUDA work is complete.

This is generally only required when a CPU task is expecting the results of the CUDA kernels to have been copied into host memory.

This mechanism is not at all not required to ensure CUDA kernels and copies are issued in the correct order. Kernel issue order is determined by normal task dependencies. The rule of thumb is to only use a blocking completion prerequisite if the task in question depends on a completed GPU->Host DMA.

The PxGpuDispatcher issues a blocking event record to CUDA for the purposes of tracking the already submitted CUDA work. When this event is resolved, the PxGpuDispatcher manually decrements the reference count of the specified task, allowing it to execute (assuming it does not have other pending prerequisites).

virtual void PxGpuDispatcher::addPostLaunchDependent ( PxBaseTask dependent)
pure virtual

Adds a task that gets executed after the post launch task.

This is part of an optional feature to schedule multiple gpu features at the same time to get kernels to run in parallel.

Note
Each call adds a reference to the pre-launch task.
virtual void PxGpuDispatcher::addPreLaunchDependent ( PxBaseTask dependent)
pure virtual

Adds a gpu launch task that gets executed after the pre launch task.

This is part of an optional feature to schedule multiple gpu features at the same time to get kernels to run in parallel.

Note
Each call adds a reference to the pre-launch task.
virtual bool PxGpuDispatcher::failureDetected ( ) const
pure virtual

Returns true if a CUDA call has returned a non-recoverable error.

A return value of true indicates a fatal error has occurred. To protect itself, the PxGpuDispatcher enters a fall through mode that allows GpuTasks to complete without being executed. This allows simulations to continue but leaves GPU content static or corrupted.

The user may try to recover from these failures by deleting GPU content so the visual artifacts are mimimized. But there is no way to recover the state of the GPU actors before the failure. Once a CUDA context is in this state, the only recourse is to create a new CUDA context, a new scene, and start over.

This is our "Best Effort" attempt to not turn a soft failure into a hard failure because continued use of a CUDA context after it has returned an error will usually result in a driver reset. However if the initial failure was serious enough, a reset may have already occurred by the time we learn of it.

virtual void PxGpuDispatcher::finishGroup ( )
pure virtual

Record the end of a GpuTask batch submission.

A PxTaskManager calls this function to notify the PxGpuDispatcher that it is done submitting a group of GpuTasks (GpuTasks which were all make ready to run by the same prerequisite dependency becoming resolved). If no other group submissions are in progress, the PxGpuDispatcher will execute the set of ready tasks.

virtual void PxGpuDispatcher::forceFailureMode ( )
pure virtual

Force the PxGpuDispatcher into failure mode.

This API should be used if user code detects a non-recoverable CUDA error. This ensures the PxGpuDispatcher does not launch any further CUDA work. Subsequent calls to failureDetected() will return true.

virtual PxCudaContextManager* PxGpuDispatcher::getCudaContextManager ( )
pure virtual

Retrieve the PxCudaContextManager associated with this PxGpuDispatcher.

Every PxCudaContextManager has one PxGpuDispatcher, and every PxGpuDispatcher has one PxCudaContextManager.

virtual void* PxGpuDispatcher::getCurrentProfileBuffer ( ) const
pure virtual

Returns a pointer to the current in-use profile buffer.

The returned pointer should be passed to all kernel launches to enable CTA/Warp level profiling. If a data collector is not attached, or CTA profiling is not enabled, the pointer will be zero.

virtual PxBaseTask& PxGpuDispatcher::getPostLaunchTask ( )
pure virtual

Query post launch task that runs after the gpu is done.

This is part of an optional feature to schedule multiple gpu features at the same time to get kernels to run in parallel.

Note
Do not set the continuation on the returned task, but use addPostLaunchDependent().
virtual PxBaseTask& PxGpuDispatcher::getPreLaunchTask ( )
pure virtual

Query pre launch task that runs before launching gpu kernels.

This is part of an optional feature to schedule multiple gpu features at the same time to get kernels to run in parallel.

Note
Do not set the continuation on the returned task, but use addPreLaunchDependent().
virtual void PxGpuDispatcher::launchCopyKernel ( PxGpuCopyDesc desc,
PxU32  count,
CUstream  stream 
)
pure virtual

Launch a copy kernel with arbitrary number of copy commands.

This method is intended to be called from Kernel GpuTasks, but it can function outside of that context as well.

If count is 1, the descriptor is passed to the kernel as arguments, so it may be declared on the stack.

If count is greater than 1, the kernel will read the descriptors out of host memory. Because of this, the descriptor array must be located in page locked (pinned) memory. The provided descriptors may be modified by this method (converting host pointers to their GPU mapped equivalents) and should be considered owned by CUDA until the current batch of work has completed, so descriptor arrays should not be freed or modified until you have received a completion notification.

If your GPU does not support mapping of page locked memory (SM>=1.1), this function degrades to calling CUDA copy methods.

virtual PxU16 PxGpuDispatcher::registerKernelNames ( const char **  ,
PxU16  count 
)
pure virtual

Register kernel names with PlatformAnalyzer.

The returned PxU16 must be stored and used as a base offset for the ID passed to the KERNEL_START|STOP_EVENT macros.

virtual void PxGpuDispatcher::startGroup ( )
pure virtual

Record the start of a GpuTask batch submission.

A PxTaskManager calls this function to notify the PxGpuDispatcher that one or more GpuTasks are about to be submitted for execution. The PxGpuDispatcher will not read the incoming task queue until it receives one finishGroup() call for each startGroup() call. This is to ensure as many GpuTasks as possible are executed together as a group, generating optimal parallelism on the GPU.

virtual void PxGpuDispatcher::startSimulation ( )
pure virtual

Record the start of a simulation step.

A PxTaskManager calls this function to record the beginning of a simulation step. The PxGpuDispatcher uses this notification to initialize the profiler state.

virtual void PxGpuDispatcher::stopSimulation ( )
pure virtual

Record the end of a simulation frame.

A PxTaskManager calls this function to record the completion of its dependency graph. If profiling is enabled, the PxGpuDispatcher will trigger the retrieval of profiling data from the GPU at this point.

virtual void PxGpuDispatcher::submitTask ( PxTask task)
pure virtual

Submit a GpuTask for execution.

Submitted tasks are pushed onto an incoming queue. The PxGpuDispatcher will take the contents of this queue every time the pending group count reaches 0 and run the group of submitted GpuTasks as an interleaved group.


The documentation for this class was generated from the following file:


Copyright © 2008-2015 NVIDIA Corporation, 2701 San Tomas Expressway, Santa Clara, CA 95050 U.S.A. All rights reserved. www.nvidia.com