Implement a hw semaphore that tracks gpfifo submission.
This implementation is used when userd.gp_get() is not defined and
the feature flag NVGPU_SUPPORT_SEMA_BASED_GPFIFO_GET is set.
At the end of each submitted job, submit a semaphore that writes the
gpfifo get pointer to the hw semaphore address. While processing the
next job submission, read gpfifo.get from the designated hw semaphore
location.
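The flow described above can be sketched as follows. This is a minimal model and not the nvgpu code: struct gpfifo_ring and the ring_* helpers are hypothetical names, the ring size is assumed to be a power of two, and a plain memory word stands in for the hw semaphore location.

```c
#include <stdint.h>

/* Hypothetical model of a channel's gpfifo ring; not the nvgpu structures. */
struct gpfifo_ring {
	uint32_t entries;        /* ring size, assumed to be a power of two */
	uint32_t put;            /* next slot the driver writes */
	uint32_t get;            /* last known consumer position */
	volatile uint32_t *sema; /* word written by the end-of-job semaphore */
};

/* Refresh the cached get pointer from the semaphore location. */
static void ring_update_get(struct gpfifo_ring *r)
{
	r->get = *r->sema;
}

/* Free slots; one slot stays empty to tell a full ring from an empty one. */
static uint32_t ring_free_count(const struct gpfifo_ring *r)
{
	return (r->entries - (r->put - r->get) - 1U) % r->entries;
}

/*
 * Reserve n job entries plus one trailing semaphore entry. The return
 * value is the payload that the semaphore release should carry: the get
 * pointer once this job has been consumed. Returns -1 if there is not
 * enough room.
 */
static int64_t ring_submit(struct gpfifo_ring *r, uint32_t n)
{
	ring_update_get(r);
	if (ring_free_count(r) < n + 1U) {
		return -1;
	}
	r->put = (r->put + n + 1U) & (r->entries - 1U);
	return (int64_t)r->put;
}
```

In this model the free space is recomputed only at submission time, so a full ring stays full until the previous job's semaphore write lands.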
JIRA NVGPU-9588
Change-Id: Ic88ace1a3f60e3f38f159e1861464ebcaea04469
Signed-off-by: Ramalingam C <ramalingamc@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2898143
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
Tested-by: Martin Radev <mradev@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
This patch adds nvenc support for TU104
- Fetch engine/dev info for nvenc
- Falcon NS boot (fw loading) support
- Engine context creation for nvenc
- Skip golden image for multimedia engines
- Avoid subctx for nvenc as it is a non-VEID engine
- Job submission/flow changes for nvenc
- Code refactoring to scale up the support for other multimedia
engines in the future.
Bug 3763551
Change-Id: I03d4e731ebcef456bcc5ce157f3aa39883270dc0
Signed-off-by: Santosh BS <santoshb@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2859416
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
If the NEXT bit remains set for a channel being unbound, it can lead to
an MMU fault of type unbound inst block. When userspace is closing the
channel and the NEXT bit is set, userspace retries.
When force killing the channel, nvgpu can retry for a few iterations to
ensure the channel is truly idle and unbound. If the channel is
really stuck, the unbind will fail and the TSG will be aborted.
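A bounded retry of this kind might look like the sketch below. The names are hypothetical and the channel is a stand-in whose NEXT bit reads as set for a fixed number of polls; the real driver reads hardware status and would delay between attempts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: NEXT reads as set for the first busy_polls reads. */
struct fake_channel {
	uint32_t busy_polls;
};

static bool channel_next_is_set(struct fake_channel *ch)
{
	if (ch->busy_polls > 0U) {
		ch->busy_polls--;
		return true;
	}
	return false;
}

/*
 * Poll the NEXT bit for at most max_retries attempts.
 * Returns 0 when the channel is idle and can be unbound, or -1 if it
 * stayed busy, in which case the caller would abort the TSG.
 */
static int channel_unbind_with_retry(struct fake_channel *ch,
				     uint32_t max_retries)
{
	uint32_t i;

	for (i = 0U; i < max_retries; i++) {
		if (!channel_next_is_set(ch)) {
			return 0;
		}
		/* A real driver would sleep or yield here before retrying. */
	}
	return -1;
}
```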
Bug 3800844
Change-Id: I8fb024630ff2dd272245ae27116f3db6d6e0f788
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2787533
(cherry picked from commit 99e39f4b387743a93b05ba4b097c33b23fbbcf68)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2786479
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Subcontext PDBs and the valid mask in the instance blocks of the channels
in various subcontexts have to be updated when a new subcontext is
created or a subcontext is removed.
Replayable fault state is cached in the channel structure. Replayable
fault state for a subcontext is set based on the first channel's bind
parameter. It was earlier programmed in function channel_setup_ramfc.
init_inst_block_core is updated to set up the TSG level pdb map and mask.
Added new hal gv11b_channel_bind to enable the subcontext on channel
bind.
Bug 3677982
Change-Id: I58156c5b3ab6309b6a4b8e72b0e798d6a39c1bee
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2719994
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
The wait_pending HAL is now modified to simply
check the pending status of a given runlist.
The while loop is removed from this HAL.
A new function nvgpu_runlist_wait_pending_legacy() is
added that emulates the older wait_pending() HAL.
nvgpu_runlist_tick() is modified to accept a 64-bit
"preempt_grace_ns" value.
These changes prepare for upcoming control-fifo parser
changes.
Jira NVGPU-8619
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: If3f288eb6f2181743c53b657219b3b30d56d26bc
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2766100
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
KMD needs to send the domain id and GPU_VA corresponding
to the struct runlist_domains to GSP. In the current
implementation, struct nvgpu_runlist_domain contains
the domain name instead of the domain id. This requires
an additional search by name every time an update
needs to be submitted to the GSP.
Modify the struct nvgpu_runlist_domain to store the domain id
instead of the domain name. This simplifies the flow and avoids
an unnecessary search.
Removed the conditional check for existence of the shadow domain
as it is dead code. The shadow domain is not searchable in the list
of domains inside the struct nvgpu_runlist.
Jira NVGPU-8610
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0d67cfa93d89186240290e933aa750702b14f4f0
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2744890
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Update HAL function runlist.write_state to skip inactive
runlists in fifo.runlists. It is possible for one or more
engines to be floorswept, in which case their associated
runlists will be inactive. For example, if the host supports
3 runlists (0, 1, 2) serving 3 engines (0, 1, 2), and
engine-1 is floorswept, then runlist-1 becomes inactive and
the entry fifo->runlists[1] will be set to NULL.
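The skip can be illustrated with a small sketch. It is not the actual HAL: struct runlist and active_runlist_mask() are hypothetical, and the function only computes which requested runlists may be touched.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal stand-in for a runlist. */
struct runlist {
	uint32_t id;
};

/*
 * Reduce a requested runlist mask to the runlists that actually exist.
 * Slots that are NULL (for example because the engine was floorswept)
 * are skipped instead of being dereferenced.
 */
static uint32_t active_runlist_mask(struct runlist *const *runlists,
				    uint32_t num_runlists,
				    uint32_t requested)
{
	uint32_t mask = 0U;
	uint32_t i;

	for (i = 0U; i < num_runlists && i < 32U; i++) {
		if ((requested & (1U << i)) == 0U) {
			continue;
		}
		if (runlists[i] == NULL) {
			continue; /* inactive runlist */
		}
		mask |= 1U << i;
	}
	return mask;
}
```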
Bug 3650588
Change-Id: Iaf9d75e310903c47b842e84dcfa2209d9fe7da96
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
(cherry picked from commit e29a2019cf8f4796737c670f98164f7783448d49)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2717075
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
In its current form, the default domain acts like any schedulable
domain. TSGs are bound to it and it can be enumerated via the
public interfaces.
The default domain is now expected to change from its current form
to a pseudo domain that does not act like an ordinary domain. In
particular, it must not be reachable through the domain management
API, it can't be removed, it does not show up in lists, and TSGs
cannot be explicitly bound to it. It won't participate in round-robin
domain scheduling. It is not really a domain, and acts like one only
when activated in the manual mode.
The following changes are made overall to support the above change in
definition.
1) Domain creation and attaching the domain to the scheduler are now
split into two separate functions. The new default domain (having ID
= UINT64_MAX) is created separately from a static function without
linking it with other domains in the scheduler.
2) struct nvgpu_nvs_scheduler explicitly stores the default domain
to support direct lookups.
3) TSGs are initially not bound to the default domain/rl_domain.
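The three points above can be sketched as follows. The structures and function names are hypothetical and much simpler than struct nvgpu_nvs_scheduler; the point is only that the default domain lives outside the enumerable list and is reached through a direct reference.

```c
#include <stddef.h>
#include <stdint.h>

#define DEFAULT_DOMAIN_ID UINT64_MAX

/* Hypothetical, simplified domain and scheduler types. */
struct domain {
	uint64_t id;
	struct domain *next;
};

struct scheduler {
	struct domain *domains;       /* ordinary, enumerable domains */
	struct domain default_domain; /* stored directly, never linked */
};

/* Create the default domain without attaching it to the list. */
static void scheduler_init(struct scheduler *s)
{
	s->domains = NULL;
	s->default_domain.id = DEFAULT_DOMAIN_ID;
	s->default_domain.next = NULL;
}

/* Attaching is a separate step from creation. */
static void domain_attach(struct scheduler *s, struct domain *d)
{
	d->next = s->domains;
	s->domains = d;
}

/* Public lookup walks only attached domains, so the default is not found. */
static struct domain *domain_find(struct scheduler *s, uint64_t id)
{
	struct domain *d;

	for (d = s->domains; d != NULL; d = d->next) {
		if (d->id == id) {
			return d;
		}
	}
	return NULL;
}
```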
Jira NVGPU-8165
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I916d11f4eea5124d8d64176dc77f3806c6139695
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697477
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fixed the following Coverity defects:
ioctl_as.c : Bad bit shift operation
mc_tu104.c : Bad bit shift operation
vm.c : Bad bit shift operation
vm_remap.c : Bad bit shift operation
A new linux header file for ilog2 is created.
The files which used the old ilog2 function
have been changed to use the new nvgpu_ilog2
function.
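For illustration, an integer log2 that avoids out-of-range shifts can be written as below. This is only a sketch: the name ilog2_u64 and the behaviour for a zero input are assumptions, and the actual nvgpu_ilog2 definition is not shown in this message.

```c
#include <stdint.h>

/*
 * Floor of log2(x) for x > 0. Returns 0 for x == 0 as a defined
 * fallback (an assumption made for this sketch). Every shift is by
 * exactly one bit, so no shift count can exceed the type width.
 */
static uint32_t ilog2_u64(uint64_t x)
{
	uint32_t r = 0U;

	while (x > 1ULL) {
		x >>= 1;
		r++;
	}
	return r;
}
```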
CID 9847922
CID 9869507
CID 9859508
CID 10112314
CID 10127813
CID 10127899
CID 10128004
Signed-off-by: Jinesh Parakh <jparakh@nvidia.com>
Change-Id: Ia201eea7cc426c3d6581e1e5ae3b882dbab3b490
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2700994
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
In Drive 6.0, error reporting is supported only for Orin (ga10b)
in dev-main. For this purpose, this patch does the following:
- Removes the redundant reporting of the following IDs from gv11b:
- GPU_HOST_PFIFO_SCHED_ERROR
- GPU_HOST_PFIFO_CTXSW_TIMEOUT_ERROR
- GPU_HOST_PBDMA_HCE_ERROR
- GPU_MMU_L1TLB_SA_DATA_ECC_UNCORRECTED
- GPU_MMU_L1TLB_FA_DATA_ECC_UNCORRECTED
- GPU_LTC_CACHE_DSTG_ECC_CORRECTED
- GPU_LTC_CACHE_TSTG_ECC_UNCORRECTED
- Migrates the reporting of the following IDs from gv11b to ga10b:
- GPU_SM_L1_TAG_ECC_CORRECTED
- GPU_SM_L1_TAG_ECC_UNCORRECTED
- GPU_SM_CBU_ECC_UNCORRECTED
- GPU_SM_LRF_ECC_UNCORRECTED
- GPU_SM_L1_DATA_ECC_UNCORRECTED
- GPU_SM_ICACHE_L1_DATA_ECC_UNCORRECTED
- GPU_SM_ICACHE_L0_PREDECODE_ECC_UNCORRECTED
- GPU_SM_L1_TAG_MISS_FIFO_ECC_UNCORRECTED
- GPU_SM_L1_TAG_S2R_PIXPRF_ECC_UNCORRECTED
- Removes the unused ID that doesn't have any HSI related to it:
- GPU_HOST_PBDMA_PREEMPT_ERROR
In addition to the above, this patch does the following:
- Updates error IDs related to page fault error.
- Updates look-up table to remove unused error IDs.
JIRA NVGPU-8094
Bug 200729736
Change-Id: Ifea76d38ba609c894560e61ff5a6e406290f919e
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2685249
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
g->fifo.runlists[] has a size of g->fifo.max_runlists. During quiesce,
a U32_MAX bitmask is passed to the g->ops.runlist.write_state() HAL to
disable all the runlists. The Ga10b HAL implementation of
g->ops.runlist.write_state() indexes into the runlists[] array
for every bit set in the input runlist mask. For mask=U32_MAX,
there is a NULL pointer dereference when runlist_id exceeds
g->fifo.max_runlists.
Add a runlist_id boundary check before dereferencing the runlists[]
array.
Update the Gk20a HAL too with a similar guard to make sure an incorrect
mask doesn't get written to the register.
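One way to express such a guard is to clamp the mask to the valid runlist bits before using it. The helper below is hypothetical and is an alternative formulation, not the per-bit check added by this patch.

```c
#include <stdint.h>

/*
 * Drop mask bits at or above max_runlists. A count of 32 or more keeps
 * the mask unchanged, which also avoids shifting by the full type width.
 */
static uint32_t clamp_runlist_mask(uint32_t mask, uint32_t max_runlists)
{
	if (max_runlists >= 32U) {
		return mask;
	}
	return mask & ((1U << max_runlists) - 1U);
}
```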
JIRA NVGPU-8102
Change-Id: Ic613aa38361b8b23d953c76d6924aba6bf6d5ea9
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2680847
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
This patch makes the following changes:
- Compiles-out unused error reporting APIs and the related
data structures from safety build. For this purpose, it
introduces the new flag: CONFIG_NVGPU_INTR_DEBUG
- Updates the nvgpu_report_err_to_sdl() API with one more argument,
hw_unit_id. This aids in determining from the LUT whether an error
to be reported is corrected or uncorrected.
- Triggers SW quiesce, if an uncorrected error is reported to
Safety_Services, in safety build.
- Renames files in cic folder by replacing gv11b with ga10b,
since error reporting for gv11b is not supported in dev-main.
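The role of hw_unit_id in the look-up can be sketched as follows. The table layout, IDs, and function name are made up for illustration and do not reflect the real LUT; a caller in a safety build would trigger SW quiesce when the result is 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All IDs and table contents below are made up for illustration. */
struct err_desc {
	uint32_t err_id;
	bool uncorrected;
};

struct unit_desc {
	const struct err_desc *errs;
	size_t num_errs;
};

static const struct err_desc unit0_errs[] = {
	{ 0U, false },
	{ 1U, true },
};

static const struct err_desc unit1_errs[] = {
	{ 0U, true },
};

/* Look-up table indexed by hw_unit_id. */
static const struct unit_desc err_lut[] = {
	{ unit0_errs, sizeof(unit0_errs) / sizeof(unit0_errs[0]) },
	{ unit1_errs, sizeof(unit1_errs) / sizeof(unit1_errs[0]) },
};

/* Returns 1 if the error is uncorrected, 0 if corrected, -1 if unknown. */
static int err_is_uncorrected(uint32_t hw_unit_id, uint32_t err_id)
{
	size_t i;

	if (hw_unit_id >= sizeof(err_lut) / sizeof(err_lut[0])) {
		return -1;
	}
	for (i = 0U; i < err_lut[hw_unit_id].num_errs; i++) {
		if (err_lut[hw_unit_id].errs[i].err_id == err_id) {
			return err_lut[hw_unit_id].errs[i].uncorrected ? 1 : 0;
		}
	}
	return -1;
}
```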
JIRA NVGPU-8002
Change-Id: Ic01e73b0208252abba1f615a2c98d770cdf41ca4
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2668466
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
In DRIVE 6.0, NvGPU is allowed to report only 32-bit metadata to
Safety_Services. So, there is no need to have distinct APIs for
reporting errors from units like GR, MM, FIFO to SDL unit. All
these error reporting APIs will be replaced with a single API. To
meet this objective, this patch makes the following changes:
- Replaces nvgpu_report_*_err with nvgpu_report_err_to_sdl.
- Removes the reporting of error messages.
- Replaces nvgpu_log() with nvgpu_err(), for error reporting.
- Removes error reporting to Safety_Services from nvgpu_report_*_err.
However, nvgpu_report_*_err APIs and their related files are not
removed. During the creation of nvgpu-mon, they will be moved under
nvgpu-rm, in debug builds.
Note:
- There will be a follow-up patch to fix error IDs.
- As discussed in https://nvbugs/3491596 (comment #12), the high
level expectation is to report only errors.
JIRA NVGPU-7450
Change-Id: I428f2a9043086462754ac36a15edf6094985316f
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662590
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Implemented the device info cmd to send device info to the GSP for
runlist submission. Currently the GSP scheduler supports only GR
engine instance '0'.
- Implemented the runlist submit cmd. GSP firmware will submit the
corresponding runlist by writing into the submit registers. This
command is a direct replacement of the hw_submit ga10b hal for the
GR engine.
NVGPU-6790
Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I5dc573a6ad698fe20b49a3466a8e50b94cae74df
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2608923
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Most of the Orin chip specific code is compiled out of the safety build
with CONFIG_NVGPU_NON_FUSA and CONFIG_NVGPU_HAL_NON_FUSA. Remove the
config protection from Orin/GA10B specific code. Currently all code
is enabled. Code not required in safety will be compiled out later
in a separate activity.
Other noteworthy changes in this patch related to safety build:
- In ga10b_ce_request_idle(), add a log print to dump num_pce so that
compiler does not complain about unused variable num_pce.
- In ga10b_fifo_ctxsw_timeout_isr(), protect variables active_eng_id and
recover under CONFIG_NVGPU_KERNEL_MODE_SUBMIT to fix compilation
errors of unused variables.
- Compile out HAL gops.pbdma.force_ce_split() from safety since this HAL
is GA100 specific and not required for GA10B.
- Compile out gr_ga100_process_context_buffer_priv_segment() with
CONFIG_NVGPU_DEBUGGER.
- Compile out VAB support with CONFIG_NVGPU_HAL_NON_FUSA.
- In ga10b_gr_intr_handle_sw_method(), protect left_shift_by_2 variable
with appropriate configs to fix unused variable compilation error.
- In ga10b_intr_isr_stall_host2soc_3(), compile ELPG function calls
with CONFIG_NVGPU_POWER_PG.
- In ga10b_pmu_handle_swgen1_irq(), move whole function body under
CONFIG_NVGPU_FALCON_DEBUG to fix unused variable compilation errors.
- Add below TU104 specific files in safety build since some of the code
in those files is required for GA10B. Unnecessary code will be
compiled out later on.
hal/gr/init/gr_init_tu104.c
hal/class/class_tu104.c
hal/mc/mc_tu104.c
hal/fifo/usermode_tu104.c
hal/gr/falcon/gr_falcon_tu104.c
- Compile out GA10B specific debugger/profiler related files from
safety build.
- Disable CONFIG_NVGPU_FALCON_DEBUG from safety debug build temporarily
to work around compilation errors seen with keeping this config
enabled. Config will be re-enabled in safety debug build later.
Jira NVGPU-7276
Change-Id: I35f2489830ac083d52504ca411c3f1d96e72fc48
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2627048
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
On certain platforms, not all copy engine instances are usable. The user
shouldn't submit any work to these engines. To enforce this, remove
these engines from the active/host_engine lists so that they do not get
advertised to userspace. To accomplish this, introduce the following
function:
- nvgpu_engine_remove_one_dev: This function removes the specified device
entry from the following device lists: fifo->host_engines,
fifo->active_engines, runlist->rl_dev_list, runlist->eng_bitmask.
Replace the iteration over LCE device type entries that uses
nvgpu_device_for_each(g, dev, NVGPU_DEVTYPE_LCE) and, along with this,
introduce the macro nvgpu_device_for_each_safe.
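The reason a "safe" iterator is needed when entries are removed mid-walk can be shown with a generic sketch. The list type, macro, and helper here are hypothetical and do not match the signature of nvgpu_device_for_each_safe.

```c
#include <stddef.h>

/* Hypothetical singly linked device entry. */
struct dev {
	int usable;
	struct dev *next;
};

/*
 * Safe iteration: the successor is cached in tmp before the loop body
 * runs, so the body may unlink or clear pos without breaking the walk.
 */
#define dev_for_each_safe(pos, tmp, head)                          \
	for ((pos) = (head), (tmp) = (pos) ? (pos)->next : NULL;   \
	     (pos) != NULL;                                        \
	     (pos) = (tmp), (tmp) = (pos) ? (pos)->next : NULL)

/* Unlink every unusable device; returns the number removed. */
static int remove_unusable(struct dev **head)
{
	struct dev *pos, *tmp;
	struct dev **link = head;
	int removed = 0;

	dev_for_each_safe(pos, tmp, *head) {
		if (!pos->usable) {
			*link = pos->next;
			pos->next = NULL; /* would end an unsafe walk early */
			removed++;
		} else {
			link = &pos->next;
		}
	}
	return removed;
}
```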
Introduce gpu_dbg_ce flag for CE debugging.
Bug 3370462
Change-Id: I2e21f18363c6e53630d129da241c8fece106cd33
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2616711
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
The current runlist code assumes a single runlist buffer to hold all TSG
and channel entries. Create separate RL domain and domain memory types
to hold data that is related to only a scheduling domain and not
directly to the runlist hardware; in the future, more than one domain
may exist, with one of them enabled at a time.
The domain is used only internally by the runlist code at this point and
is functionally equivalent to the current runlist memory that houses the
round robin entries.
The double buffering is still kept, although more domains might benefit
from some cleverness. Although any number of created domains may be
edited at runtime, only one runlist memory is accessed by the hardware at
a time. To spare some contiguous memory, this should be considered an
opportunity for optimization in the future.
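The double buffering mentioned above can be modelled roughly as follows. The types, the fixed entry count, and rl_domain_update() are assumptions made for this sketch and are not the structures introduced by this patch.

```c
#include <stdint.h>
#include <string.h>

#define RL_MAX_ENTRIES 16U

/* Hypothetical runlist memory holding TSG and channel entries. */
struct rl_mem {
	uint32_t entries[RL_MAX_ENTRIES];
	uint32_t count;
};

/* Hypothetical domain with two buffers; hardware reads bufs[active]. */
struct rl_domain {
	struct rl_mem bufs[2];
	uint32_t active;
};

/*
 * Build the new runlist in the buffer the hardware is not reading,
 * then flip the active index. Returns -1 if the list does not fit.
 */
static int rl_domain_update(struct rl_domain *d, const uint32_t *entries,
			    uint32_t count)
{
	struct rl_mem *staging;

	if (count > RL_MAX_ENTRIES) {
		return -1;
	}
	staging = &d->bufs[d->active ^ 1U];
	if (count > 0U) {
		memcpy(staging->entries, entries, count * sizeof(entries[0]));
	}
	staging->count = count;
	d->active ^= 1U;
	return 0;
}
```

With several domains, each one carrying two such buffers is what makes the contiguous memory cost grow, which is the optimization opportunity noted above.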
Jira NVGPU-6425
Change-Id: Id99c55f058ad56daa48b732240f05b3195debfb1
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2618386
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>