Commit Graph

22 Commits

Author SHA1 Message Date
Tejal Kudav
ab2b0b5949 gpu: nvgpu: Set unserviceable flag early during RC
During recovery, we set ch->unserviceable at the end, after we preempt
the TSG and reset the engines. That might be too late: user-space
might submit more work to the broken channel, which is not desirable.
Move setting this unserviceable flag right to the start
of the recovery sequence.
Another thread doing a submit can still read the unserviceable flag
just before it is set here, leaving that submit stuck if recovery
completes before the submit thread advances enough to set up a post
fence visible for other threads. This could be fixed with a big lock
or with a double check at the end of the submit code after the job
data has been made visible.
We still release the fences, semaphore and error notifier wait queues
at the end; so user-space would not trigger channel unbind while
channel is being recovered.
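
The ordering and the suggested double check can be sketched as follows. This is a minimal illustration, not the nvgpu code: struct demo_channel, demo_recover(), demo_submit() and the interleave hook are hypothetical names, and a single fence_pending flag stands in for the post fence and wait queues.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; not the nvgpu structure. */
struct demo_channel {
	atomic_bool unserviceable;  /* set as the first step of recovery */
	atomic_bool fence_pending;  /* a post fence is visible to other threads */
};

void demo_channel_init(struct demo_channel *ch)
{
	assert(ch != NULL);
	atomic_init(&ch->unserviceable, false);
	atomic_init(&ch->fence_pending, false);
}

/* Recovery: flag the channel first, release waiters last. */
void demo_recover(struct demo_channel *ch)
{
	atomic_store(&ch->unserviceable, true);
	/* ... preempt the TSG and reset the engines (elided) ... */
	atomic_store(&ch->fence_pending, false);
}

/*
 * Submit with a check at the start and a double check at the end.
 * "interleave" is a test hook (an assumption of this sketch) that runs
 * between the first check and the point where the fence becomes visible,
 * so the race can be reproduced deterministically.
 * Returns 0 on success, -1 if the channel is unserviceable.
 */
int demo_submit(struct demo_channel *ch,
		void (*interleave)(struct demo_channel *))
{
	if (atomic_load(&ch->unserviceable)) {
		return -1;
	}
	if (interleave != NULL) {
		interleave(ch);
	}
	/* The job data and its post fence become visible here. */
	atomic_store(&ch->fence_pending, true);

	/* Double check: recovery may have completed in the meantime. */
	if (atomic_load(&ch->unserviceable)) {
		atomic_store(&ch->fence_pending, false);
		return -1;
	}
	return 0;
}
```

Without the second check, a submit that passes the first check just before recovery runs to completion would leave fence_pending set with nobody left to release it.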

Also, change the handle_mmu_fault APIs to return void as the
debug_dump return value is not used in any of the caller APIs.

JIRA NVGPU-5843

Change-Id: Ib42c2816dd1dca542e4f630805411cab75fad90e
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385256
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Rajesh Devaraj
935c5f6578 gpu: nvgpu: fix misra violations in SDL
This patch addresses misra violations due to SDL error reporting
callbacks. In particular, it addresses the following misra violation:

- misra_c_2012_directive_4_7_violation: Calling function
  "nvgpu_report_*_err()" which returns error information without testing
  the error information.
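
For illustration, the pattern that Directive 4.7 flags and one common way to satisfy it might look like the sketch below. The names demo_report_err() and demo_handle_fault() are hypothetical; the message above does not say which remedy the patch applies (testing the result, or changing the callbacks so they no longer return error information).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical reporting callback that returns error information.
 * It fails for err_id 0 only so that both paths can be exercised.
 */
int demo_report_err(unsigned int err_id)
{
	return (err_id == 0U) ? -1 : 0;
}

/*
 * Flagged by Directive 4.7: the returned error information is dropped.
 *
 *     demo_report_err(err_id);
 *
 * Compliant variant: the caller tests the returned value.
 */
int demo_handle_fault(unsigned int err_id)
{
	int err = demo_report_err(err_id);

	if (err != 0) {
		fprintf(stderr, "failed to report error %u (err=%d)\n",
			err_id, err);
	}
	return err;
}
```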

JIRA NVGPU-4025

Change-Id: Ia10b6b3fd9c127a8c5189c3b6ba316f243cedf04
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2196895
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Debarshi Dutta
69ef86e627 gpu: nvgpu: move safe code HAL files to fusa
This patch moves all the safe static and non-static functions, as well
as their dependencies such as statically declared structs, into files
with the _fusa.c extension. If the original file is left with no
functions remaining, the file is deleted.

Added changes in Makefile, Makefile.sources, nvgpu-hal-new.yaml for
compilation.

Jira NVGPU-3690

Change-Id: I81af67c308705faf8a681df63a6778e7de2076cf
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2146761
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-03 02:46:15 -07:00
Rajesh Devaraj
29ec6ad40f gpu: nvgpu: report fb_flush_timeout error
This patch adds support for reporting the fb_flush_timeout error to 3LSS.
Specifically, it adds the following service-ID:
NVGUARD_SERVICE_IGPU_HOST_SWERR_PFIFO_FB_FLUSH_TIMEOUT_ERROR

JIRA NVGPU-3460
JIRA NVGPU-3461

Change-Id: Iddf978eedbc676197a19e47e72e08cd71c478a08
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2138051
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Raghuram Kothakota <rkothakota@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-19 22:51:20 -07:00
Rajesh Devaraj
fcb7635a92 gpu: nvgpu: gops initialization for SDL
This patch moves gops init related to SDL from qnx to common-core. For this
purpose, it makes the following changes:
- Adds stub functions for linux and posix.
- Updates nvgpu_init.c for mapping err_ops with report error APIs.
- Updates nvgpu_err.h header file to include prototypes related to error
  reporting APIs.
- Updates nvgpu-linux.yaml file to include sdl_stub file.
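
The arrangement described in the list above can be sketched roughly as follows: common init code maps an ops table onto the report APIs, and platforms without SDL support supply stub implementations. Every name here (struct demo_err_ops, demo_report_host_err(), demo_init_err_ops()) is hypothetical and does not reflect the real nvgpu_err.h prototypes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table for error reporting. */
struct demo_err_ops {
	int (*report_host_err)(unsigned int err_id);
};

/*
 * Stub for platforms without an SDL backend (the linux and posix
 * stubs in the list above play this role). It accepts the report
 * and does nothing.
 */
int demo_report_host_err(unsigned int err_id)
{
	(void)err_id;
	return 0;
}

/* Common init code: map the ops table onto the report API. */
void demo_init_err_ops(struct demo_err_ops *ops)
{
	assert(ops != NULL);
	ops->report_host_err = demo_report_host_err;
}
```

Because the common code only refers to the prototype, each platform can link its own definition of the report function without touching the init path.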

Jira NVGPU-3237

Change-Id: Idbdbe6f8437bf53504b29dc2d50214484ad18d6f
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2119681
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-30 02:18:05 -07:00
Alex Waterman
3f05901828 Revert "gpu: nvgpu: clear pbdma intr after recovery"
This reverts commit 6554696006.

Change-Id: Ifd86f0d75e309c3593b69cdd042e6cb49a1c53bc
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2125117
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
2019-05-24 13:32:04 -07:00
Peng Liu
6554696006 gpu: nvgpu: clear pbdma intr after recovery
The pbdma fault recovery function reads pbdma status info to retrieve
the channel id, tsg id and engine id. pbdma interrupts must be cleared
only after that information has been read. Otherwise, because the pbdma
exits the stall state, the channel/tsg/engine could have changed and the
fault recovery function would read information different from what was
present when the interrupt was issued.
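
A toy model of the ordering this message describes: snapshot the status first, clear the interrupt second. struct demo_pbdma and both functions are hypothetical; the "hardware" is simulated by having the clear replace the current channel id with the next one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a pbdma unit; not a real register layout. */
struct demo_pbdma {
	unsigned int intr;       /* non-zero while an interrupt is pending */
	unsigned int chid;       /* channel currently loaded on the pbdma */
	unsigned int next_chid;  /* channel loaded once the stall ends */
};

/* Clearing the interrupt ends the stall, so the status may change. */
void demo_clear_intr(struct demo_pbdma *p)
{
	p->intr = 0U;
	p->chid = p->next_chid;
}

/*
 * Read the faulting channel id before clearing the interrupt.
 * Returns the id that was loaded when the interrupt was raised.
 */
unsigned int demo_handle_fault(struct demo_pbdma *p)
{
	unsigned int faulted_chid;

	assert(p != NULL);
	faulted_chid = p->chid;  /* snapshot first */
	demo_clear_intr(p);      /* clear only afterwards */
	return faulted_chid;
}
```

Swapping the two statements in demo_handle_fault() would return next_chid, which is the mismatch the message warns about.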

Bug 2123866

Change-Id: Ia0e0462ae02ec89a333c81bd933a74fbae8ae1e7
Signed-off-by: Peng Liu <pengliu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2123774
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-24 10:05:42 -07:00
Seema Khowala
cfb4ff0bfb gpu: nvgpu: rename struct fifo_gk20a
Rename
struct fifo_gk20a -> nvgpu_fifo

JIRA NVGPU-2012

Change-Id: Ifb5854592c88894ecd830da092ada27c7f05380d
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2109625
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Adeel Raza <araza@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-03 16:25:43 -07:00
Seema Khowala
85fe940bed gpu: nvgpu: clean up unused header in fifo
Clean up unused headers in fifo module

JIRA NVGPU-2012

Change-Id: Iff4ad3e02a18167dd83904819d04a7eface56a3a
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2104400
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-25 12:55:21 -07:00
Seema Khowala
6ba1f5db3b gpu: nvgpu: move chip specific teardown_mask/unmask_intr
Move chip specific functions for teardown_mask_intr and
teardown_unmask_intr to hal/fifo/fifo_intr_[chip].[ch]

Renamed
teardown_mask_intr -> intr_set_recover_mask
teardown_unmask_intr -> intr_unset_recover_mask

JIRA NVGPU-1314

Change-Id: If233565cbdb09d77cfebd4346edcc3fe64584355
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2093980
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-18 15:55:53 -07:00
Seema Khowala
ca628dfd6e gpu: nvgpu: move engine functions to engines.c
Removed
fifo.runlist_busy_engines ops

Moved to engines.c and renamed
gk20a_fifo_get_failing_engine_data -> nvgpu_engine_find_busy_doing_ctxsw
gk20a_fifo_get_faulty_id_type -> nvgpu_engine_get_id_and_type
gk20a_fifo_runlist_busy_engines -> nvgpu_engine_get_runlist_busy_engines

JIRA NVGPU-1314

Change-Id: I89c81f331321d47a616a785082d66f9b4a51ff71
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2093788
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-18 15:55:24 -07:00
Seema Khowala
66cb9495a5 gpu: nvgpu: move mmu_fault_pending ops out from mm
Moved
-mmu_fault_pending mm ops to is_mmu_fault_pending mc ops
-mmu_fault_pending fb ops to is_mmu_fault_pending fb.intr ops. This
is needed to check if mmu fault intr is pending for volta onwards.

Added
is_mmu_fault_pending fifo ops. This is needed to check if mmu fault
interrupt is pending for chips prior to volta

JIRA NVGPU-1313

Change-Id: Ie8e778387cd486cb19b18c4aee734c581dcd9229
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2094895
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-11 22:25:01 -07:00
Rajesh Devaraj
5fd2175509 gpu: nvgpu: Enable the reporting of PFIFO errors
- Enable the reporting of PFIFO related errors such as engine syncpoint error,
  memop timeout error, lb error to 3LSS framework.
- Remove the reporting of bind_error from gk20a since we already report it
  from gv11b related fifo hal file.

Jira NVGPU-3087

Change-Id: Ic002be3a12a049010165870b861cdfb13a7f33d8
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2088579
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-07 22:44:06 -07:00
Seema Khowala
93fd6644f4 gpu: nvgpu: move mmu_fault hals to hal/fifo
Moved below hals from {chip}/fifo_{chip}.[ch] to hal/fifo

get_mmu_fault_info
get_mmu_fault_desc
get_mmu_fault_client_desc
get_mmu_fault_gpc_desc

Moved gk20a_fifo_handle_dropped_mmu_fault to hal/fifo

JIRA NVGPU-1313

Change-Id: I949bcd482156c6e381006387372f13770277e8c5
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2083287
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-03 13:35:33 -07:00
Debarshi Dutta
c48bfdd0d6 gpu: nvgpu: move gk20a_fifo_pbdma_fault_rc to common.rc unit
gk20a_fifo_pbdma_fault_rc is moved to common.rc unit and renamed to
nvgpu_rc_pbdma_fault.

The function is modified such that when the pbdma id is a channel,
recovery is issued only when the channel is part of a valid tsg.

Jira NVGPU-2950

Change-Id: I5e975cf79810479f83ffd50581c214a64d1619a6
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2083749
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-29 04:47:28 -07:00
Debarshi Dutta
b1ceb5c4d2 gpu: nvgpu: modify handle_pbdma_intr* functions
RC_TYPE_PBDMA_FAULT is the only recovery type for all the pbdma intr
functions. Thus, rc_type variable is changed to a boolean type
in all implementations of handle_pbdma_intr* functions.

"handled" variable is unused and removed from all the implementations of
handle_pbdma_intr* functions.

handle_pbdma_intr* HAL ops are renamed to handle_intr*.
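
As a rough sketch of the resulting shape: the handler returns a boolean saying whether recovery is needed, and the ISR decides whether to call the recovery routine. The mask value, struct demo_fifo and all function names are assumptions made for this illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mask of interrupt bits that require recovery. */
#define DEMO_PBDMA_INTR_FATAL_MASK 0xff00U

struct demo_fifo {
	unsigned int recoveries;  /* number of recoveries triggered */
};

/* Handler: returns true when the pending bits call for recovery. */
bool demo_handle_intr(unsigned int intr_status)
{
	return (intr_status & DEMO_PBDMA_INTR_FATAL_MASK) != 0U;
}

/* Stand-in for the pbdma fault recovery routine. */
void demo_pbdma_fault_rc(struct demo_fifo *f)
{
	f->recoveries++;
}

/* ISR: the boolean replaces the former rc_type value. */
void demo_pbdma_isr(struct demo_fifo *f, unsigned int intr_status)
{
	bool recover;

	assert(f != NULL);
	recover = demo_handle_intr(intr_status);
	if (recover) {
		demo_pbdma_fault_rc(f);
	}
}
```

Since only one recovery type exists for these handlers, a yes/no answer carries the same information as the enum did.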

Jira NVGPU-2950

Change-Id: I9605d930225a38ed76f25b6a94cb02d855f522dd
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2083748
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-29 04:47:19 -07:00
Debarshi Dutta
52cbc88a00 gpu: nvgpu: add pbdma intr_enable HAL ops.
A new HAL op, intr_enable(), is added to the
hal.fifo.pbdma unit. The implementation of this HAL op
is based on the gm20b and gv11b architectures.

Jira NVGPU-2950

Change-Id: Ifd9c3bfad4264449c52f411e8cad8674c3756048
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2073536
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-28 01:15:07 -07:00
Debarshi Dutta
ce5c43d24a gpu: nvgpu: re-org top level pbdma interrupt handler
fifo_pbdma_isr is moved to fifo_intr_gk20a HAL unit and renamed to
gk20a_fifo_pbdma_isr.

The pbdma specific handling part of the function
gk20a_fifo_handle_pbdma_intr is now separated into a top level HAL
function named handle_pbdma_intr. This HAL function is implemented
for GM20B and all the other architectures use the same implementation.
handle_pbdma_intr can accept NULL values for the parameters handled and
error_notifier.

gk20a_fifo_handle_pbdma_intr is called from
gv11b_fifo_poll_pbdma_chan_status and gk20a_fifo_pbdma_isr.
The call to gk20a_fifo_handle_pbdma_intr from
gv11b_fifo_poll_pbdma_chan_status doesn't progress to recovery.
Thus, the function gk20a_fifo_handle_pbdma_intr is removed to decouple
pbdma handling from recovery. gv11b_fifo_poll_pbdma_chan_status now
directly calls the HAL handle_pbdma_intr. For gk20a_fifo_pbdma_isr,
rc_type is used to proceed to recovery by calling
gk20a_fifo_pbdma_fault_rc.

gk20a_fifo_pbdma_fault_rc is changed to public from static.

Jira NVGPU-2950

Change-Id: I4f3597aca2317d4b745cd47bab9dd95c927160a9
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2073535
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-28 01:14:53 -07:00
Seshendra Gadagottu
b82f2075ae gpu: nvgpu: gr: basic falcon hal functions
Created the gr falcon hal unit by moving the following hal functions
from gr to gr falcon:
u32 (*fecs_base_addr)(void);
u32 (*gpccs_base_addr)(void);
void (*dump_stats)(struct gk20a *g);
u32 (*fecs_ctxsw_mailbox_size)(void);
u32 (*get_fecs_ctx_state_store_major_rev_id)(struct gk20a *g);

Modified chip hals to populate these new functions and related code
now refers to gr falcon hals.

Modified kernel headers to have the following defs for the
fecs/gpccs base address in gm20b/gp10b/gv11b/tu104:
static inline u32 gr_fecs_irqsset_r(void);
static inline u32 gr_gpcs_gpccs_irqsset_r(void);

Created base gm20b hals for fecs/gpccs_base_addr and
removed redundant gp106 related hals.

JIRA NVGPU-1881

Change-Id: I16e820cc1c89223f57988f1e5723fd8fdcbfe89d
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2081245
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-27 10:26:33 -07:00
Seema Khowala
dfafddcc21 gpu: nvgpu: move common and chip specific ctxsw timeout
Delete apply_ctxsw_timeout_intr ops and add
ctxsw_timeout_enable ops

Move chip specific sched_error and ctxsw_timeout
functions to hal/fifo/fifo_intr_* and hal/fifo/ctxsw_timeout_*

Add nvgpu_rc_ctxsw_timeout function under common/rc/rc.c

Do not check ctxsw timeout for channels that are no longer
bound to a tsg.

JIRA NVGPU-1312

Change-Id: Ide977fb60b3b72a27d9f22873f7a416c3bd1181d
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2075734
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-25 22:47:45 -07:00
Seema Khowala
fe2a599700 gpu: nvgpu: rename fifo_eng_timeout_us
Rename fifo_eng_timeout_us to ctxsw_timeout_period_ms for
clarity.

JIRA NVGPU-1312

Change-Id: I23faff3df7160c1193f797ac03769ef2ecf4449e
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2076776
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-25 22:47:09 -07:00
Seema Khowala
f66f3e1341 gpu: nvgpu: move fifo intr to hal/fifo
Removed intr_0_error_mask ops

Added below ops for fifo intr
intr_0_enable
intr_1_enable
intr_0_isr
intr_1_isr

JIRA NVGPU-1310

Change-Id: I19bd1a380a89cffd582d6c4a0b7796a46fec5afb
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2072144
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-03-25 11:03:39 -07:00