linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-22 17:36:20 +03:00

Author	SHA1	Message	Date
Tejal Kudav	ab2b0b5949	gpu: nvgpu: Set unserviceable flag early during RC During recovery, we set ch->unserviceable at the end after we preempt the TSG and reset the engines. It might be too late and user-space might submit more work to the broken channel which is not desirable. Move setting this unserviceable flag right at the start of recovery sequence. Another thread doing a submit can still read the unserviceable flag just before it is set here, leaving that submit stuck if recovery completes before the submit thread advances enough to set up a post fence visible for other threads. This could be fixed with a big lock or with a double check at the end of the submit code after the job data has been made visible. We still release the fences, semaphore and error notifier wait queues at the end; so user-space would not trigger channel unbind while channel is being recovered. Also, change the handle_mmu_fault APIs to return void as the debug_dump return value is not used in any of the caller APIs. JIRA NVGPU-5843 Change-Id: Ib42c2816dd1dca542e4f630805411cab75fad90e Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385256 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Rajesh Devaraj	935c5f6578	gpu: nvgpu: fix misra violations in SDL This patch addresses misra violations due to SDL error reporting callbacks. In particular, it addresses the following misra violation: - misra_c_2012_directive_4_7_violation: Calling function "nvgpu_report_*_err()" which returns error information without testing the error information. JIRA NVGPU-4025 Change-Id: Ia10b6b3fd9c127a8c5189c3b6ba316f243cedf04 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2196895 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Debarshi Dutta	69ef86e627	gpu: nvgpu: move safe code HAL files to fusa This patch moves all the safe static and non-static functions as well as its dependencies such as static declared structs into files with _fusa.c extension. If the original file is left with no functions remaining then the file is deleted. Added changes in Makefile, Makefile.sources, nvgpu-hal-new.yaml for compilation. Jira NVGPU-3690 Change-Id: I81af67c308705faf8a681df63a6778e7de2076cf Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2146761 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-07-03 02:46:15 -07:00
Rajesh Devaraj	29ec6ad40f	gpu: nvgpu: report fb_flush_timeout error This patch adds the support to report fb_flush_timeout error to 3LSS. Specifically, it adds the following service-ID: NVGUARD_SERVICE_IGPU_HOST_SWERR_PFIFO_FB_FLUSH_TIMEOUT_ERROR JIRA NVGPU-3460 JIRA NVGPU-3461 Change-Id: Iddf978eedbc676197a19e47e72e08cd71c478a08 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2138051 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Raghuram Kothakota <rkothakota@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-06-19 22:51:20 -07:00
Rajesh Devaraj	fcb7635a92	gpu: nvgpu: gops initialization for SDL This patch moves gops init related to SDL from qnx to common-core. For this purpose, it does the following changes: - Adds stub functions for linux and posix. - Updates nvgpu_init.c for mapping err_ops with report error APIs. - Updates nvgpu_err.h header file to include prototypes related to error reporting APIs. - Updates nvgpu-linux.yaml file to include sdl_stub file. Jira NVGPU-3237 Change-Id: Idbdbe6f8437bf53504b29dc2d50214484ad18d6f Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2119681 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-05-30 02:18:05 -07:00
Alex Waterman	3f05901828	Revert "gpu: nvgpu: clear pbdma intr after recovery" This reverts commit `6554696006`. Change-Id: Ifd86f0d75e309c3593b69cdd042e6cb49a1c53bc Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2125117 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>	2019-05-24 13:32:04 -07:00
Peng Liu	6554696006	gpu: nvgpu: clear pbdma intr after recovery pbdma fault recovery function reads pbdma status info to retrieve channel id, tsg id and engine id. pbdma interrupts can only be cleared after that information has been read otherwise because pbdma exits from stall state, channel/tsg/engine could have changed and fault recovery function reads information different from that when interrupt is issued. Bug 2123866 Change-Id: Ia0e0462ae02ec89a333c81bd933a74fbae8ae1e7 Signed-off-by: Peng Liu <pengliu@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2123774 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-05-24 10:05:42 -07:00
Seema Khowala	cfb4ff0bfb	gpu: nvgpu: rename struct fifo_gk20a Rename struct fifo_gk20a -> nvgpu_fifo JIRA NVGPU-2012 Change-Id: Ifb5854592c88894ecd830da092ada27c7f05380d Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2109625 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Adeel Raza <araza@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-05-03 16:25:43 -07:00
Seema Khowala	85fe940bed	gpu: nvgpu: clean up unused header in fifo Clean up unused headers in fifo module JIRA NVGPU-2012 Change-Id: Iff4ad3e02a18167dd83904819d04a7eface56a3a Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2104400 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-04-25 12:55:21 -07:00
Seema Khowala	6ba1f5db3b	gpu: nvgpu: move chip specific teardown_mask/unmask_intr Move chip specific functions for teardown_mask_intr and teardown_unmask_intr to hal/fifo/fifo_intr_[chip].[ch] Renamed teardown_mask_intr -> intr_set_recover_mask teardown_unmask_intr -> intr_unset_recover_mask JIRA NVGPU-1314 Change-Id: If233565cbdb09d77cfebd4346edcc3fe64584355 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2093980 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-04-18 15:55:53 -07:00
Seema Khowala	ca628dfd6e	gpu: nvgpu: move engine functions to engines.c Removed fifo.runlist_busy_engines ops Moved to engines.c and renamed gk20a_fifo_get_failing_engine_data -> nvgpu_engine_find_busy_doing_ctxsw gk20a_fifo_get_faulty_id_type -> nvgpu_engine_get_id_and_type gk20a_fifo_runlist_busy_engines -> nvgpu_engine_get_runlist_busy_engines JIRA NVGPU-1314 Change-Id: I89c81f331321d47a616a785082d66f9b4a51ff71 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2093788 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-04-18 15:55:24 -07:00
Seema Khowala	66cb9495a5	gpu: nvgpu: move mmu_fault_pending ops out from mm Moved -mmu_fault_pending mm ops to is_mmu_fault_pending mc ops -mmu_fault_pending fb ops to is_mmu_fault_pending fb.intr ops. This is needed to check if mmu fault intr is pending for volta onwards. Added is_mmu_fault_pending fifo ops. This is needed to check if mmu fault interrupt is pending for chips prior to volta JIRA NVGPU-1313 Change-Id: Ie8e778387cd486cb19b18c4aee734c581dcd9229 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2094895 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-04-11 22:25:01 -07:00
Rajesh Devaraj	5fd2175509	gpu: nvgpu: Enable the reporting of PFIFO errors - Enable the reporting of PFIFO related errors such as engine syncpoint error, memop timeout error, lb error to 3LSS framework. - Remove the reporting of bind_error from gk20a since we already report it from gv11b related fifo hal file. Jira NVGPU-3087 Change-Id: Ic002be3a12a049010165870b861cdfb13a7f33d8 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2088579 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-04-07 22:44:06 -07:00
Seema Khowala	93fd6644f4	gpu: nvgpu: move mmu_fault hals to hal/fifo Moved below hals from {chip}/fifo_{chip}.[ch] to hal/fifo get_mmu_fault_info get_mmu_fault_desc get_mmu_fault_client_desc get_mmu_fault_gpc_desc Moved gk20a_fifo_handle_dropped_mmu_fault to hal/fifo JIRA NVGPU-1313 Change-Id: I949bcd482156c6e381006387372f13770277e8c5 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2083287 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-04-03 13:35:33 -07:00
Debarshi Dutta	c48bfdd0d6	gpu: nvgpu: move gk20a_fifo_pbdma_fault_rc to common.rc unit gk20a_fifo_pbdma_fault_rc is moved to common.rc unit and renamed to nvgpu_rc_pbdma_fault. The function is modified such that when the pbdma id is a channel, recovery is issued only when the channel is part of a valid tsg. Jira NVGPU-2950 Change-Id: I5e975cf79810479f83ffd50581c214a64d1619a6 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2083749 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-03-29 04:47:28 -07:00
Debarshi Dutta	b1ceb5c4d2	gpu: nvgpu: modify handle_pbdma_intr* functions RC_TYPE_PBDMA_FAULT is the only recovery type for all the pbdma intr functions. Thus, rc_type variable is changed to a boolean type in all implementations of handle_pbdma_intr* functions. "handled" variable is unused and removed from all the implementations of handle_pbdma_intr* functions. handle_pbdma_intr* HAL ops are renamed to handle_intr*. Jira NVGPU-2950 Change-Id: I9605d930225a38ed76f25b6a94cb02d855f522dd Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2083748 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-03-29 04:47:19 -07:00
Debarshi Dutta	52cbc88a00	gpu: nvgpu: add pbdma intr_enable HAL ops. A new HAL ops intr_enable() is constructed in hal.fifo.pbdma unit. The implementation for this HAL ops is based on gm20b and gv11b architectures. Jira NVGPU-2950 Change-Id: Ifd9c3bfad4264449c52f411e8cad8674c3756048 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2073536 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-03-28 01:15:07 -07:00
Debarshi Dutta	ce5c43d24a	gpu: nvgpu: re-org top level pbdma interrupt handler fifo_pbdma_isr is moved to fifo_intr_gk20a HAL unit and renamed to gk20a_fifo_pbdma_isr. The pbdma specific handling part of the function gk20a_fifo_handle_pbdma_intr is now separated into a top level HAL function named handle_pbdma_intr. This HAL function is implemented for GM20B and all the other architectures use the same implementation. handle_pbdma_intr can accept NULL values for the parameters handled and error_notifier. gk20a_fifo_handle_pbdma_intr is called from gv11b_fifo_poll_pbdma_chan_status and gk20a_fifo_pbdma_isr. The call to gk20a_fifo_handle_pbdma_intr from gv11b_fifo_poll_pbdma_chan_status doesn't progress to recovery. Thus, the function gk20a_fifo_handle_pbdma_intr is removed to decouple pbdma handling from recovery. gv11b_fifo_poll_pbdma_chan_status now directly calls the HAL handle_pbdma_intr. For gk20a_fifo_pbdma_isr, rc_type is used to proceed to recovery by calling gk20a_fifo_pbdma_fault_rc. gk20a_fifo_pbdma_fault_rc is changed to public from static. Jira NVGPU-2950 Change-Id: I4f3597aca2317d4b745cd47bab9dd95c927160a9 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2073535 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-03-28 01:14:53 -07:00
Seshendra Gadagottu	b82f2075ae	gpu: nvgpu: gr: basic falcon hal functions Created gr falcon hal unit with moving following hal functions from gr to gr falcon: u32 (*fecs_base_addr)(void); u32 (*gpccs_base_addr)(void); void (*dump_stats)(struct gk20a *g); u32 (*fecs_ctxsw_mailbox_size)(void); u32 (*get_fecs_ctx_state_store_major_rev_id)(struct gk20a *g); Modified chip hals to populate these new functions and related code now refers to gr falcon hals. Modified kernel headers to have following defs for fecs/gpccs base address in gm20b/gp10b/gv11b/tu104: static inline u32 gr_fecs_irqsset_r(void); static inline u32 gr_gpcs_gpccs_irqsset_r(void); Created base gm20b hals for fecs/gpccs_base_addr and removed redundant gp106 related hals. JIRA NVGPU-1881 Change-Id: I16e820cc1c89223f57988f1e5723fd8fdcbfe89d Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2081245 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-03-27 10:26:33 -07:00
Seema Khowala	dfafddcc21	gpu: nvgpu: move common and chip specific ctxsw timeout Delete apply_ctxsw_timeout_intr ops and add ctxsw_timeout_enable ops Move chip specific sched_error and ctxsw_timeout functions to hal/fifo/fifo_intr_* and hal/fifo/ctxsw_timeout_* Add nvgpu_rc_ctxsw_timeout function under common/rc/rc.c Do not check ctxsw timeout for channels that are no more bound to tsg. JIRA NVGPU-1312 Change-Id: Ide977fb60b3b72a27d9f22873f7a416c3bd1181d Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2075734 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-03-25 22:47:45 -07:00
Seema Khowala	fe2a599700	gpu: nvgpu: rename fifo_eng_timeout_us Rename fifo_eng_timeout_us to ctxsw_timeout_period_ms for clarity. JIRA NVGPU-1312 Change-Id: I23faff3df7160c1193f797ac03769ef2ecf4449e Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2076776 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-03-25 22:47:09 -07:00
Seema Khowala	f66f3e1341	gpu: nvgpu: move fifo intr to hal/fifo Removed intr_0_error_mask ops Added below ops for fifo intr intr_0_enable intr_1_enable intr_0_isr intr_1_isr JIRA NVGPU-1310 Change-Id: I19bd1a380a89cffd582d6c4a0b7796a46fec5afb Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2072144 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-03-25 11:03:39 -07:00

22 Commits
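
Note on ab2b0b5949: the message describes a window in which a submit thread reads the unserviceable flag just before recovery sets it, and names "a double check at the end of the submit code after the job data has been made visible" as one possible fix. The following is only a minimal sketch of that idea in C11 atomics. It is not nvgpu code: struct sketch_channel, every sketch_* function and the race_hook parameter are hypothetical names made up for illustration, and a plain counter stands in for the job list and post fences.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; NOT the real nvgpu structure. */
struct sketch_channel {
	atomic_bool unserviceable; /* set at the very start of recovery */
	atomic_int pending_jobs;   /* jobs already visible to other threads */
};

enum sketch_submit_result {
	SKETCH_SUBMIT_OK,
	SKETCH_SUBMIT_REJECTED,  /* flag seen before anything was queued */
	SKETCH_SUBMIT_CANCELLED, /* flag seen by the double check */
};

static void sketch_channel_init(struct sketch_channel *ch)
{
	atomic_init(&ch->unserviceable, false);
	atomic_init(&ch->pending_jobs, 0);
}

/* Start of recovery: mark the channel broken before any teardown work. */
static void sketch_recovery_begin(struct sketch_channel *ch)
{
	assert(ch != NULL);
	atomic_store(&ch->unserviceable, true);
}

/* End of recovery: release every job visible so far; returns the count. */
static int sketch_recovery_end(struct sketch_channel *ch)
{
	return atomic_exchange(&ch->pending_jobs, 0);
}

/* Helper that runs a whole recovery pass; used to simulate the race. */
static void sketch_full_recovery(struct sketch_channel *ch)
{
	sketch_recovery_begin(ch);
	(void)sketch_recovery_end(ch);
}

/*
 * Submit with a double check. race_hook is a hypothetical test hook that
 * runs between the first check and the publish step, so the problematic
 * interleaving can be reproduced deterministically. Pass NULL normally.
 */
static enum sketch_submit_result
sketch_submit(struct sketch_channel *ch,
	      void (*race_hook)(struct sketch_channel *))
{
	/* First check: refuse new work on a channel already marked broken. */
	if (atomic_load(&ch->unserviceable)) {
		return SKETCH_SUBMIT_REJECTED;
	}

	if (race_hook != NULL) {
		race_hook(ch);
	}

	/* Publish the job so that other threads (and recovery) can see it. */
	atomic_fetch_add(&ch->pending_jobs, 1);

	/*
	 * Double check: if recovery started meanwhile, its release pass may
	 * already have run and missed this job, so release one job here
	 * unless recovery has emptied the counter in the meantime.
	 */
	if (atomic_load(&ch->unserviceable)) {
		int n = atomic_load(&ch->pending_jobs);

		while (n > 0 && !atomic_compare_exchange_weak(
					&ch->pending_jobs, &n, n - 1)) {
			/* n was reloaded by the failed exchange; retry. */
		}
		return SKETCH_SUBMIT_CANCELLED;
	}

	return SKETCH_SUBMIT_OK;
}
```

The sketch relies on the default sequentially consistent ordering: the submit side publishes and then reads the flag, while the recovery side sets the flag and then drains, so at least one side observes the other. Calling sketch_submit(&ch, sketch_full_recovery) reproduces the case from the commit message where recovery completes before the submit thread has made its job visible.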