linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-24 10:34:43 +03:00

Author	SHA1	Message	Date
Rajesh Devaraj	7138e7e673	gpu: nvgpu: export function definitions across chips To avoid duplication of same code across multiple chips, export the following functions through the corresponding headers for the consumption of other GPU enabling functions: - ga10b_gr_intr_report_tpc_sm_rams_ecc_err - gv11b_gr_intr_report_l1_tag_uncorrected_err - gv11b_gr_intr_report_l1_tag_corrected_err - gv11b_gr_intr_report_icache_uncorrected_err JIRA NVGPU-9075 Change-Id: I927285b6e638479ac52cd5d214711e490e5f151e Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2798371 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-10-28 15:30:20 -07:00
Sagar Kamble	e96740c59a	gpu: nvgpu: change ecc error counters increment to wrapping type Usage of nvgpu_safe_add_u32 to increment nvgpu maintained corrected ecc error counters can lead to BUG due to overflow as corrected ecc errors can keep coming in and system will continue to operate. In some configurations, uncorrected error counters can also overflow and lead to BUG. Increment these counters and their delta calculations to use nvgpu_wrapping_add_u32. JIRA NVGPU-7054 Change-Id: I85ddddfa46062744cccbe0756ad942787e72f01b Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2601152 (cherry picked from commit f016e59189d2bd66e23f17ccb638f6d384b82fbd) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623638 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-17 08:41:07 -07:00
Richard Zhao	1ce899ce46	gpu: nvgpu: fix compile error of new compile flags Preparing to push hvrtos gpu server changes which requires bellow CFLAGS: -Werror -Wall -Wextra \ -Wmissing-braces -Wpointer-arith -Wundef \ -Wconversion -Wsign-conversion \ -Wformat-security \ -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough Jira GVSCI-11640 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: I25167f17f231ed741f19af87ca0aa72991563a0f Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2653746 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-07 15:11:49 -07:00
Tejal Kudav	dae284c74b	gpu: nvgpu: Disable GR functional intrs on safety Disable below interrupts on safety as they do not report any error condition and are not used by CUDA and Graphics(VKSC) on safety build. Signoff from CUDA and VKSC is on Bug https://nvbugs/3588603 1. NV_PGRAPH_INTR_NOTIFY: This intr is set when the Notification style is WRITE_THEN_AWAKEN. 2. NV_PGRAPH_INTR_SEMAPHORE: This is set when a 3d class sempahore is released as the result ofa SetSemaphoreD method, when the AwakenEnable field is TRUE. 3. NV_PGRAPH_INTR_BUFFER_NOTIFY: This bit is set when a Mem2mem DMA completes and the LaunchDma method specifies the interrupt type as INTERRUPT 4. NV_PGRAPH_INTR_DEBUG_METHODS: This is debug feature and not used on QNX safety Bug 3588603 JIRA NVGPU-8166 Change-Id: I6d07dfd2857ac047fac4599421600d364251df76 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2694363 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-13 02:35:35 -07:00
Antony Clince Alex	83fe3fd35e	gpu: nvgpu: add errata NVGPU_ERRATA_3524791 Update PES, ROP exception handling for NVGPU_ERRATA_3524791. Enable the errata for all Volta+ chips. ROP, PES exceptions are being reported using the physical-id, where logical-id should have been used. All ESR status registers are reported using logical-id, so this matches with the SW expectation. To address the (1), update ROP, PES exception handler translate from physical to logical-id before reading the status registers. Bug 3524791 Change-Id: Ieacbfb306bb0e69cf0113dc92f18e401573722e3 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2680029 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-13 02:32:30 -07:00
Antony Clince Alex	32a9c6923c	gpu: nvgpu: gv11b: udpate PES exception handling At present, the driver only report/handle exceptions from PES0, however, Volta+ chips have 2 PES units within a GPC. Update the PES exception handler to report/handle exceptions from both PES0,1 units. Bug 3524791 Change-Id: I71ac75cc1abe492b7aa781d8d16077f4da3a997b Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2679931 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Seema Khowala <seemaj@nvidia.com>	2022-04-13 02:32:19 -07:00
Rajesh Devaraj	43ba356132	gpu: nvgpu: fix typo in error id SM_LRF_ECC_UNCORRECTED error is incorrectly reported using the error ID of SM_CBU_ECC_UNCORRECTED error. This patch fixes this typo issue. Bug 3366818 Change-Id: I9c274be45776711ab9c70ef66a75dc23afa276a6 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2688984 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-29 10:12:17 -07:00
Rajesh Devaraj	37c6b8b1c3	gpu: nvgpu: update reporting of errors to sdl In Drive 6.0, the error reporting is supported only for orin (ga10b) in dev-main. For this purpose, this patch does the following: - Removes the redundant reporting of following IDs from gv11b: - GPU_HOST_PFIFO_SCHED_ERROR - GPU_HOST_PFIFO_CTXSW_TIMEOUT_ERROR - GPU_HOST_PBDMA_HCE_ERROR - GPU_MMU_L1TLB_SA_DATA_ECC_UNCORRECTED - GPU_MMU_L1TLB_FA_DATA_ECC_UNCORRECTED - GPU_LTC_CACHE_DSTG_ECC_CORRECTED - GPU_LTC_CACHE_TSTG_ECC_UNCORRECTED - Migrates the reporting of following IDs from gv11b to ga10b: - GPU_SM_L1_TAG_ECC_CORRECTED - GPU_SM_L1_TAG_ECC_UNCORRECTED - GPU_SM_CBU_ECC_UNCORRECTED - GPU_SM_LRF_ECC_UNCORRECTED - GPU_SM_L1_DATA_ECC_UNCORRECTED - GPU_SM_ICACHE_L1_DATA_ECC_UNCORRECTED - GPU_SM_ICACHE_L0_PREDECODE_ECC_UNCORRECTED - GPU_SM_L1_TAG_MISS_FIFO_ECC_UNCORRECTED - GPU_SM_L1_TAG_S2R_PIXPRF_ECC_UNCORRECTED - Removes the unused ID that doesn't have any HSI related to it: - GPU_HOST_PBDMA_PREEMPT_ERROR In addition to the above, this patch does the following: - Updates error IDs related to page fault error. - Updates look-up table to remove unused error IDs. JIRA NVGPU-8094 Bug 200729736 Change-Id: Ifea76d38ba609c894560e61ff5a6e406290f919e Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2685249 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-23 21:02:15 -07:00
Rajesh Devaraj	c5822b0d98	gpu: nvgpu: add error prints for errors reported to sdl In Drive 6.0, only error IDs are reported to Safety_Services. The additional debug/error information is printed using nvgpu_err(). JIRA NVGPU-8094 Bug 3491596 Change-Id: Ie90f3e1453e6a796d5c76373c11f8a5a188ac590 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2684289 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-22 17:55:10 -07:00
Rajesh Devaraj	329807b8f9	gpu: nvgpu: update error ids for pgraph This patch updates PGRAPH related error IDs for ga10b. Since sub error type is not supported in Safety_Services 6.0, dedicated error IDs have been allocated for all sub-errors in PGRAPH. JIRA NVGPU-8094 Change-Id: Ic8de5815c5ea63e290d11ffca598e58812573603 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678289 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-09 04:42:36 -08:00
Tejal Kudav	9b7c8cdd8c	gpu: nvgpu: Update GR intr code as per Orin HSIs Most SM RAMs are protected with parity (except L1 D-cache TAG mem which is protected with SEC-DED ECC). The memory corruption errors reported by these RAMs are therefore uncorrected errors only. Remove the code to handle corrected errors from GR SM ECC. The SM RAMS ECC errors currently report error to SDL using ID GPU_SM_L1_TAG_ECC_(UN)CORRECTED. Update the error reporting to use the newly created error IDs for Drive 6.0. JIRA NVGPU-7987 Change-Id: Ic426d45f851d87aafaa7963b937535582cdafadf Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2674389 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-08 11:42:32 -08:00
Rajesh Devaraj	0699220b85	gpu: nvgpu: compile-out unused apis from safety build This patch does the following changes: - Compiles-out unused error reporting APIs and the related data structures from safety build. For this purpose, it introduces the new flag: CONFIG_NVGPU_INTR_DEBUG - Updates nvgpu_report_err_to_sdl() API with one more argument, hw_unit_id. This aids in finding whether an error to be reported is corrected or uncorrected from LUT. - Triggers SW quiesce, if an uncorrected error is reported to Safety_Services, in safety build. - Renames files in cic folder by replacing gv11b with ga10b, since error reporting for gv11b is not supported in dev-main. JIRA NVGPU-8002 Change-Id: Ic01e73b0208252abba1f615a2c98d770cdf41ca4 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2668466 Reviewed-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-02-14 22:00:33 -08:00
Rajesh Devaraj	7dc013d242	gpu: nvgpu: merge error reporting apis In DRIVE 6.0, NvGPU is allowed to report only 32-bit metadata to Safety_Services. So, there is no need to have distinct APIs for reporting errors from units like GR, MM, FIFO to SDL unit. All these error reporting APIs will be replaced with a single API. To meet this objective, this patch does the following changes: - Replaces nvgpu_report__err with nvgpu_report_err_to_sdl. - Removes the reporting of error messages. - Replaces nvgpu_log() with nvgpu_err(), for error reporting. - Removes error reporting to Safety_Services from nvgpu_report__err. However, nvgpu_report_*_err APIs and their related files are not removed. During the creation of nvgpu-mon, they will be moved under nvgpu-rm, in debug builds. Note: - There will be a follow-up patch to fix error IDs. - As discussed in https://nvbugs/3491596 (comment #12), the high level expectation is to report only errors. JIRA NVGPU-7450 Change-Id: I428f2a9043086462754ac36a15edf6094985316f Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662590 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-02-09 00:41:18 -08:00
Richard Zhao	e81a36e56a	gpu: nvgpu: hal: fix compile error of new compile flags It's preparing to add bellow CFLAGS: -Werror -Wall -Wextra \ -Wmissing-braces -Wpointer-arith -Wundef \ -Wconversion -Wsign-conversion \ -Wformat-security \ -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough Jira GVSCI-11640 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: Ia16ef186da1e97badff9dd0bf8cbd6700dd77b15 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555057 Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: Aparna Das <aparnad@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-01-13 12:36:19 -08:00
Deepak Nibade	3d9c67a0e7	gpu: nvgpu: enable Orin support in safety build Most of the Orin chip specific code is compiled out of safety build with CONFIG_NVGPU_NON_FUSA and CONFIG_NVGPU_HAL_NON_FUSA. Remove the config protection from Orin/GA10B specific code. Currently all code is enabled. Code not required in safety will be compiled out later in separate activity. Other noteworthy changes in this patch related to safety build: - In ga10b_ce_request_idle(), add a log print to dump num_pce so that compiler does not complain about unused variable num_pce. - In ga10b_fifo_ctxsw_timeout_isr(), protect variables active_eng_id and recover under CONFIG_NVGPU_KERNEL_MODE_SUBMIT to fix compilation errors of unused variables. - Compile out HAL gops.pbdma.force_ce_split() from safety since this HAL is GA100 specific and not required for GA10B. - Compile out gr_ga100_process_context_buffer_priv_segment() with CONFIG_NVGPU_DEBUGGER. - Compile out VAB support with CONFIG_NVGPU_HAL_NON_FUSA. - In ga10b_gr_intr_handle_sw_method(), protect left_shift_by_2 variable with appropriate configs to fix unused variable compilation error. - In ga10b_intr_isr_stall_host2soc_3(), compile ELPG function calls with CONFIG_NVGPU_POWER_PG. - In ga10b_pmu_handle_swgen1_irq(), move whole function body under CONFIG_NVGPU_FALCON_DEBUG to fix unused variable compilation errors. - Add below TU104 specific files in safety build since some of the code in those files is required for GA10B. Unnecessary code will be compiled out later on. hal/gr/init/gr_init_tu104.c hal/class/class_tu104.c hal/mc/mc_tu104.c hal/fifo/usermode_tu104.c hal/gr/falcon/gr_falcon_tu104.c - Compile out GA10B specific debugger/profiler related files from safety build. - Disable CONFIG_NVGPU_FALCON_DEBUG from safety debug build temporarily to work around compilation errors seen with keeping this config enabled. Config will be re-enabled in safety debug build later. Jira NVGPU-7276 Change-Id: I35f2489830ac083d52504ca411c3f1d96e72fc48 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2627048 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-11-26 08:46:47 -08:00
Seshendra Gadagottu	c2901b6835	gpu: nvgpu: correct debug messages for fecs ecc errors Following error message is getting printed even when there are no fecs ecc errors: nvgpu: 17000000.ga10x gv11b_gr_intr_handle_fecs_ecc_error:114 [ERR] error count corrected: 0, uncorrected 0 To avoid confusion, print error messages only when fecs errors are reported. Bug 3417834 Change-Id: I96317555b11e1976f33add4b1dc8d84c936c26fb Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2625723 Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-11-17 17:31:52 -08:00
Deepak Nibade	5d51872620	Revert "gpu: nvgpu: support SW methods in safety temporarily" This reverts commit `e0db40c3a5`. CUDA change to stop using SW methods in safety is integrated and this temporary patch can be reverted now. Bug 200748548 Change-Id: Ibdfd42b1fbfbfdf24455426e1b8001ad8b6218d5 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623433 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-11-12 02:54:20 -08:00
Debarshi Dutta	fdc967d6a2	gpu: nvgpu: change macros to inline functions The macros defined within the C file in the form (\ fb_mmu_l2tlb_ecc_status_corrected_err_l2tlb_sa_data_m() \|\ fb_mmu_l2tlb_ecc_status_corrected_err_l2tlb1_sa_data_m() \ ) are difficult to detect correctly in libclang based static analyzers. As a consequence, Hal Checker might be missing some coverage. Such masks are converted into a static function format to help mitigate this issue. Change-Id: Id43e25abda8db4c79f7f6fc604eb6e76e9f6282c Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2598063 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-10-28 12:05:08 -07:00
Deepak Nibade	e0db40c3a5	gpu: nvgpu: support SW methods in safety temporarily Removing SW methods completely requires CUDA changes in perforce. Because of stage-main and module codelines infrastructure this cannot be done in single series. Hence keep alive couple of SW methods for CUDA compatibility until SW methods are removed from CUDA code. SW methods do not perform any action. Next steps are for CUDA to stop using SW methods. Once that is integrated this patch can be reverted to completely remove SW method support in safety build. Bug 200748548 Change-Id: I2c70a647bec126bab2d8cac4fe10393be8ba0fb9 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2604749 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-10-04 18:04:01 -07:00
Deepak Nibade	d1f3f81553	gpu: nvgpu: remove SW methods from safety build Improved SDL heartbeat mechanism detects the interrupts triggered by SW method and treats them as errors. Hence remove the SW method support completely from safety build. Registers set by SW methods are now set by default for all the contexts. Implement new HAL gops.gr.init.set_default_compute_regs() to set the registers in patch context. Call this HAL while creating each context. Update gv11b_gr_intr_handle_sw_method() to treat all compute SW methods as invalid. Update unit test test_gr_intr_sw_exceptions() so that it now expects failure for any method/data. Bug 200748548 Change-Id: I614f6411bbe7000c22f1891bbaf06982e8bd7f0b Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2527249 (cherry picked from commit bb6e0f9aa1404f79bcfbdd308b8c174a4fc83250) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2602638 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-10-04 18:03:55 -07:00
Deepak Nibade	af989f6212	gpu: nvgpu: fix misra rule 13.2 violations in common.gr unit Fix MISRA rule 13.2 violations of below type from common.gr unit: nvgpu/drivers/gpu/nvgpu/common/gr/gr_intr.c:108 Type: MISRA C-2012 Side Effects (MISRA C-2012 Rule 13.2, Required) nvgpu/drivers/gpu/nvgpu/common/gr/gr_intr.c:108: 1. misra_c_2012_rule_13_2_violation: In "nvgpu_safe_add_u32(nvgpu_gr_gpc_offset(g, gpc), nvgpu_gr_tpc_offset(g, tpc))", there are 2 function calls in the arguments for which the order of evaluation is undefined. Jira NVGPU-7127 Change-Id: Ie867fb62098eed3a45ec01b941eda93b94220b4b Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2598696 (cherry picked from commit 15483df6ca1017e5b9d6f2dff35f7e57094a2b4d) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2601976 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-09-29 15:14:34 -07:00
Divya Singhatwaria	9984b59a00	gpu: nvgpu: Add ELPG protected call for GR and CE intr - Accessing any PGRAPH registers in GR intr retrigger ISR routine when ELPG is engaged causes idle snap. - This idle snap is caught when nvgpu_submit_illegal_class test is run. - To avoid access to PGRAPH registers when ELPG is engaged add elpg protected call for GR intr retrigger and CE ISR and retrigger HALs Bug 200777033 Change-Id: Ieef4a423faf79f09476d696c3078b113750548bb Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586449 Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-24 07:11:38 -07:00
Debarshi Dutta	791dc18666	gpu: nvgpu: bvec for struct nvgpu_tsg_sm_error_state fields Add Setter and Getter methods for accessing tsg->sm_error_states. Getter returns a constant pointer for struct nvgpu_tsg_sm_error_state. This renders it unnecessary to add BVEC for above fields for the struct in multiple locations. The current design ensures that only a constant pointer is obtained from the owner unit i.e. FIFO. The following new methods are added. Both unit tests and BVEC tests are added for them as well. nvgpu_tsg_store_sm_error_state nvgpu_tsg_get_sm_error_state Jira NVGPU-6947 Change-Id: I82c22a2774862c8579baa41b6fb8292fa164704a Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry picked from commit 79574638671a0c6efe41cd3423668fcd1bd96826) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556938 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-09-13 20:57:09 -07:00
Tejal Kudav	524d222149	gpu: nvgpu: Print error msg on GR ECC errors Currently, the ECC errors go unnoticed. ECC errors in GR are handled by incrementing the ecc counters and clearing the intr registers. No action is taken bassed on the ECC counters. The ECC intr prints are also not printed by default. Ideally, the ECC errors should trigger recovery if they cross the error threshold. While we wait for the ideal ECC handling, convert the conditional ECC prints to error messages. JIRA NVGPU-7058 Change-Id: I2e54aeb5aef1d76dd6d4162eedd21cd529089b54 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573029 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-08-09 10:04:34 -07:00
Deepak Nibade	4edf952e3e	gpu: nvgpu: fix rule 5.1 misra violations in common.gr Fix rule 5.1 misra violations in common.gr by renaming below functions : nvgpu_gr_config_get_gpc_tpc_mask_base -> nvgpu_gr_config_get_base_mask_gpc_tpc nvgpu_gr_config_get_gpc_tpc_count_base -> nvgpu_gr_config_get_base_count_gpc_tpc gm20b_ctxsw_prog_set_priv_access_map_config_mode -> gm20b_ctxsw_prog_set_config_mode_priv_access_map gm20b_ctxsw_prog_set_priv_access_map_addr -> gm20b_ctxsw_prog_set_addr_priv_access_map gm20b_gr_falcon_read_fecs_ctxsw_mailbox -> gm20b_gr_falcon_read_mailbox_fecs_ctxsw gm20b_gr_falcon_read_fecs_ctxsw_status0 -> gm20b_gr_falcon_read_status0_fecs_ctxsw gm20b_gr_falcon_read_fecs_ctxsw_status1 -> gm20b_gr_falcon_read_status1_fecs_ctxsw gv11b_gr_intr_get_sm_hww_warp_esr_pc -> gv11b_gr_intr_get_warp_esr_pc_sm_hww gv11b_gr_intr_get_sm_hww_warp_esr -> gv11b_gr_intr_get_warp_esr_sm_hww Jira NVGPU-6779 Change-Id: Icbe23a7b022373785968fc417ee247e2d80cfcc6 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554521 (cherry picked from commit 1432650774506f2a7e45f70b084f498736d0d0c5) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555330 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-13 09:20:41 -07:00
Lakshmanan M	e9872a0d91	gpu: nvgpu: Skip graphics unit access when MIG is enabled This CL covers the following modifications, 1) Added logic to skip the graphics unit specific sw context load register write during context creation when MIG is enabled. 2) Added logic to skip the graphics unit specific sw method register write when MIG is enabled. 3) Added logic to skip the graphics unit specific slcg and blcg gr register write when MIG is enabled. 4) Fixed some priv errors observed during MIG boot. 5) Added MIG Physical support for GPU count < 1. 6) Host clk register access is not allowed for GA100. So skipped to access host clk register. 7) Added utiliy api - nvgpu_gr_exec_with_ret_for_all_instances() 8) Added gr_pri_mme_shadow_ram_index_nvclass_v() reg field to identify the sw method class number. Bug 200649233 Change-Id: Ie434226f007ee5df75a506fedeeb10c3d6e227a3 Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2549811 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-02 16:41:51 -07:00
tkudav	0526e7eaa9	gpu: nvgpu: Create CIC-mon and CIC-rm subunits common.cic unit is divided into common.cic.mon and common.cic.rm based on rm and mon process split. CIC-mon subunit includes the code which is utilized in critical interrupt handling path like initialization, error detection and error reporting path. CIC-rm subunit includes the code corresponding to rest of interrupt handling(like collecting error debug data from registers) and ISR status management (status of deferred interrupts). Split the CIC APIs and data-members into above two subunits. JIRA NVGPU-6899 Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-02 09:57:56 -07:00
Antony Clince Alex	f9cac0c64d	gpu: nvgpu: remove nvgpu_next files Remove all nvgpu_next files and move the code into corresponding nvgpu files. Merge nvgpu-next-*.yaml into nvgpu-.yaml files. Jira NVGPU-4771 Change-Id: I595311be3c7bbb4f6314811e68712ff01763801e Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2547557 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-06-27 05:02:53 -07:00
Sagar Kadamati	3e43f92f21	gpu: nvgpu: add ga10b & ga100 sources Mass copy ga10b & ga100 sources from nvgpu-next repo. TOP COMMIT-ID: 98f530e6924c844a1bf46816933a7fe015f3cce1 Jira NVGPU-4771 Change-Id: Ibf7102e9208133f8ef3bd3a98381138d5396d831 Signed-off-by: Sagar Kadamati <skadamati@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2524817 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-06-17 12:56:16 -07:00
Deepak Nibade	c4dee40c49	gpu: nvgpu: fix MISRA violations in common.gr 1. misra_c_2012_rule_8_6_violation: "gm20b_gr_init_fe_go_idle_timeout" is declared but never defined. Fix by adding config CONFIG_NVGPU_HAL_NON_FUSA for header declaration of "gm20b_gr_init_fe_go_idle_timeout" 2. misra_c_2012_rule_5_7_violation: Identifier "ops" is already used to represent a type. Fix by renaming local variable ops to nonstall_ops in gm20b_gr_intr_nonstall_isr() 3. missing_default: No default case found for the switch statement "switch (offset << 2)" Fix by adding break and default statements to switch case in gv11b_gr_intr_handle_sw_method() Jira NVGPU-6779 Change-Id: I8df097ec66479edcd2e81bf46bab5b5db52ac8c8 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2541246 (cherry picked from commit c4d9fe0449f8c6ee209051abfe58c6f3a745808d) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2543012 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-06-11 18:04:33 -07:00
Tejal Kudav	9f43914933	gpu: nvgpu: Move Intr handling common code to CIC CIC (Central Interrupt controller) will be responsible for the interrupt handling. common.cic unit is the placeholder for all interrupt related code. Move interrupt related defines and Public APIs present in common.mc to common.cic. Note: The common.mc interrupts related struct definitions are not moved as part of this patch. Adapt the code to use interrupt handling related defines and public APIs migrated from common.mc to common.cic JIRA NVGPU-6899 Change-Id: I747e2b556c0dd66d58d74ee5bb36768b9370d276 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2535618 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-05-31 19:37:31 -07:00
Vedashree Vidwans	86cb03d2f1	gpu: nvgpu: Replace WAR keyword with "fix" Replace/remove "WAR" keyword in the comments in nvgpu driver with "fix". Rename below functions and corresponding gops to replace "war" word with "errata" word: - g.pdb_cache_war_mem - ramin.init_pdb_cache_war - ramin.deinit_pdb_cache_war - tu104_ramin_init_pdb_cache_war - tu104_ramin_deinit_pdb_cache_war - fb.apply_pdb_cache_war - tu104_fb_apply_pdb_cache_war - nvgpu_init_mm_pdb_cache_war - nvlink.set_sw_war - gv100_nvlink_set_sw_war Jira NVGPU-6680 Change-Id: Ieaad2441fac87e4544eddbca3624b82076b2ee73 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2515700 Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-04-28 19:14:49 -07:00
Vedashree Vidwans	aba26fa082	gpu: nvgpu: handle chip specific erratas Currently, there are few chip specific erratas present in nvgpu code. For better traceability of the erratas and corresponding fixes, introduce flags to indicate existing erratas on a chip. These flags decide if a corresponding solution is applied to the chip(s). This patch introduces below functions to handle errata flags: - nvgpu_init_errata_flags - nvgpu_set_errata - nvgpu_is_errata_present - nvgpu_print_errata_flags - nvgpu_free_errata_flags nvgpu_print_errata_flags: print below details of erratas present in chip 1. errata flag name 2. chip where the errata was first discovered 3. short description of the errata Flags corresponding to erratas present in a chip are set during chip hal init sequence. JIRA NVGPU-6510 Change-Id: Id5a8fb627222ac0a585aba071af052950f4de965 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2498095 Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-04-28 19:14:44 -07:00
Debarshi Dutta	e9a8fa028e	gpu: nvgpu: disable ssync access when MIG is enabled Disable access to ssync unit when MIG is enabled as ssync is part of GR and not Compute. A runtime check is now added for the below function. gv11b_gr_intr_enable_hww_exceptions The following priv errors are seen. SYS write error: ADR 0x00405a14 WRDAT 0xc0000000 master 0x00000000 [ERR] INFO 0x19400200: (subid 0x00000019 priv_level 0 local_ordering 1) [ERR] CODE 0xbadf1100 Jira NVGPU-6699 Change-Id: I9a08f1b6ab58affdcaa18e8ca314a4a00478a3e5 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2514761 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Lakshmanan M <lm@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-04-20 07:47:19 -07:00
Antony Clince Alex	7072b39783	gpu: nvgpu: gv11b: update prop hww handling Update prop hww handling to read and print additional diagnostic registers. Generate following registers: - gr_gpc0_prop_hww_esr_coord_r - gr_gpc0_prop_hww_esr_format_r - gr_gpc0_prop_hww_esr_state_r - gr_gpc0_prop_hww_esr_state2_r - gr_gpc0_prop_hww_esr_offset_r Rename following registers and associated fields: - pes_hww_esr => gpc0_ppc0_pes_hww_esr - setup_hww_esr = > gpc0_setup_hww_esr - zcull_hww_esr => gpc0_zcull_hww_esr - prop_hww_esr => gpc0_prop_hww_esr Jira NVGPU-6078 Bug 2865015 Change-Id: I131c48d2375ef0a76ac6c57ff1eb019f7c113286 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2472894 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-03-12 04:38:04 -08:00
Vedashree Vidwans	df1c9c4640	gpu: nvgpu: remove trivial operations before BUG Corrected ECC errors are not applicable to GV11B. Remove unnecesary lines of code before invoking BUG(). Jira NVGPU-6272 Change-Id: I410d6463efd39584efdff7939ad66fae8ee63afc Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2473098 Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-01-22 07:05:31 -08:00
tkudav	2ca4f145e4	gpu: nvgpu: Fix HAL checker pointed mismatches Add new HALs for register field definition/value changes in GV11B as compared to Pascal. Update the HALs for recent chips too if applicable. Bug 200604892 Change-Id: I14ee9440859007e86a1ffa937df399a31e2628bd Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2437564 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Antony Clince Alex	8a9acf8a7e	gpu: nvgpu: move set_hww_esr_report_mask to golden context init The driver configures the sm hww global, warp ESR report masks during poweron as part of gops_gr.gr_init_support. However, during golden context init, these are overwritten with default entries from sw_ctx_load list; this leaves the report masks in a state inconsistent with the driver expectation. The driver should configure the sm hww warp, global ESR report masks during golden context init and not before it; Hence, move set_hww_esr_report_mask from power-on path to golden context init. In addition, update set_hww_esr_report_mask to do RMW, so as to retain the values loaded from sw_ctx_load list. Update global ESR report mask to enable all exceptions. Bug 3029888 Bug 2997718 Change-Id: Id7ad4cff5409982143f49695c95c5e1d1c9fdec9 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2367466 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Seema Khowala	b91b1f06e1	gpu: nvgpu: check and handle all bits set in fecs_host_intr_status Check all the bits set in fecs_host_intr_status h/w register. Read fecs_host_intr_status before calling handle_fecs_error and store this info in isr_data. JIRA NVGPU-5502 Change-Id: I198b11aa62e394706007d6dc034fe0ac8da2bcb5 Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2343684 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Rajesh Devaraj	b8c6ad3f5f	gpu: nvgpu: remove service IDs This patch removes the reporting of _ECC_CORRECTED errors which are not applicable to GV11B. Specifically, this patch removes the code related to the reporting of the following service IDs: NVGUARD_SERVICE_IGPU_SM_SWERR_LRF_ECC_CORRECTED NVGUARD_SERVICE_IGPU_SM_SWERR_CBU_ECC_CORRECTED NVGUARD_SERVICE_IGPU_PMU_SWERR_FALCON_DMEM_ECC_CORRECTED NVGUARD_SERVICE_IGPU_GPCCS_SWERR_FALCON_DMEM_ECC_CORRECTED NVGUARD_SERVICE_IGPU_FECS_SWERR_FALCON_DMEM_ECC_CORRECTED NVGUARD_SERVICE_IGPU_GCC_SWERR_L15_ECC_CORRECTED NVGUARD_SERVICE_IGPU_MMU_SWERR_L1TLB_FA_DATA_ECC_CORRECTED NVGUARD_SERVICE_IGPU_MMU_SWERR_L1TLB_SA_DATA_ECC_CORRECTED NVGUARD_SERVICE_IGPU_HUBMMU_SWERR_L2TLB_SA_DATA_ECC_CORRECTED NVGUARD_SERVICE_IGPU_HUBMMU_SWERR_TLB_SA_DATA_ECC_CORRECTED NVGUARD_SERVICE_IGPU_HUBMMU_SWERR_PTE_DATA_ECC_CORRECTED NVGUARD_SERVICE_IGPU_HUBMMU_SWERR_PDE0_DATA_ECC_CORRECTED NVGUARD_SERVICE_IGPU_SM_SWERR_ICACHE_L0_DATA_ECC_CORRECTED NVGUARD_SERVICE_IGPU_SM_SWERR_L1_DATA_ECC_CORRECTED NVGUARD_SERVICE_IGPU_SM_SWERR_ICACHE_L0_PREDECODE_ECC_CORRECTED NVGUARD_SERVICE_IGPU_SM_SWERR_ICACHE_L1_DATA_ECC_CORRECTED NVGUARD_SERVICE_IGPU_SM_SWERR_ICACHE_L1_PREDECODE_ECC_CORRECTED NVGUARD_SERVICE_IGPU_SM_SWERR_L1_TAG_MISS_FIFO_ECC_CORRECTED NVGUARD_SERVICE_IGPU_SM_SWERR_L1_TAG_S2R_PIXPRF_ECC_CORRECTED NVGUARD_SERVICE_IGPU_LTC_SWERR_CACHE_TSTG_ECC_CORRECTED NVGUARD_SERVICE_IGPU_LTC_SWERR_CACHE_RSTG_ECC_CORRECTED NVGUARD_SERVICE_IGPU_LTC_SWERR_CACHE_DSTG_BE_ECC_CORRECTED Bug 200616002 Change-Id: I199c396f9f6a6be007bd6d3c556199b5a73c3c91 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2349587 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vinod G	6a7bf6cdc0	gpu: nvgpu: update sm ecc_status_error handling Use gv11b_gr_intr_handle_tpc_sm_ecc_exception function for future chip to avoid code replication. Add sm_ecc_status_errors hal to read the ecc_status_errors Jira NVGPU-5033 Signed-off-by: Vinod G <vinodg@nvidia.com> Change-Id: I4a25837d9b833a48307b9353b82ff6597f985e41 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325537 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vinod G	0e0b966f0c	gpu: nvgpu: update gr exception hal Make generic gr exception static functions to public functions. Jira NVGPU-5033 Signed-off-by: Vinod G <vinodg@nvidia.com> Change-Id: I9ac4cbc728edda813a487f80af622559a798b319 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2324676 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Vedashree Vidwans <vvidwans@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
vinodg	a9d8fc96a7	gpu: nvgpu: hal correction for class error handling Add new hal function gp10b_gr_intr_handle_class_error. Update handle_class_error hal function for gp10b, gv11b and tu104 to gp10b_gr_intr_handle_class_error from gm20b_gr_intr_handle_class_error. gr_trapped_data_mme_pc uses 12 bits from gp10b. Move gm20b_gr_intr_handle_class_error hal function to non-fusa section. Jira NVGPU-4913 Signed-off-by: vinodg <vinodg@nvidia.com> Change-Id: Ic93013ba43d4bf409527109f2f2d43db11c4238e Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2314249 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Seema Khowala	c8050eabec	gpu: nvgpu: gr: fix return value of handle_sw_method Currently chip specific functions for handle_sw_method does not return -EINVAL if class does not match as expected. Fix it by setting default return value to -EINVAL and updating return value to 0 for successful class/method matches. JIRA NVGPU-4909 Change-Id: Ifb3aa35215171ddbc64d6f1d23f8944c9fe44b2d Signed-off-by: Seema Khowala <seemaj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2285848 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
vinodg	e32529da57	gpu: nvgpu: compile out unused code in gr.intr with safety build handle_tex_exception hal is not set for safety build. Add CONFIG_NVGPU_HAL_NON_FUSA checking for that hal. log_mme_exception hal is supported only for turing. Add CONFIG_NVGPU_DGPU checking for that hal. gr_intr_handle_class_error always return -EINVAL. Change the return as void to avoid unwanted error checking. nvgpu_gr_intr_get_channel_from_ctx function parameter curr_tsgid will never be NULL based on the current call. Remove unwanted (curr_tsgid != NULL) check from this function. Jira NVGPU-4454 Change-Id: I165d1cc5f9e308dfb11d905b59151b44f63a31bb Signed-off-by: vinodg <vinodg@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2259763 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
vinodg	01039aec35	gpu: nvgpu: disable unused compute sw method for safety Add missing check with CONFIG_NVGPU_HAL_NON_FUSA in tu10x code for NVC0C0_SET_SHADER_EXCEPTIONS. Jira NVGPU-4454 Change-Id: Id8f0560c5061f7c017c84361059007d936dc53b5 Signed-off-by: vinodg <vinodg@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2257034 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
vinodg	22dea4ca3b	gpu: nvgpu: disable unused compute sw method in safety build. NVC0C0_SET_SHADER_EXCEPTIONS is never used by CUDA software. Hence disable that feature with CONFIG_NVGPU_HAL_NON_FUSA checking for safety build. Jira NVGPU-4454 Change-Id: If71b97bf8b2a6a8f8d0c7206b8e801094b5b1b7c Signed-off-by: vinodg <vinodg@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2256345 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Seshendra Gadagottu	d8058743d7	gpu: nvgpu: prepare class unit for safety build Move graphics related defs and functions under CONFIG_NVGPU_GRAPHICS switch. Move classes not supported in GV11B under CONFIG_NVGPU_NON_FUSA switch. Add missing valid class numbers to gpu_class.is_valid HAL. Also remove un-used class defs from class.h header. Lot of qnx safety tests are still using graphics 3d class. Until those tests got fixed, allowing 3d graphics class as valid class for safety build. JIRA NVGPU-4301 Change-Id: Ifd2a13bee3210821799c2bca10e7245eb3c79121 Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2224658 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	9169e8c048	gpu: nvgpu: mc: move mc declarations to mc.h Move declarations that belong to mc from gk20a.h to mc.h where they belong. JIRA NVGPU-2532 Change-Id: I91934ff60e2735c61d16459c04507fed6e1c96d7 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2214421 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
vinodg	0e9d835bd0	gpu: nvgpu: code correction in gr intr unit Remove the check for "tpc > U8_MAX" in sm ecc error reporting function. This check was added to limit TPC id in 8bits. No need for the "tpc > U8_MAX" check for assigning tpc = tpc & 0xFFU Jira NVGPU-4096 Change-Id: Ia5e2a94a9896b69bd8e06bf9e17c47f510eded1f Signed-off-by: vinodg <vinodg@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2211169 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00

1 2 3

116 Commits