linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-24 10:34:43 +03:00

Author	SHA1	Message	Date
Jinesh Parakh	b8b90f85ee	gpu: nvgpu: Fix CERT-C Violations Fix the following CERT-C Violations: gsp_runlist.c : CERT EXP34-C channel_sync_syncpt.c : CERT ERR33-C grmgr_ga100.c : CERT ERR33-C grmgr_ga10b.c : CERT ERR33-C debug.c : CERT ERR33-C debug_fecs_trace.c : CERT EXP34-C ioctl.c : CERT ERR33-C CID 495110 CID 141061 CID 222881 CID 222890 CID 450994 CID 366644 CID 466529 Bug 3512546 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I318a27a6fcb8ea8f6d5d6c1f65d940c48d6f8dfc Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2723008 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-06-03 12:05:26 -07:00
Divya	dcec7f184e	gpu: nvgpu: disable elpg earlier in recovery path When MMU fault happens, if the id_type = 1, that means fault happened in TSG. So in that path we set the error notifier and let userspace know about faulty channel. During this, we check if debugger is attached or not by reading gr_gpc0_tpc0_sm0_dbgr_control0_r() register. During this time ELPG is enabled and this read causes IDLE SNAP error for ELPG. To resolve this, move CG/PG disable function call early in fifo recover code path. This ensures that ELPG is disabled early before any read happens for any GR register. Bug 3660592 Change-Id: Ie5d01b7ccf00167b58f260e9142aa5deb2a08be4 Signed-off-by: Divya <dsinghatwari@nvidia.com> (cherry picked from commit f09e429f2d142c20529bedc05acf193805e1bb25) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2720655 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-06-01 06:41:57 -07:00
Jinesh Parakh	cb78bca971	gpu: nvgpu: Fix Out-of-bounds access defect Fix following Coverity Defect: priv_ring_ga10b_fusa.c : Out-of-bounds access CID 10062315 Bug 3460991 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I76e4122132ad13cc53d24816c59586695d6f80a4 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708565 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-31 11:07:30 -07:00
Rajesh Devaraj	019bee2174	gpu: nvgpu: add additional registers to allowlist To add GL/VK support for shader debugging via the SM trap handler functionality, a write operation to the following PRI registers need to be allowed in all chips (ga10b, gv11b, gm20b, gp10b): - NV_PGRAPH_PRI_GPCS_MMU_DEBUG_CTRL - NV_PGRAPH_PRI_GPCS_TPCS_SM_SCH_MACRO_SCHED - NV_PGRAPH_PRI_GPCS_TPCS_SMS_DBGR_CONTROL0 - NV_PGRAPH_PRI_GPCS_TPCS_SMS_HWW_WARP_ESR_REPORT_MASK - NV_PGRAPH_PRI_GPCS_TPCS_SMS_HWW_GLOBAL_ESR_REPORT_MASK In this patch, we are adding the above registers into allowlist, if they were absent. Note that these registers included only in non-safety using CONFIG_NVGPU_SET_FALCON_ACCESS_MAP flag. Bug 3642131 Change-Id: I5f62731944b6b3e059afa80a491c3cf5c3656f60 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2715799 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Christopher Lentini <clentini@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Christopher Lentini <clentini@nvidia.com>	2022-05-31 05:55:17 -07:00
prsethi	697215afd3	gpu: nvpgu: configure static ZBC table Patch defines a ZBC static table and configure it at sw layer. Later existing API read this sw configuration and program it to hw. This is applicable only for ga10b safety build and for other chips/ configuration it will be supported in the legacy way. Bug 3585766 Change-Id: I00d79162c0b096616e3f555da965e82e47c014d1 Signed-off-by: prsethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2713821 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-29 10:56:58 -07:00
atanand	2ebc0bdf98	gpu: nvgpu: add broadcast to unicast expansion Add broadcast to unicast expansion for NV_PLTCG_LTCS_MISC_LTC_PM and PMM*_[GPC\|FBP]SROUTER broadcast registers for non-resident regops. Bug: 3442801 Change-Id: I88dcf00f4f6e910f0342d3968970070e0248a786 Signed-off-by: atanand <atanand@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2704951 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-28 08:59:44 -07:00
Krishna Reddy	961925be02	Revert "gpu: nvgpu: correct usage for gk20a_busy_noresume" This reverts commit `c1ea9e3955`. Reason for revert: ap_vulkan, ap_opengles, ap_mods tests failures Bug 3661058 Bug 3661080 Bug 3659004 Change-Id: I929b5675a4fb0ddc8cbf3eeefc982b4ba04ddc59 Signed-off-by: Krishna Reddy <vdumpa@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2718996 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>	2022-05-27 14:49:26 -07:00
Antony Clince Alex	a80c445a5d	gpu: nvgpu: ga10b: update runlist.write_state Update HAL function runlist.write_state to skip in-active runlists in fifo.runlists. It is possible for one or more engines to be floorswept in which case their associated runlist will be in-active, example, if host supports 3 runlists(0, 1, 2) each serving 3 engines(0, 1, 2), and engine-1 is floorswept, then runlist-1 becomes in-active and the entry fifo->runlists[1] will be set to NULL. Bug 3650588 Change-Id: Iaf9d75e310903c47b842e84dcfa2209d9fe7da96 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> (cherry picked from commit e29a2019cf8f4796737c670f98164f7783448d49) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2717075 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-27 14:46:18 -07:00
Jinesh Parakh	bb73cf9597	gpu: nvgpu: Fixed out-of-bounds Coverity Defects Fix following Coverity Defects: clk_mon_tu104.c : Out-of-bounds read and Out-of-bounds access CID 10061400 CID 10061401 Bug 3460991 Changed the datatype of domain_mask from u32 to unsigned long to solve the out-of-bounds defect. Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I1c43bd90053264ee4104ca8c3a33d9ea07f04045 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708765 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-25 11:44:59 -07:00
Debarshi Dutta	c1ea9e3955	gpu: nvgpu: correct usage for gk20a_busy_noresume Background: In case of a deferred suspend implemented by gk20a_idle, the device waits for a delay before suspending and invoking power gating callbacks. This helps minimize resume latency for any resume calls(gk20a_busy) that occur before the delay. Now, some APIs spread across the driver requires that if the device is powered on, then they can proceed with register writes, but if its powered off, then it must return. Examples of such APIs include l2_flush, fb_flush and even nvs_thread. We have relied on some hacks to ensure the device is kept powered on to prevent any such delayed suspension to proceed. However, this still raced for some calls like ioctl l2_flush, so gk20a_busy() was added (Refer to commit Id dd341e7ecbaf65843cb8059f9d57a8be58952f63) Upstream linux kernel has introduced the API pm_runtime_get_if_active specifically to handle the corner case for locking the state during the event of a deferred suspend. According to the Linux kernel docs, invoking the API with ign_usage_count parameter set to true, prevents an incoming suspend if it has not already suspended. With this, there is no longer a need to check whether nvgpu_is_powered_off(). Changed the behavior of gk20a_busy_noresume() to return bool. It returns true, iff it managed to prevent an imminent suspend, else returns false. For cases where PM runtime is disabled, the code follows the existing implementation. Added missing gk20a_busy_noresume() calls to tlb_invalidate. Also, moved gk20a_pm_deinit to after nvgpu_quiesce() in the module removal path. This is done to prevent regs access after registers are locked out at the end of nvgpu_quiesce. This can happen as some free function calls post quiesce might still have l2_flush, fb_flush deep inside their stack, hence invoke gk20a_pm_deinit to disable pm_runtime immediately after quiesce. Kept the legacy implementation same for VGPU and older kernels Jira NVGPU-8487 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I972f9afe577b670c44fc09e3177a5ce8a44ca338 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2715654 Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-25 04:59:46 -07:00
Dinesh T	6e4c3275bf	gpu: nvgpu: Set max_ways_evict_cache to maximum This is setting evict_max_ways for L2 cache to the maximum supported value for safety. In normal build L2 cache MAX_EVICT_LAST is configure via KMD and RegOps. RegOps is enabled only on standard build with CONFIG_DEBUGGER flag. This method we cant use it for safety build. Safety we can make use of the patch buffer to patch the register while creating the context. JIRA NVGPU-8227 Change-Id: Iec5d73197239b9cad31c6b593ca2b87c224aad5e Signed-off-by: Dinesh T <dt@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708702 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-18 22:57:54 -07:00
Richard Zhao	7a39d8e9ce	gpu: nvgpu: hal: remove .fb.mem_unlock for ga10b .fb.mem_unlock is specific to dGPU. Jira GVSCI-9976 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: Id1a52becaa5c2b0a00899c3bc92aa18058a1f97d Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708394 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-18 00:59:05 -07:00
Sagar Kamble	e96740c59a	gpu: nvgpu: change ecc error counters increment to wrapping type Usage of nvgpu_safe_add_u32 to increment nvgpu maintained corrected ecc error counters can lead to BUG due to overflow as corrected ecc errors can keep coming in and system will continue to operate. In some configurations, uncorrected error counters can also overflow and lead to BUG. Increment these counters and their delta calculations to use nvgpu_wrapping_add_u32. JIRA NVGPU-7054 Change-Id: I85ddddfa46062744cccbe0756ad942787e72f01b Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2601152 (cherry picked from commit f016e59189d2bd66e23f17ccb638f6d384b82fbd) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623638 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-17 08:41:07 -07:00
Sagar Kamble	d3b417ce2c	gpu: nvgpu: address priv_ring unit code inspection gaps 1. Hardcoded constants are defined using #define are converted to const. 2. set_ppriv_timeout_settings HAL is not applicable from gm20b. Hence remove it completely. JIRA NVGPU-6903 Change-Id: Ic096c5dc87aa45db0aa05482947cd032ae72bdd4 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552581 (cherry picked from commit c5fb38a54208330f24754fed33d7242903dbac59) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623635 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-17 08:40:46 -07:00
Debarshi Dutta	48cd58d332	gpu: nvgpu: add timeout error handling Report a timeout error when fb_mmu_ctrl_r() register doesn't correctly reflect the tlb invalidate status. Jira NVGPU-7192 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2603267 (cherry picked from commit b16ed38d087667bc2bddaddde820648d6a931064) Change-Id: I2360c8741b396b26079438a917770e0bb051c661 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2700042 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-13 01:00:40 -07:00
Debarshi Dutta	76cc8870e1	nvgpu: gpu: update default nvs domain implementation In current form, the default domain acts like any schedulable domain. TSGs are bound to it and it can be enumerated via the public interfaces. The new expectation for the default domain is meant to change from the current form to a pseudo domain that cannot act like an ordinary domain in other ways, i.e. it must not be reachable by in particular the domain management API, it can't be removed, does not show up in lists, and TSGs cannot be explicitly bound to this domain. It won't participate in round-robin domain scheduling. It is not really a domain, and acts like one only when activated in the manual mode. Following changes are made overall to support the above change in definition. 1) Domain creation and attaching the domain to the scheduler are now split into two separate functions. The new default domain (having ID = UINT64_MAX) is created separately from a static function without linking it with other domains in the scheduler. 2) struct nvgpu_nvs_scheduler explicitely stores the default domain to support direct lookups. 3) TSGs are initially not bound to default domain/rl_domain. Jira NVGPU-8165 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I916d11f4eea5124d8d64176dc77f3806c6139695 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697477 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-12 00:24:58 -07:00
Sagar Kamble	d82400d2b8	gpu: nvgpu: fix MISRA Rule 5.1 violation BVEC changes for nvgpu_rc_pbdma_fault and nvgpu_rc_mmu_fault started reporting below MISRA issue. kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:321: 1. misra_c_2012_rule_5_1_violation: Declaration with identifier "nvgpu_tsg_unbind_channel_check_hw_state", which is ambiguous. kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:349: 2. other_declaration: The first 31 characters of identifiers "nvgpu_tsg_unbind_channel_check_ctx_reload" and "nvgpu_tsg_unbind_channel_check_hw_state" are identical. Do below renames to fix the issue. Doing both for consistency. s/nvgpu_tsg_unbind_channel_check_hw_state/nvgpu_tsg_unbind_channel_hw_state_check s/nvgpu_tsg_unbind_channel_check_ctx_reload/nvgpu_tsg_unbind_channel_ctx_reload_check JIRA NVGPU-6772 Change-Id: Ib92cabe11c486621351bf15ddb86e20d16d514c4 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584152 (cherry picked from commit a619f259c6a4ffccb05550767212989af60c2a90) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2706551 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-11 04:18:12 -07:00
Shashank Singh	ba22f6263b	gpu: nvgpu: move gv11b code under config flag Move gv11b specific code under CONFIG_NVGPU_GV11B_SUPPORT so that gv11b support can be removed for qnx later as it is no longer POR for qnx on dev-main. Jira NVGPU-8189 Change-Id: Idc17cfa22199f2b69a1bab0849cd2bd2e0fb6288 Signed-off-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2693828 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-11 04:13:23 -07:00
Sagar Kamble	9d6269ce7f	gpu: nvgpu: assert gr dev is non-NULL nvgpu_device_get can return NULL if supplied invalid ID or instance ID. We expect GR device struct to be non-NULL there hence just assert that it is indeed non-NULL in gr_reset_engine and ga10b_grmgr_init_gr_manager. CID 224133 CID 250232 Bug 3512546 Change-Id: Id09a1c436a8e49b921111b940d3d013bd66bff7a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2707018 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-07 23:24:39 -07:00
Richard Zhao	1ce899ce46	gpu: nvgpu: fix compile error of new compile flags Preparing to push hvrtos gpu server changes which requires bellow CFLAGS: -Werror -Wall -Wextra \ -Wmissing-braces -Wpointer-arith -Wundef \ -Wconversion -Wsign-conversion \ -Wformat-security \ -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough Jira GVSCI-11640 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: I25167f17f231ed741f19af87ca0aa72991563a0f Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2653746 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-07 15:11:49 -07:00
Antony Clince Alex	61ae0b7642	gpu: nvgpu: fix emulate mode enable The emulate mode support is determined after chip detect and is flagged by using NVGPU_SUPPORT_EMULATE_MODE flag. The present logic prevents user from configuring the emulate mode sysfs knobs if this flag is not set, however the emulate mode usecase requires the user to configure the syfs knob prior to power-on, hence defer emulate mode check to a later stage after chip detect. Bug 3621460 Change-Id: If522527542fa8d7e95ccbcff43b74adbb9e976e6 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2703953 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Mayur Poojary <mpoojary@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: David Li <davli@nvidia.com>	2022-04-29 06:17:59 -07:00
Jinesh Parakh	5b38f48227	gpu: nvgpu: Fix Bad bit shift Coverity issues Fixed following Coverity Defects: sec2_tu104.c : Bad bit shift operation grmgr_ga10b.c : Bad bit shift operation clk_gm20b.c : Bad bit shift operation gsp_ga10b.c : Bad bit shift operation CID 9869505 CID 9869506 CID 10062538 CID 10112176 CID 10127981 Bug 3460991 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I9cbb2403ccd41d6cbe99f73f27f915969094fa5b Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689262 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-29 06:08:50 -07:00
Jinesh Parakh	622fe70dab	gpu: nvgpu: Fix Bad bit shift Coverity issues Fixed following Coverity Defects: ioctl_as.c : Bad bit shift operation mc_tu104.c : Bad bit shift operation vm.c : Bad bit shift operation vm_remap.c : Bad bit shift operation A new linux header file for ilog2 is created. The files which used the old ilog2 function have been changed to use the new nvgpu_ilog2 function. CID 9847922 CID 9869507 CID 9859508 CID 10112314 CID 10127813 CID 10127899 CID 10128004 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: Ia201eea7cc426c3d6581e1e5ae3b882dbab3b490 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2700994 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-28 04:08:45 -07:00
Sagar Kadamati	ac879458ca	gpu: nvgpu: gr: split gm20b_gr_falcon_ctrl_ctxsw NVGPU_GR_FALCON_METHOD_FECS_TRACE_FLUSH is not used from gp10b So spliting gm20b_gr_falcon_ctrl_ctxsw() into below functions * gm20b_gr_falcon_ctrl_ctxsw() * gm20b_gr_falcon_ctrl_ctxsw_internal() Jira NVGPU-7287 Change-Id: I00433d5ac8dc4f64d4d90c8ae0cebee424a5bd41 Signed-off-by: Sagar Kadamati <skadamati@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2694431 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-04-15 04:30:29 -07:00
Tejal Kudav	dae284c74b	gpu: nvgpu: Disable GR functional intrs on safety Disable below interrupts on safety as they do not report any error condition and are not used by CUDA and Graphics(VKSC) on safety build. Signoff from CUDA and VKSC is on Bug https://nvbugs/3588603 1. NV_PGRAPH_INTR_NOTIFY: This intr is set when the Notification style is WRITE_THEN_AWAKEN. 2. NV_PGRAPH_INTR_SEMAPHORE: This is set when a 3d class sempahore is released as the result ofa SetSemaphoreD method, when the AwakenEnable field is TRUE. 3. NV_PGRAPH_INTR_BUFFER_NOTIFY: This bit is set when a Mem2mem DMA completes and the LaunchDma method specifies the interrupt type as INTERRUPT 4. NV_PGRAPH_INTR_DEBUG_METHODS: This is debug feature and not used on QNX safety Bug 3588603 JIRA NVGPU-8166 Change-Id: I6d07dfd2857ac047fac4599421600d364251df76 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2694363 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-13 02:35:35 -07:00
Antony Clince Alex	83fe3fd35e	gpu: nvgpu: add errata NVGPU_ERRATA_3524791 Update PES, ROP exception handling for NVGPU_ERRATA_3524791. Enable the errata for all Volta+ chips. ROP, PES exceptions are being reported using the physical-id, where logical-id should have been used. All ESR status registers are reported using logical-id, so this matches with the SW expectation. To address the (1), update ROP, PES exception handler translate from physical to logical-id before reading the status registers. Bug 3524791 Change-Id: Ieacbfb306bb0e69cf0113dc92f18e401573722e3 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2680029 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-13 02:32:30 -07:00
Antony Clince Alex	32a9c6923c	gpu: nvgpu: gv11b: udpate PES exception handling At present, the driver only report/handle exceptions from PES0, however, Volta+ chips have 2 PES units within a GPC. Update the PES exception handler to report/handle exceptions from both PES0,1 units. Bug 3524791 Change-Id: I71ac75cc1abe492b7aa781d8d16077f4da3a997b Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2679931 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Seema Khowala <seemaj@nvidia.com>	2022-04-13 02:32:19 -07:00
Antony Clince Alex	62d6f753d2	gpu: nvgpu: add support for PES, ROP floorsweeping Volta+ chips supports PES floorsweeping and Ampere+(iGPU) chips supports ROP floorsweeping. At present, the driver isn't aware of PES, ROP floorsweeping, make the driver PES, ROP floorsweeping aware by introducing the following fields in nvgpu_gr_config: - gpc_(rop/pes)_mask: Contains the bit mask of non FSed ROP/PES units per GPC. - gpc_(rop/pes)_logical_id_map: Translates per GPC ROP/PES physical id to logical id. Introduce the following HAL functions to read PES/ROP FS data: - gops_fuse.fuse_status_opt_(pes/rop)_gpc: This fuction gets the FS config from the fuse. - gops_top.get_max_(pes/rop)_per_gpc: Gets the maximum number of PES/ROP units that can be present in a GPC. In addition, introduce the enabled flag NVGPU_SUPPORT_PES_FS to identify chips which support PES floorsweeping, piggyback on NVGPU_SUPPORT_ROP_IN_GPC enabled flag to identify ROP floorsweeping. Bug 3524791 Change-Id: I065bab6c02618fe38892c8c890b069c340b85301 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2679570 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-13 02:32:14 -07:00
Antony Clince Alex	19a8adeae1	gpu: nvgpu: prof: add new resource type Add new profiler resource type NVGPU_PROFILER_PM_RESOURCE_TYPE_PC_SAMPLER. Introduce regops HAL get_hwpm_pc_sampler_register_ranges to get allowlist for PC_SAMPLER resources. Re-generate allowlist files to include register ranges for PC_SAMPLER resources. Update uapi header to advertise new resource type NVGPU_PROFILER_PM_RESOURCE_ARG_PC_SAMPLER. Bug 3408536 Change-Id: I7009ef822665771eed727da48ef1e89dcc6b9c4b Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689057 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-04-12 16:30:52 -07:00
Sagar Kamble	11819380e8	gpu: nvgpu: remove invalid NULL check mmufault will not be NULL when recovery is triggered of type RC_TYPE_MMU_FAULT. NULL check for it at one place followed by dereference is flagged as CERT issue. Remove this invalid NULL check. CID 17871 Bug 3512546 Change-Id: Ice8035f5df33c45ef0afb4c2a1395e0d5455652c Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2692544 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-04-08 00:00:36 -07:00
Rajesh Devaraj	43ba356132	gpu: nvgpu: fix typo in error id SM_LRF_ECC_UNCORRECTED error is incorrectly reported using the error ID of SM_CBU_ECC_UNCORRECTED error. This patch fixes this typo issue. Bug 3366818 Change-Id: I9c274be45776711ab9c70ef66a75dc23afa276a6 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2688984 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-29 10:12:17 -07:00
Jinesh Parakh	bbaf01590c	gpu: nvgpu: Fix Logically dead code Coverity bugs Fixed following Coverity Defects: ioctl_clk_arb.c : Logically dead code gr_gp10b.c : Logically dead code vfe_var.c : Logically dead code grmgr_ga10b.c : Logically dead code vm_remap.c : Logically dead code falcon_debug.c : Logically dead code CID 1994001 CID 3008644 CID 9870823 CID 10062537 CID 10127915 CID 10128008 Bug 3460991 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I711d2ccb480328d8f0a4ba49e877612669f3d41f Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2686362 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-28 07:36:44 -07:00
Tejal Kudav	fd339ca599	gpu: nvgpu: Add err reporting call for CE HSI Below two error IDs are not relevat for safety build, but should be reported in standard NON_FUSA build for Linux error reporting supporting. GPU_CE_FBUF_CRC_FAIL GPU_CE_FBUF_MAGIC_CHK_FAIL JIRA NVGPU-8148 Bug 3366818 Change-Id: Ia276940c74a0965b7501e52596577e04c3447493 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2687017 Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-25 21:38:12 -07:00
mkumbar	0b9dc3dbc3	gpu: nvgpu: PMU NVRISCV engine HSI support Below listed HSI are handled with PMU ISR handler and all these triggers interrupt from individual unit upon issue. -Add ECC check for IMEM, DMEM, DCLS, REG, and MPU as per HSI req -Add MEMERR check for GPU_PMU_ACCESS_TIMEOUT_UNCORRECTED PMU HSI id -Add IOPMP check for GPU_PMU_ILLEGAL_ACCESS_UNCORRECTED PMU HSI id -Add WDT check for GPU_PMU_WDT_UNCORRECTED PMU HSI id Bug 3491596 Bug 3366818 Change-Id: I751d653e447017ac62a2459da2c6bb9da506f438 Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2686566 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-25 13:25:39 -07:00
Antony Clince Alex	f670687441	gpu: nvgpu: move ltc_tstg_mgmt register setup The ltc_ltcs_ltss_tstg_set_mgmt_3 register should only be configured after ACR init, hence move it down the init order from early_init to finalize_poweron after acr is loaded. Bug 3514215 Change-Id: I2462715d25f75b7476ab163cd6c9f73ced5efb6d Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2685547 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-25 13:14:15 -07:00
Antony Clince Alex	c6f07c14a9	Revert "nvgpu: ga10b: disable errata NVGPU_ERRATA_200601972" This reverts commit `7194fe5f1f`. Bug 3514215 Change-Id: Ief98d8d34d72d233432d0ff0c1997c7f62b8b7d6 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2674181 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-25 13:13:58 -07:00
Rajesh Devaraj	37c6b8b1c3	gpu: nvgpu: update reporting of errors to sdl In Drive 6.0, the error reporting is supported only for orin (ga10b) in dev-main. For this purpose, this patch does the following: - Removes the redundant reporting of following IDs from gv11b: - GPU_HOST_PFIFO_SCHED_ERROR - GPU_HOST_PFIFO_CTXSW_TIMEOUT_ERROR - GPU_HOST_PBDMA_HCE_ERROR - GPU_MMU_L1TLB_SA_DATA_ECC_UNCORRECTED - GPU_MMU_L1TLB_FA_DATA_ECC_UNCORRECTED - GPU_LTC_CACHE_DSTG_ECC_CORRECTED - GPU_LTC_CACHE_TSTG_ECC_UNCORRECTED - Migrates the reporting of following IDs from gv11b to ga10b: - GPU_SM_L1_TAG_ECC_CORRECTED - GPU_SM_L1_TAG_ECC_UNCORRECTED - GPU_SM_CBU_ECC_UNCORRECTED - GPU_SM_LRF_ECC_UNCORRECTED - GPU_SM_L1_DATA_ECC_UNCORRECTED - GPU_SM_ICACHE_L1_DATA_ECC_UNCORRECTED - GPU_SM_ICACHE_L0_PREDECODE_ECC_UNCORRECTED - GPU_SM_L1_TAG_MISS_FIFO_ECC_UNCORRECTED - GPU_SM_L1_TAG_S2R_PIXPRF_ECC_UNCORRECTED - Removes the unused ID that doesn't have any HSI related to it: - GPU_HOST_PBDMA_PREEMPT_ERROR In addition to the above, this patch does the following: - Updates error IDs related to page fault error. - Updates look-up table to remove unused error IDs. JIRA NVGPU-8094 Bug 200729736 Change-Id: Ifea76d38ba609c894560e61ff5a6e406290f919e Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2685249 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-23 21:02:15 -07:00
Rajesh Devaraj	c5822b0d98	gpu: nvgpu: add error prints for errors reported to sdl In Drive 6.0, only error IDs are reported to Safety_Services. The additional debug/error information is printed using nvgpu_err(). JIRA NVGPU-8094 Bug 3491596 Change-Id: Ie90f3e1453e6a796d5c76373c11f8a5a188ac590 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2684289 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-22 17:55:10 -07:00
Divya	201b5c1c7f	gpu: nvgpu: add SLCG support for GSP and CTRL unit Add SLCG register programming for GSP and CTRL units Bug 3452217 Change-Id: I69e414a82b5c12f26ff3b6626c328b5c0aa9e04c Signed-off-by: Divya <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678782 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-18 07:54:48 -07:00
Divya	e9563e40d1	gpu: nvgpu: add GSP and CTRL CG support Add SLCG clock gating support for GSP and CTRL units Bug 3452217 Change-Id: Ic014fc03c8f4ad951b1fba8ab7b3e1cd1a23a59c Signed-off-by: Divya <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678680 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Goyal <dgoyal@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-18 07:54:37 -07:00
Antony Clince Alex	40231858a5	gpu: nvgpu: add enable flag for gpu emulate mode Introduce enable flag NVGPU_SUPPORT_EMULATE_MODE, and bring emulate mode feature under this flag. At present, gpu emulate mode is only support on ga10b. Jira NVGPU-8120 Change-Id: I85269992926c3cf8f2d1dd70882979e1c4656984 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2681613 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-17 19:48:03 -07:00
Sagar Kamble	2740221686	gpu: nvgpu: update OPT_ECC_EN fuse disabled error log This fuse is enabled only on safety systems. Bug 3536502 Change-Id: I9884c3df1d08cef7fca58885388803259780d62c Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2679034 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-16 08:21:37 -07:00
Sagar Kadamati	7e045bd2b4	gpu: nvgpu: split ecc interrupts Move non ecc related code from ga10b_ltc_intr_handle_lts_intr3() to ga10b_ltc_intr3_interrupts() update below functions to non-static * ga10b_ltc_intr_handle_lts_intr * ga10b_ltc_intr_handle_lts_intr2 * ga10b_ltc_intr_handle_lts_intr3 Jira NVGPU-7287 Change-Id: If5b6fcd8cfc1696c2837323cf4fc8a7504b813a0 Signed-off-by: Sagar Kadamati <skadamati@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2676830 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-16 03:15:22 -07:00
mpoojary	c1a995403a	gpu: nvgpu: Add ACR error reporting to SDL -Add check for ECC parity errors in IMEM, DMEM, EMEM, DCLS, REG for ACR running in GSP engine. The EXTIRQ3 external interrupt is set from ACR pointing towards host. -Add function to check error type when ACR or Bootrom execution fails and report accordingly to SDL with relevant error codes. This is a part of HSI safety requirements. Bug 3564039 Jira NVGPU-8108 Change-Id: I65407371f7a1d1ba50a10bdf443ef6b903eeaa36 Signed-off-by: mpoojary <mpoojary@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678100 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-15 17:33:42 -07:00
Dinesh T	358f62a9d7	gpu: nvgpu: Add compression for safety This is adding compression support for qnx-safety by - Adding the compression related files under FUSA. - Adding new posix contig-pool.c for user space compilation. Bug 3426194 Change-Id: Ib3c8e587409dc12099c1196f55a87858d4dc520e Signed-off-by: Dinesh T <dt@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2652963 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-15 17:30:57 -07:00
Tejal Kudav	b80b2bdab8	gpu: nvgpu: Add CE interrupt handling a. LAUNCH_ERR - Userspace error. - Triggered due to faulty launch. - Handle using recovery to reset CE engine and teardown the faulty channel. b. An INVALID_CONFIG - - Triggered when LCE is mapped to floorswept PCE. - On iGPU, we use the default PCE 2 LCE HW mapping. The default mapping can be read from NV_CE_PCE2LCE_CONFIG INIT value in CE refmanual. - NvGPU driver configures the mapping on dGPUs (currently only on Turing). - So, this interrupt can only be triggered if there is kernel or HW error - Recovery ( which is killing the context + engine reset) will not help resolve this error. - Trigger Quiesce as part of handling. c. A MTHD_BUFFER_FAULT - - NvGPU driver allocates fault buffers for all TSGs or contexts, maps them in BAR2 VA space and writes the VA into channel instance block. - Can be triggered only due to kernel bug - Recovery will not help, need quiesce d. FBUF_CRC_FAIL - Triggered when the CRC entry read from the method fault buffer does not match the computed CRC from the methods contained in the buffer. - This indicates memory corruption and is a fatal interrupt which at least requires the LCE to be reset before operations can start again, if not the entire GPU. - Better to quiesce on memory corruption CE Engine reset (via recovery) will not help. e. FBUF_MAGIC_CHK_FAIL - Triggered when the MAGIC_NUM entry read from the method fault buf does not match NV_CE_MTHD_BUFFER_GLOBAL_HDR_MAGIC_NUM_VAL - This indicates memory corruption and is a fatal interrupt - Better to quiesce on memory corruption f. STALLING_DEBUG - Only triggered with SW write for debug purposes - Debug interrupt, currently ignored Move launch error handling from GP10b to GV11b HAL as - 1. LAUNCHERR_REPORT errcode METHOD_BUFFER_ACCESS_FAULT is not defined on Pascal 2. We do not support GP10b on dev-main ToT JIRA NVGPU-8102 Change-Id: Idc84119bc23b5e85f3479fe62cc8720e98b627a5 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678893 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-14 17:12:14 -07:00
Tejal Kudav	15739c52e9	gpu: nvgpu: Fix NULL ptr deref during quiesce g->fifo.runlists[] has size of g->fifo.max_runlists. During quiesce, U32_MAX bitmask is passed to g->ops.runlist.write_state() HAL to disable all the runlist. The Ga10b HAL implementation of g->ops.runlist.write_state() references into runlists[] structure for all the bits set in input runlist mask. For mask=U32_MAX, there is NULL pointer dereference when runlist_id exceeds g->fifo.max_runlists. Add runlist_id boundary check before dereferencing the runlists[] structure. Update Gk20a HAL too with similar guard to make sure incorrect mask doesn't get written to the register. JIRA NVGPU-8102 Change-Id: Ic613aa38361b8b23d953c76d6924aba6bf6d5ea9 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2680847 Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-14 17:12:01 -07:00
Deepak Nibade	a1ef716f9d	gpu: nvgpu: set graphics specific PRI values for graphics contexts Add new HAL gops.gr.init.set_default_gfx_regs() to set graphics specific PRI values for graphics contexts in function nvgpu_gr_obj_ctx_alloc(). Add new HAL gops.gr.init.capture_gfx_regs() to capture and save init values for the PRIs. Add new struct nvgpu_gr_obj_ctx_gfx_regs to hold the PRI init values. Define HAL functions gv11b_gr_init_set_default_gfx_regs() and gv11b_gr_init_capture_gfx_regs(). Set the HAL functions for gv11b and ga10b. Register accessors required to set PRIs are auto-generated. Bug 3506078 Change-Id: I4c2843a274f3c924e402541e600e104ed0c9ed1c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671598 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: Jonathan Mccaffrey <jmccaffrey@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-14 13:17:05 -07:00
Dinesh T	e4cf52123f	gpu: nvgpu: Add ce halt function This is adding CE halt fuction to reset CE properly by setting stall req and waiting for stallack. Bug 200641946 Change-Id: I501ccf68a4f6fe95911e73fa2eb65bde93a9f3e9 Signed-off-by: Dinesh T <dt@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678366 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-11 20:44:38 -08:00
Tejal Kudav	3bfab5df3f	gpu: nvgpu: Disable fault mthd buf intrs on safety Below CE interrupts are disabled on safety build as fault and switch mechanism is not supported on safety: NV_CE_LCE_INTR_STATUS_MTHD_BUFFER_FAULT NV_CE_LCE_INTR_STATUS_FBUF_CRC_FAIL NV_CE_LCE_INTR_STATUS_FBUF_MAGIC_CHK_FAIL Bug 3548082 Change-Id: I400cd02a8c9888b7ef0d71bbc1f7d792b48e8227 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2679052 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-10 16:04:37 -08:00

1 2 3 4 5 ...

1234 Commits