linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-23 18:16:01 +03:00

Author	SHA1	Message	Date
Jinesh Parakh	658f83ca48	gpu: nvgpu: Fix Explicit null dereference Fix the following Coverity Defect: pwrpolicy.c : Explicit null dereference CID 10059138 Bug 3460991 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: Ie572e0608d0b07d5023e7cca878d16087cfc284f Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2717978 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-30 12:49:04 -07:00
prsethi	697215afd3	gpu: nvpgu: configure static ZBC table The patch defines a static ZBC table and configures it at the sw layer. The existing API later reads this sw configuration and programs it to hw. This applies only to the ga10b safety build; other chips/configurations are supported in the legacy way. Bug 3585766 Change-Id: I00d79162c0b096616e3f555da965e82e47c014d1 Signed-off-by: prsethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2713821 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-29 10:56:58 -07:00
Krishna Reddy	961925be02	Revert "gpu: nvgpu: correct usage for gk20a_busy_noresume" This reverts commit `c1ea9e3955`. Reason for revert: ap_vulkan, ap_opengles, ap_mods tests failures Bug 3661058 Bug 3661080 Bug 3659004 Change-Id: I929b5675a4fb0ddc8cbf3eeefc982b4ba04ddc59 Signed-off-by: Krishna Reddy <vdumpa@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2718996 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>	2022-05-27 14:49:26 -07:00
Jinesh Parakh	bb73cf9597	gpu: nvgpu: Fixed out-of-bounds Coverity Defects Fix following Coverity Defects: clk_mon_tu104.c : Out-of-bounds read and Out-of-bounds access CID 10061400 CID 10061401 Bug 3460991 Changed the datatype of domain_mask from u32 to unsigned long to solve the out-of-bounds defect. Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I1c43bd90053264ee4104ca8c3a33d9ea07f04045 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708765 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-25 11:44:59 -07:00
Debarshi Dutta	c1ea9e3955	gpu: nvgpu: correct usage for gk20a_busy_noresume Background: In case of a deferred suspend implemented by gk20a_idle, the device waits for a delay before suspending and invoking power gating callbacks. This helps minimize resume latency for any resume calls(gk20a_busy) that occur before the delay. Now, some APIs spread across the driver requires that if the device is powered on, then they can proceed with register writes, but if its powered off, then it must return. Examples of such APIs include l2_flush, fb_flush and even nvs_thread. We have relied on some hacks to ensure the device is kept powered on to prevent any such delayed suspension to proceed. However, this still raced for some calls like ioctl l2_flush, so gk20a_busy() was added (Refer to commit Id dd341e7ecbaf65843cb8059f9d57a8be58952f63) Upstream linux kernel has introduced the API pm_runtime_get_if_active specifically to handle the corner case for locking the state during the event of a deferred suspend. According to the Linux kernel docs, invoking the API with ign_usage_count parameter set to true, prevents an incoming suspend if it has not already suspended. With this, there is no longer a need to check whether nvgpu_is_powered_off(). Changed the behavior of gk20a_busy_noresume() to return bool. It returns true, iff it managed to prevent an imminent suspend, else returns false. For cases where PM runtime is disabled, the code follows the existing implementation. Added missing gk20a_busy_noresume() calls to tlb_invalidate. Also, moved gk20a_pm_deinit to after nvgpu_quiesce() in the module removal path. This is done to prevent regs access after registers are locked out at the end of nvgpu_quiesce. This can happen as some free function calls post quiesce might still have l2_flush, fb_flush deep inside their stack, hence invoke gk20a_pm_deinit to disable pm_runtime immediately after quiesce. Kept the legacy implementation same for VGPU and older kernels Jira NVGPU-8487 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I972f9afe577b670c44fc09e3177a5ce8a44ca338 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2715654 Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-25 04:59:46 -07:00
Sagar Kamble	a0b0acad05	gpu: nvgpu: pass pmu rpc struct as char pointer nvgpu_pmu_rpc_execute takes pmu rpc header address and dereferences it at address past header based on rpc struct that the header is part of. This usage of pointer is not right and confuses CERT checker. Instead, pass the rpc struct address as char pointer and use as header or rpc struct as per need. CID 17141 CID 154223 CID 17557 CID 154226 CID 153904 CID 153926 CID 153929 CID 153925 CID 153925 CID 225346 CID 225355 CID 225356 CID 225360 CID 225361 CID 225365 CID 225367 CID 296735 CID 330244 CID 17557 Bug 3512546 Change-Id: I93b154d4321e75c0d2b41f43d7c2b701682962a3 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2710224 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-24 04:43:35 -07:00
mkumbar	5339bd3466	gpu: nvgpu: Add extra delay for ACR commands in non-silicon platforms Increase delay for non-silicon platforms between ACR commands and before polling to skip incorrect reading of IRQSTAT register and generate false PMU external interrupt. Bug 3596273 Change-Id: I0163cddbaa1919ac949467f65c74e06f85817aec Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2699396 Reviewed-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Tested-by: Divya Singhatwaria <dsinghatwari@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-20 10:36:04 -07:00
Dinesh T	6e4c3275bf	gpu: nvgpu: Set max_ways_evict_cache to maximum Set evict_max_ways for the L2 cache to the maximum supported value for safety. In the normal build, L2 cache MAX_EVICT_LAST is configured via KMD and RegOps. RegOps is enabled only on the standard build with the CONFIG_DEBUGGER flag, so this method cannot be used for the safety build. For safety, we can use the patch buffer to patch the register while creating the context. JIRA NVGPU-8227 Change-Id: Iec5d73197239b9cad31c6b593ca2b87c224aad5e Signed-off-by: Dinesh T <dt@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708702 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-18 22:57:54 -07:00
Richard Zhao	802aadf263	nvgpu: move nvgpu_falcon_copy_from/to_emem out of CONFIG_NVGPU_DGPU nvgpu_falcon_copy_from/to_emem are also used by iGPU in engine_emem_queue. Jira GVSCI-9976 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: Ia36a38521807714eb5ad52b6e81c9f31ecc8fda6 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708509 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-18 00:59:10 -07:00
Richard Zhao	db4a1713cb	gpu: nvgpu: gr: move .load_sw_bundle64() out of CONFIG_NVGPU_DGPU .load_sw_bundle64 is also used by ga10b. Jira GVSCI-9976 Change-Id: Ife46dd5bf40a9e143cf119a64dd0d2adcb1ae81c Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708393 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-18 00:58:54 -07:00
Richard Zhao	d603838110	gpu: nvgpu: pmu: move lsfm_sw_gv100.h out of CONFIG_NVGPU_DGPU ga10b needs to call nvgpu_gv100_lsfm_sw_init() too, so the header cannot be protected by CONFIG_NVGPU_DGPU. Jira GVSCI-9976 Change-Id: I3f6016c3d5f924492629134e528a24cc20544365 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708392 Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-18 00:58:48 -07:00
Sagar Kamble	da884615d3	gpu: nvgpu: fix pmu_board_obj init in construct_pwr_policy Fix below CERT violation: In construct_pwr_policy: Do not dereference null pointers. This was introduced in the below commit: commit `700bd83b41` ("gpu: nvgpu: Rename/clean boardobj unit") CID 203372 Bug 3512546 Change-Id: I30a2ce13f9df343a1dc74fdd7427ccf65b228a3e Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2710234 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-17 08:44:18 -07:00
Sagar Kamble	d3b417ce2c	gpu: nvgpu: address priv_ring unit code inspection gaps 1. Hardcoded constants are defined using #define are converted to const. 2. set_ppriv_timeout_settings HAL is not applicable from gm20b. Hence remove it completely. JIRA NVGPU-6903 Change-Id: Ic096c5dc87aa45db0aa05482947cd032ae72bdd4 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552581 (cherry picked from commit c5fb38a54208330f24754fed33d7242903dbac59) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623635 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-17 08:40:46 -07:00
Debarshi Dutta	76cc8870e1	nvgpu: gpu: update default nvs domain implementation In current form, the default domain acts like any schedulable domain. TSGs are bound to it and it can be enumerated via the public interfaces. The new expectation for the default domain is meant to change from the current form to a pseudo domain that cannot act like an ordinary domain in other ways, i.e. it must not be reachable by, in particular, the domain management API, it can't be removed, does not show up in lists, and TSGs cannot be explicitly bound to this domain. It won't participate in round-robin domain scheduling. It is not really a domain, and acts like one only when activated in the manual mode. Following changes are made overall to support the above change in definition. 1) Domain creation and attaching the domain to the scheduler are now split into two separate functions. The new default domain (having ID = UINT64_MAX) is created separately from a static function without linking it with other domains in the scheduler. 2) struct nvgpu_nvs_scheduler explicitly stores the default domain to support direct lookups. 3) TSGs are initially not bound to default domain/rl_domain. Jira NVGPU-8165 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I916d11f4eea5124d8d64176dc77f3806c6139695 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697477 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-12 00:24:58 -07:00
Debarshi Dutta	26525cb1cf	gpu: nvgpu: runlist changes for default domain implementation In order to support the concept of the default domain, a new rl domain is created that shadows all the other domains i.e. all channels of all TSGs are replicated here. This is scheduled by default during GPU boot. 1) The shadow rl_domain is constructed during poweron sequence via nvgpu_runlist_alloc_shadow_rl_domain(). struct nvgpu_runlist is appended to store this separately as 'shadow_rl_domain'. This is scheduled in background as long as no other user created rl domains exist. 2) 'shadow_rl_domain' is scheduled out once a user created rl domain exists. At this point, any updates in the user created rl domains are synchronized with the 'shadow_rl_domain'. i.e. 'shadow_rl_domain' is also reconstructed to contain active channels and tsgs from the rl domain. 3) 'shadow_rl_domain' is scheduled back in when the last user created rl domain is removed. 4) In future for manual mode, driver shall support explicitly switching to 'shadow_rl_domain'. Also, we will move to an implementation where 'shadow_rl_domain' is switched out only when other domains are actively scheduled. These changes will be implemented later. Jira NVGPU-8165 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: Ia6a07d6bfe90e7f6c9e04a867f58c01b9243c3b0 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2704702 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-12 00:24:46 -07:00
Sagar Kamble	d82400d2b8	gpu: nvgpu: fix MISRA Rule 5.1 violation BVEC changes for nvgpu_rc_pbdma_fault and nvgpu_rc_mmu_fault started reporting below MISRA issue. kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:321: 1. misra_c_2012_rule_5_1_violation: Declaration with identifier "nvgpu_tsg_unbind_channel_check_hw_state", which is ambiguous. kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:349: 2. other_declaration: The first 31 characters of identifiers "nvgpu_tsg_unbind_channel_check_ctx_reload" and "nvgpu_tsg_unbind_channel_check_hw_state" are identical. Do below renames to fix the issue. Doing both for consistency. s/nvgpu_tsg_unbind_channel_check_hw_state/nvgpu_tsg_unbind_channel_hw_state_check s/nvgpu_tsg_unbind_channel_check_ctx_reload/nvgpu_tsg_unbind_channel_ctx_reload_check JIRA NVGPU-6772 Change-Id: Ib92cabe11c486621351bf15ddb86e20d16d514c4 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584152 (cherry picked from commit a619f259c6a4ffccb05550767212989af60c2a90) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2706551 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-11 04:18:12 -07:00
mkumbar	162d7ec32d	gpu: nvgpu: falcon debug unit update - Don't print error if debug display buffer is empty. Bug 3623500 Bug 3418561 Change-Id: I066999fb0f7d41d491c3b01df2b976fcfa833ebf Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2704967 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-11 04:16:18 -07:00
Sagar Kamble	9d6269ce7f	gpu: nvgpu: assert gr dev is non-NULL nvgpu_device_get can return NULL if supplied invalid ID or instance ID. We expect GR device struct to be non-NULL there hence just assert that it is indeed non-NULL in gr_reset_engine and ga10b_grmgr_init_gr_manager. CID 224133 CID 250232 Bug 3512546 Change-Id: Id09a1c436a8e49b921111b940d3d013bd66bff7a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2707018 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-07 23:24:39 -07:00
Sagar Kamble	c32c4025a4	gpu: nvgpu: fix the ce app ctx cleanup tsg and ch members in ce_ctx may remain uninitialized when the cleanup function nvgpu_ce_delete_gpu_context_locked is called. Guard the references to those. CID 438091 Bug 3512546 Change-Id: I0ce96f9bad1e4f7fd331171b3f134c48c893839f Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2707470 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-07 15:18:39 -07:00
mkumbar	2506dd2b86	gpu: nvgpu: set ACR FW load flag as per platform -Add ACR FW load flag which will be set based on platform and load the requested FW accordingly. Bug 3572869 Change-Id: I6643f183fed104fef059dd691036a2c509073a50 Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689022 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Andy Chiang <achiang@nvidia.com>	2022-05-07 15:13:03 -07:00
Richard Zhao	1ce899ce46	gpu: nvgpu: fix compile error of new compile flags Preparing to push hvrtos gpu server changes which require the below CFLAGS: -Werror -Wall -Wextra \ -Wmissing-braces -Wpointer-arith -Wundef \ -Wconversion -Wsign-conversion \ -Wformat-security \ -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough Jira GVSCI-11640 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: I25167f17f231ed741f19af87ca0aa72991563a0f Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2653746 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-07 15:11:49 -07:00
Richard Zhao	c30afdce02	gpu: nvgpu: add periodic timer API move fecs_trace polling from kthread to timer API. Jira GVSCI-10883 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: I224754b7205f1d0eefdc19a73a98f42e4d3e9d0e Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2700601 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: Aparna Das <aparnad@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-02 23:16:44 -07:00
Antony Clince Alex	61ae0b7642	gpu: nvgpu: fix emulate mode enable The emulate mode support is determined after chip detect and is flagged by using NVGPU_SUPPORT_EMULATE_MODE flag. The present logic prevents user from configuring the emulate mode sysfs knobs if this flag is not set, however the emulate mode usecase requires the user to configure the sysfs knob prior to power-on, hence defer emulate mode check to a later stage after chip detect. Bug 3621460 Change-Id: If522527542fa8d7e95ccbcff43b74adbb9e976e6 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2703953 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Mayur Poojary <mpoojary@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: David Li <davli@nvidia.com>	2022-04-29 06:17:59 -07:00
Jinesh Parakh	131933d528	gpu: nvgpu: Fix Division by zero defect Fix following Coverity Defect: profile.c : Division or modulo by zero CID 10061399 Bug 3460991 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I03979af4ab105f659cf0fe3eac8d21946dfca950 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2695362 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-04-29 06:10:48 -07:00
Jinesh Parakh	622fe70dab	gpu: nvgpu: Fix Bad bit shift Coverity issues Fixed following Coverity Defects: ioctl_as.c : Bad bit shift operation mc_tu104.c : Bad bit shift operation vm.c : Bad bit shift operation vm_remap.c : Bad bit shift operation A new linux header file for ilog2 is created. The files which used the old ilog2 function have been changed to use the new nvgpu_ilog2 function. CID 9847922 CID 9869507 CID 9859508 CID 10112314 CID 10127813 CID 10127899 CID 10128004 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: Ia201eea7cc426c3d6581e1e5ae3b882dbab3b490 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2700994 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-28 04:08:45 -07:00
Jinesh Parakh	167e7b0256	gpu: nvgpu: Fix Unused value Defect Fix following Coverity Defect: nvgpu_init.c : Unused value The ret variable was being reassigned the error code from nvgpu_cic_mon_deinit(g) without taking into account the previous ret value. We need to propagate whether there is an error (the last known error is returned) or not using ret, the temp_ret variable helps in verifying this. Similar coding style followed in the entire function. CID 10127863 Bug 3460991 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I732ba5269ebbbe68f113e53229df40ae49ccc13c Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697104 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-27 20:19:29 -07:00
Martin Radev	60481ea5e4	gpu: nvgpu: Free regops allowlist after failure-prone operations The function nvgpu_profiler_unbind_pm_resources is responsible for destroying the regops allowlist object, but unfortunately does it prior to any of the failure-prone operations. Because this function can be called multiple times, in rare cases it can happen that object is deallocated twice. This patch fixes the issue by moving the free operations after the failure-prone operations. Bug 3591603 Change-Id: I3415712da561ccf162c9fb7f3ebb942faa9d9420 Signed-off-by: Martin Radev <mradev@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2693803 (cherry picked from commit I3415712da561ccf162c9fb7f3ebb942faa9d9420) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2693799 Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-04-26 17:48:45 -07:00
mpoojary	769ec3f88b	gpu: nvgpu: pmu: Add support to set nvgpu_next pmu init Select nvgpu_next_pmu_init when config_next flag is set. This will let pmu load nvgpu_next binaries. Bug 3579665 Change-Id: Ifc15ba1ff5eacfba22de9676d5fe93beda608153 Signed-off-by: mpoojary <mpoojary@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2702292 Reviewed-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Seema Khowala <seemaj@nvidia.com>	2022-04-26 04:09:02 -07:00
Sagar Kamble	0725a98ea9	gpu: nvgpu: remove array comparison to NULL The queues element in struct nvgpu_gsp_sched is an array. Remove its comparison against NULL. CID 10132247 Bug 3460991 Change-Id: I2380cdb9287cc34b54b13fd9c1bab67a4a21a698 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2693940 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-04-14 17:02:58 -07:00
Sagar Kamble	e1cdfaa208	gpu: nvgpu: fix CERT EXP34-C issue Fix CERT issue in nvgpu_gr_falcon_bind_fecs_elpg where nvgpu_pmu_pg_buf could return NULL. nvgpu_pmu_pg_buf is called from context where PG will be enabled hence remove the NULL return logic as it is dead code. Replace nvgpu_pmu_pg_buf and nvgpu_pmu_pg_buf_get_cpu_va functions by new function nvgpu_pmu_pg_buf_alloc. CID 17860 Bug 3512546 Change-Id: I09820a966dadeb258167ce1433ca256f94845896 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2692466 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-14 17:02:34 -07:00
Tejal Kudav	dae284c74b	gpu: nvgpu: Disable GR functional intrs on safety Disable below interrupts on safety as they do not report any error condition and are not used by CUDA and Graphics(VKSC) on safety build. Signoff from CUDA and VKSC is on Bug https://nvbugs/3588603 1. NV_PGRAPH_INTR_NOTIFY: This intr is set when the Notification style is WRITE_THEN_AWAKEN. 2. NV_PGRAPH_INTR_SEMAPHORE: This is set when a 3d class semaphore is released as the result of a SetSemaphoreD method, when the AwakenEnable field is TRUE. 3. NV_PGRAPH_INTR_BUFFER_NOTIFY: This bit is set when a Mem2mem DMA completes and the LaunchDma method specifies the interrupt type as INTERRUPT 4. NV_PGRAPH_INTR_DEBUG_METHODS: This is debug feature and not used on QNX safety Bug 3588603 JIRA NVGPU-8166 Change-Id: I6d07dfd2857ac047fac4599421600d364251df76 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2694363 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-13 02:35:35 -07:00
Antony Clince Alex	62d6f753d2	gpu: nvgpu: add support for PES, ROP floorsweeping Volta+ chips support PES floorsweeping and Ampere+(iGPU) chips support ROP floorsweeping. At present, the driver isn't aware of PES, ROP floorsweeping, make the driver PES, ROP floorsweeping aware by introducing the following fields in nvgpu_gr_config: - gpc_(rop/pes)_mask: Contains the bit mask of non FSed ROP/PES units per GPC. - gpc_(rop/pes)_logical_id_map: Translates per GPC ROP/PES physical id to logical id. Introduce the following HAL functions to read PES/ROP FS data: - gops_fuse.fuse_status_opt_(pes/rop)_gpc: This function gets the FS config from the fuse. - gops_top.get_max_(pes/rop)_per_gpc: Gets the maximum number of PES/ROP units that can be present in a GPC. In addition, introduce the enabled flag NVGPU_SUPPORT_PES_FS to identify chips which support PES floorsweeping, piggyback on NVGPU_SUPPORT_ROP_IN_GPC enabled flag to identify ROP floorsweeping. Bug 3524791 Change-Id: I065bab6c02618fe38892c8c890b069c340b85301 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2679570 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-04-13 02:32:14 -07:00
Antony Clince Alex	19a8adeae1	gpu: nvgpu: prof: add new resource type Add new profiler resource type NVGPU_PROFILER_PM_RESOURCE_TYPE_PC_SAMPLER. Introduce regops HAL get_hwpm_pc_sampler_register_ranges to get allowlist for PC_SAMPLER resources. Re-generate allowlist files to include register ranges for PC_SAMPLER resources. Update uapi header to advertise new resource type NVGPU_PROFILER_PM_RESOURCE_ARG_PC_SAMPLER. Bug 3408536 Change-Id: I7009ef822665771eed727da48ef1e89dcc6b9c4b Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689057 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-04-12 16:30:52 -07:00
Sagar Kamble	71eee998b1	gpu: nvgpu: sanitize page allocator name A strncpy from a non-null-terminated string without checking the size of the source string can leave the target string non-null-terminated. Such a string can't be passed to strcat, which expects a null-terminated string. Check the size of the source string "name" in nvgpu_page_allocator_init. CID 81207 Bug 3512546 Change-Id: I4b245a8c2236038c40912cee72d4dbf1ca14a525 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2692604 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-04-07 03:24:15 -07:00
Divya	fb019bf43a	gpu: nvgpu: async cmd resp for gv11b - When DISALLOW cmd is sent from driver to PMU the actual completion of the disallow will be acknowledged by PMU via a PG EVENT: ASYNC_CMD_RESP. - Disallow needs a delayed ACK from PMU in order to disable the ELPG. - If ELPG is already engaged, the DISALLOW cmd will trigger ELPG exit and then transition to PMU_PG_STATE_DISALLOW. - After this whole process is completed, PMU will send DISALLOW_ACK through ASYNC_CMD_RESP msg. - After disallow command is sent from the driver, NvGPU driver waits/polls for disallow command ack. This is sent immediately by msg framework of PMU. - Then, the driver will poll/wait for ASYNC_CMD_RESP event which is the delayed DISALLOW ACK. - The driver captures the ASYNC_CMD_RESP sent from PMU. - set disallow_state to ELPG_OFF. - If the driver does not wait/poll for this delayed disallow ack from PMU, it can result in errors as PMU is still processing DISALLOW cmd but the driver progressed further. Bug 3580271 Change-Id: I332180c05b6a398107f065d54e9718b7038fb1b2 Signed-off-by: Divya <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689500 Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-04-07 03:21:29 -07:00
Antony Clince Alex	9e0fd1a093	gpu: nvgpu: gr: update gr suspend Update GR suspend routine to clear GR falcon "coldboot_bootstrap_done" flag, this is needed because GPU power rails are turned off during suspend cycle due to which GR falcons need to be bootstrapped again during resume. Function "nvgpu_gr_falcon_suspend" is added to clear the above mentioned flag. Bug 3497398 Bug 3514055 Change-Id: If852a2c09f05c096f287b845c56d8b4f335ec8e7 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2670554 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-28 23:47:06 -07:00
Jinesh Parakh	bbaf01590c	gpu: nvgpu: Fix Logically dead code Coverity bugs Fixed following Coverity Defects: ioctl_clk_arb.c : Logically dead code gr_gp10b.c : Logically dead code vfe_var.c : Logically dead code grmgr_ga10b.c : Logically dead code vm_remap.c : Logically dead code falcon_debug.c : Logically dead code CID 1994001 CID 3008644 CID 9870823 CID 10062537 CID 10127915 CID 10128008 Bug 3460991 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I711d2ccb480328d8f0a4ba49e877612669f3d41f Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2686362 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-28 07:36:44 -07:00
Jinesh Parakh	d4cb2eb3c0	gpu: nvgpu: Fix Dereference Coverity issues Fixed following Coverity Defects: fw.c : Dereference after null check channel.c : Dereference before null check log.c : Dereference before null check CID 10064128 CID 10056456 CID 10127934 Bug 3460991 Signed-off-by: Jinesh Parakh <jparakh@nvidia.com> Change-Id: I9c075f5c38c2254d5c656af58bb002714bd53396 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2685320 Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-28 07:36:10 -07:00
Dinesh T	90d245978f	gpu: nvgpu: Fix for Compression enablement on safety This is removing NON-FUSA code that is needed for compression enablement on safety. The code is needed for comptag update on page table entry used by the GPU. Bug 3582013 Change-Id: Ib4e5c9810fabac5f479e0993184b9abf35df4afb Signed-off-by: Dinesh T <dt@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2686411 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Martin Radev <mradev@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-25 21:37:15 -07:00
Konsta Hölttä	e9d453806c	gpu: nvgpu: move duplicate timer api to common The high level API for the timer unit is the same across all OSs, so get rid of the slight code duplication by moving the timer init functions under a new file in common code: - nvgpu_timeout_init_cpu_timer - nvgpu_timeout_init_cpu_timer_sw - nvgpu_timeout_init_retry Much of the timer logic is also duplicated, but it is mixed between OS specific current time retrieval. With some refactoring and addition of an OS independent time keeping layer, that logic could also be made shared. Change-Id: I75d02ceb0d32022b0ba7f3bcd9fdb13d47039dbc Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2669510 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-25 21:33:21 -07:00
mkumbar	8cce8dea70	gpu: nvgpu: PMU NVRISCV BR failure HSI - Add PMU NVRISCV BR failure HSI support. - Created a falcon unit function to check the BR completion status, called from other units as needed. Bug 3491596 Bug 3366818 Change-Id: I5c3c6a7e6aeaad68f77e6b24f21239e40d9a7f78 Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2686370 Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-25 13:25:27 -07:00
Antony Clince Alex	f670687441	gpu: nvgpu: move ltc_tstg_mgmt register setup The ltc_ltcs_ltss_tstg_set_mgmt_3 register should only be configured after ACR init, hence move it down the init order from early_init to finalize_poweron after acr is loaded. Bug 3514215 Change-Id: I2462715d25f75b7476ab163cd6c9f73ced5efb6d Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2685547 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-25 13:14:15 -07:00
Rajesh Devaraj	c5822b0d98	gpu: nvgpu: add error prints for errors reported to sdl In Drive 6.0, only error IDs are reported to Safety_Services. The additional debug/error information is printed using nvgpu_err(). JIRA NVGPU-8094 Bug 3491596 Change-Id: Ie90f3e1453e6a796d5c76373c11f8a5a188ac590 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2684289 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-22 17:55:10 -07:00
Rajesh Devaraj	9edbac4494	gpu: nvgpu: add macros related to error reporting This patch does the following: - adds macros related to error reporting - introduces a flag to enable polling for error reporting JIRA NVGPU-8094 Bug 200729736 Change-Id: Ib02e8b7a7765e45eb1b3b3c6dba3720d5421a638 Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2683864 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-21 10:51:31 -07:00
Divya	201b5c1c7f	gpu: nvgpu: add SLCG support for GSP and CTRL unit Add SLCG register programming for GSP and CTRL units Bug 3452217 Change-Id: I69e414a82b5c12f26ff3b6626c328b5c0aa9e04c Signed-off-by: Divya <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678782 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-18 07:54:48 -07:00
mpoojary	7df16ee9c4	gpu: nvgpu: Add support for acr safety binaries Add support to pick ACR safety binaries when in safety for ga10b Jira NVGPU-8108 Change-Id: I3aca5e9d4b6e90af87cc7d8520366304ab579ec3 Signed-off-by: mpoojary <mpoojary@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2680710 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-17 12:21:28 -07:00
mpoojary	c1a995403a	gpu: nvgpu: Add ACR error reporting to SDL -Add check for ECC parity errors in IMEM, DMEM, EMEM, DCLS, REG for ACR running in GSP engine. The EXTIRQ3 external interrupt is set from ACR pointing towards host. -Add function to check error type when ACR or Bootrom execution fails and report accordingly to SDL with relevant error codes. This is a part of HSI safety requirements. Bug 3564039 Jira NVGPU-8108 Change-Id: I65407371f7a1d1ba50a10bdf443ef6b903eeaa36 Signed-off-by: mpoojary <mpoojary@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678100 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-15 17:33:42 -07:00
Dinesh T	358f62a9d7	gpu: nvgpu: Add compression for safety This is adding compression support for qnx-safety by - Adding the compression related files under FUSA. - Adding new posix contig-pool.c for user space compilation. Bug 3426194 Change-Id: Ib3c8e587409dc12099c1196f55a87858d4dc520e Signed-off-by: Dinesh T <dt@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2652963 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-15 17:30:57 -07:00
Tejal Kudav	b80b2bdab8	gpu: nvgpu: Add CE interrupt handling a. LAUNCH_ERR - Userspace error. - Triggered due to faulty launch. - Handle using recovery to reset CE engine and teardown the faulty channel. b. An INVALID_CONFIG - - Triggered when LCE is mapped to floorswept PCE. - On iGPU, we use the default PCE 2 LCE HW mapping. The default mapping can be read from NV_CE_PCE2LCE_CONFIG INIT value in CE refmanual. - NvGPU driver configures the mapping on dGPUs (currently only on Turing). - So, this interrupt can only be triggered if there is kernel or HW error - Recovery ( which is killing the context + engine reset) will not help resolve this error. - Trigger Quiesce as part of handling. c. A MTHD_BUFFER_FAULT - - NvGPU driver allocates fault buffers for all TSGs or contexts, maps them in BAR2 VA space and writes the VA into channel instance block. - Can be triggered only due to kernel bug - Recovery will not help, need quiesce d. FBUF_CRC_FAIL - Triggered when the CRC entry read from the method fault buffer does not match the computed CRC from the methods contained in the buffer. - This indicates memory corruption and is a fatal interrupt which at least requires the LCE to be reset before operations can start again, if not the entire GPU. - Better to quiesce on memory corruption CE Engine reset (via recovery) will not help. e. FBUF_MAGIC_CHK_FAIL - Triggered when the MAGIC_NUM entry read from the method fault buf does not match NV_CE_MTHD_BUFFER_GLOBAL_HDR_MAGIC_NUM_VAL - This indicates memory corruption and is a fatal interrupt - Better to quiesce on memory corruption f. STALLING_DEBUG - Only triggered with SW write for debug purposes - Debug interrupt, currently ignored Move launch error handling from GP10b to GV11b HAL as - 1. LAUNCHERR_REPORT errcode METHOD_BUFFER_ACCESS_FAULT is not defined on Pascal 2. We do not support GP10b on dev-main ToT JIRA NVGPU-8102 Change-Id: Idc84119bc23b5e85f3479fe62cc8720e98b627a5 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678893 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-14 17:12:14 -07:00
Deepak Nibade	a1ef716f9d	gpu: nvgpu: set graphics specific PRI values for graphics contexts Add new HAL gops.gr.init.set_default_gfx_regs() to set graphics specific PRI values for graphics contexts in function nvgpu_gr_obj_ctx_alloc(). Add new HAL gops.gr.init.capture_gfx_regs() to capture and save init values for the PRIs. Add new struct nvgpu_gr_obj_ctx_gfx_regs to hold the PRI init values. Define HAL functions gv11b_gr_init_set_default_gfx_regs() and gv11b_gr_init_capture_gfx_regs(). Set the HAL functions for gv11b and ga10b. Register accessors required to set PRIs are auto-generated. Bug 3506078 Change-Id: I4c2843a274f3c924e402541e600e104ed0c9ed1c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671598 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: Jonathan Mccaffrey <jmccaffrey@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-14 13:17:05 -07:00

3229 Commits