linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-22 17:36:20 +03:00

Author	SHA1	Message	Date
Debarshi Dutta	2e3c3aada6	gpu: nvgpu: fix deinit of GR Existing implementation of GR de-init doesn't account for multiple instances of struct nvgpu_gr. As a fix, below changes are added. 1) nvgpu_gr_free is unified for VGPU as well as native. 2) All the GR instances are freed. 3) Appropriate NULL checks are added when freeing GR memories. 4) 2D, 3D, I2M and ZBC etc are explicitly disabled when MIG is set. 5) In ioctl_ctrl, checks are added to not return error when zbc is NULL for VGPU as requests are rerouted to RMserver. Jira NVGPU-6920 Change-Id: Icaa40f88f523c2cdbfe3a4fd6a55681ea7a83d12 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578500 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: Antony Clince Alex <aalex@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-08-23 05:27:45 -07:00
Tejal Kudav	b33079d47e	gpu: nvgpu: Move intr data members from MC to CIC Move interrupt specific data-members from common.mc to common.cic. Some of these data members like sw_irq_stall_last_handled_cond need to be initialized much earlier during the OS specific init/probe stage. Also, some more members from struct nvgpu_interrupts (like stall_size, stall_lines[]), which will soon be moved to CIC, will also need to be initialized early during the OS specific probe stage. However, the chip specific LUT can only be initialized after the hal_init stage where the HALs are all initialized. Split the CIC init to accommodate the above initialization requirements. JIRA NVGPU-6899 Change-Id: I9333db4cde59bb0aa8f6eb9f8472f00369817a5d Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552535 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-19 18:06:28 -07:00
Divya Singhatwaria	842bef7124	gpu: nvgpu: Support GPC and FBP Floorsweeping - Add gops_fbp_fs and gops_gpc_pg struct - Add HALs to write to NV_FUSE_CTRL_OPT_FBP and NV_FUSE_CTRL_OPT_GPC fuses needed for floorsweeping - Add set_fbp_mask and set_gpc_mask to probe FBP and GPC mask respectively during gpu probe - Add sysfs node: fbp_fs_mask and gpc_fs_mask to store FBP and GPC floorsweeping mask sent from userspace - Move the floorsweeping programming early in NVGPU’s GPU init function and then issue a PRI init. JIRA NVGPU-6433 Change-Id: I84764d625c69914c107e1e8c7f29c476c2f64f78 Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2499571 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-19 06:17:25 -07:00
Divya Singhatwaria	9f30609550	gpu: nvgpu: Rename TPC powergating mutex Rename tpc_pg_lock to static_pg_lock and have_tpc_pg_lock to have_static_pg_lock as it is used for tpc/gpc/fbp power gating. JIRA NVGPU-6433 Change-Id: I4c56b9710e303ad9e872bad4b5ed9a167acb9dd6 Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2537489 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-18 02:46:25 -07:00
Ramesh Mylavarapu	d328bff79e	gpu: nvgpu: gsp NVRISCV load and bootstrap Changes: - This change will only init gsp software state, nvgpu_gsp_bootstrap need to be called. - CONFIG_NVGPU_GSP_SCHEDULER flag is created to compile out the gsp scheduler code when needed. - Created GSP engine reset which is needed when ACR completed execution and need to load gsp fw. NVGPU-6783 Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Change-Id: I2ce43e512b01df59443559eab621ed39868ad158 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554267 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-15 17:21:03 -07:00
Vedashree Vidwans	43980bfe06	gpu: nvgpu: remove nvgpu_is_bpmp_running usage BPMP driver doesn't support any API to check whether bpmp is running. Remove use of nvgpu_is_bpmp_running. Bug 200720732 Change-Id: Id266e65d4af598dd056cbdbaa219d0d53b7b3fb3 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556448 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-15 10:06:42 -07:00
Pekka Jylhä-Ollila	8a72068508	Revert "gpu: nvgpu: gsp NVRISCV load and bootstrap" This reverts commit `aef4b80acb`. Change-Id: I47e02bf97e6a3aaa9acdd7f5eec41518b31ee5dc Signed-off-by: Pekka Jylhä-Ollila <pjylhaollila@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554105 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>	2021-07-05 06:01:52 -07:00
Ramesh Mylavarapu	aef4b80acb	gpu: nvgpu: gsp NVRISCV load and bootstrap Changes: - This change will only init gsp software state, nvgpu_gsp_bootstrap need to be called. - CONFIG_NVGPU_GSP_SCHEDULER flag is created to compile out the gsp scheduler code when needed. - Created GSP engine reset which is needed when ACR completed execution and need to load gsp fw. NVGPU-6783 Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Change-Id: I26263ee5bae07de056f676ed0fddc1193b5af82d Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2530438 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-04 13:34:51 -07:00
tkudav	0526e7eaa9	gpu: nvgpu: Create CIC-mon and CIC-rm subunits common.cic unit is divided into common.cic.mon and common.cic.rm based on rm and mon process split. CIC-mon subunit includes the code which is utilized in critical interrupt handling path like initialization, error detection and error reporting path. CIC-rm subunit includes the code corresponding to rest of interrupt handling(like collecting error debug data from registers) and ISR status management (status of deferred interrupts). Split the CIC APIs and data-members into above two subunits. JIRA NVGPU-6899 Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-02 09:57:56 -07:00
Richard Zhao	ff75647d59	gpu: nvgpu: unify power state management code The management code of g->power_on_state on different OS are almost same, so moved the code to the common place. Jira GVSCI-10882 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: I890015867b7bbdf3f749ab275ffd085ef76dfec2 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2542846 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-06-23 09:26:49 -07:00
Lakshmanan M	19186c8a02	gpu: nvgpu: select map access type from dmabuf permission and user request Add api to translate dmabuf's fmode_t to gk20a_mem_rw_flag for read only/read write mapping selection. By default dmabuf fd mapping permission should be a maximum access permission associated to a particular dmabuf fd. Remove bit flag MAP_ACCESS_NO_WRITE and add 2 bit values for user access requests NVGPU_VM_MAP_ACCESS_DEFAULT|READ_ONLY|READ_WRITE. To unify map access type handling in Linux and QNX move the parameter NVGPU_VM_MAP_ACCESS_* check to common function nvgpu_vm_map. Set MAP_ACCESS_TYPE enabled flag in common characteristics init function as it is supported for Linux and QNX. Bug 200717195 Bug 3250920 Change-Id: I1a249f7c52bda099390dd4f371b005e1a7cef62f Signed-off-by: Lakshmanan M <lm@nvidia.com> Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2507150 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-06-21 14:48:32 -07:00
Lakshmanan M	7d473f4dcc	gpu: nvgpu: Expose logical mask for MIG 1) Expose logical mask instead of physical mask when MIG is enabled. For legacy, NvGpu expose physical mask. 2) Added fb related info in struct nvgpu_gpu_instance(). 4) Added utility api to get the logical id for a given local id nvgpu_grmgr_get_gr_gpc_logical_id() 5) Added grmgr api to get max_gpc_count nvgpu_grmgr_get_max_gpc_count(). 5) Added grmgr's fbp api to get num_fbps and its enable masks. nvgpu_grmgr_get_num_fbps() nvgpu_grmgr_get_fbp_en_mask() nvgpu_grmgr_get_fbp_rop_l2_en_mask() 6) Used grmgr's fbp apis in ioctl_ctrl.c 7) Moved fbp_init_support() in nvgpu_early_init() 8) Added nvgpu_assert handling in grmgr.c 9) Added vgpu hal for get_max_gpc_count(). JIRA NVGPU-5656 Change-Id: I90ac2ad99be608001e7d5d754f6242ad26c70cdb Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2538508 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-06-10 03:05:21 -07:00
Tejal Kudav	9f43914933	gpu: nvgpu: Move Intr handling common code to CIC CIC (Central Interrupt controller) will be responsible for the interrupt handling. common.cic unit is the placeholder for all interrupt related code. Move interrupt related defines and Public APIs present in common.mc to common.cic. Note: The common.mc interrupts related struct definitions are not moved as part of this patch. Adapt the code to use interrupt handling related defines and public APIs migrated from common.mc to common.cic JIRA NVGPU-6899 Change-Id: I747e2b556c0dd66d58d74ee5bb36768b9370d276 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2535618 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-05-31 19:37:31 -07:00
Tejal Kudav	e0a1fcf5f5	gpu: nvgpu: Add Central Intr Controller unit Add a new Central Interrupt Controller (CIC) unit in common code. The interrupt handling is done in a distributed manner currently. The error handling policy for different errors resides in each unit's ISR code. The goal is to converge this data under one central place - the CIC unit. This patch creates framework for CIC unit and moves the gv11b QNX safety LUT to CIC unit. All the error reporting APIs from different units are also moved to CIC. New APIs are exposed by CIC unit to access its internal data like: 1. Struct err_desc - the static err handling/injection data per error id 2. Num_hw_modules - the number of error reporting HW units supported by CIC Init and deinit of CIC unit: 1. CIC unit should be initialized early on during boot so that it is available for any interrupt handling. 2. Initialize CIC just before the interrupts are enabled during boot. 3. Similarly, CIC is disabled late during deinit cycle; right after the interrupts are masked. LUT: 1. LUT is currently used only for reporting error to safety services in gv11b QNX safety build. 2. This error handling policy LUT currently has only two levels of handling - correctable and quiesce. 3. Once the error handling policy decision is moved from leaf unit nodes to CIC, LUT will be updated to have additional levels like fast recovery and full recovery. 4. Also, then a separate LUT will be added for each platform/build. 5. In current framework, the LUT is set to NULL for all configurations except gv11b. report_err() ops is added to report error to safety services. This ops is only effective for gv11b qnx build; and set to NULL for other configurations. NVGPU-6521 NVGPU-6523 NVGPU-6750 NVGPU-6758 NVGPU-6760 NVGPU-6754 Change-Id: I24be7836a96d787741e37b732e19863ed8014635 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2518683 Reviewed-by: Ajesh K V <akv@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-05-25 14:28:04 -07:00
Seshendra Gadagottu	85efe929ca	gpu: nvgpu: prod programming for slcg timer unit Added init function for common.ptimer unit and called this init function during nvgpu early init. int nvgpu_ptimer_init(struct gk20a *g); Added following helper function for programming prod values for slcg timer unit: void nvgpu_cg_slcg_timer_load_enable(struct gk20a *g); Invoked prod programming for slcg timer unit from nvgpu_ptimer_init. Jira NVGPU-6026 Change-Id: I29e32380a4d05ec8276d7ebe59bc2733917f8184 Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2524037 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-05-19 04:06:43 -07:00
ajesh	b15bd97c08	gpu: nvgpu: fix misra violation in bug unit Modify the callback interface from bug to quiesce unit to remove a possible cyclic dependency in the bug unit. Make the list of callbacks from bug unit, UT specific. The quiesce callback function and argument are kept in separate variables, and in a normal run the only callback that bug unit would invoke will be the quiesce specific function. These changes will fix the violation of Rule 17.2 in bug unit. JIRA NVGPU-6537 Change-Id: Icb6bc92077f8d26c87425768b09a7194a98e015d Signed-off-by: ajesh <akv@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2527207 (cherry picked from commit 7696565648c5dd573a03be19ba9525856b781ea6) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2530900 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-05-18 18:20:18 -07:00
Lakshmanan M	d956938d3f	gpu: nvgpu: Add load_timestamp_prod in grmgr init 1) Moved load_timestamp_prod handling in nvgpu_init_gr_manager(). 2) Moved fifo.reset_enable_hw in nvgpu_early_init() - In simulation/emulation/GPU standalone platform, XBAR, L2 and HUB are enabled during g->ops.fifo.reset_enable_hw(). This introduces a dependency to get the MIG map conf information. (if nvgpu_is_bpmp_running() == false treated as simulation/emulation/GPU standalone platform). Bug 3307879 JIRA NVGPU-6633 Change-Id: I4cba3a527de4723a6500f9658ec1dcadc23b37e3 Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2528174 Tested-by: Antony Clince Alex <aalex@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-05-12 16:09:52 -07:00
Lakshmanan M	c041ad5b4b	gpu: nvgpu: split nvgpu power on sequence into 2 stages 1) nvgpu poweron sequence split into two stages: - nvgpu_early_init() - Initializes the sub units which are required to be initialized before the grmgr init. For creating dev node, grmgr init and its dependency unit needs to move to early stage of GPU power on. After successful nvgpu_early_init() sequence, NvGpu can identify the number of MIG instance required for each physical GPU. - nvgpu_finalize_poweron() - Initializes the sub units which can be initialized at the later stage of GPU power on sequence. - grmgr init depends on the following HAL sub units, * device - To get the device caps. * priv_ring - To get the gpc count and other MIG config programming. * fb - MIG config programming. * ltc - MIG config programming. * bios, bus, ecc and clk - dependent module of priv_ring/fb/ltc. 2) g->ops.xve.reset_gpu() should be called before GPU sub unit initialization. Hence, added g->ops.xve.reset_gpu() HAL in the early stage of dGPU power on sequence. 3) Increased xve_reset timeout from 100ms to 200ms. 4) Added nvgpu_assert() for gpc_count, gpc_mask and max_veid_count_per_tsg to identify the GPU boot device probe failure during nvgpu_init_gr_manager(). JIRA NVGPU-6633 Change-Id: I5d43bf711198e6b3f8eebcec3027ba17c15fc692 Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521894 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-04-29 14:23:48 -07:00
Lakshmanan M	3f8c562004	gpu: nvgpu: Add nvgpu_early_poweron() support 1) NvGpu dev node needs to be created in gpu power on early stage to avoid latency introduced by udevd. For creating dev node, device and grmgr init needs to move to early stage of GPU power on. After grmgr init, NvGpu can identify the number of MIG instance required for each physical GPU. For that, added a new API nvgpu_early_poweron() to handle early init which is required for before dev node creation. 2) Removed fifo dependency in nvgpu_init_gr_manager() 3) Used get_max_subctx_count() directly to query the veid/subctx count. JIRA NVGPU-6633 Change-Id: Ib9d7c3e184c71237b0da9305515ccd8ceda1d5ad Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2517173 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-04-22 15:00:54 -07:00
Seshendra Gadagottu	21e1328ea1	gpu: nvgpu: add fb gops for set_atomic_mode Separated set_atomic_mode functionality from init_fs_state/enable_nvlink and created new fb gops for set_atomic_mode. In gpu init sequence, set_atomic_mode is called after acr_construct_execute to take care of design changes required for nvgpu-next architectures. Updated fb_gv11b_init_test to use set_atomic_mode gops along with init_fs_state. Bug 3268664 Change-Id: I1ab9eb21cc4cce77f3325c4e8821a75b6e85fba2 Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2508095 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-04-22 14:58:36 -07:00
absalam	3ec369d60a	gpu: nvgpu: Disable Clock Arbitor for TU104 This patch disables the clock arbiter for TU104. TU104 is not a POR for Drive 6.0, so it is disabled to ease migration of clk arb for GA100. As a first step all the NVRM Clock tests will be skipped by setting NVGPU_SUPPORT_CLOCK_CONTROLS to false for TU104. Then the clk arbiter will be rewritten for GA100 and enabled back. This patch implements this by adding a new flag NVGPU_CLK_ARB_ENABLED, which holds the status of the clk arbiter for each platform, and disabling it for TU104. Bug 200699763 Change-Id: I51cd5c7821bdc0b48080c17a70735925b278ddf5 Signed-off-by: absalam <absalam@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2515086 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-04-20 07:47:38 -07:00
Lili Sang	3f0ea98b73	gpu: nvgpu: Add get_gr_context support for Linux. Implement the feature of retrieving gr context contents for all chips. Two IOCTLs, NVGPU_DBG_GPU_IOCTL_GET_GR_CONTEXT_SIZE and _GET_GR_CONTEXT, are added. Bug 3102903 Change-Id: If11006f4e294f190785a2c3159ca491b9f3b5187 Signed-off-by: Lili Sang <lilis@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2449183 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Chris Johnson <cwj@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:48 -06:00
Antony Clince Alex	c36752fe3d	gpu: nvgpu: sim: make ring buffer independent of PAGE_SIZE The simulator ring buffer DMA interface supports buffers of the following sizes: 4, 8, 12 and 16K. At present, it is configured to 4K and it happens to match with the kernel PAGE_SIZE, which is used to wrap back the GET/PUT pointers once 4K is reached. However, this is not always true; for instance, take 64K pages. Hence, replace PAGE_SIZE with SIM_BFR_SIZE. Introduce macro NVGPU_CPU_PAGE_SIZE which aliases to PAGE_SIZE and replace latter with former. Bug 200658101 Jira NVGPU-6018 Change-Id: I83cc62b87291734015c51f3e5a98173549e065de Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420728 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Prateek sethi	223baa5883	gpu: nvgpu: add support for ACB SLCG on gv11b Register list for ACB SLCG is auto generated with scripts. Add HAL operations to enable/disable ACB clock gating. Bug 200647909 Change-Id: I4be4c14cc072fcccd91031a5a40321f5ff11f549 Signed-off-by: Prateek sethi <prsethi@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420355 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Lakshmanan M	2ecb5feaad	gpu: nvgpu: Skip graphics CB programming for MIG Added logic to skip the following graphics CB allocation, map and programming sequence when MIG is enabled. Global CB: 1) NVGPU_GR_GLOBAL_CTX_CIRCULAR 2) NVGPU_GR_GLOBAL_CTX_PAGEPOOL 3) NVGPU_GR_GLOBAL_CTX_ATTRIBUTE 4) NVGPU_GR_GLOBAL_CTX_CIRCULAR_VPR 5) NVGPU_GR_GLOBAL_CTX_PAGEPOOL_VPR 6) NVGPU_GR_GLOBAL_CTX_ATTRIBUTE_VPR 7) NVGPU_GR_GLOBAL_CTX_RTV_CIRCULAR_BUFFER CTX CB: 1) NVGPU_GR_CTX_CIRCULAR_VA 2) NVGPU_GR_CTX_PAGEPOOL_VA 3) NVGPU_GR_CTX_ATTRIBUTE_VA 4) NVGPU_GR_CTX_RTV_CIRCULAR_BUFFER_VA JIRA NVGPU-5650 Change-Id: I38c2859ce57ad76c58a772fdf9f589f2106149af Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2423450 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Debarshi Dutta	38ce6fa717	gpu: nvgpu: change unnamed structs to named structs Following changes are made in this patch. 1) Change unnamed structs within gpu_ops to named structs with the prefix gops_. 2) Each named struct gops_ are moved into a separate gops specific file under include/nvgpu/gops/ 3) struct gpu_ops is moved into a separate file include/nvgpu/gpu_ops.h and all other dependent struct gops_ are included in this header. 4) Direct references to include/nvgpu/gops are removed from files as its enough to include gk20a.h. Change-Id: Ieb22cb853be567e3bef14f5f8a04674eebd902ea Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398776 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Deepak Nibade	a2809088eb	gpu: nvgpu: remove unnecessary hal gops.gr.gr_enable_hw() gops.gr.gr_enable_hw() is a common function and not referred on vGPU. Remove HAL pointer and directly use nvgpu_gr_enable_hw() instead. Jira NVGPU-5648 Change-Id: Id031024ed01f9d890cffb5902cc433800810b219 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2403548 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Lakshmanan M <lm@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Deepak Nibade	8cccb49bd2	gpu: nvgpu: collapse nvgpu_gr_prepare_sw into nvgpu_gr_alloc common.gr unit exports a separate API nvgpu_gr_prepare_sw to initialize some SW pieces required for nvgpu_gr_enable_hw(). A separate API is really unnecessary since same initialization can be performed in nvgpu_gr_alloc(). Remove nvgpu_gr_prepare_sw() and HAL gops.gr.gr_prepare_sw(). Initialize falcon and interrupt structures in loop from nvgpu_gr_alloc(). Move nvgpu_netlist_init_ctx_vars() from nvgpu_gr_prepare_sw() to common init path since netlist parsing need not be done from common.gr unit. It just needs to happen before nvgpu_gr_enable_hw(). Also, trigger nvgpu_gr_free() from gr_remove_support() instead of OS specific paths. Also remove nvgpu_gr_free() calls from probe error paths since nvgpu_gr_alloc is no longer called in probe path. Move interrupt and falcon data structure free calls to nvgpu_gr_free(). Also remove corresponding unit testing code that tests nvgpu_gr_prepare_sw() specifically. Update some unit tests to initialize ecc counters and netlist. Disable some unit tests that fail for reasons unknown. Jira NVGPU-5648 Change-Id: I82ec8160f76530bc40e0c11a9f26ba1c8f9cf643 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400166 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	fba96fdc09	gpu: nvgpu: Replace nvgpu_engine_info with nvgpu_device Delete the struct nvgpu_engine_info as it's essentially identical to struct nvgpu_device. Duplicating data structures is not ideal as it's terribly confusing what does what. Update all uses of nvgpu_engine_info to use struct nvgpu_device. This is often a fairly straightforward replacement. Couple of places though where things got interesting: - The enum_type that engine_info uses is defined in engines.h and has a bit of SW abstraction - in particular the GRCE type. The only place this seemed to be actually relevant (the IOCTL providing device info to userspace) the GRCE engines can be worked out by comparing runlist ID. - Addition of masks based on intr_id and reset_id; those can be computed easily enough using BIT32() but this is an area that could be improved on. This reaches into a lot of extraneous code that traverses the fifo active engines list and dramatically simplifies this. Now, instead of having to go through a table of engine IDs that point to the list of all host engines, the active engine list is just a list of pointers to valid engines. It's now trivial to do a for-all-active-engines type loop. This could even be turned into a generic macro or otherwise abstracted in the future. JIRA NVGPU-5421 Change-Id: I3a810deb55a7dd8c09836fd2dae85d3e28eb23cf Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319895 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Lakshmanan M	48f1da4dde	gpu: nvgpu: Add bundle skip sequence in MIG mode In MIG mode, 2D, 3D, I2M and ZBC classes are not supported by GR engine. So skip those bundle programming sequence in MIG mode. JIRA NVGPU-5648 Change-Id: I7ac28a40367e19a3e31e63f3e25991c0ed4d2d8b Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397912 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Tejal Kudav	71b005c1ef	gpu: nvgpu: Enter Quiesce if GPU drops off the bus Currently, we reboot the entire system using kernel_restart() if the GPU registers become inaccessible due to GPU disappearing from the bus. GPU hitting high temperatures is one of the reasons we might end up in above scenario. Replace kernel_restart() with quiesce call as a more graceful way of notifying about GPU's unavailability. While entering quiesce state, make sure we do not trigger any register accesses which are bound to fail in this case. Bug 2919899 Change-Id: Ia9d413e04c7d205752414ff3e892f055c4363cce Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398801 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	ae25924393	gpu: nvgpu: print enabled_flags after poweron GPU enabled_flags indicate features supported by nvgpu. Add nvgpu_print_enabled() to print GPU enabled_flags. Print flag value after poweron complete to help during debug. Add verbose function to print flag name and status if gpu_dbg_info is set. JIRA NVGPU-5838 Change-Id: I3b0ddb8c6872f4f3b6101050da087ff553c16f84 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2383531 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Deepak Nibade	010f818596	gpu: nvgpu: initialize gr struct in poweron path struct nvgpu_gr is right now initialized during probe and from OS specific code. To support multiple instances of graphics engine, nvgpu needs to initialize nvgpu_gr after number of engine instances have been enumerated in poweron path. Hence move nvgpu_gr_alloc() to poweron path and after gr manager has been initialized. Some of the members of nvgpu_gr are initialized in probe path and they too are in OS specific code. Move them to common code in nvgpu_gr_alloc() Add field fecs_feature_override_ecc_val to struct gk20a to store the override flag read from device tree. This flag is later copied to nvgpu_gr in poweron path. Update tpc_pg_mask_store() to check for g->gr being NULL before accessing golden image pointer. Update tpc_fs_mask_store() to return error if g->gr is not initialized. This path needs nvgpu_gr struct initialized. Also fix the incorrect NULL pointer check in tpc_fs_mask_store() which breaks the write path to this sysfs. Jira NVGPU-5648 Change-Id: Ifa2f66f3663dc2f7c8891cb03b25e997e148ab06 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397259 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Lakshmanan M <lm@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Lakshmanan M	2a6fcec078	gpu: nvgpu: add gr manager ops-2 and mig infra-2 This CL covers the code changes related to following support, - Enabled gr manager ops. - Added gr manager init/remove support. - Refactor in gpu instance config infra. - Refactor in gr syspipe gpcs config infra. JIRA NVGPU-5645 JIRA NVGPU-5646 Change-Id: Ib2fab2796d76fe105fc5a08f2c5f9bfa36317f7c Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2393550 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Deepak Nibade	08308bc936	gpu: nvgpu: rework pm resource reservation system Current PM resource reservation system is limited to HWPM resources only. And reservation tracking is done using boolean variables. New upcoming profiler support requires reservation for all the PM resources like SMPC and PMA stream. Using boolean variables is not scalable and confusing. Plus the variables have to be replicated on gpu server in case of virtualization. Remove flag tracking mechanism and use list based approach to track all PM reservations. Also, current HALs are defined on debugger object. Implement new HALs in new pm_reservation object since it is really an independent functionality. Add new source file common/profiler/pm_reservation.c which implements functions to reserve/release resources and to check if any resource is reserved or not. Add common/vgpu/pm_reservation_vgpu.c for vGPU which simply forwards the request to gpu server. Define new HAL object gops.pm_reservation and assign above functions to below respective HALs : g->ops.pm_reservation.acquire() g->ops.pm_reservation.release() g->ops.pm_reservation.release_all_per_vmid() Last HAL above is only used for gpu server cleanup of guest OS. Add below new common profiler functions that act as APIs to reserve/release resources for rest of the units in nvgpu. nvgpu_profiler_pm_resource_reserve() nvgpu_profiler_pm_resource_release() Initialize the meta data required for reservation system in nvgpu_pm_reservation_init() and call it during nvgpu_finalize_poweron. Clean up the meta data before releasing struct gk20a.
Delete below HALs : g->ops.debugger.check_and_set_global_reservation() g->ops.debugger.check_and_set_context_reservation() g->ops.debugger.release_profiler_reservation() Bug 2510974 Jira NVGPU-5360 Change-Id: I4d9f89c58c791b3b2e63099a8a603462e5319222 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2367224 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
lm	83cb8be984	nvgpu: linux: uapi: Add MIG new caps 1) In MIG mode, 2D, 3D, I2M and ZBC classes are not supported by GR engine. NvGpu shall expose the HWCaps through "struct nvgpu_gpu_characteristics". 2) NvGpu shall expose the following MIG related new caps through "struct nvgpu_gpu_characteristics". * mig_enabled - Flag to indicate whether MIG is enabled/disabled. * gpu_instance_id - GPU instance ID. * gr_instance_id - graphics execution unit id. * gr_sys_pipe_id - Sys pipe id of GR engine. 3) populate num_ppc_per_gpc - Pixel Processing cluster per GPC 4) populate max_veid_count_per_tsg - Maximum veid count per TSG 5) populate num_sub_partition_per_fbpa - Sub partition per FBPA. JIRA NVGPU-5762 Change-Id: I06b5bcd3f568eb0b9c78c8fc6ce155b39aaeaba5 Signed-off-by: lm <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2352100 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	8c5972ac7f	gpu: nvgpu: Move device de-init call Move the device de-init call to when the gk20a struct is being freed; the device list can live for as long as the gk20a struct does. This will be a problem later, since the current location causes the device structs to get freed and allocated over and over. That'll cause gross corruption in the FIFO code when the engine_info struct is replaced with pointers to the device structs. JIRA NVGPU-5421 Change-Id: If4e08ea88dbcae7acd599e3fad29f72ece63b8e0 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361269 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	319520ff57	gpu: nvgpu: Add a new device manager unit This adds a new device management unit in the common code responsible for facilitating the parsing of the GPU top device list and providing that info to other units in nvgpu. The basic idea is to read this list once from HW and store it in a set of lists corresponding to each device type (graphics, LCE, etc). Many of the HALs in top can be deleted and instead implemented using common code parsing the SW representation. Every time the driver queries the device list it does so using a device type and instance ID. This is common code. The HAL is responsible for populating the device list in such a way that the driver can query it in a chip agnostic manner. Also delete some of the unit tests for functions that no longer exist. This code will require new unit tests in time; those should be quite simple to write once unit testing is needed. JIRA NVGPU-5421 Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Tejal Kudav	4dcfbc19de	gpu: nvgpu: Trigger quiesce on spurious FBPA intr In Bug 200588835, the spurious FBPA interrupts are seen on a couple of boards. These interrupts were found to be EDC (Error detection and Correction) interrupts which are triggered due to ECC errors. The EDC registers are not exposed to the driver, so the interrupt status register cannot be cleared; resulting in interrupt storm. Also, it was concluded that only bad HW can cause this failure scenario. So, in the ISR for FBPA interrupts, get the GPU into quiesce state as we don't expect the GPU to be in usable state post such unrecoverable errors. Adapt the quiesce code for Linux build too. 1. On Linux, we cannot exit the nvgpu process after quiesce like we do on QNX. So, add nvgpu_disable_irqs() call to quiesce implementation which is done as part of process exit handler on QNX. Masking interrupts which is already done as part of quiesce would be sufficient in most cases, but to be fail-safe disable_irqs too. 3. Also, the IOCTL code looks at g->sw_ready, hence add nvgpu_start_gpu_idle() to set g->sw_ready to false along with setting NVGPU_DRIVER_IS_DYING = true. We expect the nvgpu_sw_quiesce() call to finish before quiesce thread wakes up from 50ms sleep. Hence, critical step like nvgpu_start_gpu_idle() is added to nvgpu_sw_quiesce(), whereas the somewhat redundant disable IRQs call is added to quiesce thread. nvgpu_fifo_quiesce() was called twice by mistake; remove one of them.
Bug 2919899 Bug 200588835 Change-Id: I9beec688c2e1c0d8dfc1327ddf122684576f8684 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2354537 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	fc5b45ea83	gpu: nvgpu: move init_ltc_support sequence Currently, ltc fs_state is initialized during ltc init support. However, ltc cbc_param and cbc_param2 registers do not seem to be providing correct data if ltc.init_fs_state is called before fb.init_fs_state. - Create fb.init_fb_support hal to initialize fb. - Trigger init_fb_support before init_ltc_support. Bug 2969956 Bug 2957808 JIRA NVGPU-4666 Change-Id: I54d697d27b9d9c6318c4ef459d215b6f82cd5571 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2345673 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
tkudav	957b19092f	gpu: nvgpu: Enable Quiesce on all builds Make Recovery and quiesce co-exist to support quiesce state on unrecoverable errors. Currently, the quiesce code is wrapped under ifndef CONFIG_NVGPU_RECOVERY. Isolate the quiesce code from recovery config, thereby enabling it on all builds. On Linux, the hung_task checker(check_hung_uninterruptible_tasks() in kernel/hung_task.c) complains that quiesce thread is stuck for more than 120 seconds. INFO: task sw-quiesce:1068 blocked for more than 120 seconds. The wait time of more than 120 seconds is expected as quiesce thread will wait until quiesce call is triggered on fatal unrecoverable errors. However, the INFO print upsets the kernel_warning_test(KWT) on Linux builds. To fix the failing KWT, change the quiesce task to interruptible instead of uninterruptible as checker only looks at uninterruptible tasks. Bug 2919899 JIRA NVGPU-5479 Change-Id: Ibd1023506859d8371998b785e881ace52cb5f030 Signed-off-by: tkudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2342774 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Sami Kiminki	23cda4f4a9	gpu: nvgpu: add PDI for TU104 (Linux) Add reporting for the per-device identifier (PDI) in the Linux GPU characteristics. Implement PDI read for TU104. Bug 2957580 Signed-off-by: Sami Kiminki <skiminki@nvidia.com> Change-Id: I6ac0e4f74378564d82955b431d4c1fd6c0daeb13 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2346933 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Lakshmanan M <lm@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Konsta Hölttä	dd2fb50a1a	gpu: nvgpu: require deferred cleanup for aggressive sync destroy Aggressive sync destroy is used on some platforms where the amount of syncpoints is limited. It can cause sync objects to get allocated and freed in the submit path and when jobs are cleaned up, so require deferred cleanup. Allocations do not belong to job tracking in a deterministic submit path. Although this has been technically allowed before, deterministic channels have likely not been a priority on those old platforms with aggressive sync destroy set. Update virtualized gp10b platform data to match on a gp10b-vgpu compat string instead of gk20a-vgpu. gk20a (Tegra T124) hasn't been supported for a long time. Delete the aggressive sync destroy field from this platform. It's got enough syncpoints to not dynamically allocate them; having this property set for gp10b-vgpu has likely been a mistake. This is not a completely pure cherry-pick: also extend the gpu characteristics to not advertise full deterministic submit support when aggressive sync destroy is off. This platform flag cannot be adjusted by the user unlike many other flags. Jira NVGPU-4548 Change-Id: I283f546d48b79ac94b943d88e5dce55710858330 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2322042 (cherry picked from commit b1ba2b997b2174e365bcb0782ef3e67260ff9e57) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2328411 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	4f80c6b8a9	gpu: nvgpu: add channel_user_syncpt Refactor user managed syncpoints out of the channel sync infrastructure that deals with jobs submitted via the kernel api. The user syncpt only needs to expose the id and gpu address of the reserved syncpoint. None of the rest (fences, priv cmdbufs) is needed for that, so it hasn't been ideal to couple with the user-allocated syncpts. With user syncpts now provided by channel_user_syncpt, remove the user_managed flag from the kernel sync api. This allows moving all the kernel submit sync code to be conditionally compiled in only when needed, and separates the user sync functionality in a more clear way from the rest with a minimal API. [this is squashed with commit 5111caea601a (gpu: nvgpu: guard user syncpt with nvhost config) from https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325009] Jira NVGPU-4548 Change-Id: I99259fc9cbd30bbd478ed86acffcce12768502d3 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321768 (cherry picked from commit 1095ad353f5f1cf7ca180d0701bc02a607404f5e) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319629 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	d0ffb335dc	gpu: nvgpu: move nvgpu_has_syncpoints nvgpu_has_syncpoints is more general than channel synchronization, so move it to nvhost.c from channel_sync.c. Move the declaration from gk20a.h to nvhost.h. As the debugfs knob is Linux related, move it from struct gk20a to struct nvgpu_os_linux. Jira NVGPU-4548 Change-Id: I4236086744993c3daac042f164de30939c01ee77 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2318814 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Philip Elcan	20a4080be0	gpu: nvgpu: quiesce: stop thread gracefully Previously, nvgpu_sw_quiesce_remove_support() stopped the quiesce thread abruptly with nvgpu_thread_stop(), which could mean the thread was killed while still waiting on the cond. Then when the cond was destroyed, there may be an error since the underlying implementation may think there is still a thread waiting (such as the Posix implementation). Change nvgpu_sw_quiesce_remove_support() to use nvgpu_thread_stop_graceful() and signal the cond in the callback after the thread is marked to be stopped. The quiesce thread will then wake up from the cond wait and see the thread should stop. JIRA NVGPU-4987 Change-Id: I29322d7867acc33a91092016c540e00bb1ae945a Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2306024 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vaibhav Kachore	bbb63c0a8c	gpu: nvgpu: remove "trace/events/gk20a.h" from QNX build - "include/trace/events/gk20a.h" file had a GPL2 license (which should not be used for QNX code). This file was used for compiling linux userspace driver("libnvgpu-drv.so") and was used for unit testing on QNX. - This patch removes stubs in "include/trace/events/gk20a.h" file. (which were used for linux userspace driver.) - For QNX driver, "nvgpu_rmos/trace/events/gk20a.h" was used. This patch moves that file to "include/nvgpu/posix/trace_gk20a.h" and does relevant license change. This same file will be used for linux userspace driver. - This patch also creates a new file "include/nvgpu/trace.h" which selects proper trace file depending on the config. Bug 2802414 Change-Id: Icdfb251e5698073f986753a969e804161af3ecc5 Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2286388 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Thomas Fleury	e257f96911	gpu: nvgpu: use cond signal for SW quiesce nvgpu_cond_broadcast error code is not checked in nvgpu_sw_quiesce, which causes a Coverity violation. Use nvgpu_cond_signal instead, since only one thread needs to be woken up. Jira NVGPU-4512 Change-Id: I4f6c3956f792487ba9c1eed09db09fd86ac56ffe Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2286056 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Philip Elcan <pelcan@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
ajesh	1041167668	gpu: nvgpu: remove usage of __must_check Remove the usage of __must_check compiler directive. Also rename __user as nvgpu_user and make the required changes for linux and posix builds. Jira NVGPU-4903 Change-Id: If4a18761cca84eb12e0babc0d528666673fca9e8 Signed-off-by: ajesh <akv@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2283404 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:10:29 -06:00
Thomas Fleury	d5833d1b8e	gpu: nvgpu: add BUG callbacks to SW quiesce After initializing support for SW quiesce, register callback to be invoked in case of BUG(). The callback will invoke nvgpu_sw_quiesce with "g" parameter. Jira NVGPU-4512 Change-Id: Id6bd73268d832e003cf66534bd0cbaa4b1f32a6c Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2283011 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00


167 Commits