linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-23 18:16:01 +03:00

Author	SHA1	Message	Date
Debarshi Dutta	60aab0a1da	gpu: nvgpu: add null check before calling function pointer nvgpu_gsp_isr_support is called from the common code and results in a null pointer exception when calling g->ops.gsp.enable_irq when its not defined for some chips. Fix that. Bug 200763510 Change-Id: Ifef0d31ac4a8d06120bcebc17daf4a5b6559e3c3 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2593355 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-09-16 21:45:49 -07:00
Debarshi Dutta	9328f057a7	gpu: nvgpu: fix use-after-free use case of CE APP. The following issue is reported when running sudo modprobe -r nvgpu [ 134.066392] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [ 134.066428] Mem abort info: [ 134.066431] ESR = 0x96000004 [ 134.066434] EC = 0x25: DABT (current EL), IL = 32 bit [ 134.066450] [0000000000000058] pgd=0000000000000000, p4d=0000000000000000 [ 134.066459] Internal error: Oops: 96000004 [#1] PREEMPT_RT SMP [ 134.066639] pc : nvgpu_cic_rm_wait_for_stall_interrupts+0x78/0xd0 [nvgpu] [ 134.066847] lr : nvgpu_cic_rm_wait_for_stall_interrupts+0x74/0xd0 [nvgpu] [ 134.067043] sp : ffff80001971ba80 [ 134.067046] x29: ffff80001971ba80 x28: ffff000093b0da00 [ 134.067054] x27: 0000000000000000 x26: ffff80001c28b990 [ 134.067061] x25: ffff00008cd01000 x24: 0000000000000bb8 [ 134.067067] x23: 0000000000000000 x22: ffff0000915b0000 [ 134.067073] x21: ffff000093b0da00 x20: ffff0000915b0000 [ 134.067079] x19: ffff0000915b0000 x18: 0000000000000036 [ 134.067085] x17: 0000000000000000 x16: 0000000000000000 [ 134.067091] x15: ffff8000126b5fd8 x14: 7373616c633d4d45 [ 134.067097] x13: ffff8000098abef0 x12: 0000000000000000 [ 134.067102] x11: ffff8000098ab5a0 x10: ffff8000098abef8 [ 134.067108] x9 : ffff80001010e844 x8 : ffff80001971ba48 [ 134.067115] x7 : 2222222222222222 x6 : ffff000093b0da00 [ 134.067122] x5 : ffff8000098b1fd8 x4 : 0000000000000000 [ 134.067127] x3 : 0000000000000000 x2 : 0000000000000000 [ 134.067133] x1 : 0000000000000000 x0 : 0000000000000000 [ 134.067138] Call trace: [ 134.067140] nvgpu_cic_rm_wait_for_stall_interrupts+0x78/0xd0 [nvgpu] [ 134.067328] nvgpu_cic_rm_wait_for_deferred_interrupts+0x20/0xb0 [nvgpu] [ 134.067517] nvgpu_channel_deferred_reset_engines+0x29c/0x920 [nvgpu] [ 134.067714] nvgpu_channel_close+0x18/0x20 [nvgpu] [ 134.067904] nvgpu_init_pramin+0x2ac/0x350 [nvgpu] [ 134.068092] nvgpu_ce_app_destroy+0x94/0xe0 [nvgpu] [ 134.068279] nvgpu_put+0x90/0x120 [nvgpu] [ 134.068465] nvgpu_pci_shutdown+0x29c/0x18a0 [nvgpu] [ 134.068655] pci_device_remove+0x44/0xe0 [ 134.068665] device_release_driver_internal+0x114/0x1f0 [ 134.068701] driver_detach+0x54/0xe0 [ 134.068709] bus_remove_driver+0x70/0x120 [ 134.068733] driver_unregister+0x34/0x60 The above issue occurs due to freeing of CIC resources earlier than dependent users of interrupts e.g. CDE, CE etc. As a solution, move CIC deinit sequence to end of nvgpu_put. This handles deinit properly for VGPU/IGPU/DGPU. Bug 200763510 Change-Id: I696e31d5e03a9468cccfe710048000dbf7cf0269 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2592063 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-16 21:45:43 -07:00
Tejal Kudav	9b5274593c	gpu: nvgpu: Update common.ptimer documentation Enhance doxygen comments for below common.ptimer APIs: 1. nvgpu_scale_ptimer() 2. gops_ptimer.isr() Remove assert calls from nvgpu_scale_ptimer() as it now has a means to return error. Reorder the Ptimer ISR code for better logical flow. JIRA NVGPU-6989 Change-Id: I5adf4d665d3b90d3e9b11557a15fcb91e485f353 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2583667 (cherry picked from commit 502ab9ee2dc3f3b7b1da7ac59f13fddce4ead616) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2592057 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-16 05:59:13 -07:00
Tejal Kudav	5a94007725	gpu: nvgpu: Remove redundant HAL from common.fbp common.fbp has two interfaces to initialize FBP: 1. Public API nvgpu_fbp_init_support 2. HAL fbp.fbp_init_support nvgpu_fbp_init_support() is only used to initialize HAL fbp.fbp_init_support. Remove the HAL and use the API directly. JIRA NVGPU-6644 Change-Id: I2c455e09dbcf5e4fb1dc370b284e4f0d5c678b40 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2592047 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-16 05:59:00 -07:00
Debarshi Dutta	791dc18666	gpu: nvgpu: bvec for struct nvgpu_tsg_sm_error_state fields Add Setter and Getter methods for accessing tsg->sm_error_states. Getter returns a constant pointer for struct nvgpu_tsg_sm_error_state. This renders it unnecessary to add BVEC for above fields for the struct in multiple locations. The current design ensures that only a constant pointer is obtained from the owner unit i.e. FIFO. The following new methods are added. Both unit tests and BVEC tests are added for them as well. nvgpu_tsg_store_sm_error_state nvgpu_tsg_get_sm_error_state Jira NVGPU-6947 Change-Id: I82c22a2774862c8579baa41b6fb8292fa164704a Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry picked from commit 79574638671a0c6efe41cd3423668fcd1bd96826) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556938 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-09-13 20:57:09 -07:00
ajeshkv	118f8c1280	gpu: nvgpu: add support for gsp stress test Add debugfs entries to support GSP stress test and other functionalities to enable the test. JIRA CORERM-3382 Change-Id: Iab20fcfe78807e76e91c64716502a2f036ed4d18 Signed-off-by: ajeshkv <akv@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2589390 Reviewed-by: Amit Pabalkar <apabalkar@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-10 16:02:43 -07:00
deepak goyal	cc7b048641	gpu: nvgpu: non-zero blob size for rail-gating. Ucode blob size 0 is passed currently for rail-gating. Ucode blob size 0 is not supported by ACR yet. ACR will copy UCODE blob again to SYSMEM for GPU Rail-gating cycles. Bug 3361416 Change-Id: I1fdb3993cda7e5d62507d83f9c0a8645dc5f7fc7 Signed-off-by: deepak goyal <dgoyal@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2588207 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-09-09 09:16:37 -07:00
Debarshi Dutta	a53ebf02d1	gpu: nvgpu: update error message to info. These errors are now actually expected from code that counts number of sys/gpc/fbp perfmons after first context creation. Nvgpu tries to count them by register offset lookup in context image and counts perfmons until invalid offset is found. nvgpu_gr_hwmp_map_find_priv_offset no longer prints an error message. The correct error condition is moved to gr_exec_reg_ops Bug 200755537 Change-Id: Ib5c6ccd39275b2b06e3f8bce4878a3234478a780 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586228 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-09 09:13:03 -07:00
Sagar Kadamati	dd9b4364aa	gpu: nvgpu: add nvgpu-next infrastructure * As of now, working on multiple chip bringup in nvgpu-next repo has an issue because we end with losing control on source code (hard to find which part of the code belongs to which chip) and it's valuable history this affects chip migration on release. * To support multiple chip bringup simultaneously, we need new guidelines to avoid losing control on source code and make migration easier. This change adds links to nvgpu-next repo. * Updated return code to ENODEV for consistency * Updated ACR unittest to work with ENODEV return code NOTE: These are the initial set of infrastructure changes, guidelines will evolve, and source code will get updated accordingly. Based on future chip features, Which part of the source code falls under nvgpu-next repo is decided. JIRA NVGPU-6574 Change-Id: I81827e35d189c55554df00e255b527a4473e0338 Signed-off-by: Sagar Kadamati <skadamati@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556793 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-09-08 06:50:38 -07:00
Konsta Hölttä	9ffcb0fade	gpu: nvgpu: log submit error reasons For each common error that may happen in the submit path, log the failure reason at info level if not already logged. Various mistakes may cause -EINVAL, and getting to know what is wrong is helpful when writing tests. Change-Id: I8ac2a40441e0bf3d8afdb40526b607537eb5105c Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2587360 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-07 16:00:50 -07:00
Divya Singhatwaria	b6ab227016	gpu: nvgpu: Enable pmu interrupt - For secure RISCV boot, enable pmu interrupt during pmu_rtos_init - As interrupts are enabled, PMU intr can be received before driver has changed the pmu firmware state. This can cause the RISCV boot to fail. - To resolve this, first change the pmu firmware state from off to PMU_FW_STATE_STARTING and then wait for pmu priv lockdown release. Change-Id: Ib2e8b033fec6320bf9ccff02696192a48172464b Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586325 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-09-07 16:00:05 -07:00
dt	152d7c9edd	gpu: nvgpu: Fix for pes_tpc_mask programming After CONFIG_UBSAN kernel compilation flag to know any shifting cause overflow or not enablement ,this is identified. The register "gr_fe_tpc_fs_r(gpc_index)" is read only after Volta. The gops where we are computing the index is not needed. Bug 200727116 Change-Id: Ib2306103389ba9df77fd59d012ec70e775104989 Signed-off-by: dt <dt@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573296 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-07 15:59:48 -07:00
dt	9355345610	gpu: nvgpu: Add IPA-PA cache to increase the performance When GPU need to programmed with PA(physical address), given IPA need to be converted to PA by querying Hypervisor. As this is an IPC between OSes, the call will reduce the performance badly. So this is adding a IPA-PA cache to improve the performance. This will be more helpful in passthr config. Bug 3277194 Change-Id: I6a3230d858977313a0ed0f33068055a3b516330a Signed-off-by: dt <dt@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2571814 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-07 10:28:58 -07:00
Ramesh Mylavarapu	ffd0d3962f	gpu: nvgpu: gsp: gsp isr and debug trace support - Created GSP NVRISCV interrupt handle and respective functions and register reads. - Created Debug trace support for GSP firmware. NVGPU-7084 Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Change-Id: I2728150c4db00403aa6e3c043bc19c51677dd9cf Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2589430 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-07 05:37:51 -07:00
Debarshi Dutta	33740b41b6	gpu: nvgpu: free memory during module removal Following pointers(allocated via Kmalloc/DMA) aren't freed during module removal. struct nvgpu_gr_config -> gpc_tpc_mask_physical struct nvgpu_netlist_vars -> ctxsw_regs.etpc.l struct mm_gk20a -> sysmem_flush struct nvgpu_pmu_pg -> pg_buf SGTable corresponding to VPR secure buffer. Added appropriate free calls. Bug 3364181 Change-Id: I2105c1f3256b1910f0f514d98f0ee3ae2e34aff7 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586244 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-09-02 15:43:07 -07:00
Sagar Kamble	ed16377983	gpu: nvgpu: allocate comptags and store metadata in REGISTER_BUFFER ioctl To enable userspace query about comptags allocation status of a buffer, comptags are to be allocated only during buffer registration done by nvrm_gpu. Earlier, they were allocated during map. nvrm_gpu will be sending metadata blob to be associated with the buffer. This will have to be stored in the dmabuf privdata for all the buffers registered by nvrm_gpu. This patch moves the privdata allocation to buffer registration ioctl. Remove g->mm.priv_lock as it is not needed now. This lock was added to protect dmabuf private data setup. That private data is now handled through dmabuf->ops and setup of dmabuf->ops is done under dmabuf->lock. To support legacy userspace, this patch still allocates comptags on demand on map calls for unregistered buffers. Bug 200586313 Change-Id: I88b2ca04c733dd02a84bcbf05060bddc00147790 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2480761 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-02 11:42:08 -07:00
Sagar Kamble	7410784b0b	gpu: nvgpu: fix clk_arb completion file private data access race clk_arb completion file descriptor can get closed immediately after poll finishes in the work item gp10b_clk_arb_run_arbiter_cb. In that case, the refcount for nvgpu_clk_dev can become zero in the work item and can lead to invalid access while removing nvgpu_clk_dev from the lists. Remove nvgpu_clk_dev from the list before dropping the reference to it. Also, delete the nvgpu_clk_dev in completion file release handler within the session and requests spinlocks to avoid race with gp10b_clk_arb_run_arbiter_cb using it. bug 200757277 Change-Id: I054eee547f2a6fa633d7ef55df216ec36647a826 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2569522 (cherry picked from commit `ce8548ec05`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2587070 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-09-01 09:50:11 -07:00
deepak goyal	77d1e765f5	gpu: nvgpu: ga10b: Fix logic for BROM pass status Current code assumes riscv brom passed if it does not times out. This patch explicitly checks for brom pass/fail or timeout. Bug 3361416 Change-Id: I399a6cf9d32be92b24990532f81892642513ba54 Signed-off-by: deepak goyal <dgoyal@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2585786 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-08-31 08:54:35 -07:00
Deepak Nibade	3c97f3b932	gpu: nvgpu: disallow binding more channels than MAX channels supported per TSG There is HW specific limit on number of channel entries that can be added for each TSG entry in runlist. Right now there is no checking to enforce this from SW and hence if User binds more than supported channels to same TSG, invalid TSG formation error interrupts are generated. Fix this by adding appropriate checks in below steps : - Add new field ch_count to struct nvgpu_tsg to keep track of channels bound to TSG. - Define new hal gops.runlist.get_max_channels_per_tsg() to retrieve HW specific maximum channel count per TSG. - Implement the HAL for gk20a and gv11b chips, and assign new HALs for all chips appropriately. - Increment ch_count while binding the channel to TSG and decrement it while unbinding. - While binding channel to TSG, Check if current channel count is already equal to max channel count. If yes, print an error and bail out. Bug 200763991 Change-Id: Ic5f17a52e0fb171d1c020bf4f085f57cdb95f923 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2582095 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-08-25 09:47:47 -07:00
Debarshi Dutta	608decf1e6	gpu: nvgpu: add support for powering off gpu Add support for powering off IGPU for switching between legacy to SMC mode/vice-versa or changing SMC configuration. The power off can be issued as follows echo 0 > /dev/nvgpu/igpu0/power The following steps are done during a poweroff. 1) Deterministic channel idle 2) Acquire write_lock on l->busy semaphore. 3) Wait till power_usage decrements to indicate 0 active jobs. 4) Invoke pm_runtime_put_sync_suspend() 5) Invoke nvgpu_gr_remove_support() to clear existing GR memory. 6) Release write_lock on l->busy 7) Deterministic channel unidle. Part of the sequence matches that of the gk20a_do_idle code. The common parts are extracted into new functions gk20a_block_new_jobs_and_idle() and gk20a_unblock_jobs() For joint-rail case, the current implementation, does a railgate and then sets pm_runtime_set_autosuspend_delay(-1) to disable regular runtime resume/suspend. Remove clearing of NVGPU_SUPPORT_MIG status during state change ias it leads to inconsistencies. Jira NVGPU-6920 Change-Id: I0b3eb3278176122ac061c1e8a94ebfb3c17c3925 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578501 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: Antony Clince Alex <aalex@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-08-23 05:27:50 -07:00
Debarshi Dutta	2e3c3aada6	gpu: nvgpu: fix deinit of GR Existing implementation of GR de-init doesn't account for multiple instances of struct nvgpu_gr. As a fix, below changes are added. 1) nvgpu_gr_free is unified for VGPU as well as native. 2) All the GR instances are freed. 3) Appropriate NULL checks are added when freeing GR memories. 4) 2D, 3D, I2M and ZBC etc are explicitely disabled when MIG is set. 5) In ioctl_ctrl, checks are added to not return error when zbc is NULL for VGPU as requests are rerouted to RMserver. Jira NVGPU-6920 Change-Id: Icaa40f88f523c2cdbfe3a4fd6a55681ea7a83d12 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578500 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: Antony Clince Alex <aalex@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-08-23 05:27:45 -07:00
Mahantesh Kumbar	b9696ee643	gpu: nvgpu: ga10b: update NVRISCV LSPMU - Set NVRISCV LSPMU app version to 0. - Setting app version to 0 helps to load and boot multiple LSPMU ucode's without modifying the NVGPU driver. - Add support for PMU NVRISCV prod and dbg bin's. - This is corresponding change to LSPMU MPSK CL https://git-master.nvidia.com/r/c/tegra/kernel-firmware-t18x/+/2576049 JIRA NVGPU-7061 Change-Id: I800953ca97af3badde1983aa99e09b4fe7453203 Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com> Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2575341 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-08-22 11:05:03 -07:00
Richard Zhao	d8e847c90d	gpu: nvgpu: vgpu: fix force preemption from debugfs check whether there's any force_preemption_gfxp or force_preemption_cilp set in debugfs when alloc obj_ctx. Jira GVSCI-4658 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: I87fc7e195c9b0f7ed29ec6c37c8f46b456625fea Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2579218 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-08-19 14:06:44 -07:00
Vedashree Vidwans	e13ab1f9ea	gpu: nvgpu: pmu: remove hw access from remove_pmu_support GPU HW registers are locked before remove_pmu_support. Remove functions accessing HW registers. Bug 3357477 Change-Id: I34a1923bfdb3afacd462f2646e2821569573a81a Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2577627 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-08-17 09:45:42 -07:00
Sagar Kamble	f4571194b0	gpu: nvgpu: stop ELPG init thread during unload ELPG initialization thread creation can fail when the process is killed. That leads to driver resume failure. That thread was stopped on suspend and re-created on resume. To avoid the issue above, don't stop the ELPG thread in suspend and let the first created thread handle the ELPG state transitions always. And stop the ELPG thread during unload. Also fix couple of instances of config flag as: s/CONFIG_PMU_POWER_PG/CONFIG_NVGPU_POWER_PG bug 3345977 Change-Id: I8952edf8d1664ed258f238e265002e716d1bf5c2 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573763 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: Ashish Mhetre <amhetre@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-08-11 01:55:46 -07:00
Sagar Kamble	40064ef1ec	gpu: nvgpu: fix ecc counter free ECC counter structures are freed without removing the node from the stats_list. This can lead to invalid access due to dangling pointers. Update the ecc counter free logic to set them to NULL upon free, to remove them from stats_list and free them by validation. Also updated some of the ecc init paths where error was not propa- gated to callers and full ecc counters deallocation was not done. Now, calling unit ecc_free from any context (with counters alloc- ated or not) is harmless as requisite checks are in place. bug 3326612 bug 3345977 Change-Id: I05eb6ed226cff9197ad37776912da9dcb7e0716d Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2565264 Tested-by: Ashish Mhetre <amhetre@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-08-11 01:55:08 -07:00
mkumbar	de267c034c	gpu: nvgpu: ga10b: Enable PKC support -Enable PKC support in ACR and LS-PMU -Update the PMU f/w version. -Enable PMU support by default. Change-Id: I42bbe1b64ddc6ead9641c97d1ed27a9f4020510a Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2568609 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Deepak Goyal <dgoyal@nvidia.com> Tested-by: Krishna Reddy <vdumpa@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-08-08 14:23:36 -07:00
Sagar Kamble	0f59efb2cd	gpu: nvgpu: return tpc exceptions error properly It is observed that recovery on receiving the ESR MMU NACK exception does not get triggered as the error returned by tpc level handler is masked. NACK is marked handled but recovery is not done and subsequent fb intr handler does not trigger recovery since NACK is handled. This leaves the HW engines in bad state. Fix the tpc error return logic to trigger recovery during ESR MMU NACK exception. Bug 3318939 Change-Id: I475826f734e4366e853607e1e0338290ee28249b Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2564764 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-07-26 05:13:41 -07:00
Tejal Kudav	b33079d47e	gpu: nvgpu: Move intr data members from MC to CIC Move interrupt specific data-members from common.mc to common.cic Some of these data members like sw_irq_stall_last_handled_cond need To be initialized much earlier during the OS specific init/probe stage. Also, some more members from struct nvgpu_interrupts(like stall_size, stall_lines[]), which will soon be moved to CIC will also need to be initialized early during the OS specific probe stage. However, the chip specific LUT can only be initialized after the hal_init stage where the HALs are all initialized. Split the CIC init to accommodate the above initialization requirements. JIRA NVGPU-6899 Change-Id: I9333db4cde59bb0aa8f6eb9f8472f00369817a5d Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552535 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-19 18:06:28 -07:00
Richard Zhao	a884bd3537	gpu: nvgpu: vgpu: add L2 sector promotion support - added new IVC command for setting L2 sector promotion policy. - init according HAL for ga10b VGPU. Jira GVSCI-10901 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: Ibd206d26cbe72dd37f541eb0a8fb177c195567ab Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2560575 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-19 16:13:34 -07:00
smadhavan	d9add2db52	gpu: nvgpu: pkc signature verification support This change adds lsf_ucode_desc_wrapper to hold the pkc signature header and corresponding lsf_lsb_header_v2. During blob preparation based on the flag is_sig_pkc, the new header defines will be packed into ls blob and passed to acr. The flag NVGPU_PKC_LS_SIG_ENABLED is also added, which will be set based on the acr core selection. JIRA NVGPU-6365 Change-Id: I74e25d7c0f69d4007893e46006f97f2a607fd11f Signed-off-by: smadhavan <smadhavan@nvidia.com> Signed-off-by: deepak goyal <dgoyal@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2506136 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-19 16:04:25 -07:00
Antony Clince Alex	f80dccb543	gpu: nvgpu: report gpc_tpc_mask in physical order At present, there is an inconsistency in the order in which gpc_tpc masks are reported to the userspace. Both gpc and tpc masks are reported using physical-ids. However, the gpc_tpc_masks array is ordered by logical gpc-ids and not physical-ids. This creates a mismatch between the gpc reported as enabled in the gpc_mask and its corresponding gpc_tpc_mask. Introduce field "gpc_tpc_mask_physical" which stores the gpc_tpc_masks in physical order and update NVGPU_GPU_IOCTL_GET_TPC_MASKS to return this field. Bug 200665942 Change-Id: I63aa83414a59676b7e7d36b6deb527e2f3c04cff Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2531114 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-19 16:04:01 -07:00
Divya Singhatwaria	842bef7124	gpu: nvgpu: Support GPC and FBP Floorsweeping - Add gops_fbp_fs and gops_gpc_pg struct - Add HALs to write to NV_FUSE_CTRL_OPT_FBP and NV_FUSE_CTRL_OPT_GPC fuses needed for floorsweeping - Add set_fbp_mask and set_gpc_mask to probe FBP and GPC mask respectively during gpu probe - Add sysfs node: fbp_fs_mask and gpc_fs_mask to store FBP and GPC floorsweeping mask sent from userspace - Move the floorsweeping programming early in NVGPU’s GPU init function and then issue a PRI init. JIRA NVGPU-6433 Change-Id: I84764d625c69914c107e1e8c7f29c476c2f64f78 Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2499571 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-19 06:17:25 -07:00
Divya Singhatwaria	9f30609550	gpu: nvgpu: Rename TPC powergating mutex Rename tpc_pg_lock to static_pg_lock and have_tpc_pg_lock to have_static_pg_lock as it is used for tpc/gpc/fbp power gating. JIRA NVGPU-6433 Change-Id: I4c56b9710e303ad9e872bad4b5ed9a167acb9dd6 Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2537489 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-18 02:46:25 -07:00
mkumbar	860027dc8c	gpu: nvgpu: ga10b nvriscv pmu ucode update -PMU ucode from gpmu/ga10b branch. -Perfmon, PG and ACR features are enabled with bin. -update APP_VERSION_NVGPU_NEXT_CORE PMU app version to 30147895 -Add dummy bytes to pmu boot params. -P4 CL #30187066 JIRA NVGPU-6955 Change-Id: I17c51edaa2d4f8cd34e0e43044d62aae52b8ef2a Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2559075 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-18 00:05:04 -07:00
mkumbar	87984ea344	gpu: nvgpu: support nvriscv debug feature Enable nvriscv debug buffer feature in NVGPU. Debug buffer is a feature to print the debug log from ucode onto console in real time. Debug buffer feature uses the DMEM, queue and SWGEN1 interrupt to share ucode debug data with NVGPU. Ucode writes debug message to DMEM and updates offset in queue to trigger interrupt to NVGPU. NVGPU copies the debug message from DMEM to local buffer to process and print onto console. Debug buffer feature is added under falcon unit and required engine can utilize the feature by providing required param through public functions. Currently GA10B NVRISCV NS/LS PMU ucode has support for this feature and enabled support on NVGPU side by adding required changes, with this feature enabled, it is now possible to see prints in real time. JIRA NVGPU-6959 Change-Id: I9d46020470285b490b6bc876204f62698055b1ec Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2548951 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-17 12:45:00 -07:00
Richard Zhao	7ce01d3d1d	gpu: nvgpu: vgpu: add size and pgsz_idx when unmap buffer Since the server won't manage mapped_buffer anymore, the client needs to pass size and pgsz_idx to unmap buffers. Jira GVSCI-10901 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: Iff076e2cd86d0be71565b43d3993704e51978abe Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2557063 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-07-17 06:26:11 -07:00
mkumbar	fcf31d7063	gpu: nvgpu: ga10b: fix GSP/PMU priv error - Fix GSP/PMU registers priv errors which are seen as part of boot sequence. - Couple of GSP/PMU Falcon/NVRISCV registers are allowed to access upon NVRISCV bootrom completion but these registers were needed to configure on legacy chips to bootstrap/configure Falcon. - Add is_falcon2_enabled or NVGPU_PMU_NEXT_CORE_ENABLED check to skip these registers. JIRA NVGPU-7025 Change-Id: I087a477ade6736398dea113f89894a0ff73ae647 Signed-off-by: mkumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2553127 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-16 16:44:08 -07:00
Sagar Kadamati	aabc161151	gpu: nvgpu: vgpu: added VAB support for HV Added below IVC commands to support VAB on HV. * TEGRA_VGPU_CMD_FB_VAB_RESERVE - Enable & Configure VAB tracking * TEGRA_VGPU_CMD_FB_VAB_FLUSH_STATE - Dump VAB to user buffer * TEGRA_VGPU_CMD_FB_VAB_RELEASE - Disable VAB tracking Also set HAL and enable VAB for ga10b vgpu. Jira GVSCI-4619 Change-Id: Id7564611c24740ab8613e4baa420ee58fb52759a Signed-off-by: Sagar Kadamati <skadamati@nvidia.com> Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2507268 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-16 16:40:47 -07:00
Ramesh Mylavarapu	d328bff79e	gpu: nvgpu: gsp NVRISCV load and bootstrap Changes: - This change will only init gsp software state, nvgpu_gsp_bootstrap need to be called. - CONFIG_NVGPU_GSP_SCHEDULER flag is created to compile out the gsp scheduler code when needed. - Created GSP engine reset which is needed when ACR completed execution and need to load gsp fw. NVGPU-6783 Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Change-Id: I2ce43e512b01df59443559eab621ed39868ad158 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554267 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-15 17:21:03 -07:00
Vedashree Vidwans	43980bfe06	gpu: nvgpu: remove nvgpu_is_bpmp_running usage BPMP driver doesn't support any API to check whether bpmp is running. Remove use of nvgpu_is_bpmp_running. Bug 200720732 Change-Id: Id266e65d4af598dd056cbdbaa219d0d53b7b3fb3 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556448 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-15 10:06:42 -07:00
Divya Singhatwaria	77e3a8c5e4	gpu: nvgpu: ga10b: Add request_idle ce ops Issue observed: - In GA10B, it was observed that after recovery happens ELPG does not engage. - It was because, after CE reset, when nvgpu_submit_twod test was run to engage ELPG, IDLE_FLIPPED_PWR_OFF signal was asserted. - This means that when ELPG was engaged (engine is in PWR_OFF), some idle signal flips (becomes non-idle) and this causes IDLE_SNAP. After IDLE_SNAP is hit, ELPG will not engage further. - After debugging from WAVES, it was observed that: LCE0/LCE1 are not done with the reset sequence. - The state of these LCE is RESET0. A pri request (pri read to NV_CE_PCE_MAP register in CE) is seen that kicks it out of RESET0. After this state, it goes through few states to update some internal states (states RESET1/RESET2/PCE_MAP etc) and then eventually settles down to IDLE state. Solution: - Read ce_pce_map_r register in recovery sequence (after ce reset). - It is observed that when this read is added recovery is complete and post that when nvgpu_submit_two test is executed, ELPG is engaging. - This means that a pri read is needed after CE reset so that it settles to idle state properly and post that ELPG can engage properly. Bug 200734258 Change-Id: I5bb84921ca62a740fde81ffe6c29ccde4ebb341b Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554493 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-15 10:05:02 -07:00
Deepak Nibade	2237221a57	gpu: nvgpu: fix CERT EXP34-C errors in common.gr nvgpu_gr_config_get_sm_info() returns NULL if invalid SM id is provided to the API. Since it is possible return NULL, a NULL check is required at all callers. Also, nvgpu_gr_config_get_sm_info() is always called in a loop from 0 to (sm_count - 1) and hence adding an nvgpu_assert() should be sufficient. Change-Id: I0fd92ac354447796c4c7d7237e7bd3b6e5c2682c Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552409 (cherry picked from commit 4f3789d6563bbfe1be3e25c522ca1eac0d5d2d13) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2558271 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-13 13:52:24 -07:00
Deepak Nibade	4edf952e3e	gpu: nvgpu: fix rule 5.1 misra violations in common.gr Fix rule 5.1 misra violations in common.gr by renaming below functions : nvgpu_gr_config_get_gpc_tpc_mask_base -> nvgpu_gr_config_get_base_mask_gpc_tpc nvgpu_gr_config_get_gpc_tpc_count_base -> nvgpu_gr_config_get_base_count_gpc_tpc gm20b_ctxsw_prog_set_priv_access_map_config_mode -> gm20b_ctxsw_prog_set_config_mode_priv_access_map gm20b_ctxsw_prog_set_priv_access_map_addr -> gm20b_ctxsw_prog_set_addr_priv_access_map gm20b_gr_falcon_read_fecs_ctxsw_mailbox -> gm20b_gr_falcon_read_mailbox_fecs_ctxsw gm20b_gr_falcon_read_fecs_ctxsw_status0 -> gm20b_gr_falcon_read_status0_fecs_ctxsw gm20b_gr_falcon_read_fecs_ctxsw_status1 -> gm20b_gr_falcon_read_status1_fecs_ctxsw gv11b_gr_intr_get_sm_hww_warp_esr_pc -> gv11b_gr_intr_get_warp_esr_pc_sm_hww gv11b_gr_intr_get_sm_hww_warp_esr -> gv11b_gr_intr_get_warp_esr_sm_hww Jira NVGPU-6779 Change-Id: Icbe23a7b022373785968fc417ee247e2d80cfcc6 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554521 (cherry picked from commit 1432650774506f2a7e45f70b084f498736d0d0c5) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555330 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-13 09:20:41 -07:00
Debarshi Dutta	200777b854	gpu: nvgpu: bvec for channel and tsg Below changes are added. 1) Added checks in nvgpu_channel_from_id__func, nvgpu_tsg_check_and_get_from_id 2) Added BVEC tests for nvgpu_channel_open_new, nvgpu_channel_from_id, nvgpu_tsg_check_and_get_from_id, nvgpu_tsg_set_error_notifier 3) Added common function get_random_u32. Jira NVGPU-6905 Change-Id: I374d6f5503dc05e3224213d772a1752d82cbdc91 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2548304 (cherry picked from commit 39b2529b3e96cfd3cbd3bb020f32ee2cca0ea363) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554021 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-07-07 12:25:50 -07:00
Lakshmanan M	46457ea536	gpu: nvgpu: Fix priv error when MIG+Profiling is alive 1) Currently only one profiler object should be allowed. Enable/Disable/Reset CAU is using whole GR space for both MIG and legacy mode. Need to convert broadcast address to GR specific unicast programming when NvGpu supports more than one profiler object at a time. 2) Used nvgpu_gr_exec_with_err_for_instance() for update_smpc_global_mode(). JIRA NVGPU-5656 Change-Id: If9c2af1459458c031c7cc269e1a89f527b972d7c Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554590 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-07-07 08:47:08 -07:00
Antony Clince Alex	f51a43b579	gpu: nvgpu: ga10b: fix fetching of FBP_L2 FS mask On all chips except ga10b, the number of ROP, L2 units per FBP were in sync, hence, their FS masks could be represented by a single fuse register NV_FUSE_STATUS_OPT_ROP_L2_FBP. However, on ga10b, the ROP unit was moved out from FBP to GPC and it no longer matches the number of L2 units, so the previous fuse register was broken into two - NV_FUSE_CTRL_OPT_LTC_FBP, NV_FUSE_CTRL_OPT_ROP_GPC. At present, the driver reads the NV_FUSE_CTRL_OPT_ROP_GPC register and reports incorrect L2 mask. Introduce HAL function ga10b_fuse_status_opt_l2_fbp to fix this. In addition, rename fields and functions to exclusively fetch L2 masks, this should help accommadate ga10b and future chips in which L2 and ROP units are not in same. As part of this, the following functions and fields have been renamed. - nvgpu_fbp_get_rop_l2_en_mask => nvgpu_fbp_get_l2_en_mask - fuse.fuse_status_opt_rop_l2_fbp => fuse.fuse_status_opt_l2_fbp - nvgpu_fbp.fbp_rop_l2_en_mask => nvgpu_fbp.fbp_l2_en_mask The HAL ga10b_fuse_status_opt_rop_gpc is removed as rop mask is not used anywhere in the driver nor exposed to userspace. Bug 200737717 Bug 200747149 Change-Id: If40fe7ecd1f47c23f7683369a60d8dd686590ca4 Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551998 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-07 05:48:56 -07:00
Pekka Jylhä-Ollila	8a72068508	Revert "gpu: nvgpu: gsp NVRISCV load and bootstrap" This reverts commit `aef4b80acb`. Change-Id: I47e02bf97e6a3aaa9acdd7f5eec41518b31ee5dc Signed-off-by: Pekka Jylhä-Ollila <pjylhaollila@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554105 Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>	2021-07-05 06:01:52 -07:00
Ramesh Mylavarapu	aef4b80acb	gpu: nvgpu: gsp NVRISCV load and bootstrap Changes: - This change will only init gsp software state, nvgpu_gsp_bootstrap need to be called. - CONFIG_NVGPU_GSP_SCHEDULER flag is created to compile out the gsp scheduler code when needed. - Created GSP engine reset which is needed when ACR completed execution and need to load gsp fw. NVGPU-6783 Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com> Change-Id: I26263ee5bae07de056f676ed0fddc1193b5af82d Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2530438 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-07-04 13:34:51 -07:00
scottl	cd3ad1ccc7	gpu: nvgpu: fix REMAP android build failure Rework nvgpu_vm_remap_os_buf structure initialization to avoid android/clang build issues with the use of a single pair of {} to initialize certain structures. The os-dependent nvgpu_vm_remap_os_buf_get() routine now does a memset of the structure prior to initializing its contents. Jira NVGPU-6804 Change-Id: I08682c6ab7b8324a605a56ed660dea5bea11d16b Signed-off-by: scottl <scottl@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2553193 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-07-03 02:05:25 -07:00

... 5 6 7 8 9 ...

3348 Commits