linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-23 01:50:07 +03:00

Author	SHA1	Message	Date
Alex Waterman	8c5972ac7f	gpu: nvgpu: Move device de-init call Move the device de-init call to when the gk20a struct is being freed; the device list can live for as long as the gk20a struct does. This will be a problem later, since the current location causes the device structs to get freed and allocated over and over. That'll cause gross corruption in the FIFO code when the engine_info struct is replaced with pointers to the device structs. JIRA NVGPU-5421 Change-Id: If4e08ea88dbcae7acd599e3fad29f72ece63b8e0 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361269 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	319520ff57	gpu: nvgpu: Add a new device manager unit This adds a new device management unit in the common code responsible for facilitating the parsing of the GPU top device list and providing that info to other units in nvgpu. The basic idea is to read this list once from HW and store it in a set of lists corresponding to each device type (graphics, LCE, etc). Many of the HALs in top can be deleted and instead implemented using common code parsing the SW representation. Every time the driver queries the device list it does so using a device type and instance ID. This is common code. The HAL is responsible for populating the device list in such a way that the driver can query it in a chip agnostic manner. Also delete some of the unit tests for functions that no longer exist. This code will require new unit tests in time; those should be quite simple to write once unit testing is needed. JIRA NVGPU-5421 Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Tejal Kudav	4dcfbc19de	gpu: nvgpu: Trigger quiesce on spurious FBPA intr In Bug 200588835, spurious FBPA interrupts are seen on a couple of boards. These interrupts were found to be EDC (Error detection and Correction) interrupts which are triggered due to ECC errors. The EDC registers are not exposed to the driver, so the interrupt status register cannot be cleared, resulting in an interrupt storm. Also, it was concluded that only bad HW can cause this failure scenario. So, in the ISR for FBPA interrupts, get the GPU into quiesce state as we don't expect the GPU to be in usable state post such unrecoverable errors. Adapt the quiesce code for Linux build too. 1. On Linux, we cannot exit the nvgpu process after quiesce like we do on QNX. So, add nvgpu_disable_irqs() call to quiesce implementation which is done as part of process exit handler on QNX. Masking interrupts which is already done as part of quiesce would be sufficient in most cases, but to be fail-safe disable_irqs too. 2. Also, the IOCTL code looks at g->sw_ready, hence add nvgpu_start_gpu_idle() to set g->sw_ready to false along with setting NVGPU_DRIVER_IS_DYING = true. We expect the nvgpu_sw_quiesce() call to finish before quiesce thread wakes up from 50ms sleep. Hence, critical step like nvgpu_start_gpu_idle() is added to nvgpu_sw_quiesce(), whereas the somewhat redundant disable IRQs call is added to quiesce thread. nvgpu_fifo_quiesce() was called twice by mistake; remove one of them. Bug 2919899 Bug 200588835 Change-Id: I9beec688c2e1c0d8dfc1327ddf122684576f8684 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2354537 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	fc5b45ea83	gpu: nvgpu: move init_ltc_support sequence Currently, ltc fs_state is initialized during ltc init support. However, ltc cbc_param and cbc_param2 registers do not seem to be providing correct data if ltc.init_fs_state is called before fb.init_fs_state. - Create fb.init_fb_support hal to initialize fb. - Trigger init_fb_support before init_ltc_support. Bug 2969956 Bug 2957808 JIRA NVGPU-4666 Change-Id: I54d697d27b9d9c6318c4ef459d215b6f82cd5571 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2345673 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
tkudav	957b19092f	gpu: nvgpu: Enable Quiesce on all builds Make Recovery and quiesce co-exist to support quiesce state on unrecoverable errors. Currently, the quiesce code is wrapped under ifndef CONFIG_NVGPU_RECOVERY. Isolate the quiesce code from recovery config, thereby enabling it on all builds. On Linux, the hung_task checker (check_hung_uninterruptible_tasks() in kernel/hung_task.c) complains that the quiesce thread is stuck for more than 120 seconds. INFO: task sw-quiesce:1068 blocked for more than 120 seconds. The wait time of more than 120 seconds is expected as the quiesce thread will wait until a quiesce call is triggered on fatal unrecoverable errors. However, the INFO print upsets the kernel_warning_test (KWT) on Linux builds. To fix the failing KWT, change the quiesce task to interruptible instead of uninterruptible as the checker only looks at uninterruptible tasks. Bug 2919899 JIRA NVGPU-5479 Change-Id: Ibd1023506859d8371998b785e881ace52cb5f030 Signed-off-by: tkudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2342774 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Sami Kiminki	23cda4f4a9	gpu: nvgpu: add PDI for TU104 (Linux) Add reporting for the per-device identifier (PDI) in the Linux GPU characteristics. Implement PDI read for TU104. Bug 2957580 Signed-off-by: Sami Kiminki <skiminki@nvidia.com> Change-Id: I6ac0e4f74378564d82955b431d4c1fd6c0daeb13 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2346933 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Lakshmanan M <lm@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Konsta Hölttä	dd2fb50a1a	gpu: nvgpu: require deferred cleanup for aggressive sync destroy Aggressive sync destroy is used on some platforms where the amount of syncpoints is limited. It can cause sync objects to get allocated and freed in the submit path and when jobs are cleaned up, so require deferred cleanup. Allocations do not belong to job tracking in a deterministic submit path. Although this has been technically allowed before, deterministic channels have likely not been a priority on those old platforms with aggressive sync destroy set. Update virtualized gp10b platform data to match on a gp10b-vgpu compat string instead of gk20a-vgpu. gk20a (Tegra T124) hasn't been supported for a long time. Delete the aggressive sync destroy field from this platform. It's got enough syncpoints to not dynamically allocate them; having this property set for gp10b-vgpu has likely been a mistake. This is not a completely pure cherry-pick: also extend the gpu characteristics to not advertise full deterministic submit support when aggressive sync destroy is off. This platform flag cannot be adjusted by the user unlike many other flags. Jira NVGPU-4548 Change-Id: I283f546d48b79ac94b943d88e5dce55710858330 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2322042 (cherry picked from commit b1ba2b997b2174e365bcb0782ef3e67260ff9e57) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2328411 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	4f80c6b8a9	gpu: nvgpu: add channel_user_syncpt Refactor user managed syncpoints out of the channel sync infrastructure that deals with jobs submitted via the kernel api. The user syncpt only needs to expose the id and gpu address of the reserved syncpoint. None of the rest (fences, priv cmdbufs) is needed for that, so it hasn't been ideal to couple with the user-allocated syncpts. With user syncpts now provided by channel_user_syncpt, remove the user_managed flag from the kernel sync api. This allows moving all the kernel submit sync code to be conditionally compiled in only when needed, and separates the user sync functionality in a more clear way from the rest with a minimal API. [this is squashed with commit 5111caea601a (gpu: nvgpu: guard user syncpt with nvhost config) from https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325009] Jira NVGPU-4548 Change-Id: I99259fc9cbd30bbd478ed86acffcce12768502d3 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321768 (cherry picked from commit 1095ad353f5f1cf7ca180d0701bc02a607404f5e) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319629 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	d0ffb335dc	gpu: nvgpu: move nvgpu_has_syncpoints nvgpu_has_syncpoints is more general than channel synchronization, so move it to nvhost.c from channel_sync.c. Move the declaration from gk20a.h to nvhost.h. As the debugfs knob is Linux related, move it from struct gk20a to struct nvgpu_os_linux. Jira NVGPU-4548 Change-Id: I4236086744993c3daac042f164de30939c01ee77 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2318814 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Philip Elcan	20a4080be0	gpu: nvgpu: quiesce: stop thread gracefully Previously, nvgpu_sw_quiesce_remove_support() stopped the quiesce thread abruptly with nvgpu_thread_stop(), which could mean the thread was killed while still waiting on the cond. Then when the cond was destroyed, there may be an error since the underlying implementation may think there is still a thread waiting (such as the Posix implementation). Change nvgpu_sw_quiesce_remove_support() to use nvgpu_thread_stop_graceful() and signal the cond in the callback after the thread is marked to be stopped. The quiesce thread will then wake up from the cond wait and see the thread should stop. JIRA NVGPU-4987 Change-Id: I29322d7867acc33a91092016c540e00bb1ae945a Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2306024 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vaibhav Kachore	bbb63c0a8c	gpu: nvgpu: remove "trace/events/gk20a.h" from QNX build - The "include/trace/events/gk20a.h" file had a GPL2 license (which should not be used for QNX code). This file was used for compiling the linux userspace driver ("libnvgpu-drv.so") and was used for unit testing on QNX. - This patch removes the stubs in the "include/trace/events/gk20a.h" file (which were used for the linux userspace driver). - For the QNX driver, "nvgpu_rmos/trace/events/gk20a.h" was used. This patch moves that file to "include/nvgpu/posix/trace_gk20a.h" and makes the relevant license change. This same file will be used for the linux userspace driver. - This patch also creates a new file "include/nvgpu/trace.h" which selects the proper trace file depending on the config. Bug 2802414 Change-Id: Icdfb251e5698073f986753a969e804161af3ecc5 Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2286388 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Thomas Fleury	e257f96911	gpu: nvgpu: use cond signal for SW quiesce nvgpu_cond_broadcast error code is not checked in nvgpu_sw_quiesce, which causes a Coverity violation. Use nvgpu_cond_signal instead, since only one thread needs to be woken up. Jira NVGPU-4512 Change-Id: I4f6c3956f792487ba9c1eed09db09fd86ac56ffe Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2286056 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Philip Elcan <pelcan@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
ajesh	1041167668	gpu: nvgpu: remove usage of __must_check Remove the usage of __must_check compiler directive. Also rename __user as nvgpu_user and make the required changes for linux and posix builds. Jira NVGPU-4903 Change-Id: If4a18761cca84eb12e0babc0d528666673fca9e8 Signed-off-by: ajesh <akv@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2283404 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:10:29 -06:00
Thomas Fleury	d5833d1b8e	gpu: nvgpu: add BUG callbacks to SW quiesce After initializing support for SW quiesce, register callback to be invoked in case of BUG(). The callback will invoke nvgpu_sw_quiesce with "g" parameter. Jira NVGPU-4512 Change-Id: Id6bd73268d832e003cf66534bd0cbaa4b1f32a6c Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2283011 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Thomas Fleury	e0a6000456	gpu: nvgpu: update SW quiesce Update SW quiesce as follows: - After waking up sw_quiesce_thread, nvgpu_sw_quiesce masks interrupts, then disables and preempts runlists without lock. There could be still a concurrent thread that would re-enable the runlist by accident. This is very unlikely and would mean we are not in mission mode anyway. - In sw_quiesce_thread, wait NVGPU_SW_QUIESCE_TIMEOUT_MS, to leave some time for interrupt handler to set error notifier (in case of HW error interrupt). Then disable and preempt runlists, and set error notifier for remaining channels before exiting the process. Also modified nvgpu_can_busy to return false in case SW quiesce is pending. This will make subsequent devctl to fail. Jira NVGPU-4512 Change-Id: I36dd554485f3b9b08f740f352f737ac4baa28746 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2266389 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Scott Long	5ee9a446b5	gpu: nvgpu: misra 12.1 fixes MISRA Advisory Rule states that the precedence of operators within expressions should be made explicit. This change removes the Advisory Rule 12.1 violations from various common units. Jira NVGPU-3178 Change-Id: I4b77238afdb929c81320efa93ac105f9e69af9cd Signed-off-by: Scott Long <scottl@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2277480 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	f3421645b2	gpu: nvgpu: compile out fb and ramin non-fusa code fbpa related functions are not supported on igpu safety. Don't compile them if CONFIG_NVGPU_DGPU is not set. Also compile out fb and ramin hals that are dgpu specific. Update the tests for the same. JIRA NVGPU-4529 Change-Id: I1cd976c3bd17707c0d174a62cf753590512c3a37 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2265402 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	fba516ffae	gpu: nvgpu: enable PMU ECC interrupt early PMU IRQs were not enabled assuming entire functionality for LS PMU. Debugging early init issues of PMU falcon ECC errors triggered during nvgpu power-on will be cumbersome if interrupts are not enabled early. FMEA analysis of the nvgpu init path also requires this interrupt be enabled earlier. Hence, Enable the PMU ECC IRQ early during nvgpu_finalize_poweron. pmu_enable_irq is updated to enable interrupts differently for safety and non-safety. PMU interrupts disabling is moved out of nvgpu_pmu_destroy to nvgpu_prepare_poweroff. Prepared new wrapper API nvgpu_pmu_enable_irq. PMU ECC init and isr mutex init is moved to the beginning of nvgpu_pmu_early_init as for safety, ls pmu code path is disabled. Fixed the pmu_early_init dependent and mc interrupt related unit tests. Update the doxygen for changed functions. JIRA NVGPU-4439 Change-Id: I1a1e792d2ad2cc7a926c8c1456d4d0d6d1f14d1a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2251732 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	13b02091bb	gpu: nvgpu: init fbpa ecc before initializing fbpa hw fbpa ecc counters need to be allocated before enabling the fbpa irqs. Bug 200572453 Change-Id: Ifdf31f342bf86cd905bf57dbee654ac5483ee777 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2263979 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kadamati	42ccc21c62	gpu: nvgpu: fix static violations in common * Updated types and added error checks * Modified GR condition for ctxsw disable count A CERT-C error check was added to detect integer overflow, but the logic below couldn't detect the first overflow, so the condition was updated: INT_MAX < gr->ctxsw_disable_count --> it became true only after overflow So the first overflow was not detected, which led to an assert on enable JIRA NVGPU-3400 Change-Id: I6b0265a464f8f19efa7b0761612c6e9ffb3bd2bd Signed-off-by: Sagar Kadamati <skadamati@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2206282 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Lakshmanan M	a52ee77837	gpu: nvgpu: Add SM diversity gpu characteristic flag To achieve permanent fault coverage, the CTAs launched by each kernel in the mission and redundant contexts must execute on different hardware resources. This feature requires a change in software to make it possible to modify the virtual SM id to TPC mapping across mission and redundant contexts. This CL adds only SM diversity flags which are exposed to its clients through ioctl/devctl interfaces. Actual virtual SM id to TPC mapping implementation will be part of upcoming patch sets. Added NvGpu CFLAGS to identify the safety build "CONFIG_NVGPU_BUILD_CONFIGURATION_IS_SAFETY" JIRA NVGPU-4133 Change-Id: I5a18256780e6726e399e39c1c8d155d2ef07d7bd Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2250461 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	a8866825d2	gpu: nvgpu: fix the doxygen comments due to ECC and MC refactoring changes nvgpu_mc_log_pending_intrs is debugging related function hence compile out that and related functionality under CONFIG_NVGPU_NON_FUSA. nvgpu_mc_intr_enable is applicable for older chips hence compile out under CONFIG_NVGPU_NON_FUSA and CONFIG_NVGPU_HAL_NON_FUSA. Update BUS, CE, ECC, FIFO, MC, PRIV_RING, GR, LTC, FB, PMU units' doxygen comments based on recent ECC and MC refactoring. JIRA NVGPU-4439 Change-Id: I337318683d6311b9c2b5748f2fb07dff29a6584f Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2252853 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	b8c25a5a55	gpu: nvgpu: unit: init: add quiesce testing Add testing of quiesce functionality to init unit test. JIRA NVGPU-3981 Change-Id: Idc64179bc8d532bea385e705d96fb4b376d15cd9 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2247154 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Divya Singhatwaria	84a24c9593	gpu: nvgpu: Remove TPC powergate from safety build - Remove non-safe TPC powergate feature from the safety build by introducing a new flag: CONFIG_NVGPU_TPC_POWERGATE - Move nvgpu_init_power_gate_gr() under same compile time flag. and move HAL function gr_gv11b_powergate_tpc() to tpc_gv11b.c - Also, remove the negative test scenario and usage of tpc_powergate from unit tests JIRA NVGPU-4149 Change-Id: If489482401e94de499e472b16b1bc091b00992e6 Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2242323 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	a8c9c800cd	gpu: nvgpu: reorganization of MC interrupts control Previously, unit interrupt enabling/disabling and corresponding MC level interrupt enabling/disabling was not done at the same time. With this change, stall and nonstall interrupt for units are programmed at MC level along with individual unit interrupts. Kept access to MC interrupt registers through mc.intr_lock spinlock. For doing this separated CE and GR interrupt mask functions. mc.intr_enable is only used when there is global interrupt control to be set. Removed mc_gp10b.c as mc_gp10b_intr_enable is now removed. Removed following functions - mc_gv100_intr_enable, mc_gv11b_intr_enable & intr_tu104_enable. Removed intr_pmu_unit_config as we can use the generic unit interrupt control function. JIRA NVGPU-4336 Change-Id: Ibd296d4a60fda6ba930f18f518ee56ab3f9dacad Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2196178 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	daf5475f50	gpu: nvgpu: split ecc support per GPU HW unit To enable ecc interrupts early during nvgpu_finalize_poweron, ecc support has to be enabled early. ecc support was being initialized together for GR, LTC, PMU, FB units late in the poweron sequence. Move the ecc init for each unit to respective unit's init functions. And separate out the hal ecc functions from GR ecc unit to respective hal units. JIRA NVGPU-4336 Change-Id: I2c42fb6ba3192dece00be61411c64a56ce16740a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2239153 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	8dd18e6f5e	gpu: nvgpu: init: reduce CCM for nvgpu_finalize_poweron Change how nvgpu_finalize_poweron() detects and reports units that do not need initializing. This reduces the Code Complexity of this function. This update reduces the TCC metric from 12 to 9. JIRA NVGPU-4327 Change-Id: I95a5d60bc90fe09358ed47a54eca700bd51d688f Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2235339 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	06a8fd2ecb	gpu: nvgpu: init: reduce CCM for nvgpu_prepare_poweroff Change how nvgpu_prepare_poweroff() handles multiple errors from the unit poweroff functions. Previously, the last error was returned. It doesn't really matter much which error is returned, just return an error. This update reduces the TCC metric from 11 to 8. JIRA NVGPU-4327 Change-Id: Ic84f7e25eef2657c3d11881f221c26c9b09bed27 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2235338 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	b26acdeb87	gpu: nvgpu: move mc_boot_0 function to hals and rename to get_chip_details This function gets the GPU chip architecture, implementation and revision information by reading the MC boot register, hence it is more suited to be located in HAL files. test_check_gpu_state is now being run after test_hal_init as the gops.mc needs to be initialized for test_check_gpu_state subtest. JIRA NVGPU-2524 Change-Id: I85355af11d3505a9eb4f10a3fe4e6d9b56285047 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2226018 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	2edf3db10a	gpu: nvgpu: move mc gpu_ops out of gk20a.h and add doxygen comments for HALs gk20a.h will include gops_mc.h to contain the mc ops definitions. Add doxygen comments for the HAL functions that are called directly. Also move mc_gp10b_intr_pmu_unit_config to non-fusa HAL file. JIRA NVGPU-2524 Change-Id: I4f326332d7842211b004b372d79fac9fe6ed40e7 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2226017 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Thomas Fleury	d22fc21a5c	gpu: nvgpu: sync with sw quiesce thread on init Sometimes nvgpu_sw_quiesce_thread does not get a chance to run before common.init unit tests complete. When it finally gets scheduled, related gk20a context is invalid and it crashes in the pthread wrapper. Make sure nvgpu_sw_quiesce_thread has started in nvgpu_sw_quiesce_init_support, to avoid such issues. Bug 2732985 Change-Id: I0a46f271a926b5f0c203b465f69556b2b46d96c5 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2222560 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Philip Elcan	c0a7556cb3	gpu: nvgpu: init: make quiesce clean function non-static Make nvgpu_sw_quiesce_remove_support() non-static so that it can be called from the init unit test and cleanup the running thread. Bug 2732985 Change-Id: I01afa9c21967b39f6f9f129590189882bbc963b4 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2224306 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Thomas Fleury	76eb949edb	gpu: nvgpu: add flag to disable SW quiesce Add NVGPU_DISABLE_SW_QUIESCE flag which can be set in unit tests to avoid entering SW quiesce. Jira NVGPU-4089 Change-Id: Ie44b28bf69ff86908a66bb1be8aab3d365945178 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2217828 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Thomas Fleury	dabc5dd8b8	gpu: nvgpu: use init framework for sw quiesce Add NVGPU_INIT_TABLE_ENTRY for nvgpu_sw_quiesce_init_support. Add g->sw_quiesce_init_done to avoid multiple initializations, and check if deinit is needed in nvgpu_sw_quiesce_remove_support. This avoids issues in common.init unit tests. Jira NVGPU-4089 Change-Id: Ife3aa43d5f1f86899a895e4576e38ecc28a8e371 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2217779 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Thomas Fleury	5e688c35f8	gpu: nvgpu: set error notifier in SW quiesce For MMU and PBDMA faults, error notifier needs to be set before entering SW quiesce. Otherwise it ends up with default NVGPU_ERR_NOTIFIER_FIFO_ERROR_IDLE_TIMEOUT. Added nvgpu_rc_mmu_fault to: - call g->ops.fifo.recover when recovery is enabled - set MMU error when recovery is disabled Updated nvgpu_rc_pbdma_fault to set PBDMA error when recovery is disabled as well. Wait for deferred interrupts to complete before actually entering SW quiesce state, to make sure error notifier has been set. Jira NVGPU-4127 Change-Id: Ia84c723e021e397391c6c609d4bb96c06afdcc47 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2210909 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00
Sagar Kamble	6c3c360462	gpu: nvgpu: protect nvgpu power state access using spinlock IRQs can get triggered during nvgpu power-on due to MMU fault, invalid PRIV ring or bus access etc. Handlers for those IRQs can't access the full state related to the IRQ unless nvgpu is fully powered on. In order to let the IRQ handlers know about the nvgpu power-on state gk20a.power_on_state variable has to be protected through spinlock to avoid the deadlock due to usage of earlier power_lock mutex. Further the IRQs need to be disabled on local CPU while updating the power state variable hence use spin_lock_irqsave and spin_unlock_irqrestore APIs for protecting the access. JIRA NVGPU-1592 Change-Id: If5d1b5e2617ad90a68faa56ff47f62bb3f0b232b Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2203860 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Philip Elcan	d10447e717	gpu: nvgpu: init: make poweron table driven Change nvgpu_finalize_poweron() to call the subunit init functions by using a table. The table is local to nvgpu_finalize_poweron() since the function pointers are stored at runtime in the gk20a struct during HAL init. The table also includes a field for checking an enable flag before calling the init op, if required. The primary motivation for this change is to reduce CCM for nvgpu_finalize_poweron(). This change reduced the TCC metric from 40 to 10. The init unit test is disabled until it can be updated to match the new design. JIRA NVGPU-3980 Change-Id: Ic93d3fdd185cc7feb883c898284e15b251277a8b Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2202974 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Philip Elcan	065f98f669	gpu: nvgpu: init: add return for all init APIs This adds return values for all init APIs. This makes all the init APIs have the same signature. This is a prerequisite to making a table of init functions. JIRA NVGPU-3980 Change-Id: I5b71fd06ad248092af133ffe908e2930acb6d2b0 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2202973 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Philip Elcan	b3e3509b4a	gpu: nvgpu: bios: update bios init API Remove the second parameter for the nvgpu_bios_sw_init() function so the function only requires the gk20a object. The g->bios was always passed for this parameter. And this makes the API signature match the other init functions in the driver. JIRA NVGPU-3980 Change-Id: Id70e2b6b3a9b2705815591d02730d2d2620771c0 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2202972 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Philip Elcan	67e1fbca1f	gpu: nvgpu: acr: update acr init APIs Remove the second parameter for the nvgpu_acr_init() and acr_construct_execute() functions so they only require the gk20a object. The g->acr was always passed for this parameter. And this makes the API signature match the other init functions in the driver. JIRA NVGPU-3980 Change-Id: I8c513b1dcb9c6083f0f3e2f7b6f31dc78c5c8200 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2202971 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Philip Elcan	a877192641	gpu: nvgpu: sec2: update sec2 API Remove the second parameter for the nvgpu_init_sec2_setup_sw() function so the function only requires the gk20a object. The g->sec2 was always passed for this parameter. And this makes the API signature match the other init functions in the driver. JIRA NVGPU-3980 Change-Id: I22f526d961da44da64d563f5f3136c62cf9f4adf Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2202970 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Philip Elcan	19dd64930d	gpu: nvgpu: pmu: move rtos init to func ptr This moves the nvgpu_pmu_rtos_init() to a HAL function pointer which makes it consistent with the other init APIs. JIRA NVGPU-3980 Change-Id: I562e264deaec76f2a45026a07f24d35b291b1930 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2202969 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Philip Elcan	b53ec4731e	gpu: nvgpu: pmu: update init APIs Remove the second parameter for the pmu_early_init() and pmu_init() functions so they only require the gk20a object. The g->pmu was always passed for this parameter. And this makes the API signature match the other init functions in the driver. JIRA NVGPU-3980 Change-Id: Iae9361a5f14bc5c1d02f4ddb6583f30b71b22d59 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2202968 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Philip Elcan	78c1f328bb	gpu: nvgpu: init: consolidate falcon init/free calls Combine all of the falcon_sw_init() calls into nvgpu_falcons_sw_init() and combine all of the falcon_sw_free() calls into nvgpu_falcons_sw_free(). JIRA NVGPU-3980 Change-Id: I23008a19be95a8cf4f73e2a18c414bce8879e8a2 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2202967 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Philip Elcan	b21da03432	gpu: nvgpu: clk: remove unused HAL The clk HAL disable_slowboot() is not set for any platform and is thus unused, so remove it JIRA NVGPU-3980 Change-Id: Idb61ae35e85d35e852f18d22c076a1e16e723e88 Signed-off-by: Philip Elcan <pelcan@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2196421 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Deepak Nibade	1d5698cf6a	gpu: nvgpu: set GR tick frequency to max GR tick frequency needs to be set to MAX value for profiler use cases for gp10b/gv11b/tu104 chips. Add new HAL g->ops.ptimer.config_gr_tick_freq() that configures GR tick frequency to MAX value and call this HAL in GPU poweron path. This support is not needed in safety build, so compile everything only if CONFIG_NVGPU_DEBUGGER is enabled Bug 200289214 Change-Id: Id8378540cc67ca0041b56990f8676e3a105403a5 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2195163 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Mahantesh Kumbar	5eeb751d58	gpu: nvgpu: Move PMU RTOS functions out from pmu.c Moved PMU RTOS functions to new file from pmu.c to make clear separation of PMU unit init & PMU RTOS init. JIRA NVGPU-2457 Change-Id: I694bf561517b4b55f9396be8e132dc0da5cb29e6 Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2199543 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Thomas Fleury	62d7c5641f	gpu: nvgpu: rename recovery capability Rename "recovery" capability to more specific "fault recovery": - NVGPU_SUPPORT_FAULT_RECOVERY in UAPI - NVGPU_GPU_FLAGS_SUPPORT_FAULT_RECOVERY in enabled flags. Jira NVGPU-3896 Change-Id: I2a60601a7c73ce15e08b65f377e8a27a526d5eb2 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2197427 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> GVS: Gerrit_Virtual_Submit Reviewed-by: Sami Kiminki <skiminki@nvidia.com> Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:05:52 -06:00
Thomas Fleury	9f0dff4a03	gpu: nvgpu: add recovery capability Add NVGPU_SUPPORT_RECOVERY and NVGPU_FLAGS_GPU_SUPPORT_RECOVERY, to indicate if recovery is supported. When true, an engine reset is performed in order to recover from an uncorrectable error. When false, the driver enters SW quiesce state. Jira NVGPU-3896 Change-Id: Iea809c13a844641e31ce6306fbd1630ef622bfe9 Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2175447 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Philip Elcan <pelcan@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User GVS: Gerrit_Virtual_Submit Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:01:38 -06:00
Thomas Fleury	95bb19827e	gpu: nvgpu: add sw quiesce For safety build, nvgpu driver should enter SW quiesce state in case an uncorrectable error has occurred. In this state, any activity on the GPU should be prevented, without powering off the GPU. Also, a minimal set of operations should be used to enter SW quiesce state. Entering SW quiesce state does the following: - set sw_quiesce_pending: when this flag is set, interrupt handlers exit after masking interrupts. This should help mitigate an interrupt storm. - wake up thread to complete quiescing. The thread performs the following: - set NVGPU_DRIVER_IS_DYING to prevent allocation of new resources - disable interrupts - disable fifo scheduling - preempt all runlists - set error notifier for all active channels Note: for channels with usermode submit enabled, userspace can still ring doorbell, but this will not trigger any work on engines since fifo scheduling is disabled. Jira NVGPU-3493 Change-Id: I639a32da754d8833f54dcec1fa23135721d8d89a Signed-off-by: Thomas Fleury <tfleury@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/2172391 Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2019-08-27 10:37:21 -07:00

131 Commits