linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-23 09:57:08 +03:00

Author	SHA1	Message	Date
Alex Waterman	77c0b9ffdc	gpu: nvgpu: Update runlist_update() to take runlist ptr Update the nvgpu_runlist_update_for_channel() function: - Rename it to nvgpu_runlist_update() - Have it take a pointer to the runlist to update instead of a runlist ID. For the most part this makes the code better but there's a few places where it's worse (for now). This starts the slow and painful process of moving away from the non-runlist code using runlist IDs in many places it should not. Most of this patch is just fixing compilation problems with the minor header updates. JIRA NVGPU-6425 Change-Id: Id9885fe655d1d750625a1c8aceda9e67a2cbdb7a Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470304 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-01-29 09:51:44 -08:00
Alex Waterman	11d3785faf	gpu: nvgpu: Rename struct nvgpu_runlist_info, fields in fifo Rename struct nvgpu_runlist_info to struct nvgpu_runlist; the info is not necessary. struct nvgpu_runlist is soon to be a first class object among the nvgpu object model. Also rename the fields runlist_info and active_runlist_info to simply runlists and active_runlists respectively. Again the info text is just not necessary and somewhat misleading. These structs _are_ the runlist representations in SW; they are not merely informational. Also add an rl_dbg() macro to print debug info specific to runlist management and some debug prints specifying the runlist topology for the running chip. Change-Id: Id9fcbdd1a7227cb5f8c75cca4abbff94fe048e49 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470303 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-01-20 21:56:33 -08:00
Sagar Kamble	cf287a4ef5	gpu: nvgpu: retry tsg unbind if NEXT is set The NEXT bit can remain set for the channel if timeslice expires before scheduler clears it. Due to this nvgpu fails TSG unbind and in turn nvrm_gpu fails channel close. In this case, checking the channel hw state after some time can help see NEXT bit cleared by scheduler. Reenable the tsg and return -EAGAIN to nvrm_gpu for it to retry again. Bug 3144960 Change-Id: I35f417f02270e371a4e632986b73a00f8a4f921a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2468391 Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-01-18 23:11:57 -08:00
Antony Clince Alex	c36af00e55	gpu: nvgpu: fix lookup of engine_id from mmu_fault_id The function "nvgpu_engine_mmu_fault_id_to_eng_id_and_veid" updates only the veid field and leaves the engine_id as invalid. This can cause the recovery to be skipped in certain instances of MMUFAULT; for example, the MMUFAULT when an unbind is done on a channel which is currently active on the engine. In this case, the ch_id associated with the fault is -1 and the function "gv11b_mm_mmu_fault_handle_non_replayable" will not set the rc_type correctly causing recovery to be skipped and leaving the engine in a bad state. Bug 3163660 Change-Id: Ic99c47771a4002c153ac77ab0473b11d01cfd54a Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2457259 Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:48 -06:00
Sagar Kamble	4d101a6303	gpu: nvgpu: do tsg unbind hw state check only for multi-channel TSG Host scheduler might be confused if more than one channel is present in the TSG and one of the unbound channels has NEXT set. This is not so much of an issue if there is a single channel in the TSG. So don't fail unbind in that case. ctx_reload and engine_faulted check can also be skipped for single channel TSG. Bug 3144960 Change-Id: I85eb9025ea53706ce8fda6d9b4bcf6a15a300d17 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2442970 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:48 -06:00
Lakshmanan M	883c12529a	gpu: nvgpu: Add multi GR reset support for MIG * Added multi GR reset/recovery support for MIG. * Added an API to get the gr engine id using gr instance id. JIRA NVGPU-5650 JIRA NVGPU-5653 Change-Id: I12ece75a4c33f0944f404121b54879e814dda6df Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2443644 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:48 -06:00
Sagar Kamble	842dec2470	gpu: nvgpu: unrailgate gpu during tsg release There is a race condition between nvgpu runtime suspend and l2_flush or tlb_invalidate that happens as part of gmmu_unmap done during nvgpu_gr_ctx_free. Since l2_flush and tlb_invalidate do not do pm_runtime_get_sync, the suspend in progress can lead to registers getting locked and then l2_flush or tlb_invalidate can access the registers when registers are locked (GPU is railgated). Bug 3132891 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Change-Id: If1696a9e9d3d9bc5fd55dd754be90a81114a75cc Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2425680 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	c0b9ae2f17	gpu: nvgpu: enable gr_reset in recovery on sim platform HALT_PIPELINE method is supported on nvgpu-next simulation platform. Send HALT_PIPELINE followed by gr reset during recovery for all types of platforms including simulation platform. Bug 3109773 Change-Id: Ib830075bb9414fa1765c762a652e63cddbe6a141 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2406719 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Antony Clince Alex	c36752fe3d	gpu: nvgpu: sim: make ring buffer independent of PAGE_SIZE The simulator ring buffer DMA interface supports buffers of the following sizes: 4, 8, 12 and 16K. At present, it is configured to 4K and it happens to match with the kernel PAGE_SIZE, which is used to wrap back the GET/PUT pointers once 4K is reached. However, this is not always true; for instance, take 64K pages. Hence, replace PAGE_SIZE with SIM_BFR_SIZE. Introduce macro NVGPU_CPU_PAGE_SIZE which aliases to PAGE_SIZE and replace latter with former. Bug 200658101 Jira NVGPU-6018 Change-Id: I83cc62b87291734015c51f3e5a98173549e065de Signed-off-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420728 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Lakshmanan M	c0e2dc5b74	gpu: nvgpu: Add subctx programming for MIG This CL covers the following code changes, 1) Added api to init inst_block for more than one subctxs. 2) Added logic to limit the subctx bind based on max. VEID count allocated to a gr instance. 3) Renamed nvgpu_grmgr_get_gr_runlist_id. JIRA NVGPU-5647 Change-Id: Ifec8164a9e5f46fbd0538c3dd50e19ee63667a54 Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2418463 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Peter Daifuku	a6e5c54882	gpu: nvgpu: fix resource leaks when cleaning up In channel_free(), destroy notifier_wq and semaphore_wq In nvgpu_vm_remove(), destroy the update_gmmu_lock mutex Bug 200647668 Change-Id: Icbb4e626c0fa9fa2dcf1430b3112b51829b00e4f Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414820 (cherry picked from commit `4f66942afa`) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2416311 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Lakshmanan M	b49c892f81	gpu: nvgpu: Add multi GR reset support Added multi GR reset support for MIG. JIRA NVGPU-5653 Change-Id: I36c0473d4ba0e5bdd2dc07204b7c516ce9860b5e Signed-off-by: Lakshmanan M <lm@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2416069 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	b2ff527d15	gpu: nvgpu: add channel.clear gops - Add channel.clear gops for nvgpu-next. - Do not return error if hw_state.next is set and channel.clear is not NULL. Bug 200650602 Bug 3109773 Change-Id: I4252691e4557351899e6fb9d85934e2d72517a36 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414211 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	e0dd79cd43	gpu: nvgpu: rearch mc reset and enable hals Remove current mc hals - mc.reset() - mc.enable() - mc.disable() - mc.reset_mask() - mc.reset_engine() - mc.reset_engine_enable() Add new mc hals - mc.enable_units(g, units, enable) > enable/disable given unit(s) - mc.enable_dev(g, dev, enable) > enable/disable engine represented by given device pointer - mc.enable_devtype(g, devtype) > enable/disable all engines of given devtype Move common mc intr functions to common/mc/mc_intr.c. Add below common mc functions - nvgpu_mc_reset_units(g, units) > reset given logical OR of nvgpu unit bitmap - nvgpu_mc_reset_dev(g, dev) > reset given single engine via dev > if engine is graphics, reset gpcs for nvgpu_next - nvgpu_mc_reset_devtype(g, devtype) > reset all engines of given devtype > if devtype is graphics, reset gpcs for nvgpu_next Bug 200648985 Bug 3109773 Change-Id: Idc67a14a0a7cde83de44fbfbec13007fead3ed5c Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2408523 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	2b48aa5b0c	gpu: nvgpu: Add device for_each macro Add a macro to iterate over a device list; it is just a wrapper to the nvgpu_list_for_each() macro. It lets code iterate over the list of detected devices without being aware of the underlying instance IDs. This also removes the need to do a separate nvgpu_device_get() and subsequent NULL checking. This will reduce overhead for unit testing! Change-Id: If41dbee30a743d29ab62ce930a819160265b9351 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2404914 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Tejal Kudav	b269aae9f2	gpu: nvgpu: correct usage of pbdma_id The pbdma_id field stored in struct nvgpu_device is bitmask and not bit position as implied by the name. This field is incorrectly used as bit position in nvgpu_engine_disable_activity(), causing PRI timeout errors during iGPU and dGPU shutdown path. PRI timeout errors- nvgpu: 17000000.gv11b gk20a_ptimer_isr:54 [ERR] PRI timeout: ADR 0x0000308c READ DATA 0x00000000 Here the pbdma_id stored in struct nvgpu_device for runlist_0 on gv11b is 0x3(bitmask corresponding to PBDMA_0 and PBDMA_1). nvgpu_engine_disable_activity() interprets this as PBDMA_3 and adds incorrect offset to access PBDMA_STATUS register, causing PRI error. Modify nvgpu_engine_disable_activity() to treat pbdma_id as bitmask and loop through set bits. JIRA NVGPU-5991 Change-Id: Iaffb974cddaa375a329e70f3b5903b9ef2a222c4 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397954 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Deepak Nibade	717921a274	gpu: nvgpu: return intr mask of all GR engine instances nvgpu_gr_engine_interrupt_mask() earlier returned mask of all GR engine instance interrupts. During device refactor series, this got changed to return interrupt of only first instance. Change this again to return interrupt mask of all the GR engine instances since common.mc unit does not yet support APIs to enable interrupt of individual GR instance. Update nvgpu_gr_get_syspipe_id() API to take gr_instance_id as parameter instead of struct nvgpu_gr pointer. Definition of struct nvgpu_gr is not available outside of common.gr unit. Jira NVGPU-5648 Change-Id: I5320d1515eea6054150dc14706a16475bd650da7 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2405409 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Lakshmanan M <lm@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Debarshi Dutta	38ce6fa717	gpu: nvgpu: change unnamed structs to named structs Following changes are made in this patch. 1) Change unnamed structs within gpu_ops to named structs with the prefix gops_. 2) Each named struct gops_ are moved into a separate gops specific file under include/nvgpu/gops/ 3) struct gpu_ops is moved into a separate file include/nvgpu/gpu_ops.h and all other dependent struct gops_ are included in this header. 4) Direct references to include/nvgpu/gops are removed from files as its enough to include gk20a.h. Change-Id: Ieb22cb853be567e3bef14f5f8a04674eebd902ea Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398776 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Konsta Hölttä	611ad23bde	gpu: nvgpu: move channel worker and wdt Continue making the incoherent channel functionality more structured by moving the worker thread business to one file and the channel watchdog logic to another. This is channel-internal restructuring; the interface to other units does not change. The watchdog logic is called from the worker thread and as such these are rather tightly coupled but it's possible to have the thread and not the watchdog. Jira NVGPU-5582 Change-Id: I70f334dd15c9aca0eed75393b99e2f080d133015 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2398921 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	b062081c52	gpu: nvgpu: add function for prealloc job release The last steps to finish job cleanup for both deterministic and nondeterministic submits are the same: put away preallocated job resources that the job had consumed. Avoid duplicated code by moving this code to a function that's shared with both paths. Jira NVGPU-5998 Change-Id: Ic278b0bc8f0f05895f5c24340a60c1ce3eade0b3 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2401468 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	d4fb476e70	gpu: nvgpu: remove joblist cleanup lock The joblist cleanup lock exists to synchronize the submit job cleanup and the abort cleanup that may run in separate threads concurrently. This concurrency is no problem anymore, so delete the lock. The lock was added in commit `f1072a28be` ("gpu: nvgpu: add worker for watchdog and job cleanup") when the abort cleanup still went through each job in the pending list and released their semaphores; ordinary job cleanup from the worker thread also accesses the jobs. Commit `d20a501dcb` ("gpu: nvgpu: simplify job semaphore release in abort") deleted the entire loop because the semaphore, if any, is now reset in one go (via the "set_min_eq_max" ch sync op), but the lock stayed. With aggressive sync destroy enabled the sync object under the cleanup lock can still disappear if the job cleanup runs, but that's already guarded with the sync lock. Jira NVGPU-5998 Change-Id: I6554eb2065b003c6fdf83f66f97067b59aa272f5 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2401467 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	e8201d6ce3	gpu: nvgpu: decouple channel watchdog dependencies The channel code needs the watchdog code and vice versa. Cut this circular dependency with a few simplifications so that the watchdog wouldn't depend on so much. When calling watchdog APIs that cause stores or comparisons of channel progress, provide a snapshot of the current progress instead of a whole channel pointer. struct nvgpu_channel_wdt_state is added as an interface for this to track gp_get and pb_get. When periodically checking the watchdog state, make the channel code ask whether a hang has been detected and abort the channel from within channel code instead of asking the watchdog to abort the channel. The debug dump verbosity flag is also moved back to the channel data. Move the functionality to restart all channels' watchdogs to channel code from watchdog code. Looping over active channels is not a good feature for the watchdog; it's better for the channel handling to just use the watchdog as a tracking tool. Move a few unserviceable checks up in the stack to the callers of the wdt code. They're a kludge but this will do for now and demonstrates what needs to be eventually fixed. This does not leave much code in the watchdog unit. Now the purpose of the watchdog is to only isolate the logic to couple a timer and progress snapshots with careful locking to start and stop the tracking. Jira NVGPU-5582 Change-Id: I7c728542ff30d88b1414500210be3fbaf61e6e8a Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369820 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	fba96fdc09	gpu: nvgpu: Replace nvgpu_engine_info with nvgpu_device Delete the struct nvgpu_engine_info as it's essentially identical to struct nvgpu_device. Duplicating data structures is not ideal as it's terribly confusing what does what. Update all uses of nvgpu_engine_info to use struct nvgpu_device. This is often a fairly straightforward replacement. Couple of places though where things got interesting: - The enum_type that engine_info uses is defined in engines.h and has a bit of SW abstraction - in particular the GRCE type. The only place this seemed to be actually relevant (the IOCTL providing device info to userspace) the GRCE engines can be worked out by comparing runlist ID. - Addition of masks based on intr_id and reset_id; those can be computed easily enough using BIT32() but this is an area that could be improved on. This reaches into a lot of extraneous code that traverses the fifo active engines list and dramatically simplifies this. Now, instead of having to go through a table of engine IDs that point to the list of all host engines, the active engine list is just a list of pointers to valid engines. It's now trivial to do a for-all-active-engines type loop. This could even be turned into a generic macro or otherwise abstracted in the future. JIRA NVGPU-5421 Change-Id: I3a810deb55a7dd8c09836fd2dae85d3e28eb23cf Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319895 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	a04525ece8	gpu: nvgpu: require deterministic for usermode Deterministic mode has always been a requirement for usermode submit; enforce it in the setup_bind path. Adjust tests to use the flag. QNX uses NVGPU_SETUP_BIND_FLAGS_SUPPORT_DETERMINISTIC only if CONFIG_NVGPU_IOCTL_NON_FUSA is set, so guard the check with that for now. Jira NVGPU-5582 Change-Id: Idedd01a3a24420b45195a472e8ca5c9f32f4ef46 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369818 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	3245d48736	gpu: nvgpu: forbid watchdog on deterministic mode The channel watchdog feature has always been a blocker for deterministic submits. Instead of waiting for a submit call to happen just to reject it, nack already the setup_bind ioctl if deterministic is set and the watchdog has not been disabled before. This can avoid confusion with usermode submits where leaving the watchdog set would have worked but the watchdog would never see updates from userspace. Disallow also any other watchdog adjustments than disabling it when the channel has been set up for deterministic mode. Jira NVGPU-5582 Change-Id: I0ba4584bbc035197d952e5b562197c36aa483867 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369819 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	91515d1b47	gpu: nvgpu: unify joblist api names Add the nvgpu_ prefix to the peek, add and delete functions to make them consistent with the rest of the joblist functions. Rename the "prealloc resources" alloc and free functions to joblist init and deinit; there are many other resources that are also preallocated, and these handle just the job tracking list. NVGPU-5772 Change-Id: Ie5e6ba4f4b17465d626f36a0239bddb03a0a2fcb Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397395 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	345eae584d	gpu: nvgpu: remove nvgpu_channel_joblist_is_empty channel_joblist_peek() returns NULL if the list is empty. nvgpu_channel_joblist_is_empty() has been used only together with that function; remove it and check against NULL to see whether there are jobs in flight. This removes some duplication, simplifies the call sites slightly, and gets rid of a Coverity nag about a possible NULL pointer from peek that really isn't (when the emptiness was already checked). Jira NVGPU-5772 Change-Id: I814e9c510d99b88e59539359992fb44d4e7ce2ea Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2397394 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	27a64f2e23	gpu: nvgpu: enforce priv usage of fence Add a "priv" fence struct type and use that in the fence type to emphasize that the inner data is not meant to be seen. The fence unit needs to have an outside-visible fence type so that fences can be allocated directly as a struct field in job metadata for performance and simplicity, so hiding the type entirely wouldn't work. A couple of places need to touch the priv data directly in channel code. Those can be thought to be technically fence unit's code scattered outside the fence files, but they mean that the architecture is not perfect yet. Jira NVGPU-5773 Change-Id: Ifa3c95757ae31eef0e32f2605293e23e210b065f Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395071 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	baaf25f8b0	gpu: nvgpu: decouple async and immediate cleanup Split up nvgpu_channel_clean_up_jobs() on the clean_all parameter so that there's one version for the asynchronous ("deferred") cleanup and another for the synchronous deterministic cleanup that occurs in the submit path. Forking another version like this adds some repetition, but this lets us look at both versions clearly in order to come up with a coherent plan. For example, it might be feasible to have the light cleanup of pooled items in also the nondeterministic path, and deferring heavy cleanup to another, entirely separated job queue. Jira NVGPU-5493 Change-Id: I5423fd474e5b8f7b273383f12302126f47076bd3 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2346065 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	7aa852b31c	gpu: nvgpu: emphasize fence syncpt/sema interfaces Sometimes the syncpt-based fences are not used, and often the sema-based fences are not used. Move code around to new files to make it easier to see what happens and to allow leaving code out of the build easily. Start using nvgpu_fence_ops::free again and move the fence release there. The syncpt data is not refcounted, so it doesn't have this. Jira NVGPU-5773 Change-Id: I991f91886c59cf2c2fbfd2e75305ba512b5d7371 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395069 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	e6c0d84683	gpu: nvgpu: allocate fences in job structs As the submit job metadata has been simplified, the fence pool for job tracking fences is now just complex code for very simple purposes, so delete it. It's enough to hold the fence memory in the job struct itself instead of having separately allocated objects with different lifetimes. Each channel is using preallocated job arrays based on the prespecified inflight job count. The fences are used for tracking job completion, and a new job cannot be submitted before a previous wait has completed. This means that even with a ringbuffer with space for only one job, the previous job memory cannot get reclaimed by a new submit because the submits are ordered. Jira NVGPU-5773 Change-Id: I0c777df700aa7cfda6f971efa47aa72c5462b53a Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392704 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Deepak Nibade	969b901999	gpu: nvgpu: create device/context profiler dev nodes Create new dev nodes for device and context profilers. Example of dev nodes on iGPU /dev/nvhost-prof-dev-gpu - device scope profiler /dev/nvhost-prof-ctx-gpu - context scope profiler Add below APIs to open/close above dev nodes : nvgpu_prof_dev_fops_open() nvgpu_prof_ctx_fops_open() nvgpu_prof_fops_release() Add common API nvgpu_prof_fops_ioctl() to handle IOCTL call on these dev nodes. Add IOCTL NVGPU_PROFILER_IOCTL_BIND_CONTEXT to bind the TSG to profiler objects. Add nvgpu_tsg_get_from_file() to retrieve TSG struct pointer from file descriptor. Also store profiler object pointer into TSG struct. Enable NVGPU_SUPPORT_PROFILER_V2_DEVICE capability on gv11b and tu104. Note that this is not yet enabled for vGPU. Keep NVGPU_SUPPORT_PROFILER_V2_CONTEXT capability disabled since this will take longer to support. Add new IOCTL NVGPU_PROFILER_IOCTL_UNBIND_CONTEXT so that userspace can explicitly unbind the context and release the resources before closing the profiler descriptor. Add context_init flag to profiler object for bookkeeping. Bug 2510974 Jira NVGPU-5360 Change-Id: Ie07e0cfd5a9da9d80008f79c955c7ef93b4bc60f Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2384354 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	d0714b40c1	gpu: nvgpu: Add engine reset profiling This is a key part of the fifo recovery sequence. JIRA NVGPU-5606 Change-Id: I8807884394834b912f25d7c535ee22f547988b2d Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2382590 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Alex Waterman	1bcdc306a0	gpu: nvgpu: Add gv11b recovery profiling Add some basic profiling to the gv11b recovery sequence. This captures the high level events. Subsequent patches start to dig into the subsections in more detail. JIRA NVGPU-5606 Change-Id: I488a448ca1cbf961651588e24685e2a5b4420c44 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2368302 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Tejal Kudav	ab2b0b5949	gpu: nvgpu: Set unserviceable flag early during RC During recovery, we set ch->unserviceable at the end after we preempt the TSG and reset the engines. It might be too late and user-space might submit more work to the broken channel which is not desirable. Move setting this unserviceable flag right at the start of recovery sequence. Another thread doing a submit can still read the unserviceable flag just before it is set here, leaving that submit stuck if recovery completes before the submit thread advances enough to set up a post fence visible for other threads. This could be fixed with a big lock or with a double check at the end of the submit code after the job data has been made visible. We still release the fences, semaphore and error notifier wait queues at the end; so user-space would not trigger channel unbind while channel is being recovered. Also, change the handle_mmu_fault APIs to return void as the debug_dump return value is not used in any of the caller APIs. JIRA NVGPU-5843 Change-Id: Ib42c2816dd1dca542e4f630805411cab75fad90e Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385256 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Dinesh	d0087f3ad8	gpu: nvgpu: Support for runlist_max_supported nvgpu_next needs support for max_runlist_supported by litter value. So the function is changed to support. JIRA NVGPU-5534 Change-Id: I097f6343295049532c46904316314dc82092a46b Signed-off-by: Dinesh <dt@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2382882 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Tejal Kudav	881a6f35be	gpu: nvgpu: Trigger quiesce on PBDMA preempt fail During recovery, we preempt the faulty TSG from PBDMA and engines. If the TSG preempt on PBDMA times out (timeout = 100ms), the PBDMA might be in a hung state. We do not reset the HOST during recovery, so stuck PBDMAs are unrecoverable. Abort the recovery and trigger GPU to quiesce as there is no way back. Triggering Quiesce from recovery sequence should be fine as the only redundant operation will be write to FIFO_RUNLIST_PREEMPT register. The error notifiers will eventually be set by Quiesce thread. Bug 2768005 JIRA NVGPU-4631 Change-Id: I914b9379aa8e48014e6ddace9abe47180a072863 Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2368187 Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	359fc24aaf	gpu: nvgpu: Rework engine management to work with vGPU Currently the vGPU engine management rewrites a lot of the common device agnostic engine management code. With the new top HAL parsing one device at a time, it is now more easily possible to tie the vGPU into the new common device framework by implementing the top HAL but with the vGPU engine list backend. This lets the vGPU inherit all the common engine and device management code. By doing so the vGPU HAL need only implement a trivial and simple HAL. This also gets us a step closer to merging all of the CE init code: logically it just iterates through all CE engines whatever they may be. The only reason this differs between chips is because of the swap from CE0-2 to LCEs in the Pascal generation. This could be abstracted by the unit code easily enough. Also, the pbdma_id for each engine has to be added to the device struct. Eventually this was going to happen anyway, since the device struct will soon replace the nvgpu_engine_info struct. It's a little bit of an abuse but might be worth it long term. If not, it should not be difficult to replace uses of dev->pbdma_id with a proper lookup of PBDMA ID based on the device info. JIRA NVGPU-5421 Change-Id: Ie8dcd3b0150184d58ca0f78940c2e7ca72994e64 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2351877 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	9fda0b2354	gpu: nvgpu: allow vpr channels when VPR supported Currently, if VPR support is requested with nvgpu_channel_setup_bind(), channel is marked as vpr independent of nvgpu VPR support. Modify nvgpu_channel_setup_bind() to mark channel as vpr only if nvgpu supports VPR, otherwise return error. Bug 2046782 JIRA NVGPU-5302 Change-Id: I5f1717651b7bcff0597a6f0d9c746d50af7af0bf Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2368411 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Automatic_Commit_Validation_User Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: Seema Khowala <seemaj@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	ff41d97ab5	gpu: nvgpu: always prealloc jobs and fences Unify the job metadata handling by deleting the parts that have handled dynamically allocated job structs and fences. Now a channel can be in one less mode than before which reduces branching in tricky places and makes the submit/cleanup sequence easier to understand. While preallocating all the resources upfront may increase average memory consumption by some kilobytes, users of channels have to supply the worst case numbers anyway and this preallocation has been already done on deterministic channels. Flip the channel_joblist_delete() call in nvgpu_channel_clean_up_jobs() to be done after nvgpu_channel_free_job(). Deleting from the list (which is a ringbuffer) makes it possible to reuse the job again, so the job must be freed before that. The comment about using post_fence is no longer valid; nvgpu_channel_abort() does not use fences. This inverse order has not posed problems before because it's been buggy only for deterministic channels, and such channels do not do the cleanup asynchronously so no races are possible. With preallocated job list for all channels, this would have become a problem. Jira NVGPU-5492 Change-Id: I085066b0c9c2475e38be885a275d7be629725d64 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2346064 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Richard Zhao	7b8a08af7a	gpu: nvgpu: check ch->wdt on wdt restart all channels ch->wdt is not always initialized. For example it's not initialized on gpu server, since the channel wdt is managed on client side. Bug 2833924 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: Idb06f7de6a15e093bbb08be16454777b9d7582b9 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361978 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Peter Daifuku <pdaifuku@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Richard Zhao	98264f7505	gpu: nvgpu: call gops.tsg.unbind_channel on fail path When the current context is busy, nvgpu_tsg_unbind_channel_common may fail because preemption failed. In that case, the .unbind_channel HAL still needs to be called to notify the vserver that the channel will be removed from the TSG in the teardown path. Bug 2833924 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: I9996202485429b4d9cba0c2f985f8e55fcdd3f29 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361977 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Alex Waterman	fbb6a5bc1c	gpu: nvgpu: Remove fifo->pbdma_map The FIFO pbdma map is an array of bit maps that link PBDMAs to runlists. This array allows other software to query what PBDMA(s) serves a given runlist. The PBDMA map is read verbatim from an array of host registers. These registers are stored in a kmalloc()'ed array. This causes a problem for the device management code. The device management initialization executes well before the rest of the FIFO PBDMA initialization occurs. Thus, if the device management code queries the PBDMA mapping for a given device/runlist, the mapping has yet to be populated. In the next patches in this series the engine management code is subsumed into the device management code. In other words the device struct is reused by the engine management and all host SW does is pull pointers to the host managed devices from the device manager. This means that all engine initialization that used to be done on top of the device management needs to move to the device code. So, long story short, the PBDMA map needs to be read from the registers directly, instead of an array that gets allocated long after the device code has run. This patch removes the pbdma map array, deletes two HALs that managed that, and instead provides a new HAL to query this map directly from the registers so that the device code can use it. JIRA NVGPU-5421 Change-Id: I5966d440903faee640e3b41494d2caf4cd177b6d Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361134 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Alex Waterman	194fac7f3c	gpu: nvgpu: Remove clutter in engine code Remove the get_mask_on_id() HAL and replace its usage with the global nvgpu_engine_get_mask_on_id() function. There's no need to have this function as a HAL. JIRA NVGPU-5420 Change-Id: I4fc843beff8e65806da26a0addc83fa218d390ac Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361315 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	ca1f93bdd7	gpu: nvgpu: add user fence type Decouple the fence information needed for providing submit postfences to userspace by adding a separate type for that and using it to pass fence data to ioctls. The data in struct nvgpu_fence_type is used in various places: - job tracking needs to know when a post fence is expired - job submitters within the driver (vidmem clears) need to be able to wait for these fences - userspace needs the fence as an id, value pair or as a file descriptor created from an os fence To keep object lifetimes strict, start decoupling the os fence data out of struct nvgpu_fence_type: delete nvgpu_fence_install_fd() and add nvgpu_fence_extract_user() to return a struct nvgpu_user_fence that contains only the necessary information. Storing the os fence in job tracking metadata is legacy code and not useful. Passing the os fence from where it's created through the whole submit path inside this combined fence type has been convenient, though. The internally stored cde job fence in dmabuf compression metadata is still nvgpu_fence_type to keep this patch simple. Jira NVGPU-5248 Change-Id: I75b7da676fb6aa083828f888c55571bbf7645ef3 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2359064 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Alex Waterman	160669a7bb	gpu: nvgpu: return device from nvgpu_device_get() Instead of copying the device contents into the passed pointer have nvgpu_device_get() return a device pointer. This will let the engines.c code move towards using the nvgpu_device type directly, instead of maintaining its own version of an essentially identical struct. JIRA NVGPU-5421 Change-Id: I6ed2ab75187a207c8962d4c0acd4003d1c20dea4 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319758 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	70ce67df2d	gpu: nvgpu: Add a generic profiler Add a generic profiler based on the channel kickoff profiler. This aims to provide a mechanism to allow engineers to (more) easily profile arbitrary software paths within nvgpu. Usage of this profiler is still primarily through debugfs. Next up is a generic debugfs interface for this profiler in the Linux code. The end goal for this is to profile the recovery code and generate interesting statistics. JIRA NVGPU-5606 Signed-off-by: Alex Waterman <alexw@nvidia.com> Change-Id: I99783ec7e5143855845bde4e98760ff43350456d Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2355319 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	319520ff57	gpu: nvgpu: Add a new device manager unit This adds a new device management unit in the common code responsible for facilitating the parsing of the GPU top device list and providing that info to other units in nvgpu. The basic idea is to read this list once from HW and store it in a set of lists corresponding to each device type (graphics, LCE, etc). Many of the HALs in top can be deleted and instead implemented using common code parsing the SW representation. Every time the driver queries the device list it does so using a device type and instance ID. This is common code. The HAL is responsible for populating the device list in such a way that the driver can query it in a chip agnostic manner. Also delete some of the unit tests for functions that no longer exist. This code will require new unit tests in time; those should be quite simple to write once unit testing is needed. JIRA NVGPU-5421 Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Konsta Hölttä	6cbc174fc2	gpu: nvgpu: avoid channel wdt ifdefs Implement empty stubs of the channel watchdog functions for when watchdog is disabled from build. Add some forward declarations that were missing. Now most call sites don't need #ifdefs for the build flag. Add error checks for the wdt alloc failure. Jira NVGPU-5494 Jira NVGPU-5493 Change-Id: I2d42e8ab4c5e045cd280b2e1f254396127bd154b Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2352050 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Alex Waterman	2a3bb9107f	gpu: nvgpu: rename <nvgpu/top.h> to <nvgpu/device.h> top.h is a description of "devices" available on the GPU. As such rename this header to device.h. device.h will ultimately be a unit of actual C code that will rely on the top HAL to fill a device list. JIRA NVGPU-5421 Change-Id: If6e4a537d2209e429a678761a34713723da7a00a Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319648 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00

450 Commits