linux-nvgpu

mirror of git://nv-tegra.nvidia.com/linux-nvgpu.git synced 2025-12-22 17:36:20 +03:00

Author	SHA1	Message	Date
Atul Anand	5db14f3bfb	nvgpu: Fix pm resource release sequence The memory leak issue was due to nvgpu_profiler_unbind_context() calling nvgpu_profiler_pm_resource_release() for all resources which clears the flag required by nvgpu_profiler_free_pma_stream() to release the memory for perf_buf instance block. Fixing this issue by splitting nvgpu_profiler_unbind_context() to release all the pm resources at a later time separately. Bug 3510455 Change-Id: Ibab8d071693e600c46f7e7f16575e36e6f62af3c Signed-off-by: atanand <atanand@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2825013 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Dinesh T <dt@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-12-15 15:13:29 -08:00
Sagar Kamble	96f675595c	gpu: nvgpu: implement get and revoke share token ioctls Add share token list to gk20a_ctrl_priv. Implement GET_SHARE_TOKEN and REVOKE_SHARE_TOKEN ioctls. Revoke tokens while closing the TSG for all active devices. Bug 3677982 JIRA NVGPU-8681 Change-Id: I74455c21d881d5a0d381729fd695239722599980 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2792081 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Scott Long <scottl@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-11-10 11:49:54 -08:00
Sagar Kamble	675edd5053	gpu: nvgpu: maintain authorized devices in TSG When the TSG is successfully created first time or is opened with share token, the device instance id associated with the CTRL fd will be added to the TSG private data structure as authorized device instance ids. This is used for a security check when creating a TSG share token with nvgpu_tsg_get_share_token. Bug 3677982 JIRA NVGPU-8681 Change-Id: I67bb0514e1272dab15023cd3828a6a51e9a4c928 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2792080 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Scott Long <scottl@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-11-10 11:49:44 -08:00
Sagar Kamble	d1b28712b6	gpu: nvgpu: implement VEID alloc/free Implement the ioctls NVGPU_TSG_IOCTL_CREATE_SUBCONTEXT and NVGPU_TSG_IOCTL_DELETE_SUBCONTEXT. These will allocate and free the VEID numbers. Address space association with the VEIDs is verified to ensure that channels association with VEIDs and address space remains consistent. Bug 3677982 JIRA NVGPU-8681 Change-Id: I2d913baf61a6bdeec412c58270c0024b80ca15c6 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2766765 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-11-01 00:05:18 -07:00
Sagar Kamble	6d836becf5	gpu: nvgpu: retry unbind when force killing the channel If NEXT bit remains set for a channel being unbound, it can lead to MMU fault of type unbound inst block. When userspace is closing the channel and NEXT bit is set, userspace retries. When force killing the channel, nvgpu can retry few iterations to ensure the channel is truly idle and unbound. If the channel is really stuck then unbind will fail and TSG will be aborted. Bug 3800844 Change-Id: I8fb024630ff2dd272245ae27116f3db6d6e0f788 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2787533 (cherry picked from commit 99e39f4b387743a93b05ba4b097c33b23fbbcf68) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2786479 Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-10-10 08:17:12 -07:00
Sagar Kamble	ef99d9f010	gpu: nvgpu: implement scg, pbdma and cilp rules Only certain combination of channels of GFX/Compute object classes can be assigned to particular pbdma and/or VEID. CILP can be enabled only in certain configs. Implement checks for the configurations verified during alloc_obj_ctx and/or setting preemption mode. Bug 3677982 Change-Id: Ie7026cbb240819c1727b3736ed34044d7138d3cd Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2719995 Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-09-08 21:00:30 -07:00
Sagar Kamble	693305c0fd	gpu: nvgpu: subcontext add/remove support Subcontext PDBs and valid mask in the instance blocks of the channels in various subcontexts has to be updated when new subcontext is created or a subcontext is removed. Replayable fault state is cached in the channel structure. Replayable fault state for subcontext is set based on first channel's bind parameter. It was earlier programmed in function channel_setup_ramfc. init_inst_block_core is updated to setup TSG level pdb map and mask. Added new hal gv11b_channel_bind to enable the subcontext on channel bind. Bug 3677982 Change-Id: I58156c5b3ab6309b6a4b8e72b0e798d6a39c1bee Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2719994 Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-09-08 21:00:20 -07:00
Sagar Kamble	f55fd5dc8c	gpu: nvgpu: multiple address spaces support for subcontexts This patch introduces following relationships among various nvgpu objects to support multiple address spaces with subcontexts. IOCTLs setting the relationships are shown in the braces. nvgpu_tsg 1<---->n nvgpu_tsg_subctx (TSG_BIND_CHANNEL_EX) nvgpu_tsg 1<---->n nvgpu_gr_ctx_mappings (ALLOC_OBJ_CTX) nvgpu_tsg_subctx 1<---->1 nvgpu_gr_subctx (ALLOC_OBJ_CTX) nvgpu_tsg_subctx 1<---->n nvgpu_channel (TSG_BIND_CHANNEL_EX) nvgpu_gr_ctx_mappings 1<---->n nvgpu_gr_subctx (ALLOC_OBJ_CTX) nvgpu_gr_ctx_mappings 1<---->1 vm_gk20a (ALLOC_OBJ_CTX) On unbinding the channel, objects are deleted according to dependencies. Without subcontexts, gr_ctx buffers mappings are maintained in the struct nvgpu_gr_ctx. For subcontexts, they are maintained in the struct nvgpu_gr_subctx. Preemption buffer with index NVGPU_GR_CTX_PREEMPT_CTXSW and PM buffer with index NVGPU_GR_CTX_PM_CTX are to be mapped in all subcontexts when they are programmed from respective ioctls. Global GR context buffers are to be programmed only for VEID0. Based on the channel object class the state is patched in the patch buffer in every ALLOC_OBJ_CTX call unlike setting it for only first channel like before. PM and preemptions buffers programming is protected under TSG ctx_init_lock. tsg->vm is now removed. VM reference for gr_ctx buffers mappings is managed through gr_ctx or gr_subctx mappings object. For vGPU, gr_subctx and mappings objects are created to reference VMs for the gr_ctx lifetime. The functions nvgpu_tsg_subctx_alloc_gr_subctx and nvgpu_tsg_- subctx_setup_subctx_header sets up the subcontext struct header for native driver. The function nvgpu_tsg_subctx_alloc_gr_subctx is called from vgpu to manage the gr ctx mapping references. free_subctx is now done when unbinding channel considering references to the subcontext by other channels. It will unmap the buffers in native driver case. It will just release the VM reference in vgpu case. Note that TEGRA_VGPU_CMD_FREE_CTX_HEADER ioctl is not called by vgpu any longer as it would be taken care by native driver. Bug 3677982 Change-Id: Ia439b251ff452a49f8514498832e24d04db86d2f Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2718760 Reviewed-by: Scott Long <scottl@nvidia.com> Reviewed-by: Ankur Kishore <ankkishore@nvidia.com> GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>	2022-09-08 20:59:59 -07:00
Sagar Kamble	7085934303	gpu: nvgpu: update channel TSG unbind fail paths On tsg.unbind_channel hal failure, channel teardown was being done again that was done already as part of the function nvgpu_tsg_unbind_channel_common. Just abort the TSG and return err in that case. Also, decrement the TSG ch_count in the fail path. Bug 3677982 Change-Id: I466f2b2c693d43ed64dc531b08bf740bf00f28a6 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2749970 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>	2022-08-10 15:42:34 -07:00
atanand	eae4593343	gpu: nvgpu: add ioctl to configure implicit ERRBAR Add ioctl support to configure implicit ERRBAR by setting/unsetting NV_PGRAPH_PRI_GPCS_TPCS_SM_SCH_MACRO_SCHED register. Add gpu characteritics flag: NVGPU_SCHED_EXIT_WAIT_FOR_ERRBAR_SUPPORTED to allow userspace driver to determine if implicit ERRBAR ioctl is supported. Bug: 200782861 Change-Id: I530a4cf73bc5c844e8d73094d3e23949568fe335 Signed-off-by: atanand <atanand@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2718672 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Sagar Kamble <skamble@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-08-05 23:10:18 -07:00
Sagar Kamble	3fb2a2e209	gpu: nvgpu: track gr_ctx init state On successful obj_ctx allocation, set ctx_initialized member in gr_ctx to true and when it is true then only invoke free_gr_ctx. With this we can get rid of tsg->vm check while calling free_gr_ctx. tsg->vm will go away with multiple address spaces support in TSG. Bug 3677982 Change-Id: I4a64842411ce4ab157010808e4e8e4d5cd254a7f Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2746803 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Scott Long <scottl@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-07-19 10:32:35 -07:00
Debarshi Dutta	1bdca92c50	gpu: nvgpu: modify rl_domain member KMD needs to send the domain id and GPU_VA corresponding to the struct runlist_domains to GSP. In the current implementation, struct nvgpu_runlist_domain contains the domain name instead of domain id. This requires an additional search by name everytime an update is needed to be submitted to the GSP. Modify the struct nvgpu_runlist_domain to store domain id instead of domain name. This simplifies the flow and avoids unnecessary search. Removed the conditional check for existence of shadow domain as its a deadcode. Shadow Domain is not searchable in the list of domains inside the struct nvgpu_runlist. Jira NVGPU-8610 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I0d67cfa93d89186240290e933aa750702b14f4f0 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2744890 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-07-15 15:15:30 -07:00
Sagar Kamble	f95cb5f4f8	gpu: nvgpu: maintain ctx buffers mappings separately from ctx mems In order to maintain separate mappings of GR TSG and global context buffers for different subcontexts, we need to separate the memory struct and the mapping struct for the buffers. This patch moves the mappings of all GR ctx buffers to new structure nvgpu_gr_ctx_mappings. This will be instantiated per subcontext in the upcoming patches. Summary of changes: 1. Various context buffers were allocated and mapped separately. All TSG context buffers are now stored in gr_ctx->mem[] array since allocation and mapping is unified for them. 2. Mapping/unmapping and querying the GPU VA of the context buffers is now handled in ctx_mappings unit. Structure nvgpu_gr_ctx_mappings in nvgpu_gr_ctx holds the maps. On ALLOC_OBJ_CTX this struct is instantiated and deleted on free_gr_ctx. 3. Introduce mapping flags for TSG and global context buffers. This is to map different buffers with different caching attribute. Map all buffers as cacheable except PRIV_ACCESS_MAP, RTV_CIRCULAR_BUFFER, FECS_TRACE, GR CTX and PATCH ctx buffers. Map all buffers as privileged. 4. Wherever VM or GPU VA is passed in the obj_ctx allocation functions, they are now replaced by nvgpu_gr_ctx_mappings. 5. free_gr_ctx API need not accept the VM as mappings struct will hold the VM. mappings struct will be kept in gr_ctx. 6. Move preemption buffers allocation logic out of nvgpu_gr_obj_ctx_set_graphics_preemption_mode. 7. set_preemption_mode and gr_gk20a_update_hwpm_ctxsw_mode functions need update to ensure buffers are allocated and mapped. 8. Keep the unit tests and documentation updated. With these changes there is clear seggregation of allocation and mapping of GR context buffers. This will simplify further change to add multiple address spaces support. With multiple address spaces in a TSG, subcontexts created after first subcontext just need to map the buffers. Bug 3677982 Change-Id: I3cd5f1311dd85aad1cf547da8fa45293fb7a7cb3 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2712222 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-07-15 07:10:11 -07:00
Sagar Kamble	80efe558b1	gpu: nvgpu: add BVEC test for nvgpu_rc_pbdma_fault Update nvgpu_rc_pbdma_fault with invalid checks and add BVEC test for it. Make ga10b_fifo_pbdma_isr static. NVGPU-6772 Change-Id: I5485760c53e1fff1278557a5b25659a1fc0e4eaf Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551617 (cherry picked from commit e917042d395d07cb902580bad3d5a7d0096cc303) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623625 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-07-14 08:58:31 -07:00
Debarshi Dutta	76cc8870e1	nvgpu: gpu: update default nvs domain implementation In current form, the default domain acts like any schedulable domain. TSGs are bound to it and it can be enumerated via the public interfaces. The new expectation for the default domain is meant to change from the current form to a pseudo domain that cannot act like an ordinary domain in other ways, i.e. it must not be reachable by in particular the domain management API, it can't be removed, does not show up in lists, and TSGs cannot be explicitly bound to this domain. It won't participate in round-robin domain scheduling. It is not really a domain, and acts like one only when activated in the manual mode. Following changes are made overall to support the above change in definition. 1) Domain creation and attaching the domain to the scheduler are now split into two separate functions. The new default domain (having ID = UINT64_MAX) is created separately from a static function without linking it with other domains in the scheduler. 2) struct nvgpu_nvs_scheduler explicitely stores the default domain to support direct lookups. 3) TSGs are initially not bound to default domain/rl_domain. Jira NVGPU-8165 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Change-Id: I916d11f4eea5124d8d64176dc77f3806c6139695 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697477 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-05-12 00:24:58 -07:00
Sagar Kamble	d82400d2b8	gpu: nvgpu: fix MISRA Rule 5.1 violation BVEC changes for nvgpu_rc_pbdma_fault and nvgpu_rc_mmu_fault started reporting below MISRA issue. kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:321: 1. misra_c_2012_rule_5_1_violation: Declaration with identifier "nvgpu_tsg_unbind_channel_check_hw_state", which is ambiguous. kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:349: 2. other_declaration: The first 31 characters of identifiers "nvgpu_tsg_unbind_channel_check_ctx_reload" and "nvgpu_tsg_unbind_channel_check_hw_state" are identical. Do below renames to fix the issue. Doing both for consistency. s/nvgpu_tsg_unbind_channel_check_hw_state/nvgpu_tsg_unbind_channel_hw_state_check s/nvgpu_tsg_unbind_channel_check_ctx_reload/nvgpu_tsg_unbind_channel_ctx_reload_check JIRA NVGPU-6772 Change-Id: Ib92cabe11c486621351bf15ddb86e20d16d514c4 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584152 (cherry picked from commit a619f259c6a4ffccb05550767212989af60c2a90) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2706551 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-05-11 04:18:12 -07:00
srajum	8381647662	gpu: nvgpu: fixing MISRA violations - MISRA Directive 4.7 Calling function "nvgpu_tsg_unbind_channel(tsg, ch, true)" which returns error information without testing the error information. - MISRA Rule 10.3 Implicit conversion from essential type "unsigned 64-bit int" to different or narrower essential type "unsigned 32-bit int" - MISRA Rule 5.7 A tag name shall be a unique identifier JIRA NVGPU-5955 Change-Id: I109e0c01848c76a0947848e91cc6bb17d4cf7d24 Signed-off-by: srajum <srajum@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2572776 (cherry picked from commit 073daafe8a11e86806be966711271be51d99c18e) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678681 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-03-10 16:01:18 -08:00
srajum	069fe05dca	gpu: nvgpu: remove whitelisting for wrongly reported violations by tool - Earlier we whitelisted wrongly reported static analysis violations by tool, raised coverity tool bugs for these cases. - These bugs are fixed with new version of tool, so no need fo whitelisting. JIRA NVGPU-7119 Change-Id: Ib2341db0d46fa7fac4c0cc9a6c1bdc8704377ef1 Signed-off-by: srajum <srajum@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2604365 (cherry picked from commit dc2d8ddaa409aefe0e04e0bacb3a8a977f6dbd64) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2677523 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-10 16:01:06 -08:00
Konsta Hölttä	f10ee4ab0e	gpu: nvgpu: add domain name API Add nvgpu_nvs_domain_get_name() to minimize messing up with nvs internals and to help code organization when nvs is not built in yet. A stub to help compilation returns NULL because no domains can exist when the stub is built in, and thus it won't be used. Jira NVGPU-6788 Change-Id: If663f7c0e8434ef00dd3a3f40f6404a35b477f2b Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673120 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-01 00:09:01 -08:00
Konsta Hölttä	2a8914619d	gpu: nvgpu: bind sched domains as fds Replace id-based lookup with fd-based lookup when binding a TSG to a domain. The device node based domain interface naturally provides access control; this way userspace tools can limit which uid/gid can access each domain. Also, explicitly disallow binding channels to a TSG that has no runlist domain yet. Normally a TSG is in the default domain if nothing else has been specified, but the default domain can be deleted. Jira NVGPU-6788 Change-Id: I2af96dfc002367d894eaf0c175006332f790c27f Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2651165 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-03-01 00:08:55 -08:00
Konsta Hölttä	359e83b45a	gpu: nvgpu: tsg: release default nvs domain ref A reference to the default scheduling domain is taken when a TSG is opened. Although the explicit bind is designed to support only one bind, the TSG is bound to the default one implicitly at that point. Release the reference to avoid leaking it. The domain might be null at that point if the default domain has been removed; in that case there's just no domain to put back. Change-Id: I7db43f7bbb2a8c86c391280eb7fa32431c8982da Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2663420 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-02-06 10:09:34 -08:00
Sagar Kamble	29a0a146ac	gpu: nvgpu: fix coverity defects Fix following coverity defects: ioctl_prof.c resource leak ioctl_dbg.c logically dead code global_ctx.c identical code for branches therm_dev.c resource leak pmu_pstate.c unused value nvgpu_mem.c dead default in switch tsg.c Dereference before null check nvlink_gv100.c logically dead code nvlink.c Out-of-bounds write fifo_vgpu.c Dereference null return value pmu_pg.c Dereference before null check fw_ver_ops.c Identical code for different branches boardobjgrp.c Dereference after null check boardobjgrp.c Dereference before null check boardobjgrp.c Dereference after null check engines.c Dereference before null check nvgpu_init.c Unused value CID 10127875 CID 10127820 CID 10063535 CID 10059311 CID 10127863 CID 9875900 CID 9865875 CID 9858045 CID 9852644 CID 9852635 CID 9852232 CID 9847593 CID 9847051 CID 9846056 CID 9846055 CID 9846054 CID 9842821 Bug 3460991 Change-Id: I91c215a545d07eb0e5b236849d5a8440ed6fe18d Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2657444 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com> GVS: Gerrit_Virtual_Submit	2022-01-28 04:50:12 -08:00
Richard Zhao	9ab1271269	gpu: nvgpu: common: fix compile error of new compile flags It's preparing to add bellow CFLAGS: -Werror -Wall -Wextra \ -Wmissing-braces -Wpointer-arith -Wundef \ -Wconversion -Wsign-conversion \ -Wformat-security \ -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough Jira GVSCI-11640 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: Ia8f508c65071aa4775d71b8ee5dbf88a33b5cbd5 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555056 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2022-01-13 12:36:14 -08:00
Konsta Hölttä	55afe1ff4c	gpu: nvgpu: improve nvs uapi - Make the domain scheduler timeslice type nanoseconds to future proof the interface - Return -ENOSYS from ioctls if the nvs code is not initialized - Return the number of domains also when user supplied array is present - Use domain id instead of name for TSG binding - Improve documentation in the uapi headers - Verify that reserved fields are zeroed - Extend some internal logging - Release the sched mutex on alloc error - Add file mode checks in the nvs ioctls. The create and remove ioctls require writable file permissions, while the query does not; this allows filesystem based access control on domain management on the single dev node. Jira NVGPU-6788 Change-Id: I668eb5972a0ed1073e84a4ae30e3069bf0b59e16 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639017 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-12-15 06:05:25 -08:00
Konsta Hölttä	632644b44a	gpu: nvgpu: couple runlist domains and nvs Now that the main nvsched code exists in the nvgpu build, make it control the runlist domains. As a new nvs domain is created, create the relevant runlist data too. To support the default domain, create a default nvs domain at boot. The scheduling domain code owns the responsibility of domain lifetime, and runlist domains exist to serve that logic although the RL domains are directly used by channel and TSG logic. Add refcounting to the scheduler uapi level to make sure that busy domains (that still have TSG participants) do not get removed too early. Adjust error injection sensitive unit tests to match the updated logic. Jira NVGPU-6425 Jira NVGPU-6427 Change-Id: I1beec97c54c60ad334165b1c0acb5e827c24f2ac Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632287 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-12-07 07:07:12 -08:00
Konsta Hölttä	1d14a4412f	gpu: nvgpu: scheduler management uapi Add ioctls for creating, removing and querying scheduling domains and interface with the "nvsched" entity that will be the core scheduler. Include the scheduler in the Linux build. The core scheduler code will ultimately hold data on and control what gets scheduled, but this intermediate layer in nvgpu-rm needs a bit of bookeeping to manage the userspace interface. To keep changes isolated, this does not touch the internal runlist domains yet. The core scheduler logic will eventually control the runlist domains. Jira NVGPU-6788 Change-Id: I7b4064edb6205acbac2d8c593dad019d517243ce Signed-off-by: Alex Waterman <alexw@nvidia.com> Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2463625 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-12-07 07:07:01 -08:00
Konsta Hölttä	fe7ae02f5f	gpu: nvgpu: add sched domain bind ioctl Support binding TSGs to some other scheduling domain than the default one. Binding happens by name until a more robust interface appears in the future, as the name is a natural identifier for users. No other domains are actually created anywhere yet; that will happen in next patches. Jira NVGPU-6788 Change-Id: I5abcdea869b525b0a0e9937302f106f7eee94ec2 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2628047 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-11-24 04:47:30 -08:00
Konsta Hölttä	9be8fb80a2	gpu: nvgpu: make tsgs domain aware Start transitioning from an assumption of a single runlist buffer to the domain based approach where a TSG is a participant of a scheduling domain that then owns has a runlist buffer used for hardware scheduling. Concretely, move the concept of a runlist domain up to the users of the runlist code. Modifications to a runlist need to specify which domain is modified. There is still only the default domain that is created at boot. Jira NVGPU-6425 Change-Id: Id9a29cff35c94e0d7e195db382d643e16025282d Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2621213 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-11-11 20:39:42 -08:00
Konsta Hölttä	c0473460ea	gpu: nvgpu: don't check ch activity on bind Delete an unnecessary check of the active_channels bitmap when attempting to bind a channel to a TSG. There is already a verification that the channel must not be a part of a TSG; if it's not, it cannot be set in the bitmap. All channels become active via a parent TSG, but the activity check predates this design. A channel is bound to a TSG early before setting up its gpfifo etc. and mandatory membership of a TSG is one of the setup_bind prechecks. Jira NVGPU-6425 Change-Id: Id34686f198db0a0265ffd6a49a0b2e47c37fd5f7 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2621211 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-11-04 12:47:54 -07:00
Konsta Hölttä	3cf796b787	gpu: nvgpu: move active bitmaps to domain Move the active_channels and active_tsgs bitmaps from struct nvgpu_runlist to struct nvgpu_runlist_domain. A TSG and its channels are currently active as part of a runlist; in the future, a runlist may be switched from multiple domains that each are a collection of TSGs. The changes are still internal to the runlist code. Users of runlists need no modifications. Jira NVGPU-6425 Change-Id: I2d0e98e97f04b9716bc3f4890cf881735d0ab664 Signed-off-by: Konsta Hölttä <kholtta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2618387 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-11-03 20:55:08 -07:00
Debarshi Dutta	791dc18666	gpu: nvgpu: bvec for struct nvgpu_tsg_sm_error_state fields Add Setter and Getter methods for accessing tsg->sm_error_states. Getter returns a constant pointer for struct nvgpu_tsg_sm_error_state. This renders it unnecessary to add BVEC for above fields for the struct in multiple locations. The current design ensures that only a constant pointer is obtained from the owner unit i.e. FIFO. The following new methods are added. Both unit tests and BVEC tests are added for them as well. nvgpu_tsg_store_sm_error_state nvgpu_tsg_get_sm_error_state Jira NVGPU-6947 Change-Id: I82c22a2774862c8579baa41b6fb8292fa164704a Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> (cherry picked from commit 79574638671a0c6efe41cd3423668fcd1bd96826) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556938 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-09-13 20:57:09 -07:00
Deepak Nibade	3c97f3b932	gpu: nvgpu: disallow binding more channels than MAX channels supported per TSG There is HW specific limit on number of channel entries that can be added for each TSG entry in runlist. Right now there is no checking to enforce this from SW and hence if User binds more than supported channels to same TSG, invalid TSG formation error interrupts are generated. Fix this by adding appropriate checks in below steps : - Add new field ch_count to struct nvgpu_tsg to keep track of channels bound to TSG. - Define new hal gops.runlist.get_max_channels_per_tsg() to retrieve HW specific maximum channel count per TSG. - Implement the HAL for gk20a and gv11b chips, and assign new HALs for all chips appropriately. - Increment ch_count while binding the channel to TSG and decrement it while unbinding. - While binding channel to TSG, Check if current channel count is already equal to max channel count. If yes, print an error and bail out. Bug 200763991 Change-Id: Ic5f17a52e0fb171d1c020bf4f085f57cdb95f923 Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2582095 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-08-25 09:47:47 -07:00
Debarshi Dutta	200777b854	gpu: nvgpu: bvec for channel and tsg Below changes are added. 1) Added checks in nvgpu_channel_from_id__func, nvgpu_tsg_check_and_get_from_id 2) Added BVEC tests for nvgpu_channel_open_new, nvgpu_channel_from_id, nvgpu_tsg_check_and_get_from_id, nvgpu_tsg_set_error_notifier 3) Added common function get_random_u32. Jira NVGPU-6905 Change-Id: I374d6f5503dc05e3224213d772a1752d82cbdc91 Signed-off-by: Debarshi Dutta <ddutta@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2548304 (cherry picked from commit 39b2529b3e96cfd3cbd3bb020f32ee2cca0ea363) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554021 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Sachin Nikam <snikam@nvidia.com> Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-07-07 12:25:50 -07:00
Richard Zhao	4e08649b7f	gpu: nvgpu: move mem checking of gr_ctx to .alloc_obj_ctx Preparing for adding vgpu cmd .add_obj_ctx and memory will be allocated on server side. Outside of implementation of .alloc_obj_ctx, code should not check whether gr_ctx is valid by check gr_ctx mem. Jirs GVSCI-10977 Change-Id: I6b3d826e930fdfaaae517d204186642e49f5c2d7 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2546190 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-06-28 18:10:01 -07:00
Deepak Nibade	419a65965b	gpu: nvgpu: add mutex for gr_ctx initialization If user calls IOCTL to allocate object context for two channels in same TSG in parallel, nvgpu_gr_setup_alloc_obj_ctx() could end up racing and trying to allocate object context for both channels at the same time. This could result in corrupting object context. Fix this by introducing per-TSG mutex ctx_init_lock to serialize context initialization for all channels within TSG. In ideal scenario nvrm_gpu is the only caller of all the IOCTLs, and nvrm_gpu makes sure to initialize object context for each channel in serial order. Because of this new lock does not cause any contention. Jira NVGPU-6431 Change-Id: Ibb1cbb4878748929bb7f23e8666c283c39ecbf5a Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2538333 (cherry picked from commit 8be447838dc1ecbd5637eb6bd13b8f338eaf33cd) Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2538773 Reviewed-by: svcacv <svcacv@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com> Reviewed-by: Shashank Singh <shashsingh@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-06-03 15:59:43 -07:00
Sagar Kamble	89ec2afbd4	gpu: nvgpu: fix tsg unbind failure paths nvgpu_tsg_unbind_channel_common failure handling missed channel.clear & nvgpu_tsg_set_mmu_debug_mode calls. Bug 200711183 Change-Id: I19fd53be55db9df725b7cf467b2673e4cd29deb5 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521972 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Antony Clince Alex <aalex@nvidia.com> Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-05-03 03:25:07 -07:00
Sagar Kamble	ff706e5456	gpu: nvgpu: handle ctx_reload when force unbinding the channel When force closing the channel, NEXT and CTX_RELOAD bits might be set. Currently CTX_RELOAD bit is ignored. However, due to this, the channel created after the erroneous unbind encounters FECS fault. If the channel is unbound while it is running, fifo unbind error happens and can lead to unspecified behavior. By moving CTX_RELOAD to other channel in the TSG, the channel can be unbound safely. In other cases, if the channel is truly running something when it is being unbound it should either get preempted or be handled through engine reset. Bug 200701444 Change-Id: Iba956544dcaa1144c6064247257c64cbe9a29ae6 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2515083 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-04-15 16:21:44 -07:00
Mayur Poojary	6277d57936	gpu: nvgpu: Add new api for setting longer timeslice on dbg node Add new ioctl api for setting longer timeslice and get timeslice inside 'dbg' dev node. Update ioctl gpu_get_characteristic to pass the max timeslice value Add debugfs to access and change the max timeslice value Bug 1842244 Change-Id: I7e80f59162cf5d90496f9752fc128f5fa8dcc7d2 Signed-off-by: Mayur Poojary <mpoojary@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2471569 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-04-06 04:37:38 -07:00
Alex Waterman	5bf229dcd5	gpu: nvgpu: Rename runlist_id to id Rename the runlist_id field in struct nvgpu_runlist to just id. The runlist part is redundant given that this id is already in 'struct nvgpu_runlist'. Change-Id: Ie2ea98f65d75e5e46430734bd7a7f6d6267c7577 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470306 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-02-19 15:16:46 -08:00
Alex Waterman	bd1b395b5c	gpu: nvgpu: Update runlist_id in TSG Update the runlist_id field in struct tsg to now be a pointer to the relevant runlist. This further cleans up the rampant use of runlist_ids throughout the driver. Change-Id: I3dce990f198d534a80caa9ca95982255dcf104ad Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470305 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-02-19 15:16:41 -08:00
Alex Waterman	77c0b9ffdc	gpu: nvgpu: Update runlist_update() to take runlist ptr Update the nvgpu_runlist_update_for_channel() function: - Rename it to nvgpu_runlist_update() - Have it take a pointer to the runlist to update instead of a runlist ID. For the most part this makes the code better but there's a few places where it's worse (for now). This starts the slow and painful process of moving away from the non-runlist code using runlist IDs in many places it should not. Most of this patch is just fixing compilation problems with the minor header updates. JIRA NVGPU-6425 Change-Id: Id9885fe655d1d750625a1c8aceda9e67a2cbdb7a Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470304 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-01-29 09:51:44 -08:00
Alex Waterman	11d3785faf	gpu: nvgpu: Rename struct nvgpu_runlist_info, fields in fifo Rename struct nvgpu_runlist_info to struct nvgpu_runlist; the info is not necessary. struct nvgpu_runlist is soon to be a first class object among the nvgpu object model. Also rename the fields runlist_info and active_runlist_info to simply runlists and active_runlists respectively. Again the info text is just not necessary and somewhat misleading. These structs _are_ the runlist representations in SW; they are not merely informational. Also add an rl_dbg() macro to print debug info specific to runlist management and some debug prints specifying the runlist topology for the running chip. Change-Id: Id9fcbdd1a7227cb5f8c75cca4abbff94fe048e49 Signed-off-by: Alex Waterman <alexw@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470303 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2021-01-20 21:56:33 -08:00
Sagar Kamble	cf287a4ef5	gpu: nvgpu: retry tsg unbind if NEXT is set The NEXT bit can remain set for the channel if timeslice expires before scheduler clears it. Due to this nvgpu fails TSG unbind and in turn nvrm_gpu fails channel close. In this case, checking the channel hw state after some time can help see NEXT bit cleared by scheduler. Reenable the tsg and return -EAGAIN to nvrm_gpu for it to retry again. Bug 3144960 Change-Id: I35f417f02270e371a4e632986b73a00f8a4f921a Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2468391 Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2021-01-18 23:11:57 -08:00
Sagar Kamble	4d101a6303	gpu: nvgpu: do tsg unbind hw state check only for multi-channel TSG Host scheduler might be confused if more than one channels are present in TSG and one of the unbound channel has NEXT set. This is not so much of an issue if there is single channel in the TSG. So don't fail unbind in that case. ctx_reload and engine_faulted check can also be skipped for single channel TSG. Bug 3144960 Change-Id: I85eb9025ea53706ce8fda6d9b4bcf6a15a300d17 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2442970 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:48 -06:00
Sagar Kamble	842dec2470	gpu: nvgpu: unrailgate gpu during tsg release There is race condition between nvgpu runtime suspend and l2_flush or tlb_invalidate that happens as part of gmmu_unmap done during nvgpu_gr_ctx_free. Since l2_flush and tlb_invalidate does not do pm_runtime_get_sync, the suspend in progress can lead to registers getting locked and then l2_flush or tlb_invalidate can access the registers when registers are locked (GPU is railgated). Bug 3132891 Signed-off-by: Sagar Kamble <skamble@nvidia.com> Change-Id: If1696a9e9d3d9bc5fd55dd754be90a81114a75cc Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2425680 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	b2ff527d15	gpu: nvgpu: add channel.clear gops - Add channel.clear gops for nvgpu-next. - Do not return error if hw_state.next is set and channel.clear is not NULL. Bug 200650602 Bug 3109773 Change-Id: I4252691e4557351899e6fb9d85934e2d72517a36 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414211 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Seema Khowala <seemaj@nvidia.com> Reviewed-by: Alex Waterman <alexw@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Deepak Nibade	969b901999	gpu: nvgpu: create device/context profiler dev nodes Create new dev nodes for device and context profilers. Example of dev nodes on iGPU /dev/nvhost-prof-dev-gpu - device scope profiler /dev/nvhost-prof-ctx-gpu - context scope profiler Add below APIs to open/close above dev nodes : nvgpu_prof_dev_fops_open() nvgpu_prof_ctx_fops_open() nvgpu_prof_fops_release() Add common API nvgpu_prof_fops_ioctl() to handle IOCTL call on these dev nodes. Add IOCTL NVGPU_PROFILER_IOCTL_BIND_CONTEXT to bind the TSG to profiler objects. Add nvgpu_tsg_get_from_file() to retrieve TSG struct pointer from file descriptor. Also store profiler object pointer into TSG struct. Enable NVGPU_SUPPORT_PROFILER_V2_DEVICE capability on gv11b and tu104. Note that this is not yet enabled for vGPU. Keep NVGPU_SUPPORT_PROFILER_V2_CONTEXT capabiity disabled since this will take longer to support. Add new IOCTL NVGPU_PROFILER_IOCTL_UNBIND_CONTEXT so that userspace can explicitly unbind the context and release the resources before closing the profiler descriptor. Add context_init flag to profiler object for book keeping. Bug 2510974 Jira NVGPU-5360 Change-Id: Ie07e0cfd5a9da9d80008f79c955c7ef93b4bc60f Signed-off-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2384354 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Tejal Kudav	ab2b0b5949	gpu: nvgpu: Set unserviceable flag early during RC During recovery, we set ch->unserviceable at the end after we preempt the TSG and reset the engines. It might be too late and user-space might submit more work to the broken channel which is not desirable. Move setting this unserviceable flag right at the start of recovery sequence. Another thread doing a submit can still read the unserviceable flag just before it is set here, leaving that submit stuck if recovery completes before the submit thread advances enough to set up a post fence visible for other threads. This could be fixed with a big lock or with a double check at the end of the submit code after the job data has been made visible. We still release the fences, semaphore and error notifier wait queues at the end; so user-space would not trigger channel unbind while channel is being recovered. Also, change the handle_mmu_fault APIs to return void as the debug_dump return value is not used in any of the caller APIs. JIRA NVGPU-5843 Change-Id: Ib42c2816dd1dca542e4f630805411cab75fad90e Signed-off-by: Tejal Kudav <tkudav@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2385256 Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:13:28 -06:00
Richard Zhao	98264f7505	gpu: nvgpu: call gops.tsg.unbind_channel on fail path When current context is busy, nvgpu_tsg_unbind_channel_common may fail because of preemption failed. In such case, the .unbind_channel hal still need to be called to notify vserver that the channel will be removed from tsg in teardown path. Bug 2833924 Signed-off-by: Richard Zhao <rizhao@nvidia.com> Change-Id: I9996202485429b4d9cba0c2f985f8e55fcdd3f29 Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361977 Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com> Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com> Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com> Reviewed-by: Konsta Holtta <kholtta@nvidia.com> Reviewed-by: automaticguardword <automaticguardword@nvidia.com> Reviewed-by: Deepak Nibade <dnibade@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com> Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> GVS: Gerrit_Virtual_Submit	2020-12-15 14:13:28 -06:00
Vedashree Vidwans	2fb56f2cea	gpu: nvgpu: add bvec check for common.fifo input This patch adds boundary value check for common.fifo parameters as listed below. 1. nvgpu_channel_setup_bind() includes a condition to check that value of num_gpfifo_entries does not exceed 2^31. Otherwise prints message and returns error. 2. nvgpu_tsg_bind_channel() includes a condition to check if channel subctx had ASYNC id. If true, runqueue selector is set to 1 and 0 otherwise. This check is to be moved from devctl to common.fifo. Jira NVGPU-4817 Change-Id: Id1c9253945859c245e584b5c42b3285a6b620055 Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com> Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2278613 Tested-by: mobile promotions <svcmobile_promotions@nvidia.com> Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>	2020-12-15 14:10:29 -06:00

1 2 3 4

155 Commits