Refactor user managed syncpoints out of the channel sync infrastructure
that deals with jobs submitted via the kernel api. The user syncpt only
needs to expose the id and gpu address of the reserved syncpoint. None
of the rest (fences, priv cmdbufs) is needed for that, so it hasn't been
ideal to couple with the user-allocated syncpts.
With user syncpts now provided by channel_user_syncpt, remove the
user_managed flag from the kernel sync api.
This allows moving all the kernel submit sync code to be conditionally
compiled in only when needed, and separates the user sync functionality
in a more clear way from the rest with a minimal API.
[this is squashed with commit 5111caea601a (gpu: nvgpu: guard user
syncpt with nvhost config) from
https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325009]
Jira NVGPU-4548
Change-Id: I99259fc9cbd30bbd478ed86acffcce12768502d3
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321768
(cherry picked from commit 1095ad353f5f1cf7ca180d0701bc02a607404f5e)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319629
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Each submitted job has held a reference to the channel where the job
runs. This is not necessary: all that the refs do is prevent the channel
from getting freed before the jobs are done in case the channel file is
closed early. However, that is already taken care of, so remove the
per-job get/put pair.
The channel closure path needs to unbind the channel from its tsg if
that hasn't done by the channel's user. Unbind gets the channel off the
runlist and forces all fences to expire, then enqueues the channel for
final job cleanup. No jobs can outlive this.
Delete also the extra get/put pair in job cleanup. The caller (either
the channel worker thread or the submit path in case of deterministic
channels) will always hold a reference.
Jira NVGPU-4548
Change-Id: I3a01759e1b2caf66c46cff19f6557645489ca8f4
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2322541
(cherry picked from commit 8af6260b8fcfd7bf393f50addb681b5353cbae38)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2324255
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Channel debug_dump hal function does not involve
any register related code.
Move gv11b_channel_debug_dump hal function to
common code nvgpu_channel_info_debug_dump function.
Check gpu hw version to limit instance variables
dump that differs between socs.
Add new hal pointer syncpt_debug_dump for pbdma.
Jira NVGPU-5109
Signed-off-by: Vinod G <vinodg@nvidia.com>
Change-Id: Icfca837ce8e4117387cffa6fadf6c094c7da5946
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321016
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Enable build flags for dGPU in safety, when
NVGPU_FORCE_DGPU_SAFETY_PROFILE is set.
Use libnvgpu-dgpu_safe.exports for dGPU safety build.
Add build flags for tu104 HAL initialization (to solve
undefined symbols in safety build).
Temporarily add non-fusa files needed to build dGPU in safety.
related functions will have to move to fusa files.
Jira NVGPU-4611
Change-Id: I41db0c039c7f15d9191cdb811b4906e779d5cc88
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2310276
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- "include/trace/events/gk20a.h" file was having GPL2 license
(which should not used for QNX code). This file was used for
compiling linux userspace driver("libnvgpu-drv.so") and was used for
unit testing on QNX.
- This patch removes stubs in "include/trace/events/gk20a.h" file.
(which were used for linux userspace driver.)
- For QNX driver, "nvgpu_rmos/trace/events/gk20a.h" was used.
This patch moves that file to "include/nvgpu/posix/trace_gk20a.h" and
does relevant license change. This same file will be used for linux
userspace driver.
- This patch also creates a new file "include/nvgpu/trace.h" which
selects proper trace file depending on the config.
Bug 2802414
Change-Id: Icdfb251e5698073f986753a969e804161af3ecc5
Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2286388
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch adds boundary value check for common.fifo parameters as
listed below.
1. nvgpu_channel_setup_bind() includes a condition to check that value
of num_gpfifo_entries does not exceed 2^31. Otherwise prints message and
returns error.
2. nvgpu_tsg_bind_channel() includes a condition to check if channel
subctx had ASYNC id. If true, runqueue selector is set to 1 and 0
otherwise. This check is to be moved from devctl to common.fifo.
Jira NVGPU-4817
Change-Id: Id1c9253945859c245e584b5c42b3285a6b620055
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2278613
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
IRQs can get triggered during nvgpu power-on due to MMU fault, invalid
PRIV ring or bus access etc. Handlers for those IRQs can't access the
full state related to the IRQ unless nvgpu is fully powered on.
In order to let the IRQ handlers know about the nvgpu power-on state
gk20a.power_on_state variable has to be protected through spinlock
to avoid the deadlock due to usage of earlier power_lock mutex.
Further the IRQs need to be disabled on local CPU while updating the
power state variable hence use spin_lock_irqsave and spin_unlock_-
irqrestore APIs for protecting the access.
JIRA NVGPU-1592
Change-Id: If5d1b5e2617ad90a68faa56ff47f62bb3f0b232b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2203860
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This change eliminates MISRA Advisory Rule 18.4
violations in the following by accessing g->fifo.channel
with array indexing:
* nvgpu_channel_init_support()
* nvgpu_channel_semaphore_wakeup()
Advisory Rule 18.4 states that the +, -, +=, and -=
operators should not be applied to an expression
of pointer type.
JIRA NVGPU-3798
Change-Id: I6b1bf360db6ec25894cc0ea430c33067e0cddf64
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2207550
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add macros for whitelisting coverity violations. These macros use pragma
directives. The pragma directives and whitelisting macros are only
enabled when a coverity scan is being run.
The whitelisting macros have been added to a new header called
static_analysis.h. The contents of safe_ops.h (CERT C safe ops) have
been moved into static_analysis.h because this will be the new header
for static analysis related macros/defines/etc.
JIRA NVGPU-3820
Change-Id: I9c63f20f670880b420415535738034619314b7c3
Signed-off-by: Adeel Raza <araza@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2180600
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The following misra violations are fixed in the current patch.
1) misra_c_2012_directive_4_7_violation: Calling function
"nvgpu_report_host_err" which returns error information without testing
the error information.
2) misra_c_2012_directive_4_7_violation: The variable "intr_0_en_mask"
which contains error information hasn't been tested.
3) misra_c_2012_directive_4_7_violation: Calling function
"gv11b_fifo_intr_0_error_mask(g)" which returns error information
without testing the error information.
4) misra_c_2012_rule_8_6_violation: "gk20a_fifo_bar1_snooping_disable"
is declared but never defined.
5) misra_c_2012_rule_8_6_violation: "gm20b_fuse_check_priv_security" is
declared but never defined.
6) misra_c_2012_rule_8_6_violation: "gm20b_fuse_status_opt_gpc" is
declared but never defined.
Jira NVGPU-3881
Change-Id: I731cd1d99649e07cb39aa75c4715e17eedd4d927
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2188161
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
For safety build, nvgpu driver should enter SW quiesce state
in case an uncorrectable error has occurred. In this state, any
activity on the GPU should be prevented, without powering off the GPU.
Also, a minimal set of operations should be used to enter SW quiesce
state.
Entering SW quiesce state does the following:
- set sw_quiesce_pending: when this flag is set, interrupt
handlers exit after masking interrupts. This should help mitigate
an interrupt storm.
- wake up thread to complete quiescing.
The thread performs the following:
- set NVGPU_DRIVER_IS_DYING to prevent allocation of new resources
- disable interrupts
- disable fifo scheduling
- preempt all runlists
- set error notifier for all active channels
Note: for channels with usermode submit enabled, userspace can
still ring doorbell, but this will not trigger any work on
engines since fifo scheduling is disabled.
Jira NVGPU-3493
Change-Id: I639a32da754d8833f54dcec1fa23135721d8d89a
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2172391
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The following violations are fixed in this patch
a) misra_c_2012_rule_2_1_violation: This code cannot be reached: "return
err;".
b) misra_c_2012_directive_4_7_violation: Calling function
"nvgpu_preempt_channel(g, ch)" which returns error information without
testing the error information.
c) misra_c_2012_rule_8_6_violation: "" is declared but never defined for
following functions
1) gm20b_dump_engine_status
2) gp10b_ramfc_setup
3) gp10b_ramfc_get_syncpt
4) gp10b_ramfc_set_syncpt
5) gk20a_fifo_intr_0_enable
6) gk20a_fifo_intr_0_isr
7) gk20a_fifo_handle_sched_error
8) gk20a_fifo_is_mmu_fault_pending
9) gk20a_fifo_intr_set_recover_mask
10) gk20a_fifo_intr_unset_recover_mask
11) gk20a_init_fifo_reset_enable_hw
12) gk20a_init_fifo_setup_hw
13) nvgpu_tsg_set_runlist_interleave
14) gm20b_dump_engine_status
15) gp10b_pbdma_channel_fatal_0_intr_descs
16) gp10b_pbdma_allowed_syncpoints_0_index_f
17) gp10b_pbdma_allowed_syncpoints_0_valid_f
18) gp10b_pbdma_allowed_syncpoints_0_index_v
19) gk20a_runlist_reschedule
The above functions declarations are now embedded within
CONFIG_NVGPU_HAL_NON_FUSA
d) The function nvgpu_channel_abort_clean_up has a UMD version and hence
its taken out of CONFIG_NVGPU_KERNEL_MODE_SUBMIT to avoid errors of
type c above.
Jira NVGPU-3881
Change-Id: I5f85c7070e1d2f0b18d14db07ce22a01c29f0e40
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2181032
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
CTS test dEQP-VK.api.object_management.max_concurrent.device_group
crashes with invalid userspace memory access.
Currently, nvgpu_submit_prepare_syncs() races with
nvgpu_channel_clean_up_jobs() and this race condition is exposed when
aggressive_sync_destroy_thresh is set to non-zero value.
nvgpu_submit_prepare_syncs() gets ref for c->sync to submit job and
releases channel sync_lock. Meanwhile, nvgpu_worker_poll_work()
triggers nvgpu_channel_clean_up_jobs(), which destroys ref'd c->sync
pointer.
This patch protects channel's sync pointer by holding channel sync_lock
during complete execution of nvgpu_submit_prepare_syncs().
Bug 2613870
Change-Id: I6f3d48aff361d1cb38c30d2ce5de276d0c55fb6f
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2176929
Reviewed-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Advisory Rule 2.7 states that there should be no unused
parameters in functions.
This patch removes unused function parameters from the following:
* nvgpu_channel_ctxsw_timeout_debug_dump_state()
* nvgpu_channel_destroy()
* nvgpu_tsg_destroy()
* nvgpu_rc_pdbma_fault()
Jira NVGPU-3178
Change-Id: I12ad0d287fd7980533663a9776428ef5d4fd1fb9
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2176066
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Rule INT31-C states the following
Casting "" from "" to "" without checking its value may result in lost
or misinterpreted data.
This is fixed by changing the BIT64 to BIT32 for constructing the
bitmask for ch->runlist_id variable as ch->runlist_id already belongs
to a u32 type.
Jira NVGPU-3881
Change-Id: Ie6c36ef9db995b76f6a8866783f7736e2e97721a
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2169133
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add documentation for channel functions used in safety build.
Made nvgpu_channel_commit_va return void, as it can never fail.
Removed nvgpu_channel_recover prototype (function moved to rc unit).
Compile out refcount tracking definitions when
GK20A_CHANNEL_REFCOUNT_TRACKING is disabled.
Removed unused nvgpu_channel_set_ctx_mmu_error function.
Jira NVGPU-3588
Change-Id: Ia65a6c60ff30837230d81ca0e5f6dadafcc3af4e
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2159674
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This change switches nvgpu_timeout_peek_expired() to return a bool
instead of an int to remove advisory rule MISRA 10.5 violations.
MISRA 10.5 states that the value of an expression should not be
cast to an inappropriate essential type.
JIRA NVGPU-3798
Change-Id: I5cf9badaf07493e11a639e47ae4cf221700134ff
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2155617
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
There was a name clash between the nvgpu_set_error_notifier*() APIs and
the SET_ERROR_NOTIFIER IOCTL. Therefore, the APIs were renamed from
nvgpu_set_error_notifier*() to nvgpu_set_err_notifier*(). This rename
was done to fix MISRA 5.x errors.
JIRA NVGPU-1633
Change-Id: I06af551a664b0706f106e853f1ea8733894f11bd
Signed-off-by: Adeel Raza <araza@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2159813
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The following functions belong to the path of kernel_mode submit and
the flag CONFIG_NVGPU_KERNEL_MODE_SUBMIT is used to compile these out
of safety builds.
channel_gk20a_alloc_priv_cmdbuf
channel_gk20a_free_prealloc_resources
channel_gk20a_joblist_add
channel_gk20a_joblist_delete
channel_gk20a_joblist_peek
channel_gk20a_prealloc_resources
nvgpu_channel
nvgpu_channel_add_job
nvgpu_channel_alloc_job
nvgpu_channel_alloc_priv_cmdbuf
nvgpu_channel_clean_up_jobs
nvgpu_channel_free_job
nvgpu_channel_free_priv_cmd_entry
nvgpu_channel_free_priv_cmd_q
nvgpu_channel_from_worker_item
nvgpu_channel_get_gpfifo_free_count
nvgpu_channel_is_prealloc_enabled
nvgpu_channel_joblist_is_empty
nvgpu_channel_joblist_lock
nvgpu_channel_joblist_unlock
nvgpu_channel_kernelmode_deinit
nvgpu_channel_poll_wdt
nvgpu_channel_set_syncpt
nvgpu_channel_setup_kernelmode
nvgpu_channel_sync_get_ref
nvgpu_channel_sync_incr
nvgpu_channel_sync_incr_user
nvgpu_channel_sync_put_ref_and_check
nvgpu_channel_sync_wait_fence_fd
nvgpu_channel_update
nvgpu_channel_update_gpfifo_get_and_get_free_count
nvgpu_channel_update_priv_cmd_q_and_free_entry
nvgpu_channel_wdt_continue
nvgpu_channel_wdt_handler
nvgpu_channel_wdt_init
nvgpu_channel_wdt_restart_all_channels
nvgpu_channel_wdt_restart_all_channels
nvgpu_channel_wdt_rewind
nvgpu_channel_wdt_start
nvgpu_channel_wdt_stop
nvgpu_channel_worker_deinit
nvgpu_channel_worker_from_worker
nvgpu_channel_worker_init
nvgpu_channel_worker_poll_init
nvgpu_channel_worker_poll_wakeup_post_process_item
nvgpu_channel_worker_poll_wakeup_process_item
nvgpu_submit_channel_gpfifo_kernel
nvgpu_submit_channel_gpfifo_user
gk20a_userd_gp_get
gk20a_userd_pb_get
gk20a_userd_gp_put
nvgpu_fence_alloc
The following members of struct nvgpu_channel are compiled out of
safety build.
struct gpfifo_desc gpfifo;
struct priv_cmd_queue priv_cmd_q;
struct nvgpu_channel_sync *sync;
struct nvgpu_list_node worker_item;
struct nvgpu_channel_wdt wdt;
The following files are compiled out of safety build.
common/fifo/submit.c
common/sync/channe1_sync_semaphore.c
hal/fifo/userd_gv11b.c
Jira NVGPU-3479
Change-Id: If46c936477c6698f4bec3cab93906aaacb0ceabf
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2127212
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Safety build does not support vidmem. This patch compiles out vidmem
related changes - vidmem, dma alloc, cbc/acr/pmu alloc based on
vidmem and corresponding tests like pramin, page allocator &
gmmu_map_unmap_vidmem..
As vidmem is applicable only in case of DGPUs the code is compiled
out using CONFIG_NVGPU_DGPU.
JIRA NVGPU-3524
Change-Id: Ic623801112484ffc071195e828ab9f290f945d4d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2132773
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove below hals since the corresponding functions are same on all
platforms and they are h/w independent
g->ops.gr.enable_ctxsw()
g->ops.gr.disable_ctxsw()
g->ops.gr.reset()
Call the functions directly at all places
Remove CONFIG_NVGPU_DEBUGGER from places where these functions are
called since they are not debugger dependent
This also helps to disable CONFIG_NVGPU_DEBUGGER and to keep recovery
sequence intact
Jira NVGPU-3506
Change-Id: Id2b208ca23dc4667e78edcd8ad242a8558e0ff64
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2137255
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add branch coverage for:
- nvgpu_channel_kill
- nvgpu_channel_close
Also, modified gk20a_free_channel as follows:
- use nvgpu_assert to check ch->g (so that it can be tested)
- make sure g is non-NULL before calling nvgpu_get_poll_timeout
Jira NVGPU-3480
Change-Id: Ie1fa231b022da47b9ef9022ae67a6b3d73c28a8b
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2129724
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>