Add new HAL gops.mc.isr_nonstall() to handle nonstall interrupts
We already handle nonstall interrupts in nvgpu_intr_nonstall()
But this API is completely in linux specific code
Separate out os-independent code to handle nonstall interrupts in new API
mc_gk20a_isr_nonstall() and set it to HAL gops.mc.isr_nonstall() for all
existing chips
Call this HAL from nvgpu_intr_nonstall()
Jira NVGPUT-8
Change-Id: Iec6a56db03158a72a256f7eee8989a0a8a42ae2f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1706589
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Implement a worker thread to replace the workqueue based
handling for clk arbiter update callbacks.
The work items scheduled with the thread are of two types,
update_vf_table and arb_update. Previously, there were two
workqueues for handling these two work struct's separately.
Now the single worker thread would process these two events.
If a work item of a particular type is scheduled to be run
on the worker, another instance of same type won't be
scheduled, which mirrors the linux workqueue behavior.
This also removes dependency on linux workqueues/work struct
and makes code portable to be used by QNX also.
Jira VQRM-3741
Change-Id: Ic27ce718c62c7d7c3f8820fbd1db386a159e28f2
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1706032
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
RCU's are available only in (linux) kernel. Though they are
able to achieve lockless access in some specific scenarios,
they are heavily dependent on the kernel for their functionality.
E.g. synchronize_rcu(), which depends on the kernel in order to
delimit read side critical sections.
As such it is very difficult to implement constructs analogous
to RCUs in userspace code. As a result the code which depends on
RCU's for synchronization is not portable between OS'es,
especially if one of them is in userspace, viz. QNX.
Also, if the code is not in performance critical path, we can do
with non-RCU constructs.
For clk arbiter code here, RCU's are replaced by the traditional
spinlocks, so that the code could be used by QNX down the line.
Jira VQRM-3741
Change-Id: I178e5958788c8fd998303a6a94d8f2f328201508
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1705535
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In order to enable the movement of clk arbitrator to common
code, we need to remove the NVGPU_GPU_EVENT_* defines (which
are present in uapi) and instead use the common code defines.
Add a conversion function for the same.
With this the uapi header is no longer required to be included
inside clk_arb.c
Jira VQRM-3741
Change-Id: If01614b01733876046f98b97e70285c52bc33e45
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1699241
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- total DMA memory allocation is currently tracked by adding page aligned
size of nvgpu_mem
- The sequence is roughly as follows:
- total dma memory used += mem->aligned_size
- mem->aligned_size = PAGE_ALIGN(size)
- In above sequence, nvgpu_mem structure is initially zero when it is added
to total dma memory used after which it is assigned page aligned value
- This patch fixes total dma memory usage tracking.
Change-Id: Ibb879c8d38ae9077c3d198d9bb008a72e9208b4d
Signed-off-by: Deepak Bhosale <dbhosale@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1685312
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Clk arbiter code contains two significant portions -
the one which interacts with userspace and is OS specific,
and the other which does the heavylifting work which can
be moved to the common OS agnostic code.
Split the code into two files in prep towards refactoring
the clk arbiter.
Jira VQRM-3741
Change-Id: I47e2c5b18d86949d02d6963c69c2e2ad161626f7
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1699240
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Cache the rate used in clk_set_rate().
Return that cached rate on clk_get_rate(), don't read from hardware.
This cached rate is used to avoid duplicate requests to clk_set_rate().
Motivation is to support multiple governors for gpu clk.
Reading clock from hardware is unreliable in multi-governor situation.
Relying on hardware clock value could mislead the kernel gpu governor
in its scaling calculations.
Bug 2051688
Change-Id: I43fc056eea6f69fe0889c45640fcb892b658071c
Signed-off-by: Arun Kannan <akannan@nvidia.com>
(cherry picked from commit 7f819a9ba7)
Reviewed-on: https://git-master.nvidia.com/r/1662759
Reviewed-on: https://git-master.nvidia.com/r/1668919
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch deals with cleanups meant to make things simpler for the
upcoming os abstraction patches for the sync framework. This patch
causes some substantial changes which are listed out as follows.
1) sync_timeline is moved out of gk20a_fence into struct
nvgpu_channel_linux. New function pointers are created to facilitate os
independent methods for enabling/disabling timeline and are now named
as os_fence_framework. These function pointers are located in the struct
os_channel under struct gk20a.
2) construction of the channel_sync require nvgpu_finalize_poweron_linux()
to be invoked before invocations to nvgpu_init_mm_ce_context(). Hence,
these methods are now moved away from gk20a_finalize_poweron() and
invoked after nvgpu_finalize_poweron_linux().
3) sync_fence creation is now delinked from fence construction and move
to the channel_sync_gk20a's channel_incr methods. These sync_fences are
mainly associated with post_fences.
4) In case userspace requires the sync_fences to be constructed, we
try to obtain an fd before the gk20a_channel_submit_gpfifo() instead of
trying to do that later. This is used to avoid potential after effects
of duplicate work submission due to failure to obtain an unused fd.
JIRA NVGPU-66
Change-Id: I42a3e4e2e692a113b1b36d2b48ab107ae4444dfa
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1678400
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
As part of debug session unification following changes are
required.
-Including bug.h header file to fix the compilation issue
on QNX
- The mechanism of posting debug events is OS specific. In Linux
this works through poll fd, wherein we can make use of nvgpu_cond
variables to poll and trigger the corresponding wait_queue
via nvgpu_cond_broadcast_interruptible() call.
The post event functionality on QNX doesn't work on poll though.
It uses iofunc_notify_trigger to post the debug events to calling
process. As such QNX can't work with nvgpu_cond's.
To overcome this issue, it is proposed to create a OS specific
interface for posting debugger events. Linux can call
nvgpu_cond_broadcast_interruptible() in its implementation, which
makes sense since these are already initialized and poll'ed in the
Linux specific code only.
QNX can implement this interface to call iofunc_notify_* functions,
as per its need
Jira VQRM-2363
Change-Id: I0abdc0787f771040b8aff5384290d7e6549f81fb
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Signed-off-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1696368
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, hyp_read_ipa_pa_info() only translates IPA for RAM
mappings. It fails for MMIO mappings. In particular, it will
fail when attempting to translate addresses in the syncpoint
shim aperture. As a workaround, assume 1:1 IPA to PA mapping
when hyp_read_ipa_pa_info fails, and address is in syncpt
shim aperture.
Bug 2096877
Change-Id: I5267f0a8febf065157910ad3408374cacd398731
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1687796
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
linux driver runs in user's process but qnx driver has dedicate driver
process, so they have different way to get user pid. nvgpu common code
expect calls from os specific code pass pid/tid.
ce/cde open channel for internal use, we use driver pid.
Jira VQRM-3534
Change-Id: I892372ac5f1dc4d25f9928d16992bcc659d12a56
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1694145
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The submit gpfifo flags are splattered everywhere inside the nvgpu
code. Though the usage is inside nvgpu Linux code only, still it
needs to be gotten rid of and replaced with the defines
present in common code.
VQRM-3465
Change-Id: I901b33565b01fa3e1f9ba6698a323c16547a8d3e
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1691979
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove the usage of nvgpu_gpfifo splattered across nvgpu,
and replace with a struct defined in common code.
The usage is still inside Linux, but this helps the
subsequent unification efforts, e.g. to unify the submit
path.
VQRM-3465
Change-Id: I9e5ac697a0c7f85239ddba319085c09481d20d6b
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1691978
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove the usage of nvgpu_fence splattered across nvgpu,
and replace with a struct defined in common code.
The usage is still inside Linux, but this helps the
subsequent unification efforts, e.g. to unify the submit
path.
VQRM-3465
Change-Id: Ic3737450123dfc5e1c40ca5b6b8d8f6b3070aa0d
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1691977
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Enabled internal temperature sensor read for gv100
dgpu.
- Added check to temperature read support before
proceeding to read temperature from H/W
- Assigned GP106 temperature HAL's for GV100 as no changes
between GP106 & GV100 H/W registers.
Bug 200352328
Change-Id: I86b5a1859b87ace49a07d0ff3749bb5b085bba91
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1673347
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The pmgr code is in theory common code. However there were uses
of Linux stuff within this code.
This patch cleans that up by deleting the unnecessary os_linux.h
includes, usage of kfree() and adds several platform fields to
the gk20a struct. The platform data is copied to the gk20a struct
in the platform initialization code so that this common code can
access said data without requiring any knowledge of the OS platform
data.
JIRA NVGPU-525
Change-Id: Ic4bb6021f60b0a0778779ab5f3e15b7e5ca98306
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1673825
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The patch defines 'struct nvgpu_gpfifo_args' to be filled
by alloc_gpfifo(_ex) ioctls and passed to the
gk20a_channel_alloc_gpfifo function. This is required as a
prep towards having the usermode submission support in the
core channel core.
Change-Id: I72acc00cc5558dd3623604da7d716bf849f0152c
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1683391
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
BIT() is defined as returning a 64-bit value. We use it to create the
log mask values, but the functions that accept log mask take only
u32 as parameter.
Use u64 as log mask parameter for the logging functions to match the
sizes.
Change-Id: I6f0803a7d04ee6a2ee725b5defc4cc14b5b7acf5
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1683818
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
When runtime pm is disabled, then gpu rail will be on as soon as
nvgpu module is loaded. If pm suspend/resume called before gpu
hw initialization(g->poweron = false) then pm suspend is skipping
gpu railgate, which is causing issues with SC7 entry/exit.
To fix this issue:
1. During pm suspend, if g->poweron is false, check for runtime pm
disable to railgate gpu rail.
2. On pm resume, check for runtime pm disable to enable gpu rail,
though gpu driver not initialized.
Bug 2073029
Change-Id: I7631109d79cda5882d2864557f1b7b3d2d89c9f6
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1679010
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Accept submits on deterministic channels even when the prefence is a
syncfd, but only if it has just one fence inside.
Because NVGPU_SUBMIT_GPFIFO_FLAGS_SYNC_FENCE is shared between pre- and
postfences, a postfence (SUBMIT_GPFIFO_FLAGS_FENCE_GET) is not allowed
at the same time though.
The sync framework is problematic for deterministic channels due to
certain allocations that are not controlled by nvgpu. However, that only
applies for postfences, yet we've disallowed FLAGS_SYNC_FENCE for
deterministic channels even when a postfence is not needed.
Bug 200390539
Change-Id: I099bbadc11cc2f093fb2c585f3bd909143238d57
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1680271
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MAX/threshold value of user managed syncpoint is not tracked by nvgpu
So if channel is reset by nvgpu there could be waiters still waiting on some
user syncpoint fence
Fix this by setting a large safe value to user managed syncpoint when aborting
the channel and when closing the channel
We right now increment the current value by 0x10000 which should be sufficient
to release any pending waiter
Bug 200326065
Jira NVGPU-179
Change-Id: Ie6432369bb4c21bd922c14b8d5a74c1477116f0b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1678768
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Delete the proxy waiter for non-semaphore-backed syncfds in sema wait
path to simplify code, to remove dependencies to the sync framework (and
thus Linux) and to support upcoming refactorings. This feature has never
been used for actually foreign fences.
Jira NVGPU-43
Jira NVGPU-66
Change-Id: I2b539aefd2d096a7bf5f40e61d48de7a9b3dccae
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1665119
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
get_cycles is a linux specific API used in common code. This API
is being used, it seems, as a method to generate time stamps. So
add an API to generate 'high resolution' time stamps. This API
returns an opaque time stamp: that is not something one may use
directly as a time since in the Linux implementation we just use
this cycle counter.
Other implementations will, of course, be free to implement as a
real time stamp.
JIRA NVGPU-525
Change-Id: I237aac9bd6c795d000459025bdb4fce92e8aaa3d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1673811
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>