When an engine faults due to an unbound instance block, all
active TSGs are currently aborted. This includes the TSG
used by the vidmem-clear task to clear vidmem buffers. From
that point on, nvgpu_vidmem_clear cannot submit jobs anymore.
Define the TSG in the MM CE context as non-abortable, and skip
it when aborting active TSGs.
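A minimal sketch of the idea (struct and function names are illustrative,
not the actual nvgpu API): the TSG owned by the MM CE context carries a
non-abortable flag that the abort loop honors.

    #include <stdbool.h>

    struct tsg {
        bool in_use;
        bool abortable;  /* false for the vidmem-clear (MM CE) TSG */
    };

    /* Abort all active TSGs except those marked non-abortable. */
    static void abort_active_tsgs(struct tsg *tsgs, int count)
    {
        for (int i = 0; i < count; i++) {
            if (!tsgs[i].in_use || !tsgs[i].abortable)
                continue;  /* keep the vidmem-clear TSG able to submit */
            /* abort the TSG here */
        }
    }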
Bug 2486146
Change-Id: I221259aec468e8ee3a24e80fab8d8fb7ee8607b0
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2008954
(cherry picked from commit 6f2444dc5e128aa2b870796bd1e9dee7853f90af)
Reviewed-on: https://git-master.nvidia.com/r/2008942
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
gk20a_fifo_update_runlist_locked() and the vgpu counterpart do three
things:
1. find out whether there's a channel to add or remove (if not, the
whole runlist is just reconstructed or cleared),
2. reconstruct the runlist format for hardware (or the vgpu message),
3. actually update the runlist to hw and maybe wait for finish.
Split out the first two operations into separate functions to make the
code easier to understand. Now it is also clearer that the "add"
parameter behaves completely differently depending on whether the
channel pointer is NULL or not.
Also ignore (with a warning) channels not bound to a TSG. We shouldn't
get runlist updates on such channels. This simplifies the control flow a
bit.
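A rough sketch of the resulting shape (names and signatures are
hypothetical, not the real gk20a functions): each step becomes its own
helper, and the "add" parameter only matters when a channel is given.

    #include <stdbool.h>
    #include <stddef.h>

    struct runlist;
    struct channel;

    /* Step 1: add or remove the given channel from the active set (stub). */
    static int runlist_modify_active(struct runlist *rl, struct channel *ch,
                                     bool add)
    {
        (void)rl; (void)ch; (void)add;
        return 0;
    }

    /* Step 2: rebuild the hardware (or vgpu) runlist format (stub). */
    static int runlist_reconstruct(struct runlist *rl)
    {
        (void)rl;
        return 0;
    }

    /* Step 3: commit to hardware and optionally wait (stub). */
    static int runlist_commit(struct runlist *rl, bool wait_for_finish)
    {
        (void)rl; (void)wait_for_finish;
        return 0;
    }

    static int runlist_update(struct runlist *rl, struct channel *ch,
                              bool add, bool wait_for_finish)
    {
        int err;

        if (ch != NULL) {
            err = runlist_modify_active(rl, ch, add);
            if (err != 0)
                return err;
        }
        /* ch == NULL: the whole runlist is reconstructed or cleared. */
        err = runlist_reconstruct(rl);
        if (err != 0)
            return err;

        return runlist_commit(rl, wait_for_finish);
    }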
Jira NVGPU-1309
Jira NVGPU-1922
Change-Id: I478c33eb2bd154c05a6c8c4148e4fea528a39a3e
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2007473
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add error prints for DMA alloc failures so that when a DMA alloc
fails, the failure is clearly reported. Without this it is hard to
know what exactly is causing any given -ENOMEM issue or what the
specifics of that -ENOMEM case are.
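A small sketch of the kind of print added (the helper is hypothetical,
with malloc standing in for the real DMA allocator):

    #include <stdio.h>
    #include <stdlib.h>

    static void *dma_alloc_logged(size_t size, unsigned long flags)
    {
        void *buf = malloc(size);  /* stand-in for the real DMA allocator */

        if (buf == NULL)
            fprintf(stderr, "dma alloc failed: size=%zu flags=0x%lx\n",
                    size, flags);
        return buf;
    }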
Change-Id: Ia535895ae07bc1704edaed564edbb6f6dfbf6518
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1976441
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Instead of using kmalloc() to get a buffer for storing the
computed flags string during DMA debugging use a stack
buffer. This removes the need for a kmalloc() call. The
problem with kmalloc() is that if a dma_alloc() fails because the
system is out of memory, the kmalloc() will likely fail, too!
Also simplify the logic now that there's no need to do
any error checking for a kmalloc() call.
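A minimal sketch of the approach (flag bits and names are made up, not
the real nvgpu DMA flags): the string is built on the stack, so it still
works when the allocator itself is out of memory.

    #include <stdio.h>

    #define FLAGS_STR_LEN 64

    static void dma_flags_print(unsigned long flags)
    {
        char flags_str[FLAGS_STR_LEN];

        (void)snprintf(flags_str, sizeof(flags_str), "%s%s",
                       (flags & 0x1UL) ? "NO_KERNEL_MAPPING " : "",
                       (flags & 0x2UL) ? "READ_ONLY " : "");
        printf("dma flags: %s\n", flags_str);
    }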
Change-Id: I45c1fd16658212188a1206a2edf17b28f3c06c9e
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1976440
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Remove handling for channels that are no longer bound to a TSG,
as a channel could be referenceable but no longer part of a TSG
- Use tsg_gk20a_from_ch to get a pointer to the TSG for a given channel
- Clear unhandled gr interrupts
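A rough sketch of the lookup-and-skip pattern (types and helper names are
illustrative; tsg_gk20a_from_ch is the real helper the change uses, the
rest is not):

    #include <stddef.h>
    #include <stdio.h>

    struct tsg { int tsgid; };
    struct channel { struct tsg *tsg; };

    static struct tsg *tsg_from_ch(struct channel *ch)
    {
        return ch->tsg;  /* NULL when the channel left its TSG */
    }

    static void handle_gr_intr(struct channel *ch)
    {
        struct tsg *tsg = tsg_from_ch(ch);

        if (tsg == NULL) {
            fprintf(stderr, "channel no longer bound to a tsg, skipping\n");
            return;
        }
        /* recover/abort at TSG granularity here */
    }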
Bug 2429295
JIRA NVGPU-1580
Change-Id: I9da43a2bc9a0282c793b9f301eaf8e8604f91d70
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1972492
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move code involved in nvlink interrupt and error handling and
initialization into a separate unit under subelement 'nvlink'.
Add g->ops.nvlink.intr_err ops to allow other units to access
the APIs exposed by this unit.
JIRA NVGPU-1813
Change-Id: I2d90cf1394faa0692630514b6a3cea15f5e105ae
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1997732
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Earlier, with the DMEM queue, if a command needed an in/out payload,
space had to be allocated in DMEM/FB-surface and the payload copied
into that space before sending the command, with the payload info
provided as part of the command.
- With FBQ, the command's in/out payload is part of the FB command
queue element itself, so no separate space needs to be allocated in
DMEM/FB-surface. Added changes to handle FBQ payload requests while
sending a command, and in the response handler to extract data from
the out payload.
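A simplified sketch of the FBQ element idea (sizes and layout are invented
for illustration, not the real FBQ format): the payload travels inside the
queue element, so no DMEM/FB-surface allocation is needed.

    #include <stdint.h>
    #include <string.h>

    struct fbq_element {
        uint8_t cmd[64];       /* command header/body */
        uint8_t payload[192];  /* in/out payload, part of the element */
    };

    static int fbq_write_cmd(struct fbq_element *elem,
                             const void *cmd, size_t cmd_size,
                             const void *payload, size_t payload_size)
    {
        if (cmd_size > sizeof(elem->cmd) ||
            payload_size > sizeof(elem->payload))
            return -1;

        memcpy(elem->cmd, cmd, cmd_size);
        memcpy(elem->payload, payload, payload_size);
        return 0;
    }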
JIRA NVGPU-1579
Bug 2487534
Change-Id: If3ca724c2776dc57bf6d394edf9bc10eaacd94f9
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2004021
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Added the NVGPU_SUPPORT_PMU_RTOS_FBQ feature flag to enable
FBQ support.
- Added support to read the PMU RTOS init message from the
FBQ message queue, process the init message, and
construct the FBQ for further communication
with the PMU RTOS ucode.
- Added functions to init the FB command/message queues
as per the init message inputs from the PMU RTOS ucode.
JIRA NVGPU-1578
Bug 2487534
Change-Id: Ib6c2b26af3927339bd4fb25396350b3f4d222737
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2004020
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- FBQ (command/message queue) will be part of the super surface,
which resides in FB.
- FBQ access should happen through the falcon generic queue functions,
so FBQ-related functions were added under the falcon queue as needed
to support FBQ.
- Additional FBQ-related public functions are exposed because the command
buffer is constructed in a sysmem buffer that is later copied to the actual
FBQ element buffer, and because the payload is part of the queue element,
which requires the client to access some queue parameters.
JIRA NVGPU-1577
Bug 2487534
Change-Id: I1b9d9d999e73af3dabe54240c16676ff8d0c21fa
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2004019
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
gv11b_mm_l2_flush was not checking error codes from the various
functions it was calling. MISRA Rule-17.7 requires the return value
of all functions to be used. This patch now checks return values and
propagates the error upstream.
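A minimal sketch of the pattern (helper names are stand-ins for the real
flush calls): every result is checked and propagated instead of dropped.

    #include <stdbool.h>

    static int flush_l2_dirty(void) { return 0; }  /* stub */
    static int invalidate_l2(void)  { return 0; }  /* stub */

    static int l2_flush(bool invalidate)
    {
        int err = flush_l2_dirty();

        if (err != 0)
            return err;

        if (invalidate) {
            err = invalidate_l2();
            if (err != 0)
                return err;
        }
        return 0;
    }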
JIRA NVGPU-677
Change-Id: I9005c6d3a406f9665d318014d21a1da34f87ca30
Signed-off-by: Nicolas Benech <nbenech@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1998809
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Make a physical nvgpu_mem implementation in the common code. This
implementation assumes a single, contiguous, physical range. GMMU
mappability is provided by building a one entry SGT.
Since this is now "common" code the original Linux code has been
moved to common/mm/nvgpu_mem.c.
Also drop the '__' prefix from the nvgpu_mem function. The prefix is not
necessary, as this function, although somewhat tricky, is expected
to be used by arbitrary users within the nvgpu driver.
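A sketch of the one-entry SGT idea (structure names are illustrative, not
the nvgpu SGT API): a single contiguous physical range is described by a
single entry, which is all the GMMU needs to map such a buffer.

    #include <stdint.h>
    #include <stddef.h>

    struct sgt_entry {
        uint64_t phys_addr;
        uint64_t length;
    };

    struct sgt {
        struct sgt_entry entries[1];
        size_t nr_entries;
    };

    static void sgt_init_phys(struct sgt *sgt, uint64_t phys, uint64_t len)
    {
        sgt->entries[0].phys_addr = phys;
        sgt->entries[0].length = len;
        sgt->nr_entries = 1;
    }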
JIRA NVGPU-1029
Bug 2441531
Change-Id: I42313e5c664df3cd94933cc63ff0528326628683
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1995866
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In the below scenario:
1. The nvrm app requests and gets all VF points from nvgpu.
2. nvrm stores all the VF points and starts setting each point.
3. During step 2, the VF table gets updated in nvgpu due to some events.
4. There is a mismatch between the points in nvrm and the VF table in nvgpu.
5. If the nvrm frequency is less than the nvgpu frequency, the PMU cannot
program it.
Make sure the higher of the nvrm and VF table frequencies goes to the PMU.
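The fix boils down to taking the maximum of the two frequencies, sketched
here with made-up names:

    /* Program the PMU with the higher of the frequency requested by nvrm
     * and the current VF table frequency in nvgpu. */
    static unsigned long resolve_vf_freq_khz(unsigned long nvrm_khz,
                                             unsigned long vf_table_khz)
    {
        return (nvrm_khz > vf_table_khz) ? nvrm_khz : vf_table_khz;
    }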
Bug 200454682
Change-Id: I9c58f129ff1c0dfb3f4759242469b3622fe11bb2
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2000238
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add a new HAL to log the MME exception register information. Support
is added for Turing only. On an MME exception interrupt, read the
mme_hww_esr register and log the error based on the esr register bits.
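A sketch of the decode pattern (bit positions and messages are invented;
only the read-once-then-log-per-bit structure is the point):

    #include <stdint.h>
    #include <stdio.h>

    static void mme_report_exception(uint32_t hww_esr)
    {
        if (hww_esr == 0u) {
            fprintf(stderr, "mme exception with no esr bits set\n");
            return;
        }
        if (hww_esr & (1u << 0))
            fprintf(stderr, "mme: instruction ram access violation\n");
        if (hww_esr & (1u << 1))
            fprintf(stderr, "mme: illegal opcode\n");
    }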
JIRA NVGPU-1241
Change-Id: Ied3db0cc8fe6e2a82ecafc9964875e2686ca0d72
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2005807
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move the code involved in nvlink register initialization into a
separate unit called "nvlink_device_reginit".
Nvlink device_reginit will be a unit under the component nvlink_init.
TLC buffer credit initialization is done by this unit.
JIRA NVGPU-1784
Change-Id: I9dd4238d0288b33867eb8a8993e56287a67a907f
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1994665
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
1. The nvlink code in common/ is clean of any external API usage.
There should not be any compilation issues with the POSIX build if we
include nvlink.c in it.
2. Rename the nvlink file in the POSIX build to avoid the tmake
duplicate filename issue.
3. Set CONFIG_TEGRA_NVLINK for POSIX to enable reporting of MISRA
violations in nvlink code by the MISRA scanner.
4. To fix the build issues:
a. Add stubs in POSIX.
b. Return the 'err' variable set during dev_shutdown(), as 'err'
is set but not used.
JIRA NVGPU-1921
JIRA NVGPU-1319
Change-Id: Ifdd6574d772167856782bafa74994507b3cedf4c
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2005622
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA Rule-17.7 requires the return value of all functions to be used.
The fix is either to use the return value or to change the function to
return void. This patch fixes all 17.7 violations in gm20b files.
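The two allowed shapes of a Rule 17.7 fix, sketched with stand-in names:

    static int do_work(void) { return 0; }  /* stub */

    /* (a) Use the return value: check it and propagate it. */
    static int caller_checks(void)
    {
        int err = do_work();  /* previously called with the result dropped */

        return err;
    }

    /* (b) If the result carries no information, make the callee void
     * so there is nothing left to ignore. */
    static void do_logging(void)
    {
    }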
JIRA NVGPU-677
Change-Id: I63182d52213494f871c187b5efc1637bc36bdf3d
Signed-off-by: Nicolas Benech <nbenech@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2003230
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
With PREEMPT_RT kernel, regular spinlocks are mapped onto sleeping
spinlocks (rt_mutex locks), and raw spinlocks retain their behaviour.
A "scheduling while atomic" issue can occur in gk20a_channel_timeout_start,
as it acquires the ch->timeout.lock raw spinlock and then calls
functions that acquire the ch->ch_timedout_lock regular spinlock.
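A kernel-style sketch of the problematic nesting (lock names are
illustrative): on PREEMPT_RT the inner spinlock_t is an rt_mutex, so
taking it while a raw spinlock is held can sleep in atomic context.

    #include <linux/spinlock.h>

    static DEFINE_RAW_SPINLOCK(timeout_lock);  /* spins even on RT */
    static DEFINE_SPINLOCK(timedout_lock);     /* sleeping lock on RT */

    static void timeout_start(void)
    {
        raw_spin_lock(&timeout_lock);
        spin_lock(&timedout_lock);   /* may sleep on RT: schedule while atomic */
        spin_unlock(&timedout_lock);
        raw_spin_unlock(&timeout_lock);
    }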
Bug 200484795
Change-Id: Iacc63195d8ee6a2d571c998da1b4b5d396f49439
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2004100
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The commit 3fdd8e38b2 ("gpu: nvgpu: Use our own vmap() for
coherent DMA buffers") added an NVGPU_DMA_NO_KERNEL_MAPPING
flag for coherent chips to work around a memory mapping bug
suspected to come from the DMA API.
However, this requires the ARM64 dma-mapping code to support
the legacy DMA_NO_KERNEL_MAPPING attribute for DMA allocation,
which is unlikely to get upstreamed and is not sustainable
long term. So the plan is to remove this flag from the ARM64
side.
The results of 3D benchmarks and GVS sanity tests show that
the system has no stability regressions, and no mapping
issue was observed after removing this WAR. If GPU code
encounters a mapping issue in the future, it should be
fixed on the general DMA API side instead.
Bug 2424160
Change-Id: Ice91f2b2c924beb2f83762cb02efbd53fe7df1c0
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2001294
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move the code involved in the nvlink probe sequence into a separate
unit called "nvlink_probe".
The nvlink probe code is spread over both the common and OS-specific
nvlink files.
The nvlink probe unit encompasses the code needed to initialize the
nvlink software state. Nvlink software initialization involves:
1. Allocate memory for nvlink_device and nvlink_link structs
2. Read the device tree pci node to know about nvlink topology
3. Initialize nvlink function pointers needed by Tegra nvlink
core-driver
4. Register nvlink_device and nvlink_link with the core-driver.
The nvlink probe returns -ENODEV when nvlink is not supported.
Nvlink is not supported in two cases:
1. There is no nvlink IP on the Tegra SoC, which is indicated by
CONFIG_TEGRA_NVLINK, or
2. The PCI device tree node does not have the "nvidia,nvlink" child
node needed to describe the nvlink topology.
Any negative return value other than -ENODEV denotes a failure in the
execution of the nvlink probe.
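A condensed sketch of the return-value convention (helpers and parameters
are stand-ins for the real probe code):

    #include <errno.h>
    #include <stdbool.h>

    static int nvlink_probe(bool has_nvlink_ip, bool has_nvlink_dt_node)
    {
        if (!has_nvlink_ip || !has_nvlink_dt_node)
            return -ENODEV;  /* nvlink not supported, not a probe failure */

        /* allocate sw state, parse topology, register with the core driver;
         * any failure here returns a negative error other than -ENODEV */
        return 0;
    }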
JIRA NVGPU-1783
Change-Id: I50011b25d88d8cc01569caac7895abe32ee38215
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1994619
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
A naked channel ID does not carry good information about the channel
validity and is a very low level construct for an API of this level.
Refactor the runlist updating fifo APIs to take a channel pointer.
While at it, delete the channel and wait_for_finish parameters from
gk20a_fifo_update_runlist_ids() - the only caller is suspend and resume
and the parameters were always null for channel and true for wait.
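A sketch of the new call shape (names and return codes are illustrative):
the callee receives the channel pointer and can validate it directly
instead of trusting a naked ID.

    #include <stdbool.h>
    #include <stddef.h>

    struct tsg;
    struct channel {
        unsigned int chid;
        struct tsg *tsg;
    };

    static int runlist_update_for_channel(struct channel *ch, bool add,
                                          bool wait_for_finish)
    {
        if (ch == NULL || ch->tsg == NULL)
            return -1;  /* invalid or unbound channel, reject early */

        /* rebuild the runlist with/without ch and commit it */
        (void)add; (void)wait_for_finish;
        return 0;
    }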
Jira NVGPU-1309
Jira NVGPU-1737
Change-Id: Ied350bc8e482d8e311cc708ab0c7afdf315c61cc
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1997744
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
sema cmdbuf specific functions are only for the sync functionality
of nvgpu and do not belong to fifo.
Construct files sema_cmdbuf_gv11b.h and sema_cmdbuf_gv11b.c
under common/sync to contain the sema specific cmdbuf functions
for arch gv11b.
Jira NVGPU-1308
Change-Id: I440847e8b996e0956d81fe6cdde331937deda40e
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1975923
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
sema cmdbuf specific functions are only for the sync functionality
of nvgpu and do not belong to fifo.
Construct files sema_cmdbuf_gk20a.h and sema_cmdbuf_gk20a.c
under common/sync to contain the sema specific cmdbuf functions
for arch gk20a.
Jira NVGPU-1308
Change-Id: Iebeebe7a3de627f2de08d4ced74bb1aabf1eb53c
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1975922
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>