On nvlink 2.2, we poll for sublink substate to be stable before checking
sublink primary state. Currently, we read both TX and RX sublink state
during set_sublink_mode() irrespective of which sublink mode is changed.
This is not correct when we are polling on substate value while getting
sublink state.
JIRA NVLINK-164
Change-Id: I474705f059dbf41e5fb7e45bef455c33ee21aa95
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1734539
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
bus_gk20a.c had some debug dump references to MC and GR registers.
The dumps have not been very useful, so instead of refactoring the
code just remove the dumps.
JIRA NVGPU-588
Change-Id: Id974731716d058ef4a3fe77240c11b1c53db169c
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1730891
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Sysfs nodes for GR stats are created on GR init. If nvgpu
module is removed without any ops, then it tries to remove
sysfs nodes which do not exist resulting in kernel panic.
Fix this issue by removing sysfs nodes only if ecc counters
are initialized.
Bug 1987855
Change-Id: I3f967ee92ec02ad19ffbd9bfa8bace5bfd229dd2
Signed-off-by: Nitin Kumbhar <nkumbhar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1730536
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The vidmem shall be destroyed only if it has been
initialized. If not skipped, it accesses mutexes
which are in invalid state. This results in BUG like:
BUG: spinlock bad magic on CPU#0, rmmod/1560
Also, destroy vidmem bootstrap allocator which is
set up in nvgpu_vidmem_init().
Bug 1987855
Change-Id: I68e91422a54b40feeb9071158b797828e2391303
Signed-off-by: Nitin Kumbhar <nkumbhar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1730535
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Earlier implementation of railgate disable config is disabling
runtime pm during pm_init. This is causing multiple issues:
1. gpu rail will be on as soon as nvgpu driver probe is called.
Actual gpu hw init may happen at much later point of time.
2. This is breaking railgate_enable sysfs node functionality.
railgate_enable is not working if runtime pm is disabled.
To avoid all these issues for railgate disable, enable runtime pm
during pm_init and set auto-suspend delay to negative (-1), which
will disable runtime pm suspend calls.
Also fixed following issues along with this:
1. Updated railgate_enable debugfs implementation to use auto-suspend delay.
To disable railgating:
Set auto-suspend delay with negative value(-1) which will disable runtime
pm suspend.
To enable railgating:
Set auto-suspend delay with railgate_delay value.
Also removed redundant user_railgate_disabled gk20a device data and
replaced with can_railgate, where ever it is applicable.
2. Initialized default railgate_delay to 500msec to avoid railgate
on/off transitions with railigate enable from disabled state.
3. Created railgate_residency debug fs node irrespective of can_railgate
initial state. This is helping with the case, where initial state of
railgate state off and then railgate enable is done through sysfs node.
Bug 2073029
Change-Id: I531da6d93ba8907e806f65a1de2a447c1ec2665c
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1694944
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Each GPU platform has different DMA limitations. For older
chips the maximum size of a DMA buffer was more limited than
newer SoCs (read: Xavier) and discrete GPUs.
This patch adds support to set the DMA mask for a GPU on a
per platform basis by adding a platform field that is populated
with the maximum allowed DMA mask. That mask is programmed by
the driver common code. If no mask is specified then the
default mask size is 16GB (34 bits).
Bug 2043276
Change-Id: I9c3c76c86bac6c485eb1197326e662516fbcaa41
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1700980
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Before nvlink 2.2, driver was responsible for setting the NVLink clocks
during NVLink initialization. For the purpose of security, NVLink PLL
handling is moved to Minion in nvlink 2.2 and driver should stop writing
to these registers.
JIRA NVLINK-167
Change-Id: I18392a29c322da55053037bfde62c8f74ee75288
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1730597
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
RXDET is supported only on nvlink 2.2 devices and forward.
Add HAL to run RXDET selectively based on chip. RXDET needs to be
done after the links are out of reset but before any other link
level initialization.
minion_send_cmd is also made non-static to support RXDET
functionality.
JIRA NVLINK-160
Change-Id: Ic65b8dbc7281743f62072089ff3c805521ac9b38
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1729525
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
We receive bundle with address and 64 bit values from ucode on some platforms
This patch adds the support to handle 64 bit values
Add struct av64_gk20a to store an address and corresponding 64 bit value
Add struct av64_list_gk20a to store count and list of av64_gk20a
Add API alloc_av64_list_gk20a() to allocate the list that supports 64bit
values
In gr_gk20a_init_ctx_vars_fw(), if we see NETLIST_REGIONID_SW_BUNDLE64_INIT,
load the bundle64 state into above local structures
Add new HAL gops.gr.init_sw_bundle64() and call it from gk20a_init_sw_bundle()
if defined
Also load the bundle for simulation cases in gr_gk20a_init_ctx_vars_sim()
Jira NVGPUT-96
Change-Id: I1ab7fb37ff91c5fbd968c93d714725b01fd4f59b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1736450
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In gk20a_gr_isr(), we handle various errors including GPC/TPC errors.
And then if BPT errors are pending we call gk20a_gr_post_bpt_events() at the
end and pass channel pointer to it
gk20a_gr_post_bpt_events() extracts TSG pointer based on ch->tsgid
But in some race conditions it is possible that we clear the error and trigger
recovery and as a result channel is unbounded from TSG and closed by user space
before calling gk20a_gr_post_bpt_events()
And in that case the code above results in getting incorrect TSG pointer and
hence crashes as below
Unable to handle kernel paging request at virtual address ffffff8012000c08
...
[<ffffff8008081f84>] el1_da+0x24/0xb4
[<ffffff80086e72e0>] gk20a_tsg_get_event_data_from_id+0x30/0xb0
[<ffffff80086e7560>] gk20a_tsg_event_id_post_event+0x50/0xc8
[<ffffff800872922c>] gk20a_gr_isr+0x27c/0x12e0
To fix this extract the TSG pointer before handling all the errors and pass
this pointer to gk20a_gr_post_bpt_events() will post the events if they are
enabled and if TSG is still open
Bug 200404720
Change-Id: I4861c72e338a2cec96f31cb9488af665c5f2be39
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1735415
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add g->fifo_eng_timeout_us to define engine timeout in microseconds.
It is initialized with GRFIFO_TIMEOUT_CHECK_PERIOD_US. In RM server
case, it can be overriden with value defined in device tree.
Jira EVLR-2674
Change-Id: I69ac2ce779fe575566c8ba48e8cd2d0e6b2d93cf
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1728391
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
On GV100, we could not enable reflck repeater at source of PLL
which is shared by link 0/1. So we do not allow link 0 and 1 to
be used on GV100. This refclk repeater is present only on GV100.
Remove the check as we currently use link3 on GV100 and do not
plan to use any other link.
JIRA NVLINK-162
Change-Id: I9ffcc0b20d084a208271d2c594ec64b5bafaabfb
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1734538
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
CAU (Counter Aggregation Unit) registers might be split out from SMPC registers
and moved into their own list on some platforms
In gr_gk20a_init_ctx_vars_fw() add support to check if pm_cau list is available
If list is available, count will be set to non-zero here
In add_ctxsw_buffer_map_entries_gpcs(), parse the pm_cau list if count is
non-zero
Bug 2139870
Change-Id: Ia630e7d03481a6f927c6739d28ebfe49f221326f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1733208
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Matthew Braun (SW-GPU) <matthewb@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add gk20a_fifo_profile_snapshot() to store the submit time in a
profiling entry that was acquired from gk20a_fifo_profile_acquire().
Also get rid of ifdef CONFIG_DEBUG_FS by stubbing the acquire and free
functions when debugfs is not enabled. This reduces some cyclomatic
complexity in the submit path.
Jira NVGPU-708
Change-Id: I39829a6475cfe3aa582620219e420bde62228e52
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1729545
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The headers use u32 and bool but do not include <nvgpu/types.h>.
Moving around header includes exposed this issue in the cascade
builds.
This patch fixes the problem in all clock gating headers to
avoid this being a concern in the future.
Change-Id: Id56074df393d95bf65baf4062ac811d80d87e96b
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1729748
Reviewed-by: Bo Yan <byan@nvidia.com>
The QNX compiler seems to automatically link against this library
and as such the extra -lpthread is not necessary. Instead it
causes a link failure since the pthread library is not present.
JIRA NVGPU-525
Change-Id: Id5a6fcdffb067ed961665a3ee44a9d44301b725b
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1722157
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Integrity already typedefs these and complains if you override them
even with the same underlying type.
Since we only use these in the regops_gk20a.h header file (outside of
the Linux specific code, that is) this patch just changes the __uXX to
uXX. With that we can delete the now unnecessary __uXX defs.
JIRA NVGPU-525
Change-Id: I01dd2723b68db2170449342f73c711ee5a589adb
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1721186
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Instead of referencing the header from $NVGPU/drivers/gpu/nvgpu/common
reference it from $NVGPU/drivers/gpu/nvgpu. This makes the POSIX
compilation happy since we don't do a -Idrivers/gpu/nvgpu/common.
Not sure exactly why the regular kernbel build does this but it
probably should not.
Change-Id: I00aee373b651e3b7710669fa04c5b75fc1c814d9
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1727426
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>