- Patch updates the ZBC table values as per the POR values for the
safety build.
- Fix the initialization of the color table default values for the
standard build. For CROP it was being done in floating point format,
while it should be in FB format. As per the documentation, "CROP ZBC
table should be programmed exactly the way the L2 table is programmed".
Bug 3585766
Change-Id: I47d11b6a230189ee0c818f850d36b93c0aea0e54
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2724935
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
This patch defines the IOCTL NVGPU_TSG_IOCTL_READ_ALL_SM_ERROR_STATES
to read the error states for all the SMs.
The corresponding input parameter is num_sm (number of SM error states
to be read) and the output is a list of error states for all the SMs.
Bug 200468220
Signed-off-by: Jinesh Parakh <jparakh@nvidia.com>
Change-Id: Iaf926b72d900a6c8f978fa034c20d76e482eb13f
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2717313
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: Sandarbh Jain <sanjain@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Some configs are set only for the stable kernel, which is
identified from NV_BUILD_KERNEL_OPTIONS.
The stable kernel builds nvgpu as an out-of-tree module and passes the
environment config CONFIG_TEGRA_OOT_MODULE during the build.
Hence, NV_BUILD_KERNEL_OPTIONS is not required to identify the kstable
build. Use CONFIG_TEGRA_OOT_MODULE instead to set the configs for
building as a module.
Bug 3652905
Change-Id: I6570760e91ca98a4c83d7691fad517b2c772e629
Signed-off-by: Laxman Dewangan <ldewangan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2720729
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
vidmem buffers use fds as buffer handles, and we need to allocate more
than 1024 fds. tegra_alloc_fd, exported by the TEGRA_MC driver, allowed
allocating more than 1024 fds; however, that function is to be removed
from that driver.
Hence, use the kernel-exported function __alloc_fd directly from nvgpu.
This is currently to be used only for dgpu on downstream kernel 5.10.
Bug 3535321
Change-Id: I10cfc41a6439f07309cda9eb2f22746f3fbac996
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2702794
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Tested-by: Sagar Kamble <skamble@nvidia.com>
GVS: Gerrit_Virtual_Submit
This patch creates a sysfs node which is easier to parse than
the existing mig_mode_config_list node. This new node outputs
in the following format:
active: -1
num_configs: 2
num_instances: 2
id:000000000001 gr:000000000000 gpc:0003
id:000000000002 gr:000000000004 gpc:0003
num_instances: 3
id:000000000001 gr:000000000000 gpc:0003
id:000000000005 gr:000000000004 gpc:0002
id:000000000013 gr:000000000006 gpc:0001
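As an illustration of how a userspace tool might consume this format,
here is a minimal C sketch. The names (struct mig_list, parse_mig_list)
and the fixed array bounds are hypothetical, and the numeric fields are
assumed to be decimal, since the radix is not stated here.

```c
#include <stdio.h>
#include <string.h>

#define MAX_CONFIGS   8
#define MAX_INSTANCES 8

/* Hypothetical containers for the parsed node contents. */
struct mig_instance {
	unsigned long long id, gr, gpc;
};

struct mig_config {
	int num_instances;
	struct mig_instance inst[MAX_INSTANCES];
};

struct mig_list {
	int active;
	int num_configs;
	struct mig_config cfg[MAX_CONFIGS];
};

/* Parse the node text line by line. Returns 0 on success, -1 on error. */
static int parse_mig_list(const char *text, struct mig_list *out)
{
	int cur = -1;    /* index of the config currently being filled */
	int filled = 0;  /* instances stored so far for that config */

	memset(out, 0, sizeof(*out));
	while (*text != '\0') {
		char line[128];
		size_t len = strcspn(text, "\n");
		size_t n = (len < sizeof(line) - 1) ? len : sizeof(line) - 1;
		struct mig_instance in;
		int val;

		memcpy(line, text, n);
		line[n] = '\0';
		text += len;
		if (*text == '\n')
			text++;

		if (sscanf(line, " active: %d", &val) == 1) {
			out->active = val;
		} else if (sscanf(line, " num_configs: %d", &val) == 1) {
			out->num_configs = val;
		} else if (sscanf(line, " num_instances: %d", &val) == 1) {
			/* A num_instances line opens the next config. */
			if (++cur >= MAX_CONFIGS || val < 0 || val > MAX_INSTANCES)
				return -1;
			out->cfg[cur].num_instances = val;
			filled = 0;
		} else if (sscanf(line, " id:%llu gr:%llu gpc:%llu",
				  &in.id, &in.gr, &in.gpc) == 3) {
			/* Assumption: fields are decimal with leading zeros. */
			if (cur < 0 || filled >= out->cfg[cur].num_instances)
				return -1;
			out->cfg[cur].inst[filled++] = in;
		}
	}
	return (cur + 1 == out->num_configs) ? 0 : -1;
}
```

Each num_instances line starts a new config, so no lookahead is needed
and a single pass with sscanf is enough.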
Bug 200740852
Change-Id: I8a3d4425ccb88dd4e58bbe1908e0f7cc577ff191
Signed-off-by: Martin Radev <mradev@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2704349
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
To be able to access the full physical memory range, the GPU's dma_mask
needs to be set to the max value of the H/W compatible range.
For example, in order to support the range from 2 GB to 66 GB, GV11B's
dma_mask needs to be at least 37 bits. Set GV11B's dma_mask to 38 bits
and T23X's dma_mask to 39 bits. These values are supported by H/W.
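The 37-bit figure can be checked with a small userspace C sketch. The
helper names (addr_bits_needed, dma_bit_mask) are hypothetical and are
not nvgpu or kernel APIs, and GB is assumed to mean 2^30 bytes.

```c
#include <stdint.h>

/* Number of address bits needed to represent max_addr (0 needs 0 bits). */
static unsigned int addr_bits_needed(uint64_t max_addr)
{
	unsigned int bits = 0U;

	while (max_addr != 0U) {
		bits++;
		max_addr >>= 1;
	}
	return bits;
}

/* Mask with the low 'bits' bits set, i.e. the addresses a mask of that width covers. */
static uint64_t dma_bit_mask(unsigned int bits)
{
	return (bits >= 64U) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1U);
}
```

The highest byte address below 66 GB lies above 2^36 (64 GB) and below
2^37 (128 GB), so addr_bits_needed() returns 37 for it, and a 38-bit
mask covers it with room to spare.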
Bug 3656729
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Icfff3c36a8c9cf074a254fa773c42e18020ae5de
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2723640
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
When an MMU fault happens with id_type = 1, the fault happened in a
TSG. In that path we set the error notifier and let userspace know
about the faulty channel.
While doing this, we check whether a debugger is attached by
reading the gr_gpc0_tpc0_sm0_dbgr_control0_r() register.
ELPG is still enabled at this time, and this read causes an
IDLE SNAP error for ELPG.
To resolve this, move the CG/PG disable function call
earlier in the fifo recover code path. This ensures that
ELPG is disabled before any GR register is read.
Bug 3660592
Change-Id: Ie5d01b7ccf00167b58f260e9142aa5deb2a08be4
Signed-off-by: Divya <dsinghatwari@nvidia.com>
(cherry picked from commit f09e429f2d142c20529bedc05acf193805e1bb25)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2720655
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit
To add GL/VK support for shader debugging via the SM trap handler
functionality, writes to the following PRI registers need to be
allowed on all chips (ga10b, gv11b, gm20b, gp10b):
- NV_PGRAPH_PRI_GPCS_MMU_DEBUG_CTRL
- NV_PGRAPH_PRI_GPCS_TPCS_SM_SCH_MACRO_SCHED
- NV_PGRAPH_PRI_GPCS_TPCS_SMS_DBGR_CONTROL0
- NV_PGRAPH_PRI_GPCS_TPCS_SMS_HWW_WARP_ESR_REPORT_MASK
- NV_PGRAPH_PRI_GPCS_TPCS_SMS_HWW_GLOBAL_ESR_REPORT_MASK
This patch adds the above registers to the allowlist where they were
absent. Note that these registers are included only in non-safety
builds, using the CONFIG_NVGPU_SET_FALCON_ACCESS_MAP flag.
Bug 3642131
Change-Id: I5f62731944b6b3e059afa80a491c3cf5c3656f60
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2715799
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Christopher Lentini <clentini@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Christopher Lentini <clentini@nvidia.com>
Patch defines a static ZBC table and configures it at the sw layer.
Later, the existing API reads this sw configuration and programs it to
hw. This is applicable only for the ga10b safety build; other chips and
configurations continue to be supported in the legacy way.
Bug 3585766
Change-Id: I00d79162c0b096616e3f555da965e82e47c014d1
Signed-off-by: prsethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2713821
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Update HAL function runlist.write_state to skip inactive
runlists in fifo.runlists. It is possible for one or more
engines to be floorswept, in which case their associated
runlist will be inactive. For example, if host supports
3 runlists (0, 1, 2) serving 3 engines (0, 1, 2) respectively, and
engine-1 is floorswept, then runlist-1 becomes inactive and
the entry fifo->runlists[1] will be set to NULL.
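The skip can be sketched in plain C as follows. This is a simplified
userspace model and not the actual HAL: struct runlist, its fields and
the write_state signature are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a runlist object. */
struct runlist {
	uint32_t id;
	bool enabled;
};

/*
 * Enable or disable every runlist whose index bit is set in runlists_mask.
 * NULL entries (inactive runlists, e.g. their engine is floorswept) are
 * skipped. Returns the number of runlists actually updated.
 */
static unsigned int write_state(struct runlist **runlists, size_t num,
				uint32_t runlists_mask, bool enable)
{
	unsigned int updated = 0U;
	size_t i;

	for (i = 0U; i < num && i < 32U; i++) {
		if (runlists[i] == NULL) {
			continue; /* inactive runlist, nothing to program */
		}
		if (((runlists_mask >> i) & 1U) != 0U) {
			runlists[i]->enabled = enable;
			updated++;
		}
	}
	return updated;
}
```

With the array { rl0, NULL, rl2 } and mask 0x7, only runlists 0 and 2
are touched and the NULL entry is never dereferenced.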
Bug 3650588
Change-Id: Iaf9d75e310903c47b842e84dcfa2209d9fe7da96
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
(cherry picked from commit e29a2019cf8f4796737c670f98164f7783448d49)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2717075
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
GVS: Gerrit_Virtual_Submit
Background: In case of a deferred suspend implemented by gk20a_idle,
the device waits for a delay before suspending and invoking
power gating callbacks. This helps minimize resume latency for any
resume calls(gk20a_busy) that occur before the delay.
Now, some APIs spread across the driver require that if the device
is powered on, they can proceed with register writes, but if it is
powered off, they must return. Examples of such APIs include
l2_flush, fb_flush and even nvs_thread. We have relied on
some hacks to keep the device powered on and prevent any such
delayed suspension from proceeding. However, this still raced for some
calls like ioctl l2_flush, so gk20a_busy() was added (refer to commit
dd341e7ecbaf65843cb8059f9d57a8be58952f63).
Upstream linux kernel has introduced the API pm_runtime_get_if_active
specifically to handle the corner case for locking the state during the
event of a deferred suspend.
According to the Linux kernel docs, invoking the API with
ign_usage_count parameter set to true, prevents an incoming suspend
if it has not already suspended.
With this, there is no longer a need to check whether
nvgpu_is_powered_off(). Changed the behavior of gk20a_busy_noresume()
to return bool. It returns true, iff it managed to prevent
an imminent suspend, else returns false. For cases where
PM runtime is disabled, the code follows the existing implementation.
Added missing gk20a_busy_noresume() calls to tlb_invalidate.
Also, moved gk20a_pm_deinit to after nvgpu_quiesce() in
the module removal path. This is done to prevent regs access
after registers are locked out at the end of nvgpu_quiesce. This
can happen as some free function calls post quiesce might still
have l2_flush, fb_flush deep inside their stack, hence invoke
gk20a_pm_deinit to disable pm_runtime immediately after quiesce.
Kept the legacy implementation the same for VGPU and
older kernels.
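The calling pattern enabled by the bool return can be sketched with a
userspace mock. Everything below is hypothetical (struct mock_dev and
all mock_* functions); the mock only models the documented behaviour of
pm_runtime_get_if_active() with ign_usage_count set to true, namely
taking a usage reference only when the device is active, and leaves out
the locking the kernel does.

```c
#include <stdbool.h>

enum mock_rpm_status { MOCK_RPM_ACTIVE, MOCK_RPM_SUSPENDED };

/* Hypothetical device model: runtime PM status plus a usage count. */
struct mock_dev {
	enum mock_rpm_status status;
	int usage_count;
	unsigned int flushes; /* counts "register writes" done by the mock */
};

/* Mock: take a reference only if the device is currently active. */
static int mock_pm_runtime_get_if_active(struct mock_dev *dev)
{
	if (dev->status != MOCK_RPM_ACTIVE) {
		return 0;
	}
	dev->usage_count++;
	return 1;
}

/* Returns true iff a pending suspend was held off by taking a reference. */
static bool mock_busy_noresume(struct mock_dev *dev)
{
	return mock_pm_runtime_get_if_active(dev) > 0;
}

/* Drop the reference taken by mock_busy_noresume(). */
static void mock_idle_nosuspend(struct mock_dev *dev)
{
	dev->usage_count--;
}

/* Example caller: touch hardware only while the reference is held. */
static bool mock_l2_flush(struct mock_dev *dev)
{
	if (!mock_busy_noresume(dev)) {
		return false; /* powered off: skip the register access */
	}
	dev->flushes++;
	mock_idle_nosuspend(dev);
	return true;
}
```

The caller no longer needs a separate powered-off check: a false return
already means the device is not active, and a true return means it
cannot suspend until the reference is dropped.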
Jira NVGPU-8487
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I972f9afe577b670c44fc09e3177a5ce8a44ca338
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2715654
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
nvgpu_pmu_rpc_execute takes the address of a pmu rpc header and
dereferences memory past the header, based on the rpc struct that the
header is part of.
This use of the pointer is not right and confuses the CERT checker.
Instead, pass the rpc struct address as a char pointer and use it
as header or rpc struct as needed.
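A minimal sketch of the byte-pointer approach, assuming a hypothetical
RPC layout: struct rpc_hdr, struct rpc_example and rpc_execute below are
illustrative only and do not reflect the real PMU structures or the
nvgpu_pmu_rpc_execute signature.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header that every RPC struct starts with. */
struct rpc_hdr {
	uint32_t function;
	uint32_t flags;
};

/* Hypothetical RPC struct: header followed by a payload. */
struct rpc_example {
	struct rpc_hdr hdr;
	uint32_t payload;
};

/*
 * Takes the address of the whole RPC struct as a byte pointer plus its size.
 * The header is read from the first bytes, and the full struct is copied to
 * the queue, so no access goes through a pointer to the header member.
 * Returns 0 on success, -1 on invalid arguments.
 */
static int rpc_execute(const uint8_t *rpc, size_t size_rpc,
		       uint8_t *queue, size_t queue_size, uint32_t *function)
{
	struct rpc_hdr hdr;

	if (rpc == NULL || queue == NULL || function == NULL ||
	    size_rpc < sizeof(hdr) || size_rpc > queue_size) {
		return -1;
	}
	memcpy(&hdr, rpc, sizeof(hdr)); /* view the leading bytes as header */
	*function = hdr.function;
	memcpy(queue, rpc, size_rpc);   /* view the same bytes as whole RPC */
	return 0;
}
```

A caller passes `(const uint8_t *)&rpc` together with `sizeof(rpc)`, so
the bounds of every access are tied to the object that was actually
passed in.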
CID 17141
CID 154223
CID 17557
CID 154226
CID 153904
CID 153926
CID 153929
CID 153925
CID 225346
CID 225355
CID 225356
CID 225360
CID 225361
CID 225365
CID 225367
CID 296735
CID 330244
CID 17557
Bug 3512546
Change-Id: I93b154d4321e75c0d2b41f43d7c2b701682962a3
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2710224
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
GVS: Gerrit_Virtual_Submit
The print below is misleading and looks like an error.
[INFO] Missing support-gpu-tools property, ret =-22
'support-gpu-tools' property was added to allow disabling debugger
features on prod boards. The debugger/profiler support will be
enabled by default, even if the property is missing.
Make the INFO print conditional, more informational and less
dramatic.
Bug 3539518
Change-Id: I5fc50df30be23e1fd1ecc06282a0d50f3ca7ac64
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2668464
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
This sets evict_max_ways for the L2 cache to the maximum
supported value for safety.
In the normal build, L2 cache MAX_EVICT_LAST is configured via
KMD and RegOps. RegOps is enabled only on the standard build
with the CONFIG_DEBUGGER flag, so this method cannot be used for
the safety build. For safety, we can instead make use of the patch
buffer to patch the register while creating the context.
JIRA NVGPU-8227
Change-Id: Iec5d73197239b9cad31c6b593ca2b87c224aad5e
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2708702
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Using nvgpu_safe_add_u32 to increment the corrected ecc error counters
maintained by nvgpu can lead to BUG due to overflow, as corrected ecc
errors can keep coming in while the system continues to operate.
In some configurations, uncorrected error counters can also
overflow and lead to BUG.
Change the increments of these counters and their delta calculations
to use nvgpu_wrapping_add_u32.
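The difference between the two kinds of addition can be sketched in
standalone C. The helpers below are hypothetical stand-ins, not the
nvgpu implementations: add_u32_overflows() reports the condition under
which a checked add would hit BUG, and wrapping_add_u32() wraps modulo
2^32 instead.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when a + b does not fit in 32 bits (where a checked add would trap). */
static bool add_u32_overflows(uint32_t a, uint32_t b)
{
	return b > (UINT32_MAX - a);
}

/* Addition modulo 2^32: the counter rolls over instead of trapping. */
static uint32_t wrapping_add_u32(uint32_t a, uint32_t b)
{
	return (uint32_t)(a + b);
}
```

A counter at UINT32_MAX that receives one more error wraps to 0 and the
system keeps running, which is acceptable for a statistic that only ever
grows.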
JIRA NVGPU-7054
Change-Id: I85ddddfa46062744cccbe0756ad942787e72f01b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2601152
(cherry picked from commit f016e59189d2bd66e23f17ccb638f6d384b82fbd)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623638
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
In its current form, the default domain acts like any schedulable
domain. TSGs are bound to it and it can be enumerated via the
public interfaces.
The new expectation is that the default domain changes from its
current form to a pseudo domain that does not act like an ordinary
domain: it must not be reachable, in particular through the domain
management API, it can't be removed, it does not show up in lists,
and TSGs cannot be explicitly bound to it. It won't participate in
round-robin domain scheduling. It is not really a domain, and acts
like one only when activated in the manual mode.
The following changes are made overall to support the above change in
definition.
1) Domain creation and attaching the domain to the scheduler are now
split into two separate functions. The new default domain (having ID
= UINT64_MAX) is created separately from a static function without
linking it with other domains in the scheduler.
2) struct nvgpu_nvs_scheduler explicitly stores the default domain
to support direct lookups.
3) TSGs are initially not bound to default domain/rl_domain.
Jira NVGPU-8165
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I916d11f4eea5124d8d64176dc77f3806c6139695
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2697477
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
In order to support the concept of the default domain, a new
rl domain is created that shadows all the other domains i.e.
all channels of all TSGs are replicated here. This is scheduled
by default during GPU boot.
1) The shadow rl_domain is constructed during poweron sequence via
nvgpu_runlist_alloc_shadow_rl_domain(). struct nvgpu_runlist
is appended to store this separately as 'shadow_rl_domain'.
This is scheduled in background as long as no other user created
rl domains exist.
2) 'shadow_rl_domain' is scheduled out once a user-created rl domain
exists. At this point, any updates in the user-created rl domains
are synchronized with the 'shadow_rl_domain', i.e. 'shadow_rl_domain'
is also reconstructed to contain active channels and tsgs from the rl
domain.
3) 'shadow_rl_domain' is scheduled back in when the last user-created
rl domain is removed.
4) In future, for manual mode, the driver shall support explicitly
switching to 'shadow_rl_domain'. Also, we will move to an
implementation where 'shadow_rl_domain' is switched out only when
other domains are actively scheduled. These changes will be
implemented later.
Jira NVGPU-8165
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Ia6a07d6bfe90e7f6c9e04a867f58c01b9243c3b0
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2704702
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
BVEC changes for nvgpu_rc_pbdma_fault and nvgpu_rc_mmu_fault
started reporting the MISRA issue below.
kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:321:
1. misra_c_2012_rule_5_1_violation: Declaration with identifier
"nvgpu_tsg_unbind_channel_check_hw_state", which is ambiguous.
kernel/nvgpu/drivers/gpu/nvgpu/common/fifo/tsg.c:349:
2. other_declaration: The first 31 characters of identifiers
"nvgpu_tsg_unbind_channel_check_ctx_reload" and
"nvgpu_tsg_unbind_channel_check_hw_state" are identical.
Do the renames below to fix the issue. Both are renamed for consistency.
s/nvgpu_tsg_unbind_channel_check_hw_state/nvgpu_tsg_unbind_channel_hw_state_check
s/nvgpu_tsg_unbind_channel_check_ctx_reload/nvgpu_tsg_unbind_channel_ctx_reload_check
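The clash reported by the checker can be reproduced with a short C
sketch. identifiers_clash() is a hypothetical helper; the 31-character
limit is the one quoted in the checker output above.

```c
#include <stdbool.h>
#include <string.h>

#define SIGNIFICANT_CHARS 31U

/* True when two different identifiers agree in their first 31 characters. */
static bool identifiers_clash(const char *a, const char *b)
{
	return strcmp(a, b) != 0 && strncmp(a, b, SIGNIFICANT_CHARS) == 0;
}
```

The old names share the 31-character prefix
"nvgpu_tsg_unbind_channel_check_", while the new names already differ
at character 26 ("hw_state..." versus "ctx_reload...").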
JIRA NVGPU-6772
Change-Id: Ib92cabe11c486621351bf15ddb86e20d16d514c4
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584152
(cherry picked from commit a619f259c6a4ffccb05550767212989af60c2a90)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2706551
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
nvgpu_device_get can return NULL if supplied an invalid ID or instance
ID. We expect the GR device struct to be non-NULL there, hence just
assert that it is indeed non-NULL in gr_reset_engine and
ga10b_grmgr_init_gr_manager.
CID 224133
CID 250232
Bug 3512546
Change-Id: Id09a1c436a8e49b921111b940d3d013bd66bff7a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2707018
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>