1. Registers NV_PLTCG_LTC0_LTS0_DSTG_ECC_REPORT and
NV_PLTCG_LTC0_LTS0_DSTG_ECC_ADDRESS are
deprecated. Remove them.
2. Define NV_PLTCG_LTC0_LTS0_INTR3 for ga100.
3. Add fields and constants for the register
NV_PLTCG_LTC0_LTS0_L2_CACHE_ECC_ADDRESS.
4. Add new fields for the register
NV_PLTCG_LTC0_LTS0_L2_CACHE_ECC_CONTROL.
Bug 3446731
Change-Id: I3e41198b7b2e75ff69b5c6193e6fd54efae15752
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2633958
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
- Make the domain scheduler timeslice type nanoseconds to future proof
the interface
- Return -ENOSYS from ioctls if the nvs code is not initialized
- Return the number of domains also when user supplied array is present
- Use domain id instead of name for TSG binding
- Improve documentation in the uapi headers
- Verify that reserved fields are zeroed
- Extend some internal logging
- Release the sched mutex on alloc error
- Add file mode checks in the nvs ioctls. The create and remove ioctls
require writable file permissions, while the query does not; this
allows filesystem based access control on domain management on the
single dev node.
Jira NVGPU-6788
Change-Id: I668eb5972a0ed1073e84a4ae30e3069bf0b59e16
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639017
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Move away from the prototype call in channel wdt worker and create a
separate worker thread for the domain scheduler. The details of runlist
domains are still encapsulated in the runlist code; the domain scheduler
controls when to switch domains. Switching happens based on domain
timeslices or when the current domain is deleted.
The worker thread is paused on railgate and spun back on poweron. The
scheduler data was also left dangling, so fix that by deinitializing all
nvs-related when gk20a_remove_support() is called. The runlist domains
already get freed as part of fifo removal.
Jira NVGPU-6427
Change-Id: I64f42498f8789448d9becdd209b7878ef0fdb124
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632579
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The flag pmu->pg->golden_image_initialized is set to
true during initial GPU context creation and is not
cleared while the GPU goes into pm_suspend (during railgate).
Hence, when the GPU resumes after un-railgate it retains
the previous value which can cause ELPG to kick in immediately.
Due to this, when ELPG and Railgating are enabled, IDLE_SNAP
is seen for read access of gr_gpc0_tpc0_sm_arch_r reg.
To resolve this, if golden image is ready set the
pmu->pg->golden_image_initialized to suspend state during railgate,
to delay the early enable of ELPG. Add a new
pmu_init_golden_img_state hal in the NVGPU_INIT_TABLE_ENTRY.
This will be called after all the GR access is done and GPU resumes
completely after un-railgate. This hal will then check if
golden_image_initialized flag is in suspend state, it will set it
to ready state and then re-enable ELPG.
Bug 3431798
Change-Id: I1fee83e66e09b6b78d385bbe60529d0724f79e79
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639188
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
WARN() and WARN_ON() are most useful when the log explains where they
happened. The posix implementation of these prints neither that nor the
warning message (if any). Extend the macros to include function name and
line number, and print those plus the format string.
Actually formatting the format string is problematic wrt. MISRA rules,
so the arguments are not formatted.
The implementation of BUG() already prints the function name and line
number.
Change-Id: Ie246a915f5e8420e1c606bb1555a7f9b498725fd
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2634105
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
With the existing implemenatation ELPG was disabled and
enabled once for handling stall isr and then again ELPG is
disabled and aenabled for writing to gr retrigger register.
This increased number of ELPG cycles and degraded perf of various
graphics test with ELPG enabled.
This change now disables ELPG, then handles stall_isr and
write to gr retrigger register and then enables ELPG.
Thus, number of ELPG cycles are reduced.
Bug 3451615
Change-Id: Iadac0c7b01eb711878280cd1503ba0f26000937c
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2638175
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
The GSP stress test debug node only checks for gpu poweron and continues
to trigger the stress test if power is on. This results in a small
window where the GPU might get railgated, so call gk20a_busy() to hold a
power ref while loading or starting the stress test (which touches HW).
Change-Id: I2ced45472cd0602da36f1801e56b486097ece83d
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2635604
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Now that the main nvsched code exists in the nvgpu build, make it
control the runlist domains. As a new nvs domain is created, create the
relevant runlist data too. To support the default domain, create a
default nvs domain at boot.
The scheduling domain code owns the responsibility of domain lifetime,
and runlist domains exist to serve that logic although the RL domains
are directly used by channel and TSG logic. Add refcounting to the
scheduler uapi level to make sure that busy domains (that still have TSG
participants) do not get removed too early.
Adjust error injection sensitive unit tests to match the updated logic.
Jira NVGPU-6425
Jira NVGPU-6427
Change-Id: I1beec97c54c60ad334165b1c0acb5e827c24f2ac
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632287
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add ioctls for creating, removing and querying scheduling domains and
interface with the "nvsched" entity that will be the core scheduler.
Include the scheduler in the Linux build.
The core scheduler code will ultimately hold data on and control what
gets scheduled, but this intermediate layer in nvgpu-rm needs a bit of
bookeeping to manage the userspace interface.
To keep changes isolated, this does not touch the internal runlist
domains yet. The core scheduler logic will eventually control the
runlist domains.
Jira NVGPU-6788
Change-Id: I7b4064edb6205acbac2d8c593dad019d517243ce
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2463625
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add software scheduler and domain data structures for a general
scheduler and bookkeeping code to create, remove and list them and to
keep logs for debugging and development; as a first step, this will only
serve as the heart of nvgpu-internal domain scheduler.
Long term, this code is intended to work in arbitrary configurations -
running on GSP, CPU, or elsewhere, and controlling the GPU or something
else.
Jira NVGPU-6427
Change-Id: I2a1577fa9120bb8ae6b9552dd2a187cb2cdfe504
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632286
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
As the docs of nvgpu_thread say, each thread (which the worker loop is)
should wake up and check also nvgpu_thread_should_stop() to manage
graceful and quick exit as requested. The loop does have that check
already, but the workqueue condition does not, so the cond wait might
end up waiting until its timeout hits.
It's not robust to trust the worker users to have a swift timeout for
exiting the thread, so read the should-stop flag in the wakeup condition
too.
Simplify the clk arb worker ops now that calling
nvgpu_worker_should_stop from there is no longer necessary. (Other
worker users did not have those, so they were technically buggy.)
Change-Id: I5409b8037564d4b6445a15cdbd4f1f3d616c4083
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2635808
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
The list of valid class numbers is inconsistent between SWUD and code.
Remove the non-Volta class numbers from the valid list to ensure
consistency.
Also, there are some common.class files in safety build which are not
needed. Remove files hal/class/class_gm20b.h and
hal/class/class_gp10b.h from safety build.
Enhance common.class SWUD to include hyperlinks and input parameter
validation information.
JIRA NVGPU-6645
JIRA NVGPU-6991
Change-Id: I91f76dfed17bad1ed946070d0c96732047fd4548
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2603768
(cherry picked from commit af36b956ef12d1ef09aed6757d68d5866303b931)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2634753
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
- Separate out local golden context memory allocation from
nvgpu_gr_global_ctx_init_local_golden_image() into a new function
nvgpu_gr_global_ctx_alloc_local_golden_image().
- Add a new member local_golden_image_copy to struct
nvgpu_gr_obj_ctx_golden_image to store copy used for context
verification.
- Allocate local golden context memory from nvgpu_gr_obj_ctx_init()
which is called during poweron path.
- Remove memory allocation from nvgpu_gr_obj_ctx_save_golden_ctx().
- Disable test test_gr_obj_ctx_error_injection since it needs rework
to accomodate the new changes.
- Fix below tests to allocate local golden context memory :
test_gr_global_ctx_local_ctx_error_injection
test_gr_setup_alloc_obj_ctx
Bug 3307637
Change-Id: I2f760d524881fd328346838ea9ce0234358f8e51
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2633713
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
tpc_pg_mask_read: function returns the value stored in the
tpc_pg_mask node. Since, it does not have the terminator it
returns garbage value after the mask value and gets troublesome
to parse in the userspace. Fix is to add '\n' at the end of the
mask value to be consistent with the earlier releases.
tpc_pg_mask_store: In this function there is a call to check if
gpu is powered on or not. In case when gpu is powered on, the return
value is just assigned to the 'err' variable which is never
returned to notify the application to follow the next steps. Thus,
need to return this value and inform the same to the userspace.
Bug 3445617
Change-Id: Ibb6fed88c83c751a5fb73181089274aa27c3f18b
Signed-off-by: Ninad Malwade <nmalwade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2630925
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The definition of srctree.nvgpu for external modules is specified as a
function of $(MAKEFILE_LIST). This variable gets extended also when
files are included, so any include directives in the makefile affect the
path because srctree.nvgpu is assigned with "?=" which is lazy like "=".
To avoid problems with the lazy init, use ":=" surrounded with an
explicit check to initialize srctree.nvgpu only when undefined.
Jira NVGPU-6427
Change-Id: Ic9847528d0da73f06bc7e421130c9c54247caadb
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632925
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- When DISALLOW cmd is sent from driver to PMU the actual
completion of the disallow will be acknowledged by PMU
via a new RPC: ASYNC_CMD_RESP.
- Disallow needs a delayed ACK from PMU in order to disable
the ELPG.
- If ELPG is already engaged, the DISALLOW cmd will trigger
ELPG exit and then transition to PMU_PG_STATE_DISALLOW.
- After this whole process is completed, PMU will send
DISALLOW_ACK through ASYNC_CMD_RESP RPC.
- After disallow command is sent from the driver, NvGPU driver
waits/polls for disallow command ack. This is sent immediately
by RPC framework of PMU.
- Then, the driver will poll/wait for ASYNC_CMD_RESP event which
is the delayed DISALLOW ACK.
- The driver captures the ASYNC_CMD_RESP RPC sent from PMU.
- set disallow_state to ELPG_OFF.
- If the driver does not wait/poll for this delayed disallow
ack from PMU, it can result in pmu halt issues as PMU is still
processing DISALLOW cmd but the driver progressed further which
can result in errors.
Bug 3430273
Bug 3439350
Change-Id: If2acf8391d18cd3c6b8b07e3bf6577667ec99eea
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2631214
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
User fence syncpt_id in the buffer compbits state was set to 0
on allocation through PREPARE_COMPRESSIBLE_READ ioctl or
MARK_COMPRESSIBLE_WRITE ioctl.
In case NVGPU_GPU_COMPBITS_GPU is requested through the ioctl
PREPARE_COMPRESSIBLE_READ, CDE conversion command is not
submitted and the output fence is cloned from the initial
state fence (with syncpt_id=0).
NvRmSyncWait on this fence from userspace lead to below error:
13e40000.host1x: nvhost_syncpt_wait_timeout: invalid syncpoint id 0
Initialize the buffer compbits state user fence syncpt_id to
NVGPU_INVALID_SYNCPT_ID with nvgpu_user_sync_init() so that
the userspace skips the NvRmSyncWait on that fence.
While at it, initialize other uninitialized members, valid_compbits
and zbc_color.
Bug 3360675
Change-Id: Ie2a584546e1ed37841cbdc3472b598794f911e6f
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2631235
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Most of the Orin chip specific code is compiled out of safety build
with CONFIG_NVGPU_NON_FUSA and CONFIG_NVGPU_HAL_NON_FUSA. Remove the
config protection from Orin/GA10B specific code. Currently all code
is enabled. Code not required in safety will be compiled out later
in separate activity.
Other noteworthy changes in this patch related to safety build:
- In ga10b_ce_request_idle(), add a log print to dump num_pce so that
compiler does not complain about unused variable num_pce.
- In ga10b_fifo_ctxsw_timeout_isr(), protect variables active_eng_id and
recover under CONFIG_NVGPU_KERNEL_MODE_SUBMIT to fix compilation
errors of unused variables.
- Compile out HAL gops.pbdma.force_ce_split() from safety since this HAL
is GA100 specific and not required for GA10B.
- Compile out gr_ga100_process_context_buffer_priv_segment() with
CONFIG_NVGPU_DEBUGGER.
- Compile out VAB support with CONFIG_NVGPU_HAL_NON_FUSA.
- In ga10b_gr_intr_handle_sw_method(), protect left_shift_by_2 variable
with appropriate configs to fix unused variable compilation error.
- In ga10b_intr_isr_stall_host2soc_3(), compile ELPG function calls
with CONFIG_NVGPU_POWER_PG.
- In ga10b_pmu_handle_swgen1_irq(), move whole function body under
CONFIG_NVGPU_FALCON_DEBUG to fix unused variable compilation errors.
- Add below TU104 specific files in safety build since some of the code
in those files is required for GA10B. Unnecessary code will be
compiled out later on.
hal/gr/init/gr_init_tu104.c
hal/class/class_tu104.c
hal/mc/mc_tu104.c
hal/fifo/usermode_tu104.c
hal/gr/falcon/gr_falcon_tu104.c
- Compile out GA10B specific debugger/profiler related files from
safety build.
- Disable CONFIG_NVGPU_FALCON_DEBUG from safety debug build temporarily
to work around compilation errors seen with keeping this config
enabled. Config will be re-enabled in safety debug build later.
Jira NVGPU-7276
Change-Id: I35f2489830ac083d52504ca411c3f1d96e72fc48
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2627048
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add support for adding and deleting domains for all runlists together.
The core scheduler logic will control runlist domains.
Initially, however, it may be necessary to only actually schedule only
the GR runlist, but keeping the runlist code agnostic of such scheduling
logic helps isolate the control complexity.
NVGPU-6425
Change-Id: Id6039bd37a293a2cf3eaee5ed84d35459e8b89e7
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2628049
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
When there are multiple scheduling domains, each runlist pointer has to
be switched according to the active scheduling policy. For now implement
a trivial round robin policy to loop the domains over, just sufficient
for testing. In the future the switching will be owned by the scheduler
code, but this helps prepare the design for that.
The switching will not do anything if there is only one domain, so
current functionality is not affected.
For simplicity, all runlists are switched at the same time. In the
future, it may be desirable to swap e.g. only the GR runlist and keep
others running free, outside scheduler control.
Jira NVGPU-6427
Jira NVGPU-6425
Change-Id: Ic68c13e97761bbdc210c74794de8ccb8dbd45587
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2628048
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit