- Accessing any PGRAPH registers in GR intr retrigger
ISR routine when ELPG is engaged causes idle snap.
- This idle snap is caught when nvgpu_submit_illegal_class
test is run.
- To avoid access to PGRAPH registers when ELPG is engaged
add elpg protected call for GR intr retrigger and CE ISR
and retrigger HALs
Bug 200777033
Change-Id: Ieef4a423faf79f09476d696c3078b113750548bb
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586449
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- On pre-silicon platform, static pg will be
done by nvgpu driver. For this, retain structs
and HALs of static pg.
- Add the static pg support under pre-silicon code.
- On silicon, the static pg will be done by BPMP.
- Rename variables used in static pg for better
readability and consistency
Bug 200768322
JIRA NVGPU-6433
Change-Id: Ib31c0f83b751c2b1563a36bd51af78a0bd12a117
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2594801
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
common.fbp has two interfaces to initialize FBP:
1. Public API nvgpu_fbp_init_support
2. HAL fbp.fbp_init_support
nvgpu_fbp_init_support() is only used to initialize HAL
fbp.fbp_init_support. Remove the HAL and use the API directly.
JIRA NVGPU-6644
Change-Id: I2c455e09dbcf5e4fb1dc370b284e4f0d5c678b40
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2592047
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add Setter and Getter methods for accessing tsg->sm_error_states.
Getter returns a constant pointer for struct nvgpu_tsg_sm_error_state.
This renders it unnecessary to add BVEC for above fields for the struct
in multiple locations. The current design ensures that only a constant
pointer is obtained from the owner unit i.e. FIFO.
The following new methods are added. Both unit tests and BVEC tests
are added for them as well.
nvgpu_tsg_store_sm_error_state
nvgpu_tsg_get_sm_error_state
Jira NVGPU-6947
Change-Id: I82c22a2774862c8579baa41b6fb8292fa164704a
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 79574638671a0c6efe41cd3423668fcd1bd96826)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556938
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Update documentation for priv_ring unit based on updated swud
guidelines. This patch is contains a combination of two commits.
Documentation is added for the HAL methods
enable_priv_ring and isr
decode_error_code
enum_ltc
get_fbp_count
get_gpc_count
set_ppriv_timeout_settings
Jira NVGPU-6986
Change-Id: Ifa401dab0f29330ab7db2dcc888edf46a402cc83
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2587227
GVS: Gerrit_Virtual_Submit
(cherry picked from commit 0bdcf425ca58e6d04dceaedbb48f3adef43a870a)
(cherry picked from commit ca44c09df60791db2ea6a6a80bc807f6c7eba494)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2590992
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch fixes a bug in the cbc initialization code for ga10b,
where it was erroneously assumed that a fixed ltc count of only one
should be used for historical reasons. For volta and later, the full
ltc count should be used in cbc-related computation.
Ensure
- CBC base address is 64K aligned
- CBC start address lies within CBC allocated memory
Check CBC is marked safe only for silicon platform.
Bug 3353418
Change-Id: I5edee2a78dc9e8c149e111a9f088a57e0154f5c2
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2585778
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
These errors are now actually expected from code that counts number of
sys/gpc/fbp perfmons after first context creation. Nvgpu tries to count
them by register offset lookup in context image and counts perfmons until
invalid offset is found.
nvgpu_gr_hwmp_map_find_priv_offset no longer prints an error message.
The correct error condition is moved to gr_exec_reg_ops
Bug 200755537
Change-Id: Ib5c6ccd39275b2b06e3f8bce4878a3234478a780
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586228
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
* As of now, working on multiple chip bringup in nvgpu-next repo has
an issue because we end with losing control on source code (hard to
find which part of the code belongs to which chip) and it's valuable
history this affects chip migration on release.
* To support multiple chip bringup simultaneously, we need new
guidelines to avoid losing control on source code and make migration
easier. This change adds links to nvgpu-next repo.
* Updated return code to ENODEV for consistency
* Updated ACR unittest to work with ENODEV return code
NOTE:
These are the initial set of infrastructure changes, guidelines
will evolve, and source code will get updated accordingly.
Based on future chip features, Which part of the source code falls
under nvgpu-next repo is decided.
JIRA NVGPU-6574
Change-Id: I81827e35d189c55554df00e255b527a4473e0338
Signed-off-by: Sagar Kadamati <skadamati@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556793
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
After CONFIG_UBSAN kernel compilation flag to know any shifting
cause overflow or not enablement ,this is identified.
The register "gr_fe_tpc_fs_r(gpc_index)" is read only after
Volta. The gops where we are computing the index is not needed.
Bug 200727116
Change-Id: Ib2306103389ba9df77fd59d012ec70e775104989
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573296
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently reading temperature value depeads on therm pstate
board objects. In absence of pstate reading temperature
from therm get status will be failed which will cause GVS
failure in NvRmGpuTest_Device_GetTemperature test.
This change will add support to read temperature from
therm sensor_00 register but this will have following
limitation:
- NV_THERM_I2CS_SENSOR_00 doesn't support fractional
precision.
- It doesn't support negative temperatures.
BUG-200736830
Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I25e577dac9029fcd787a6f71957dbeefd6fe43dd
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2584269
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
There is HW specific limit on number of channel entries that can be
added for each TSG entry in runlist. Right now there is no checking
to enforce this from SW and hence if User binds more than supported
channels to same TSG, invalid TSG formation error interrupts are
generated.
Fix this by adding appropriate checks in below steps :
- Add new field ch_count to struct nvgpu_tsg to keep track of
channels bound to TSG.
- Define new hal gops.runlist.get_max_channels_per_tsg() to retrieve
HW specific maximum channel count per TSG.
- Implement the HAL for gk20a and gv11b chips, and assign new HALs for
all chips appropriately.
- Increment ch_count while binding the channel to TSG and decrement it
while unbinding.
- While binding channel to TSG, Check if current channel count is
already equal to max channel count. If yes, print an error and bail
out.
Bug 200763991
Change-Id: Ic5f17a52e0fb171d1c020bf4f085f57cdb95f923
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2582095
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Add support for powering off IGPU for switching between
legacy to SMC mode/vice-versa or changing SMC configuration.
The power off can be issued as follows
echo 0 > /dev/nvgpu/igpu0/power
The following steps are done during a poweroff.
1) Deterministic channel idle
2) Acquire write_lock on l->busy semaphore.
3) Wait till power_usage decrements to indicate 0 active jobs.
4) Invoke pm_runtime_put_sync_suspend()
5) Invoke nvgpu_gr_remove_support() to clear existing GR memory.
6) Release write_lock on l->busy
7) Deterministic channel unidle.
Part of the sequence matches that of the gk20a_do_idle code.
The common parts are extracted into new functions
gk20a_block_new_jobs_and_idle() and gk20a_unblock_jobs()
For joint-rail case, the current implementation, does a railgate
and then sets pm_runtime_set_autosuspend_delay(-1) to disable
regular runtime resume/suspend.
Remove clearing of NVGPU_SUPPORT_MIG status during state change
ias it leads to inconsistencies.
Jira NVGPU-6920
Change-Id: I0b3eb3278176122ac061c1e8a94ebfb3c17c3925
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578501
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Antony Clince Alex <aalex@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
ECC counter structures are freed without removing the node from the
stats_list. This can lead to invalid access due to dangling pointers.
Update the ecc counter free logic to set them to NULL upon free, to
remove them from stats_list and free them by validation.
Also updated some of the ecc init paths where error was not propa-
gated to callers and full ecc counters deallocation was not done.
Now, calling unit ecc_free from any context (with counters alloc-
ated or not) is harmless as requisite checks are in place.
bug 3326612
bug 3345977
Change-Id: I05eb6ed226cff9197ad37776912da9dcb7e0716d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2565264
Tested-by: Ashish Mhetre <amhetre@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Currently, the ECC errors go unnoticed. ECC errors in GR are handled
by incrementing the ecc counters and clearing the intr registers. No
action is taken bassed on the ECC counters. The ECC intr prints are
also not printed by default.
Ideally, the ECC errors should trigger recovery if they cross the
error threshold. While we wait for the ideal ECC handling, convert
the conditional ECC prints to error messages.
JIRA NVGPU-7058
Change-Id: I2e54aeb5aef1d76dd6d4162eedd21cd529089b54
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573029
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, cbc init and compression tests are failing because MMU marks
cbc to be not safe.
- Modify cbc.get_base_divisor hal to use ltc_count = 1 for Tegra devices
- Update fb.cbc_configure to write compbit_backing_size value to
fb_mmu_cbc_top register.
- After config confirm that CBC is marked safe.
Bug 3353418
Change-Id: I1e9c27f47f7bfcf476f2499231951382e2e8653d
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2570550
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
- Add gops_fbp_fs and gops_gpc_pg struct
- Add HALs to write to NV_FUSE_CTRL_OPT_FBP and
NV_FUSE_CTRL_OPT_GPC fuses needed for floorsweeping
- Add set_fbp_mask and set_gpc_mask to probe FBP and GPC mask
respectively during gpu probe
- Add sysfs node: fbp_fs_mask and gpc_fs_mask to store
FBP and GPC floorsweeping mask sent from userspace
- Move the floorsweeping programming early in NVGPU’s GPU init
function and then issue a PRI init.
JIRA NVGPU-6433
Change-Id: I84764d625c69914c107e1e8c7f29c476c2f64f78
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2499571
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Enable nvriscv debug buffer feature in NVGPU.
Debug buffer is a feature to print the debug log from ucode onto console
in real time.
Debug buffer feature uses the DMEM, queue and SWGEN1 interrupt to share
ucode debug data with NVGPU.
Ucode writes debug message to DMEM and updates offset in queue to trigger
interrupt to NVGPU.
NVGPU copies the debug message from DMEM to local buffer to process and
print onto console.
Debug buffer feature is added under falcon unit and required engine
can utilize the feature by providing required param through public
functions.
Currently GA10B NVRISCV NS/LS PMU ucode has support for this feature
and enabled support on NVGPU side by adding required changes, with this
feature enabled, it is now possible to see prints in real time.
JIRA NVGPU-6959
Change-Id: I9d46020470285b490b6bc876204f62698055b1ec
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2548951
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Fix GSP/PMU registers priv errors which are seen as part of boot sequence.
- Couple of GSP/PMU Falcon/NVRISCV registers are allowed to access
upon NVRISCV bootrom completion but these registers were needed
to configure on legacy chips to bootstrap/configure Falcon.
- Add is_falcon2_enabled or NVGPU_PMU_NEXT_CORE_ENABLED check
to skip these registers.
JIRA NVGPU-7025
Change-Id: I087a477ade6736398dea113f89894a0ff73ae647
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2553127
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Added below IVC commands to support VAB on HV.
* TEGRA_VGPU_CMD_FB_VAB_RESERVE - Enable & Configure VAB tracking
* TEGRA_VGPU_CMD_FB_VAB_FLUSH_STATE - Dump VAB to user buffer
* TEGRA_VGPU_CMD_FB_VAB_RELEASE - Disable VAB tracking
Also set HAL and enable VAB for ga10b vgpu.
Jira GVSCI-4619
Change-Id: Id7564611c24740ab8613e4baa420ee58fb52759a
Signed-off-by: Sagar Kadamati <skadamati@nvidia.com>
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2507268
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Changes:
- This change will only init gsp software
state, nvgpu_gsp_bootstrap need to be called.
- CONFIG_NVGPU_GSP_SCHEDULER flag is created to
compile out the gsp scheduler code when needed.
- Created GSP engine reset which is needed when
ACR completed execution and need to load gsp fw.
NVGPU-6783
Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I2ce43e512b01df59443559eab621ed39868ad158
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554267
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Issue observed:
- In GA10B, it was observed that after recovery happens
ELPG does not engage.
- It was because, after CE reset, when nvgpu_submit_twod test
was run to engage ELPG, IDLE_FLIPPED_PWR_OFF signal was asserted.
- This means that when ELPG was engaged (engine is in PWR_OFF),
some idle signal flips (becomes non-idle) and this causes
IDLE_SNAP. After IDLE_SNAP is hit, ELPG will not engage further.
- After debugging from WAVES, it was observed that:
LCE0/LCE1 are not done with the reset sequence.
- The state of these LCE is RESET0. A pri request (pri read
to NV_CE_PCE_MAP register in CE) is seen that kicks it out of
RESET0. After this state, it goes through few states to update
some internal states (states RESET1/RESET2/PCE_MAP etc) and then
eventually settles down to IDLE state.
Solution:
- Read ce_pce_map_r register in recovery sequence (after ce reset).
- It is observed that when this read is added recovery is complete
and post that when nvgpu_submit_two test is executed, ELPG is engaging.
- This means that a pri read is needed after CE reset so that it settles
to idle state properly and post that ELPG can engage properly.
Bug 200734258
Change-Id: I5bb84921ca62a740fde81ffe6c29ccde4ebb341b
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554493
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>