CIC (Central Interrupt controller) will be responsible for the
interrupt handling. common.cic unit is the placeholder for all
interrupt related code. Move interrupt related defines and
Public APIs present in common.mc to common.cic.
Note: The common.mc interrupts related struct definitions are
not moved as part of this patch.
Adapt the code to use interrupt handling related defines and public
APIs migrated from common.mc to common.cic
JIRA NVGPU-6899
Change-Id: I747e2b556c0dd66d58d74ee5bb36768b9370d276
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2535618
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GFxP preemption for graphics contexts is not supported in safety.
But the support was enabled along with CONFIG_NVGPU_GRAPHICS since GFxP
preemption was protected under same config.
Add a separate config CONFIG_NVGPU_GFXP to protect all GFxP specific
code, enum values, and HALs.
Disable the config in safety profile.
Jira NVGPU-6893
Change-Id: Iebb5f754a1025dfa6e05a94704bdb8a7123b599a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2534986
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add a new Central Interrupt Controller(CIC) unit in common code.
The interrupt handling is done in a distributed manner currently.
The error handling policy for different errors resides in each unit's
ISR code. The goal is to converge this data under one central place -
the CIC unit.
This patch creates framework for CIC unit and moves the gv11b QNX
safety LUT to CIC unit. All the error reporting APIs from different
units are also moved to CIC.
New APIs are exposed by CIC unit to access its internal data like:
1. Struct err_desc - the static err handling /injection data per
error id
2. Num_hw_modules - the number of error reporting HW units
supported by CIC
Init and deinit of CIC unit:
1. CIC unit should be initialized earlyon during boot so that it
is available for any interrupt handling.
2. Initialize CIC just before the interrupts are enabled during
boot.
3. Similarly, CIC is disabled late during deinit cycle; right
after the interrupts are masked.
LUT:
1. LUT is currently used only for reporting error to safety
services in gv11b QNX safety build.
2. This error handling policy LUT currently has only two levels
of handing - correctable and quiecse.
3. Once, the error handling policy decision is moved from leaf
unit nodes to CIC, LUT will be updated to have additional levels
like fast recovery and full recovery.
4. Also, then a separate LUT will be added for each platform/build.
5. In current framework, the LUT is set to NULL for all
configurations except gv11b.
report_err() ops is added to report error to safety services.
This ops is only effective for gv11b qnx build; and set to NULL for
other configurations.
NVGPU-6521
NVGPU-6523
NVGPU-6750
NVGPU-6758
NVGPU-6760
NVGPU-6754
Change-Id: I24be7836a96d787741e37b732e19863ed8014635
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2518683
Reviewed-by: Ajesh K V <akv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The function gk20a_mm_l2_flush incorrectly returns an error value
when it skips l2_flush when hardware is powered off.
This causes the following prints to occur even when the behavior is expected.
gv11b_mm_l2_flush:43 [ERR] gk20a_mm_l2_flush failed
nvgpu_gmmu_unmap_locked:1043 [ERR] gk20a_mm_l2_flush[1] failed
The above errors occur from the following paths
1) gk20a_remove -> gk20a_free_cb -> gk20a_remove_support ->
nvgpu_pmu_remove_support -> nvgpu_pmu_pg_deinit ->
nvgpu_dma_unmap_free
2) gk20a_remove -> gk20a_free_cb -> gk20a_remove_support ->
nvgpu_remove_mm_support -> gv11b_mm_mmu_fault_info_mem_destroy ->
nvgpu_dma_unmap_free
Since, these do not belong in the Poweron/Poweroff path, its okay to
skip flushing them when the hardware has powered off.
Fixed the userspace tests by allocating g->mm.bar1.vm to prevent NULL access
in gv11b_mm_l2_flush->tlb_invalidate.
Jira LS-77
Change-Id: I3ca71f5118daf4b2eeacfe5bf83d94317f29d446
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2523751
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Some of the RTV circular buffer programming is under GRAPHICS config and
some is under DGPU config. For nvgpu-next, RTV circular buffer is
required even for iGPU so keeping the code under DGPU config does not
make sense.
Move all the code from DGPU config to GRAPHICS config.
Bug 3159973
Change-Id: I8438cc0e25354d27701df2fe44762306a731d8cd
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2524897
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
1) nvgpu poweron sequence split into two stages:
- nvgpu_early_init() - Initializes the sub units
which are required to be initialized before the grgmr init.
For creating dev node, grmgr init and its dependency unit
needs to move to early stage of GPU power on.
After successful nvgpu_early_init() sequence,
NvGpu can indetify the number of MIG instance required
for each physical GPU.
- nvgpu_finalize_poweron() - Initializes the sub units which
can be initialized at the later stage of GPU power on sequence.
- grmgr init depends on the following HAL sub units,
* device - To get the device caps.
* priv_ring - To get the gpc count and other
MIG config programming.
* fb - MIG config programming.
* ltc - MIG config programming.
* bios, bus, ecc and clk - dependent module of
priv_ring/fb/ltc.
2) g->ops.xve.reset_gpu() should be called before GPU sub unit
initialization. Hence, added g->ops.xve.reset_gpu() HAL in the
early stage of dGPU power on sequence.
3) Increased xve_reset timeout from 100ms to 200ms.
4) Added nvgpu_assert() for gpc_count, gpc_mask and
max_veid_count_per_tsg for identify the GPU boot
device probe failure during nvgpu_init_gr_manager().
JIRA NVGPU-6633
Change-Id: I5d43bf711198e6b3f8eebcec3027ba17c15fc692
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521894
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Currently, there are few chip specific erratas present in nvgpu code.
For better traceability of the erratas and corresponding fixes,
introduce flags to indicate existing erratas on a chip. These flags
decide if a corresponding solution is applied to the chip(s).
This patch introduces below functions to handle errata flags:
- nvgpu_init_errata_flags
- nvgpu_set_errata
- nvgpu_is_errata_present
- nvgpu_print_errata_flags
- nvgpu_free_errata_flags
nvgpu_print_errata_flags: print below details of erratas present in chip
1. errata flag name
2. chip where the errata was first discovered
3. short description of the errata
Flags corresponding to erratas present in a chip are set during chip hal
init sequence.
JIRA NVGPU-6510
Change-Id: Id5a8fb627222ac0a585aba071af052950f4de965
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2498095
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
All graphics code is under CONFIG_NVGPU_GRAPHICS and all the HALs are
in non-fusa files. In order to support graphics in safety,
CONFIG_NVGPU_GRAPHICS needs to be enabled. But since most of the HALs
are in non-fusa files, this causes huge compilation problem.
Fix this by moving all graphics specific HALs used on gv11b to fusa
files. Graphics specific HALs not used on gv11b remain in non-fusa files
and need not be protected with GRAPHICS config.
Protect call to nvgpu_pmu_save_zbc() also with config
CONFIG_NVGPU_POWER_PG, since it is implemented under that config.
Delete hal/ltc/ltc_gv11b.c since sole function in this file is moved to
fusa file.
Enable nvgpu_writel_loop() in safety build since it is needed for now.
This will be revisited later once requirements are clearer.
Move below CTXSW methods under CONFIG_NVGPU_NON_FUSA for now. Safety
CTXSW ucode does not support these methods. These too will be revisited
later once requirements are clearer.
NVGPU_GR_FALCON_METHOD_PREEMPT_IMAGE_SIZE
NVGPU_GR_FALCON_METHOD_CTXSW_DISCOVER_ZCULL_IMAGE_SIZE
Jira NVGPU-6460
Change-Id: Ia095a04a9ba67126068aa7193f491ea27477f882
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2513675
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Separated set_atomic_mode functionality from
init_fs_state/enable_nvlink and created new
fb gops for set_atomic_mode.
In gpu init sequence, set_atomic_mode is
called after acr_construct_execute to take care
of design changes required for nvgpu-next
architectures.
Updated fb_gv11b_init_test to use set_atomic_mode
gops along with init_fs_state.
Bug 3268664
Change-Id: I1ab9eb21cc4cce77f3325c4e8821a75b6e85fba2
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2508095
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch is to disable the clock arbitor for TU104.
TU104 is not a POR for Drive 6.0 so disabling it to easy migration
of clk arb for GA100.
As a first step all the NVRM Clock tests will be skipped by setting
NVGPU_SUPPORT_CLOCK_CONTROLS to false for TU104.
Then clk arbitor will be rewritten for GA100 and enabled back.
This patch implements by adding a new flag NVGPU_CLK_ARB_ENABLED which
holds the status of clk arbitor for each platform and disables them for
TU104
Bug 200699763
Change-Id: I51cd5c7821bdc0b48080c17a70735925b278ddf5
Signed-off-by: absalam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2515086
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Disable access to ssync unit when MIG is enabled as ssync is
part of GR and not Compute. A runtime check is now added for the below
function.
gv11b_gr_intr_enable_hww_exceptions
The following priv errors are seen.
SYS write error: ADR 0x00405a14 WRDAT 0xc0000000 master 0x00000000
[ERR] INFO 0x19400200: (subid 0x00000019 priv_level 0 local_ordering 1)
[ERR] CODE 0xbadf1100
Jira NVGPU-6699
Change-Id: I9a08f1b6ab58affdcaa18e8ca314a4a00478a3e5
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2514761
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Introduce new HAL gops_ltc.set_l2_sector_promotion to configure L2
sector promotion policy. The follow three promotion settings are support:
- NVGPU_GPU_IOCTL_TSG_L2_SECTOR_PROMOTE_FLAG_NONE
- NVGPU_GPU_IOCTL_TSG_L2_SECTOR_PROMOTE_FLAG_64B
- NVGPU_GPU_IOCTL_TSG_L2_SECTOR_PROMOTE_FLAG_128B
Add ioctl "NVGPU_TSG_IOCTL_SET_L2_SECTOR_PROMOTION" to the gpu tsg node
to support l2 sector promotion. On chips which do not support sector
promotion, the ioctl returns 0.
Bug 200656177
Change-Id: Iad835a5c954d3b10da436cfafb388aaaa04f44c7
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2460553
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Add ioctl support to configure and read the max number of lines/ways
in a L2 cache set that can be marked as EVICT_LAST. This is accomplished
through two new ltc hals: set_l2_max_ways_evict_last,
get_l2_max_ways_evict_last. These hals will only be set for nvgpu-next
chips. Incase of legacy chips, the IOCTLs will return error -ENOSYS.
Generate following litter constants to get the number of sets in a l2
slice and the number of ways in each set:
- GPU_LIT_NUM_LTC_LTS_SETS
- GPU_LIT_NUM_LTC_LTS_WAYS
Add gpu characteritics flag: NVGPU_L2_MAX_WAYS_EVICT_LAST_ENABLED to
allow userspace driver to determine if L2_MAX_WAYS_EVICT_LAST ioctl is
supported.
Bug 200605474
Change-Id: Id3180f891399f5e128500f3835d762aee59953e0
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2445884
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
gv11b_gr_wait_for_sm_lock_down() uses nvgpu_get_poll_timeout() to
get timeout value for polling of SM lock down status.
nvgpu_get_poll_timeout() returns -1 if timeouts are disabled by
debugger, and if SM lock down fails, nvgpu lands in an infinite loop.
Use g->poll_timeout_default instead of nvgpu_get_poll_timeout() so
that explicit timeout value is always used. This also means that
timeout value of ULONG_MAX will still be used on simulation platforms.
Bug 200676073
Change-Id: I5777e98efcd63f24ade244384cf7b157dcea991d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2478255
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Mikhail Filimonov <mfilimonov@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Skip setting error notifier on MMUFAULT when a debugger is connected and
debugging is enabled(mmu_debug_mode=on). At present, error notifier causes
the application to teardown the channels and debugger will not be able to
collect any data.
Bug 200632771
Change-Id: Idc4141990d4e2fb4714de9bfd31cfea5e7dcd52a
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2477253
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Update the nvgpu_runlist_update_for_channel() function:
- Rename it to nvgpu_runlist_update()
- Have it take a pointer to the runlist to update instead
of a runlist ID. For the most part this makes the code
better but there's a few places where it's worse (for
now).
This starts the slow and painful process of moving away from
the non-runlist code using runlist IDs in many places it should
not.
Most of this patch is just fixing compilation problems with
the minor header updates.
JIRA NVGPU-6425
Change-Id: Id9885fe655d1d750625a1c8aceda9e67a2cbdb7a
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470304
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Rename struct nvgpu_runlist_info to struct nvgpu_runlist; the
info is not necessary. struct nvgpu_runlist is soon to be a
first class object among the nvgpu object model.
Also rename the fields runlist_info and active_runlist_info to
simply runlists and active_runlists respectively. Again the info
text is just not necessary and somewhat misleading. These structs
_are_ the runlist representations in SW; they are not merely
informational.
Also add an rl_dbg() macro to print debug info specific to
runlist management and some debug prints specifying the runlist
topology for the running chip.
Change-Id: Id9fcbdd1a7227cb5f8c75cca4abbff94fe048e49
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470303
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The NEXT bit can remain set for the channel if timeslice expires before
scheduler clears it. Due to this nvgpu fails TSG unbind and in turn
nvrm_gpu fails channel close. In this case, checking the channel hw
state after some time can help see NEXT bit cleared by scheduler.
Reenable the tsg and return -EAGAIN to nvrm_gpu for it to retry again.
Bug 3144960
Change-Id: I35f417f02270e371a4e632986b73a00f8a4f921a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2468391
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Add CAU initialization data in const array hwpm_cau_init_data[].
Add HAL API gops.gr.get_hwpm_cau_init_data() to retrieve this data
and implement it for TU104.
Add new HAL API gops.gr.init_cau() that uses above data and
initializes all cau units. Implement this HAL only for TU104.
Invoke above sequence from nvgpu_profiler_bind_hwpm() in case of
global HWPM mode.
Jira NVGPU-5360
Change-Id: I1c7a380e9d04d6cd45fb7f746c0a79fc56675244
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2463854
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add ptimer register offsets to regops allowlist for testing. New
allowlist restricts regops only to reserved resources, this makes it
difficult to test the interface since only HWPM registers can be
accessed and that could have side effects on system.
Having ptimer registers as test offsets has advantage that the offsets
do not change across chips, registers are read-only, and values are
always incrementing so a test can verify read regops and test various
flags of interface.
Add gops.ptimer.get_timer_reg_offsets() HAL to return timer offsets.
Add static function add_test_range_to_map() that adds timer offsets to
allowlist always.
In nvgpu_profiler_validate_regops_allowlist() return success if timer
offsets are hit in range search.
Bug 2510974
Jira NVGPU-5360
Change-Id: I8b51bb92e43e8b1bbe903c874a429341659ef603
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2460002
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit