The timestamp control register in the SMCARB should be configured with
the NV_PSMCARB_TIMESTAMP_CTRL_DISABLE_TICK field cleared; otherwise the
PTIMER ticks will not be sent to the GR engine. Hence, remove the
pre-processor checks around the grmgr.load_timestamp_prod call.
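Roughly, the programming amounts to the following (a sketch only; the
accessor names below are illustrative, the real ones come from the
generated SMCARB hw headers):

  /* Sketch: keep DISABLE_TICK cleared so PTIMER ticks reach GR. */
  static void ga10b_smcarb_enable_timestamp_ticks(struct gk20a *g)
  {
      u32 val = nvgpu_readl(g, smcarb_timestamp_ctrl_r());

      val &= ~smcarb_timestamp_ctrl_disable_tick_m();
      nvgpu_writel(g, smcarb_timestamp_ctrl_r(), val);
  }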
Bug 3510460
Bug 3500065
Change-Id: I223cea1aca28a9215287f540eb961a16e3fe6626
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671021
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The ctxsw ucode saves all the ctxsw'ed TPC priv registers in the TPC
priv segment of the ctxsw image. In ga10b, these registers can be stored
in either of the two arrangements:
- INTERLEAVED: sorted by register address first, then by TPC number
- MIGRATION: the exact opposite of interleaved, i.e. sorted by TPC
  number first, then by register address.
Update the HAL functions gr_ga10b_process_context_buffer_priv_segment
and gr_ga10b_find_priv_offset_in_buffer to detect the register layout
and calculate the register offset accordingly.
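For reference, the two offset calculations reduce to something like
the sketch below, assuming 32-bit register words; the helper names are
illustrative, not the HAL's:

  #include <stdint.h>

  /* INTERLEAVED: sorted by register address first, then by TPC number. */
  static uint32_t interleaved_offset(uint32_t reg_idx, uint32_t tpc,
                                     uint32_t num_tpc)
  {
      return (reg_idx * num_tpc + tpc) * (uint32_t)sizeof(uint32_t);
  }

  /* MIGRATION: sorted by TPC number first, then by register address. */
  static uint32_t migration_offset(uint32_t reg_idx, uint32_t tpc,
                                   uint32_t num_regs)
  {
      return (tpc * num_regs + reg_idx) * (uint32_t)sizeof(uint32_t);
  }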
Bug 200737000
Bug 3532165
Change-Id: I305509cf89498cb0c2c5bfa1d867272bdf5f42b3
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2665491
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch makes the following changes:
- Compiles out unused error reporting APIs and the related
data structures from the safety build. For this purpose, it
introduces a new flag: CONFIG_NVGPU_INTR_DEBUG
- Updates the nvgpu_report_err_to_sdl() API with one more argument,
hw_unit_id. This helps determine from the LUT whether the reported
error is corrected or uncorrected (see the sketch below).
- Triggers SW quiesce in the safety build if an uncorrected error is
reported to Safety_Services.
- Renames files in the cic folder by replacing gv11b with ga10b,
since error reporting for gv11b is not supported in dev-main.
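The reporting call then takes the shape sketched below (argument names
and order are assumptions for illustration, not the final signature):

  /* Sketch: hw_unit_id indexes the LUT that classifies the error as
   * corrected vs. uncorrected; an uncorrected error additionally
   * triggers SW quiesce in the safety build. */
  static void report_unit_error(struct gk20a *g, u32 hw_unit_id, u32 err_id)
  {
      nvgpu_report_err_to_sdl(g, hw_unit_id, err_id);
  }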
JIRA NVGPU-8002
Change-Id: Ic01e73b0208252abba1f615a2c98d770cdf41ca4
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2668466
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
For SMC mode, userspace is expected to use local indexing for
accessing GPC/FBP specific perf registers, where local indexing refers
to indices localized to a given SMC instance. H/W, however, expects
logical-id-based indexing for these registers. Currently, the nvgpu
driver maintains a mapping between local <-> logical/physical ids of
the GPCs for SMC specific configurations/instances.
These register accesses are performed by the Debugger/Profiler
interfaces, which use regops for reads/writes. In their current state,
regops simply validate register addresses and perform the required
operation on them. These registers are currently indexed using local
ids, and they need to be converted to logical ids to support SMC
modes. For the non-SMC case, local ids are equivalent to logical ids,
so the conversion has no effect on them.
The following changes facilitate the above conversion from local ids
to logical ids in the regops path:
1) nvgpu_profiler_allowlist_range_search is modified to update an
nvgpu_pm_resource_register_range_map entry instead of just the type.
2) Added two APIs, one for the profiler V2 based interfaces and the
other for the legacy profiler interface. The legacy profiler logic
extends into the more generic profiler V2 logic to help retain future
compatibility. These APIs are invoked just after the validation stage
of nvgpu_exec_regops (see the sketch below).
3) The above APIs return an error if a local id exceeds the number of
GPCs/FBPs for a particular instance.
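The translation itself is a table lookup of roughly this shape
(structure and field names are illustrative):

  #include <errno.h>
  #include <stdint.h>

  /* Convert a local GPC index (relative to one SMC instance) to the
   * logical GPC id expected by HW when forming the register address.
   * For non-SMC configurations the mapping is the identity. */
  static int local_gpc_to_logical(const uint32_t *local_to_logical,
                                  uint32_t num_gpcs, uint32_t local_idx,
                                  uint32_t *logical_idx)
  {
      if (local_idx >= num_gpcs) {
          return -EINVAL; /* local id exceeds GPCs in this instance */
      }
      *logical_idx = local_to_logical[local_idx];
      return 0;
  }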
Bug 200712091
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I060c2408a798f2f4e058aba266fa1ea9cebc2682
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2644956
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Add the following HALs for ga100 and ga10b. These will be used to
calculate chiplet offsets corresponding to GPC/FBP perf registers.
get_pmmgpcrouter_per_chiplet_offset
get_pmmfbprouter_per_chiplet_offset
get_hwpm_fbp_perfmon_regs_base
get_hwpm_gpc_perfmon_regs_base
get_hwpm_fbprouter_perfmon_regs_base
get_hwpm_gpcrouter_perfmon_regs_base
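These feed an offset calculation of roughly the following form (the
g->ops paths and signatures shown are assumptions; the real callers
live in the perf/regops code):

  static u32 gpcrouter_perfmon_offset(struct gk20a *g, u32 gpc_logical_id)
  {
      /* Per-chiplet base plus logical id times the per-chiplet stride. */
      u32 base = g->ops.perf.get_hwpm_gpcrouter_perfmon_regs_base(g);
      u32 stride = g->ops.perf.get_pmmgpcrouter_per_chiplet_offset(g);

      return base + (gpc_logical_id * stride);
  }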
Bug 200712091
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Iec1a16ef4a3c26dca054c30d95bef991983dc2b7
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2648832
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The following changes are added:
1) nvgpu_gr_config->gpc_tpc_mask_physical is now indexed by physical
gpc id instead of logical id.
2) Removed the conversion of logical fbp ids and replaced them with
physical ids.
3) nvgpu_gpu_instance->fbp_en_mask now contains the mask of physical fbp ids.
4) gk20a_ctrl_ioctl_gpu_characteristics now returns gpu.gpc_mask as a
mask of physical ids.
Bug 200712091
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0e066df76e07203ff4a5be5bfff2cef8566b425d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2648831
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
This patch updates the interaction between the VAB
packet polling code and the VAB_ERROR MMU fault handling
code. A shared atomic flag is used to determine if a
VAB_ERROR MMU fault has happened while polling; if it has, polling is
terminated immediately instead of waiting for a timeout. This allows
testing VAB_ERROR MMU fault handling in environments where a timeout
may never happen or may happen only very slowly.
The sequence for this to work is the following:
1) before requesting a VAB dump, which may trigger a fault,
the atomic flag is atomically reset to 0.
2) polling eventually starts and atomically checks the flag in the
loop. If the flag is set, polling exits because the VAB result will
never be available.
3) If a VAB_ERROR MMU fault is raised, this sets the flag to 1
atomically.
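A condensed sketch of this handshake, using nvgpu's atomic helpers
(the vab_fault_pending field and the surrounding helper names are
illustrative):

  /* 1) Before triggering the dump: clear the fault flag. */
  static void vab_dump_prepare(struct gk20a *g)
  {
      nvgpu_atomic_set(&g->vab_fault_pending, 0);
  }

  /* 2) Poll for the result, but bail out early if a VAB_ERROR MMU
   *    fault was flagged, since the result will never become valid. */
  static int vab_poll_result(struct gk20a *g, struct nvgpu_timeout *to)
  {
      while (!vab_entry_valid(g)) {
          if (nvgpu_atomic_read(&g->vab_fault_pending) != 0) {
              return -EFAULT;
          }
          if (nvgpu_timeout_expired(to) != 0) {
              return -ETIMEDOUT;
          }
          nvgpu_udelay(10);
      }
      return 0;
  }

  /* 3) The VAB_ERROR MMU fault handler sets the flag for the poller. */
  static void vab_error_fault_notify(struct gk20a *g)
  {
      nvgpu_atomic_set(&g->vab_fault_pending, 1);
  }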
Note that while there could be a race in this sequence if the
VAB_ERROR MMU fault handling is somehow delayed, the chance is
extremely slim because:
1) the race could only happen if the VAB dump code is re-entered
while the earlier VAB_ERROR MMU fault is still pending.
2) the polling code has a large timeout
3) re-entering means a new ioctl/devctl
Bug 3425981
Change-Id: I422b15b581b0c3417abd4c66fbcdde9a0ff8cd9b
Signed-off-by: Martin Radev <mradev@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2664103
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
In DRIVE 6.0, NvGPU is allowed to report only 32-bit metadata to
Safety_Services. So, there is no need to have distinct APIs for
reporting errors from units like GR, MM, and FIFO to the SDL unit. All
these error reporting APIs will be replaced with a single API. To meet
this objective, this patch makes the following changes:
- Replaces nvgpu_report_*_err with nvgpu_report_err_to_sdl.
- Removes the reporting of error messages.
- Replaces nvgpu_log() with nvgpu_err(), for error reporting.
- Removes error reporting to Safety_Services from nvgpu_report_*_err.
However, nvgpu_report_*_err APIs and their related files are not
removed. During the creation of nvgpu-mon, they will be moved under
nvgpu-rm, in debug builds.
Note:
- There will be a follow-up patch to fix error IDs.
- As discussed in https://nvbugs/3491596 (comment #12), the high
level expectation is to report only errors.
JIRA NVGPU-7450
Change-Id: I428f2a9043086462754ac36a15edf6094985316f
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662590
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Implemented a device info cmd to send device info to the GSP for
runlist submission. Currently the GSP scheduler supports only the GR
engine '0' instance.
- Implemented a runlist submit cmd. The GSP firmware submits the
corresponding runlist by writing into the submit registers. This
command is a direct replacement of the hw_submit ga10b HAL for the GR
engine.
NVGPU-6790
Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I5dc573a6ad698fe20b49a3466a8e50b94cae74df
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2608923
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
- NVGPU needs to check for priv lockdown release before configuring
any priv registers. The current GSP bootstrap sequence performs the
irq configuration right after GSP engine reset, which causes priv
errors. So the irq configuration should be done only after the GSP
firmware releases the priv lockdown.
- Removed the clearing of the irq mask and dest registers before
configuring them, as the GSP firmware may already have done a partial
irq configuration before releasing the priv lockdown.
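The reordered bootstrap then looks roughly like this (the wait helper,
irq helper, and timeout value are assumptions):

  static int gsp_setup_irqs(struct gk20a *g)
  {
      int err;

      /* Touch no priv registers until the GSP firmware has released
       * the priv lockdown it holds after engine reset. */
      err = gsp_wait_for_priv_lockdown_release(g, 1000U /* ms, assumed */);
      if (err != 0) {
          return err;
      }

      /* Do not clear the irq mask/dest registers first: the firmware
       * may already have programmed them partially before releasing
       * the lockdown. Just apply the driver's configuration on top. */
      gsp_irq_config(g);
      return 0;
  }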
NVGPU-7342
Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I4b6e83452c051654253e02bfb72330b3d6aec3fd
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2649826
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Separated the gsp unit into three units:
- GSP unit which holds the core functionality of GSP RISCV core,
bootstrap, interrupt, etc.
- GSP Scheduler to hold the cmd/msg management, IPC, etc.
- GSP Test to hold stress test ucode specific support.
NVGPU-7492
Signed-off-by: Ramesh Mylavarapu <rmylavarapu@nvidia.com>
Change-Id: I12340dc776d610502f28c8574843afc7481c0871
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2660619
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Initially, REMAP only worked with big pages, but in some cases only
small pages are supported and REMAP functionality is still needed
there.
This cleans up some page size assumptions. In particular, on a remap
request the nvgpu_vm_area is found from the passed-in VA, but it can
only be derived from virt_offset_in_pages if we are also told the page
size.
The page size now comes from the _PAGESIZE_ flags, which are required
by both map and unmap operations.
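In effect the lookup becomes something like the sketch below (the flag
name is illustrative, and SZ_4K/SZ_64K stand in for the real
small/big page sizes):

  /* Derive the page size from the remap flags, then turn the
   * page-indexed offset into a VA to locate the nvgpu_vm_area. */
  static struct nvgpu_vm_area *remap_find_area(struct vm_gk20a *vm,
                                               u32 flags,
                                               u64 virt_offset_in_pages)
  {
      u64 page_size = ((flags & NVGPU_VM_REMAP_FLAG_PAGESIZE_BIG) != 0U) ?
                      SZ_64K : SZ_4K;

      return nvgpu_vm_area_find(vm, virt_offset_in_pages * page_size);
  }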
Jira NVGPU-6804
Change-Id: I311980a1b5e0e5e1840bdc1123479350a5c9d469
Signed-off-by: Chris Johnson <cwj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2566087
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The nvgpu timeout API has an internal override for presilicon mode by
default: in presi simulation environments the timeouts never trigger.
This behaviour is intended for the original use case of the timer unit
with hardware polling loops. In pure software logic, though, the timer
must trigger after the specified timeout even in presi mode, so add a
new init function to produce a timer for software logic. Use this new
kind of timer in the channel and scheduling worker threads.
The channel worker currently times out for just the purpose of the
channel watchdog timer which has its own internal timer. Although that's
just software, the general expectation is that the watchdog does not
trigger in presilicon tests that run slower than usual. The internal
watchdog timer thus keeps the non-sw mode.
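The difference in intent, roughly (the init function names and
signatures shown here are assumptions sketching the old and new
paths):

  static void timeout_examples(struct gk20a *g)
  {
      struct nvgpu_timeout hw_poll_to, worker_to;

      /* For HW register polling: silently never expires on
       * pre-silicon, where HW may be arbitrarily slow. */
      nvgpu_timeout_init_cpu_timer(g, &hw_poll_to, 2000U);

      /* For pure SW logic (e.g. a worker wakeup period): must expire
       * after the given duration even on pre-silicon (name assumed). */
      nvgpu_timeout_init_cpu_timer_sw(g, &worker_to, 100U);
  }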
Bug 3521828
Change-Id: I48ae8522c7ce2346a930e766528d8b64195f81d8
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662541
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Currently, the VAB implementation uses a fixed number of access bits.
This value can instead be computed from the
fb_mmu_vidmem_access_bit_size_f() value, as sketched after the list
below.
- Modify VAB implementation to compute number of access bits.
- Modify nvgpu_vab structure to hold VAB entry size corresponding to
number of access bits.
- The information held in the nvgpu_vab structure is more related to
the GPU than to the nvgpu_mm structure. Move the nvgpu_vab struct
element to the gk20a struct.
- Add fb.set_vab_buffer_address to update vab buffer address in hw
registers.
- Rename gr.vab_init HAL to gr.vab_reserve to avoid any confusion about
when this HAL should be used.
- Replace gr.vab_release and gr.vab_recover with the gr.vab_configure
HAL.
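Deriving the entry size from the access-bit count is then simple
arithmetic, along these lines (a sketch; the real packing and
alignment are defined by the FB unit):

  #include <stdint.h>

  /* Bytes needed per VAB entry for 'num_access_bits' access bits,
   * rounded up to whole bytes. */
  static uint32_t vab_entry_size_bytes(uint32_t num_access_bits)
  {
      return (num_access_bits + 7U) / 8U;
  }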
Bug 3465734
Change-Id: I1b67bfa9be7728be5bda978c6bb87b196d55ab65
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2659467
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Martin Radev <mradev@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
GVS: Gerrit_Virtual_Submit
CBC contig allocation requires a mempool node in DT, and that node
can be used for the contig allocations. The code duplication can be
avoided by unifying with the existing code from vgpu.
Change-Id: I6eaa1d0c9db47b158602bf0ba68ce4e09cf487a7
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2650459
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit
At present, for each resume cycle the driver sends the
"nvgpu_cbc_op_clear" command to the L2 cache controller. This clears
the contents of the compression bit backing store and corrupts the
metadata for all the compressible surfaces already allocated.
Fix this by updating the cbc.init function to be aware of the resume
state and not clear the compression bit backing store; instead, issue
the "nvgpu_cbc_op_invalidate" command, which should leave the backing
store in a consistent state across suspend/resume cycles.
The updated cbc.init HAL for gv11b is reusable across multiple chips,
hence remove the unnecessary chip-specific cbc.init HALs.
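In effect the HAL now picks the L2 CBC op based on whether this is a
cold boot or a resume, roughly as below (the resume-flag plumbing and
the ops path are simplified assumptions):

  static int cbc_init_sketch(struct gk20a *g, struct nvgpu_cbc *cbc,
                             bool is_resume)
  {
      /* Clearing on resume would wipe the compression bit backing
       * store and corrupt metadata of already-allocated surfaces, so
       * only invalidate the L2 CBC state in that case. */
      enum nvgpu_cbc_op op = is_resume ? nvgpu_cbc_op_invalidate :
                                         nvgpu_cbc_op_clear;

      return g->ops.cbc.ctrl(g, op, 0U, cbc->max_comptag_lines - 1U);
  }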
Bug 3483688
Change-Id: I2de848a083436bc085ee98e438874214cb61261f
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2660075
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Separate nvgpu_cg_blcg/slcg_fb_ltc_load_enable function
into nvgpu_cg_blcg/slcg_fb_load_enable and
nvgpu_cg_blcg/slcg_ltc_load_enable.
Program fb slcg/blcg prod values during fb init and
program ltc slcg/blcg prod values after acr boot to
have correct privilege for ltc cg programming.
Update unit tests to have separate blcg/slcg HALs for
fb and ltc programming.
Bug 3423549
Change-Id: Icdb45528abe1a3ab68a47f689310dee9a4fe9366
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2646039
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Allocator (bitmap, buddy, page) debugfs files are not cleaned up when
the allocators are destroyed. This leads to warning logs from nvgpu
like below:
[21073.493000] debugfs: File 'gk20a_as_17' in directory 'allocators' already present!
[21073.493026] debugfs: File 'gk20a_as_17-sys' in directory 'allocators' already present!
Remove the per-allocator debugfs node when destroying an allocator at
runtime.
While at it, add the missing nvgpu_allocator locking to
nvgpu_bitmap_alloc_destroy, and create nop variants of
nvgpu_init_alloc_debug and nvgpu_fini_alloc_debug when CONFIG_DEBUG_FS
is not defined to avoid adding CONFIG checks in multiple places.
Move gk20a_debug_deinit to the end of gk20a_free_cb (called from
nvgpu_put), as it tears down all debugfs entries. Allocator destruction
happens as part of the nvgpu_put call and can lead to invalid debugfs
dentry access if gk20a_debug_deinit is called before it.
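The nop variants keep callers free of CONFIG checks, following the
usual pattern (a sketch; the exact argument lists may differ):

  #ifdef CONFIG_DEBUG_FS
  void nvgpu_init_alloc_debug(struct gk20a *g, struct nvgpu_allocator *a);
  void nvgpu_fini_alloc_debug(struct nvgpu_allocator *a);
  #else
  /* Without debugfs these compile to nothing, so call sites need no
   * #ifdef CONFIG_DEBUG_FS guards. */
  static inline void nvgpu_init_alloc_debug(struct gk20a *g,
                                            struct nvgpu_allocator *a)
  {
  }
  static inline void nvgpu_fini_alloc_debug(struct nvgpu_allocator *a)
  {
  }
  #endif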
Bug 3481097
Change-Id: I8a66bcf6ade7e5707f9207c78a54d12d7bd94c02
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2648012
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
VPR functionality is split into static VPR and VPR resize. Static VPR
is supported on all kernels; VPR resize is enabled only on the 4.9
kernel.
Enable CONFIG_NVGPU_VPR unconditionally in the Linux Makefile. Compile
the VPR resize related functionality in nvgpu under a Linux kernel
version check using the new define NVGPU_VPR_RESIZE_SUPPORTED.
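The version gate is a small preprocessor check along these lines (a
sketch; the exact version bound used by the real header is an
assumption):

  #include <linux/version.h>

  /* VPR resize is only wired up on the 4.9 kernel; newer kernels
   * build only the static VPR support. */
  #if LINUX_VERSION_CODE < KERNEL_VERSION(4, 14, 0)
  #define NVGPU_VPR_RESIZE_SUPPORTED
  #endif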
JIRA LS-458
Bug 200754700
Change-Id: Ib92f7f1b95afc6c69fbdf33354459c147337350c
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2647619
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
The nvs worker thread is created on each resume and deinitialized on
every suspend. nvgpu can be resumed while a process is being killed,
and thread creation can fail in that case, which leads to a driver
resume failure.
To avoid the issue above, don't stop the nvs worker thread on suspend
and always let the thread created first handle the nvs work.
Deinitialize the nvs worker thread during nvgpu unload.
Also, log the error returned by nvgpu_thread_create in the function
nvgpu_worker_start.
bug 3480192
Change-Id: I8d5d9e7716a950b162cc3c2d9fcfde07c4edfcf6
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2646218
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
This patch performs the following improvements for VAB:
1) It avoids an infinite loop when collecting VAB information.
Previously, nvgpu incorrectly assumed that the valid bit would
be eventually set for the checker when polling. It may not be set
if a VAB-related fault has occurred.
2) It handles the VAB_ERROR mmu fault, which may occur for various
reasons: invalid vab buffer address, tracking in protected mode,
etc. The recovery sequence is to set the vab buffer size to 0 and
then back to the original size. This clears the VAB_ERROR bit. After
resetting, the old register values are set again in the recovery
code sequence.
3) Use the correct number of VAB buffers. There is only one VAB buffer
on ga10b, not two.
4) Simplify the logic.
Bug 3374805
Bug 3465734
Bug 3473147
Change-Id: I716f460ef37cb848ddc56a64c6f83024c4bb9811
Signed-off-by: Martin Radev <mradev@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2621290
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
1. Registers NV_PLTCG_LTC0_LTS0_DSTG_ECC_REPORT and
NV_PLTCG_LTC0_LTS0_DSTG_ECC_ADDRESS are
deprecated. Remove them.
2. Define NV_PLTCG_LTC0_LTS0_INTR3 for ga100.
3. Add fields and constants for the register
NV_PLTCG_LTC0_LTS0_L2_CACHE_ECC_ADDRESS.
4. Add new fields for the register
NV_PLTCG_LTC0_LTS0_L2_CACHE_ECC_CONTROL.
Bug 3446731
Change-Id: I3e41198b7b2e75ff69b5c6193e6fd54efae15752
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2633958
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
- Make the domain scheduler timeslice type nanoseconds to future-proof
the interface
- Return -ENOSYS from ioctls if the nvs code is not initialized
- Return the number of domains also when a user-supplied array is
present
- Use domain id instead of name for TSG binding
- Improve documentation in the uapi headers
- Verify that reserved fields are zeroed
- Extend some internal logging
- Release the sched mutex on alloc error
- Add file mode checks in the nvs ioctls (see the sketch below). The
create and remove ioctls require writable file permissions, while the
query does not; this allows filesystem-based access control on domain
management on the single dev node.
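The file mode check is roughly the following, placed in the nvs ioctl
dispatch (the ioctl names here are illustrative):

  #include <linux/fs.h>

  static int nvs_ioctl_check_mode(struct file *filp, unsigned int cmd)
  {
      /* Create/remove mutate scheduler state: require a writable fd
       * so that filesystem permissions on the dev node gate domain
       * management. Query ioctls are allowed on read-only fds. */
      if ((cmd == NVGPU_NVS_IOCTL_CREATE_DOMAIN) ||
          (cmd == NVGPU_NVS_IOCTL_REMOVE_DOMAIN)) {
          if ((filp->f_mode & FMODE_WRITE) == 0U) {
              return -EPERM;
          }
      }
      return 0;
  }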
Jira NVGPU-6788
Change-Id: I668eb5972a0ed1073e84a4ae30e3069bf0b59e16
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639017
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Move away from the prototype call in channel wdt worker and create a
separate worker thread for the domain scheduler. The details of runlist
domains are still encapsulated in the runlist code; the domain scheduler
controls when to switch domains. Switching happens based on domain
timeslices or when the current domain is deleted.
The worker thread is paused on railgate and spun back up on poweron.
The scheduler data was also left dangling, so fix that by
deinitializing all nvs-related state when gk20a_remove_support() is
called. The runlist domains
already get freed as part of fifo removal.
Jira NVGPU-6427
Change-Id: I64f42498f8789448d9becdd209b7878ef0fdb124
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632579
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The flag pmu->pg->golden_image_initialized is set to
true during initial GPU context creation and is not
cleared while the GPU goes into pm_suspend (during railgate).
Hence, when the GPU resumes after un-railgate it retains the previous
value, which can cause ELPG to kick in immediately. Due to this, when
ELPG and railgating are enabled, IDLE_SNAP is seen for a read access
of the gr_gpc0_tpc0_sm_arch_r register.
To resolve this, if the golden image is ready, set
pmu->pg->golden_image_initialized to the suspend state during railgate
to delay the early enabling of ELPG. Add a new
pmu_init_golden_img_state HAL in the NVGPU_INIT_TABLE_ENTRY. This is
called after all the GR accesses are done and the GPU has resumed
completely after un-railgate. This HAL then checks whether the
golden_image_initialized flag is in the suspend state; if so, it sets
the flag back to the ready state and re-enables ELPG.
Bug 3431798
Change-Id: I1fee83e66e09b6b78d385bbe60529d0724f79e79
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639188
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
WARN() and WARN_ON() are most useful when the log explains where they
happened. The posix implementation of these prints neither that nor the
warning message (if any). Extend the macros to include the function
name and line number, and print those plus the format string.
Actually formatting the format string is problematic w.r.t. MISRA
rules, so the arguments are not formatted.
The implementation of BUG() already prints the function name and line
number.
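A minimal sketch of the extended posix macros (the real implementation
differs in detail; note the variadic arguments are deliberately not
formatted):

  #include <stdbool.h>
  #include <stdio.h>

  static inline bool nvgpu_posix_warn(bool cond, const char *func,
                                      int line, const char *fmt)
  {
      if (cond) {
          /* Print where the warning fired plus the raw format string;
           * formatting the arguments would be MISRA-problematic. */
          (void) fprintf(stderr, "WARNING at %s:%d: %s\n", func, line, fmt);
      }
      return cond;
  }

  #define WARN(cond, fmt, ...) \
      nvgpu_posix_warn((cond), __func__, __LINE__, (fmt))
  #define WARN_ON(cond) WARN((cond), "")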
Change-Id: Ie246a915f5e8420e1c606bb1555a7f9b498725fd
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2634105
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Now that the main nvsched code exists in the nvgpu build, make it
control the runlist domains. As a new nvs domain is created, create the
relevant runlist data too. To support the default domain, create a
default nvs domain at boot.
The scheduling domain code owns responsibility for domain lifetime,
and runlist domains exist to serve that logic, although the RL domains
are directly used by channel and TSG logic. Add refcounting at the
scheduler uapi level to make sure that busy domains (that still have TSG
participants) do not get removed too early.
Adjust error-injection-sensitive unit tests to match the updated logic.
Jira NVGPU-6425
Jira NVGPU-6427
Change-Id: I1beec97c54c60ad334165b1c0acb5e827c24f2ac
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632287
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>