Commit Graph

464 Commits

Author SHA1 Message Date
Sagar Kamble
9d6269ce7f gpu: nvgpu: assert gr dev is non-NULL
nvgpu_device_get can return NULL if supplied invalid ID or instance
ID. We expect GR device struct to be non-NULL there hence just
assert that it is indeed non-NULL in gr_reset_engine and
ga10b_grmgr_init_gr_manager.

CID 224133
CID 250232
Bug 3512546

Change-Id: Id09a1c436a8e49b921111b940d3d013bd66bff7a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2707018
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-05-07 23:24:39 -07:00
Richard Zhao
1ce899ce46 gpu: nvgpu: fix compile error of new compile flags
Preparing to push hvrtos gpu server changes which requires bellow CFLAGS:
        -Werror -Wall -Wextra \
        -Wmissing-braces -Wpointer-arith -Wundef \
        -Wconversion -Wsign-conversion \
        -Wformat-security \
        -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough

Jira GVSCI-11640

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I25167f17f231ed741f19af87ca0aa72991563a0f
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2653746
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-07 15:11:49 -07:00
Richard Zhao
c30afdce02 gpu: nvgpu: add periodic timer API
move fecs_trace polling from kthread to timer API.

Jira GVSCI-10883

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: I224754b7205f1d0eefdc19a73a98f42e4d3e9d0e
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2700601
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Aparna Das <aparnad@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-05-02 23:16:44 -07:00
Sagar Kamble
e1cdfaa208 gpu: nvgpu: fix CERT EXP34-C issue
Fix CERT issue in nvgpu_gr_falcon_bind_fecs_elpg where nvgpu_pmu_pg_buf
could return NULL. nvgpu_pmu_pg_buf is called from context where PG
will be enabled hence remove the NULL return logic as it is dead
code.

Replace nvgpu_pmu_pg_buf and nvgpu_pmu_pg_buf_get_cpu_va functions by
new function nvgpu_pmu_pg_buf_alloc.

CID 17860
Bug 3512546

Change-Id: I09820a966dadeb258167ce1433ca256f94845896
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2692466
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-04-14 17:02:34 -07:00
Tejal Kudav
dae284c74b gpu: nvgpu: Disable GR functional intrs on safety
Disable below interrupts on safety as they do not report any error
condition and are not used by CUDA and Graphics(VKSC) on safety
build.
Signoff from CUDA and VKSC is on Bug https://nvbugs/3588603

1. NV_PGRAPH_INTR_NOTIFY: This intr is set when the Notification
     style is WRITE_THEN_AWAKEN.
2. NV_PGRAPH_INTR_SEMAPHORE: This is set when a 3d class sempahore is
     released as the result ofa SetSemaphoreD method, when the
     AwakenEnable field is TRUE.
3. NV_PGRAPH_INTR_BUFFER_NOTIFY: This bit is set when a Mem2mem DMA
     completes and the LaunchDma method specifies the interrupt type
     as INTERRUPT
4. NV_PGRAPH_INTR_DEBUG_METHODS: This is debug feature and not used
     on QNX safety

Bug 3588603
JIRA NVGPU-8166

Change-Id: I6d07dfd2857ac047fac4599421600d364251df76
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2694363
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-04-13 02:35:35 -07:00
Antony Clince Alex
62d6f753d2 gpu: nvgpu: add support for PES, ROP floorsweeping
Volta+ chips supports PES floorsweeping and Ampere+(iGPU) chips supports
ROP floorsweeping. At present, the driver isn't aware of PES, ROP
floorsweeping, make the driver PES, ROP floorsweeping aware by introducing the
following fields in nvgpu_gr_config:
- gpc_(rop/pes)_mask: Contains the bit mask of non FSed ROP/PES units per GPC.
- gpc_(rop/pes)_logical_id_map: Translates per GPC ROP/PES physical id to
  logical id.

Introduce the following HAL functions to read PES/ROP FS data:
- gops_fuse.fuse_status_opt_(pes/rop)_gpc: This fuction gets the FS
  config from the fuse.
- gops_top.get_max_(pes/rop)_per_gpc: Gets the maximum number of PES/ROP
  units that can be present in a GPC.

In addition, introduce the enabled flag NVGPU_SUPPORT_PES_FS to identify chips
which support PES floorsweeping, piggyback on NVGPU_SUPPORT_ROP_IN_GPC
enabled flag to identify ROP floorsweeping.

Bug 3524791

Change-Id: I065bab6c02618fe38892c8c890b069c340b85301
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2679570
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-04-13 02:32:14 -07:00
Antony Clince Alex
9e0fd1a093 gpu: nvgpu: gr: update gr suspend
Update GR suspend routine to clear GR falcon "coldboot_bootstrap_done"
flag, this is needed because GPU power rails are turned off during
suspend cycle due to which GR falcons need to be bootstrapped again
during resume.

Function "nvgpu_gr_falcon_suspend" is added to clear the above mentioned
flag.

Bug 3497398
Bug 3514055

Change-Id: If852a2c09f05c096f287b845c56d8b4f335ec8e7
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2670554
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-28 23:47:06 -07:00
Rajesh Devaraj
c5822b0d98 gpu: nvgpu: add error prints for errors reported to sdl
In Drive 6.0, only error IDs are reported to Safety_Services. The
additional debug/error information is printed using nvgpu_err().

JIRA NVGPU-8094
Bug 3491596

Change-Id: Ie90f3e1453e6a796d5c76373c11f8a5a188ac590
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2684289
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-22 17:55:10 -07:00
Deepak Nibade
a1ef716f9d gpu: nvgpu: set graphics specific PRI values for graphics contexts
Add new HAL gops.gr.init.set_default_gfx_regs() to set graphics specific
PRI values for graphics contexts in function nvgpu_gr_obj_ctx_alloc().

Add new HAL gops.gr.init.capture_gfx_regs() to capture and save init
values for the PRIs. Add new struct nvgpu_gr_obj_ctx_gfx_regs to hold the
PRI init values.

Define HAL functions gv11b_gr_init_set_default_gfx_regs() and
gv11b_gr_init_capture_gfx_regs(). Set the HAL functions for
gv11b and ga10b.

Register accessors required to set PRIs are auto-generated.

Bug 3506078

Change-Id: I4c2843a274f3c924e402541e600e104ed0c9ed1c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671598
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: Jonathan Mccaffrey <jmccaffrey@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-03-14 13:17:05 -07:00
Rajesh Devaraj
329807b8f9 gpu: nvgpu: update error ids for pgraph
This patch updates PGRAPH related error IDs for ga10b.
Since sub error type is not supported in Safety_Services 6.0, dedicated
error IDs have been allocated for all sub-errors in PGRAPH.

JIRA NVGPU-8094

Change-Id: Ic8de5815c5ea63e290d11ffca598e58812573603
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678289
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-09 04:42:36 -08:00
Shashank Singh
5ec241a1d8 gpu: nvgpu: remove non stall intr from top handler for safety
On safety nonstall interrupt is not used and should be compiled out to
rule out any chance of interference with safety code. Remove top handler
support of nonstall interrupt for safety which is currently not
applicable to linux.

Jira NVGPU-7066
Jira NVGPU-4078

Change-Id: I278efc8da6ddd0f22c128af6630cfd1b20ba4784
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2589006
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2671586
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-21 02:31:38 -08:00
Rajesh Devaraj
0699220b85 gpu: nvgpu: compile-out unused apis from safety build
This patch does the following changes:
- Compiles-out unused error reporting APIs and the related
  data structures from safety build. For this purpose, it
  introduces the new flag: CONFIG_NVGPU_INTR_DEBUG
- Updates nvgpu_report_err_to_sdl() API with one more argument,
  hw_unit_id. This aids in finding whether an error to be reported
  is corrected or uncorrected from LUT.
- Triggers SW quiesce, if an uncorrected error is reported to
  Safety_Services, in safety build.
- Renames files in cic folder by replacing gv11b with ga10b,
  since error reporting for gv11b is not supported in dev-main.

JIRA NVGPU-8002

Change-Id: Ic01e73b0208252abba1f615a2c98d770cdf41ca4
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2668466
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-14 22:00:33 -08:00
Debarshi Dutta
3d01b89e68 gpu: nvgpu: expose physical masks for GPCS/FBPs for MIG
Following changes are added
1) nvgpu_gr_config->gpc_tpc_mask_physical is now indexed by physical
gpc id instead of logical id.
2) Removed the conversion of logical fbp ids and replace them with
physical ids.
3) nvgpu_gpu_instance->fbp_en_mask now contains the mask of physical fbp ids.
4) gk20a_ctrl_ioctl_gpu_characteristics returns gpu.gpc_mask returns mask
of physical ids.

Bug 200712091

Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0e066df76e07203ff4a5be5bfff2cef8566b425d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2648831
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-11 13:28:50 -08:00
Deepak Nibade
2373a87048 gpu: nvgpu: set compute regs only for compute class
In safety build, gops.gr.init.set_default_compute_regs() is invoked in
nvgpu_gr_obj_ctx_alloc() for all classes. Before enabling graphics
classes in safety this was executed only for compute class. But since
graphics classes are supported in safety now this call should be made
only for compute classes.

Add gops.gpu_class.is_valid_compute() check before calling this
function.

Bug 3482988

Change-Id: If3722be36e779195122f54925ad122871cf13317
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2667324
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-02-10 20:36:06 -08:00
Rajesh Devaraj
7dc013d242 gpu: nvgpu: merge error reporting apis
In DRIVE 6.0, NvGPU is allowed to report only 32-bit metadata to
Safety_Services. So, there is no need to have distinct APIs for
reporting errors from units like GR, MM, FIFO to SDL unit. All
these error reporting APIs will be replaced with a single API. To
meet this objective, this patch does the following changes:
- Replaces nvgpu_report_*_err with nvgpu_report_err_to_sdl.
- Removes the reporting of error messages.
- Replaces nvgpu_log() with nvgpu_err(), for error reporting.
- Removes error reporting to Safety_Services from nvgpu_report_*_err.

However, nvgpu_report_*_err APIs and their related files are not
removed. During the creation of nvgpu-mon, they will be moved under
nvgpu-rm, in debug builds.

Note:
- There will be a follow-up patch to fix error IDs.
- As discussed in https://nvbugs/3491596 (comment #12), the high
level expectation is to report only errors.

JIRA NVGPU-7450

Change-Id: I428f2a9043086462754ac36a15edf6094985316f
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2662590
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-09 00:41:18 -08:00
Sagar Kamble
29a0a146ac gpu: nvgpu: fix coverity defects
Fix following coverity defects:
  ioctl_prof.c resource leak
  ioctl_dbg.c logically dead code
  global_ctx.c identical code for branches
  therm_dev.c resource leak
  pmu_pstate.c unused value
  nvgpu_mem.c dead default in switch
  tsg.c Dereference before null check
  nvlink_gv100.c logically dead code
  nvlink.c Out-of-bounds write
  fifo_vgpu.c Dereference null return value
  pmu_pg.c Dereference before null check
  fw_ver_ops.c Identical code for different branches
  boardobjgrp.c Dereference after null check
  boardobjgrp.c Dereference before null check
  boardobjgrp.c Dereference after null check
  engines.c Dereference before null check
  nvgpu_init.c Unused value

CID 10127875
CID 10127820
CID 10063535
CID 10059311
CID 10127863
CID 9875900
CID 9865875
CID 9858045
CID 9852644
CID 9852635
CID 9852232
CID 9847593
CID 9847051
CID 9846056
CID 9846055
CID 9846054
CID 9842821

Bug 3460991

Change-Id: I91c215a545d07eb0e5b236849d5a8440ed6fe18d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2657444
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-28 04:50:12 -08:00
Richard Zhao
9ab1271269 gpu: nvgpu: common: fix compile error of new compile flags
It's preparing to add bellow CFLAGS:
    -Werror -Wall -Wextra \
    -Wmissing-braces -Wpointer-arith -Wundef \
    -Wconversion -Wsign-conversion \
    -Wformat-security \
    -Wmissing-declarations -Wredundant-decls -Wimplicit-fallthrough

Jira GVSCI-11640

Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: Ia8f508c65071aa4775d71b8ee5dbf88a33b5cbd5
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555056
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-01-13 12:36:14 -08:00
Seshendra Gadagottu
03b1a81ab1 gpu: nvgpu: gr: ignore second zcull request to ctx
All channels in TSG will share same zcull context. Any
attempt to add a second zcull buffer will be ignored.

Bug 3364302

Change-Id: I04e18dfe8e5fac4ca131c3b625755aa90a23180d
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2616677
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2022-01-13 10:30:23 -08:00
Shashank Singh
a372ec9a38 gpu: nvgpu: disable golden context image verification
- Disable golden context image verification until ctxsw fw for orin
safety is ready for this feature.
- Make NULL check for hal set_default_compute_regs else it causes crash
for orin safety.

Bug 3456240

Change-Id: I1f6ca9d78f22cc6776bb0b3a9e05f22171095c7f
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2645666
(cherry picked from commit 3907d1b315e1247243632fefdcbce69d58090681)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2644533
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-01-06 11:40:46 -08:00
Divya
9446cfa320 gpu: nvgpu: update golden image flag for RG seq
The flag pmu->pg->golden_image_initialized is set to
true during initial GPU context creation and is not
cleared while the GPU goes into pm_suspend (during railgate).
Hence, when the GPU resumes after un-railgate it retains
the previous value which can cause ELPG to kick in immediately.
Due to this, when ELPG and Railgating are enabled, IDLE_SNAP
is seen for read access of gr_gpc0_tpc0_sm_arch_r reg.

To resolve this, if golden image is ready set the
pmu->pg->golden_image_initialized to suspend state during railgate,
to delay the early enable of ELPG. Add a new
pmu_init_golden_img_state hal in the NVGPU_INIT_TABLE_ENTRY.
This will be called after all the GR access is done and GPU resumes
completely after un-railgate. This hal will then check if
golden_image_initialized flag is in suspend state, it will set it
to ready state and then re-enable ELPG.

Bug 3431798

Change-Id: I1fee83e66e09b6b78d385bbe60529d0724f79e79
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2639188
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-12-11 14:06:49 -08:00
Dinesh T
ad09e3e3cc gpu: nvgpu: Enable sm_l1tag_surface_cut_collector
This is enabling sm_l1tag_surface_cut_collector at gpu boot.
This is done with adding new hal "set_sm_l1tag_surface_collector"
that sets l1tag_surface_cut_collector in the sm_l1tag_ctrl
register.

Bug 2557724

Change-Id: I869e3bfa563db204259e7a464657229632f182d9
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2634878
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-06 04:36:56 -08:00
Deepak Nibade
9f55801a15 gpu: nvgpu: move local golden context memory allocation to poweorn
- Separate out local golden context memory allocation from
  nvgpu_gr_global_ctx_init_local_golden_image() into a new function
  nvgpu_gr_global_ctx_alloc_local_golden_image().
- Add a new member local_golden_image_copy to struct
  nvgpu_gr_obj_ctx_golden_image to store copy used for context
  verification.
- Allocate local golden context memory from nvgpu_gr_obj_ctx_init()
  which is called during poweron path.
- Remove memory allocation from nvgpu_gr_obj_ctx_save_golden_ctx().
- Disable test test_gr_obj_ctx_error_injection since it needs rework
  to accomodate the new changes.
- Fix below tests to allocate local golden context memory :
  test_gr_global_ctx_local_ctx_error_injection
  test_gr_setup_alloc_obj_ctx

Bug 3307637

Change-Id: I2f760d524881fd328346838ea9ce0234358f8e51
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2633713
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-01 08:44:30 -08:00
dt
e1d6b8af8d gpu: nvgpu: ga10x: compute gnic_stride
GNIC register stride calculation is fixed by adding new hal to compute
the stride by getting the difference of gpc1 and gpc0 xbar_gnic strides
for ga10x GPUs.

Bug 200782045

Change-Id: Iaa84109bd9f1a974ef1af6fee136ca1fcc89bbb1
Signed-off-by: Dinesh T <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2624848
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-01 08:40:36 -08:00
Deepak Nibade
3d9c67a0e7 gpu: nvgpu: enable Orin support in safety build
Most of the Orin chip specific code is compiled out of safety build
with CONFIG_NVGPU_NON_FUSA and CONFIG_NVGPU_HAL_NON_FUSA. Remove the
config protection from Orin/GA10B specific code. Currently all code
is enabled. Code not required in safety will be compiled out later
in separate activity.

Other noteworthy changes in this patch related to safety build:

- In ga10b_ce_request_idle(), add a log print to dump num_pce so that
  compiler does not complain about unused variable num_pce.
- In ga10b_fifo_ctxsw_timeout_isr(), protect variables active_eng_id and
  recover under CONFIG_NVGPU_KERNEL_MODE_SUBMIT to fix compilation
  errors of unused variables.
- Compile out HAL gops.pbdma.force_ce_split() from safety since this HAL
  is GA100 specific and not required for GA10B.
- Compile out gr_ga100_process_context_buffer_priv_segment() with
  CONFIG_NVGPU_DEBUGGER.
- Compile out VAB support with CONFIG_NVGPU_HAL_NON_FUSA.
- In ga10b_gr_intr_handle_sw_method(), protect left_shift_by_2 variable
  with appropriate configs to fix unused variable compilation error.
- In ga10b_intr_isr_stall_host2soc_3(), compile ELPG function calls
  with CONFIG_NVGPU_POWER_PG.
- In ga10b_pmu_handle_swgen1_irq(), move whole function body under
  CONFIG_NVGPU_FALCON_DEBUG to fix unused variable compilation errors.
- Add below TU104 specific files in safety build since some of the code
  in those files is required for GA10B. Unnecessary code will be
  compiled out later on.
	hal/gr/init/gr_init_tu104.c
	hal/class/class_tu104.c
	hal/mc/mc_tu104.c
	hal/fifo/usermode_tu104.c
	hal/gr/falcon/gr_falcon_tu104.c
- Compile out GA10B specific debugger/profiler related files from
  safety build.
- Disable CONFIG_NVGPU_FALCON_DEBUG from safety debug build temporarily
  to work around compilation errors seen with keeping this config
  enabled. Config will be re-enabled in safety debug build later.

Jira NVGPU-7276

Change-Id: I35f2489830ac083d52504ca411c3f1d96e72fc48
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2627048
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-11-26 08:46:47 -08:00
Divya
d538737ba1 gpu: nvgpu: Add ELPG_MS protected call for L2 flush
- if L2 flush is done when ELPG_MS feature is engaged
  then it can cause some of the signals to go non-idle.
  This can cause idle snap in ELPG_MS.
- To avoid the idle snap, add elpg_ms protected call before
  L2 flush operation

Bug 200763448

Change-Id: I651875bc051c3b7d26d2bb0b593083512a5765b2
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2599459
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-10-22 06:20:13 -07:00
Vedashree Vidwans
b24f577a5c gpu: nvgpu: reduce traffic on dbg_fn or dbg_info
Reduce debug logs printed when gpu_dbg_info or gpu_dbg_fn is set.
- Add gpu_dbg_verbose flag for more verbose debug prints. Update prints
in to ga10b_gr_init_wait_idle(), gm20b_gr_init_wait_fe_idle(),
gv11b_gr_init_write_bundle_veid_state() and
gv11b_gr_init_load_sw_veid_bundle().
- Add gpu_dbg_hwpm flag for hwpm specific debug prints. Update print in
nvgpu_gr_hwpm_map_create().
- Add gpu_dbg_mm for MM specific debug prints. Update prints in
gm20b_fb_tlb_invalidate(), gk20a_mm_fb_flush(),
gk20a_mm_l2_invalidate_locked(), gk20a_mm_l2_flush() and
gv11b_mm_l2_flush().
- Remove gpu_dbg_fn mask print in gr_ga10b_create_priv_addr_table(),
gr_gk20a_get_pm_ctx_buffer_offsets(), gr_gv11b_decode_priv_addr() and
gr_gv11b_create_priv_addr_table().

Jira NVGPU-7183

Change-Id: I9842d567047cb95a42e23b5907ae324214eed606
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2602797
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-10-09 15:05:21 -07:00
Seshendra Gadagottu
4333bc7faf gpu: nvgpu: ga10b: patch ctx with rops_crop_debug1_crd_cond_read_disable
For ga10b emulate_mode, patch context with rops_crop_debug1_crd_cond_read_disable
for required perf setting.

Bug 200768322
JIRA NVGPU-6433

Change-Id: Ib1f977ed28e3b18184bce7ac695a0b6a2bae979d
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2602268
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-10-06 18:15:40 -07:00
Deepak Nibade
d1f3f81553 gpu: nvgpu: remove SW methods from safety build
Improved SDL heartbeat mechanism detects the interrupts triggered by
SW method and treats them as errors. Hence remove the SW method support
completely from safety build. Registers set by SW methods are now set
by default for all the contexts.

Implement new HAL gops.gr.init.set_default_compute_regs() to set the
registers in patch context. Call this HAL while creating each context.

Update gv11b_gr_intr_handle_sw_method() to treat all compute SW methods
as invalid.

Update unit test test_gr_intr_sw_exceptions() so that it now expects
failure for any method/data.

Bug 200748548

Change-Id: I614f6411bbe7000c22f1891bbaf06982e8bd7f0b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2527249
(cherry picked from commit bb6e0f9aa1404f79bcfbdd308b8c174a4fc83250)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2602638
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-10-04 18:03:55 -07:00
Konsta Hölttä
1b1d183b9c gpu: nvgpu: simplify gmmu map calls
Introduce nvgpu_gmmu_map_partial() to map a specific size of a buffer
represented by nvgpu_mem, or what nvgpu_gmmu_map() used to do. Delete
the size parameter from nvgpu_gmmu_map() such that it now maps the
entire buffer. The separate size parameter is a historical artifact from
when nvgpu_mem did not exist yet; the typical use is to map the entire
buffer.

Mapping at a certain address with nvgpu_gmmu_map_fixed() still takes the
size parameter.

The returned address still has to be stored somewhere, typically to
mem.gpu_va by the caller so that the matching unmap variant finds the
right address.

Change-Id: I7d67a0b15d741c6bcee1aecff1678e3216cc28d2
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2601788
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-10-01 21:38:43 -07:00
Konsta Hölttä
44422db851 gpu: nvgpu: simplify gmmu unmap calls
Introduce nvgpu_gmmu_unmap_addr() to unmap a nvgpu_mem that was mapped
at some other address than mem.gpu_va, which can be the case for buffers
that are shared across different address spaces. Delete the address
parameter from nvgpu_gmmu_unmap(), as the common case is to store the
address to mem.gpu_va when mapping the buffer.

Modify some instances of consecutive unmap + free calls to call just
nvgpu_dma_unmap_free().

Change-Id: Iecd7c9aa41d04e9f48e055f6bc0c9227cd759c69
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2601787
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-30 16:29:41 -07:00
Deepak Nibade
af989f6212 gpu: nvgpu: fix misra rule 13.2 violations in common.gr unit
Fix MISRA rule 13.2 violations of below type from common.gr unit:

nvgpu/drivers/gpu/nvgpu/common/gr/gr_intr.c:108
  Type: MISRA C-2012 Side Effects (MISRA C-2012 Rule 13.2, Required)

nvgpu/drivers/gpu/nvgpu/common/gr/gr_intr.c:108:
  1. misra_c_2012_rule_13_2_violation:
  In "nvgpu_safe_add_u32(nvgpu_gr_gpc_offset(g, gpc), nvgpu_gr_tpc_offset(g, tpc))",
  there are 2 function calls in the arguments for which the order of
  evaluation is undefined.

Jira NVGPU-7127

Change-Id: Ie867fb62098eed3a45ec01b941eda93b94220b4b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2598696
(cherry picked from commit 15483df6ca1017e5b9d6f2dff35f7e57094a2b4d)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2601976
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-29 15:14:34 -07:00
Sagar Kamble
72c3bce602 gpu: nvgpu: compile out non-safe ctxsw_prog hals
Following two hals are non-safe. Compile them under
CONFIG_NVGPU_HAL_NON_FUSA:
1. init_ctxsw_hdr_data
2. disable_verif_features

JIRA NVGPU-5358

Change-Id: I751c4655dc628f7ab66ed3a779268a6a88f9a1e3
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2581361
(cherry picked from commit abf16c6a01109d174879609c10354f06739fb6dc)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2581842
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-21 03:17:12 -07:00
Sagar Kamble
62b04331de gpu: nvgpu: compile out priv_access_map config/addr hals
These hals are non-safe. Compile them out with
CONFIG_NVGPU_SET_FALCON_ACCESS_MAP.

JIRA NVGPU-5358

Change-Id: I75b46e201fa132e09fee15679a402d24bbf9b2ab
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2581360
(cherry picked from commit d048333ef391019b2618abf7d09c8fe2042f8ee0)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2581841
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-21 03:17:00 -07:00
Debarshi Dutta
a53ebf02d1 gpu: nvgpu: update error message to info.
These errors are now actually expected from code that counts number of
sys/gpc/fbp perfmons after first context creation. Nvgpu tries to count
them by register offset lookup in context image and counts perfmons until
invalid offset is found.

nvgpu_gr_hwmp_map_find_priv_offset no longer prints an error message.
The correct error condition is moved to gr_exec_reg_ops

Bug 200755537

Change-Id: Ib5c6ccd39275b2b06e3f8bce4878a3234478a780
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586228
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-09 09:13:03 -07:00
dt
152d7c9edd gpu: nvgpu: Fix for pes_tpc_mask programming
After CONFIG_UBSAN kernel compilation flag to know any shifting
cause overflow or not enablement ,this is identified.
The register "gr_fe_tpc_fs_r(gpc_index)" is read only after
Volta. The gops where we are computing the index is not needed.

Bug 200727116

Change-Id: Ib2306103389ba9df77fd59d012ec70e775104989
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2573296
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-09-07 15:59:48 -07:00
Debarshi Dutta
33740b41b6 gpu: nvgpu: free memory during module removal
Following pointers(allocated via Kmalloc/DMA) aren't freed during
module removal.

struct nvgpu_gr_config -> gpc_tpc_mask_physical
struct nvgpu_netlist_vars -> ctxsw_regs.etpc.l
struct mm_gk20a -> sysmem_flush
struct nvgpu_pmu_pg -> pg_buf
SGTable corresponding to VPR secure buffer.

Added appropriate free calls.

Bug 3364181

Change-Id: I2105c1f3256b1910f0f514d98f0ee3ae2e34aff7
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2586244
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-02 15:43:07 -07:00
Debarshi Dutta
2e3c3aada6 gpu: nvgpu: fix deinit of GR
Existing implementation of GR de-init doesn't account for multiple
instances of struct nvgpu_gr. As a fix, below changes are added.

1) nvgpu_gr_free is unified for VGPU as well as native.
2) All the GR instances are freed.
3) Appropriate NULL checks are added when freeing GR memories.
4) 2D, 3D, I2M and ZBC etc are explicitely disabled when MIG is set.
5) In ioctl_ctrl, checks are added to not return error when zbc is NULL
   for VGPU as requests are rerouted to RMserver.

Jira NVGPU-6920

Change-Id: Icaa40f88f523c2cdbfe3a4fd6a55681ea7a83d12
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2578500
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: Antony Clince Alex <aalex@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-23 05:27:45 -07:00
Sagar Kamble
40064ef1ec gpu: nvgpu: fix ecc counter free
ECC counter structures are freed without removing the node from the
stats_list. This can lead to invalid access due to dangling pointers.

Update the ecc counter free logic to set them to NULL upon free, to
remove them from stats_list and free them by validation.

Also updated some of the ecc init paths where error was not propa-
gated to callers and full ecc counters deallocation was not done.

Now, calling unit ecc_free from any context (with counters alloc-
ated or not) is harmless as requisite checks are in place.

bug 3326612
bug 3345977

Change-Id: I05eb6ed226cff9197ad37776912da9dcb7e0716d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2565264
Tested-by: Ashish Mhetre <amhetre@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-11 01:55:08 -07:00
Sagar Kamble
0f59efb2cd gpu: nvgpu: return tpc exceptions error properly
It is observed that recovery on receiving the ESR MMU NACK exception
does not get triggered as the error returned by tpc level handler
is masked.

NACK is marked handled but recovery is not done and subsequent fb
intr handler does not trigger recovery since NACK is handled.

This leaves the HW engines in bad state.

Fix the tpc error return logic to trigger recovery during ESR MMU
NACK exception.

Bug 3318939

Change-Id: I475826f734e4366e853607e1e0338290ee28249b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2564764
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-07-26 05:13:41 -07:00
Antony Clince Alex
f80dccb543 gpu: nvgpu: report gpc_tpc_mask in physical order
At present, there is an inconsistency in the order in which
gpc_tpc masks are reported to the userspace. Both gpc and
tpc masks are reported using physical-ids. However, the
gpc_tpc_masks array is ordered by logical gpc-ids and
not physical-ids. This creates a mismatch between the gpc
reported as enabled in the gpc_mask and its corresponding
gpc_tpc_mask.

Introduce field "gpc_tpc_mask_physical" which stores the
gpc_tpc_masks in physical order and update
NVGPU_GPU_IOCTL_GET_TPC_MASKS to return this field.

Bug 200665942

Change-Id: I63aa83414a59676b7e7d36b6deb527e2f3c04cff
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2531114
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-19 16:04:01 -07:00
Deepak Nibade
2237221a57 gpu: nvgpu: fix CERT EXP34-C errors in common.gr
nvgpu_gr_config_get_sm_info() returns NULL if invalid SM id is provided
to the API. Since it is possible return NULL, a NULL check is required
at all callers.

Also, nvgpu_gr_config_get_sm_info() is always called in a loop from 0
to (sm_count - 1) and hence adding an nvgpu_assert() should be
sufficient.

Change-Id: I0fd92ac354447796c4c7d7237e7bd3b6e5c2682c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552409
(cherry picked from commit 4f3789d6563bbfe1be3e25c522ca1eac0d5d2d13)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2558271
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: V M S Seeta Rama Raju Mudundi <srajum@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-13 13:52:24 -07:00
Deepak Nibade
4edf952e3e gpu: nvgpu: fix rule 5.1 misra violations in common.gr
Fix rule 5.1 misra violations in common.gr by renaming below functions :

nvgpu_gr_config_get_gpc_tpc_mask_base ->
  nvgpu_gr_config_get_base_mask_gpc_tpc

nvgpu_gr_config_get_gpc_tpc_count_base ->
  nvgpu_gr_config_get_base_count_gpc_tpc

gm20b_ctxsw_prog_set_priv_access_map_config_mode ->
  gm20b_ctxsw_prog_set_config_mode_priv_access_map

gm20b_ctxsw_prog_set_priv_access_map_addr ->
  gm20b_ctxsw_prog_set_addr_priv_access_map

gm20b_gr_falcon_read_fecs_ctxsw_mailbox ->
  gm20b_gr_falcon_read_mailbox_fecs_ctxsw

gm20b_gr_falcon_read_fecs_ctxsw_status0 ->
  gm20b_gr_falcon_read_status0_fecs_ctxsw

gm20b_gr_falcon_read_fecs_ctxsw_status1 ->
  gm20b_gr_falcon_read_status1_fecs_ctxsw

gv11b_gr_intr_get_sm_hww_warp_esr_pc ->
  gv11b_gr_intr_get_warp_esr_pc_sm_hww

gv11b_gr_intr_get_sm_hww_warp_esr ->
  gv11b_gr_intr_get_warp_esr_sm_hww

Jira NVGPU-6779

Change-Id: Icbe23a7b022373785968fc417ee247e2d80cfcc6
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554521
(cherry picked from commit 1432650774506f2a7e45f70b084f498736d0d0c5)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2555330
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-13 09:20:41 -07:00
Lakshmanan M
e9872a0d91 gpu: nvgpu: Skip graphics unit access when MIG is enabled
This CL covers the following modifications,
1) Added logic to skip the graphics unit specific sw context load
   register write during context creation when MIG is enabled.
2) Added logic to skip the graphics unit specific sw method
   register write when MIG is enabled.
3) Added logic to skip the graphics unit specific slcg and blcg gr
   register write when MIG is enabled.
4) Fixed some priv errors observed during MIG boot.
5) Added MIG Physical support for GPU count < 1.
6) Host clk register access is not allowed for GA100.
   So skipped to access host clk register.
7) Added utiliy api - nvgpu_gr_exec_with_ret_for_all_instances()
8) Added gr_pri_mme_shadow_ram_index_nvclass_v() reg field
   to identify the sw method class number.

Bug 200649233

Change-Id: Ie434226f007ee5df75a506fedeeb10c3d6e227a3
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2549811
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-02 16:41:51 -07:00
tkudav
0526e7eaa9 gpu: nvgpu: Create CIC-mon and CIC-rm subunits
common.cic unit is divided into common.cic.mon and common.cic.rm
based on rm and mon process split.

CIC-mon subunit includes the code which is utilized in critical
interrupt handling path like initialization, error detection and
error reporting path. CIC-rm subunit includes the code corresponding
to rest of interrupt handling(like collecting error debug data from
registers) and ISR status management (status of deferred interrupts).

Split the CIC APIs and data-members into above two subunits.

JIRA NVGPU-6899

Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-02 09:57:56 -07:00
Deepak Nibade
8ccf9820ba gpu: nvgpu: check for valid sm_id in nvgpu_gr_config_get_sm_info
Check if requested sm_id is valid in nvgpu_gr_config_get_sm_info()
function. Also update doxygen documentation for same.

Also, ensure SM count is set using nvgpu_gr_config_set_sm_info() before
usig nvgpu_gr_config_get_sm_info() to retrieve it.

Update unit test test_gr_config_set_get to set valid SM count instead of
random number. With random number it is possible that SM count is set
higher than size of SM info struct. This could result into test process
crash.

Change-Id: I4292977b7e880752c65001cbd594e0617fe135f5
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2549882
(cherry picked from commit ee9767cac1a27ffbc99f707c1aa158b8216d757f)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551983
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-01 06:51:05 -07:00
Richard Zhao
77f0ab6583 gpu: nvgpu: remove gpu_va update_hwpm_ctxsw_mode
Since gpu server can noew allocate va itself, update_hwpm_ctxsw_mode
does not need to fixed map pm ctx anymore.

Jira GVSCI-10977

Change-Id: If592c8a2eb6dbfd7d922c79c87871162e9d8d8a4
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2546192
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-28 18:10:18 -07:00
Richard Zhao
4e08649b7f gpu: nvgpu: move mem checking of gr_ctx to .alloc_obj_ctx
Preparing for adding vgpu cmd .add_obj_ctx and memory will be allocated
on server side. Outside of implementation of .alloc_obj_ctx, code should
not check whether gr_ctx is valid by check gr_ctx mem.

Jirs GVSCI-10977

Change-Id: I6b3d826e930fdfaaae517d204186642e49f5c2d7
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2546190
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-28 18:10:01 -07:00
Antony Clince Alex
d2919409e9 gpu: nvgpu: rename/collpase nvgpu_next functions and structs
Replace all nvgpu_next functions/structs either by 1) collapsing them
into nvgpu legacy functions/structs 2) renaming them as follows:
- nvgpu_next_*() => nvgpu_(ga10b/ga100)_*()
- nvgpu_next_*() => (ga10b/ga100)_*()
- nvgpu_next_*() => nvgpu_*() [only if this doesn't cause collision]
- nvgpu_next_*() = > nvgpu_*_extra()

Create hal.sim unit and move Ampere+ SIM code into it.

Jira NVGPU-4771

Change-Id: I215594a0d0df4bd663bd875a0d0db47bcb9ff6a2
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2548056
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-27 05:02:58 -07:00
Antony Clince Alex
f9cac0c64d gpu: nvgpu: remove nvgpu_next files
Remove all nvgpu_next files and move the code into corresponding
nvgpu files.

Merge nvgpu-next-*.yaml into nvgpu-.yaml files.

Jira NVGPU-4771

Change-Id: I595311be3c7bbb4f6314811e68712ff01763801e
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2547557
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-27 05:02:53 -07:00
Antony Clince Alex
c7d43f5292 gpu: nvgpu: remove usage of CONFIG_NVGPU_NEXT
The CONFIG_NVGPU_NEXT config is no longer required now that ga10b and
ga100 sources have been collapsed. However, the ga100, ga10b sources
are not safety certified, so mark them as NON_FUSA by replacing
CONFIG_NVGPU_NEXT with CONFIG_NVGPU_NON_FUSA.

Move CONFIG_NVGPU_MIG to Makefile.linux.config and enable MIG support
by default on standard build.

Jira NVGPU-4771

Change-Id: Idc5861fe71d9d510766cf242c6858e2faf97d7d0
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2547092
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-06-27 05:02:47 -07:00