Commit Graph

20 Commits

Author SHA1 Message Date
Sagar Kamble
40064ef1ec gpu: nvgpu: fix ecc counter free
ECC counter structures are freed without removing the node from the
stats_list. This can lead to invalid access due to dangling pointers.

Update the ecc counter free logic to set them to NULL upon free, to
remove them from stats_list and free them by validation.

Also updated some of the ecc init paths where error was not propa-
gated to callers and full ecc counters deallocation was not done.

Now, calling unit ecc_free from any context (with counters alloc-
ated or not) is harmless as requisite checks are in place.

bug 3326612
bug 3345977

Change-Id: I05eb6ed226cff9197ad37776912da9dcb7e0716d
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2565264
Tested-by: Ashish Mhetre <amhetre@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-08-11 01:55:08 -07:00
Sagar Kadamati
3e43f92f21 gpu: nvgpu: add ga10b & ga100 sources
Mass copy ga10b & ga100 sources from nvgpu-next repo.
TOP COMMIT-ID: 98f530e6924c844a1bf46816933a7fe015f3cce1

Jira NVGPU-4771

Change-Id: Ibf7102e9208133f8ef3bd3a98381138d5396d831
Signed-off-by: Sagar Kadamati <skadamati@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2524817
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-06-17 12:56:16 -07:00
Deepak Nibade
7a937a6190 gpu: nvgpu: add debug logs for common.gr debugging
Add separate flag gpu_dbg_gr to enable common.gr specific debugging.
Add this flag to all the existing debug logs that use gpu_dbg_fn or
gpu_dbg_info for debugging. Also add many other debugging logs that
might be helpful in debugging.

Removing debug log in gv11b_gr_init_get_nonpes_aware_tpc() as it dumps
too much data that does not seem useful.

Batch all interrupt enable functions in gr_init_setup_hw() together for
readability.

Jira NVGPU-5648

Change-Id: I0b857650122cdb1f974b452d28c26e7f142baf61
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2411740
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Sagar Kamble
a73ca0b70e gpu: nvgpu: split GR ECC initialization
Split GR ECC initialization into GPC/TPC and FECS ECC init as FECS ECC
errors during acr_construct_execute need to be reported and handled
hence FECS ECC counters are required to be initialized before
acr_construct_execute.

GPC/TPC ECC counters are dependent on the GR config that will be
initialized only after acr_construct_execute.

nvgpu_gr_intr_init_support is moved to nvgpu_gr_prepare_sw.

FECS ECC interrupt is enabled by default hence interrupt is not
enabled through gr_fecs_host_int_enable_r in nvgpu_gr_prepare_sw.

JIRA NVGPU-4439

Change-Id: Ifc9912f0578015a6ba1e9d38765c42633632b15f
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2261987
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Vinod Gopalakrishnakurup <vinodg@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
Sagar Kamble
daf5475f50 gpu: nvgpu: split ecc support per GPU HW unit
To enable ecc interrupts early during nvgpu_finalize_poweron, ecc
support has to be enabled early. ecc support was being initialized
together for GR, LTC, PMU, FB units late in the poweron sequence.

Move the ecc init for each unit to respective unit's init functions.
And separate out the hal ecc functions from GR ecc unit to
respective hal units.

JIRA NVGPU-4336

Change-Id: I2c42fb6ba3192dece00be61411c64a56ce16740a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2239153
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:10:29 -06:00
vinodg
f73da1dfe5 gpu: nvgpu: code correction in gr ecc unit
FECS_FEATURE_OVERRIDE_ECC bits for SM_L0_CACHE and SM_L1_CHACHE
need to be checked against NV_PGRAPH_PRI_FECS_FEATURE_OVERRIDE_ECC_1
register.
Correct the error of checking those bits against
NV_PGRAPH_PRI_FECS_FEATURE_OVERRIDE_ECC register.

Jira NVGPU-4095

Change-Id: I09737b83496f9e728e0b022bd6a4e75741bd0c49
Signed-off-by: vinodg <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2210429
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Vinod G
c7a2633e82 gpu: nvgpu: Correct misra error in gr unit
Correct 5.3 misra error in gr ecc unit
misra_c_2012_rule_5_3_violation: Declaration with identifier
"err" hides another declaration.
Change the variable name inside the function.

Jira NVGPU-4085

Change-Id: Ib8cd62ce98da1cf1bf327eb8dd92e51c1b77d6f4
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2209867
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Vinod G
2608c4d80b gpu: nvgpu: ccm fix for gr unit
Fix CCM issue in gv11b_ecc_init_tpc function
Reduced the complexity below 10 by adding sub functions
in ecc_init for corrected error count and
uncorrected error count.

Jira NVGPU-4084

Change-Id: I27593a68ee80790e9c66168cc1225b3e3a0c27cc
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2203958
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Sagar Kamble
ec293030c1 gpu: nvgpu: move non-safe functions from fusa hal to non-fusa hal
Multiple non-safe functions under NVGPU_DEBUGGER, NVGPU_CILP and other
config flags were moved to fusa files. Although they are guarded by
the C flags, it makes sense to keep those functions in non-fusa
files. Make this change for all hals.

JIRA NVGPU-3853

Change-Id: I8151b55a60cb50c5058af48bab9e8068f929ac3b
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2204352
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Rajesh Devaraj
935c5f6578 gpu: nvgpu: fix misra violations in SDL
This patch addresses misra violations due to SDL error reporting
callbacks. In particular, it addresses the following misra violation:

- misra_c_2012_directive_4_7_violation: Calling function
  "nvgpu_report_*_err()" which returns error information without testing
  the error information.

JIRA NVGPU-4025

Change-Id: Ia10b6b3fd9c127a8c5189c3b6ba316f243cedf04
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2196895
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Vinod G
0deeb6b2f8 gpu: nvgpu: Fix misra 4.7 errors in gr ecc unit
Fix misra 4.7 violations in gr ecc unit
misra_c_2012_directive_4_7_violation: return error information hasn't been tested.

jira NVGPU-4054

Change-Id: I6e10a637f45886667de733827444526216061cc7
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2197398
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:05:52 -06:00
Vinod G
a20739c1f6 gpu: nvgpu: misra error in gr unit
Fix MISRA violation for rule 8.6 in ecc and ctx gr units.
misra_c_2012_rule_8_6_violation:function is declared but never defined

Jira NVGPU-3854

Change-Id: Ia3e3f6ab6d2c33d31a3518fe3fbd033d403cbb7e
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2168765
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-08-07 10:47:58 -07:00
Rajesh Devaraj
fa6ada7619 gpu: nvgpu: disable hw error injection support in safety-release
This patch disables HW based fake error injection support in safety-release
build. For this purpose, it makes use of the following flag:
CONFIG_NVGPU_INJECT_HWERR.

JIRA NVGPU-3861

Change-Id: I1fa8544e67adbc53a1f3b98b340d76cf4f5bf524
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2163289
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-30 04:09:31 -07:00
Vinod G
fe3be6431e gpu: nvgpu: reduce code complexity in gr.ecc unit
Reduce code complexity of following functions in gr.ecc unit
gp10b_ecc_init(complexity : 17)
gp10b_ecc_detect_enabled_units(complexity : 15)
gv11b_ecc_init(complexity : 23)
gv11b_ecc_detect_enabled_units(complexity : 30)

Create sub functions by moving the control statement codes from the
function which has high complexity above 10.

Create four sub functions from gp10b_ecc_init function for
sub units init.
gp10b_ecc_init_lts(complexity : 2)
gp10b_ecc_init_tpc(complexity : 2)
gp10b_ecc_init_tpc_tex(complexity : 8)
gp10b_ecc_init_tpc_sm(complexity : 5)
and reduce gp10b_ecc_init complexity to 3

Create four sub functions from gp10b_ecc_detect_enabled_units function
gp10b_ecc_enable_ltc(with complexity : 4)
gp10b_ecc_enable_tex(with complexity : 4)
gp10b_ecc_enable_smshm(with complexity : 4)
gp10b_ecc_enable_smlrf(with complexity : 4)
and reduce gp10b_ecc_detect_enabled_units complexity to 3

Create four sub functions from gv11b_ecc_init function for
sub units init.
gv11b_ecc_init_tpc(complexity : 10)
gv11b_ecc_init_gpc(complexity : 6)
gv11b_ecc_init_fb(complexity : 6)
gv11b_ecc_init_other_units(complexity : 6)
and reduce gv11b_ecc_init complexity to 5

Create six sub functions from gv11b_ecc_detect_enabled_units function
gv11b_ecc_enable_smlrf(with complexity : 4)
gv11b_ecc_enable_sml1data(with complexity : 4)
gv11b_ecc_enable_sml1tag(with complexity : 4)
gv11b_ecc_enable_smicache(with complexity : 6)
gv11b_ecc_enable_ltc(with complexity : 4)
gv11b_ecc_enable_smcbu(with complexity : 4)
and reduce gv11b_ecc_detect_enabled_units complexity to 3

Jira NVGPU-3662

Change-Id: Id10be4f9a500c300f66756ebae41bfff3b734aea
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2159050
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-23 12:15:49 -07:00
Sagar Kamble
249ffa0fb0 gpu: nvgpu: split ecc_gv11b fusa/non-fusa hal
functions in ecc_gv11b.c are needed in ecc_gv11b_fusa.c, hence moved
them there. Updated the arch yaml to reflect the fusa and non-fusa
units for ecc.

JIRA NVGPU-3690

Change-Id: Id7b65901840a1f9494215f722cdcb943e243aaa4
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2156876
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-19 18:19:35 -07:00
Sagar Kamble
9bb347edec gpu: nvgpu: fix the hw header accessors
Various gv11b register accessors are passed as function pointer to
NVGPU_ECC_ERR. pmu logic needs access to head, tail, mutex registers
as function pointers. fix the same.

JIRA NVGPU-3733

Change-Id: I5668fedaac187fab052ee5d68a10f7e2d6d35413
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2150880
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-12 06:20:44 -07:00
Debarshi Dutta
69ef86e627 gpu: nvgpu: move safe code HAL files to fusa
This patch moves all the safe static and non-static functions as well
as its dependencies such as static declared structs into files with
_fusa.c extension. If the original file is left with no functions
remaining then the file is deleted.

Added changes in Makefile, Makefile.sources, nvgpu-hal-new.yaml for
compilation.

Jira NVGPU-3690

Change-Id: I81af67c308705faf8a681df63a6778e7de2076cf
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2146761
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-07-03 02:46:15 -07:00
Vinod G
c3c541b1af gpu: nvgpu: Fix CERT INT30-C errors in hal.gr.ecc unit
Fix CERT INT30-C erros in hal.gr.ecc units.
Unsigned integer operation may wrap. Use safe_ops macro to fix
the wrap errors.

Jira NVGPU-3585

Change-Id: I3811bfe0c542e7960ab8dbc2877465f7a72d1761
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2133803
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Nitin Kumbhar <nkumbhar@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-06-11 05:05:57 -07:00
Rajesh Devaraj
05ed37ae3a gpu: nvgpu: remove usage of hw headers from SDL
This patch does the following:
(1) Removes the usage of hw headers in SDL unit. For this purpose, it moves
    the initialization required for errors that can be injected using hw
    support, error injection function. Further, it passes the required
    information to SDL via hal layers.
(2) Renames (i) PWR as PMU, (ii) nvgpu_report_ecc_parity_err to
    nvgpu_report_ecc_err.

Jira NVGPU-3235

Change-Id: I69290af78c09fbb5b792058e7bc6cc8b6ba340c9
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2112837
Reviewed-by: Raghuram Kothakota <rkothakota@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-05-31 04:06:51 -07:00
Alex Waterman
28c98f0f7d gpu: nvgpu: Move ECC code into ECC hal directory
Refactor the ECC code present under gp10b/, gv11b/, and tu104/. To
do this a few things were required:

Firstly, move the ecc related C and H files under the aforementioned
directories over to their new home under hal/gr/ecc/. Also copy over
the ECC HAL code present in gp10b/gp10b.c - not sure why this was
there but it, too, needed to be refactored. Also handle all the
updated header paths and Makefiles to take these movements into
account.

Secondly add a new HAL in GR (gr.ecc) to handle ECC HAL ops. There's
only two: detect and init. init() was copied over from gr and detect
was added so that the common ECC code can call it in a chip agnostic
way. This is required so that the ECC detect can be moved out of the
gp10b init characteristics function.

Lastly update the ECC init flow in the common ecc module to call
detect() right before init().

JIRA NVGPU-3074

Change-Id: Iba1464a904f9a91a45f30902beadf6f912607e40
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2090977
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-04-12 15:32:50 -07:00