Add separate new unit gr/ctxsw_prog that provides interface to access
h/w header files hw_ctxsw_prog_*.h
Add below chip specific files that access above h/w unit and provide
interface through g->ops.gr.ctxsw_prog.*() HAL for rest of the units
common/gr/ctxsw_prog/ctxsw_prog_gm20b.c
common/gr/ctxsw_prog/ctxsw_prog_gp10b.c
common/gr/ctxsw_prog/ctxsw_prog_gv11b.c
Remove all the h/w header includes from rest of the units and code.
Remove direct calls to h/w headers ctxsw_prog_*() and use HALs
g->ops.gr.ctxsw_prog.*() instead
In gr_gk20a_find_priv_offset_in_ext_buffer(), h/w header
ctxsw_prog_extended_num_smpc_quadrants_v() is only defined on gk20a
And since we don't support gk20a remove corresponding code
Add missing h/w header ctxsw_prog_main_image_pm_mode_ctxsw_f() for
some chips
Add new h/w header ctxsw_prog_gpccs_header_stride_v()
Jira NVGPU-1526
Change-Id: I170f5c0da26ada833f94f5479ff299c0db56a732
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1966111
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
All the netlist parsing code is currently under GR unit, but netlist
ucode parsing does not really have any logical dependency to GR
Hence separate out a new unit common/netlist/ that parses the netlist
image and stores/exposes its content through netlist_vars structure
Structure nvgpu_netlist_vars is added to structure gk20a
Move netlist parsing code to common/netlist/netlist.c and chip
specific files to common/netlist/netlist_<chip>.c
Move simulation netlist parsing to common/netlist/netlist_sim.c
Rename g.ops.gr_ctx HAL to g.ops.netlist
Rename all the exported structures to be in the form of nvgpu_*
Rename all exported functions to be in the form of nvgpu_netlist_*()
Add netlist initialization to GPU boot path, and add deinitialization
to GPU remove path
Jira NVGPU-1317
Change-Id: I9af86e3b3230a89db5260cc8ed96ff5f72938c9a
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1936454
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- This patch corrects parameters in set_pmm_registers
- As FBP 6 and 7 are floorswept for GV100, GPU_LIT_NUM_FBPS
should not be used
- halify get_num_hwpm_perfmon and set_pmm_register
Bug 2106999
Change-Id: Ib285b25d0c836c93b529dfe4e26c078159a3e6dd
Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1785620
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
MISRA rule 21.2 doesn't allow the use of macro names which start with
an underscore. These leading underscores are to be removed from the
macro names. This patch will fix such violations in gv100 by renaming
them to follow the convention, 'NVGPU_PARENT-DIR_HEADER-NAME' when
there is no keyword repetition between file name and directory or
'NVGPU_HEADER-NAME' when there is repetition.
JIRA NVGPU-1028
Change-Id: I0e4fabc8b31d0bfd66ab541435ac8813e5cc4c1d
Signed-off-by: smadhavan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1815554
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Adeel Raza <araza@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Write new pm mode to context buffer header. Ucode use
this mode to enable mode-e context switch. This is Mode-B
context switch of PMs with Mode-E streamout on one context.
If this mode is set, Ucode makes sure that Mode-E pipe
(perfmons, routers, pma) is idle before it context switches PMs.
- This allows us to collect counters in a secure way
(i.e. on context basis) with stream out.
Bug 2106999
Change-Id: I5a7435f09d1bf053ca428e538b0a57f3a175ac37
Signed-off-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1760366
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Separate HAL added in gv11b and gv100 for
init_gpc_mmu function.
In gv11b HAL, RMW is enabled for gpu atomics
as default.
In gv100 HAL, GPC atomic capability mode will
get set based on the FB MMU capability.
If GPU is connected through NVLINK then mmu
will be set to RMW mode, else it will be in
L2 mode.
Bug 200390336
Change-Id: I224934f83d1762ec864ef8da7265dd01d86893a0
Signed-off-by: Ashish Srivastava <assrivastava@nvidia.com>
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1735137
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In gr_gv11b/gk20a_create_priv_addr_table() we do not consider floorswept FBPAs
and just calculate the unicast list assuming all FBPAs are present
This generates incorrect list of unicast addresses
Fix this introducing new HAL ops.gr.split_fbpa_broadcast_addr
Set gr_gv100_get_active_fpba_mask() for GV100
Set gr_gk20a_split_fbpa_broadcast_addr() for rest of the chips
gr_gv100_get_active_fpba_mask() will first get active FPBA mask and generate
unicast list only for active FBPAs
Bug 200398811
Jira NVGPU-556
Change-Id: Idd11d6e7ad7b6836525fe41509aeccf52038321f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1694444
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GV100 ucode is changed so that it expects LIST_nv_perf_pma_ctx_reg list in
ctxsw buffer to be 256 byte aligned but same change is not applied to other
chip ucodes
ADD new HAL (*add_ctxsw_reg_perf_pma) to configure PMA register list and
define a common HAL gr_gk20a_add_ctxsw_reg_perf_pma() for all other
chips except GV100
Define a separate HAL for GV100 gr_gv100_add_ctxsw_reg_perf_pma() and fix
the required alignment in this function
Bug 1998067
Change-Id: Ie172fe90e2cdbac2509f2ece953cd8552e66fc56
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1676655
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
For LIST_nv_pm_fbpa_ctx_regs, we right now call
add_ctxsw_buffer_map_entries_subunits() to add registers corresponding
to all the FBPAs
But while configuring total number of registers, we do not consider
floorswept FBPAs and that causes misalignment in subsequent lists for GV100
Fix this by reading disabled/floorswept FBPAs from fuse and consider only those
FBPAs which are active for GV100
Add new HAL (*add_ctxsw_reg_pm_fbpa) to support this setting and define a
common HAL gr_gk20a_add_ctxsw_reg_pm_fbpa() for all chips except GV100
Define GV100 specific gr_gv100_add_ctxsw_reg_pm_fbpa() with above mentioned
implementation to consider floorsweeping
Bug 1998067
Change-Id: Id560551bb0b8142791c117b6d27864566c90b489
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1676654
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
On gv11b we can have multiple SMs per TPC. Add sm_per_tpc in
vgpu constants to properly dimension the virtual SM to TPC/GPC
mapping in virtualization case.
Use TEGRA_VGPU_CMD_GET_SMS_MAPPING to query current mapping.
Bug 2039676
Change-Id: I817be18f9a28cfb9bd8af207d7d6341a2ec3994b
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1631203
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
SMID tables were generated according with the local tpc and the pagepool and cb
buffers from a different chip and did not take performance in consideration,
which made compute kernels hang with CTAs on the fly.
This change ensures we are using the right sizes and adds proper enumeration
of smids.
JIRA: NVGPUGV100-36
bug 2004378
Change-Id: Ic8f50c325d6d6720cca41d9740ae4f5f51e1100a
Signed-off-by: David Nieto <dmartineznie@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1581664
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>