In MIG mode, every dev node should be enumerated for each fGPU, whereas
only the "ctrl" node should be enumerated for the physical instance.
Support this with the following set of changes:
- Add struct nvgpu_mig_static_info that describes static GPU instance
configuration. GPCs are enumerated only during poweron and grmgr unit
will populate instance information based on number of GPCs.
For linux, GPU poweron happens only with first gk20a_busy() call and
instance information is not available during probe() time. Hence this
static table is a temporary solution until proper solution is
identified.
- Add nvgpu_default_mig_static_info for iGPU and
nvgpu_default_pci_mig_static_info for dGPU that describes GPU instance
partition.
- Add new function nvgpu_prepare_mig_dev_node_class_list() that parses
the static table and creates one class per instance in MIG mode.
Non-MIG mode classes are now enumerated in
nvgpu_prepare_default_dev_node_class_list().
- Add new structure nvgpu_cdev_class_priv_data to store private data for
each cdev. This holds instance specific information. A pointer to the
private data is maintained in struct class and is also passed as
private data when creating a device node with device_create().
- Add nvgpu_mig_fgpu_devnode() to set dev node path/names for fGPUs and
add nvgpu_mig_phys_devnode() to set dev node path/names for physical
instance in MIG mode.
- Add new field mig_physical_node to struct nvgpu_dev_node. This field
is set if corresponding dev node should be created for physical
instance in MIG mode. For now set it only for "ctrl" node.
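The node selection rule described above can be sketched as follows. This
is a minimal illustration, not the driver code: struct dev_node_sketch,
dev_node_list_sketch[], should_create_node() and count_nodes() are
hypothetical names, and the "as" and "channel" entries are placeholder
node names. Only the "ctrl" flag follows the text of this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct nvgpu_dev_node. */
struct dev_node_sketch {
    const char *name;
    bool mig_physical_node; /* create for the physical instance in MIG mode */
};

/* Illustrative table; only "ctrl" is flagged, as the commit describes. */
static const struct dev_node_sketch dev_node_list_sketch[] = {
    { "ctrl",    true  },
    { "as",      false },
    { "channel", false },
};

#define DEV_NODE_COUNT_SKETCH \
    (sizeof(dev_node_list_sketch) / sizeof(dev_node_list_sketch[0]))

/* Decide whether a node is created for a given instance. */
static bool should_create_node(const struct dev_node_sketch *node,
                               bool mig_enabled, bool physical_instance)
{
    assert(node != NULL);
    /* Non-MIG mode and fGPU instances get every dev node. */
    if (!mig_enabled || !physical_instance) {
        return true;
    }
    /* Physical instance in MIG mode: only flagged nodes. */
    return node->mig_physical_node;
}

/* Count the nodes that would be created for one instance. */
static size_t count_nodes(bool mig_enabled, bool physical_instance)
{
    size_t i;
    size_t count = 0;

    for (i = 0; i < DEV_NODE_COUNT_SKETCH; i++) {
        if (should_create_node(&dev_node_list_sketch[i],
                               mig_enabled, physical_instance)) {
            count++;
        }
    }
    return count;
}
```

With this table, the physical instance in MIG mode gets one node while
each fGPU and the non-MIG case get all three.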
Jira NVGPU-5648
Change-Id: Ic97874eece1fbe0083b3ac4c48e36e06004f1bc2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2434586
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove devnode_class pointer from struct nvgpu_os_linux and replace it
by a list head.
Add new structure nvgpu_class to store class related meta-data and
create it dynamically in nvgpu_create_class().
Add new function nvgpu_prepare_dev_node_class_list() to prepare list of
all classes that are required for each GPU.
For now there is only one class per GPU, but in MIG mode multiple
classes will be created with one class per instance.
Update gk20a_user_init() to loop through list of classes and create
dev nodes for each class.
gk20a_user_deinit() frees up the linked list.
Jira NVGPU-5648
Change-Id: I891a55c0ce1c2ff9db094564529b3f569df9735c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2428501
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Remove static dev node meta data from struct nvgpu_os_linux and replace
it by a dynamic list. Struct nvgpu_os_linux will only keep track of list
head and number of entries.
Add new structure nvgpu_cdev to store meta data of each dev node and
create/setup it dynamically in gk20a_user_init(). Once done, add the new
node under list head maintained in nvgpu_os_linux.
Add a static list dev_node_list[] that contains list of dev node names
and file operations. This static list is used to create nvgpu_cdev data
structures and to register new device nodes.
Update all dev node open file operations (e.g. gk20a_as_dev_open()) to
extract struct gk20a pointer from device pointer of dev node.
gk20a device is the parent of dev node device.
Jira NVGPU-5648
Change-Id: If070c3428afd6215e45b4919335d9f43e04c36f9
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2428500
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Remove static class definition and registration for iGPU and dGPU.
Create the class dynamically in gk20a_user_init() and setup the callback
function to create devnode name based on GPU type.
For now add nvgpu_pci_devnode() callback for dGPU that sets correct
dev node path for dGPUs. For iGPU, Android apparently does not honor dev
node path set in callback and hence override the device name for iGPU
with function nvgpu_devnode().
Destroy the class in gk20a_user_deinit().
This will overall be helpful in adding multiple classes and dev nodes
for each GPU instance in MIG mode.
Set GPU device pointer as the parent of new devices created with
device_create(). This is helpful in getting GPU device name in
callback function nvgpu_pci_devnode().
Update functions to not pass class structure and interface names :
nvgpu_probe()
gk20a_user_init()
gk20a_user_deinit()
nvgpu_remove()
Remove static interface name format like INTERFACE_NAME since it is no
longer needed.
Update GK20A_NUM_CDEVS to 10 since there are 10 dev nodes per GPU right
now.
Jira NVGPU-5648
Change-Id: I5d41db5a0f87fa4a558297fb4135a9fbfcd51080
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2423492
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Use fixed address mapping for pma byte buffer so that the address of
this buffer always fits in 32 bits.
This also requires moving the unmap sequence to an OS specific function
since different unmap APIs are now needed for linux and QNX.
Also call nvgpu_prof_free_pma_stream_priv_data() before
nvgpu_profiler_free_pma_stream() since the former uses mm->perfbuf,
which is released in the latter.
Bug 2510974
Jira NVGPU-5360
Change-Id: I398b0ca4f96527d6e09c9aacacb4b43c90f5bfc9
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2424691
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Background: There is a race that occurs when l2_fb_ops ioctl is
invoked. The race occurs as part of the flush() call while a
gk20a_idle() is in progress.
This patch handles the race by making changes in the l2_fb_ops
ioctl itself. For cases where pm_runtime is disabled or railgate is
disabled, we allow this ioctl call to always go ahead as power is
assumed to be always on.
For the other case, we first check the status of g->power_on. In the
driver, g->power_on is set to true, once unrailgate is completed and is
set to false just before calling railgate.
For linux, the driver invokes gk20a_idle() but there is a delay after
which the call to the rpm_suspend()'s callback gets triggered. This
leads to a scenario where we cannot efficiently rely on the
runtime_pm's APIs to allow us to block an imminent suspend or exit if
the suspend is currently in progress. Previous attempts at solving this
have led to ineffective solutions and made the code much more
complicated to maintain.
With regards to the above, this patch attempts to simplify the way this
can be solved. The patch calls gk20a_busy() when g->power_on = true.
This prevents the race with gk20a_idle(). Based on the rpm_resume and
rpm_suspend's upstream code, resume is prioritized over a suspend
unless a suspend is already in progress i.e. the delay period has been
served and the suspend invokes the callback. There is a very small
window for this to happen and the ioctl can then power_up the device as
evident from the gk20a_busy's calls.
The nvgpu power state is queried using nvgpu_is_powered_off() to
determine whether to skip the resume. The power state is protected by a
spinlock.
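The decision flow described above can be sketched as below. This is a
simplified model under stated assumptions, not the ioctl implementation:
struct gpu_sketch, busy_sketch(), idle_sketch() and l2_fb_ops_sketch()
are hypothetical stand-ins, the usage counter only imitates the effect
of gk20a_busy()/gk20a_idle(), and the spinlock around the power state is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the state the ioctl looks at. */
struct gpu_sketch {
    bool can_railgate; /* false if pm_runtime or railgating is disabled */
    bool powered_on;   /* models g->power_on */
    int usage_count;   /* models the runtime PM usage count */
    int flush_count;   /* number of flushes actually issued */
};

/* Stand-in for gk20a_busy(): take a reference, keeping power on. */
static int busy_sketch(struct gpu_sketch *g)
{
    g->usage_count++;
    g->powered_on = true;
    return 0;
}

/* Stand-in for gk20a_idle(): drop the reference. */
static void idle_sketch(struct gpu_sketch *g)
{
    assert(g->usage_count > 0);
    g->usage_count--;
}

static int l2_fb_ops_sketch(struct gpu_sketch *g)
{
    int err;

    assert(g != NULL);

    /* Power is assumed to be always on: go ahead unconditionally. */
    if (!g->can_railgate) {
        g->flush_count++;
        return 0;
    }

    /* Powered off: skip the resume, there is nothing to flush. */
    if (!g->powered_on) {
        return 0;
    }

    /* Powered on: hold a reference so an idle cannot race the flush. */
    err = busy_sketch(g);
    if (err != 0) {
        return err;
    }
    g->flush_count++;
    idle_sketch(g);
    return 0;
}
```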
Bug 200507468
Change-Id: I5c02dfa8ea855732e59b759d167152cf45a1131f
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2299545
(cherry picked from commit 06942bd268)
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2425682
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
In linux kernel v4.14 and below gpu sysfs node is created under
/sys/devices. In linux kernel v5.x it is created under
/sys/devices/platform.
Create symbolic link gpu.0 under /sys/devices/ as various tests
and scripts expect it to be there.
Bug 200665782
Change-Id: I807ce72fad94438f927df25e829082e771b72543
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2426544
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fuse registers should be queried with the physical gpc-id and not the
logical one. For tu104 and earlier chips, physical gpc-ids are the same
as logical ones in a non-floorswept config, but for newer chips they may
differ.
Also, a logical to physical mapping is not present for a floorswept gpc,
so query the gpc_tpc mask only up to the number of gpcs actually present.
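A minimal sketch of the lookup described above, assuming a 4-GPC part
with physical GPC 1 floorswept. The fuse contents, the mapping table and
every name ending in _sketch or _SKETCH are hypothetical and only
illustrate indexing the fuse data by physical id.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_GPC_SKETCH     4U
#define PRESENT_GPC_SKETCH 3U

/* Hypothetical fuse contents, indexed by PHYSICAL gpc-id. */
static const uint32_t fuse_tpc_mask_sketch[MAX_GPC_SKETCH] = {
    0x3U, 0x0U, 0xfU, 0x7U
};

/*
 * Hypothetical logical to physical map. Physical GPC 1 is floorswept,
 * so it has no logical id and the map has only three entries.
 */
static const uint32_t logical_to_physical_sketch[PRESENT_GPC_SKETCH] = {
    0U, 2U, 3U
};

/* Return the TPC mask for a logical GPC by reading the physical fuse. */
static uint32_t gpc_tpc_mask_sketch(uint32_t logical_gpc)
{
    uint32_t physical_gpc;

    /* Only GPCs that are actually present have a mapping. */
    assert(logical_gpc < PRESENT_GPC_SKETCH);
    physical_gpc = logical_to_physical_sketch[logical_gpc];
    assert(physical_gpc < MAX_GPC_SKETCH);

    return fuse_tpc_mask_sketch[physical_gpc];
}
```

Indexing the fuse table with the logical id 1 would have returned the
mask of the floorswept GPC, which is the error this commit avoids.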
Jira NVGPU-6080
Change-Id: I84c4a3c1f256fdd1927f4365af26e9892fe91beb
Signed-off-by: shashank singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2417721
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The simulator ring buffer DMA interface supports buffers of the following sizes:
4, 8, 12 and 16K. At present, it is configured to 4K, which happens to match
the kernel PAGE_SIZE that is used to wrap the GET/PUT pointers once
4K is reached. However, this is not always true; for instance, with 64K pages.
Hence, replace PAGE_SIZE with SIM_BFR_SIZE.
Introduce macro NVGPU_CPU_PAGE_SIZE, which aliases PAGE_SIZE, and replace
the latter with the former.
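The wrap problem can be illustrated with a few lines of arithmetic. This
is a sketch only: sim_advance_offset() and the _SKETCH macros are
hypothetical, and the 4K value is the configured buffer size mentioned
above.

```c
#include <assert.h>
#include <stdint.h>

#define SIM_BFR_SIZE_SKETCH  4096U  /* configured ring buffer size */
#define PAGE_SIZE_64K_SKETCH 65536U /* CPU page size on a 64K-page kernel */

/* Advance a GET/PUT offset and wrap it at wrap_size. */
static uint32_t sim_advance_offset(uint32_t offset, uint32_t step,
                                   uint32_t wrap_size)
{
    assert(wrap_size != 0U);
    return (offset + step) % wrap_size;
}
```

Advancing offset 4064 by 32 wraps to 0 when the wrap size is the buffer
size, but yields 4096 when the wrap size is a 64K page, which is already
past the end of a 4K buffer.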
Bug 200658101
Jira NVGPU-6018
Change-Id: I83cc62b87291734015c51f3e5a98173549e065de
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420728
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Mapping large buffers to the GMMU ends up needing many
pages for the PTE tables. Allocating these one by one
can end up being a performance bottleneck, particularly
in the virtualized case.
This adds the following changes:
- As the TLB invalidation doesn't have access to mem_off,
allow top-level allocation by alloc_cache_direct().
- Define NVGPU_PD_CACHE_SIZE, the allocation size for a new slab
for the PD cache, effectively set to 64K bytes
- Use the PD cache for any allocation < NVGPU_PD_CACHE_SIZE
When freeing up cached entries, avoid prefetch errors by
invalidating the entry (memset to 0).
- Try to fall back to direct allocation of smaller chunk for
contiguous allocation failures.
- Unit test changes.
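Two of the rules above, the size threshold and the zeroing on free, can
be sketched as follows. pd_use_cache() and pd_cache_release_entry() are
hypothetical helper names, and the macro only mirrors the 64K value
stated for NVGPU_PD_CACHE_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the 64K slab size stated for NVGPU_PD_CACHE_SIZE. */
#define NVGPU_PD_CACHE_SIZE_SKETCH (64U * 1024U)

/* Allocations strictly smaller than the slab size go through the cache. */
static bool pd_use_cache(size_t bytes)
{
    return bytes < NVGPU_PD_CACHE_SIZE_SKETCH;
}

/*
 * Invalidate a cached entry on free by zeroing it, so that stale
 * entries cannot be picked up by a prefetch.
 */
static void pd_cache_release_entry(void *entry, size_t bytes)
{
    assert(entry != NULL);
    memset(entry, 0, bytes);
}
```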
Bug 200649243
Change-Id: I0a667af0ba01d9147c703e64fc970880e52a8fbc
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2404371
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Update nvgpu_gr_zbc as:
struct nvgpu_gr_zbc {
    struct nvgpu_mutex zbc_lock; /* Lock to access zbc table */
    struct zbc_color_table *zbc_col_tbl; /* SW zbc color table pointer */
    struct zbc_depth_table *zbc_dep_tbl; /* SW zbc depth table pointer */
    struct zbc_stencil_table *zbc_s_tbl; /* SW zbc stencil table pointer */
    u32 min_color_index; /* Minimum valid color table index */
    u32 min_depth_index; /* Minimum valid depth table index */
    u32 min_stencil_index; /* Minimum valid stencil table index */
    u32 max_color_index; /* Maximum valid color table index */
    u32 max_depth_index; /* Maximum valid depth table index */
    u32 max_stencil_index; /* Maximum valid stencil table index */
    u32 max_used_color_index; /* Max used color table index */
    u32 max_used_depth_index; /* Max used depth table index */
    u32 max_used_stencil_index; /* Max used stencil table index */
};
Add global struct nvgpu_gr_zbc_table_indices:
struct nvgpu_gr_zbc_table_indices {
    u32 min_color_index;
    u32 min_depth_index;
    u32 min_stencil_index;
    u32 max_color_index;
    u32 max_depth_index;
    u32 max_stencil_index;
};
Currently, hw zbc table registers are written during both
gr_init_setup_sw() and gr_init_setup_hw().
- Modify nvgpu_gr_zbc_load_default_table() to
nvgpu_gr_zbc_load_default_sw_table() to only update sw copy of zbc table
during gr_init_setup_sw().
- Modify nvgpu_gr_zbc_load_table() to write zbc values stored in sw zbc
table to hw registers.
Re-structure zbc function as per zbc type i.e. color, depth and stencil.
Add gr.zbc.init_table_indices() hal to initialize zbc indices. Valid ZBC
table indices start from 1. HW indices start from 0 for color, depth and
stencil tables. Note that the corresponding format registers follow ZBC
index range starting at 1.
- void (*init_table_indices)(struct gk20a *g,
struct nvgpu_gr_zbc_table_indices *zbc_indices);
- Add corresponding functions for legacy chips
- Add zbc color, depth and stencil table size hw defines
- Remove ltc.zbc_table_size() hal
- Update ltc.set_zbc_s_entry(), ltc.set_zbc_color_entry() and
ltc.set_zbc_depth_entry() accordingly.
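A chip-specific init_table_indices() could look roughly like the sketch
below. The struct layout is the one given above; the function names and
the maximum values are hypothetical placeholders, not real hw defines.
Only the minimum index of 1 comes from this commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

struct nvgpu_gr_zbc_table_indices {
    u32 min_color_index;
    u32 min_depth_index;
    u32 min_stencil_index;
    u32 max_color_index;
    u32 max_depth_index;
    u32 max_stencil_index;
};

/* Placeholder maximum; a real chip would use its hw table size defines. */
#define ZBC_MAX_INDEX_SKETCH 15U

/* Hypothetical per-chip implementation of the init_table_indices() hal. */
static void init_table_indices_sketch(struct nvgpu_gr_zbc_table_indices *idx)
{
    assert(idx != NULL);

    /* Valid ZBC table indices start from 1. */
    idx->min_color_index = 1U;
    idx->min_depth_index = 1U;
    idx->min_stencil_index = 1U;

    idx->max_color_index = ZBC_MAX_INDEX_SKETCH;
    idx->max_depth_index = ZBC_MAX_INDEX_SKETCH;
    idx->max_stencil_index = ZBC_MAX_INDEX_SKETCH;
}

/* Range check for a color table index against the initialized limits. */
static bool zbc_color_index_valid(const struct nvgpu_gr_zbc_table_indices *idx,
                                  u32 index)
{
    assert(idx != NULL);
    return (index >= idx->min_color_index) &&
           (index <= idx->max_color_index);
}
```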
Bug 3122410
Bug 3122649
Change-Id: Ib799991ad35c6613534c0a6eb07f3bf24e600dc5
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2417620
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Right now PMA byte buffer address is allocated in the range of
0x1ffc010000. The register that stores this address is only 32-bit and
there is no corresponding _hi() register, so the address must fit in
32 bits.
Update nvgpu_vm_init() parameters in nvgpu_perfbuf_init_vm() so that a
low_hole of only 4K is used. This allows the address to be allocated
in the range of 0x4000000.
Also map byte buffer before PMA stream buffer so that byte buffer always
gets lower address.
There is only one PMA stream buffer allowed to be mapped right now so
this works for now. But in future multiple buffers can be mapped and this
solution needs to be reworked.
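The constraint can be checked with the two addresses quoted above. The
helper names below are hypothetical; the sketch only shows why the old
range cannot be programmed into a 32-bit register while the new one can.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if a GPU virtual address can be stored in a 32-bit register. */
static bool fits_in_32_bits(uint64_t gpu_va)
{
    return gpu_va <= (uint64_t)UINT32_MAX;
}

/* Value to program into the 32-bit address register (no _hi register). */
static uint32_t pma_bytes_addr_reg_value(uint64_t gpu_va)
{
    assert(fits_in_32_bits(gpu_va));
    return (uint32_t)gpu_va;
}
```

0x1ffc010000 needs 37 bits and fails the check, while 0x4000000 fits.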
Bug 2510974
Jira NVGPU-5360
Change-Id: Ief1a9ee54d554e3bc13c7a9567934dcbeaefbcc6
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2418520
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
APIs to set preemption modes right now have config based code to set
default preemption modes or to check if given preemption mode is valid
or not. This makes code unreadable and complex.
Rework nvgpu_gr_obj_ctx_init_ctxsw_preemption_mode() so that it checks
for initial preemption modes in the beginning. If no preemption mode is
passed while allocating context, get default preemption modes with
gops.gr.init.get_default_preemption_modes() and use them.
Rework nvgpu_gr_ctx_check_valid_preemption_mode() so that it is more
readable. Use gops.gr.init.get_supported_preemption_modes() to validate
incoming preemption modes against supported preemption modes.
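The default selection and the validation against supported modes can be
sketched as below. The flag values and function names are hypothetical,
and the masks are passed in directly instead of being fetched through
gops.gr.init.get_default_preemption_modes() and
gops.gr.init.get_supported_preemption_modes().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical preemption mode flags, one bit per mode. */
#define PREEMPTION_MODE_WFI_SKETCH  (1U << 0)
#define PREEMPTION_MODE_GFXP_SKETCH (1U << 1)

/* A requested mode is valid if it sets no bit outside the supported mask. */
static bool preemption_mode_valid(uint32_t requested, uint32_t supported)
{
    return (requested & ~supported) == 0U;
}

/* Use the default mode when the caller did not pass one (0 means "none"). */
static uint32_t pick_preemption_mode(uint32_t requested, uint32_t default_mode)
{
    assert(default_mode != 0U);
    return (requested != 0U) ? requested : default_mode;
}
```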
Log preemption modes getting set in
nvgpu_gr_obj_ctx_set_ctxsw_preemption_mode().
Disable failing unit test. It will need rework according to new code.
Jira NVGPU-5648
Change-Id: Ie1a3e1aeae7826a123e104d9d016f181bea3b271
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2419034
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
common.gr defined a temporary macro NVGPU_GR_NUM_INSTANCES to enable or
disable multiple GR instances from common.gr unit.
Multiple GR instance boot is now verified, so we can remove this
temporary solution.
Note that nvgpu_grmgr_get_num_gr_instances() will return more than 1
instance only if NVGPU_SUPPORT_MIG is enabled.
Update unit tests to set number of syspipes to 1 to allow enumeration
of GR instance by grmgr.
Jira NVGPU-5648
Change-Id: I795901ae516843ae7b6c1794dae0f023a213ab1d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2418377
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Lakshmanan M <lm@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Below HALs are implemented in common.gr unit, but they really belong
to common.perf unit since they access registers from perf unit.
gops.gr.init_hwpm_pmm_register()
gops.gr.get_num_hwpm_perfmon()
gops.gr.set_pmm_register()
gops.gr.reset_hwpm_pmm_registers()
Move them to common.perf unit, and update all the code accordingly:
gops.perf.init_hwpm_pmm_register()
gops.perf.get_num_hwpm_perfmon()
gops.perf.set_pmm_register()
gops.perf.reset_hwpm_pmm_registers()
Add new HAL gops.gr.get_pm_ctx_buffer_offsets() and set it to
gr_gk20a_get_pm_ctx_buffer_offsets() for all chips.
Bug 2510974
Jira NVGPU-5360
Change-Id: Ib5e84ed5c8b6e72cc6923161e55fc2c3a6a4070e
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2418306
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add new HAL g->ops.gr.reset_hwpm_pmm_registers() to reset all HWPM regs
while binding HWPM in global mode in nvgpu_profiler_bind_hwpm().
Add the below new HALs to get the sys/gpc/fbp register lists and counts:
g->ops.perf.get_hwpm_sys_perfmon_regs()
g->ops.perf.get_hwpm_gpc_perfmon_regs()
g->ops.perf.get_hwpm_fbp_perfmon_regs()
Auto generate all the HWPM regs in below arrays for gv11b/tu104
static const u32 hwpm_sys_perfmon_regs[]
static const u32 hwpm_gpc_perfmon_regs[]
static const u32 hwpm_fbp_perfmon_regs[]
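The list-and-count pattern and the reset loop built on it can be
sketched as follows. The getter signature, the register offsets and the
in-memory register space are assumptions made for illustration; the real
arrays are auto generated and the real HALs take a struct gk20a pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;

/* Hypothetical in-memory stand-in for the register space. */
#define REG_SPACE_WORDS_SKETCH 16U
static u32 reg_space_sketch[REG_SPACE_WORDS_SKETCH];

/* Placeholder offsets; the real list is auto generated per chip. */
static const u32 hwpm_sys_perfmon_regs_sketch[] = { 0x00U, 0x04U, 0x08U };

/* Return the register list and report its length through count. */
static const u32 *get_hwpm_sys_perfmon_regs_sketch(u32 *count)
{
    assert(count != NULL);
    *count = (u32)(sizeof(hwpm_sys_perfmon_regs_sketch) /
                   sizeof(hwpm_sys_perfmon_regs_sketch[0]));
    return hwpm_sys_perfmon_regs_sketch;
}

/* Stand-in for a register write at a byte offset. */
static void write_reg_sketch(u32 offset, u32 value)
{
    assert((offset % 4U) == 0U);
    assert((offset / 4U) < REG_SPACE_WORDS_SKETCH);
    reg_space_sketch[offset / 4U] = value;
}

/* Reset every listed HWPM register to 0. */
static void reset_hwpm_regs_sketch(void)
{
    u32 count = 0U;
    u32 i;
    const u32 *regs = get_hwpm_sys_perfmon_regs_sketch(&count);

    for (i = 0U; i < count; i++) {
        write_reg_sketch(regs[i], 0U);
    }
}
```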
Bug 2510974
Jira NVGPU-5360
Change-Id: I2ca5c04ed75c7b30ae942807bf018a24551d7ba0
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2414934
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>