Commit Graph

197 Commits

Author SHA1 Message Date
srajum
8381647662 gpu: nvgpu: fixing MISRA violations
- MISRA Directive 4.7
  Calling function "nvgpu_tsg_unbind_channel(tsg, ch, true)" which returns
  error information without testing the error information.

- MISRA Rule 10.3
  Implicit conversion from essential type "unsigned 64-bit int" to different
  or narrower essential type "unsigned 32-bit int"

- MISRA Rule 5.7
  A tag name shall be a unique identifier

JIRA NVGPU-5955

Change-Id: I109e0c01848c76a0947848e91cc6bb17d4cf7d24
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2572776
(cherry picked from commit 073daafe8a11e86806be966711271be51d99c18e)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2678681
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-03-10 16:01:18 -08:00
Shashank Singh
19a3b86f06 gpu: nvgpu: remove unused code from common.nvgpu on safety build
- remove unused code from common.nvgpu unit on safety build. Also,
remove the code which uses them in other places.
- document use of compiler intrinsics as mandated in code inspection
  checklist.

Jira NVGPU-6876

Change-Id: Ifd16dd197d297f56a517ca155da4ed145015204c
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2561584
(cherry picked from commit 900391071e9a7d0448cbc1bb6ed57677459712a4)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2561583
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2022-02-17 04:58:32 -08:00
Konsta Hölttä
632644b44a gpu: nvgpu: couple runlist domains and nvs
Now that the main nvsched code exists in the nvgpu build, make it
control the runlist domains. As a new nvs domain is created, create the
relevant runlist data too. To support the default domain, create a
default nvs domain at boot.

The scheduling domain code owns the responsibility of domain lifetime,
and runlist domains exist to serve that logic although the RL domains
are directly used by channel and TSG logic. Add refcounting to the
scheduler uapi level to make sure that busy domains (that still have TSG
participants) do not get removed too early.

Adjust error injection sensitive unit tests to match the updated logic.

Jira NVGPU-6425
Jira NVGPU-6427

Change-Id: I1beec97c54c60ad334165b1c0acb5e827c24f2ac
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2632287
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-12-07 07:07:12 -08:00
Sagar Kamble
48d17e9c53 gpu: nvgpu: fix the unit test traceability
gk20a_tsg_unbind_channel_check_hw_next was not added to Targets
in unit test specification. Add it.  __attribute__ in debug.h
is captured by Doxygen as function with no tests. However it
is not really a function and applies to non-fusa function so
skip it in Doxygen.

JIRA NVGPU-7211

Change-Id: I2adadaebbf4e43768eb408dd10aaa20b1e13eccc
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2615256
(cherry picked from commit e829afb55a17dc0dacf17c71633f5689324171d7)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623629
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-11-14 04:24:17 -08:00
Sagar Kamble
f64b5e20b0 gpu: nvgpu: add unit test for gk20a_tsg_unbind_channel_check_hw_next
false branch when NEXT bit is not set is not covered. Add unit test
for same.

JIRA NVGPU-7211

Change-Id: I57725e35971605bf8144e7eaac618f44a38e5b31
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2614209
(cherry picked from commit 2064209f92700dc859d7398e061b3d7dc2725521)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2623628
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-11-14 04:24:11 -08:00
Konsta Hölttä
6cff904dc3 gpu: nvgpu: use runlist obj for wait_pending
Change the gops_runlist::wait_pending API to take a runlist pointer
instead of a runlist ID to better match with the rest of that interface.

Jira NVGPU-6425

Change-Id: I96c4f49df8e2613498e0a09cc75a950824828bed
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2621214
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-11-11 20:39:47 -08:00
Konsta Hölttä
3cf796b787 gpu: nvgpu: move active bitmaps to domain
Move the active_channels and active_tsgs bitmaps from struct
nvgpu_runlist to struct nvgpu_runlist_domain. A TSG and its channels are
currently active as part of a runlist; in the future, a runlist may be
switched from multiple domains that each are a collection of TSGs.

The changes are still internal to the runlist code. Users of runlists
need no modifications.

Jira NVGPU-6425

Change-Id: I2d0e98e97f04b9716bc3f4890cf881735d0ab664
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2618387
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-11-03 20:55:08 -07:00
Konsta Hölttä
1d23b8f13a gpu: nvgpu: introduce internal runlist domain
The current runlist code assumes a single runlist buffer to hold all TSG
and channel entries. Create separate RL domain and domain memory types
to hold data that is related to only a scheduling domain and not
directly to the runlist hardware; in the future, more than one domains
may exist and one of them is enabled at a time.

The domain is used only internally by the runlist code at this point and
is functionally equivalent to the current runlist memory that houses the
round robin entries.

The double buffering is still kept, although more domains might benefit
from some cleverness. Although any number of created domains may be
edited in runtime, nly one runlist memory is accessed by the hardware at
a time. To spare some contiguous memory, this should be considered an
opportunity for optimization in the future.

Jira NVGPU-6425

Change-Id: Id99c55f058ad56daa48b732240f05b3195debfb1
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2618386
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-11-03 20:54:48 -07:00
Konsta Hölttä
9b3f3ea4be gpu: nvgpu: remove timeout fault injection tests
The timeout init API is changing to return void in most cases. Adapt the
unit tests to the reduced branching.

Change-Id: I4d05484529fe4ef46b518f41d10b71a4a9f9c6fb
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2614286
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-10-26 13:47:20 -07:00
Debarshi Dutta
791dc18666 gpu: nvgpu: bvec for struct nvgpu_tsg_sm_error_state fields
Add Setter and Getter methods for accessing tsg->sm_error_states.
Getter returns a constant pointer for struct nvgpu_tsg_sm_error_state.
This renders it unnecessary to add BVEC for above fields for the struct
in multiple locations. The current design ensures that only a constant
pointer is obtained from the owner unit i.e. FIFO.

The following new methods are added. Both unit tests and BVEC tests
are added for them as well.

nvgpu_tsg_store_sm_error_state
nvgpu_tsg_get_sm_error_state

Jira NVGPU-6947

Change-Id: I82c22a2774862c8579baa41b6fb8292fa164704a
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 79574638671a0c6efe41cd3423668fcd1bd96826)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2556938
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-09-13 20:57:09 -07:00
Tejal Kudav
b33079d47e gpu: nvgpu: Move intr data members from MC to CIC
Move interrupt specific data-members from common.mc to common.cic
Some of these data members like sw_irq_stall_last_handled_cond need
To be initialized much earlier during the OS specific init/probe stage.
Also, some more members from struct nvgpu_interrupts(like stall_size,
stall_lines[]), which will soon be moved to CIC will also need to be
initialized early during the OS specific probe stage.
However, the chip specific LUT can only be initialized after the
hal_init stage where the HALs are all initialized.
Split the CIC init to accommodate the above initialization requirements.

JIRA NVGPU-6899

Change-Id: I9333db4cde59bb0aa8f6eb9f8472f00369817a5d
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2552535
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-19 18:06:28 -07:00
Debarshi Dutta
6d917822c8 gpu: nvgpu: bvec for ramin unit.
1) added a BVEC test for g->ops.ramin.set_big_page_size
2) Currently, runlist unit tests are not enabled in Dev-Main. Left
it as it is.

Jira NVGPU-6905

Change-Id: I7aefce472743653624cc5a22d978632f77b5f404
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2548305
(cherry picked from commit c84b6c890a1711cd7c15ec974ea59041a0ace6d5)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554022
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-07 12:25:57 -07:00
Debarshi Dutta
200777b854 gpu: nvgpu: bvec for channel and tsg
Below changes are added.

1) Added checks in
    nvgpu_channel_from_id__func, nvgpu_tsg_check_and_get_from_id
2) Added BVEC tests for
    nvgpu_channel_open_new, nvgpu_channel_from_id,
    nvgpu_tsg_check_and_get_from_id, nvgpu_tsg_set_error_notifier
3) Added common function get_random_u32.

Jira NVGPU-6905

Change-Id: I374d6f5503dc05e3224213d772a1752d82cbdc91
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2548304
(cherry picked from commit 39b2529b3e96cfd3cbd3bb020f32ee2cca0ea363)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2554021
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Sachin Nikam <snikam@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-07-07 12:25:50 -07:00
tkudav
0526e7eaa9 gpu: nvgpu: Create CIC-mon and CIC-rm subunits
common.cic unit is divided into common.cic.mon and common.cic.rm
based on rm and mon process split.

CIC-mon subunit includes the code which is utilized in critical
interrupt handling path like initialization, error detection and
error reporting path. CIC-rm subunit includes the code corresponding
to rest of interrupt handling(like collecting error debug data from
registers) and ISR status management (status of deferred interrupts).

Split the CIC APIs and data-members into above two subunits.

JIRA NVGPU-6899

Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-07-02 09:57:56 -07:00
Tejal Kudav
e0a1fcf5f5 gpu: nvgpu: Add Central Intr Controller unit
Add a new Central Interrupt Controller(CIC) unit in common code.
The interrupt handling is done in a distributed manner currently.
The error handling policy for different errors resides in each unit's
ISR code. The goal is to converge this data under one central place -
the CIC unit.

This patch creates framework for CIC unit and moves the gv11b QNX
safety LUT to CIC unit. All the error reporting APIs from different
units are also moved to CIC.

New APIs are exposed by CIC unit to access its internal data like:
  1. Struct err_desc - the static err handling /injection data per
                       error id
  2. Num_hw_modules  - the number of error reporting HW units
                       supported by CIC

Init and deinit of CIC unit:
  1. CIC unit should be initialized earlyon during boot so that it
     is available for any interrupt handling.
  2. Initialize CIC just before the interrupts are enabled during
     boot.
  3. Similarly, CIC is disabled late during deinit cycle; right
     after the interrupts are masked.

LUT:
  1. LUT is currently used only for reporting error to safety
     services in gv11b QNX safety build.
  2. This error handling policy LUT currently has only two levels
     of handing - correctable and quiecse.
  3. Once, the error handling policy decision is moved from leaf
     unit nodes to CIC, LUT will be updated to have additional levels
     like fast recovery and full recovery.
  4. Also, then a separate LUT will be added for each platform/build.
  5. In current framework, the LUT is set to NULL for all
     configurations except gv11b.

report_err() ops is added to report error to safety services.
This ops is only effective for gv11b qnx build; and set to NULL for
other configurations.

NVGPU-6521
NVGPU-6523
NVGPU-6750
NVGPU-6758
NVGPU-6760
NVGPU-6754

Change-Id: I24be7836a96d787741e37b732e19863ed8014635
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2518683
Reviewed-by: Ajesh K V <akv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-05-25 14:28:04 -07:00
Sagar Kamble
07d8a39647 gpu: nvgpu: wait for stalling interrupts to complete during TSG unbind preempt
Some of the engine stalling interrupts can block the context save off
the engine if not handled during fifo.preempt_tsg. They need to be
handled while polling for engine ctxsw status.

Bug 200711183

Change-Id: I7418a9e0354013b81fbefd8c0cab5068404fc44e
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2521971
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-05-03 20:40:05 -07:00
Vedashree Vidwans
625d942c52 gpu: nvgpu: update gops_fifo.intr_1_isr logic
Update gops_fifo.intr_1_isr to clear interrupt and return
NVGPU_NONSTALL_OPS_WAKEUP_SEMAPHORE only if channel interrupt is pending

Jira NVGPU-6222

Change-Id: I976f8bcf53c7735b154f40bb70b5f401020c8dd4
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2479250
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-03-05 01:27:39 -08:00
Alex Waterman
06f554318a userspace: Update unit tests to use new .id and .runlist fields
Update the unit test code to reflect the following changes:

  https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470305
  https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470306

These adjust field names/types in struct nvgpu_runlist and
struct nvgpu_tsg; no functional changes are made but these do
affect the unit tests that rely on setting these private fields.

JIRA NVGPU-6425

Change-Id: I1fc9d56e363d326f1e6901aaec710903894ac61d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2477585
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-02-19 15:16:51 -08:00
Alex Waterman
d925e33e8b userspace: Prune unit tests for new runlist code
Remove and prune the now broken tests related to the runlist updates.

JIRA NVGPU-6425

Change-Id: I76e03c943ceae261e35958aa64717b5590a19c0e
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2474334
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-01-29 09:51:50 -08:00
Alex Waterman
11d3785faf gpu: nvgpu: Rename struct nvgpu_runlist_info, fields in fifo
Rename struct nvgpu_runlist_info to struct nvgpu_runlist; the
info is not necessary. struct nvgpu_runlist is soon to be a
first class object among the nvgpu object model.

Also rename the fields runlist_info and active_runlist_info to
simply runlists and active_runlists respectively. Again the info
text is just not necessary and somewhat misleading. These structs
_are_ the runlist representations in SW; they are not merely
informational.

Also add an rl_dbg() macro to print debug info specific to
runlist management and some debug prints specifying the runlist
topology for the running chip.

Change-Id: Id9fcbdd1a7227cb5f8c75cca4abbff94fe048e49
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2470303
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2021-01-20 21:56:33 -08:00
Sagar Kamble
cf287a4ef5 gpu: nvgpu: retry tsg unbind if NEXT is set
The NEXT bit can remain set for the channel if timeslice expires before
scheduler clears it. Due to this nvgpu fails TSG unbind and in turn
nvrm_gpu fails channel close. In this case, checking the channel hw
state after some time can help see NEXT bit cleared by scheduler.

Reenable the tsg and return -EAGAIN to nvrm_gpu for it to retry again.

Bug 3144960

Change-Id: I35f417f02270e371a4e632986b73a00f8a4f921a
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2468391
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2021-01-18 23:11:57 -08:00
Vedashree Vidwans
2386ddd038 gpu: nvgpu: modify pbdma.get_fc_target
Modify pbdma.get_fc_target() to accept nvgpu_device pointer. This is
required for nvgpu-next.

JIRA NVGPU-6135

Change-Id: I8baa58c704ee32ee68e87915029ac2be2132d4a4
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2440180
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:48 -06:00
tkudav
2ca4f145e4 gpu: nvgpu: Fix HAL checker pointed mismatches
Add new HALs for register field definition/value changes in
GV11B as compared to Pascal. Update the HALs for recent
chips too if applicable.

Bug 200604892

Change-Id: I14ee9440859007e86a1ffa937df399a31e2628bd
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2437564
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
78fb67bb0b gpu: nvgpu: move fuse definitions to fuse.h
Move common fuse definition macros to fuse.h. This will allow all
chip specific fuse files to use the common macros.

Jira NVGPU-6081

Change-Id: I85b5250809eef26a40f5b4b9bf6908dfa0d2be1f
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2422892
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Shashank Singh
3aec79d242 gpu: nvgpu: add check for valid engine id
-Check validity of engine-id when iterating through all engines and
passing the engine-id as an argument to other function(s).
-Skip test test_gv100_dump_engine_status which fails due to this change.

Bug 200660469

Change-Id: I64ebb1a0297f605dd3cba7ef73954ff5594828bc
Signed-off-by: Shashank Singh <shashsingh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2424655
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Antony Clince Alex
09857ecd91 userspace: units: replace PAGE_SIZE with NVGPU_CPU_PAGE_SIZE
Replace PAGE_SIZE with NVGPU_CPU_PAGE_SIZE, which is a nvgpu defined wrapper
over OS native page size.

Bug 200658101
Jira NVGPU-6018

Change-Id: If35e23d5df38a6b52b586911d1055e0b00b12ebe
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2424792
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Peter Daifuku
a331fd4b3a gpu: nvgpu: pd_cache enablement for >4k allocations in qnx
Mapping of large buffers to GMMU end up needing many
pages for the PTE tables. Allocating these one by one
can end up being a performance bottleneck, particularly
in the virtualized case.

This is adding the following changes:

 - As the TLB invalidation doesn't have access to mem_off,
   allow top-level allocation by alloc_cache_direct().
 - Define NVGPU_PD_CACHE_SIZE, the allocation size for a new slab
   for the PD cache, effectively set to 64K bytes
 - Use the PD cache for any allocation < NVGPU_PD_CACHE_SIZE
   When freeing up cached entries, avoid prefetch errors by
   invalidating the entry (memset to 0).
 - Try to fall back to direct allocation of smaller chunk for
   contiguous allocation failures.
 - Unit test changes.

Bug 200649243

Change-Id: I0a667af0ba01d9147c703e64fc970880e52a8fbc
Signed-off-by: dt <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2404371
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Lakshmanan M
c0e2dc5b74 gpu: nvgpu: Add subctx programming for MIG
This CL covers the following code changes,
1) Added api to init inst_block for more than one subctxs.
2) Added logic to limit the subctx bind based on
   max. VEID count allocated to a gr instance.
3) Renamed nvgpu_grmgr_get_gr_runlist_id.

JIRA NVGPU-5647

Change-Id: Ifec8164a9e5f46fbd0538c3dd50e19ee63667a54
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2418463
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
srajum
6aec282dc1 Revert "gpu: nvgpu: Fix for unit test failures"
This reverts commit 0e353e0022da6064a2c0f71ed43a2a76ceec1a97.

- created unit test change to exercise change with "23293fef"
  but there was issues with that change and now made correponding 
  driver change and no longer this unit test change required.

JIRA NVGPU-6051

Change-Id: Id8131ad027069062435947d79d627b23470a7199
Signed-off-by: srajum <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2415023
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Seeta Rama Raju
1bd0261cbe gpu: nvgpu: Fix for unit test failures
JIRA NVGPU-6051

Change-Id: Ic061594096ef49f7984cde4405f4934ded220e91
Signed-off-by: Seeta Rama Raju <srajum@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2411562
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Konsta Hölttä
dfd9feace6 gpu: nvgpu: recover pbdma errors before ack
When a pbdma fault needs a channel teardown, do the recovery/teardown
process before acking the pbdma interrupt status back. Acking it causes
the hardware to proceed which could release fences too early before the
involved channel(s) have been found to be broken.

With these host copyengine interrupts, the teardown sequence is light
and proceeds even with the pbdma intr flag still set; there are no
engines to reset when these pbdma launch check interrupts happen. The
bad tsg is just disabled and the channels in it aborted.

A few unit tests are so heavily affected by this refactor that they
would need to be rewritten. They're not strictly needed at the moment,
so do only half of the rewrite: just delete them.

Bug 200611198

Change-Id: Id126fb158b6d05e46ba124cd426389046eedc053
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2392669
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
0f501d806f gpu: nvgpu: Unit test fixes and staging for device rework
Fix up what unit tests can be easily fixed up. Stage everything else.

In short the unit test code is _incredibly_ fragile since it's designed
to hit every branch, positive and negative, in the code. However, the
result of that is unit tests that are painful to modify.

A lot of unit tests are also extremely opaque and rely on internal
nvgpu behavior. This patch will be updated with fixes as I make them.

Or, alternatively, it may be worth just temporarily disabling unit
tests on dev-main. We'll have a _lot_ of work for Orin that will
essentially gut the gr, host, and interrupt code. If we retain the
unit test code for this, it may end up being backgreaking.

JIRA NVGPU-5421

Change-Id: I8055fc72521f6a3a8a0d8f07fbe50c649a675016
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2347274
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Konsta Hölttä
a04525ece8 gpu: nvgpu: require deterministic for usermode
Deterministic mode has always been a requirement for usermode submit;
enforce it in the setup_bind path. Adjust tests to use the flag.

QNX uses NVGPU_SETUP_BIND_FLAGS_SUPPORT_DETERMINISTIC only if
CONFIG_NVGPU_IOCTL_NON_FUSA is set, so guard the check with that for
now.

Jira NVGPU-5582

Change-Id: Idedd01a3a24420b45195a472e8ca5c9f32f4ef46
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2369818
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Dinesh
d0087f3ad8 gpu: nvgpu: Support for runlist_max_supported
nvgpu_next needs support for max_runlist_supported by litter
value. So the function is changed to support.

JIRA NVGPU-5534

Change-Id: I097f6343295049532c46904316314dc82092a46b
Signed-off-by: Dinesh <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2382882
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Tejal Kudav
881a6f35be gpu: nvgpu: Trigger quiesce on PBDMA preempt fail
During recovery, we preempt the faulty TSG from PBDMA and engines.
If the TSG preempt on PBDMA times out(timeout = 100ms), the PBDMA
might be hung state. We do not reset the HOST during recovery, so
stuck PBDMAs are unrecoverable.
Abort the recovery and trigger GPU to quiesce as there is no way
back.

Triggering Quiesce from recovery sequence should be fine as the only
redundant operation will be write to FIFO_RUNLIST_PREEMPT register.
The error notifiers will eventually be set by Quiesce thread.

Bug 2768005
JIRA NVGPU-4631

Change-Id: I914b9379aa8e48014e6ddace9abe47180a072863
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2368187
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
359fc24aaf gpu: nvgpu: Rework engine management to work with vGPU
Currently the vGPU engine management rewrites a lot of the common
device agnostic engine management code.

With the new top HAL parsing one device at a time, it is now more
easily possible to tie the vGPU into the new common device framework
by implementing the top HAL but with the vGPU engine list backend.

This lets the vGPU inherit all the common engine and device
management code. By doing so the vGPU HAL need only implement a
trivial and simple HAL.

This also gets us a step closer to merging all of the CE init
code: logically it just iterates through all CE engines whatever
they may be. The only reason this differs between chips is because
of the swap from CE0-2 to LCEs in the Pascal generation. This could
be abstracted by the unit code easily enough.

Also, the pbdma_id for each engine has to be added to the device
struct. Eventually this was going to happen anyway, since the
device struct will soon replace the nvgpu_engine_info struct.
It's a little bit of an abuse but might be worth it long term. If
not, it should not be difficult to replace uses of dev->pbdma_id
with a proper lookup of PBDMA ID based on the device info.

JIRA NVGPU-5421

Change-Id: Ie8dcd3b0150184d58ca0f78940c2e7ca72994e64
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2351877
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
fbb6a5bc1c gpu: nvgpu: Remove fifo->pbdma_map
The FIFO pbdma map is an array of bit maps that link PBDMAs to runlists.
This array allows other software to query what PBDMA(s) serves a given
runlist. The PBDMA map is read verbatim from an array of host registers.
These registers are stored in a kmalloc()'ed array.

This causes a problem for the device management code. The device
management initialization executes well before the rest of the FIFO
PBDMA initialization occurs. Thus, if the device management code
queries the PBDMA mapping for a given device/runlist, the mapping has
yet to be populated.

In the next patches in this series the engine management code is subsumed
into the device management code. In other words the device struct is
reused by the engine management and all host SW does is pull pointers to
the host managed devices from the device manager. This means that all
engine initialization that used to be done on top of the device
management needs to move to the device code.

So, long story short, the PBDMA map needs to be read from the registers
directly, instead of an array that gets allocated long after the device
code has run.

This patch removes the pbdma map array, deletes two HALs that managed
that, and instead provides a new HAL to query this map directly from
the registers so that the device code can use it.

JIRA NVGPU-5421

Change-Id: I5966d440903faee640e3b41494d2caf4cd177b6d
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2361134
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
fb1433811c gpu: nvgpu: modify gr.falcon.dump_stats
- Add gm20b_gr_falcon_gpccs_dump_stats() to print gpccs context switch
mailbox register values for all gpcs.
- Make gm20b_gr_falcon_fecs_dump_stats() a static function
- Add gm20b_gr_falcon_dump_stats() to trigger
gm20b_gr_falcon_fecs_dump_stats() and gm20b_gr_falcon_gpccs_dump_stats()
- Update legacy chips gr.falcon.dump_stats() to
gm20b_gr_falcon_dump_stats().

JIRA NVGPU-5597

Change-Id: I992c6432f3c2e3049bacc953f9b53ff6c4aa2f36
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2357470
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Seema Khowala <seemaj@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
59eb714c48 unit: Disable some unit tests for device work
Fix what unit tests can be easily fixed, but disable some others. It's
not clear why the MM related tests started failing - there's really zero
reason for this. The list of disable tests are primarily engine related
but there are some others that get inflenced by the device and engine
structure.

  test_poweroff.init_poweroff=2
  test_is_stall_and_eng_intr_pending.intr_is_stall_and_eng_intr_pending=2
  test_isr_nonstall.isr_nonstall=2
  test_isr_stall.isr_stall=2
  test_engine_enum_from_type.enum_from_type=2
  test_engine_find_busy_doing_ctxsw.find_busy_doing_ctxsw=2
  test_engine_get_active_eng_info.get_active_eng_info=2
  test_engine_get_fast_ce_runlist_id.get_fast_ce_runlist_id=2
  test_engine_get_gr_runlist_id.get_gr_runlist_id=2
  test_engine_get_mask_on_id.get_mask_on_id=2
  test_engine_get_runlist_busy_engines.get_runlist_busy_engines=2
  test_engine_ids.ids=2
  test_engine_init_info.init_info=2
  test_engine_interrupt_mask.interrupt_mask=2
  test_engine_is_valid_runlist_id.is_valid_runlist_id=2
  test_engine_mmu_fault_id.mmu_fault_id=2
  test_engine_mmu_fault_id_veid.mmu_fault_id_veid=2
  test_engine_setup_sw.setup_sw=2
  test_engine_status.status=2
  test_fifo_init_support.init_support=2
  test_fifo_remove_support.remove_support=2
  test_gp10b_engine_init_ce_info.engine_init_ce_info=2
  test_nvgpu_mem_iommu_translate.mem_iommu_translate=2
  test_nvgpu_mem_phys_ops.nvgpu_mem_phys_ops=2

And delete unit tests for functions that no longer exist:

  test_device_info_parse_enum.top_device_info_parse_enum
  test_get_device_info.top_get_device_info
  test_get_num_engine_type_entries.top_get_num_engine_type_entries
  test_is_engine_ce.top_is_engine_ce
  test_is_engine_gr.top_is_engine_gr

JIRA NVGPU-5421

Change-Id: I343c0b1ea44c472b22356c896672153fc889ffc0
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2355300
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
319520ff57 gpu: nvgpu: Add a new device manager unit
This adds a new device management unit in the common code responsible
for facilitating the parsing of the GPU top device list and providing
that info to other units in nvgpu.

The basic idea is to read this list once from HW and store it in a
set of lists corresponding to each device type (graphics, LCE, etc).
Many of the HALs in top can be deleted and instead implemented using
common code parsing the SW representation.

Every time the driver queries the device list it does so using a
device type and instance ID. This is common code. The HAL is responsible
for populating the device list in such a way that the driver can
query it in a chip agnostic manner.

Also delete some of the unit tests for functions that no longer
exist. This code will require new unit tests in time; those should be
quite simple to write once unit testing is needed.

JIRA NVGPU-5421

Change-Id: Ie41cd255404b90ae0376098a2d6e9f9abdd3f5ea
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319649
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
2a3bb9107f gpu: nvgpu: rename <nvgpu/top.h> to <nvgpu/device.h>
top.h is a description of "devices" available on the GPU. As such
rename this header to device.h.

device.h will ultimately be a unit of actual C code that will rely
on the top HAL to fill a device list.

JIRA NVGPU-5421

Change-Id: If6e4a537d2209e429a678761a34713723da7a00a
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319648
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
tkudav
957b19092f gpu: nvgpu: Enable Quiesce on all builds
Make Recovery and quiesce co-exist to support quiesce state
on unrecoverrable errors. Currently, the quiesce code is wrapped
under ifndef CONFIG_NVGPU_RECOVERY. Isolate the quiesce code from
recovery config, thereby enabling it on all builds.

On Linux, the hung_task checker(check_hung_uninterruptible_tasks()
in kernel/hung_task.c) complains that quiesce thread is stuck for
more than 120 seconds.

INFO: task sw-quiesce:1068 blocked for more than 120 seconds.

The wait time of more than 120 seconds is expected as quiesce
thread will wait until quiesce call is triggered on fatal
unrecoverable errors. However, the INFO print upsets the
kernel_warning_test(KWT) on Linux builds. To fix the failing
KWT, change the quiesce task to interruptible instead of
uninterruptible as checker only looks at uninterruptible tasks.

Bug 2919899
JIRA NVGPU-5479

Change-Id: Ibd1023506859d8371998b785e881ace52cb5f030
Signed-off-by: tkudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2342774
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Alex Waterman
5f0fdf085c nvgpu: unit: Add new mock register framework
Many tests used various incarnations of the mock register framework.
This was based on a dump of gv11b registers. Tests that greatly
benefitted from having generally sane register values all rely
heavily on this framework.

However, every test essentially did their own thing. This was not
efficient and has caused a some issues in cleaning up the device and
host code.

Therefore introduce a much leaner and simplified register framework.
All unit tests now automatically get a good subset of the gv11b
registers auto-populated. As part of this also populate the HAL with
a nvgpu_detect_chip() call. Many tests can now _probably_ have all
their HAL init (except dummy HAL stuff) deleted. But this does
require a few fixups here and there to set HALs to NULL where tests
expect HALs to be NULL by default.

Where necessary HALs are cleared with a memset to prevent unwanted
code from executing.

Overall, this imposes a far smaller burden on tests to initialize
their environments.

Something to consider for the future, though, is how to handle
supporting multiple chips in the unit test world.

JIRA NVGPU-5422

Change-Id: Icf1a63f728e9c5671ee0fdb726c235ffbd2843e2
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2335334
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
068e00749b gpu: nvgpu: update config_userd_writeback_enable
Field value of pbdma_config_userd_writeback_enable is changing from
0x1 to 0x0 for nvgpu-next. So,
- Update config_userd_writeback_enable() hal to accept u32 value.
- Update config_userd_writeback_enable() hal to return modified
  value after setting pbdma_config_userd_writeback_enable field.

Jira NVGPU-5162

Change-Id: I94efa20c34bb867f185778c973bd52b86902b32c
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2330160
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Prateek sethi
470fe3a6d4 gpu: nvgpu: unit: update cg unit test
CG unit tests check for invalid registers access during configuration
of various CG modes for various units that involve multiple registers
accesses. Since ECC detect is now being done in hal init now,
corresponding registers need to be added to io space.

Bug 2919887

Change-Id: I8ded6a95952d810d9c8627a71752266e493e2c47
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2332262
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
2020-12-15 14:13:28 -06:00
Thomas Fleury
76295a5aeb gpu: nvgpu: redundant dependency on driver
Makefile.units.common.tmk already specifies dependency
on nvgpu driver interface.

Remove redundant dependency in units makefiles.

Jira NVGPU-5217

Change-Id: I94cbe707c25f41f0e61915c243fd55fd4bda9ccf
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2322205
(cherry picked from commit d9bdd8f589c121802c74da53945baa677578f71c)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325907
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Vedashree Vidwans
c6908922e5 gpu: nvgpu: move generic preempt hals to common
- Move fifo.preempt_runlists_for_rc and fifo.preempt_tsg hals to common
source file as nvgpu_fifo_preempt_runlists_for_rc and
nvgpu_fifo_preempt_tsg.

Jira NVGPU-4881

Change-Id: I31f7973276c075130d8a0ac684c6c99e35be6017
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2323866
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Konsta Hölttä
4f80c6b8a9 gpu: nvgpu: add channel_user_syncpt
Refactor user managed syncpoints out of the channel sync infrastructure
that deals with jobs submitted via the kernel api. The user syncpt only
needs to expose the id and gpu address of the reserved syncpoint. None
of the rest (fences, priv cmdbufs) is needed for that, so it hasn't been
ideal to couple with the user-allocated syncpts.

With user syncpts now provided by channel_user_syncpt, remove the
user_managed flag from the kernel sync api.

This allows moving all the kernel submit sync code to be conditionally
compiled in only when needed, and separates the user sync functionality
in a more clear way from the rest with a minimal API.

[this is squashed with commit 5111caea601a (gpu: nvgpu: guard user
syncpt with nvhost config) from
https://git-master.nvidia.com/r/c/linux-nvgpu/+/2325009]

Jira NVGPU-4548

Change-Id: I99259fc9cbd30bbd478ed86acffcce12768502d3
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2321768
(cherry picked from commit 1095ad353f5f1cf7ca180d0701bc02a607404f5e)
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2319629
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
vinodg
9d577b8d9a gpu: nvgpu: unit: remove gv11b_channel_debug_dump test
dump_channel hal is moved to common code.
Remove the test_gv11b_channel_debug_dump

Jira NVGPU-5109

Signed-off-by: vinodg <vinodg@nvidia.com>
Change-Id: If04314c09d9f7f0c789752a8a003e012a629d9be
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2323553
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00
Terje Bergstrom
caf9ba229f gpu: nvgpu: unit: Remove invalid dereference
test_tsg_format_gen() accessed test_args->timeslice when test_args
has not been initialized.

Change-Id: Ia72c5371c8634b7b1fe54c9aa20693edc47cb762
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2318389
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2020-12-15 14:13:28 -06:00