Commit Graph

317 Commits

Author SHA1 Message Date
Seema Khowala
c9d4df288d gpu: nvgpu: remove code for ch not bound to tsg
- Remove handling for channels that are no more bound to tsg
  as channel could be referenceable but no more part of a tsg
- Use tsg_gk20a_from_ch to get pointer to tsg for a given channel
- Clear unhandled gr interrupts

Bug 2429295
JIRA NVGPU-1580

Change-Id: I9da43a2bc9a0282c793b9f301eaf8e8604f91d70
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1972492
(cherry picked from commit 013ca60edd
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2018262
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Debarshi Dutta <ddutta@nvidia.com>
Tested-by: Debarshi Dutta <ddutta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-22 18:59:18 -08:00
Seema Khowala
220860d043 gpu: nvgpu: rename has_timedout and make it thread safe
Currently has_timedout variable is protected by wmb at places
where it is being set and there is no correspoding rmb whenever
has_timedout variable is read. This is prone to errors for
concurrent execution. This change is supposed to fix this issue.
Rename has_timedout variable of channel struct to ch_timedout.
Also to avoid rmb every time ch_timedout is read,
ch_timedout_spinlock is added to protect ch_timedout
variable for taking care of concurrent execution.

Bug 2404865
Bug 2092051

Change-Id: I0bee9f50af0a48720aa8b54cbc3af97ef9f6df00
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1930935
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 1f54ea09e3
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2016975
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-13 13:19:37 -08:00
Debarshi Dutta
18643ac135 gpu: nvgpu: replace input param chid with pointer to channel
preempt_channel needs to use the channel to pass it to other
public functions, get access to a tsg etc. This qualifies it to take a
pointer to a channel as an input parameter instead of a chid.

Increment the channel ref counter using the function
gk20a_channel_from_id in functions where we get the chid from the h/w
registers directly. Once the prempt_channel function call is done,
use a gk20a_channel_put on the referenced channel.

Jira NVGPU-1461

Change-Id: I6c87c8104cfcb418d468c8c590087fd4aeabf4bd
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1963200
(cherry picked from commit 9abe9fe062
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2013728
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:47 -08:00
Debarshi Dutta
a8f0cb89f4 gpu: nvgpu: replace input param chid with pointer to channel
gk20a_fifo_recover_channel takes a reference to the channel via its
chid before passing the channel pointer to other public functions such
as gk20a_channel_abort and gk20a_fifo_error_ch. This qualifies the
gk20a_fifo_recover_channel to take a pointer to a channel instead of
only chid.

Jira NVGPU-1461

Change-Id: I338a12a05e5ccee785a202fea7848db5201a3a39
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1963199
(cherry picked from commit 99acb8011a
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2013727
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:44 -08:00
Debarshi Dutta
d9efcd5871 gpu: nvgpu: replace input parameter tsgid with pointer to struct tsg_gk20a
The function gk20a_fifo_recover_tsg has to pass a valid struct tsg to
other functions from within. This qualifies it to have a pointer to
struct tsg_gk20a as an input parameter.

Tsg specific parts of the gk20a_fifo_preempt_timeout_rc are now moved
into another function gk20a_fifo_preempt_timeout_rc_tsg
that takes a tsg as an input and passes it to gk20a_fifo_recover_tsg.
The pointer to a tsg is also used to enumerate channels from within.

The function gk20a_fifo_preempt_timeout_rc now contains only channel
specific code.

Jira NVGPU-1461

Change-Id: Ice0a9921567841fb5586a7e4e010c442ca6cf172
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1961675
(cherry picked from commit e19cea7ab3
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2013726
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:40 -08:00
Debarshi Dutta
ef9de9e992 gpu: nvgpu: replace input parameter tsgid with pointer to struct tsg_gk20a
gv11b_fifo_preempt_tsg needs to access the runlist_id of the tsg as
well as pass the tsg pointer to other public functions such as
gk20a_fifo_disable_tsg_sched. This qualifies the preempt_tsg to use a
pointer to a struct tsg_gk20a instead of just using the tsgid.

Jira NVGPU-1461

Change-Id: I01fbd2370b5746c2a597a0351e0301b0f7d25175
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1959068
(cherry picked from commit 1e78d47f15
in rel-32)
Reviewed-on: https://git-master.nvidia.com/r/2013725
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:36 -08:00
Debarshi Dutta
5b8ecbc51f gpu: nvgpu: replace tsgid input variable with pointer to a struct tsg_gk20a
replace tsgid with a pointer to a struct tsg_gk20a in the function
gk20a_fifo_tsg_abort(). gk20a_fifo_tsg_abort needs to enumerate through
all the channels within the tsg as well as pass the tsg pointer to
other functions, qualifying the need to use a pointer instead as an
input parameter.

Jira NVGPU-1461

Change-Id: I59cec05d5d778f733d0c3e9ffadf46e74e249080
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1956567
(cherry picked from commit e5bebd880f
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2013724
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-11 08:18:33 -08:00
Konsta Holtta
7e8ba851a8 gpu: nvgpu: delete raw chid lookup
This (dangerous) array lookup with no channel references is now unused.

Jira NVGPU-1460

Change-Id: Ic6bdbcf19fc8996bc6ff02a40afe3224bdd5bc27
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1955402
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 4a53854a92 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008517
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 09:04:33 -08:00
Konsta Holtta
3794afbeb1 gpu: nvgpu: add safe channel id lookup
Add gk20a_channel_from_id() to retrieve a channel, given a raw channel
ID, with a reference taken (or NULL if the channel was dead). This makes
it harder to mistakenly use a channel that's dead and thus uncovers bugs
sooner. Convert code to use the new lookup when applicable; work remains
to convert complex uses where a ref should have been taken but hasn't.

The channel ID is also validated against FIFO_INVAL_CHANNEL_ID; NULL is
returned for such IDs. This is often useful and does not hurt when
unnecessary.

However, this does not prevent the case where a channel would be closed
and reopened again when someone would hold a stale channel number. In
all such conditions the caller should hold a reference already.

The only conditions where a channel can be safely looked up by an id and
used without taking a ref are when initializing or deinitializing the
list of channels.

Jira NVGPU-1460

Change-Id: I0a30968d17c1e0784d315a676bbe69c03a73481c
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1955400
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 7df3d58750
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008515
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 09:04:20 -08:00
Philip Elcan
ed6e396090 gpu: nvgpu: channel: make chid u32
The chid member of the channel_gk20a struct was being used as a unsigned
value. By being declared as an int, it was causing MISRA 10.3 violations
for implicit assignment of different types.

JIRA NVGPU-647

Change-Id: I7477fad6f0c837cf7ede1dba803158b1dda717af
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1918470
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 1c7bb9b538 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008514
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 09:04:11 -08:00
Philip Elcan
bace52ac7a gpu: nvgpu: make tsgid a consistent type
Different units were declaring tsgid as int or u32. This makes everyone
use u32. This change resolves MISRA 10.3 violations for implicit
assingment to different types.

JIRA NVGPU-647

Change-Id: I78660e737acb0dad76dd538e5dd37f4527cf5acd
Signed-off-by: Philip Elcan <pelcan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1918469
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit f5cac144a0 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/2008513
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 09:04:02 -08:00
Konsta Holtta
aa84e8a986 gpu: nvgpu: fix double handling in timeout
The context switch timeout works by triggering a hardware timeout at 10
Hz. When handling these, we check whether a channel has actually timed
out. Currently the timeout limit can be shorter than the 10 Hz interval
which always causes us to recover a channel but would also cause
detection of progress if there was any in the interval.

Handling both situations at the same time would reuse the channel
pointer local to the function after a loop has finished and would cause
memory corruption. Fix this by making the two branches mutually
exclusive, and move the recover case to happen first because that's how
our tests assume things to work.

Jira NVGPU-967
Bug 2502074

Change-Id: I26aa0fa7fd80ab42a9a1a93a6cca2cd29c9d3f3f
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1932449
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit 8ac9a53d816a3d012a6948a9a96ac6db699c662di
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/1997597
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Tested-by: Bibek Basu <bbasu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2019-02-05 01:53:04 -08:00
Seema Khowala
0d110b7522 gpu: nvgpu: clear all handled fifo interrupts
Issue is that local variable clear_intr is reset if fifo intr
handler happens to handle interrupts handled by fifo_error_isr.
This fix is to take care of clearing all handled fifo interrupts.

Bug 2361571

Change-Id: Ic8fe2294cfb25c58925942750a81c104ec9747de
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1960330
(cherry picked from commit 1195239d1c
in dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/1979744
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bitan Biswas <bbiswas@nvidia.com>
Tested-by: Bitan Biswas <bbiswas@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-12-28 23:51:32 -08:00
Terje Bergstrom
e3ae03e17a gpu: nvgpu: Add MC APIs for reset masks
Add API for querying reset mask corresponding to a unit. The reset
masks need to be read from MC HW header, and we do not want all
units to access Mc HW headers themselves.

JIRA NVGPU-954

Change-Id: I49ebbd891569de634bfc71afcecc8cd2358805c0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1823384
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-09-27 15:05:25 -07:00
Konsta Holtta
34d552957d gpu: nvgpu: move channel header to common
channel_gk20a is clear from chip specifics and from most dependencies,
so move it under the common directory.

Jira NVGPU-967

Change-Id: I41f2160b96d4ec84064288ecc22bb360e82352df
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1810578
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-09-05 20:40:32 -07:00
Nicolas Benech
2eface802a gpu: nvgpu: Fix mutex MISRA 17.7 violations
MISRA Rule-17.7 requires the return value of all functions to be used.
Fix is either to use the return value or change the function to return
void. This patch contains fix for calls to nvgpu_mutex_init and
improves related error handling.

JIRA NVGPU-677

Change-Id: I609fa138520cc7ccfdd5aa0e7fd28c8ca0b3a21c
Signed-off-by: Nicolas Benech <nbenech@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1805598
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-09-05 20:39:08 -07:00
Srirangan
43851d41b1 gpu: nvgpu: gk20a: Fix MISRA 15.6 violations
MISRA Rule-15.6 requires that all if-else blocks be enclosed in braces,
including single statement blocks. Fix errors due to single statement
if blocks without braces by introducing the braces.

JIRA NVGPU-671

Change-Id: Iedac7d50aa2ebd409434eea5fda902b16d9c6fea
Signed-off-by: Srirangan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1797695
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-09-05 04:51:32 -07:00
Scott Long
a18f364fd2 gpu: nvgpu: fix various MISRA 10.1 bool violations
This patch corrects a handful of MISRA 10.1 violations involving
illegal arithmetic operations (e.g. bitwise OR) on boolean values:

 * fix to status handling in regops validation code
 * fix to debugger event handling in gr code
 * fix to entries_left tracking in runlist construct code
 * fix to verbose channel dumping and reset tracking in fifo code

JIRA NVGPU-650

Change-Id: I3c3d9123b5a0e08fc936d0e63d51de99fc310ade
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1810957
Reviewed-by: svc-misra-checker <svc-misra-checker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-09-04 10:54:24 -07:00
Scott Long
07f6739285 gpu: nvgpu: switch gk20a nonstall ops to #defines
Fix MISRA rule 10.1 violations involving gk20a_nonstall_ops
enums by replacing them with with corresponding #defines.

Because these values can be used in expressions that require
unsigned values (e.g. bitwise OR) we cannot use enums.

The g->ce2.isr_nonstall() function was previously returning an
int that was a combination of gk20a_nonstall_ops enum bits which
led to the violations.

JIRA NVGPU-650

Change-Id: I6210aacec8829b3c8d339c5fe3db2f3069c67406
Signed-off-by: Scott Long <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1796242
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-08-22 17:31:42 -07:00
Amulya
da43fc5560 gpu: nvgpu: MISRA 10.3-Conversions to/from an enum
Fix violations where the conversion is from a non-enum type to enum
type or vice-versa.

JIRA NVGPU-659

Change-Id: I45f43c907b810cc86b2a4480809d0c6757ed3486
Signed-off-by: Amulya <Amurthyreddy@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1802322
GVS: Gerrit_Virtual_Submit
Tested-by: Amulya Murthyreddy <amurthyreddy@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Adeel Raza <araza@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-08-21 14:54:51 -07:00
Vinod G
c9f8f1ea05 gpu: nvgpu: remove utils.h from gk20a.h
Removed the utils.h include from gk20a.h
utils.h is included in those files which
make use of the macros in utils.h

JIRA NVGPU-1005

Change-Id: Ifb41da58db6ff8682fa6b5dfdd8eda11a751fcac
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1785952
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-08-10 18:11:26 -07:00
Srirangan
17aeea4a2f gpu: nvgpu: gk20a: Fix MISRA 15.6 violations
This fixes errors due to single statement loop bodies
without braces, which is part of Rule 15.6 of MISRA.
This patch covers in gpu/nvgpu/gk20a/

JIRA NVGPU-989

Change-Id: I2f422e9bc2b03229f4d2c3198613169ce5e7f3ee
Signed-off-by: Srirangan <smadhavan@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1791019
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-08-06 17:36:39 -07:00
Debarshi Dutta
82a90170d3 gk20a: nvgpu: Remove io.h dependency from gk20a.h
In the current code, gk20a.h includes io.h which gets directly included
in a lot of other files. io.h contains methods which uses a struct
gk20a as a parameter leading to a circular dependency between io.h
and gk20a.h. This can be mitigated by removing io.h from gk20a.h as
part of larger effort to moving gk20a.h to nvgpu/gk20a.h

JIRA NVGPU-597

Change-Id: I93e504fa9371b88152737b342a75580c65e8f712
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1787316
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-07-30 11:24:06 -07:00
Seema Khowala
4cbec6b2c7 gpu: nvgpu: set preempt timeout
-For Si platforms, gk20a_get_gr_idle_timeout returns
 3000 ms i.e. 3 sec. Currently this time is used for
 preempt polling and this conflicts with channel
 timeout if polling times out. Use fifo_eng_timeout_us converted
 to ms for preempt polling.
-In case of preempt timeout, do not issue recovery
 for si platform. ctxsw timeout will trigger recovery
 if needed. For non si platforms, issue preempt timeout rc
 if preempt times out.

Bug 2113657
Bug 2064553
Bug 2038366
Bug 2028993
Bug 200426402

Change-Id: I8d9f58be9ac634e94defa92a20fb737bf256d841
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1762076
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-07-30 00:21:04 -07:00
Seema Khowala
5d2058791f gpu: nvgpu: acquire/release runlist_lock during teardown/mmu_fault
-Recovery can be called for various types of faults. Acquire
 runlist_lock for all runlists so that current teardown is done
 before proceeding to next one.
-For legacy chips teardown is done by triggering mmu fault so
 make sure runlist_locks are acquired during teardown and also
 during handling mmu fault.
-gk20a_fifo_handle_mmu_fault is renamed as
 gk20a_fifo_handle_mmu_fault_locked
-gk20a_fifo_handle_mmu_fault called from gk20a_fifo_teardown_ch_tsg
 is replaced with gk20a_fifo_handle_mmu_fault_locked
-gk20a_fifo_handle_mmu_fault acquires/release runlist_lock for all
 runlists and calls gk20a_fifo_handle_mmu_fault_locked

Bug 2113657
Bug 2064553
Bug 2038366
Bug 2028993

Change-Id: I973d7ddb6924b50bae2d095152867e99c87e780a
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1761197
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-07-30 00:21:00 -07:00
Vinod G
509139b8a0 gpu: nvgpu: Rearrange the static inline code
In order to avoid the circular dependencies,
rearrange the static inline functions from
gk20a.h file.

Moved gk20a_gr_flush_channel_tlb function to
gr_gk20a.c and removed the #include gr_gk20a.h
from gk20a.h

Added a helper function utils.h to
move all generic static inline functions which
have no reference to gpu related structures.

ptimer related functions are moved to
ptimer.h

Implementations for as and pmu are moved to
corresponding files.

JIRA NVGPU-624

Change-Id: I4e956326e773ba037bf3a1696cc4c462085dbbe5
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1781941
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-07-24 16:11:07 -07:00
Seema Khowala
b1d0d8ece8 Revert "Revert: GV11B runlist preemption patches"
This reverts commit 0b02c8589d.

Originally change was reverted as it was making ap_compute test on
embedded-qnx-hv e3550-t194 fail. With fixes related to replacing tsg
preempt with runlist preempt during teardown, preempt timeout set to
100 ms (earlier this was set to 1000ms for t194 and 3000ms for legacy
chips) and not issuing preempt timeout recovery if preempt fails, helped
resolve the issue.

Bug 200426402

Change-Id: If9a68d028a155075444cc1bdf411057e3388d48e
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1762563
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-07-19 13:54:26 -07:00
Nitin Kumbhar
7c494c83cc gpu: nvgpu: add error check for init_runlist
Allocations in init_runlist can fail. Check for such
a failure during fifo setup is being done.

Bug 1987855

Change-Id: I1771a15ebeac81ab2e3ebc9a75363445a0b6f20d
Signed-off-by: Nitin Kumbhar <nkumbhar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1770801
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-07-06 07:35:47 -07:00
Alex Waterman
0b02c8589d Revert: GV11B runlist preemption patches
This reverts commit 2d397e34a5.
This reverts commit cd6e821cf6.
This reverts commit 5cf1eb145f.
This reverts commit a8d6f31bde.
This reverts commit 067ddbc4e4.
This reverts commit 3eede64de0.
This reverts commit 1407133b7e.
This reverts commit 797dde3e32.

Looks like this makes the ap_compute test on embedded-qnx-hv
e3550-t194 quite bad. Might also affect ap_resmgr.

Signed-off-by: Alex Waterman <alexw@nvidia.com>
Change-Id: Ib9f06514d554d1a67993f0f2bd3d180147385e0a
Reviewed-on: https://git-master.nvidia.com/r/1761864
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
2018-06-26 14:43:08 -07:00
Seema Khowala
cd6e821cf6 gpu: nvgpu: gv11b: add runlist abort & remove bare channel
-Add support for aborting runlist/s. Aborting runlist/s,
 will abort all active tsgs and associated active channels
 within these active tsgs
-Bare channels are no longer supported. Remove recovery
 support for bare channels. In case there are bare
 channels, recovery will trigger runlist abort

Bug 2125776
Bug 2108544
Bug 2105322
Bug 2092051
Bug 2048824
Bug 2043838
Bug 2039587
Bug 2028993
Bug 2029245
Bug 2065990
Bug 1945121
Bug 200401707
Bug 200393631
Bug 200327596

Change-Id: I6bec8a0004508cf65ea128bf641a26bf4c2f236d
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1640567
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-06-24 09:53:44 -07:00
Seema Khowala
067ddbc4e4 gpu: nvgpu: remove timeout_rc_type i/p param
-is_preempt_pending hal does not need timeout_rc_type input param as
 for volta, reset_eng_bitmask is saved if preempt times out. For
 legacy chips, recovery triggers mmu fault and mmu fault handler
 takes care of resetting engines.
-For volta, no special input param needed to differentiate between
 preempt polling during normal scenario and preempt polling during
 recovery. Recovery path uses preempt_ch_tsg hal to issue preempt.
 This hal does not issue recovery if preempt times out.

Bug 2125776
Bug 2108544
Bug 2105322
Bug 2092051
Bug 2048824
Bug 2043838
Bug 2039587
Bug 2028993
Bug 2029245
Bug 2065990
Bug 1945121
Bug 200401707
Bug 200393631
Bug 200327596

Change-Id: Ie76a18ae0be880cfbeee615859a08179fb974fa8
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1709799
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-06-24 09:53:33 -07:00
Vinod G
0aa8d6e273 gpu: nvgpu: Mask an unused HCE_ILLEGAL_OP Interrupt
HCE interrupt is not being used in nvgpu platform now,
masking the bit from the interrupt register.

bug 2082123

Change-Id: I1d53584afebe57b9621c8f4ec395cd1dcd6c7611
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1746850
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-06-14 06:44:08 -07:00
Thomas Fleury
943e3158bc gpu: nvgpu: add g->fifo_eng_timeout_us
Add g->fifo_eng_timeout_us to define engine timeout in microseconds.
It is initialized with GRFIFO_TIMEOUT_CHECK_PERIOD_US. In RM server
case, it can be overriden with value defined in device tree.

Jira EVLR-2674

Change-Id: I69ac2ce779fe575566c8ba48e8cd2d0e6b2d93cf
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1728391
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-06-14 06:44:06 -07:00
Seema Khowala
25e727d997 gpu: nvgpu: release runlist_lock before issuing recovery
Release runlist_lock before issuing runlist update timeout
recovery.

Bug 2115080

Change-Id: I22cd0dd8ab6828412fcc98f587e4a5cdce907651
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1722308
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-18 19:55:10 -07:00
Seema Khowala
982fcfa737 gpu: nvgpu: Add timeouts_disabled_refcount for enabling timeout
-timeouts will be enabled only when timeouts_disabled_refcount
 will reach 0
-timeouts_enabled debugfs will change from u32 type to file type
 to avoid race enabling/disabling timeout from debugfs and ioctl
-unify setting timeouts_enabled from debugfs and ioctl

Bug 1982434

Change-Id: I54bab778f1ae533872146dfb8d80deafd2a685c7
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1588690
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-18 19:54:33 -07:00
Vinod G
ac687c95d3 gpu: nvgpu: Code updates for MISRA violations
Code related to MC module is updated for handling
MISRA violations

Rule 10.1: Operands shalln't be an inappropriate
essential type.
Rule 10.3: Value of expression shalln't be assigned
to an object with a narrow essential type.
Rule 10.4: Both operands in an operator shall have
the same essential type.
Rule 14.4: Controlling if statement shall have
essentially Boolean type.
Rule 15.6: Enclose if() sequences with braces.

JIRA NVGPU-646
JIRA NVGPU-659
JIRA NVGPU-671

Change-Id: Ia7ada40068eab5c164b8bad99bf8103b37a2fbc9
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1720926
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-18 14:53:58 -07:00
David Li
a807cf2041 gpu: nvgpu: add NVGPU_IOCTL_CHANNEL_RESCHEDULE_RUNLIST
Add NVGPU_IOCTL_CHANNEL_RESCHEDULE_RUNLIST ioctl to reschedule runlist,
and optionally check host and FECS status to preempt pending load of
context not belonging to the calling channel on GR engine during context
switch.
This should be called immediately after a submit to decrease worst case
submit to start latency for high interleave channel.
There is less than 0.002% chance that the ioctl blocks up to couple
miliseconds due to race condition of FECS status changing while being read.
For GV11B it will always preempt pending load of unwanted context since
there is no chance that ioctl blocks due to race condition.
Also fix bug with host reschedule for multiple runlists which needs to
write both runlist registers.

Bug 1987640
Bug 1924808
Change-Id: I0b7e2f91bd18b0b20928e5a3311b9426b1bf1848
Signed-off-by: David Li <davli@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1549050
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-17 23:34:20 -07:00
Seema Khowala
4654d9abd1 gpu: nvgpu: runlist_lock released before preempt timeout recovery
Release runlist_lock and then initiate recovery if preempt
timed out. Also do not issue preempt if ch, tsg or runlist
id is invalid. tsgid could be invalid for below call trace
gk20a_prepare_poweroff->gk20a_channel_suspend->
*_fifo_preempt_channel->*_fifo_preempt_tsg

Bug 2065990
Bug 2043838

Change-Id: Ia1e3c134f06743e1258254a4a6f7256831706185
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1662656
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-17 09:33:04 -07:00
Deepak Nibade
0301cc01f6 gpu: nvgpu: add HAL to insert semaphore commands
Add below new HALs
gops.fifo.add_sema_cmd() to insert HOST semaphore acquire/release methods
gops.fifo.get_sema_wait_cmd_size() to get size of acquire command buffer
gops.fifo.get_sema_incr_cmd_size() to get size of release command buffer

Separate out new API gk20a_fifo_add_sema_cmd() to implement semaphore acquire/
release sequence and set it to gops.fifo.add_sema_cmd()

Add gk20a_fifo_get_sema_wait_cmd_size() and gk20a_fifo_get_sema_incr_cmd_size()
to return respective command buffer sizes

Jira NVGPUT-16

Change-Id: Ia81a50921a6a56ebc237f2f90b137268aaa2d749
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1704490
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-16 03:10:37 -07:00
Deepak Nibade
c5db005b73 gpu: nvgpu: remove access to mc_enable_pb_r()
We don't need to configure mc_enable_pb_r() register in any of the supported
chips, so remove access to this register

Jira NVGPUT-52

Change-Id: I8a7a524367ce7953f926143242c6d63bc8fd5ed1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1711245
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-10 10:04:44 -07:00
Terje Bergstrom
dd739fcb03 gpu: nvgpu: Remove gk20a_dbg* functions
Switch all logging to nvgpu_log*(). gk20a_dbg* macros are
intentionally left there because of use from other repositories.

Because the new functions do not work without a pointer to struct
gk20a, and piping it just for logging is excessive, some log messages
are deleted.

Change-Id: I00e22e75fe4596a330bb0282ab4774b3639ee31e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1704148
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-09 18:26:04 -07:00
Vinod G
010439ba08 gpu: nvgpu: add HALs to mmu fault descriptors.
mmu fault information for client and gpc differ
on various chip. Add separate table for each chip
based on that change and add hal functions to access
those descriptors.

bug 2050564

Change-Id: If15a4757762569d60d4ce1a6a47b8c9a93c11cb0
Signed-off-by: Vinod G <vinodg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1704105
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-03 23:57:12 -07:00
Seema Khowala
c9463fdbb3 gpu: nvgpu: add rc_type i/p param to gk20a_fifo_recover
Add below rc_types to be passed to gk20a_fifo_recover
MMU_FAULT
PBDMA_FAULT
GR_FAULT
PREEMPT_TIMEOUT
CTXSW_TIMEOUT
RUNLIST_UPDATE_TIMEOUT
FORCE_RESET
SCHED_ERR
This is nice to have to know what triggered recovery.

Bug 2065990

Change-Id: I202268c5f237be2180b438e8ba027fce684967b6
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1662619
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-03 21:43:06 -07:00
Seema Khowala
bf03799977 gpu: nvgpu: rename mutex to runlist_lock
Rename mutex to runlist_lock in fifo_runlist_info_gk20a
struct. This is good to have for code readability.

Bug 2065990
Bug 2043838

Change-Id: I716685e3fad538458181d2a9fe592410401862b9
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1662587
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-05-03 21:42:57 -07:00
Deepak Nibade
fc1ebe57f5 gpu: nvgpu: add HALs to submit and wait for runlist
Add below two new HALs
gops.fifo.runlist_hw_submit() to submit a new runlist to hardware
gops.fifo.runlist_wait_pending() to wait until runlist write is successful

Set existing API gk20a_fifo_runlist_wait_pending() to
gops.fifo.runlist_wait_pending HAL

Add new API gk20a_fifo_runlist_hw_submit() which submits the runlist to h/w
and set it to gops.fifo.runlist_hw_submit HAL

Jira NVGPUT-20

Change-Id: Ic23f7d947e30883aca0b536de818e79e14733195
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1700548
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-04-24 11:10:48 -07:00
Sourab Gupta
abd5f68eef gpu: nvgpu: add usermode submission interface HAL
The patch adds the HAL interfaces for handling the usermode
submission, particularly allocating channel specific usermode
userd. These interfaces are currently implemented only on QNX,
and are created accordingly. As and when linux adds the
usermode submission support, we can revisit them
if any further changes are needed.

Change-Id: I790e0ebdfaedcdc5f6bb624652b1af4549b7b062
Signed-off-by: Sourab Gupta <sourabg@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1683392
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-04-05 05:22:58 -07:00
Sourab Gupta
0b2ea2924b gpu: nvgpu: add gops.fifo.setup_sw
bar1/userd setup is different for RM server. created common function
gk20a_init_fifo_setup_sw_common.

Jira VQRM-3058

Change-Id: I655b54e21ed5f15dcb8e7b01bd9cd129b35ae7a3
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1665691
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-03-29 18:54:38 -07:00
Richard Zhao
8d8ff9d34e gpu: nvgpu: add gops.fifo.set_error_notifier
RM Server overrides it for handling stall interrupts.

Jira VQRM-3058

Change-Id: I8b14f073e952d19c808cb693958626b8d8aee8ca
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1679709
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-03-29 18:54:29 -07:00
Richard Zhao
bcab5c1486 gpu: nvgpu: add gops.fifo.check_tsg_ctxsw_timeout/check_ch_ctxsw_timeout
RM Server acts differently for ctxsw timeout check. It won't check
GP_GET or accumulated timeouts, but notify guest and go to recovery.

Jira VQRM-3058

Change-Id: I428aea34dc517311eb7e73feb556145e916309fb
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1679706
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-03-29 18:54:11 -07:00
Richard Zhao
c5f03db98a gpu: nvgpu: add gops.fifo.ch_abort_clean_up
Channel abort clean up is only needed by native and vgpu driver but not
RM server. RM server expects guest will clean up itself. RM server
should not set the callback.

Jira VQRM-3058

Change-Id: I11b49b6f2d51c871e31de16955d487dca82609cb
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1679705
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
2018-03-29 18:54:02 -07:00