Add support for cyclestats snapshots in the virtual case
Bug 1700143
JIRA EVLR-278
Change-Id: I376a8804d57324f43eb16452d857a3b7bb0ecc90
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1211547
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
We currently post bpt events (bpt.int and bpt.pause) even
before we process and clear the interrupts and this
could cause races with UMD
Fix this by posting bpt events only after we are done
processing the interrupts
Bug 200209410
Change-Id: Ic3ff7148189fccb796cb6175d6d22ac25a4097fb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1184109
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
add gk20a_aperture_mask() for memory target selection now that buffers
can actually be allocated from vidmem, and use it in all cases that have
a mem_desc available.
Jira DNVGPU-76
Change-Id: I4353cdc6e1e79488f0875581cfaf2a5cfb8c976a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1169306
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
This patch_context map/unmap pair has become a mere wrapper for the more
general gk20a_mem_{begin,end}(). To be consistent about mappings,
require that each patch_write is surrounded by an explicit begin/end
pair, instead of relying on possible inefficient per-write map/unmap.
Remove also the cpu_va check from .._write_end() since the buffers may
be exist in vidmem without a cpu mapping.
JIRA DNVGPU-24
Change-Id: Ia05d52d3d712f2d63730eedc078845fde3e217c1
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1157298
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
This CL covers the following modification,
1) Added multiple engine_info support
2) Added multiple runlist_info support
3) Initial changes for ASYNC CE support
4) Added ASYNC CE interrupt handling support
for gm206 GPU family
5) Added generic mechanism to identify the
CE engine pri_base address for gm206
(CE0, CE1 and CE2)
6) Removed hard coded engine_id logic and
made generic way
7) Code cleanup for readability
JIRA DNVGPU-26
Change-Id: I2c3846c40bcc8d10c2dfb225caa4105fc9123b65
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1155963
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Program CWD TPC and SM registers correctly. The old code did not work
when there are more than 4 TPCs.
Refactor init_fs_mask to reduce code duplication.
Change-Id: Id93c1f8df24f1b7ee60314c3204e288b91951a88
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1143697
GVS: Gerrit_Virtual_Submit
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Add below flag fields to gpu characteristics to indicate
supported and default preemption modes on platform for
graphics and compute
__u32 graphics_preemption_mode_flags;
__u32 compute_preemption_mode_flags;
__u32 default_graphics_preempt_mode;
__u32 default_compute_preempt_mode;
Add struct nvgpu_preemption_modes_rec to struct gr_gk20a
to store these values locally
Use platform specific get_preemption_mode_flags() to
get the flags and define gk20a/gm20b specific
get_preemption_mode_flags() API
Bug 1646259
Change-Id: I80193c0d988dc93bd96585f9aa631fd817f4dfa3
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1133595
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add separate IOCTL NVGPU_IOCTL_CHANNEL_SET_PREEMPTION_MODE
to allow setting preemption modes from UMD
Define preemption modes in nvgpu.h and use them everywhere
Remove mode definitions from mm_gk20a.h
Also, we support setting only one preemption mode in a channel
But it is possible to have multiple preemption modes (one from
graphics and one from compute) set simultaneously
Hence, update struct gr_ctx_desc to include two separate
preemption modes (graphics_preempt_mode and compute_preempt_mode)
API NVGPU_IOCTL_CHANNEL_SET_PREEMPTION_MODE also supports
setting two separate preemption modes i.e. one for graphics
and one for compute
Make necessary changes in code to support two preemption
modes
Bug 1646259
Change-Id: Ia1dea19e609ba8cc0de2f39ab6c0c4cd6b0a752c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1131805
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Old code maxed at 2 PEs per GPC. Support 3 PEs and add code to make
sure we will get a warning if hardware supports more than that.
JIRA DNVGPU-6
Change-Id: Id6061567bad20474f4b4a7a0959be3426e5e4828
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1142440
In gr_gk20a_elpg_protected_call(), we return with error value
if we fail to disable elpg
But since this is a #define'd function, we end up returning
from function which is using gr_gk20a_elpg_protected_call()
So in some cases it is possible that parent function does
not free up resources due to return statement in
gr_gk20a_elpg_protected_call()
Fix this by removing return statement, and execute rest
of the code if there is no error
Coverity id : 31980
Bug 200192125
Change-Id: Ic003b160b76820cdf9355f44658c23bfb2f3815f
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1133404
GVS: Gerrit_Virtual_Submit
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
On gk20a when PMU is updating ZBC colors it is reading them from L2.
But L2 has one port, and ZBC reads can race with other transactions.
Idle graphics before sending PMU the ZBC_UPDATE request.
Also makes pmu_save_zbc a HAL, because PMU ucode has changes to bypass
this problem on some chips.
Bug 1746047
Change-Id: Id8fcd6850af7ef1d8f0a6aafa0fe6b4f88b5f2d9
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1129017
Add support to store error state of single SM before
preprocessing SM exception
Error state is stored as :
struct nvgpu_dbg_gpu_sm_error_state_record {
u32 hww_global_esr;
u32 hww_warp_esr;
u64 hww_warp_esr_pc;
u32 hww_global_esr_report_mask;
u32 hww_warp_esr_report_mask;
}
Note that we can safely append new fields to above
structure in the future if required
Also, add IOCTL NVGPU_DBG_GPU_IOCTL_READ_SINGLE_SM_ERROR_STATE
to support reading SM's error state by user space
Bug 200156699
Change-Id: I9a62cb01e8a35c720b52d5d202986347706c7308
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1120329
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add TEX exception handling support. Also make SM exception handler into
a function pointer, which should allow different chips to implement
their own SM exception handling routine.
Bug 1635727
Bug 1637486
Change-Id: I429905726c1840c11e83780843d82729495dc6a5
Signed-off-by: Adeel Raza <araza@nvidia.com>
Reviewed-on: http://git-master/r/935329
global_esr and warp_esr are edge-triggered
and are cleared in kernel isr so skip checking
them when wait_for_pause is called from UMD via
ioctl.
Bug 1619430
Change-Id: I2ae54f23ba5c8bfaab35a476f88ccca0bbb10202
Signed-off-by: Ashutosh Jain <ashutoshj@nvidia.com>
Reviewed-on: http://git-master/r/935808
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Cory Perry <cperry@nvidia.com>
Tested-by: Cory Perry <cperry@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Restore comptags to be bitmap-allocated, like they were before we had
the buddy allocator.
The new buddy allocator introduced by
e99aa2485f8992eabe3556f3ebcb57bdc8ad91ff (originally
6ab2e0c49cb79ca68d2f83f1d4610783d2eaa79b) is fine for the big VAs, but
unsuitable for the small compbit store.
This commit reverts partially the combination of the above commit and
also one after it, 86fc7ec9a05999bea8de320840b962db3ee11410, that fixed
a bug which is not present when using a bitmap. With a bitmap allocator,
pruning the extra allocation necessary for user-mapped mode is possible,
so that is also restored.
The original generic bitmap allocator is not restored; instead, a
comptag-only allocator is introduced.
Bug 200145635
Change-Id: I87f3a911826a801124cfd21e44857dfab1c3f378
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/837180
(cherry picked from commit 5a504aeb54f3e89e6561932971158a397157b3f2)
Reviewed-on: http://git-master/r/839742
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add below API pointer to support masking of hww_warp_esr
after hardware read of register and before using it
further
u32 (*mask_hww_warp_esr)(u32 hww_warp_esr)
If needed, this API will mask value of hww_warp_esr
appropriately and return it
Bug 200156699
Change-Id: I1afb1347e650fab607009c1ee55691484653a4c1
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/927133
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add new API gr_gk20a_get_ctx_id() to get/extract
context id from GR context
Bug 200156699
Change-Id: If0e8887a9a6b139cd795bf03f5def64fd664d12b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/927130
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add below APIs to suspend or resume single SM :
gk20a_suspend_single_sm()
gk20a_resume_single_sm()
Also, update gk20a_suspend_all_sms() to make it
more generic by passing global_esr_mask and
check_errors flag as parameter
Bug 200156699
Change-Id: If40f4bcae74a8132673b4dca10b7d9898f23c164
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/925884
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Support preprocessing of SM exceptions if API
pointer pre_process_sm_exception() is defined
Also, expose some common APIs
Bug 200156699
Change-Id: I1303642c1c4403c520b62efb6fd83e95eaeb519b
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/925883
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
- pmu version update to sync with CL-19816709
- GPCCS version update to sync with CL-19816709
Change-Id: Ia60bb538ddba35c973183ca2d4d3a7a0013b4b59
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: http://git-master/r/779628
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
That is a kernel supporting code for cyclestats mode E.
Cyclestats mode E implemented following Windows-design in user-space
and required the following operations to be implemented:
- attach a client for shared hardware buffer of device
- detach client from shared hardware buffer
- flush means copy of available data from hardware buffer to private
client buffers according to perfmon IDs assigned for clients
- perfmon IDs management for user-space clients
- a NVGPU_GPU_FLAGS_SUPPORT_CYCLE_STATS_SNAPSHOT capability added
Bug 1573150
Change-Id: I9e09f0fbb2be5a95c47e6d80a2e23fa839b46f9a
Signed-off-by: Leonid Moiseichuk <lmoiseichuk@nvidia.com>
Reviewed-on: http://git-master/r/740653
(cherry picked from commit 79fe89fd4cea39d8ab9dbef0558cd806ddfda87f)
Reviewed-on: http://git-master/r/753274
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Allow querying and setting default betacb size via debugfs. For global buffers
the value takes effect upon first boot of GPU, and has no effect after that.
Bug 1628352
Change-Id: Ib63f4299249c41eab1b36cc501b525cc54211195
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/733328
Update extended buffer definition for Maxwell. On GM20B only PERF_CONTROL0 and
PERF_CONTROL5 registers are restored in extended buffer. They are needed for
stopping the counters as late as possible during ctx save and start them as
early as possible during context restore. On Maxwell, these registers contain
the enable/disable bit.
Bug 200086767
Change-Id: I59125a2f04bd0975be8a1ccecf993c9370f20337
Signed-off-by: Sandarbh Jain <sanjain@nvidia.com>
Reviewed-on: http://git-master/r/717421
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
GR status disable mask was never set, so driver always disabled all
engines from status rollup.
Change-Id: I500a127be9253294f73d1f42ce89b886471a9117
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/719141
The delay value used in gr usleep_range calls is
too high. We can start at a much lower value.
Change-Id: Id141df70b8892bc1ed1b49623c4aa125d541a636
Signed-off-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/715928
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Introduce mem_desc, which holds all information needed for a buffer.
Implement helper functions for allocation and freeing that use this
data type.
Change-Id: I82c88595d058d4fb8c5c5fbf19d13269e48e422f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/712699
New members are added in nvgpu_gpu_characterstics to export more
information required specially from CUDA tools.
Change-Id: I907f3bcbd272405a13f47ef6236bc2cff01c6c80
Signed-off-by: Sujeet Baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/679202
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
The current CUDA drivers have been using the regops to
directly accessing the GPU registers from user space through
the dbg node. This is a security hole and needs to be avoided.
The patch alternatively implements the similar functionality
in the kernel and provide an ioctl for it.
Bug 200083334
Change-Id: Ic5ff5a215cbabe7a46837bc4e15efcceb0df0367
Signed-off-by: sujeet baranwal <sbaranwal@nvidia.com>
Reviewed-on: http://git-master/r/711758
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Allow enabling of PC sampling hardware workaround. It is only
applicable to gm20b.
Bug 1517458
Bug 1573150
Change-Id: Iad6a3ae556489fb7ab9628637d291849d2cd98ea
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/710421
bug 200078367
using udelay for fecs status polling
during GR init phase brings down fecs
transaction time to < 20usec from few
hundred usec.
Change-Id: I61a27daaf1187ac086a42779b46aa3fbee3b37f2
Signed-off-by: Vijayakumar <vsubbu@nvidia.com>
Reviewed-on: http://git-master/r/691918
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Compression page size varies depending on architecture. Make it
129kB on gk20a and gm20b.
Also export some common functions from gm20b.
Bug 1592495
Change-Id: Ifb1c5b15d25fa961dab097021080055fc385fecd
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/673790
CTA preemption needs to be enabled by setting a value in context. Set
it for gm20b.
Bug 200063473
Bug 1517461
Change-Id: I080cd71b348d08f834fd23ebbe7443dba79224db
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/661299
Make pagepool size query into a function instead of storing the value
during boot time in a structure. This simplifies the structure and
users of pagepool size do not need to worry about whether it has
already been set.
Change-Id: Iba16e840cdf9b6c39449730237aa7d8fdff47848
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/660907
Turn add ZBC functions into HALs that can be filled per chip.
Bug 1567274
Change-Id: Ic6ef29d3353d4a0079ea0c80f513ffd579fe554f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/601109
Reviewed-by: Automatic_Commit_Validation_User