While the GPU host is executing a context, there must not be any calls
to FECS that can change the context currently in execution. For some
reason, legacy fmodels are calling the FECS method for golden
context restore while loading the golden context for a new channel.
This call is not required and should not be made. FECS methods such
as bind may be called only the first time, during golden context
creation, where doing so is safe.
Bug 1834201
Change-Id: Ia6178e875e3ac37fb1cf10e27976c26b9a02c56f
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/1284512
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
When the kernel adds patches to a context, it needs to update
the patch count in order for FECS to pick up the new patches.
Previously, patching was done only at context creation
time. Patching is now also used when changing the preemption mode,
but those patches did not take effect because the count was not updated.
Update the patch count every time we end patching of a context.
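
A minimal sketch of the idea, not the driver's actual code. The struct,
the function names and the buffer size are hypothetical; the only point
is that the count visible to FECS is published on every end call rather
than once at context creation.

```c
#include <stdint.h>

#define PATCH_SLOTS 64u /* hypothetical capacity, in (addr, data) pairs */

/* Hypothetical stand-in for a context image and its patch buffer. */
struct patch_ctx {
	uint32_t entries[PATCH_SLOTS * 2u]; /* (addr, data) pairs */
	uint32_t data_count;                /* patches written by software */
	uint32_t header_patch_count;        /* count that FECS would read */
};

static void patch_write_begin(struct patch_ctx *ctx)
{
	(void)ctx; /* a real driver would map the patch buffer here */
}

/* Append one (addr, data) pair; returns -1 when the buffer is full. */
static int patch_write(struct patch_ctx *ctx, uint32_t addr, uint32_t data)
{
	if (ctx->data_count >= PATCH_SLOTS)
		return -1;
	ctx->entries[ctx->data_count * 2u] = addr;
	ctx->entries[ctx->data_count * 2u + 1u] = data;
	ctx->data_count++;
	return 0;
}

/* Publish the count on every end, so later patches also take effect. */
static void patch_write_end(struct patch_ctx *ctx)
{
	ctx->header_patch_count = ctx->data_count;
}
```
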
Bug 1852094
Change-Id: Ic2150741609d1d1956769e439ce1c5f2edcacb84
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1280424
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reorganize the HW headers of gk20a. The headers are moved to a
new directory:
include/nvgpu/hw/gk20a
And from the code are included like so:
#include <nvgpu/hw/gk20a/hw_pwr_gk20a.h>
This is the first step in reorganizing all of the HW headers for
gm20b, gm206, etc. This is part of a larger effort to re-structure
and make the driver more readable and scalable.
Bug 1799159
Change-Id: Ic151155cbc2e6f75009f2d9d597b364a1bed2c4c
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1244790
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move the GPU allocators to common/mm/ since the allocators are common
code across all GPUs. Also rename the allocator code to move away from
gk20a_ prefixed structs and functions.
This caused one issue with the nvgpu_alloc() and nvgpu_free() functions.
There was a function for allocating either with kmalloc() or vmalloc()
depending on the size of the allocation. Those have now been renamed to
nvgpu_kalloc() and nvgpu_kfree().
Bug 1799159
Change-Id: Iddda92c013612bcb209847084ec85b8953002fa5
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1274400
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In calculation of pes_tpc_count, accumulate the number of PEs
with TPCs connected to them instead of using the architectural
maximum number.
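
One way to read the change, as a sketch: count only the PES units that
have at least one TPC attached. The per-PES mask array and the function
name are assumptions made for illustration.

```c
#include <stdint.h>

/* Count the PES units that have at least one TPC connected.
 * pes_tpc_mask[i] is a hypothetical bitmask of TPCs attached to PES i. */
static uint32_t count_active_pes(const uint32_t *pes_tpc_mask, uint32_t max_pes)
{
	uint32_t i;
	uint32_t count = 0;

	for (i = 0; i < max_pes; i++) {
		if (pes_tpc_mask[i] != 0u)
			count++;
	}
	return count;
}
```

With masks {0x3, 0x0, 0x4} and an architectural maximum of 3, the
function returns 2 rather than 3.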
Bug 200250616
Change-Id: I4b2edc420ac03e24f2c298587d4dd1d77c51f5d6
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1262642
(cherry picked from commit 65723cf5be8fe24bcaf56570883f0880a198efcb)
Reviewed-on: http://git-master/r/1263958
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vladislav Buzov <vbuzov@nvidia.com>
gm20b_init_gr does not inherit the ops set by gk20a_init_gr_ops, and the
gr.setup_rop_mapping HAL was not set there, so it was not set for chips
that inherit from gm20b_init_gr and do not override it explicitly.
Set the pointer in gm20b_init_gr, which other chips inherit, and delete
the surrounding if condition from the call, making sure that future
users always call it, because there is an implementation since the
earliest supported chip.
Bug 1833382
Change-Id: I7893c9aac7c5c49ce9a55031ea6baa9382a1b7ca
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1258960
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Fix small problems related to signed versus unsigned comparisons
throughout the driver. Bump up the warning level to prevent such
problems from occurring in the future.
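
An illustration of the class of bug, assuming a platform where int is
32 bits wide. The function names are hypothetical. A flag such as GCC's
-Wsign-compare reports the implicit form; whether that is the flag this
change enables is an assumption.

```c
#include <stdint.h>

/* What `value < limit` does implicitly when value is int and limit is
 * uint32_t: value is converted to unsigned first, so -1 becomes a very
 * large number. The cast only makes that conversion visible. */
static int is_below_mixed(int value, uint32_t limit)
{
	return (uint32_t)value < limit;
}

/* Handle the negative case explicitly before comparing as unsigned. */
static int is_below_fixed(int value, uint32_t limit)
{
	return value < 0 || (uint32_t)value < limit;
}
```

For value -1 and limit 10, the mixed comparison reports "not below"
while the fixed one reports "below".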
Change-Id: I8ff5efb419f664e8a2aedadd6515ae4d18502ae0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1252068
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
In calls to gk20a_fifo_recover() we pass a bitfield of engines to
recover. We generate the bitfield by acquiring engine id from FIFO,
and using BIT(). If the GR engine is not known, the resulting engine ID
is a u32 with all bits set, which cannot be passed to BIT().
gk20a_fifo_recover() can already deal with all bits set, so pass that
verbatim instead.
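
A sketch of the pattern, with hypothetical names. It assumes valid
engine IDs are below 32 and that the callee accepts the all-ones value,
as the message states for gk20a_fifo_recover().

```c
#include <stdint.h>

#define BIT32(n) (UINT32_C(1) << (n))
#define INVALID_ENGINE_ID (~UINT32_C(0)) /* hypothetical "unknown" value */

/* Build the engine bitmask passed to recovery.
 * Shifting a 32-bit value by 32 or more is undefined behaviour in C,
 * so the invalid ID must never reach BIT32(). */
static uint32_t engines_to_recover(uint32_t engine_id)
{
	if (engine_id == INVALID_ENGINE_ID)
		return engine_id; /* pass the all-ones value through verbatim */
	return BIT32(engine_id);
}
```
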
Change-Id: Ib79d8e7e156deef0d483642cfb1ce7bf55f3c572
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1249964
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fix FBP and ROP_L2 enable masks for Maxwell+. Deprecate rop_l2_en_mask
in GPU characteristics by adding _DEPRECATED postfix. The array is
too small to hold ROP_L2 enable masks for desktop GPUs.
Add NVGPU_GPU_IOCTL_GET_FBP_L2_MASKS to expose the ROP_L2 masks for
userspace.
Bug 200136909
Bug 200241845
Change-Id: I5ad5a5c09f3962ebb631b8d6e7a2f9df02f75ac7
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/1245294
(cherry picked from commit 0823b33e59defec341ea7919dae4e5f73a36d256)
Reviewed-on: http://git-master/r/1249883
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Move ELCG parameter programming to a new function in therm,
elcg_init_idle_filter. Implement gk20a variant and use it for gk20a
and gm20b.
JIRA DNVGPU-74
Change-Id: I8ef400f3a6195311fb9e7da8db6c34993d62f461
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1220433
(cherry picked from commit f6654ae4d83d31cd40b317bf55922964bbfa575d)
Reviewed-on: http://git-master/r/1239421
GVS: Gerrit_Virtual_Submit
We have the following bug, where GPU Host reports non-idle
when it should report engine idle:
- if a context is preempted off the GPU and there is
no other context to load, NV_PGRAPH_ENGINE_STATUS
will not be idle until a new context is loaded
- this could cause gr_gk20a_wait_idle() to fail, since
it relies only on NV_PGRAPH_ENGINE_STATUS to
decide whether the engine is busy
To fix this, first check whether the context is valid
using NV_PFIFO_ENGINE_STATUS_CTX_STATUS.
If the context is invalid, return immediately.
Otherwise, continue as before.
Also, add accessors for invalid ctx_status.
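
The decision described above can be sketched as follows. The struct and
function are hypothetical; a real implementation would decode the two
registers instead of taking booleans.

```c
#include <stdbool.h>

/* Hypothetical decoded view of the two status registers. */
struct engine_snapshot {
	bool ctx_valid; /* derived from NV_PFIFO_ENGINE_STATUS_CTX_STATUS */
	bool gr_busy;   /* derived from NV_PGRAPH_ENGINE_STATUS */
};

/* With no valid context loaded, the busy bit is not meaningful, so
 * report idle right away; otherwise fall back to the old check. */
static bool gr_is_idle(const struct engine_snapshot *s)
{
	if (!s->ctx_valid)
		return true;
	return !s->gr_busy;
}
```
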
Bug 1826768
Change-Id: Id627be3f02e79f4beac59a8b5195d08eabf651f2
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1237521
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
There are eight tiles per map tile register and
depending on how many tpcs are present, there is
a chance that s/w will be accessing un-allocated
memory for reading tile values from temp buffers.
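
The message does not say how the overrun is avoided. One possible
approach, shown here purely as an assumption, is to size the temporary
buffer to a whole number of registers. The names are hypothetical.

```c
#include <stdint.h>

#define TILES_PER_REG 8u /* eight tiles per map tile register */

/* Round the TPC count up to a multiple of eight, so that reading a
 * full register's worth of tiles never runs past the buffer. */
static uint32_t padded_tile_count(uint32_t tpc_count)
{
	return (tpc_count + TILES_PER_REG - 1u) / TILES_PER_REG * TILES_PER_REG;
}
```
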
Bug 1735760
Change-Id: I5c0e09ec75099aaf6ad03dde964b9e93c2dc2408
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: http://git-master/r/1221580
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Tested-by: Seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Suppress the error message when nvgpu tries to load a VBIOS overlay but
one is not found. This situation is normal. This is done by moving
gk20a_request_firmware() to be the nvgpu generic function
nvgpu_request_firmware(), and adding a NO_WARN flag to it.
Also introduce a NO_SOC flag to suppress the attempt to load firmware
from the SoC specific directory in addition to the chip specific
directory. Use it for dGPU firmware files.
Bug 200236777
Change-Id: I0294d3308f029a6a6d3c2effa579d5f69a91e418
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1223840
(cherry picked from commit cca44c3f010f15918cdd2259c15170ba1917828a)
Reviewed-on: http://git-master/r/1233353
GVS: Gerrit_Virtual_Submit
As the size of the golden_ctx_image is large,
the allocation may intermittently fail when using
kzalloc. Since we don't need physically contiguous
memory, use vzalloc instead.
Bug 200231436
Change-Id: Ic2fb31dea94c8721832dc257334608e1fc283943
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1207172
(cherry picked from commit 994a7b162ec74518ae0f50dfb5ac197e44019992)
Reviewed-on: http://git-master/r/1229472
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
Do not call load prod callbacks that are set to NULL.
Bug 1799537
Change-Id: Ie951fb71fa8eacd10623abcd058f32db59004c2e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1208467
(cherry picked from commit c020e16adfa2b2bc2e3e8d0c63527a6089c59906)
Reviewed-on: http://git-master/r/1227268
GVS: Gerrit_Virtual_Submit
While programming ucode's inst block in API
gr_gk20a_load_falcon_bind_instblk(), use gk20a_aperture_mask()
to select the target address (i.e. whether the address is in sysmem
or vidmem) based on aperture.
Also add target accessors for gr_fecs_new_ctx and
gr_fecs_arb_ctx_ptr
Jira DNVGPU-22
Change-Id: I88198080f188b349a4448a229dff8416a6a18073
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1216139
(cherry picked from commit 42bc14110df17400dd655bc994dc9e61c73048b1)
Reviewed-on: http://git-master/r/1219703
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Event notifications on TSGs should only be sent to the channel that caused the
event to happen in the first place, not every channel in the TSG. Any more and
the debugger will not be able to tell which channel actually got the event.
Worse yet, if all the channels in a tsg are bound to the same debug session
(as is the case with cuda-gdb), then multiple nvgpu events for the same gpu
event will be triggered, causing events to be buffered and the client to get
out of sync.
One gpu exception, one nvgpu event per tsg.
Bug 1793988
Signed-off-by: Cory Perry <cperry@nvidia.com>
Change-Id: I4efb83b0593bd1af38f2342c80793d9db56e42b1
Reviewed-on: http://git-master/r/1194203
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
We currently post bpt events (bpt.int and bpt.pause) even
before we process and clear the interrupts and this
could cause races with UMD
Fix this by posting bpt events only after we are done
processing the interrupts
Bug 200209410
Change-Id: Ic3ff7148189fccb796cb6175d6d22ac25a4097fb
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1184109
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Before calling prod settings functions, check for
availability of those functions.
Similar check is extended for get_clk_freqs.
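
The check follows the usual optional-hook pattern. A minimal sketch,
with a hypothetical ops table and hook name:

```c
#include <stddef.h>

/* Hypothetical per-chip ops table; a chip may leave a hook unset. */
struct prod_ops {
	void (*load_gating_prod)(int *state, int enable);
};

/* Example hook that simply records the requested setting. */
static void example_hook(int *state, int enable)
{
	*state = enable;
}

/* Call the hook only when the chip provides one.
 * Returns 1 if the hook was called, 0 if it was skipped. */
static int load_prod_if_present(const struct prod_ops *ops, int *state,
				int enable)
{
	if (ops->load_gating_prod == NULL)
		return 0;
	ops->load_gating_prod(state, enable);
	return 1;
}
```
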
Bug 1735760
Change-Id: Ic4b38079043ab2049a479a2d8bb0cb6091e94f4a
Signed-off-by: seshendra Gadagottu <sgadagottu@nvidia.com>
Reviewed-on: http://git-master/r/1181571
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Adeel Raza <araza@nvidia.com>
For devices that have vidmem available, use the vidmem allocator in
gk20a_gmmu_alloc{,attr,_map,_map_attr}. For others, use sysmem.
Because not all of the buffers have been tested to work in vidmem yet,
rename calls to gk20a_gmmu_alloc{,attr,_map,_map_attr} to have _sys at
the end to declare explicitly that sysmem is used. Enabling vidmem for
each is now a matter of removing "_sys" from the function call.
Jira DNVGPU-18
Change-Id: Ibe42f67eff2c2b68c36582e978ace419dc815dc5
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1176805
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Propagate the buffer aperture flag in gk20a_locked_gmmu_map up so that
buffers represented as a mem_desc and present in vidmem can be mapped to
gpu.
JIRA DNVGPU-18
JIRA DNVGPU-76
Change-Id: I46cf87e27229123016727339b9349d5e2c835b3e
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1169308
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add gk20a_aperture_mask() for memory target selection now that buffers
can actually be allocated from vidmem, and use it in all cases that have
a mem_desc available.
Jira DNVGPU-76
Change-Id: I4353cdc6e1e79488f0875581cfaf2a5cfb8c976a
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1169306
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
kmalloc() returns NULL instead of error code on failure. Do not check
if the return value is an error code.
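
To see why the old check could never catch a failed allocation, here is
a userspace sketch. is_err() mirrors the range test behind the kernel's
IS_ERR() macro and is reimplemented only for illustration; the other
names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace copy of the IS_ERR() range test: true only for pointers in
 * the top MAX_ERRNO values of the address space. */
static int is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Wrong for kmalloc(): NULL is not in the error-pointer range. */
static int alloc_failed_wrong(const void *ptr)
{
	return is_err(ptr);
}

/* Right for kmalloc(): failure is reported as NULL. */
static int alloc_failed_right(const void *ptr)
{
	return ptr == NULL;
}
```
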
Change-Id: I31a46080ab51773a22bebe4cf03a5b0c94467204
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1172052
Some parameters like gfxp_wfi_timeout are context
switched. Once context has been initialized with
default values (sw_ctx_load), we need to ensure
that preemption state is properly set before
saving golden ctx image.
Bug 1593548
Jira VFND-1894
Change-Id: Ib1ba03f4ca1606302b1cf1f0738d3610a162a5c6
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1168662
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
This method is called when setting up gr
hardware. It is meant to adjust preemption
parameters.
Bug 1593548
Jira VFND-1894
Change-Id: I0f5aa3212bec3058a0493366bed6fe2a365c9542
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1162625
(cherry picked from commit c2e6d12570af28b3aae087401d7f670df40d40bd)
Reviewed-on: http://git-master/r/1166987
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
This patch_context map/unmap pair has become a mere wrapper for the more
general gk20a_mem_{begin,end}(). To be consistent about mappings,
require that each patch_write is surrounded by an explicit begin/end
pair, instead of relying on a possibly inefficient per-write map/unmap.
Also remove the cpu_va check from .._write_end(), since the buffers may
exist in vidmem without a cpu mapping.
JIRA DNVGPU-24
Change-Id: Ia05d52d3d712f2d63730eedc078845fde3e217c1
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1157298
GVS: Gerrit_Virtual_Submit
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
This CL covers the following modifications:
1) Added multiple engine_info support
2) Added multiple runlist_info support
3) Initial changes for ASYNC CE support
4) Added ASYNC CE interrupt handling support
for gm206 GPU family
5) Added generic mechanism to identify the
CE engine pri_base address for gm206
(CE0, CE1 and CE2)
6) Removed hard-coded engine_id logic and
made it generic
7) Code cleanup for readability
JIRA DNVGPU-26
Change-Id: I2c3846c40bcc8d10c2dfb225caa4105fc9123b65
Signed-off-by: Lakshmanan M <lm@nvidia.com>
Reviewed-on: http://git-master/r/1155963
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
In gk20a_gr_handle_fecs_error(), if we do not see
any error interrupt from gr_fecs_host_int_status_r(),
just return immediately
Bug 1646259
Change-Id: Iea037e0dab57111d2a0fb41c5c19529b7d6c83c0
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1158591
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Fix calculation of timeout in multiple places. The #defines
GR_IDLE_CHECK_DEFAULT and GR_IDLE_CHECK_MAX are meant to be used
only for defining the frequency of checking for timeout. Using them
for actual timeouts makes the timeout really short.
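
The distinction can be sketched with a simulated poll loop. The two
constants play the role of GR_IDLE_CHECK_DEFAULT and GR_IDLE_CHECK_MAX,
but their names and values here are assumptions, and time is simulated
rather than slept.

```c
#include <stdint.h>

#define IDLE_CHECK_DEFAULT_US 10u  /* hypothetical first poll delay */
#define IDLE_CHECK_MAX_US     200u /* hypothetical longest poll delay */

/* Poll is_idle() until it reports idle or timeout_us has elapsed.
 * The check constants only bound the delay between polls; the overall
 * timeout is a separate argument. Returns 0 on idle, -1 on timeout. */
static int wait_idle(int (*is_idle)(uint32_t elapsed_us), uint32_t timeout_us)
{
	uint32_t delay = IDLE_CHECK_DEFAULT_US;
	uint32_t elapsed = 0;

	for (;;) {
		if (is_idle(elapsed))
			return 0;
		if (elapsed >= timeout_us)
			return -1;
		elapsed += delay; /* stands in for a real sleep */
		delay *= 2u;
		if (delay > IDLE_CHECK_MAX_US)
			delay = IDLE_CHECK_MAX_US;
	}
}

/* Simulated engine that goes idle after one millisecond. */
static int idle_after_1ms(uint32_t elapsed_us)
{
	return elapsed_us >= 1000u;
}
```

Passing IDLE_CHECK_MAX_US as the timeout gives up before the simulated
engine goes idle, which is the failure mode described above.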
Change-Id: I3d0f8cbc91d619be8e5a9168ee1ab1d6298f129b
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1158269
Part of golden context initialization is in the powerup sequence, and
part is done as part of first channel creation. The sequence is
missing a context reset, which causes initialization of the golden
context to fail on dGPU.
Just moving the code to golden context initialization does not work,
because iGPU can be rail gated, and part of the sequence is required
in GPU boot.
Thus a part of context initialization is replicated to golden context
init after a context reset.
Change-Id: Ife1b167447018317d3a692b706880e0eda073e43
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1130698
In gr_gk20a_ctx_zcull_setup(), gr_gk20a_update_smpc_ctxsw_mode(),
and gk20a_channel_suspend(), we call channel-specific APIs
to disable/preempt/enable the channel,
but we do not consider TSGs in this case.
Hence, use the following APIs in the above functions, which
handle the channel or TSG case internally:
gk20a_disable_channel_tsg()
gk20a_fifo_preempt()
gk20a_enable_channel_tsg()
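
A sketch of what "handle channel or TSG internally" can look like. The
types and the helper are hypothetical, and the sketch assumes that
acting on a TSG means applying the operation to each member channel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TSG_CHANNELS 8u /* hypothetical limit */

struct tsg;

struct channel {
	bool enabled;
	struct tsg *tsg; /* NULL when the channel is not part of a TSG */
};

struct tsg {
	struct channel *ch[MAX_TSG_CHANNELS];
	uint32_t num_ch;
};

/* Enable or disable a bare channel, or every channel of its TSG, so
 * callers do not need to know which case they are in. */
static void set_channel_tsg_enabled(struct channel *c, bool enabled)
{
	uint32_t i;

	if (c->tsg == NULL) {
		c->enabled = enabled;
		return;
	}
	for (i = 0; i < c->tsg->num_ch; i++)
		c->tsg->ch[i]->enabled = enabled;
}
```
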
Bug 200205041
Change-Id: Ieed378dac4ad2322b35f9102706176ec326d386c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1157189
GVS: Gerrit_Virtual_Submit
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
To update hwpm, we currently disable/preempt only one
channel without considering if channel could be part
of a TSG
Hence, use proper APIs to disable/preempt/enable which
will internally handle channel/TSG case
Bug 200203191
Change-Id: I329a3c02d635265775f2081abba8e047f491fe7d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1155838
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>