Remove redundant cache maintenance operations. Instance blocks and
graphics context buffers are uncached, so they do not need any cache
maintenance.
Bug 1421824
Change-Id: Ie0be67bf0be493d9ec9e6f8226f2f9359cba9f54
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/406948
PMU, FECS and GPCCS use the same address space. We used to initialize
the address space only if PMU is enabled. Create the system address
space always.
FECS and GPCCS used to have slower bit bang and faster DMA method
for loading ucode. Slower method is needed when FECS and GPCCS do not
have an address space. Remove the slower method as not anymore
needed.
Change-Id: I155619741ecc36aa6bf13a9c1ccb03c7c1330f0a
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/406771
In channel_update(), we detect if channel is idle and if it is
idle then we free the syncpt. We do not free the syncpt if WFI is
scheduled on some other path.
Instead of checking for WFI, we can check if last submit is complete
or not (it can be WFI as well) and if last submit is complete then
we can free the syncpt.
Locking mechanism using submit lock will take care that syncpt is
kept alive until last submit or WFI completes
Bug 1305024
Change-Id: Ieafb82e1f924a01236ca73ed151eb03e88729835
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/405201
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add gk20a as a sub power domain of host1x. This enforces keeping
host1x on when using gk20a.
Bug 200003112
Change-Id: I08db595bc7b819d86d33fb98af0d8fb4de369463
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/407006
(cherry picked from commit 009812b3e510518740e9c7e89b8b8b80439fe26a)
Reviewed-on: http://git-master/r/408013
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
When rail gating, we cleared all PMU status. Clear only the relevant
fields.
Change-Id: I5b4e8d74339aae6f1c6b945f45b8378bb563e8be
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/406843
Remove the path for turning on only gk20a. Always when turning on
hardware, turn both host1x and GPU on.
Change-Id: I5f972a487d3348bf2254bdb0fadb42ca600a559e
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/406405
This patch addresses two issues in fixes offset mappings:
- VA unmapping did not use lists safely. This caused an application hang
if the application did not free all (fixed offset) buffers before quiting.
- GPU was not powered closing AS node. If the address space had areas that
were not freed, the driver tried to access hw without powering it up first.
Change-Id: Ida526d222ea4e03b8d765eca16574ddc1823e60d
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/405872
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Currently the generic platform is used only if the device tree
defines that we have a generic platform available, however, the
generic platform is fully compatible with the gk20a we have in tegra.
This patch modifies the definitions so that we use generic platform
also for tegra - even if if tegra configuration option is not enabled.
Bug 1434573
Change-Id: Ib35ce0ab935d27764e960bf4d74a5016ae047a1f
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/396867
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Enable/disable powergating around regops so that the user
need not call the powergating IOCTLs with the regops IOCTL.
If the user does call the powergating IOCTL then the ref-counting
will ensure the correct behavior.
Bug 1451949
Change-Id: I1746f7d7cd1d2c0c497c213939df44a59d5d2834
Signed-off-by: Sandarbh Jain <sanjain@nvidia.com>
Reviewed-on: http://git-master/r/395131
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
gk20a is going to be moved under platform bus, however, the sysfs
interface should remain stable over the transition period. This
patch adds a symlink to keep current interfaces stable.
Bug 1311528
Bug 1434573
Change-Id: I951000f4b25285ff96e93eb726342d5b76cc84f1
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/396926
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Currently the gpu driver assumes that the GPU is a child of host1x.
This is an invalid assumption and therefore we need to get the host1x
device from device tree based on nvidia,host1x property.
Bug 1311528
Bug 1434573
Change-Id: I097e39369aaa15ab6652cd23f353f88f7c2b9c48
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/395664
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Make the perfmon sampling configurable, by adding an 'enabled' flag.
This is set according to the CONFIG initially. Modify the perfmon event
handler to not touch clock rates. Add a counter to count the number of
perfmon events.
Also add debugfs entries for the above.
Bug 1410515
Change-Id: Ic8197eef0e46e35af1179a5b06140393541cfd43
Signed-off-by: Prashant Malani <pmalani@nvidia.com>
Reviewed-on: http://git-master/r/351564
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Currently creation of the load sysfs node is bound to devfreq
profile initialisation, however, this information is useful even
if the scaling is not enabled. This patch modifies the code to create
the sysfs node always.
Bug 1485489
Change-Id: Id20433344aa81108f89a36cd56c9a73dd9d2e1c8
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/399474
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
On channel_finish() path, we first check if last submit was
WFI and in that case we do not submit new WFI but just wait
on old syncpt fence.
But it is possible that sync resource is already freed from
another path (channel_suspend())
Hence add a NULL check there to prevent Null pointer
exception.
Also, in channel_free() path, move syncpt free API after
channel_unbind() since we logically free the syncpt after
unbinding the channel.
Bug 1305024
Change-Id: Icc2fc83f004310560fc459527e1d37730428ec2d
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/400233
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
All of the channel's submit jobs are added to the list channel->jobs
In channel_update(), we iterate over this list and check if any job
has completed. If any job is complete then we remove it from the list.
If this list is empty then it means channel is idle and we can free
its syncpt.
Hence after iterating this list, check if it is empty or not.
If it is empty AND if we are aggressive to free the syncpt
(syncpt_aggressive_destroy flag is set) then free the syncpt
at this point.
Keep the syncpt free code inside submit_lock to avoid race conditions.
Also, do not free the syncpt if we have already scheduled WFI on some
other path. In that case, syncpt is still needed to check for channel
idle. Once WFI completes, we free the syncpt anyway.
Bug 1305024
Change-Id: I1654e1db3b76b7ad14644dbb900b03f195ca3b2c
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/398617
Reviewed-by: Shridhar Rasal <srasal@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add submit mutex lock to avoid race conditions between submitting
a job, removing a job and submitting WFI
With this lock make below operations atomic :
during submit_gpfifo() -
1. getting new syncpt
2. inserting syncpt increment
3. submitting gpfifo
4. setting job completion interrupt
during submit_wfi() -
1. getting new syncpt
2. inserting syncpt increment when idle
during channel_update() -
1. checking the submit job completion
2. freeing the job if it is completed
Bug 1305024
Change-Id: I0e3c0b8906d83fd59642344626ffdf24fad2aaab
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/397670
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
CBC frontdoor access works incorrectly in the simulator if CBC
is allocated from IOVA. This patch makes CBC allocation to happen
from physical memory if are running in simulator.
Bug 1409151
Change-Id: Ide08f4eab6911adc5737001c6d751ee227fec8f9
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-on: http://git-master/r/401544
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
update headers from latest gen_register/ip_check info
Change-Id: Iae892ab7138e7bba4abc821b9d7893e768647daa
Signed-off-by: Ken Adams <kadams@nvidia.com>
Reviewed-on: http://git-master/r/399382
fixes one use of unitialized var
renames a register to make it match dev_* file.
Change-Id: Iafba659bbf2df509e0b494b2c5dab3819bf650ef
Signed-off-by: Ken Adams <kadams@nvidia.com>
Reviewed-on: http://git-master/r/394792
the CBC clean and invalidate is done for gk20a for bug 1409151, now
it's time to do the same fo gm20b. the text of this change is
strictly copied from gk20a, simply to make build pass.
Change-Id: Id717cb1e2ca0fa3f8483c3fd40d7629a9cc85ec9
Signed-off-by: Bo Yan <byan@nvidia.com>
Call railgate and unrailgate ops only if they are defined.
Change-Id: I0a87ac0259af3719098d4372be7e25f0a54416fc
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/396375
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-by: Bo Yan <byan@nvidia.com>
gk20a_ltc_init_comptags and gk20a_ltc_clear_comptags are defined
in ltc_gk20a.c, gm20b has its own init/clear functions, so remove
these two from ltc_common.c
change nvhost_allocator_init to gk20a_allocator_init, this is a
left-over after rebase, just like the above 2 function definitions,
so fix it.
Change-Id: I829639dd7fee9110dd65d5df7d7f0f8fe5fca6c1
Signed-off-by: Bo Yan <byan@nvidia.com>
Two calls to gk20a_init_gpu_characteristics() is not needed.
GPU sim aperture was defined twice.
Change-Id: Iaf78611717c55b1cae456358fcae2641ad552d9f
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/383855
Reviewed-by: Automatic_Commit_Validation_User
Move the set_zbc_color_entry() operation to the LTC common code
as this is part of the LTC.
Change-Id: Iba41e32e273d86fcf76094440c2313a75a928326
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/366174
(cherry picked from commit 569ce1f3370532f12face62664a07d2d17a96bef)
Reviewed-on: http://git-master/r/376505
Reviewed-by: Automatic_Commit_Validation_User
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Move the comptags cache init and clear operations to the LTC
from the gr code as this is part of the LTC.
Change-Id: I2163a09bcfe68a8833d5135bfa4035f37c7157ab
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/366173
(cherry picked from commit f56d4723f996f0dd2fcf0ae4279dbc4b6483b405)
Reviewed-on: http://git-master/r/376504
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: Kevin Huang (Eng-SW) <kevinh@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
This patch adds an interface for the gk20a driver to have
generic ops which are implemented by a chip specific HAL
layer. The HAL layer is provided by the gpu_ops struct which
defines function pointers for chip specific oeprations. This
is necessary for supporting multiple chips with the same
code base and minimal per chip hacking.
Also, since much code is common except in the HW headers
that are needed, the LTC common code is compiled by first
including the necessary chip specific header(s) and then
including the ltc common code file.
This allows for easy updating of functions that are only
different between chips as a result of register offset and
field changes whereas the HAL provides the mechanism for
functions that have actual semantic changes.
Change-Id: I96f9a8350d34e7e101beb141d4521fab69dcfbae
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/360627
(cherry picked from commit fe90cad939cf979fc2516a96e5911bd8ab6fc457)
Reviewed-on: http://git-master/r/362228
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
This adds new IOCTL that provides information for the userspace for
GPU characterization. Specifically, the following items are provided:
GPU arch/impl/rev, number of GPCs, L2 cache size, on-board video
memory size, num of tpc:s per gpc, and bus type. The primary user of
the new IOCTL will be rmapi_tegra.
Bug 1392902
Change-Id: Ia7c25c83c8a07821ec60be3edd018c6e0894df0f
Reviewed-on: http://git-master/r/346379
(cherry picked from commit 0b9ceca5a06d07cc8d281a92b76ebef8d4da0c92)
Reviewed-on: http://git-master/r/350658
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Tested-by: Terje Bergstrom <tbergstrom@nvidia.com>
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>