Instead of relying on the kernel version to determine whether certain
functions or structures are present in the kernel, use the conftest.sh
script to test which functions, structures, etc. are present at compile
time. This is beneficial when working with 3rd-party Linux kernels that
may have back-ported upstream changes, in which case kernel version
checks do not work.
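As an illustration of the difference, the sketch below picks a code
path from a feature macro instead of from the kernel version. The macro
NV_EXAMPLE_FUNC_HAS_EXTRA_ARG and the function example_func() are
hypothetical names used only here; the assumption is that a
conftest-style script would emit such a define into a generated header
after a test compile succeeds.

```c
/*
 * Hypothetical feature macro. In a real build it would come from a
 * header generated by a conftest-style script, not be defined by hand.
 */
#define NV_EXAMPLE_FUNC_HAS_EXTRA_ARG 1

/* Stand-in for a kernel function whose signature differs between trees. */
#if defined(NV_EXAMPLE_FUNC_HAS_EXTRA_ARG)
static int example_func(int value, int flags)
{
	return value + flags;
}
#else
static int example_func(int value)
{
	return value;
}
#endif

/*
 * Callers use a single wrapper. The feature macro, not the kernel
 * version, decides which form of the call is compiled.
 */
static int example_func_compat(int value)
{
#if defined(NV_EXAMPLE_FUNC_HAS_EXTRA_ARG)
	return example_func(value, 0);
#else
	return example_func(value);
#endif
}
```

A kernel that back-ported the new signature would get the define set by
the test compile, regardless of the version number it reports.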
Bug 4119327
Change-Id: I56281fa5d95862338bd8a43d6e22225c27590462
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/2984422
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add a callback function, host1x_syncpt_fence_signaled(), as the
.signaled op in host1x_syncpt_fence_ops.
.signaled is an optional operation. The change here is a performance
improvement and acts as a temporary workaround for sync_file code not
calling enable_signaling.
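A userspace model of the idea, not the driver code: an ops table with
an optional .signaled callback that compares the current syncpoint
value against the fence threshold. All model_* names are made up for
this sketch, and the wrap-safe comparison is an assumption about how
such a check would typically be written.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_fence;

struct model_fence_ops {
	/* Optional: may be NULL, in which case only the flag is consulted. */
	bool (*signaled)(struct model_fence *f);
};

struct model_fence {
	const struct model_fence_ops *ops;
	const uint32_t *syncpt_value;	/* current counter value */
	uint32_t threshold;		/* value the fence waits for */
	bool signaled_flag;		/* set once the fence has signaled */
};

/* Wrap-safe "has the counter reached the threshold" check. */
static bool model_syncpt_fence_signaled(struct model_fence *f)
{
	return (int32_t)(*f->syncpt_value - f->threshold) >= 0;
}

static const struct model_fence_ops model_syncpt_fence_ops = {
	.signaled = model_syncpt_fence_signaled,
};

/*
 * Polling query: without a .signaled op the answer depends only on the
 * flag, which in this model is never set unless something signals the
 * fence explicitly.
 */
static bool model_fence_is_signaled(struct model_fence *f)
{
	if (f->signaled_flag)
		return true;

	if (f->ops->signaled && f->ops->signaled(f)) {
		f->signaled_flag = true;
		return true;
	}

	return false;
}
```

With the callback in place, a poll can observe completion directly from
the counter value even if nothing ever enabled signaling on the fence.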
Bug 4085239
Change-Id: Ief19c2d9af3f504bb1a067bfc9a31b9ef2ecd8fc
Signed-off-by: Santosh BS <santoshb@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/2935867
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Building the Android Common Kernel (ACK) with clang exposes build
errors due to const correctness issues in actmon code.
Remove the actmon name where it is not necessary.
Make the host1x_info variable const.
Make the host1x_actmon_entry variable const.
Bug 3974840
Change-Id: I50c1437199ad549f397944aefa535103ed2fa05c
Signed-off-by: Bruce Xu <brucex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/2921160
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: Johnny Liu <johnliu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
For tasks that execute very quickly, but not quickly enough to be
already complete by the time execution reaches host1x_syncpt_wait,
the time spent allocating a fence and invoking
dma_fence_wait_timeout dominates the time it actually takes to
execute the task.
To improve wait latency in these cases, replace the current
"is threshold already reached" check with a short spin loop
to catch these situations before going to the heavy machinery.
For longer waits, since this function blocks anyway, the only
negative effect is slightly increased CPU consumption due to
the loop.
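The approach can be sketched in userspace as follows. This is a model
under stated assumptions, not the driver implementation: the iteration
count MODEL_SPIN_ITERATIONS, the callback types, and the simulated
counter are all invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_SPIN_ITERATIONS 100	/* hypothetical spin budget */

typedef uint32_t (*model_read_fn)(void *ctx);
typedef int (*model_slow_wait_fn)(void *ctx, uint32_t thresh);

/* Simulated syncpoint that advances by one every time it is read. */
struct model_counter {
	uint32_t value;
	unsigned int reads;
	unsigned int slow_waits;
};

static uint32_t model_counter_read(void *ctx)
{
	struct model_counter *c = ctx;

	c->reads++;
	return c->value++;
}

/* Stand-in for the fence allocation plus blocking wait. */
static int model_counter_slow_wait(void *ctx, uint32_t thresh)
{
	struct model_counter *c = ctx;

	c->slow_waits++;
	c->value = thresh;
	return 0;
}

/* Wrap-safe threshold comparison. */
static bool model_reached(uint32_t value, uint32_t thresh)
{
	return (int32_t)(value - thresh) >= 0;
}

/* Spin briefly first; only fall back to the slow path if that fails. */
static int model_syncpt_wait(model_read_fn read, model_slow_wait_fn slow_wait,
			     void *ctx, uint32_t thresh)
{
	int i;

	for (i = 0; i < MODEL_SPIN_ITERATIONS; i++) {
		if (model_reached(read(ctx), thresh))
			return 0;
	}

	return slow_wait(ctx, thresh);
}
```

A threshold reached within the spin budget never touches the slow path;
a distant threshold costs MODEL_SPIN_ITERATIONS extra reads before the
blocking wait starts.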
Bug 4001325
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I44e99cda88b4bcb33f190884d1a2e5f7588cb775
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2881716
Reviewed-by: Santosh BS <santoshb@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/2916412
The current submission opcode sequence first takes the engine MLOCK,
and then switches to HOST1X class to wait for prefences. This is fine
while we only use a single channel per engine and there is no
virtualization, since jobs are serialized on that one channel anyway.
However, when that assumption doesn't hold, we keep the engine
locked, without running anything on it, while waiting for
prefences to complete.
To resolve this, execute wait commands in the beginning of the job
outside the engine MLOCK. We still take the HOST1X MLOCK because
recent hardware requires register opcodes to be executed within some
MLOCK, but the hardware also allows unlimited channels to take the
HOST1X MLOCK at the same time.
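The reordering can be pictured with symbolic tokens. The enum values
below are placeholders rather than real opcode encodings, and the
assumption that the HOST1X MLOCK is released before the engine MLOCK is
acquired is made for the sketch only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Symbolic steps of a job, not hardware opcodes. */
enum model_op {
	MODEL_OP_ACQUIRE_HOST1X_MLOCK,
	MODEL_OP_RELEASE_HOST1X_MLOCK,
	MODEL_OP_ACQUIRE_ENGINE_MLOCK,
	MODEL_OP_RELEASE_ENGINE_MLOCK,
	MODEL_OP_WAIT_PREFENCE,
	MODEL_OP_ENGINE_WORK,
};

/*
 * Emit the reordered sequence: all prefence waits first, under the
 * HOST1X MLOCK, then the engine work under the engine MLOCK.
 * Returns the number of entries written, or 0 if out is too small.
 */
static size_t model_build_job(enum model_op *out, size_t cap, size_t num_waits)
{
	size_t n = 0, i;

	if (cap < num_waits + 5)
		return 0;

	out[n++] = MODEL_OP_ACQUIRE_HOST1X_MLOCK;
	for (i = 0; i < num_waits; i++)
		out[n++] = MODEL_OP_WAIT_PREFENCE;
	out[n++] = MODEL_OP_RELEASE_HOST1X_MLOCK;

	out[n++] = MODEL_OP_ACQUIRE_ENGINE_MLOCK;
	out[n++] = MODEL_OP_ENGINE_WORK;
	out[n++] = MODEL_OP_RELEASE_ENGINE_MLOCK;

	return n;
}

/* True if no prefence wait happens while the engine MLOCK is held. */
static bool model_waits_outside_engine_lock(const enum model_op *seq, size_t n)
{
	bool engine_locked = false;
	size_t i;

	for (i = 0; i < n; i++) {
		switch (seq[i]) {
		case MODEL_OP_ACQUIRE_ENGINE_MLOCK:
			engine_locked = true;
			break;
		case MODEL_OP_RELEASE_ENGINE_MLOCK:
			engine_locked = false;
			break;
		case MODEL_OP_WAIT_PREFENCE:
			if (engine_locked)
				return false;
			break;
		default:
			break;
		}
	}

	return true;
}
```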
Jira HOSTX-4687
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I783cbc7f1bbd7415fbf0e61163935186b2ba0a44
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nv-oot/+/2911124
Reviewed-by: Santosh BS <santoshb@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Upstream Linux commit 1369459b2e21 ("iommu: Add a gfp parameter to
iommu_map()") adds a new parameter to the iommu_map function, and this
breaks building the host1x driver with Linux v6.3.
Upstream Linux commit 2a81ada32f0e ("driver core: make struct
bus_type.uevent() take a const *") updates the uevent function pointer
type to make the device structure const, which also breaks building the
host1x driver with Linux v6.3.
Address both of these issues to fix building the host1x driver with
Linux v6.3.
Bug 4014315
Change-Id: Ibd27f5e8442cc6970bcaac0dcfb9fc262860aee9
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2867136
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Add API for reading the activity monitor average count for VIC,
NVENC, and NVDEC. There is currently no support for initializing
actmon, so this relies on someone else on a virtualized system
having initialized it (and mapped the actmon region read-only).
Bug 3973633
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: Ia1bfec6a090d4effb288b17cbac4d42bf5d0b4e5
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2864719
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
dma_fence_wait_timeout (along with a host of other jiffies-based
timeout functions) returns zero both in case of timeout and when
the wait completes during the last jiffy before timeout. As such,
we can't rely on it to distinguish between success and timeout.
To prevent confusing callers by returning -EAGAIN before the timeout
period has elapsed, check again after the wait whether the fence got
signaled.
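The decision logic can be sketched as a pure function. The name
model_wait_status() and its parameters are invented for illustration:
"remaining" stands for the value returned by the timed wait, and
"signaled_after_wait" for the result of re-checking the fence once the
wait has returned.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Map the result of a jiffies-based timed wait to a status code.
 * A zero return from the wait is ambiguous, so the fence state
 * sampled after the wait decides between success and timeout.
 */
static int model_wait_status(long remaining, bool signaled_after_wait)
{
	if (remaining < 0)
		return (int)remaining;	/* wait failed, e.g. interrupted */

	if (remaining > 0)
		return 0;		/* completed with time to spare */

	/* remaining == 0: timeout or completion in the last jiffy. */
	return signaled_after_wait ? 0 : -EAGAIN;
}
```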
Bug 3955201
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: Ib90cbd3d78bac773a724a523925ae5d1b70107c8
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2857405
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
The error handling for platform_get_irq() failing no longer works after
a recent change. Clang now points this out with a warning:
drivers/gpu/host1x/dev.c:520:6: error: variable 'syncpt_irq' is uninitialized when used here [-Werror,-Wuninitialized]
        if (syncpt_irq < 0)
            ^~~~~~~~~~
Fix this by removing the variable and checking the correct error status.
Fixes: 625d4ffb438c ("gpu: host1x: Rewrite syncpoint interrupt handling")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I780c935101b8c65070eeba3552f96a1bfb109592
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2853776
Reviewed-by: Santosh BS <santoshb@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Add support for syncpoint pools. These are configuration-dependent
subsets of syncpoints that can only be allocated by specifying
an allocation specifically from that pool.
In this patch, we add support for the GPU pool. On certain systems,
the GPU is for safety purposes limited to accessing a specific set
of syncpoints. Therefore, syncpoints allocated for GPU use need to
come from this pool.
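A minimal allocator model of the pool concept follows. The types and
the function model_syncpt_alloc() are hypothetical; the point is only
that a syncpoint belonging to a pool is handed out solely to requests
naming that pool, and default requests never dip into it.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_pool {
	MODEL_POOL_NONE,	/* ordinary syncpoints */
	MODEL_POOL_GPU,		/* syncpoints the GPU is allowed to access */
};

struct model_syncpt {
	enum model_pool pool;
	bool allocated;
};

/*
 * Allocate the first free syncpoint that belongs to the requested pool.
 * Returns its index, or -1 if that pool is exhausted.
 */
static int model_syncpt_alloc(struct model_syncpt *sp, size_t count,
			      enum model_pool pool)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (sp[i].allocated || sp[i].pool != pool)
			continue;

		sp[i].allocated = true;
		return (int)i;
	}

	return -1;
}
```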
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I385b8bdfdac8573b26f0b1a0feaf05de071148a1
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2826198
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Reviewed-by: Sanif Veeras <sveeras@nvidia.com>
Reviewed-by: Raghavendra Vishnu Kumar <rvk@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Tested-by: Sanif Veeras <sveeras@nvidia.com>
Add support for running as a guest system under a hypervisor, using
Host1x HW's virtualization capabilities.
In effect this involves not touching apertures other than the 'vm'
aperture, and channels and syncpoints other than those that are
assigned to the VM.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: Ideec5b0b9a692aa3ee6b4a0240c5755c983cb7bd
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2811837
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
On Tegra234, engines that are programmed through Host1x channels can
be attached to either the NISO0 or NISO1 SMMU. Because of that, when
selecting a context device to use with an engine, we need to select
one that is also attached to the same SMMU.
Add a parameter to host1x_memory_context_alloc to specify which device
we are allocating a context for, and use it to pick an appropriate
context device.
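The selection rule can be modelled as below. The structures and the
function model_memory_context_alloc() are stand-ins invented for this
sketch and do not mirror the driver's data structures; only the
matching rule (same SMMU as the engine) comes from the description
above.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_smmu {
	MODEL_SMMU_NISO0,
	MODEL_SMMU_NISO1,
};

/* The engine we are allocating a context for. */
struct model_engine {
	enum model_smmu smmu;
};

struct model_context_dev {
	enum model_smmu smmu;
	bool in_use;
};

/*
 * Pick a free context device attached to the same SMMU as the engine.
 * Returns NULL if no matching context device is available.
 */
static struct model_context_dev *
model_memory_context_alloc(struct model_context_dev *devs, size_t count,
			   const struct model_engine *engine)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (devs[i].in_use || devs[i].smmu != engine->smmu)
			continue;

		devs[i].in_use = true;
		return &devs[i];
	}

	return NULL;
}
```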
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Change-Id: I32af312c85164b72c14409d816d3b50ad5c7bfe5
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2811836
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently all fences have a 30 second timeout to ensure they are
cleaned up if the fence never completes otherwise. However, this
one-size-fits-all solution doesn't actually fit in every case,
such as syncpoint waiting where we want to be able to have timeouts
longer than 30 seconds. As such, we want to be able to give control
over fence cancellation to the caller (and maybe eventually get rid
of the internal timeout altogether).
Here we add this cancellation mechanism by adding a function that
enters the timeout path directly, and changing the syncpoint wait
function to use it.
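In model form, cancellation and timer expiry share one path. Everything
here is hypothetical, including the model_* names and the choice of
-ETIMEDOUT as the error recorded on the fence.

```c
#include <errno.h>
#include <stdbool.h>

enum model_fence_state {
	MODEL_FENCE_PENDING,
	MODEL_FENCE_SIGNALED,
};

struct model_timed_fence {
	enum model_fence_state state;
	int error;		/* 0, or a negative errno after a timeout */
	bool timer_armed;	/* stands in for the internal timeout */
};

/* Shared timeout path: complete a still-pending fence with an error. */
static void model_do_fence_timeout(struct model_timed_fence *f)
{
	if (f->state != MODEL_FENCE_PENDING)
		return;

	f->error = -ETIMEDOUT;
	f->state = MODEL_FENCE_SIGNALED;
}

/* Entry point used when the internal timer fires. */
static void model_fence_timer_expired(struct model_timed_fence *f)
{
	f->timer_armed = false;
	model_do_fence_timeout(f);
}

/* Entry point for callers: disarm the timer, then take the same path. */
static void model_fence_cancel(struct model_timed_fence *f)
{
	f->timer_armed = false;
	model_do_fence_timeout(f);
}
```

A wait function with its own, longer timeout would call the cancel
entry point when that timeout elapses, instead of depending on the
fixed internal timer.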
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I4600544afe21efdd3f7d06362bd124130ddec3db
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2786637
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Move from the old, complex intr handling code to a new implementation
based on dma_fences. While there is a fair bit of churn to get there,
the new implementation is much simpler and likely faster as well due
to allowing signaling directly from interrupt context.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I81c47fa1946679813f90e3fd8e1d1e9d6342143e
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2786635
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
In anticipation of removal of the intr API, implement job tracking
using DMA fences instead. The two main changes are making
cdma_update schedule the work, since fence completion can now be
called from interrupt context, and some added complexity to ensure
the callback is not running when we free the fence.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I25f7f5a6cad24a00563eed79e0e17b1df1eadcdc
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2786636
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
In anticipation of removal of the intr API, move host1x_syncpt_wait
to use DMA fences instead. As of this patch, this means that waits
have a 30 second maximum timeout because of the implicit timeout
we have with fences, but that will be lifted in a follow-up patch.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I82a262b73861b35c4031983f4134d4b4006e3b16
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2786634
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Add support for the optical flow accelerator. The implementation is
the same as for other Falcons, except that we omit some legacy parts,
since the engine only exists from T234 onwards, and we additionally
have to initialize the OFA's safety RAM before boot.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I9612e82a116cc76be492a0c533afce67c42f6a2c
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2784964
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
host1x_cdma_push_wide assumed that the last parameter word was a NOP
opcode, and that NOP opcodes could be used in all situations.
Neither is true with the new job opcode sequence, so adjust the
function to drop these assumptions, and instead place an early
RESTART opcode when necessary to jump back to the beginning of the
pushbuffer.
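A simplified ring-buffer model of the early-RESTART idea is shown
below. It ignores free-space tracking against the consumer, assumes the
group never exceeds the buffer size, and uses MODEL_RESTART as a
symbolic marker rather than a real opcode encoding; all names are
invented for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_RESTART 0xffffffffu	/* symbolic marker, not an encoding */

struct model_pushbuf {
	uint32_t *words;
	size_t size;	/* capacity in words */
	size_t pos;	/* next write index, always < size */
};

/*
 * Push n words that must be contiguous. If they do not fit before the
 * end of the buffer, write an early RESTART marker at the current
 * position and continue from the beginning. Returns the index at which
 * the group was placed.
 */
static size_t model_push_wide(struct model_pushbuf *pb,
			      const uint32_t *group, size_t n)
{
	size_t start, i;

	if (pb->size - pb->pos < n) {
		pb->words[pb->pos] = MODEL_RESTART;
		pb->pos = 0;
	}

	start = pb->pos;
	for (i = 0; i < n; i++)
		pb->words[pb->pos++] = group[i];

	if (pb->pos == pb->size)
		pb->pos = 0;

	return start;
}
```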
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I88074e838e4f1471471f0848aca9d8d73c7b5f8c
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745959
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
When MLOCK enforcement is enabled, the 0-word write currently done
is rejected by the hardware outside of an MLOCK region. As such,
on these chips, which also have the newer, more convenient RESTART_W
opcode, use that instead to skip over the timed-out job.
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I9e22eb7ccd17127ca517a034f5dbd32326412f9d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745957
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
With the full-featured opcode sequence using MLOCKs, we need to also
unlock those MLOCKs in the event of a timeout. However, it turns out
that on Tegra186/Tegra194, by default, we don't need to do this;
furthermore, on Tegra234 it is much simpler to do; so only implement
this on Tegra234 for the time being.
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: Icc15ae705844cd26ae3f1d1146ff20f1d9b7a14d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745956
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
For new (Tegra186+) SoCs, use a new ('full-featured') job opcode
sequence that is compatible with virtualization. In particular,
the Host1x hardware in Tegra234 is more strict regarding the sequence,
requiring ACQUIRE_MLOCK-SETCLASS-SETSTREAMID opcodes to occur in
that sequence without gaps (except for SETPAYLOAD), so let's do it
properly in one go now.
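The ordering constraint can be expressed as a small checker over
symbolic opcodes. The enum values are placeholders, not hardware
encodings, and the reading that SETPAYLOAD may appear only between the
three required opcodes is an interpretation made for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

enum model_opcode {
	MODEL_ACQUIRE_MLOCK,
	MODEL_SETCLASS,
	MODEL_SETSTREAMID,
	MODEL_SETPAYLOAD,
	MODEL_OTHER,
};

/*
 * Check that a sequence starts with ACQUIRE_MLOCK, SETCLASS and
 * SETSTREAMID in that order, with nothing but SETPAYLOAD in between.
 */
static bool model_prologue_valid(const enum model_opcode *seq, size_t n)
{
	static const enum model_opcode expected[] = {
		MODEL_ACQUIRE_MLOCK,
		MODEL_SETCLASS,
		MODEL_SETSTREAMID,
	};
	size_t want = 0, i;

	for (i = 0; i < n && want < 3; i++) {
		/* SETPAYLOAD is tolerated once the prologue has begun. */
		if (want > 0 && seq[i] == MODEL_SETPAYLOAD)
			continue;

		if (seq[i] != expected[want])
			return false;

		want++;
	}

	return want == 3;
}
```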
Bug 3724727
Change-Id: Ifae148975457d2d275cfae25fcaf735e6529fbd3
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745964
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>