Add support for running as a guest system under a hypervisor, using
Host1x HW's virtualization capabilities.
In effect this involves not touching apertures other than the 'vm'
aperture, and channels and syncpoints other than those that are
assigned to the VM.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: Ideec5b0b9a692aa3ee6b4a0240c5755c983cb7bd
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2811837
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
On Tegra234, engines that are programmed through Host1x channels can
be attached to either the NISO0 or NISO1 SMMU. Because of that, when
selecting a context device to use with an engine, we need to select
one that is also attached to the same SMMU.
Add a parameter to host1x_memory_context_alloc to specify which device
we are allocating a context for, and use it to pick an appropriate
context device.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Signed-off-by: Thierry Reding <treding@nvidia.com>
Change-Id: I32af312c85164b72c14409d816d3b50ad5c7bfe5
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2811836
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently all fences have a 30 second timeout to ensure they are
cleaned up if the fence never completes otherwise. However, this
one size fits all solution doesn't actually fit in every case,
such as syncpoint waiting where we want to be able to have timeouts
longer than 30 seconds. As such, we want to be able to give control
over fence cancellation to the caller (and maybe eventually get rid
of the internal timeout altogether).
Here we add this cancellation mechanism by essentially adding a
function for entering the timeout path by function call, and changing
the syncpoint wait function to use it.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I4600544afe21efdd3f7d06362bd124130ddec3db
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2786637
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Move from the old, complex intr handling code to a new implementation
based on dma_fences. While there is a fair bit of churn to get there,
the new implementation is much simpler and likely faster as well due
to allowing signaling directly from interrupt context.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I81c47fa1946679813f90e3fd8e1d1e9d6342143e
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2786635
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
In anticipation of removal of the intr API, implement job tracking
using DMA fences instead. The main two things about this are
making cdma_update schedule the work since fence completion can
now be called from interrupt context, and some complication in
ensuring the callback is not running when we free the fence.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I25f7f5a6cad24a00563eed79e0e17b1df1eadcdc
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2786636
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
In anticipation of removal of the intr API, move host1x_syncpt_wait
to use DMA fences instead. As of this patch, this means that waits
have a 30 second maximum timeout because of the implicit timeout
we have with fences, but that will be lifted in a follow-up patch.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I82a262b73861b35c4031983f4134d4b4006e3b16
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2786634
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
Add support for the optical flow accelerator. Implementation is the
same as for other Falcons except that we omit some legacy things
since the engine only exists from T234 onwards, and the addition
of having to initialize the OFA's safety RAM before boot.
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I9612e82a116cc76be492a0c533afce67c42f6a2c
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2784964
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
host1x_cdma_push_wide had the assumptions that the last parameter word
was a NOP opcode, and that NOP opcodes could be used in all situations.
Neither are true with the new job opcode sequence, so adjust the
function to not have these assumptions, and instead place an early
RESTART opcode when necessary to jump back to the beginning of the
pushbuffer.
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I88074e838e4f1471471f0848aca9d8d73c7b5f8c
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745959
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
When MLOCK enforcement is enabled, the 0-word write currently done
is rejected by the hardware outside of an MLOCK region. As such,
on these chips, which also have the newer, more convenient RESTART_W
opcode, use that instead to skip over the timed out job.
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I9e22eb7ccd17127ca517a034f5dbd32326412f9d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745957
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
With the full-featured opcode sequence using MLOCKs, we need to also
unlock those MLOCKs in the event of a timeout. However, it turns out
that on Tegra186/Tegra194, by default, we don't need to do this;
furthermore, on Tegra234 it is much simpler to do; so only implement
this on Tegra234 for the time being.
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: Icc15ae705844cd26ae3f1d1146ff20f1d9b7a14d
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745956
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
For new (Tegra186+) SoCs, use a new ('full-featured') job opcode
sequence that is compatible with virtualization. In particular,
the Host1x hardware in Tegra234 is more strict regarding the sequence,
requiring ACQUIRE_MLOCK-SETCLASS-SETSTREAMID opcodes to occur in
that sequence without gaps (except for SETPAYLOAD), so let's do it
properly in one go now.
Bug 3724727
Change-Id: Ifae148975457d2d275cfae25fcaf735e6529fbd3
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745964
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
On Tegra234, each Host1x VM has 8 interrupt lines. Each syncpoint
can be configured with which interrupt line should be used for
threshold interrupt, allowing for load balancing.
For now, to keep backwards compatibility, just set all syncpoints
to the first interrupt.
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I7058f9bd38bc59db83ee92613e4e733813db7a46
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745954
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
Program virtualization tables specifying which VMs have access to which
Host1x hardware resources. Programming these has become mandatory in
Tegra234.
For now, since the driver does not operate as a Host1x hypervisor, we
basically allow access to everything to everyone.
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I67e09e7d9c55128c40c54c078912cc38c7616eac
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2745952
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Brad Griffis <bgriffis@nvidia.com>
Reviewed-by: Jonathan Hunter <jonathanh@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: Brad Griffis <bgriffis@nvidia.com>
Add code to do stream ID switching at the beginning of a job. The
stream ID is switched to the stream ID specified by the context
passed in the job structure.
Before switching the stream ID, an OP_DONE wait is done on the
channel's engine to ensure that there is no residual ongoing
work that might do DMA using the new stream ID.
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I7705c33417e27d99bb265db13eb3c19960f5a363
Add code to register context devices from device tree, allocate them
out and manage their refcounts.
Bug 3724727
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Change-Id: I1e50462fc07247faf4cb217f4ab75fb610b10024
Update the host1x driver to align with the latest upstream host1x
driver from Linux v5.19-rc1. Please note that the context bus support
is not included, because this needs to be built into the kernel.
Bug 3724727
Change-Id: Ic6fbe001462d160d1bb24f76038dd755c5550690
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2739538
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
Buffer mappings used in job submissions are usually small and not
rapidly reused as opposed to framebuffers (which are usually large and
rapidly reused, for example when page-flipping between double-buffered
framebuffers). Avoid going through the mapping cache for these buffers
since the cache would also lead to leaks if nobody is ever releasing
the cache's last reference. For DRM/KMS these last references are
dropped when the framebuffers are removed and therefore no longer
needed.
While at it, also add a note about the need to explicitly remove the
final reference to the mapping in the cache.
Bug 3556250
Change-Id: If6043dec54d6f73d1dc773cf854fa68e25c6556e
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2693869
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
GVS: Gerrit_Virtual_Submit
BOs pinned by calling host1x_bo_pin() are never being freed as
expected. Fix this by removing the unneeded refcount increment
in host1x_bo_pin() when creating a new mapping.
JIRA LS-128
Change-Id: I948a289e7953621302d75f5509464e7607d72299
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2657439
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
GVS: Gerrit_Virtual_Submit
In some cases we may need to initialize the host1x client first before
registering it. This commit adds a new helper that will do nothing but
the initialization of the data structure.
At the same time, the initialization is removed from the registration
function. Note, however, that for simplicity we explicitly initialize
the client when the host1x_client_register() function is called, as
opposed to the low-level __host1x_client_register() function. This
allows existing callers to remain unchanged.
Change-Id: I504462a1749d8f9602f6b4075f1aec56eefb14e5
Signed-off-by: Thierry Reding <treding@nvidia.com>
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2545957
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc_kernel_abi <svc_kernel_abi@nvidia.com>
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Syncpoint interrupts are not working as expected on Tegra194. The
problem is that the syncpoint interrupt threshold being used is the
global interrupt threshold and not the virtual interrupt threshold.
Fix this by using the virtual interrupt threshold which aligns with
downstream.
JIRA LS-34
Change-Id: I4a3191c12298d4f0af264fd1f89754171a3829e9
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2498820
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Trivial clean-up for building Host1x by ...
1. Removing the Kconfig file because this is not needed for building
as an external module and we only plan to build this as an external
module.
2. Moving both 'host1x-next.h' header files under the Host1x source
directory so that we don't need to pass the path for finding the
header when building Host1x as an external module.
3. Remove the 'CONFIG_TEGRA_HOST1X_NEXT' variable because we always
build Host1x as a module.
Bug 200687525
Change-Id: Ica5534f92a3a2ee6f6500ec419fa05f03993b90b
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvidia/+/2485760
Reviewed-by: Mikko Perttunen <mperttunen@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit