Not sure if there's an actual bug or JIRA filed for this, but the
change here fixes a long standing bug in the MM code for unit tests.
Te GMMU programming code verifies that the CPU _physical_ address
programmed into the GMMU PDE0 is a valid Tegra SoC CPU physical
address. That means that it's not too large a value.
The POSIX imlementation of the nvgpu_mem related code used the CPU
virtual address as the "phys" address. Obviously, in userspace,
there's no access to physical addresses, so in some sense it's a
meaningless function. But the GMMU code does care, as described
above, about the format of the address.
The fix is simple enough: since the nvgpu_mem_get_addr() and
nvgpu_mem_get_phys_addr() values shouldn't actually be accessed by
the driver anyway (they could be vidmem addresses or IOVA addresses
in real life) ANDing them with 0xffffffff (e.g 32 bits) truncates
the potentially problematic CPU virtual address bits returned by
malloc() in the POSIX environment.
With this, a run of the unit test framework passes for me locally
on my Ubuntu 18 machine.
Also, clean up a few whitespace issues I noticed while I debugged
this and fix another long standing bug where the
NVGPU_DEFAULT_DBG_MASK was not being copied to g->log_mask during
gk20a struct init.
Change-Id: Ie92d3bd26240d194183b4376973d4d32cb6f9b8f
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395953
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Nvgpu does not support nested interrupts and as a result priv/pbus
interrupt do not reach cpu while other interrupts on intr_0 (stall)
tree are being processed. This issue is not specific to priv/pbus
but since pbus errors are critical, it is important to detect it
early on.
Below is the snippet from one of the failing logs where nvgpu
is doing recovery to process gr interrupt.
Right after GR engine is reset (PGRAPH of PMC_ENABLE), failing priv
accesses should have triggered pbus interrupt but it does not reach cpu
until gr interrupt is handled. Any interrupt that requires recovery will
take longer to finish isr as recovery is done as part of isr.
Also intr_0 (stall) interrupts are paused while stall interrupt is being
processed.
gm20b_gr_falcon_bind_instblk:147 [ERR] arbiter idle timeout, status: badf1020
gm20b_gr_falcon_wait_for_fecs_arb_idle:125 [ERR] arbiter idle timeout, fecs ctxsw status: 0xbadf1020
Fix to detect pbus intr while other stall interrupts are being processed
is to move pbus intr enable/disable/clear/handle to nonstall (intr_1)
tree. Configure pbus_intr_en_1 to route pbus to nostall tree.
Priv interrupts cannot be moved to nonstall (intr_1) tree due
to h/w not supporting this.
In Turing, moving pbus intr to nonstall is not feasible as mc_intr(1)
tree is deprecated. Add Turing specific stall intr handler hals with
original logic to route pbus intr to mc_intr(0).
JIRA NVGPU-25
Bug 200603566
Change-Id: I36fc376800802f20a0ea581b4f787bcc6c73ec7e
Signed-off-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2354192
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Many tests used various incarnations of the mock register framework.
This was based on a dump of gv11b registers. Tests that greatly
benefitted from having generally sane register values all rely
heavily on this framework.
However, every test essentially did their own thing. This was not
efficient and has caused a some issues in cleaning up the device and
host code.
Therefore introduce a much leaner and simplified register framework.
All unit tests now automatically get a good subset of the gv11b
registers auto-populated. As part of this also populate the HAL with
a nvgpu_detect_chip() call. Many tests can now _probably_ have all
their HAL init (except dummy HAL stuff) deleted. But this does
require a few fixups here and there to set HALs to NULL where tests
expect HALs to be NULL by default.
Where necessary HALs are cleared with a memset to prevent unwanted
code from executing.
Overall, this imposes a far smaller burden on tests to initialize
their environments.
Something to consider for the future, though, is how to handle
supporting multiple chips in the unit test world.
JIRA NVGPU-5422
Change-Id: Icf1a63f728e9c5671ee0fdb726c235ffbd2843e2
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2335334
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Use the updated test types.
Other (setup) --> Other (cleanup)
Feature based --> Feature
Also, change test names to unique identifier to avoid below errors in
SWVS:
Warning doxygenfunction: Unable to resolve multiple matches for
function 'test_xxx' with arguments () in doxygen xml output for
project 'nvgpu_doxygen'...
JIRA NVGPU-4679
Change-Id: If8a6b9ec8b26c3f99bc657bce24751b0e75fabbf
Signed-off-by: tkudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2269046
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>