The common.cic unit is divided into common.cic.mon and common.cic.rm,
following the rm and mon process split.
The CIC-mon subunit contains the code used in the critical interrupt
handling path: initialization, error detection and error reporting.
The CIC-rm subunit contains the code for the rest of interrupt handling
(such as collecting error debug data from registers) and ISR status
management (status of deferred interrupts).
Split the CIC APIs and data members between these two subunits.
JIRA NVGPU-6899
Change-Id: I151b59105ff570607c4a62e974785e9c1323ef69
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2551897
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add REMAP ioctl and accompanying support to the linux nvgpu driver.
REMAP support provides per-page control over sparse VM areas using the
concept of a virtual memory pool.
The REMAP ioctl accepts a list of operations (each a map or unmap) that
modify the VM area pages tracked by the virtual memory pool.
Inclusion of REMAP support in the nvgpu build is controlled by the new
CONFIG_NVGPU_REMAP flag. This flag is enabled by default for linux builds.
A new NVGPU_GPU_FLAGS_SUPPORT_REMAP characteristics flag is added for use
in detecting when REMAP support is available.
When a VM allocation tagged with NVGPU_VM_AREA_ALLOC_SPARSE is made, the
base virtual memory pool resources are allocated. Per-page resources are
later allocated when the NVGPU_AS_IOCTL_REMAP ioctl is issued. All REMAP
resources are released when the corresponding VM area is freed.
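As a rough illustration of the operation list described above, the sketch
below applies map/unmap operations to a tiny per-page table. It does not
reproduce the nvgpu REMAP ioctl ABI: the struct layouts, the POOL_PAGES
limit and every name in it are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_PAGES 16u               /* hypothetical pool size, in pages */
#define PAGE_UNMAPPED UINT64_MAX     /* marker for a sparse (unbacked) page */

enum remap_op_type { REMAP_OP_MAP, REMAP_OP_UNMAP };

/* One entry of the operation list (hypothetical layout). */
struct remap_op {
	enum remap_op_type type;
	uint32_t virt_page;   /* page index inside the sparse VM area */
	uint64_t mem_page;    /* backing page; only read for REMAP_OP_MAP */
};

/* Per-page state tracked by the pool: which backing page, if any. */
struct vm_pool {
	uint64_t backing[POOL_PAGES];
};

/* A freshly allocated sparse area has no page backed. */
void vm_pool_init(struct vm_pool *pool)
{
	for (size_t i = 0; i < POOL_PAGES; i++) {
		pool->backing[i] = PAGE_UNMAPPED;
	}
}

/*
 * Apply the operations in order. Returns 0 on success or -1 at the first
 * out-of-range page; operations before the failing one stay applied.
 */
int vm_pool_remap(struct vm_pool *pool, const struct remap_op *ops,
		  size_t num_ops)
{
	for (size_t i = 0; i < num_ops; i++) {
		if (ops[i].virt_page >= POOL_PAGES) {
			return -1;
		}
		pool->backing[ops[i].virt_page] =
			(ops[i].type == REMAP_OP_MAP) ?
			ops[i].mem_page : PAGE_UNMAPPED;
	}
	return 0;
}
```

Mapping pages 2 and 3 and then unmapping page 2 leaves only page 3 backed.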
Jira NVGPU-6804
Change-Id: I1f2cdc0c06c1698a62640c1c6fbcb2f9db24a0bc
Signed-off-by: scottl <scottl@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2542178
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add a new Central Interrupt Controller (CIC) unit in common code.
The interrupt handling is done in a distributed manner currently.
The error handling policy for different errors resides in each unit's
ISR code. The goal is to converge this data under one central place -
the CIC unit.
This patch creates the framework for the CIC unit and moves the gv11b QNX
safety LUT to the CIC unit. All the error reporting APIs from different
units are also moved to CIC.
New APIs are exposed by CIC unit to access its internal data like:
1. Struct err_desc - the static error handling/injection data per
error id
2. Num_hw_modules - the number of error reporting HW units
supported by CIC
Init and deinit of CIC unit:
1. CIC unit should be initialized early on during boot so that it
is available for any interrupt handling.
2. Initialize CIC just before the interrupts are enabled during
boot.
3. Similarly, CIC is disabled late during deinit cycle; right
after the interrupts are masked.
LUT:
1. LUT is currently used only for reporting error to safety
services in gv11b QNX safety build.
2. This error handling policy LUT currently has only two levels
of handling - correctable and quiesce.
3. Once the error handling policy decision is moved from leaf
unit nodes to CIC, the LUT will be updated to have additional levels
like fast recovery and full recovery.
4. A separate LUT will then also be added for each platform/build.
5. In current framework, the LUT is set to NULL for all
configurations except gv11b.
A report_err() op is added to report errors to safety services.
This op is only effective for the gv11b QNX build and is set to NULL for
other configurations.
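The LUT and report_err() arrangement above can be sketched roughly as
follows. This is an illustration only: the struct layouts, the two-value
policy enum and all function names are assumptions, not the CIC unit's
actual definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* The two handling levels mentioned above. */
enum err_policy { ERR_POLICY_CORRECTABLE, ERR_POLICY_QUIESCE };

/* Hypothetical per-error descriptor stored in the LUT. */
struct err_desc_sketch {
	uint32_t err_id;
	enum err_policy policy;
};

/* Hypothetical CIC state: both members may be NULL on some builds. */
struct cic_sketch {
	const struct err_desc_sketch *lut;
	size_t lut_len;
	int (*report_err)(uint32_t err_id);
};

/* Find the descriptor for err_id; NULL if there is no LUT or no entry. */
const struct err_desc_sketch *cic_lookup(const struct cic_sketch *cic,
					 uint32_t err_id)
{
	if (cic->lut == NULL) {
		return NULL;
	}
	for (size_t i = 0; i < cic->lut_len; i++) {
		if (cic->lut[i].err_id == err_id) {
			return &cic->lut[i];
		}
	}
	return NULL;
}

/* Report an error; a NULL op makes this a no-op that returns 0. */
int cic_report(const struct cic_sketch *cic, uint32_t err_id)
{
	if (cic->report_err == NULL) {
		return 0;
	}
	if (cic_lookup(cic, err_id) == NULL) {
		return -1;
	}
	return cic->report_err(err_id);
}

/* Sample op used for illustration: it only counts the reports. */
uint32_t reported_count;

int count_report(uint32_t err_id)
{
	(void)err_id;
	reported_count++;
	return 0;
}
```

Checking the op for NULL inside a single wrapper keeps callers free of
per-configuration conditionals.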
NVGPU-6521
NVGPU-6523
NVGPU-6750
NVGPU-6758
NVGPU-6760
NVGPU-6754
Change-Id: I24be7836a96d787741e37b732e19863ed8014635
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2518683
Reviewed-by: Ajesh K V <akv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, there are a few chip-specific errata present in nvgpu code.
For better traceability of the errata and corresponding fixes,
introduce flags to indicate existing errata on a chip. These flags
decide if a corresponding solution is applied to the chip(s).
This patch introduces the following functions to handle errata flags:
- nvgpu_init_errata_flags
- nvgpu_set_errata
- nvgpu_is_errata_present
- nvgpu_print_errata_flags
- nvgpu_free_errata_flags
nvgpu_print_errata_flags: prints the following details of the errata present in a chip:
1. errata flag name
2. chip where the errata was first discovered
3. short description of the errata
Flags corresponding to errata present in a chip are set during the chip HAL
init sequence.
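A minimal sketch of how such errata flags might be kept is shown below,
assuming a simple bitmask. The signatures of the real nvgpu_*_errata
functions are not given here, so the sketch_* names, the table contents
and the struct are placeholders.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Placeholder errata identifiers; real flag names are not shown here. */
enum errata_id { ERRATA_EXAMPLE_A, ERRATA_EXAMPLE_B, ERRATA_COUNT };

/* The three details printed per errata: name, first chip, description. */
struct errata_info {
	const char *name;
	const char *chip;
	const char *desc;
};

const struct errata_info errata_table[ERRATA_COUNT] = {
	[ERRATA_EXAMPLE_A] = { "ERRATA_EXAMPLE_A", "chip-x", "placeholder A" },
	[ERRATA_EXAMPLE_B] = { "ERRATA_EXAMPLE_B", "chip-y", "placeholder B" },
};

/* Hypothetical per-chip state: one bit per errata. */
struct chip_sketch {
	uint32_t errata_flags;
};

void sketch_set_errata(struct chip_sketch *chip, enum errata_id id,
		       bool present)
{
	if (present) {
		chip->errata_flags |= (1u << id);
	} else {
		chip->errata_flags &= ~(1u << id);
	}
}

bool sketch_is_errata_present(const struct chip_sketch *chip,
			      enum errata_id id)
{
	return (chip->errata_flags & (1u << id)) != 0u;
}

/* Print every errata that is set; returns how many were printed. */
int sketch_print_errata_flags(const struct chip_sketch *chip)
{
	int printed = 0;

	for (int id = 0; id < ERRATA_COUNT; id++) {
		if (sketch_is_errata_present(chip, (enum errata_id)id)) {
			printf("%s (first seen on %s): %s\n",
			       errata_table[id].name,
			       errata_table[id].chip,
			       errata_table[id].desc);
			printed++;
		}
	}
	return printed;
}
```

A chip init routine would call sketch_set_errata() once per applicable
errata, and workaround code would guard itself with
sketch_is_errata_present().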
JIRA NVGPU-6510
Change-Id: Id5a8fb627222ac0a585aba071af052950f4de965
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2498095
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
- moved reg fields to gk20a
- added os abstract register accessor in nvgpu/io.h
- defined linux register access abstract implementation
- hook up with posix. The posix implementation of the register accessor
uses the high 4 bits of the address to identify the register aperture and
then calls the corresponding callbacks.
This helps to unify code across OSes.
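The aperture dispatch described above could look roughly like this,
assuming a 32-bit register address whose top 4 bits select one of 16
apertures. The type and function names are hypothetical and only the read
path is shown.

```c
#include <stddef.h>
#include <stdint.h>

#define APERTURE_SHIFT 28u          /* top 4 bits of a 32-bit address */
#define APERTURE_COUNT 16u
#define OFFSET_MASK    0x0fffffffu  /* remaining 28 bits: offset */

typedef uint32_t (*reg_read_fn)(uint32_t offset);

/* One read callback per aperture; NULL means nothing is registered. */
struct reg_apertures {
	reg_read_fn read[APERTURE_COUNT];
};

/* Route a read to the callback of the aperture encoded in the address. */
uint32_t sketch_readl(const struct reg_apertures *ap, uint32_t addr)
{
	reg_read_fn fn = ap->read[addr >> APERTURE_SHIFT];

	/* Unregistered apertures read as zero in this sketch. */
	return (fn != NULL) ? fn(addr & OFFSET_MASK) : 0u;
}

/* Sample callback for illustration: returns the offset it was given. */
uint32_t echo_offset(uint32_t offset)
{
	return offset;
}
```

With echo_offset registered for aperture 1, a read of 0x10000010 is routed
to it with offset 0x10.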
Bug 2999617
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Change-Id: Ifcb737e4b4d5b1d8bae310ae50b1ce0aa04f750c
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2497937
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The simulator ring buffer DMA interface supports buffers of the following sizes:
4, 8, 12 and 16K. At present, it is configured to 4K and it happens to match
with the kernel PAGE_SIZE, which is used to wrap back the GET/PUT pointers once
4K is reached. However, this is not always true; for instance, take 64K pages.
Hence, replace PAGE_SIZE with SIM_BFR_SIZE.
Introduce the macro NVGPU_CPU_PAGE_SIZE, which aliases to PAGE_SIZE, and
replace the latter with the former.
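The wrap-around issue can be illustrated with a small sketch. The value
given to SIM_BFR_SIZE below is an assumption based on the 4K configuration
mentioned above, and sim_advance() is a hypothetical helper rather than the
driver's code.

```c
#include <stdint.h>

#define SIM_BFR_SIZE 4096u   /* assumed: the 4K buffer configuration */

/* Advance a GET/PUT pointer and wrap it at the buffer size. */
uint32_t sim_advance(uint32_t ptr, uint32_t nbytes)
{
	return (ptr + nbytes) % SIM_BFR_SIZE;
}
```

Advancing 16 bytes from offset 4088 wraps to 8. Wrapping at a 64K
PAGE_SIZE instead would give 4104, which lies outside the 4K buffer.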
Bug 200658101
Jira NVGPU-6018
Change-Id: I83cc62b87291734015c51f3e5a98173549e065de
Signed-off-by: Antony Clince Alex <aalex@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2420728
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
* Removed the unnecessary irqs_enabled flag and replaced the
enable/disable irq logic with the nvgpu variant functions.
* Added the nvgpu_interrupts data structure to hold interrupt details.
* Interpret all stall irqs from the DT first, followed by the nonstall irq.
* Used interrupt size checks for enabling/disabling irqs instead of
comparing stall and nonstall interrupt lines.
Adding new stall interrupt lines is now as easy as updating a macro.
Jira NVGPU-6019
Change-Id: I5a5eaa8d333c68ee87d25d2b45ec244ec8d7b297
Signed-off-by: Sagar Kadamati <skadamati@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400777
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
The max values that the Linux nvhost driver tracks are adding some
complexity to our wrapper APIs. Max values are used only for internal
submit syncpoint tracking, so implement that tracking in the sync code
by just storing the last value that the syncpoint will reach after all
jobs are complete.
The value is a simple u32. It's accessed from functions in the submit
path that already is serialized, so there's no worrying about atomic
modifications.
Previously nvhost_syncpt_set_min_eq_max_ext() was used to reset the
syncpoint when necessary. Now with the internal max value we'll use
nvhost_syncpt_set_minval(), so add a wrapper for it.
The maxval reported with the user syncpoint allocation is just the
current value at allocation time since no jobs have affected it yet;
there is no means for the kernel to track the max value of user
syncpoints.
Jira NVGPU-5506
Change-Id: I34672eaa7fe3af36b2fbac92d11babe2bc6a2d2b
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2400635
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Not sure if there's an actual bug or JIRA filed for this, but the
change here fixes a long standing bug in the MM code for unit tests.
The GMMU programming code verifies that the CPU _physical_ address
programmed into the GMMU PDE0 is a valid Tegra SoC CPU physical
address. That means that it's not too large a value.
The POSIX implementation of the nvgpu_mem related code used the CPU
virtual address as the "phys" address. Obviously, in userspace,
there's no access to physical addresses, so in some sense it's a
meaningless function. But the GMMU code does care, as described
above, about the format of the address.
The fix is simple enough: since the nvgpu_mem_get_addr() and
nvgpu_mem_get_phys_addr() values shouldn't actually be accessed by
the driver anyway (they could be vidmem addresses or IOVA addresses
in real life), ANDing them with 0xffffffff (i.e. 32 bits) truncates
the potentially problematic CPU virtual address bits returned by
malloc() in the POSIX environment.
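The masking step can be sketched as below. The helper name is hypothetical;
only the AND with 0xffffffff comes from the description above.

```c
#include <stdint.h>

/*
 * Derive a fake "physical" address from a CPU virtual address by keeping
 * only the low 32 bits, so the result always fits in 32 bits.
 */
uint64_t posix_fake_phys_addr(const void *cpu_va)
{
	return (uint64_t)(uintptr_t)cpu_va & 0xffffffffULL;
}
```

Whatever upper bits a userspace pointer carries, the returned value never
exceeds 0xffffffff.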
With this, a run of the unit test framework passes for me locally
on my Ubuntu 18 machine.
Also, clean up a few whitespace issues I noticed while I debugged
this and fix another long standing bug where the
NVGPU_DEFAULT_DBG_MASK was not being copied to g->log_mask during
gk20a struct init.
Change-Id: Ie92d3bd26240d194183b4376973d4d32cb6f9b8f
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395953
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
This patch turns nvgpu_assert into a macro so that it prints information
about the calling function, specifically the function name and the
line number.
This patch introduces MISRA violations (misra_c_2012_rule_10_1_violation)
in nvgpu_assert(). However, leaving these MISRA violations unfixed has low
safety impact, since they occur only after a fatal error is hit, where the
GPU driver is not expected to be serviceable thereafter.
Further, this patch provides a debug benefit in quickly finding the
function that led to the exit of the NvGPU process.
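The reason for using a macro is that __func__ and __LINE__ then expand at
the call site rather than inside a shared helper. A minimal sketch follows,
using a hypothetical sketch_assert name and abort() as a stand-in for
whatever the driver does on a fatal error.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Because this is a macro, __func__ and __LINE__ name the caller, not a
 * common assert helper function.
 */
#define sketch_assert(cond)                                             \
	do {                                                            \
		if (!(cond)) {                                          \
			fprintf(stderr, "%s:%d: assertion failed: %s\n", \
				__func__, __LINE__, #cond);             \
			abort();                                        \
		}                                                       \
	} while (0)
```

A call such as sketch_assert(ptr != NULL) that fails inside a function
named foo() would print foo and the line number of that call before
aborting.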
Bug 2964898
Change-Id: Iba85f4a9226742a0bb08b045bcbfa26949bbe746
Signed-off-by: Rajesh Devaraj <rdevaraj@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2342086
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Ankur Kishore <ankkishore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, nvgpu_writel_loop() writes to a register and immediately
checks if register value is updated. It might take some time for
hardware registers to get updated with value written by software.
Modify nvgpu_writel_loop() to accept a number of retries for checking
whether the register value is updated, and assert with nvgpu_assert().
Also, move nvgpu_writel_loop() to common code and use generic
nvgpu_readl() and nvgpu_writel() APIs.
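A bounded write-and-verify loop of this kind might be sketched as follows.
The fake register model and all names are hypothetical, the write is
repeated on every try as one possible choice, and the sketch returns a
bool where the change described above asserts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fake register that only shows a written value after `delay` reads. */
struct fake_reg {
	uint32_t value;     /* value currently visible to reads */
	uint32_t pending;   /* last value written */
	uint32_t delay;     /* reads that still return the old value */
};

void fake_writel(struct fake_reg *reg, uint32_t v)
{
	reg->pending = v;
}

uint32_t fake_readl(struct fake_reg *reg)
{
	if (reg->delay > 0u) {
		reg->delay--;
		return reg->value;
	}
	reg->value = reg->pending;
	return reg->value;
}

/*
 * Write v and read it back, trying at most max_tries times.
 * Returns true once the read-back matches, false if it never does.
 */
bool writel_loop_sketch(struct fake_reg *reg, uint32_t v, uint32_t max_tries)
{
	for (uint32_t i = 0u; i < max_tries; i++) {
		fake_writel(reg, v);
		if (fake_readl(reg) == v) {
			return true;
		}
	}
	return false;
}
```

With a delay of two reads, three tries succeed and two do not.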
JIRA NVGPU-5490
Change-Id: Iaaf24203a91eee3d05de7d0c7dea18113367de5f
Signed-off-by: Vedashree Vidwans <vvidwans@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2348628
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Split the max value increment and syncpt interrupt registration out
of nvgpu_channel_sync_incr*(). This API is called in the submit path to
prepare buffers and tracking resources, but later on in the submit path
errors can still occur so that the increment wouldn't happen (unless
artificially forced by sw).
The increment and irq registration cannot easily be undone and it makes
more sense to do these at the moment when the prepared job is finally
ready, so add a new nvgpu_channel_sync_mark_progress() API to be called
later in the submit path to signal that progress shall eventually happen
on the sync. Without this, the max value would stay too large after an
unsuccessful submit until the channel gets closed.
The sync object (syncpt or semaphore) is always exclusively owned by the
channel that allocated it, so nonatomically reading the max value first
in sync_incr() and incrementing it later in mark_progress() is race-free;
all submits per channel are serialized.
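The two-step flow can be sketched as below: the prepare step only reads
the max value to compute a threshold, and the commit step increments it
once the job is really going to run. The struct and function names are
hypothetical simplifications of the sync code.

```c
#include <stdint.h>

/* Hypothetical per-channel sync state: only the tracked max value. */
struct sync_sketch {
	uint32_t max_val;
};

/* Prepare: compute the fence threshold without changing any state. */
uint32_t sync_incr_prepare(const struct sync_sketch *sync)
{
	return sync->max_val + 1u;
}

/* Commit: called late in the submit path, when the job will execute. */
void sync_mark_progress(struct sync_sketch *sync)
{
	sync->max_val += 1u;
}
```

If a submit fails between the two calls, max_val is simply left untouched
and nothing has to be undone.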
Change the channel syncpoint to client managed from host managed so that
nvhost-exported sync fences behave correctly with the temporary state
where the fence threshold is over the max value. Ideally we'd always
track nvgpu-owned syncpts' max values internally, but this is enough for
now.
Jira NVGPU-5491
Change-Id: Idf0bda7ac93d7f2f114cdeb497fe6b5369d21c95
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2340465
Reviewed-by: automaticguardword <automaticguardword@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Many tests used various incarnations of the mock register framework.
This was based on a dump of gv11b registers. Tests that greatly
benefitted from having generally sane register values all rely
heavily on this framework.
However, every test essentially did its own thing. This was not
efficient and has caused some issues in cleaning up the device and
host code.
Therefore introduce a much leaner and simplified register framework.
All unit tests now automatically get a good subset of the gv11b
registers auto-populated. As part of this also populate the HAL with
a nvgpu_detect_chip() call. Many tests can now _probably_ have all
their HAL init (except dummy HAL stuff) deleted. But this does
require a few fixups here and there to set HALs to NULL where tests
expect HALs to be NULL by default.
Where necessary HALs are cleared with a memset to prevent unwanted
code from executing.
Overall, this imposes a far smaller burden on tests to initialize
their environments.
Something to consider for the future, though, is how to handle
supporting multiple chips in the unit test world.
JIRA NVGPU-5422
Change-Id: Icf1a63f728e9c5671ee0fdb726c235ffbd2843e2
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2335334
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Enable logging and error reporting for MIF, DLPL, and TLC blocks.
Configure the NVLIPT and IOCTRL interrupt registers to roll up
the MIF and TLC errors on the link-specific fatal line and the
DLPL interrupts on the link-specific intr_a (fatal) line. Both
link_err_fatal and link_intr_a are rolled up to the stall interrupt line.
In the handling ISR, clear the interrupt status registers and print
an error.
Move the interrupt handling HAL code to /common/hal.
JIRA NVGPU-4350
JIRA NVGPU-4351
JIRA NVGPU-5231
JIRA NVGPU-4354
JIRA NVGPU-4355
JIRA NVGPU-4356
Change-Id: I14812499caf506592f3ae84d6681d857730d31ff
Signed-off-by: Tejal Kudav <tkudav@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2313221
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
nvgpu_has_syncpoints is more general than channel synchronization,
so move it from channel_sync.c to nvhost.c. Move the
declaration from gk20a.h to nvhost.h.
As the debugfs knob is Linux related, move it from struct gk20a to
struct nvgpu_os_linux.
Jira NVGPU-4548
Change-Id: I4236086744993c3daac042f164de30939c01ee77
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2318814
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
QNX unit tests access ucode from /proc/boot/gv11b, which causes issues
such as permission and platform dependency problems. To fix this, update
the QNX firmware unit to read ucode from firmware/gv11b when running unit
tests. This patch also updates the firmware access path for POSIX.
Jira NVGPU-3582
Bug 2693908
Change-Id: I1b28c8475b6bc4fe5ec3d6a525cb3af152feb887
Signed-off-by: Prateek sethi <prsethi@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2306278
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: Nicolas Benech <nbenech@nvidia.com>
Reviewed-by: Dinesh T <dt@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This fixes the following MISRA violation:
MISRA 5.1:
Declaration with identical names.
The first 31 characters of identifiers
"nvgpu_nvhost_syncpt_unit_interface_get_aperture" and
"nvgpu_nvhost_syncpt_unit_interface_get_byte_offset" are identical.
JIRA NVGPU-4811
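The violation described above can be reproduced with a small check that
compares the first 31 characters of two identifiers. The helper name is
hypothetical and the 31-character limit is taken from the message above.

```c
#include <stdbool.h>
#include <string.h>

#define SIGNIFICANT_CHARS 31u   /* limit quoted in the violation text */

/*
 * Two different identifiers collide when a tool that only looks at the
 * first SIGNIFICANT_CHARS characters cannot tell them apart.
 */
bool idents_collide(const char *a, const char *b)
{
	return strcmp(a, b) != 0 &&
	       strncmp(a, b, SIGNIFICANT_CHARS) == 0;
}
```

Both names quoted in the violation share the 31-character prefix
nvgpu_nvhost_syncpt_unit_interf, so the check reports a collision for them.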
Change-Id: Ib862c4acd53cf748b47c1edffa91b5f033c08953
Signed-off-by: Dinesh <dt@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2298136
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Vaibhav Kachore <vkachore@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>