Add NVGPU_IOCTL_CHANNEL_PREEMPT_NEXT ioctl to check host and FECS status
and preempt pending load of context not belonging to the calling channel on
GR engine during context switch. This should be called after a submit with
NVGPU_SUBMIT_GPFIFO_FLAGS_RESCHEDULE_RUNLIST to decrease worst case submit
to start latency for high interleave channel.
There is less than 0.002% chance that the ioctl blocks up to couple
miliseconds due to race condition of FECS status changing while being read.
Also fix bug with host reschedule for multiple runlists which needs to write
both runlist registers.
Bug 1987640
Bug 1924808
Change-Id: I0b7e2f91bd18b0b20928e5a3311b9426b1bf1848
Signed-off-by: David Li <davli@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1549598
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add these bits in the gpu characteristics flags:
NVGPU_GPU_FLAGS_SUPPORT_DETERMINISTIC_SUBMIT_NO_JOBTRACKING - fast
submits with no in-kernel job tracking are supported.
NVGPU_GPU_FLAGS_SUPPORT_DETERMINISTIC_SUBMIT_FULL - deterministic
submits also with job tracking and num_inflight_jobs set are supported.
Either of these may get disabled if the particular channel or submit
still requires features that block these.
Make gk20a_channel_sync_needs_sync_framework() take a gk20a pointer
instead of a channel pointer so that it can be called without a channel.
It does not need any per-channel data.
Bug 20029130
Bug 200274674
Change-Id: I5f82510b6d39b53bcf6f1006dd83bdd9053963a0
Signed-off-by: Konsta Holtta <kholtta@nvidia.com>
Reviewed-on: http://git-master/r/1456845
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
(cherry picked from commit ee9733e587 in
dev-kernel)
Reviewed-on: https://git-master.nvidia.com/r/1558993
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bibek Basu <bbasu@nvidia.com>
NVGPU_SUBMIT_GPFIFO_FLAGS_RESCHEDULE_RUNLIST causes host to expire
current timeslice and reschedule from front of runlist.
This can be used with NVGPU_RUNLIST_INTERLEAVE_LEVEL_HIGH to make a
channel start sooner after submit rather than waiting for natural
timeslice expiration or block/finish of currently running channel.
Bug 1968813
Change-Id: I632e87c5f583a09ec8bf521dc73f595150abebb0
Signed-off-by: David Li <davli@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/1537218
GVS: Gerrit_Virtual_Submit
Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
In order to perform timestamps correlation for FECS
traces, we need to collect GPU / GPU timestamps
samples. In virtualization case, it is possible for
a guest to get GPU timestamps by using read_ptimer.
However, if the CPU timestamp is read on guest side,
and the GPU timestamp is read on vm-server side,
then it introduces some latency that will create an
artificial offset for GPU timestamps (~2 us in
average). For better CPU / GPU timestamps correlation,
Added a command to collect all timestamps on vm-server
side.
Bug 1900475
Change-Id: Idfdc6ae4c16c501dc5e00053a5b75932c55148d6
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1472447
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Aparna Das <aparnad@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Support for hwpm reservations in the virtual case:
- Add session ops for checking and setting global and context reservations, and
releasing reservations
- in the native case, these just update reservation counts and flags
- in the vgpu case, when the reservation count is 0, check with the RM server
that a reservation is possible: for global reservations, no other guest
can have a reservation; for context reservations, no other guest can have
a global reservation
- in the vgpu case, when the reservation count is decremented to 0, notify
the RM server that the guest no longer has any reservations
Bug 1775465
JIRA VFND-3428
Change-Id: Idf115b730e465e35d0745c96a8f8ab6b645c7cae
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1326166
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently callbacks from the PM_QOS framework (for
thermal events), result in a RPC call to set GPU frequency.
Since the governor will now be responsible for setting desired
rate, the max PM_QOS callback will now cap the possible
GPU frequency w/ a new RPC call to the server. The server
is responsible for setting the ultimate frequency
based on the cap & desired rates.
Jira VFND-3699
Change-Id: I806e309c40abc2f1381b6a23f2d898cfe26f9794
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1295543
(cherry picked from commit e81693c6e087f8f10a985be83715042fc590d6db)
Reviewed-on: http://git-master/r/1282467
(cherry picked from commit 7b4e0db647572e82a8d53e823c36b465781f4942)
Reviewed-on: http://git-master/r/1321836
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add devfreq governor support in order to allow frequency scaling
in virtualization config. GPU clock frequency operations are
re-directed to the server over RPC.
Bug 200237433
Change-Id: I1c8e565a4fff36d3456dc72ebb20795b7822650e
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1295542
(cherry picked from commit d5c956fc06697eda3829c67cb22987e538213b29)
Reviewed-on: http://git-master/r/1280968
(cherry picked from commit 25e2b3cf7cb5559a6849c0024d42c157564a9be2)
Reviewed-on: http://git-master/r/1321835
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Query the RM server to retrieve gpu preemption policy
of guest based on pct configuration. If guest is not
allowed to request wfi preemption mode then set context
with either gfxp or cta preemption mode only.
Jira VFND-3079
Jira VFND-3081
Change-Id: I60cbf121d6f0e2373568cf40b3dfdb4df76fe02d
Signed-off-by: Aparna Das <aparnad@nvidia.com>
Reviewed-on: http://git-master/r/1280903
(cherry picked from commit 16ee09bb59)
Reviewed-on: http://git-master/r/1321661
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add support for clearing single SM error state for CUDA debugger.
In addition to clearing local copy of SM error state,
vgpu_gr_clear_sm_error_state now sends a command to RM server
(TEGRA_VGPU_CMD_CLEAR_SM_ERROR_STATE), to clear global ESR and
warp ESR.
Bug 1791111
Change-Id: I3a1f0644787fd900ec59a0e7974037d46a603487
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1296311
(cherry picked from commit fd07e03c3d086f396e4d65575c576a4dd68c920a)
Reviewed-on: http://git-master/r/1299060
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Cory Perry <cperry@nvidia.com>
Tested-by: Cory Perry <cperry@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add ability to suspend/resume contexts for a debug session
(NVGPU_DBG_GPU_IOCTL_SUSPEND_RESUME_CONTEXTS), in virtualized
case:
- added hal function to resume contexts.
- added vgpu support for suspend contexts, i.e. build a list
of channel ids, and send TEGRA_VGPU_CMD_SUSPEND_CONTEXTS
- added vgpu support for resume contexts, i.e. build a list
of channel ids, and send TEGRA_VGPU_CMD_RESUME_CONTEXTS
Bug 1791111
Change-Id: Icc1c00d94a94dab6384ac263fb811c00fa4b07bf
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1294761
(cherry picked from commit d17a38eda312ffa92ce92e5bafc30727a8b76c4e)
Reviewed-on: http://git-master/r/1299059
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Cory Perry <cperry@nvidia.com>
Tested-by: Cory Perry <cperry@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Remove nvgpu_gpuid_t18x.h since this file is now visible. Migrate
the relevant definitions and defines into their expected places and
make the code use the real defines. No longer is hiding t18x specific
stuff necessary.
Bug 1799159
Change-Id: I47fa2392e46fdb7aacc70aeb0cc8c3f5ca0dc22f
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: http://git-master/r/1300976
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Remove MCLK and GPCCLK domain aliases, now that userspace
has swithed to new enumerations.
Jira DNVGPU-211
Change-Id: I2af2fd67dbed47088d7161ba0605e13dd7c674a5
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1292609
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: svccoveritychecker <svccoveritychecker@nvidia.com>
This ioctl can be used on gp10b to set a flag in the context header
indicating this context should be run at elevated clock
frequency. FECS ctxsw ucode will read this flag as part of the context
switch and will request higher GPU clock frequencies from BPMP for the
duration of the context execution.
Bug 1819874
Change-Id: I84bf580923d95585095716d49cea24e58c9440ed
Signed-off-by: Peter Boonstoppel <pboonstoppel@nvidia.com>
Reviewed-on: http://git-master/r/1292746
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Guest doesn't explicitly send command to the RM server
to invalidate tlb which is done implicitly when mapping
or unmapping buffer. Remove support for this call.
Bug 1665111
Change-Id: Icf2edae7feffa35b1dbf87c227b3e98b506e6519
Signed-off-by: Aparna Das <aparnad@nvidia.com>
Reviewed-on: http://git-master/r/1287728
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GPC2CLK has been replaced with GPCCLK on user API.
Remove related definition from kernel API.
GPCLCK and MCLK are currently assigned EQU values in kernel API.
We want to move to a simple enumeration as used in nvrm_gpu.
During the transition, an alias value will be defined for each
clock, and kernel will accept both.
Jira DNVGPU-210
Jira DNVGPU-211
Change-Id: I944fe78be9f810279f7a69964be7cda9b9c8d40d
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1292593
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fields added to nvgpu_gpu_characteristics must be before the
/* Notes: .. */ section. Otherwise, there is a possibility that
cherry-picks for new fields actually go before "Notes" and that
breaks binary compatibility.
Jira DNVGPU-186
Change-Id: Idcd5100be357c187e7194d4c9577f85e12541053
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1284324
Reviewed-by: Sami Kiminki <skiminki@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Take interrupts as one kind of event message, and make it
easier to add new kind of events.
JIRA VFND-3291
Bug 200257899
Change-Id: I83482293230c0aa10b05caf61e249a042bf6653c
Signed-off-by: Richard Zhao <rizhao@nvidia.com>
Reviewed-on: http://git-master/r/1278396
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Aingara Paramakuru <aparamakuru@nvidia.com>
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Internally we use GPC2CLK in the arbiter, but we should expose
GPCCLK on kernel API and in user space. Added GPCCLK on the ioctl
API. Arbiter uses GPC2CLK to make queries, then converts to GPCCLK.
Jira DNVGPU-210
Change-Id: Id0b8134d0505c1f9bfd655a08e902bdcd03ebd96
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1280316
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Separate out new API gr_gp10b_set_ctxsw_preemption_mode()
which will check requested preemption modes and take appropriate
action for each preemption mode
This API will also do some sanity checking for valid
preemption modes and combinations
Define API set_preemption_mode() for gp10b which will set the
preemption modes passed as argument and then use
gr_gp10b_set_ctxsw_preemption_mode() and
update_ctxsw_preemption_mode() to update preemption mode
Legacy path from gr_gp10b_alloc_gr_ctx() will convert
flags NVGPU_ALLOC_OBJ_FLAGS_* into appropriate preemption modes
and then call gr_gp10b_set_ctxsw_preemption_mode()
New API set_preemption_mode() will use new flags
NVGPU_GRAPHICS/COMPUTE_PREEMPTION_MODE_* and set and update
ctxsw preemption mode
In gr_gp10b_update_ctxsw_preemption_mode(), update graphics
context to set CTA premption mode if mode
NVGPU_COMPUTE_PREEMPTION_MODE_CTA is set
Also, define preemption modes in nvgpu-t18x.h
and use them everywhere
Remove old definitions of modes from gr_gp10b.h
Bug 1646259
Change-Id: Ib4dc1fb9933b15d32f0122a9e52665b69402df18
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1131806
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
include/linux/tegra_vgpu_t18x.h was missed while spliting the nvgpu driver
off. This patch imports it into the nvgpu repo.
bug 200187033
Signed-off-by: Alex Van Brunt <avanbrunt@nvidia.com>
Change-Id: Ia46384241bd0e24a8a560c3b13b5fd3523c9cc68
Reviewed-on: http://git-master/r/1119779
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
GVS: Gerrit_Virtual_Submit
This change adds gp10b to the nvgpu build as
well as enabling CMA for buffer allocation.
Change-Id: Id3d45ad6ffdab14120395952e68b285dd7364c76
Signed-off-by: Ken Adams <kadams@nvidia.com>
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/553324
GVS: Gerrit_Virtual_Submit
Process long regops lists in 4-kB fragments, overcoming the overly
low limit of 128 reg ops per IOCTL call. Bump the list limit to 1024
and report the limit in GPU characteristics.
Bug 200248726
Change-Id: I3ad49139409f32aea8b1226d6562e88edccc8053
Signed-off-by: Sami Kiminki <skiminki@nvidia.com>
Reviewed-on: http://git-master/r/1253716
(cherry picked from commit 22314619b28f52610cb8769cd4c3f9eb01904eab)
Reviewed-on: http://git-master/r/1266652
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Add HAL interface for TSG open, which is intended to
be called from the exisiting gk20a_tsg_open function.
The tsg_open entryoint is only implemented for vgpu,
as the server needs to clear metadata when a tsg is opened.
Bug 200215060
Change-Id: Icc8fd602f31e52d9fa9b2e7786b665b9e7b9294e
Signed-off-by: Sachit Kadle <skadle@nvidia.com>
Reviewed-on: http://git-master/r/1249218
(cherry picked from commit 35c86f7c796c6574d3dc336e20012ea5c16d7cb4)
Reviewed-on: http://git-master/r/1256468
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Update vsms_mapping ioctl to copy from the internal sm_to_cluster array to
new nvgpu_gpu_vsms_mapping_entry array before copying the latter back to user.
Bug 200260086
Change-Id: I0fccc6fb6e0d6b6f737b3a44818d2b47438cd3c8
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1266174
(cherry picked from commit e28882c05491cb8f9573ff71c2d7309e5714e385)
Reviewed-on: http://git-master/r/1269623
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
use CPU timestamp in nanoseconds
define last event/alarm number
Jira DNVGPU-186
Change-Id: I769c8a7a41ac1fb49234f0d5144a78fa657ec230
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1260799
(cherry picked from commit 379171b43cb20d7a31b3966cad3696525e8cf7d9)
Reviewed-on: http://git-master/r/1267159
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Install one completion fd per SET request.
Notifications on dedicated event fd.
Changed frequencies unit to Hz from MHz.
Remove sequence numbers from dummy arbiter.
Added effective clock type (query frequency from counters).
Jira DNVGPU-125
Change-Id: Ica364eccdf85b188fd208f770e4eae0e9f0379e9
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1230224
(cherry picked from commit f9b06686c090c676e60e1e137fdc9bbfc76d4843)
Reviewed-on: http://git-master/r/1243109
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Added NVGPU_GPU_CLK_FLAG_SPECIFIC_DOMAINS to indicate that a
request (get clock info/range) applies only to domains specified
in clock entries. If flag is not set, request returns all clock
domains.
Jira DNVGPU-125
Change-Id: I11bffbdf491ebffa7f47bd327037b0b8cfcbde31
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1227998
(cherry picked from commit 7613dd30e120a82d342da402b4e0b070512dddad)
Reviewed-on: http://git-master/r/1243108
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add ioctls for clock range and VF points query.
Add ioctls to set target mhz, and get actual mhz.
Jira DNVGPU-125
Change-Id: I7639789bb15eabd8c98adc468201dba3a6e19ade
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: http://git-master/r/1223473
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
(cherry picked from commit 5e635ae34221c99a739321bcfc1418db56c1051d)
Reviewed-on: http://git-master/r/1243107
Reviewed-by: Automatic_Commit_Validation_User
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>
Add IOCTL API NVGPU_DBG_GPU_IOCTL_ACCESS_FB_MEMORY
to read/write fb/vidmem memory
Interface will accept dmabuf_fd of the buffer in vidmem,
offset into the buffer to access, temporary buffer
to copy data across API, size of read/write and
command indicating either read or write operation
API will first parse all the inputs, and then call
gk20a_vidbuf_access_memory() to complete fb access
gk20a_vidbuf_access_memory() will then just use
gk20a_mem_rd_n() or gk20a_mem_wr_n() depending
on the command issued
Bug 1804714
Jira DNVGPU-192
Change-Id: Iba3c42410abe12c2884d3b603fa33d27782e4c56
Signed-off-by: Deepak Nibade <dnibade@nvidia.com>
Reviewed-on: http://git-master/r/1255556
(cherry picked from commit 2c49a8a79d93fc526adbf6f808484fa9a3fa2498)
Reviewed-on: http://git-master/r/1260471
GVS: Gerrit_Virtual_Submit
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
Fix small problems related to signed versus unsigned comparisons
throughout the driver. Bump up the warning level to prevent such
problems from occuring in future.
Change-Id: I8ff5efb419f664e8a2aedadd6515ae4d18502ae0
Signed-off-by: Terje Bergstrom <tbergstrom@nvidia.com>
Reviewed-on: http://git-master/r/1252068
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This reverts commit 5f1c2bc27f.
Added back now that matching RM server has been updated:
In hypervisor mode, all GPU VA allocations must be done by client;
fix this for the allocation of the hwpm ctxt buffer
Bug 200231611
Change-Id: Ie5ce2c2562401b1f00821231d37608e3fc30d4a4
Signed-off-by: Peter Daifuku <pdaifuku@nvidia.com>
Reviewed-on: http://git-master/r/1252138
GVS: Gerrit_Virtual_Submit
Reviewed-by: Terje Bergstrom <tbergstrom@nvidia.com>