Following changes are added to fix the issue.
1) Threads having higher priority e.g. RT may preempt
threads with sched-normal priority. As a consequence, higher priority
threads might not still see initialization of data in another thread
resulting in failures such as accessing a condition value before initialization.
Any initialization in the parent thread must be accompanied by a barrier
to make it visible in other thread. Added appropriate barriers to prevent
reordering of the initialization in the thread construction path.
2) There is a race condition between nvgpu_cond_signal() and
nvgpu_cond_destroy() in the asynchronous submit code and corresponding
worker thread's process_item callback for NVS. This may lead to
data corruption and resulting in the above errors as well. Fixed
that by adding a refcount based mechanism for ownership sharing
of the struct nvgpu_nvs_worker_item between the two threads.
Bug 3778235
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: Ie9b9ba57bc1dcbb8780801be79863adc39690f72
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2771535
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: Ketan Patil <ketanp@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
In linux threaded interrupts run with a Realtime priority
of 50. This bumps up the priority of bottom-half handlers
over regular kernel/User threads even during process
context.
In the current implementation scheduler thread still
runs in normal kernel thread priority. In order to
allow a seamless scheduling experience, the worker
thread is now created with a Realtime priority of 1.
This allows for the Worker thread to work at a priority
lower than interrupt handlers but higher than the regular
kernel threads.
Linux kernel allows setting priority with the help of
sched_set_fifo() API. Only two modes are supported
i.e. sched_set_fifo() and sched_set_fifo_low().
For more reference, refer to this article
https://lwn.net/Articles/818388/.
Added an implementation of nvgpu_thread_create_priority()
for linux thread using the above two APIs.
Jira NVGPU-860
Signed-off-by: Debarshi Dutta <ddutta@nvidia.com>
Change-Id: I0a5a611bf0e0a5b9bb51354c6ff0a99e42e76e2f
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2751736
Reviewed-by: Prateek Sethi <prsethi@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit <buildbot_gerritrpt@nvidia.com>
nvs worker thread is created on each resume and deinitialized on every
suspend. nvgpu can be resumed when process is getting killed. Thread
creation can fail when the process is getting killed. That will lead
to driver resume failure.
To avoid the issue above, don't stop the nvs worker thread in suspend
and let the first created thread handle the nvs work always.
Deinitialize the nvs worker thread during nvgpu unload.
Also, log the error returned by nvgpu_thread_create in the function
nvgpu_worker_start.
bug 3480192
Change-Id: I8d5d9e7716a950b162cc3c2d9fcfde07c4edfcf6
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2646218
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
As the docs of nvgpu_thread say, each thread (which the worker loop is)
should wake up and check also nvgpu_thread_should_stop() to manage
graceful and quick exit as requested. The loop does have that check
already, but the workqueue condition does not, so the cond wait might
end up waiting until its timeout hits.
It's not robust to trust the worker users to have a swift timeout for
exiting the thread, so read the should-stop flag in the wakeup condition
too.
Simplify the clk arb worker ops now that calling
nvgpu_worker_should_stop from there is no longer necessary. (Other
worker users did not have those, so they were technically buggy.)
Change-Id: I5409b8037564d4b6445a15cdbd4f1f3d616c4083
Signed-off-by: Konsta Hölttä <kholtta@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2635808
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: svcacv <svcacv@nvidia.com>
Reviewed-by: Alex Waterman <alexw@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
Make the nvgpu_init_mutex function return void.
In linux case, this doesn't affect anything since mutex_init
returns void.
For posix, we assert() and die if pthread_mutex_init fails.
This alleviates the need to error inject for _every_
nvgpu_mutex_init function in the driver.
Jira NVGPU-3476
Change-Id: Ibc801116dc82cdfcedcba2c352785f2640b7d54f
Signed-off-by: Thomas Fleury <tfleury@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2130538
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
The utils unit contains utilities that are useful to everyone. Things
like rbtree, enabled, string, etc go here. This helps prevent clutter
in the top level common directory. Also by organizing source code into
these top level units we reduce our SWUD burden: all utility code may
be described by one SWUD instead of many tiny SWUDs.
JIRA NVGPU-3544
Change-Id: Idc6169f375ba87b8a5d325712bf09aee8f27fb96
Signed-off-by: Alex Waterman <alexw@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2127479
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>