- When the driver sends a DISALLOW cmd to the PMU, the PMU
acknowledges the actual completion of the disallow via a
PG EVENT: ASYNC_CMD_RESP.
- Disallow needs a delayed ACK from PMU in order to disable
the ELPG.
- If ELPG is already engaged, the DISALLOW cmd will trigger
ELPG exit and then transition to PMU_PG_STATE_DISALLOW.
- After this whole process is completed, PMU will send
DISALLOW_ACK through ASYNC_CMD_RESP msg.
- After sending the disallow command, the NvGPU driver
waits/polls for the disallow command ack, which the PMU msg
framework sends immediately.
- Then, the driver will poll/wait for ASYNC_CMD_RESP event which
is the delayed DISALLOW ACK.
- The driver captures the ASYNC_CMD_RESP sent from PMU.
- The driver then sets disallow_state to ELPG_OFF.
- If the driver does not wait/poll for this delayed disallow
ack from PMU, it can result in errors, because the PMU is still
processing the DISALLOW cmd while the driver has already progressed further.
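The two-stage handshake above can be sketched as a small state machine. This is only an illustration under stated assumptions, not the nvgpu code: the enum values, struct names and helpers are hypothetical, and the PMU side is simulated by a callback that acks immediately and sends the delayed response a few polls later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver-side view of the disallow handshake. */
enum disallow_state {
    DISALLOW_IDLE,       /* no ack seen yet */
    DISALLOW_CMD_ACKED,  /* immediate ack seen, PMU still working */
    DISALLOW_ELPG_OFF    /* delayed ASYNC_CMD_RESP seen */
};

enum pmu_msg { MSG_NONE, MSG_CMD_ACK, MSG_ASYNC_CMD_RESP };

struct pg_ctx {
    enum disallow_state state;
};

/* Hypothetical handler, called for each message read from the PMU. */
static void pg_handle_msg(struct pg_ctx *pg, enum pmu_msg msg)
{
    assert(pg != NULL);
    if (msg == MSG_CMD_ACK && pg->state == DISALLOW_IDLE) {
        pg->state = DISALLOW_CMD_ACKED;
    } else if (msg == MSG_ASYNC_CMD_RESP &&
               pg->state == DISALLOW_CMD_ACKED) {
        pg->state = DISALLOW_ELPG_OFF;
    }
}

/*
 * Poll until the delayed ack arrives or the poll budget runs out.
 * next_msg stands in for reading the PMU message queue.
 * Returns true only when the delayed ack was seen.
 */
static bool pg_disallow_wait(struct pg_ctx *pg,
                             enum pmu_msg (*next_msg)(void *), void *arg,
                             unsigned int max_polls)
{
    pg->state = DISALLOW_IDLE;
    for (unsigned int i = 0; i < max_polls; i++) {
        pg_handle_msg(pg, next_msg(arg));
        if (pg->state == DISALLOW_ELPG_OFF) {
            return true;
        }
    }
    return false;
}

/* Simulated PMU: acks on the first poll, responds on poll `delay` (>= 1). */
struct fake_pmu {
    unsigned int polls;
    unsigned int delay;
};

static enum pmu_msg fake_pmu_next_msg(void *arg)
{
    struct fake_pmu *p = arg;
    unsigned int n = p->polls++;

    if (n == 0U) {
        return MSG_CMD_ACK;
    }
    if (n == p->delay) {
        return MSG_ASYNC_CMD_RESP;
    }
    return MSG_NONE;
}
```

If the poll budget ends before the delayed response, the state stays at DISALLOW_CMD_ACKED, which is the window in which the driver must not progress further.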
Bug 3580271
Change-Id: I332180c05b6a398107f065d54e9718b7038fb1b2
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2689500
Reviewed-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
Move the ACR WPR init region cmd from the ISR to LSFM as part of the LSF
bootstrap request, so that ACR commands execute sequentially and as a
blocking call that polls the is_wpr_init_done status until it is set to
true. A delay is needed after each ACR command for ga10b LSPMU because
the nvriscv priv lockdown for ACR commands happens asynchronously from
nvgpu, as detailed below.
LSPMU engages priv lockdown whenever ACR commands need to be
processed. Once a command is sent to the PMU, nvgpu polls for
interrupt status by reading the pwr_falcon_irqstat_r register,
so that it can process the ACK message from LSPMU if priv
lockdown is not engaged. During NVRISCV priv lockdown a couple
of registers are not accessible, including the irqstat register.
LSPMU engages priv lockdown on receiving an ACR command,
asynchronously to nvgpu. In corner cases the nvgpu irqstat read
therefore returns 0xbadf* during polling, even though the priv
lockdown check is present, and the irqstat register value is
misinterpreted.
Add a delay of 5 ms after an ACR command is sent to LSPMU (LSPMU
takes ~3.5 ms to complete the command) and before nvgpu polls the
irqstat register, to give LSPMU time to engage priv lockdown.
This additional delay helps to avoid reading irqstat in the
corner case during the priv lockdown process.
Bug 3464141
Bug 3482947
Change-Id: I494493a92f6ede5dcb876aeb0d76d54969f0f59e
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2673246
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
GVS: Gerrit_Virtual_Submit
- When the driver sends a DISALLOW cmd to the PMU, the PMU
acknowledges the actual completion of the disallow via a
new RPC: ASYNC_CMD_RESP.
- Disallow needs a delayed ACK from PMU in order to disable
the ELPG.
- If ELPG is already engaged, the DISALLOW cmd will trigger
ELPG exit and then transition to PMU_PG_STATE_DISALLOW.
- After this whole process is completed, PMU will send
DISALLOW_ACK through ASYNC_CMD_RESP RPC.
- After sending the disallow command, the NvGPU driver
waits/polls for the disallow command ack, which the PMU RPC
framework sends immediately.
- Then, the driver will poll/wait for ASYNC_CMD_RESP event which
is the delayed DISALLOW ACK.
- The driver captures the ASYNC_CMD_RESP RPC sent from PMU.
- The driver then sets disallow_state to ELPG_OFF.
- If the driver does not wait/poll for this delayed disallow
ack from PMU, it can result in errors such as PMU halt issues,
because the PMU is still processing the DISALLOW cmd while the
driver has already progressed further.
Bug 3430273
Bug 3439350
Change-Id: If2acf8391d18cd3c6b8b07e3bf6577667ec99eea
Signed-off-by: Divya <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2631214
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
Reviewed-by: svc-mobile-cert <svc-mobile-cert@nvidia.com>
Reviewed-by: Seema Khowala <seemaj@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
GVS: Gerrit_Virtual_Submit
-Update the SSMD array size to hold all supported super-surface
members.
-Handle the error and report if an invalid SSMD ID is found.
Issue: The SSMD array size is currently set to 32, but 33
super-surface members are supported overall. Accessing the
33rd member crashed the system due to an out-of-bounds access.
Fix it by setting the SSMD array size to the actual
number of super-surface members supported.
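A minimal sketch of the two fixes, assuming hypothetical type and enumerator names (only the counts 32 and 33 come from the description above): the array is sized by a count enumerator rather than a hard-coded value, and the lookup rejects invalid IDs instead of reading past the array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical member IDs; intermediate IDs are left out of this sketch. */
enum ssmd_id {
    SSMD_ID_FIRST = 0,
    SSMD_ID_LAST = 32,  /* the 33rd member */
    SSMD_ID_COUNT       /* 33: number of supported members */
};

struct ssmd_entry {
    uint32_t offset;
    uint32_t size;
};

struct super_surface {
    /* Sized by the member count, not by a hard-coded 32. */
    struct ssmd_entry ssmd[SSMD_ID_COUNT];
};

/* Returns 0 and fills *out on success, -1 on an invalid ID. */
static int ss_get_member(const struct super_surface *ss, uint32_t id,
                         struct ssmd_entry *out)
{
    assert(ss != NULL && out != NULL);
    if (id >= (uint32_t)SSMD_ID_COUNT) {
        return -1; /* report the error instead of overflowing */
    }
    *out = ss->ssmd[id];
    return 0;
}
```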
Bug 200721968
Bug 200721966
Change-Id: I5ba1084a661d7497056f13a053d2fc79d50f595c
Signed-off-by: mkumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2528569
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Created perfmon event handling for nvgpu-next.
The nvgpu-next PMU sends perfmon events in the form of
RPC events. The events are:
- Change event: indicates whether it is an
increase or decrease event.
- Init event: indicates that perfmon init is
done in PMU.
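The two event kinds above could be dispatched roughly as follows. This is a sketch under assumptions: the enum, struct and function names are hypothetical and do not reflect the actual RPC event layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event types carried by a perfmon RPC event. */
enum perfmon_evt_type { PERFMON_EVT_INIT, PERFMON_EVT_CHANGE };
enum perfmon_change { PERFMON_INCREASE, PERFMON_DECREASE };

struct perfmon_rpc_evt {
    enum perfmon_evt_type type;
    enum perfmon_change change; /* only meaningful for change events */
};

struct perfmon_state {
    bool ready; /* set once the PMU reports perfmon init done */
    int trend;  /* +1 after an increase event, -1 after a decrease */
};

/* Returns 0 when the event was handled, -1 for an unknown event type. */
static int perfmon_handle_evt(struct perfmon_state *st,
                              const struct perfmon_rpc_evt *evt)
{
    assert(st != NULL && evt != NULL);
    switch (evt->type) {
    case PERFMON_EVT_INIT:
        st->ready = true;
        return 0;
    case PERFMON_EVT_CHANGE:
        st->trend = (evt->change == PERFMON_INCREASE) ? 1 : -1;
        return 0;
    default:
        return -1;
    }
}
```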
NVGPU-5202
NVGPU-5205
NVGPU-5206
Signed-off-by: rmylavarapu <rmylavarapu@nvidia.com>
Change-Id: Ida7e77dbaf70d2b594a0801c91a168dcb4a860bd
Reviewed-on: https://git-master.nvidia.com/r/c/linux-nvgpu/+/2395358
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Currently, for every +/-5 degC change in
temperature, the PMU internally evaluates the VF curve
and sends a VFE callback to NVGPU to initiate a
change seq that programs voltage and frequency.
This is the only callback we receive on a temp change,
and it is handled in the perf unit. The PMU raises no
other temp events.
So, delete the therm event handler.
NVGPU-4360
Change-Id: I3c7279dcf691135c178b6a05766403a935bc7e73
Signed-off-by: rmylavarapu <rmylavarapu@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2241488
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-dGPU PMU init message interface updated to support RPC-style init.
The PMU init message changed to an RPC event; made the changes needed
to handle the RPC event during the init stage.
-Added new RPC header PMU_RM_RPC_HEADER, sent from PMU to NvGPU,
which is part of the RPC events received from PMU.
-GID info moved to super-surface for dGPU, so removed the GID info
fetch from DMEM for dGPU & kept support for iGPU only.
-PMU_UNIT_INIT value for dGPU init changed.
JIRA NVGPU-3723
Change-Id: I016bd1150494007a56905db23b4769e693ecd5da
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2153141
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
-Added a message header to the FB message queue; it carries a
sequence number & checksum for sanity checks on the
received message.
-Made the required struct changes to read the message correctly
from the data member offset. The sanity checks themselves are not
handled in code, as NvGPU does not need them for the currently
supported messages.
-Added support to handle cmd/msg queue element changes.
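For illustration only, a sanity check on such a header might look like the sketch below if it were ever needed. The header layout, field widths and checksum algorithm (a 16-bit wrapping byte sum) are assumptions made for this example; none of them are specified above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical header layout; the real field widths are not given above. */
struct fbq_msg_hdr {
    uint16_t seq_num;
    uint16_t checksum;
};

/* Assumed checksum: 16-bit wrapping sum of the payload bytes. */
static uint16_t fbq_checksum(const uint8_t *payload, size_t len)
{
    uint16_t sum = 0U;

    for (size_t i = 0; i < len; i++) {
        sum = (uint16_t)(sum + payload[i]);
    }
    return sum;
}

/* True when both the sequence number and the checksum match. */
static bool fbq_msg_is_sane(const struct fbq_msg_hdr *hdr,
                            const uint8_t *payload, size_t len,
                            uint16_t expected_seq)
{
    assert(hdr != NULL && payload != NULL);
    return hdr->seq_num == expected_seq &&
           hdr->checksum == fbq_checksum(payload, len);
}
```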
JIRA NVGPU-3724
Change-Id: I85dccfab8902cbf71752582666931f482c3ec408
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2155165
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: svc-mobile-misra <svc-mobile-misra@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: Automatic_Commit_Validation_User
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Allocate space at runtime for the PMU RTOS fw struct. This helps
to reduce the size of the nvgpu_pmu struct when LS_PMU support
is not required.
Allocation happens at the pmu early init stage & the struct is
freed at the remove_support stage.
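The pattern can be sketched as follows, with hypothetical struct and function names and plain calloc()/free() standing in for the driver's allocator: the parent struct holds only a pointer, which stays NULL when LS_PMU support is not required.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a large firmware bookkeeping struct. */
struct pmu_rtos_fw {
    unsigned char image[4096];
    unsigned int version;
};

/* The parent holds a pointer, so it stays small when fw is unused. */
struct pmu {
    struct pmu_rtos_fw *fw;
};

/* Early init: allocate only when LS_PMU support is required. */
static int pmu_early_init(struct pmu *pmu, bool ls_pmu_supported)
{
    assert(pmu != NULL);
    pmu->fw = NULL;
    if (!ls_pmu_supported) {
        return 0;
    }
    pmu->fw = calloc(1, sizeof(*pmu->fw));
    return (pmu->fw != NULL) ? 0 : -1;
}

/* Remove support: release the allocation made at early init. */
static void pmu_remove_support(struct pmu *pmu)
{
    assert(pmu != NULL);
    free(pmu->fw); /* free(NULL) is a no-op */
    pmu->fw = NULL;
}
```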
JIRA NVGPU-1972
Change-Id: I1452b085f8d3a76e12186f788c2d999a8b4b202d
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2111072
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Fix following misra violations in pmu ipc units:
1. Rule 10.4: msg->msg.init.msg_type was being assigned a value from an enum.
Converted the corresponding value PMU_INIT_MSG_TYPE_PMU_INIT to u8.
Other conversions from signed to unsigned. Conversion of the
enum PMU_RC_MSG_TYPE_UNHANDLED_CMD to an unsigned value.
2. Rule 10.6: cast msg->hdr.size to U32 wherever required.
3. Rule 10.7: same as above.
4. Rule 13.5: nvgpu_timeout_expired() has the side effect of updating
the timer counts, so it is now used as the first operand of && in the if clause.
5. Rule 16.4: added non-empty default clause to switch.
6. Rule 17.7: return value of nvgpu_pmu_vidmem_surface_alloc,
nvgpu_falcon_copy_to_dmem, nvgpu_pmu_lsfm_int_wpr_region,
nvgpu_timeout_init, pmu_init_perfmon, pmu_handle_event,
pmu_response_handle and memset is handled.
7. Rule 2.2: removed unnecessary initialization of local variable.
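Two of the fixes above (Rules 13.5 and 17.7) can be illustrated with a small sketch. The timer type and helper names are hypothetical stand-ins, not the nvgpu functions: the call with a side effect is the left operand of &&, so it is always evaluated, and the unused memset() return value is discarded explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical timer: every expiry check consumes one retry. */
struct retry_timer {
    unsigned int retries_left;
};

/* Side effect: decrements retries_left. Returns true once expired. */
static bool retry_timer_expired(struct retry_timer *t)
{
    assert(t != NULL);
    if (t->retries_left == 0U) {
        return true;
    }
    t->retries_left--;
    return false;
}

/*
 * Rule 13.5: the right-hand operand of && must not have persistent
 * side effects, because it may be skipped. The call with the side
 * effect is therefore placed first.
 */
static bool should_give_up(struct retry_timer *t, bool still_pending)
{
    return retry_timer_expired(t) && still_pending;
}

/* Rule 17.7: a return value that is not needed is discarded explicitly. */
static void clear_buf(unsigned char *buf, size_t len)
{
    (void)memset(buf, 0, len);
}
```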
JIRA NVGPU-3273
Change-Id: Ie5a53bcdf0d138cb02867a09dc42195449e146a0
Signed-off-by: Sagar Kamble <skamble@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2112619
Reviewed-by: svc-mobile-coverity <svc-mobile-coverity@nvidia.com>
Reviewed-by: Mahantesh Kumbar <mkumbar@nvidia.com>
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Some functions are not accessing hardware directly
but are being called using HAL ops: For example
.pmu_init_perfmon = nvgpu_pmu_init_perfmon_rpc,
.pmu_perfmon_start_sampling = nvgpu_pmu_perfmon_start_sampling_rpc,
.pmu_perfmon_stop_sampling = nvgpu_pmu_perfmon_stop_sampling_rpc,
.pmu_perfmon_get_samples_rpc = nvgpu_pmu_perfmon_get_samples_rpc,
These were being called by:
g->ops.pmu.pmu_init_perfmon,
g->ops.pmu.pmu_perfmon_start_sampling,
g->ops.pmu.pmu_perfmon_stop_sampling,
g->ops.pmu.pmu_perfmon_get_samples_rpc
Change these to be accessed through sw ops instead.
Create new functions:
int nvgpu_pmu_perfmon_init(struct gk20a *g,
struct nvgpu_pmu *pmu, struct nvgpu_pmu_perfmon *perfmon);
int nvgpu_pmu_start_sampling_perfmon(struct gk20a *g,
struct nvgpu_pmu *pmu, struct nvgpu_pmu_perfmon *perfmon);
int nvgpu_pmu_stop_sampling_perfmon(struct gk20a *g,
struct nvgpu_pmu *pmu, struct nvgpu_pmu_perfmon *perfmon);
int nvgpu_pmu_get_samples_rpc_perfmon(struct gk20a *g,
struct nvgpu_pmu *pmu, struct nvgpu_pmu_perfmon *perfmon);
Based on the hardware chip, call the chip-specific
perfmon sw init function, nvgpu_gv11b_perfmon_sw_init() or
nvgpu_gv100_perfmon_sw_init(), and assign the sw ops for perfmon.
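The sw ops approach described above can be sketched as a function-pointer table owned by the perfmon unit and filled in by a chip-specific init. All names below are hypothetical and simplified; they are not the signatures listed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct perfmon;

/* Software ops owned by the perfmon unit, not by the HAL. */
struct perfmon_sw_ops {
    int (*init)(struct perfmon *pm);
    int (*start_sampling)(struct perfmon *pm);
};

enum chip_id { CHIP_GV11B, CHIP_GV100 };

struct perfmon {
    struct perfmon_sw_ops ops;
    enum chip_id impl; /* records which chip's init op ran */
    bool ready;
    bool sampling;
};

static int gv11b_init(struct perfmon *pm)
{
    pm->impl = CHIP_GV11B;
    pm->ready = true;
    return 0;
}

static int gv100_init(struct perfmon *pm)
{
    pm->impl = CHIP_GV100;
    pm->ready = true;
    return 0;
}

static int common_start(struct perfmon *pm)
{
    pm->sampling = true;
    return 0;
}

/* Chip-specific sw init: assign the ops table for the given chip. */
static int perfmon_sw_init(struct perfmon *pm, enum chip_id chip)
{
    assert(pm != NULL);
    pm->ready = false;
    pm->sampling = false;
    pm->ops.start_sampling = common_start;
    switch (chip) {
    case CHIP_GV11B:
        pm->ops.init = gv11b_init;
        return 0;
    case CHIP_GV100:
        pm->ops.init = gv100_init;
        return 0;
    default:
        return -1;
    }
}

/* Unit-level wrappers: callers never touch the ops table directly. */
static int perfmon_init(struct perfmon *pm)
{
    return (pm->ops.init != NULL) ? pm->ops.init(pm) : -1;
}

static int perfmon_start_sampling(struct perfmon *pm)
{
    return (pm->ops.start_sampling != NULL) ?
        pm->ops.start_sampling(pm) : -1;
}
```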
JIRA NVGPU-3210
Change-Id: I2470863f87a7969e3c0454fa48761499b08d445c
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2109899
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Allocate space at runtime for PMU sequences. This helps to reduce the size
of the nvgpu_pmu struct when LS_PMU support is not required.
Allocation happens at the pmu early init stage & the sequences are freed at
the remove_support stage.
Also removed some unused seq functions as part of this CL.
JIRA NVGPU-1972
Change-Id: Ib1ba983b476ddf937b08ef96e130ece2645b314c
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2110104
GVS: Gerrit_Virtual_Submit
Reviewed-by: Vijayakumar Subbu <vsubbu@nvidia.com>
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Created PMU fw unit to hold PMU RTOS f/w specific ops, images,
flags & command arguments needed for PMU RTOS ucode support.
Moved PMU fw ops from gk20a.gpu_ops to pmu.fw.ops, as these ops
are needed to support different PMU fw versions for
different chips.
JIRA NVGPU-1955
Change-Id: I51385d8c20524431f07cba3378676464663deb20
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2090769
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
- Move the perfmon unit source code to common/pmu/perfmon/ folder
- Separate perfmon unit headers under include/nvgpu/pmu/pmu_perfmon.h
- Make a new structure: nvgpu_pmu_perfmon for perfmon unit
- This new struct combines all perfmon unit variables like
perfmon_query, perfmon_ready etc. into one
structure as a part of perfmon unit refactoring.
- Use pmu_perfmon struct to access all perfmon variables.
- Eg: pmu->pmu_perfmon->perfmon_query, pmu->pmu_perfmon->perfmon_ready
and so on.
JIRA NVGPU-1961
Change-Id: I57516c646bfb256004dd7b719e40fafd3c2a09b2
Signed-off-by: Divya Singhatwaria <dsinghatwari@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2080555
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
Created PMU super surface unit & moved structs/functions related to
super surface under the unit, separated super surface structs into
private/public based on their usage/access, and updated the files
that depend on super surface to reflect the super surface changes
made for the unit.
JIRA NVGPU-3045
Change-Id: I6ac426052eb60f00b432d9533460aa0afd939fe3
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2088405
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
lsfm: LS falcon manager
Created lsfm unit under common/pmu/lsfm, moved functions &
variables related to lsfm functionality under the lsfm unit,
created separate files within the lsfm unit for the init code
that does chip-specific s/w init, and separated private/public
functionality.
JIRA NVGPU-3021
Change-Id: Iad4a4e5533122fb2387a4980581a0d7bcdb37d67
Signed-off-by: Mahantesh Kumbar <mkumbar@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2080546
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>
This patch fixes the two issues below.
1. Currently clk arb exit is called after GPU registers are released.
This causes a crash when the clk arb WQ accesses a GPU HW register for status.
The fix is to exit clk_arb, which stops the WQ from running,
before calling register lockout.
2. Check whether the dGPU is dying while processing PMU commands.
This prevents a race condition when the PMU is waiting for a response and the
device is shut down.
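The second fix can be sketched as a wait loop that checks a dying flag on every poll. This is a minimal illustration with a simulated device; the struct, its fields and the function name are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simulated device state (hypothetical model). */
struct fake_gpu {
    bool dying;              /* set when the device is being shut down */
    unsigned int polls_left; /* polls until the PMU command completes */
};

/*
 * Wait for a PMU command to complete.
 * Returns 0 on completion, -ENODEV if the device is dying,
 * -ETIMEDOUT if the poll budget runs out first.
 */
static int pmu_wait_cmd_done(struct fake_gpu *g, unsigned int max_polls)
{
    assert(g != NULL);
    for (unsigned int i = 0; i < max_polls; i++) {
        if (g->dying) {
            return -ENODEV; /* stop waiting on a device that is going away */
        }
        if (g->polls_left == 0U) {
            return 0;
        }
        g->polls_left--;
    }
    return -ETIMEDOUT;
}
```

Checking the flag inside the loop, rather than once before it, is what closes the race: a shutdown that starts while the wait is in progress is still noticed.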
Bug 200488054
Change-Id: I812b07af7db4494d5ea2ed6197742ceb23d30a4b
Signed-off-by: Abdul Salam <absalam@nvidia.com>
Reviewed-on: https://git-master.nvidia.com/r/2081916
Reviewed-by: mobile promotions <svcmobile_promotions@nvidia.com>
Tested-by: mobile promotions <svcmobile_promotions@nvidia.com>