-Change cde_buf to use writecombined cpu mapping.
-Since reading writecombined cpu data is still slow, avoid reads in
gk20a_replace_data by checking whether a patch overwrites a whole
word.
-Remove unused distinction between src and dst buffers in cde_convert.
-Remove cde debug dump code as it causes a perf hit.
Bug 1546619
Change-Id: Ibd45d9c3a3dd3936184c2a2a0ba29e919569b328
Signed-off-by: Jussi Rasanen <jrasanen@nvidia.com>
Reviewed-on: http://git-master/r/553233
Reviewed-by: Arto Merilainen <amerilainen@nvidia.com>
Tested-by: Arto Merilainen <amerilainen@nvidia.com>