I ran dashboard.sh to measure the models' cost on a Tesla A100, but most of the time in the result is reported as unprofiled.

Is this result correct? I used the profiler to check the runtime of the different kernels: the CPU time percentages for the backward pass in the Transformer sum to only about 8%, while aten operations account for 89%. These percentages do not seem to match the figure above.