Windows Performance Investigation – PerfMon Log Collection and Analysis

Sometimes I say performance issues are“State of Mind” when there isn’t really a performance issue but customers or users says there performance issue with my server. BUT sometimes they are real too. To prove this we need to look at the right logs.

There are many instances where we need to investigate performance issues with servers running Windows Operating Systems whether in cloud or on-premises environment and we required to gather real time performance logs to understand the root cause of the performance issue. In general performance issues with machines running Windows OS are quite tricky to root cause and it becomes even more complex when its intermittent and we may have to gather multiple set of logs to root cause the issue.

There are few native tools like Windows Performance Recorder (earlier xPerf) and Windows Performance Monitor provided by Microsoft to collect and analyze Windows OS performance logs in different scenarios. In this blog post, my objective is to show you how to collect performance logs using Windows Performance Monitor tool using command line in a very simple way.

Windows Performance Monitor (PerfMon) tool can be used to diagnose common Windows server performance problems like:
–High Memory Utilization
–High CPU Utilization
–Poor Disk Performance

Steps to collect Windows Performance Logs:
1. Create PerfMon Counter Data Collector using below command, we are running two command for same counters to collect performance log samples at different intervals. First one collects samples at every 2 minutes and second one at 1 seconds interval respectively.

Logman.exe create counter PerfLog-Long -o “c:\perflogs\PerfLog-Short.blg” -f bincirc -v mmddhhmm -max 1024 -c “\LogicalDisk()*” “\Memory*” “.NET CLR Memory()*” “\Cache*” “\Network Interface()*” “\Paging File()*” “\PhysicalDisk()*” “\Processor()*” “\Processor Information()*” “\Process()*” “\Thread()*” “\Redirector*” “\Server*” “\System*” “\Server Work Queues()*” “\Terminal Services*” -si 00:02:00


Logman.exe create counter PerfLog-Short -o “c:\perflogs\PerfLog-Long.blg” -f bincirc -v mmddhhmm -max 1024 -c “\LogicalDisk(*)\*” “\Memory\*” “\.NET CLR Memory(*)\*” “\Cache\*” “\Network Interface(*)\*” “\Paging File(*)\*” “\PhysicalDisk(*)\*” “\Processor(*)\*” “\Processor Information(*)\*” “\Process(*)\*” “\Thread(*)\*” “\Redirector\*” “\Server\*” “\System\*” “\Server Work Queues(*)\*” “\Terminal Services\*” -si 00:00:01



2. Now run below command to start both of the above data collectors and wait for the performance issue to happen.

Logman.exe start PerfLog-Long
Logman.exe start PerfLog-Short




3. If you are not sure about the specific time when issue would occur, leave the log collection running and wait for the issue to happen. Once the issue occurs you can stop logs after few mins so that PerfMon log file will not overwrite. Please note the timestamp when the issue did occur as it will be helpful during log analysis.

Logman.exe stop PerfLog-Long
Logman.exe stop PerfLog-Short




4. Performance Log files will be available under “c:\perflogs” on the machine, once you have the logs collected you can start analyzing it using Windows Performance Monitor tool.



Note: It would only be useful if you collect the performance data during the time your Windows server is facing the issue, otherwise the collected data might not be relevant (e.g. if you collect the data while your server is working fine).

How to collect PerfMon logs remotely:
There might be certain scenarios where it becomes difficult to collect PerfMon logs from problematic machine locally, in such scenarios we can capture PerfMon logs remotely as well.

1. First replace domain\username with the domain admin user and servername with the name of the problematic server. Then paste the modified command in the command window of a remote machine.

Logman.exe create counter PerfLog-Long -u DOMAIN\USERNAME * -f bincirc -v mmddhhmm -max 2048 -c “\Servername\LogicalDisk()*” “\Servername\Memory*” “\Servername.NET CLR Memory()*” “\Servername\Cache*” “\Servername\Network Interface()*” “\Servername\Paging File()*” “\Servername\PhysicalDisk()*” “\Servername\Processor()*” “\Servername\Processor Information()*” “\Servername\Process()*” “\Servername\Thread()*” “\Servername\Redirector*” “\Servername\Server*” “\Servername\System*” “\Servername\Server Work Queues()*” “\Servername\Terminal Services*” -si 00:02:00

Logman.exe create counter PerfLog-Short -u DOMAIN\USERNAME * -f bincirc -v mmddhhmm -max 2048 -c “\Servername\LogicalDisk()*” “\Servername\Memory*” “\Servername.NET CLR Memory()*” “\Servername\Cache*” “\Servername\Network Interface()*” “\Servername\Paging File()*” “\Servername\PhysicalDisk()*” “\Servername\Processor()*” “\Servername\Processor Information()*” “\Servername\Process()*” “\Servername\Thread()*” “\Servername\Redirector*” “\Servername\Server*” “\Servername\System*” “\Servername\Server Work Queues()*” “\Servername\Terminal Services*” -si 00:00:01

2. Run below command to start the log collection:
Logman.exe start PerfLog-Long
Logman.exe start PerfLog-Short 

3. Run below command to stop the log collection:
Logman.exe stop PerfLog-Long
Logman.exe stop PerfLog-Short 

Analyzing Windows PerfMon Logs:

Scenario: High Resource Utilization.

Let’s have a look at the logs that we collected above and see what’s going on, I have extracted PerfMon logs and below is the summary of the collected logs from the one of the problematic machine.

Performance Monitor Log Summary:

Log Filename : PERFLOG-SHORT_04051442.BLG
Server Name : TESTSERVER
Start Date & Time : 04/05/2018 14:42:13 (Uptime was: 5d 22h 10m 15s)
End Date & Time : 04/05/2018 15:22:44 (Uptime was: 5d 22h 50m 46s)
Log Duration : 0d 0h 40m 31s
Total Samples : 2432
Sample Interval : 0d 0h 0m 1s

Memory                                  Minimum                Maximum          Average

================================================================

Available Bytes                    :   18,404MB |          19,955MB |     19,661MB

Pool Paged Bytes                :    1,240MB |             1,284MB |       1,264MB

Pool NonPaged Bytes        :      602MB |               651MB |        607MB

Free System PTEs               : 12,234,501 |            4,294,967, |   3,296,485,

Handle Count                     :    148,029 |              158,714 |        153,559

Thread Count                     :     3,363 |                    4,042 |         3,591

Private Bytes                      :   11,581MB |        13,056MB |      11,829MB

Virtual Bytes                      : 201,451,75 |          237,125,50 |   215,969,29

Working Set                       :   11,029MB |         12,707MB |     11,344MB

Cache Bytes                       :      469MB |              526MB |         506MB

Committed Bytes              :   16,225MB |       17,813MB |     16,620MB

Commit Limit                     :   64,914MB |      64,914MB |      64,914MB

% Committed Bytes Used     :        25% |         27% |               26%

Pages/sec                          :      0 |                     64,756.825 |     63.978

Processor                               Minimum      Maximum      Average

================================================================

Processor Queue Length    :      0 |               61 |          0

% Processor Time               :         0% |      100% |     16.69%

   Processor: 0                    :         0% |       100% |     16.34%

   Processor: 1                    :         0% |       100% |     10.91%

   Processor: 2                    :         0% |       100% |     19.07%

   Processor: 3                    :         0% |       100% |     12.98%

   Processor: 4                    :         0% |       100% |     19.07%

   Processor: 5                    :         0% |       100% |     15.75%

   Processor: 6                    :         0% |       100% |     20.04%

   Processor: 7                    :         0% |       100% |     19.48%

% User Time                        :      0.59% |     96.76% |       5.4%

   Processor: 0                    :         0% |     92.27% |      6.82%

   Processor: 1                    :         0% |     96.96% |      4.17%

   Processor: 2                    :         0% |    100.08% |      5.99%

   Processor: 3                    :         0% |     98.52% |      5.19%

   Processor: 4                    :         0% |     95.47% |      6.65%

   Processor: 5                    :         0% |     95.47% |      3.76%

   Processor: 6                    :         0% |    100.08% |      6.27%

   Processor: 7                    :         0% |    100.08% |      4.37%

% Privileged Time                  :      0.39% |     49.61% |     11.29%

   Processor: 0                    :         0% |     47.88% |      9.52%

   Processor: 1                    :         0% |      53.8% |      6.73%

   Processor: 2                    :         0% |     84.28% |     13.08%

   Processor: 3                    :         0% |     56.16% |      7.78%

   Processor: 4                    :         0% |     61.77% |     12.42%

   Processor: 5                    :         0% |     68.05% |     11.93%

   Processor: 6                    :         0% |     59.25% |     13.77%

   Processor: 7                    :         0% |     73.09% |     15.08%

% DPC Time                         :         0% |      0.79% |      0.07%

   Processor: 0                    :         0% |      6.33% |      0.53%

   Processor: 1                    :         0% |      1.58% |         0%

   Processor: 2                    :         0% |      1.59% |      0.02%

   Processor: 3                    :         0% |      1.56% |         0%

   Processor: 4                    :         0% |      3.16% |      0.01%

   Processor: 5                    :         0% |      1.58% |         0%

   Processor: 6                    :         0% |      1.56% |         0%

   Processor: 7                    :         0% |      1.56% |         0%

% Interrupt Time                   :         0% |      1.93% |       0.2%

   Processor: 0                    :         0% |      9.29% |      1.35%

   Processor: 1                    :         0% |      3.16% |      0.03%

   Processor: 2                    :         0% |      3.17% |      0.06%

   Processor: 3                    :         0% |      3.09% |      0.05%

   Processor: 4                    :         0% |      3.12% |      0.05%

   Processor: 5                    :         0% |      1.59% |      0.02%

   Processor: 6                    :         0% |      1.58% |      0.03%

   Processor: 7                    :         0% |      1.59% |      0.03%

Physical Disk                           Minimum      Maximum      Average

================================================================

% Idle Time                     :     12.67% |       100% |     97.48%

   Disk: 0 C:                      :     12.67% |       100% |     97.48%

Avg. Disk sec/Transfer  :          0 |            0.013 |      0.002

   Disk: 0 C:                      :          0 |            0.013 |      0.002

Disk Bytes/sec                :      441KB |     259,749KB |    1,211KB

   Disk: 0 C:                      :      441KB |  259,749KB |    1,211KB

Avg. Disk Queue Length :          0 |             2.245 |       0.04

   Disk: 0 C:                      :          0 |                2.245 |       0.04

Split IO/Sec                     :     0 |                  61.793 |      2.068

   Disk: 0 C:                      :      0 |                  61.793 |      2.068

Disk Transfers/Sec         :      6.897 |          1,929.16 |     28.685

   Disk: 0 C:                      :      6.897 |           1,929.16 |     28.685

TOP 10: % Processor Time                Minimum      Maximum      Average

================================================================

1.  svchost#17                          :         0% |    115.05% |     61.73%

2.  devenv                                 :         0% |     571.8% |     11.88%

3.  VBCSCompiler                    :         0% |    768.39% |      8.93%

4.  SearchUI                              :         0% |     26.56% |      7.12%

5.  dwm                                     :         0% |     17.14% |      5.14%

6.  System                                 :         0% |     32.83% |      3.82%

7.  chrome#16                          :         0% |     93.42% |      3.66%

8.  chrome#21                          :         0% |     18.64% |      3.29%

9.  ServiceHub.RoslynCodeAnalysisSe:  0% |    522.08% |  3.26%

10. SearchProtocolHost             :     0% |    131.17% |      2.73%

TOP 10: % Privileged Time               Minimum      Maximum      Average

================================================================

1.  svchost#17                           :      0% |                115.05% |     61.72%

2.  System                                  :       0% |                32.83% |      3.82%

3.  devenv                                  :    0% |                  296.52% |      2.96%

4.  SearchProtocolHost            :      0% |               106.18% |      2.53%

5.  SearchUI                               :      0% |                 17.23% |      2.29%

6.  dwm                                      :      0% |                 10.91% |      2.09%

7.  SoftMon                               :      0% |                  9.36% |      1.83%

8.  rundll32#1                           :      0% |                  12.47% |      1.76%

9.  tmcsvc                                  :     0% |                   9.47% |      1.43%

10. rundll32                              :     0% |                   7.87% |      0.81%

Looking at the PerfMon results above, the highest CPU consuming process is devenv and VBCSCompiler as highlighted below. But the average utilization is still low. However I do see CPU spikes on the machine at other times, around 2:45 PM. It is being caused by devenv and VBCSCompiler. So we need to which application these processes belongs to.

SVCHOST#17” process looks problematic in above log but it we can’t conclude that because it doesn’t show which process or service exactly might be causing the problem as SVCHOST process is a container host for multiple core services in Windows Operating System. So what’s running inside SVCHOST can only be seen from “tasklist /svc | find svchost.exe” command or Process Explorer tool.



Below are the services which were running in SVCHOST#17. We will have to isolate them in separate containers and capture the same set of logs once again during the issue.

svchost.exe    17448 hidserv, NcbService, StorSvc, UmRdpService, wudfsvc

To isolate these in separate containers use below commands for each service. Once done reboot machine and start monitoring, Once machine comes up fire tasklist /svc and ensure these  services are running in separate svchost containers.

Sc config ServiceName type= own

Example:-
Sc config hidserv type= own
Sc config NcbService type= own
Sc config StorSvc type= own
Sc config UmRdpService type= own
Sc config wudfsvc type= own

Now we have separated them to their own containers so that we can have a clear picture of what is causing the high resource usages on the machine. We need to start PerfMon log collection again and stop it as soon as issue reproduces. We need to debug specific process further based to find root cause.

Please keep below in mind when gathering PerfMon from problematic Windows systems.

1. Stop log collection after the issue has reproduced and make a note of time stamp of the issue.
2. Do not reboot the machine before stopping logs as it will corrupt the logs.
3. Do not reboot the machine before running the tasklist command as process ID’s change with every reboot.

Please refer some additional documentation to understand Windows Performance Monitor tools and analysis:

Windows Performance Recorder and Windows Performance Toolkit:
https://docs.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-recorder
https://docs.microsoft.com/en-us/windows-hardware/test/wpt/

Windows Performance Step-by-Step Guides:
https://docs.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-step-by-step-guides

Windows Performance Analyzer:
https://docs.microsoft.com/en-us/windows-hardware/test/wpt/windows-performance-analyzer

Windows Performance Deep Dive Troubleshooting:
https://channel9.msdn.com/Events/TechEd/NorthAmerica/2014/WIN-B413

Random ASCII – tech blog of Bruce Dawson:
https://randomascii.wordpress.com/

Windows Performance Monitor Overview:
https://techcommunity.microsoft.com/t5/ask-the-performance-team/windows-performance-monitor-overview/ba-p/375481

ExPerfAnalyzer:
https://github.com/microsoft/ExPerfAnalyzer













1 thought on “Windows Performance Investigation – PerfMon Log Collection and Analysis”

Leave a Reply

Your email address will not be published. Required fields are marked *