To understand how and why your VMware vSphere environment behaves the way it does, it helps to know how vSphere and vCenter collect and store statistics by default, and how those statistics are displayed. Performance reports and troubleshooting sessions often rest on a lot of assumptions. Where do those assumptions come into play? In which counters you look at, and in how the data is collected, stored and turned into graphs. This matters most when you select the interval to display (or gather). Peaks can be flattened because they have been averaged out over the time span of a historical graph (historical versus real-time), or the metrics you need may not have been collected at all.
How does the statistics gathering work?
Each host stores up to one hour of statistics data through its local performance manager, which receives real-time instance data from sources such as the CPU instances. At every vCenter data collection interval, the vCenter performance manager queries each host it manages, retrieves a subset of that host's statistics data and stores it in the vCenter database. When, what and how much is collected is configurable in your VMware infrastructure through two settings on the vCenter Server: the collection intervals and the statistics levels.
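To make the host-side buffer concrete, here is a minimal Python sketch. It assumes the hour of real-time data is kept as 20-second samples (3600 / 20 = 180 per counter) and models vCenter's collection pass as copying only a chosen subset of counters. `HostPerfBuffer` and `collect` are hypothetical, the counter names are only used as labels, and the real collector also rolls the data up rather than copying raw samples.

```python
from collections import deque

# Assumption: 20-second real-time samples kept for one hour on the host.
REALTIME_INTERVAL_S = 20
REALTIME_RETENTION_S = 3600
MAX_SAMPLES = REALTIME_RETENTION_S // REALTIME_INTERVAL_S  # 180 samples


class HostPerfBuffer:
    """Hypothetical stand-in for an ESXi host's local performance manager."""

    def __init__(self):
        # One bounded buffer per counter; once full, the oldest sample drops off.
        self.samples = {}

    def record(self, counter, value):
        self.samples.setdefault(counter, deque(maxlen=MAX_SAMPLES)).append(value)


def collect(host, wanted_counters):
    """Hypothetical vCenter collection pass: copy only the requested counters."""
    return {c: list(host.samples[c]) for c in wanted_counters if c in host.samples}


host = HostPerfBuffer()
for i in range(200):  # 200 samples is a bit more than one hour at 20 s each
    host.record("cpu.usage.average", i)
    host.record("cpu.ready.summation", 2 * i)

archived = collect(host, ["cpu.usage.average"])
print(len(archived["cpu.usage.average"]))  # 180: the first 20 samples aged out
print(sorted(archived))                    # ['cpu.usage.average']
```

Anything older than the retention window is simply gone from the host, which is why what vCenter picks up (and at which level) decides what you can look back on later.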
These collected historical metrics can then be displayed via the vSphere client.
A simplified model of this process, taken from the VMware Documentation Center.
Statistics rollups and intervals
The ESXi host collects statistics in real time (every 20 seconds), and these are rolled up into the vCenter database for historical purposes. vCenter collects this data from all of the hosts it manages. The PerformanceManager defines performance intervals, which specify the time periods over which performance data is rolled up, that is, combined into aggregated values. The server stores the rolled-up performance counter data in the vCenter database using four performance intervals, which determine how the collected instance data is aggregated and stored. Each aggregated value summarizes a set of instance data values collected for one performance counter. These intervals can be modified only to a limited extent, through the collection interval settings, which determine how long statistics are aggregated, calculated, rolled up and archived. Together, the collection interval and the collection level determine how much statistical data is gathered and stored in your vCenter Server database.
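The four historical intervals can be written down as plain data. This sketch assumes the commonly documented vCenter defaults (5-minute data kept for a day, 30-minute data for a week, 2-hour data for a month, 1-day data for a year, with a month counted as 30 days) and uses a simple average as the rollup. `StatsInterval` and `roll_up` are illustrative names; in vCenter the rollup type (average, summation, latest and so on) depends on the counter.

```python
from dataclasses import dataclass

DAY_S = 86_400


@dataclass(frozen=True)
class StatsInterval:
    name: str
    sampling_period_s: int  # time span covered by one stored data point
    retention_s: int        # how long data points at this interval are kept

    @property
    def data_points(self) -> int:
        return self.retention_s // self.sampling_period_s


# Assumed vCenter defaults (a month counted as 30 days).
DEFAULT_INTERVALS = [
    StatsInterval("Past day", 300, DAY_S),
    StatsInterval("Past week", 1_800, 7 * DAY_S),
    StatsInterval("Past month", 7_200, 30 * DAY_S),
    StatsInterval("Past year", DAY_S, 365 * DAY_S),
]


def roll_up(values, factor):
    """Average each complete group of `factor` consecutive values into one point."""
    return [sum(values[i:i + factor]) / factor
            for i in range(0, len(values) - factor + 1, factor)]


for iv in DEFAULT_INTERVALS:
    print(f"{iv.name}: 1 point per {iv.sampling_period_s} s, {iv.data_points} points")

# Six 5-minute points become one 30-minute point (1800 s / 300 s = 6).
print(roll_up([10, 20, 30, 40, 50, 60], 6))  # [35.0]
```

Under these assumptions a day holds 288 points, a week 336, a month 360 and a year 365, so each step up in the interval trades detail for a longer history.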
Are those rollups evil? They can be. Peaks can be flattened because they have been averaged out over the time span of the graph. But remember that this is historical data: a monthly graph built from 2-hour rollups (12 data points per day) can show different peak values than a weekly graph. Know what you are looking for and over which period, and don't base your conclusions on a single graph.
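A small simulation shows how a short spike shrinks as it is rolled up. The CPU usage values are made up: half an hour of 20-second samples at 20 % with a one-minute spike to 95 %, averaged first into 5-minute points and then into a single 30-minute point. Plain averaging is assumed here, as discussed above.

```python
def average_rollup(samples, group):
    """Average each complete group of `group` consecutive samples into one point."""
    return [sum(samples[i:i + group]) / group
            for i in range(0, len(samples) - group + 1, group)]


# Made-up data: 30 minutes of CPU usage (%) at 20-second resolution = 90 samples.
realtime = [20.0] * 90
realtime[30:33] = [95.0, 95.0, 95.0]  # one-minute spike (3 x 20 s)

five_min = average_rollup(realtime, 15)   # 300 s / 20 s = 15 samples per point
thirty_min = average_rollup(five_min, 6)  # 1800 s / 300 s = 6 points per point

print(max(realtime))    # 95.0
print(max(five_min))    # 35.0
print(max(thirty_min))  # 22.5
```

The same one-minute event reads as 95 %, 35 % or 22.5 % depending on which interval you look at, which is why a single graph is a poor basis for a conclusion.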
To limit the amount of data written to the vCenter database, vCenter uses statistics levels to decide which metrics are archived. Some statistics are simply more valuable to you than others. The statistics levels range from one to four, with one being the least detailed and four the most detailed.
- Level one; the basic Cluster Services, CPU, Disk, Memory, Network, System and Virtual Machine Operations counters. This is the default level.
- Level two; level one plus nearly all disk, memory and virtual machine operations metrics. Use it for long-term performance monitoring when device statistics are not required but you want to monitor more than the basic statistics.
- Level three; levels one and two plus per-device statistics, such as a host's CPU usage on a per-CPU basis, or per-virtual machine statistics. Use it for short-term performance monitoring after encountering problems or when device statistics are required.
- Level four; all available metrics, including minimum and maximum rollup values. Use it for short-term performance monitoring after encountering problems or when device statistics are required, and only for the shortest possible time because of the large quantity of data.
The statistics level dictates whether or not a statistic is stored in the vCenter database. If a metric is a level two statistic but vCenter is configured for level one, that metric is not stored in the database, which means users cannot query its historical values. That is not always a problem, but it is when those are exactly the counters you are looking for.
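The level rule boils down to a simple filter: a counter is archived only if its level is at or below the configured statistics level. In this sketch the counter names are descriptive placeholders loosely based on the level descriptions above, not actual vCenter counter IDs, and `archived_counters` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counter:
    name: str
    level: int  # lowest statistics level at which this counter is archived


def archived_counters(counters, configured_level):
    """Return the names of counters that would be kept at `configured_level`."""
    if not 1 <= configured_level <= 4:
        raise ValueError("statistics level must be between 1 and 4")
    return [c.name for c in counters if c.level <= configured_level]


# Placeholder counters, loosely following the level descriptions above.
COUNTERS = [
    Counter("CPU usage, host total", 1),
    Counter("Additional disk metrics", 2),
    Counter("CPU usage, per CPU", 3),
    Counter("Minimum/maximum rollups", 4),
]

print(archived_counters(COUNTERS, 1))       # ['CPU usage, host total']
print(len(archived_counters(COUNTERS, 3)))  # 3
```

At level one only the first counter survives; the per-CPU data you might want during troubleshooting simply never reaches the database.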
These levels have their benefits (information at minimal database cost) and drawbacks (possibly missing metrics), but being able to change the statistics level gives you something back. In normal operation we gather basic information at minimal database cost with the level one counters. When troubleshooting, we can temporarily raise the statistics level to get more detailed information.
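The "raise temporarily, then drop back" habit maps naturally onto a Python context manager. `StatsConfig` below is a hypothetical in-memory stand-in for the statistics level setting on the vCenter Server, not a VMware API; the point is that the original level is restored in a `finally` block, so an interrupted troubleshooting session cannot leave a high level switched on by accident.

```python
from contextlib import contextmanager


class StatsConfig:
    """Hypothetical stand-in for the vCenter Server statistics level setting."""

    def __init__(self, level=1):
        self.level = level


@contextmanager
def temporary_stats_level(config, level):
    """Raise the statistics level for the duration of a `with` block."""
    previous = config.level
    config.level = level
    try:
        yield config
    finally:
        config.level = previous  # restored even if the block raises


cfg = StatsConfig(level=1)
with temporary_stats_level(cfg, 3):
    during = cfg.level  # 3 while troubleshooting
after = cfg.level       # back to 1

try:
    with temporary_stats_level(cfg, 4):
        raise RuntimeError("troubleshooting interrupted")
except RuntimeError:
    pass
after_error = cfg.level  # still 1
print(during, after, after_error)  # 3 1 1
```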
– Happy statistics gathering!