vROPS: Beware of a Whole Lotta Metrics creating the Spaghetti Incident

When consulting at organisations, I often find an initial vRealize Operations (vROPS, or vCOPS) deployment left untouched because the responsible people in the IT operations department are overwhelmed by the information they (can) receive from vROPS (Manager and, optionally, other suite components). Mostly this comes down to a lack of time invested in getting to know the product, as admins are busy reacting to operational issues and processes. The downside is that perfectly useful information, recommendations and actions are ignored, and the virtual infrastructure (and the IT admins) suffers from neglect. That in turn only makes the problem worse, and we all go round and round the same roundabout without ever taking the vROPS-success exit. This is what we don't want.

But let's start with why we want vROPS in the first place:

  • Continual visibility gained across virtual and physical infrastructure.
  • Proactively identify and solve emerging issues with predictive analytics, smart alerts and remediation.
  • Reclaim unused resources and capacity, making the other VMs happier while saving on unnecessary investments (assets and people).
  • Unified IT Management, complete visibility in one place, across applications, cloud, storage and network devices; with an open and extensible Operations Manager architecture.

With the latter, that extensible Operations architecture, start moderately! Just adding product after product and management pack after management pack will land you in a spaghetti incident. And lots of metrics, alerts and headaches…

To avoid being overwhelmed by information, here are a few pointers you can follow when deploying vROPS in your environment:

  • Determine beforehand what you want to learn and would like to see from your environment. What policies do you currently have for your virtual infrastructure and its application workloads, are those sufficient, and is the information presented out of the box close to these policies?
  • Get to know the out-of-the-box insights and policies that vROPS offers. Stop here and take a breather before wanting to customize every aspect.
  • After implementation, let vROPS gather metric data for at least one week, but preferably longer, before the analytics can be trusted.
  • Start with the virtual infrastructure components and then move on to the next levels.
  • Determine what information you would like next to vSphere and in which phase you want to introduce it. Again, go easy here and take your time. Introduce one adapter (or other vROPS component) at a time. Familiarize yourself with the specific insights offered by that adapter (for example, what is collected) and, after a proven success, move on to the next.
  • Familiarize yourself with the basic vROPS architecture and how collectors and data flow through it. A model I created recently of the data flow in a basic vROPS architecture can help in understanding this:

vROPS 6 - Data Flow

  • When designing a vROPS architecture with multiple physical sites, determine where your users connect and where the data needs to go. When you want a single pane of glass, use a distributed architecture with remote collectors at the connected physical locations. With a vROPS instance per physical site, users connect to the local vROPS UI with the data collected at that site. There is no single pane and metric data is not transferred between instances, ergo complexity your IT operations needs to handle accordingly.

– vROPS is meant to be enjoyed, not to give you headaches: it gives you very valuable information and lets IT ops move from reactive to proactive mode!

Sources: vmware.com


vCenter Operations Manager Part 3: Nice, but yeah I’ve also got some other stuff to monitor

While consulting on vCenter Operations Manager (vCOPS), I am often (or is it always?) confronted with the organization's current monitoring product and/or the many components in the infrastructure. It is not widely known that vCOPS can also be used for infrastructure components ‘other than that VMware hypervisor’, or can work together with other specialist monitoring solutions. With the latter, it is important to define who works with which solution as the point of entry, where a specific solution is used for detailed information, and where a single dashboard can be created for a single monitoring pane.

vCOPS provides a complete VMware hypervisor monitoring, analysis and reporting tool out of the box. It does require a license/configuration investment for the analysis/reporting and, for example, additional vCenter Hyperic installations to bring non-VMware hypervisors and the application landscape into monitoring. vCenter Hyperic uses a Hyperic Server (which can be deployed as a virtual appliance) and agent architecture. It relies on a management pack from VMware Solution Exchange (https://solutionexchange.vmware.com/) to provide collectors and dashboards. Next to Hyper-V, this adds SQL and Exchange to your custom dashboard.

Hyperic Dashboard

The agent is installed on the hosts to be monitored. There are Windows agents for your Hyper-V or Windows-based workloads, and Linux agents for XenServer and Linux-based workloads.

Furthermore, there is a SCOM management pack to leverage the SCOM monitoring that is usually found in Microsoft deployments. The following model shows them both.


Other devices, such as networking gear, can be integrated through vendor-specific adapters/management packs, SNMP and/or Log Insight integration. Not all vendors have a vCOPS adapter/management pack; for those, SNMP needs to be leveraged. Traps can be caught with a vCenter Orchestrator workflow that adds an alert to the monitoring solution. The plus side of vCOPS is the add-on to the vCenter management interface, which provides a single pane of glass for management, monitoring and reporting. Yes, it needs some more management packs before it can replace all your monitoring solutions.

You get even more by adding VMware vCenter Multi-Hypervisor Manager 1.1 to the vCenter management layer. This provides the means to manage Hyper-V from vCenter. vCenter Orchestrator can be leveraged to manage XenServer tasks, for example with SSH workflows that run xe actions. Putting vCenter, vCenter Orchestrator, vCOPS and multi-hypervisor support together provides a management, monitoring and control plane for a software-defined data center.

Collectors, adapters and Management Packs

Adapters work closely with the collectors. The collector is the gateway between the adapters and vCOPS. The adapters connect to and collect data from data sources, transforming the data into a vCOPS format. The analytics VM houses the collector. In a distributed architecture, remote collectors gather metrics locally and feed the central vCOPS to offload the central install. Depending on the data source and the adapter implementation, an adapter might collect data by making API calls, using a command-line interface, or sending database queries.
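The adapter/collector split can be made concrete with a minimal Python sketch of the pattern. It is not the vCOPS adapter SDK. The `Metric` format, the class names, the `cpu|usage_pct` key and the sample data source are all hypothetical. They only illustrate three steps: an adapter pulls from a source, it normalizes the data into one common format, and a collector gathers the results to pass on.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Protocol


@dataclass
class Metric:
    """Hypothetical common format that every adapter hands to the collector."""
    resource: str   # e.g. a host or VM name
    key: str        # e.g. "cpu|usage_pct"
    value: float
    timestamp: int  # seconds since the epoch


class Adapter(Protocol):
    def collect(self) -> List[Metric]: ...


class CpuAdapter:
    """Adapter for one (hypothetical) source: fetch raw records, then normalize."""

    def __init__(self, fetch: Callable[[], List[Dict]]):
        # fetch stands in for an API call, CLI invocation or database query
        self.fetch = fetch

    def collect(self) -> List[Metric]:
        return [
            Metric(r["name"], "cpu|usage_pct", float(r["cpu"]), int(r["ts"]))
            for r in self.fetch()
        ]


class Collector:
    """Gateway between the adapters and the analytics store."""

    def __init__(self, adapters: List[Adapter]):
        self.adapters = adapters

    def run_cycle(self) -> List[Metric]:
        batch: List[Metric] = []
        for adapter in self.adapters:
            batch.extend(adapter.collect())
        # A remote collector would forward this batch to the central instance here.
        return batch


source = lambda: [{"name": "hv01", "cpu": "42.5", "ts": 1400000000}]
print(Collector([CpuAdapter(source)]).run_cycle())
```

The collector never needs to know how a source is queried. Adding a source means adding an adapter, which is why introducing them one at a time stays manageable.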

Management Packs contain the configuration/information needed to connect to specific data sources, using each source's own connection and gathering method. They also offer ready-to-use custom dashboards or auto-discovery information for adding the adapters to the environment.

vCOPS Architecture

Custom collectors and management packs can be downloaded from solution exchange. These are in *.pak format.

Monitoring other hypervisors or workloads with Hyperic

The Hyperic server component can be downloaded from my.vmware.com (provided you have a valid download account). There are installable downloads, but also the Hyperic vApp, which is the one I am deploying as the server. You need the install bundle for the agent installation (that is the easy one). The appliance creates two VMs in a Hyperic vApp: the Hyperic server and the vFabric vPostgres database server. Deploy the OVF in your ESXi environment. Fill in the values as required (location, size, space and networking) and fill in the hqadmin details, or choose a new user account. Start up the Hyperic vApp when finished. You can manage the Hyperic environment via https://<fqdn or IP>:5480/ for the appliance, via http://<fqdn or IP>:7080 or https://<fqdn or IP>:7443, or by logging on to the console. Log on with root and the password supplied for the hqadmin user in the OVF deployment. Here you can set the time zone and view the networking settings (managed by VAMI).

Hyperic Agent

Deploy a Hyperic Agent. For this demo I am using the Windows agent on a Hyper-V host (a demo/test environment, that is).
Create a destination directory, for example Program Files\Hyperic. Start by extracting the zip and opening an elevated command prompt (Run as administrator) in that directory. Run setup.bat -full. Accept the EULA and choose 2 for the HQ Agent installation.

HQ Agent

Type the destination path you created, for example C:\Program Files\Hyperic.
Setup then unpacks the agent into this directory. Go to the agent's bin directory, C:\Program Files\Hyperic\agent-5.8.1-EE\bin. Run hq-agent.bat install to install the agent as a Windows service. Run hq-agent.bat start to start the service for the first time and perform the initial configuration. Here you define the communication method, the IP of the HQ server, login credentials, port numbers et al. Accept the self-signed certificate when you trust it. Don't forget firewalls when they are in the transport path.

Agent configuration (ah yes, I failed to type in the password on the first go 😉)
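A blocked port is the usual suspect when the agent cannot register, so a quick reachability check from the agent host can save some head-scratching. Here is a minimal Python sketch. The host name is hypothetical. The ports are the appliance port (5480) and the HQ server HTTP/HTTPS ports (7080/7443) used in this post. A successful TCP connect only proves the network path is open; it does not prove that the credentials or certificates are right.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name not resolvable
        return False


if __name__ == "__main__":
    hq_server = "hyperic.testlab.local"  # hypothetical HQ server FQDN
    for port in (5480, 7080, 7443):
        state = "open" if port_open(hq_server, port) else "blocked/unreachable"
        print(f"{hq_server}:{port} {state}")
```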

Creating the match made in heaven…

Next up is configuring vCenter Hyperic and vCOPS to find each other (warning: expect to go from one interface to the other and back again).

The Hyperic management PAK can be installed via the Update tab in the vCOPS admin console. Browse to https://<UI fqdn (or IP)>/admin, log in and go to the Update tab. Browse to the pak file and click Update. Notice the warning that updates are irreversible: you should have a backup, or now is the time to make one.

vCOPS Update / Confirm Update

Log in to the vCOPS custom dashboard (/vcops-custom/). Select Admin > Support. On the Info tab, find the Adapters Info pane. Check whether MP for Hyperic is listed in the pane. If not, click the Describe icon, located at the top right of the Adapters Info pane. Click Yes to start the describe process and click OK. And be patient…

Log in to the vCenter Hyperic site at http://<hyperic fqdn or IP>:7080/, or via HTTPS on port 7443. Here you can check the auto-discovery. My hyper-v.testlab.local is found and can be added to the inventory. Yeah!

Hyperic Found Hyper-V

Go to the Hyperic Administration tab. Select the HQ Server Settings link and configure the vCenter Server settings. Click OK at the bottom to save. This will find vCenter as a resource. Sometimes a small discovery run gets it out of a stuck state (this may also just be my lab environment).

The Management Pack for vCenter Hyperic instance requires a user name and password to connect to the vCenter Hyperic server. The vCenter Hyperic server, in turn, requires a user name and password to connect to the vCOPS adapter. You provide this information in a credential in vCOPS. Go to the custom vCOPS UI and browse to Environment > Configuration > Credentials. In Manage Credentials, select MP for Hyperic and the Hyperic credential type. Click Add and fill in the required values.


Are we there yet? Almost. Add an adapter instance to vCOPS. Go to Environment > Configuration > Adapter Instances. You will only find the default adapters if you haven't added any. Select the vCOPS collector and the adapter kind MP for Hyperic, then click the Add icon (the first one). Fill in the required values in the Add window. The important ones are the URLs, the certificate checking and the auto-discovery. I select no certificate checking; if you do want it, you have to import the required certificates into the appropriate trust stores. Enter the URL of the vCenter Hyperic server in the Hyperic Server URL text box, for example https://<fqdn or IP Hyperic server>:7443. Enter the URL of your vCenter Operations Manager server in the vCenter Operations Manager URL text box, for example https://<fqdn or IP vCOPS>. Select the credential name you created in the previous step.

Add Adapter

Click Test to test the connections, then OK to add the adapter. When auto-discovery is finished, the Hyper-V VMs pop up in the vCOPS custom dashboard. Just take a look at the Hyperic Hyper-V VM utilization tab. My lab isn't much, but you get the idea.

Hyper-V VMs

– Happy monitoring of your other hypervisors and workloads!

Sources: vmware.com

vCenter Operations Manager Part 2: How to interpret the metrics?

In the first part about vCenter Operations Manager, or vCOPS for short, I covered the architecture and installation of vCOPS. You can read that part at https://www.pascalswereld.nl/post/84944390488/installing-vcenter-operations-manager.

I also visit customers that have vCOPS in place but don't know how to use it. So I want to follow up with an article on how to interpret the presented metrics, how they are calculated and when to act. Future posts will cover tuning and customization, but first things first.

The Dashboard

What do we see when opening the standard interface of vCOPS?


We see a World overview. World is the top-level vCOPS object, containing all vCenters, datacenters, clusters, hosts and other child objects. Per object, you narrow the view down to that object's part of the tree. The Health, Risk and Efficiency badges can be green for the World, while specific stress or efficiency levels down in a cluster bring them down for that zoomed-in object. Each object type brings its own specific metrics. Always drill down the tree to see what specifically is happening there.

As said in the previous part, for Health and Efficiency high is good and low is bad. For Risk it is the other way around: low is good, high is bad. This opening screen isn't that bad (if we forget all the alerts and reclaimable waste for now).

When I go down an object layer to one of the vCenters, I get this:


My Health went down a few points, nothing serious. Risk went up a few points. Efficiency went from green to yellow. There is reclaimable waste here, and density on memory is also not ideal (out of view, so you will have to trust me). Something to investigate further.

And when we go to a cluster in that datacenter, we get this view.


Risk is up to 96, there is definitely something to do here in this cluster!

We didn't see this right away on the World dashboard. We could have found it, though, by investigating the environment, operations, planning, alerts etc. from the World view. Don't blindly trust the initial information: it is calculated over the whole environment, so you will miss important information from the child objects. There is always something lurking in the environment.
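The arithmetic below shows why an aggregate view can look calm while one child object is in trouble. It uses made-up Risk scores and a plain average in Python. The plain average is purely an illustration and not a claim about how vCOPS calculates its badges. The lesson of drilling down holds either way.

```python
from statistics import mean

# Hypothetical Risk scores (0-100, higher is worse) for the clusters under one vCenter.
cluster_risk = {"cluster-a": 12, "cluster-b": 18, "cluster-c": 96, "cluster-d": 10}

average = mean(cluster_risk.values())
worst_name, worst = max(cluster_risk.items(), key=lambda kv: kv[1])

print(f"average risk: {average}")
print(f"worst cluster: {worst_name} ({worst})")
```

The average comes out at 34, which hardly looks alarming, while cluster-c sits at 96.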

Where does vCOPS get its metrics?

What vCOPS actually does is take thousands of metrics from vCenter Server and roll them up into 3 actionable, higher-level badges: Health, Risk and Efficiency. These are critical pieces of information that help any admin without having to go through all those vCenter metrics.

All of the metrics collected from the vCenter Server database are moved into the vCOPS embedded database running inside the vCOPS Analytics VM. Because vCOPS can use the historical data available in the vCenter database (can, as this depends on your specific settings), it starts providing useful performance and capacity information on the first day it is installed. There is no need to wait for important business cycles, again provided your vCenter is already configured to keep statistics that cover these specific workloads.

Certain metrics are identified as more important than others. These more important metrics can indicate severe problems in the virtual infrastructure. Those special groups of metrics are KPIs, or key performance indicators.

The way badges, alerts, forecasts and so on are made up is controlled via policies. The default policy applies when no custom policy is applied. For tuning the environment, custom policies are required.

You can also customize groups of metrics. A custom group might, for example, track the average free disk space across all MSSQL Server data disks in your organization's infrastructure.
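To illustrate what such a custom group computes, here is a Python sketch that averages free space over the data disks of a hypothetical MSSQL group. The VM names, disk roles and sizes are made up. In vCOPS itself, the group membership and metric would be defined in the custom UI, not in code.

```python
from statistics import mean

# Hypothetical inventory: (VM, disk role, free GB).
disks = [
    ("sql01", "data", 120.0),
    ("sql01", "log", 40.0),
    ("sql02", "data", 80.0),
    ("web01", "data", 200.0),
]
sql_vms = {"sql01", "sql02"}  # group membership, e.g. selected by naming or tags

# Keep only the data disks of VMs in the MSSQL group, then average their free space.
sql_data_free = [free for vm, role, free in disks if vm in sql_vms and role == "data"]
print(f"average free space on MSSQL data disks: {mean(sql_data_free):.1f} GB")
```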

When and how to act?

vCOPS is big and will tell you a lot about your environment, so take the time to get used to the way it presents its information. Try to get the why out of the presented data. vCOPS is complex, and stays complex until you understand the variety of badges and how/why they are presented at the different layers and for the different object types (which have their own metrics). It becomes even more complex when you want to customize. Make sure you take the time to recognize important badges and use them to their (and your) advantage.

See any stress or oversized VMs and go full me(n)tal jacket, trying to solve all those events by changing the VMs and blindly fixing issues? Well, your application or server administrators won't be pleased, and perhaps some customers won't be either, as services might go down (when there are no redundant application roles/services). And when the workload runs its half-yearly cycle, the resources it needs were never accounted for because those statistics weren't kept. Whoops. Removing resources, like solving issues, needs planning. Hot add is fine for most recent OS releases, but hot remove is still far from supported. That takes downtime, and it takes planning. Right-size the VMs. Just cutting all resources down to the minimum registered in monitoring is probably not the way; you will need to test/monitor the workload for capacity planning. Maybe the workload needs a bit more in a few cycles. Talk to the owners of the workload: they know what is needed and when (or at least they should). Think, communicate and plan your actions.
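To make the "don't size to the minimum" point concrete, here is a minimal Python sketch. It sizes on a high percentile of a full business cycle plus headroom. The 95th percentile (nearest-rank), the 25% headroom and the demand figures are all assumptions to tune with the workload owners. This is not vCOPS's own right-sizing calculation.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_vcpus(demand, pct=95, headroom=0.25, minimum=1):
    """Size on a high percentile of the full cycle plus headroom, not on the low point."""
    return max(minimum, math.ceil(percentile(demand, pct) * (1 + headroom)))


# Hypothetical daily peak CPU demand (in vCPUs' worth) for a 4-vCPU VM:
# quiet most of the month, busy during the month-end close.
demand = [0.8] * 25 + [2.9, 3.1, 3.2, 2.7, 3.0]

print("naive (minimum observed):", math.ceil(min(demand)))
print("recommended:", recommend_vcpus(demand))
```

Sizing on the observed minimum would cut this VM to 1 vCPU and starve the month-end close. The percentile-plus-headroom rule keeps 4 vCPUs, which is the kind of answer the workload owner would have given you.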

Customize or extend with other monitoring? Sure, that is possible and recommended as well. But take it one step at a time. First know what is going on, before creating a bigger monster that burns down your brain with information overload.

Sources: vmware.com

Installing vCenter Operations Manager

Ah yes, I want a little PowerCLI and more vOrchestrator, but my next task brings me vCenter Operations Manager. Also a very handy, must-have, excellent tool for every vSphere administrator/engineer/architect. Here goes.


vCenter Operations Manager is the key component of the vCenter Operations Management Suite. It provides a holistic view of, and deep operational insight into, the health, risks and efficiency of the virtual infrastructure and its application workloads. It identifies capacity shortfalls and over-provisioning. With that information you can right-size VMs, reclaim unused resources and increase consolidation ratios. All of this comes in a proactive performance management solution, with automated root cause analysis and recommended actions to remediate potential bottlenecks.

The operations management suite consists of the following components:

  • vCenter Operations Manager, or vCOPS for short;
  • vCenter Configuration Manager;
  • vCenter Hyperic;
  • vCenter Infrastructure Navigator;
  • vCenter Chargeback Manager.

This document covers the installation of the key component, vCOPS.


vCOPS is a vApp that you import and deploy in the virtualization layer, in this case vSphere ESXi.

The vApp consists of two virtual machines:

  • the UI VM. Provides access to the analytics and to the administration portal via several web interfaces. Add-ins/plug-ins for the vSphere Client and the vSphere Web Client let you view and manage the environment from a single control plane.
  • the Analytics VM. Responsible for collecting data from vCenter and other VMware or third-party infrastructure components. Raw metric data is stored in the File System Database (FSDB). Other collected data (objects, relationships, alerts and thresholds) is stored in a Postgres DB.

All put together in an architecture model taken from the vmware.com site:


For deployment you will need:

  • vCenter to connect to.
  • An Active Directory user with access to the source vCenter, and with access to the infrastructure vCenter for deployment.
  • The number of VMs to monitor. Depending on your license (vSphere edition or vCOPS license), you need that number of VMs licensed, and resources sized for that number of VMs:
      • Small: fewer than 1500 VMs. 4 vCPUs and 16 GB vRAM.
      • Medium: between 1500 and 3000 VMs. 8 vCPUs and 25 GB vRAM.
      • Large: more than 3000 VMs. 16 vCPUs and 34 GB vRAM.
  • vCenter for the infrastructure where your vApp will run.
  • Networking information.
  • Storage location and capacity, starting with 344 GB thick provisioned (or 3.8 GB when you thin provision).
  • Resources, depending on the number of VMs.
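The sizing list above can be captured as a small lookup, which is handy when scripting capacity checks. Here is a Python sketch. Placing exactly 1500 and 3000 VMs in Medium is my reading of "between 1500 and 3000". The figures are the ones listed here for this vCOPS version, so verify them against the sizing guidance of the release you deploy.

```python
from typing import Tuple


def vcops_profile(vm_count: int) -> Tuple[str, int, int]:
    """Return (profile, vCPUs, vRAM in GB) for the number of VMs to monitor."""
    if vm_count < 0:
        raise ValueError("vm_count must be non-negative")
    if vm_count < 1500:
        return ("Small", 4, 16)
    if vm_count <= 3000:  # assumption: 1500 and 3000 both count as Medium
        return ("Medium", 8, 25)
    return ("Large", 16, 34)


print(vcops_profile(800))
print(vcops_profile(2500))
```

Leave room for growth: an environment at 1400 VMs today will cross into Medium soon.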


Download the OVA, or the OVF and VMDK files, from vmware.com. Import the OVF from either the Web Client or the vSphere Client. When using the vSphere Client, you have to create an IP Pool to support the vApp; with the Web Client, this is created upon import and configuration.

Note: when creating the IP Pool in the vSphere Client, you don't have to enable the IP Pool or add an IP range. When deploying the vApp, you either configure a static IP (recommended) or use a DHCP server present in the IP range.

Go to Inventory – Networking and select the datacenter object you want to deploy vCOPS into.  Select the IP Pools tab. Click add.

Fill in the IP subnet and gateway, DHCP (only when a DHCP server is to be used and is present), DNS and Associations. Do not select Enable IP Pool (you would normally enable it for Transient allocation, but that is not supported with vCOPS). As I want static addresses, I don't need the IP Pool enabled. The IP pool must be associated with the port group(s) you want your VMs configured with.


Next up is deploying the vApp. Select deploy OVF Template and browse to your downloaded location. Follow the prompts of the Deploy OVF Template Wizard.

  1. Accept the EULA.
  2. Name your vApp and select an inventory location in the datacenter.
  3. Select your deployment configuration. I select Small, as the infrastructure is much smaller than 1500 VMs.
  4. Select the cluster or host the vApp needs to run on.
  5. Select the storage location. By default, the compatibility check is for Thick Provisioned; you can change this by going to Disk Format and then returning to Storage location.
  6. Thick Provisioned Eager Zeroed disk format is recommended, or whatever the organization's standard is.
  7. Select Network Mapping to the desired port group.
  8. Select an IP Allocation Policy; I want Fixed.
  9. Fill in the required properties and, when finished, let it rip.
  10. Wait for the VMs to be started.
  11. Check the IP addresses of the VMs in their tabs.
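For repeatable deployments, the same wizard choices can also be scripted with VMware's ovftool command-line tool. The Python sketch below only assembles such a command line so you can review it; it does not run it. The host names, datastore and port group are assumptions, and so is the deployment option ID "small" in particular. To list the OVF's real option IDs, first run ovftool against the file alone. The vApp properties from step 9 are left out because their keys depend on the OVF.

```python
import shlex


def ovftool_command(ova, vcenter, user, datacenter, cluster, *, name, datastore,
                    network, deployment_option="small"):
    """Build an ovftool command mirroring the wizard steps (not executed here)."""
    # URL-encode special characters in the user name (e.g. '@' as %40) in vi:// locators.
    target = f"vi://{user}@{vcenter}/{datacenter}/host/{cluster}"
    return [
        "ovftool",
        "--acceptAllEulas",                          # step 1
        f"--name={name}",                            # step 2
        f"--deploymentOption={deployment_option}",   # step 3 (ID is an assumption)
        f"--datastore={datastore}",                  # step 5
        "--diskMode=eagerZeroedThick",               # step 6
        f"--network={network}",                      # step 7
        "--ipAllocationPolicy=fixedPolicy",          # step 8
        "--powerOn",                                 # step 10
        ova,
        target,                                      # step 4: cluster to run on
    ]


cmd = ovftool_command("vcops.ova", "vc01.lab.local", "svc-deploy", "DC1", "Cluster1",
                      name="vCOPS", datastore="DS01", network="VM Network")
print(shlex.join(cmd))
```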

Connect to the vCOPS web service at https://<ip UI VM>/admin. Use the default admin/admin combination on first login.

Configure the following:

  • vCenter hosting server. If the FQDN gives an error, use the IP address to import the certificate.
  • Change the passwords for admin and root.
  • Specify the vCenter to register with (for monitoring), and the users for registering and collecting information.
  • Import data.

Next, assign a license to the vCOPS solution via License Management.

When needed, set up SMTP and SNMP to forward alerts, and add users to the roles so they can use vCOPS as well.

You can now go to Solutions and vCOPS. Let the solution gather some information on the infrastructure before taking any action. As more data becomes available, more information is displayed. Depending on the size of the infrastructure to be monitored, this can take from a few minutes to hours.

Check that there are no alerts, both in the admin UI and in the vCOPS UI.



The dashboard is opened via the vSphere Client (after you have enabled the plug-in) by going to Home – Solutions and Applications – vCenter Operations Manager. You can also open it in a web browser by browsing to https://<UI IP address>/.

You can view the dashboard and details at several points in the infrastructure:

  • World shows all connected infrastructures and their combined status.
  • vCenter shows the status of that vCenter's inventory.
  • From vCenter you can zoom in to the vCenter objects, just like in the vSphere infrastructure:
      • Datacenter
      • Cluster
      • Hosts
      • Datastores

Once you are used to the default dashboard and its badges, you can create custom dashboards to further tailor the vCOPS solution to your organization's needs. The custom interface is reachable via https://<ui IP address>/vcops-custom. The same goes for setting organization-specific policies instead of the default policies, meaning specific threshold values, detection rules, alerts, forecasts and trends.

Dashboard badges

The standard dashboard is created from the following badges:

  • Health. A combination of workload, anomalies and faults. The higher this number, the healthier the environment; a low number is bad.
  • Risk. A combination of stress, time remaining and capacity remaining. This is a projected risk of a future condition (near or long term). The lower this number the better; high is bad.
  • Efficiency. A combination of reclaimable waste and density (the p:v consolidation ratio for CPU and memory, and the VM-to-host ratio). The higher this number the better; a low number is bad.
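The inverted direction of Risk is easy to get wrong when reading badges quickly. Here is a small Python sketch that normalizes the three badges onto one color scale. The four equal bands of 25 points are an assumption for illustration only; the thresholds your vCOPS policy actually uses may differ.

```python
BANDS = ("red", "orange", "yellow", "green")  # worst to best
BADGES = ("Health", "Risk", "Efficiency")


def badge_color(badge: str, score: float) -> str:
    """Map a 0-100 badge score to a color, assuming four equal bands of 25."""
    if badge not in BADGES:
        raise ValueError(f"unknown badge: {badge}")
    if not 0 <= score <= 100:
        raise ValueError("badge scores range from 0 to 100")
    # Health and Efficiency: high is good. Risk: low is good, so invert it first.
    goodness = 100 - score if badge == "Risk" else score
    return BANDS[min(int(goodness // 25), 3)]


for badge, score in (("Health", 90), ("Risk", 96), ("Efficiency", 60)):
    print(badge, score, badge_color(badge, score))
```

A Health of 90 comes out green, while a Risk of 96 comes out red, even though both are "high" numbers.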

Going from Dashboard to more

You go from the dashboard badges to more specific information, planning, analysis and reporting via the tabs at the top of the page, or by clicking on the badges of interest in the dashboard. The numbers of active alerts, warnings and informational messages are shown on the right of the screen.

But for now this concludes this document.

– Enjoy!

vSphere performance monitoring tools available as standard

I am currently working on a project where we are optimizing a virtual infrastructure consisting of vSphere and XenServer hypervisors. In the project, we want to measure and confirm some of the performance-related counters. We have several standard tools at the infrastructure components to see what the environment is capable of, and to check for bottlenecks in IO flow and processing.

With any analysis, it is important to plan (or know) what to measure at which layer. That makes it repeatable when you want to check what certain changes do to your environment. The check can be done with some of the available tools, such as VMware View Planner, which I wrote about in an earlier blog post (to be checked at this url https://www.pascalswereld.nl/post/66369941380/vmware-view-planner). It can also be a repeat of your plan, which can then be automated/orchestrated. Your measuring tools need similar counters/metrics throughout the chain, or at least show what you are putting in/requesting at the start and at the end (if there is an offset, you get grey spots in the chain).
A correctly working time service (NTP) is needed not only for, for example, clustering and logging, but also for monitoring, to get the right values at the right intervals. Being slightly off will in some cases give you negative values, or values that are off, at some components.

Some basics about measuring

You have to know what the metrics at a given point are. Some are integers, some are floating point, some are averages over periods or amounts used. Some need an algorithm to convert them to a human-readable or comparable metric (KB at one level and bytes at another; some conversions are not that easy). A value can look high at first sight because it consists of several components and is an average over a certain period. Divided by the number of worlds, it could be perfectly normal.
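CPU ready as reported by vCenter is a common example of a metric that needs converting. It is a "summation" in milliseconds per sample interval, summed over all of a VM's vCPUs. The Python sketch below converts it into a percentage per vCPU. The 20-second default matches vCenter's real-time charts. For rolled-up data, pass that rollup's interval instead (for example 300 for 5-minute samples).

```python
def ready_percent_per_vcpu(ready_ms: float, vcpus: int, interval_s: int = 20) -> float:
    """Convert a CPU ready summation (ms per sample) into % ready per vCPU."""
    if vcpus <= 0 or interval_s <= 0:
        raise ValueError("vcpus and interval_s must be positive")
    # ms of ready time / ms in the interval, as a percentage, spread over the vCPUs
    return ready_ms * 100 / (interval_s * 1000 * vcpus)


# A 4-vCPU VM reporting 2000 ms ready in a 20 s real-time sample:
print(ready_percent_per_vcpu(2000, vcpus=4))  # 10% for the VM, 2.5% per vCPU
```

The same "divide by the worlds" thinking applies in esxtop. There, a VM's %RDY is summed over its worlds, so compare it against the number of vCPUs before drawing conclusions.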

Next, know or decide on your period and data collection intervals. If you measure every second, you probably get a lot of information and will be a busy man (or woman) trying to analyze all that data. Measuring in December gives a less representative workload than measuring in a company's peak February period (and for Santa it is the other way around ;-)). And measure the complete process cycle: try to get a 4-week/month period so the month opening and closing processes are in there (depending on the workload, of course).
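A quick Python calculation shows how fast the choice of interval adds up in samples per counter. It uses vCenter's 20-second real-time interval and its 5-minute interval for past-day statistics. The 28-day period matches the 4-week cycle suggested above.

```python
SECONDS_PER_DAY = 86400


def samples_per_counter(period_days: float, interval_s: float) -> int:
    """Number of samples one counter produces over the collection period."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return int(period_days * SECONDS_PER_DAY // interval_s)


for interval in (1, 20, 300):
    print(f"{interval:>3} s interval: {samples_per_counter(28, interval):,} samples")
```

At one-second intervals, a single counter already yields over 2.4 million samples in four weeks. Multiply that by the counters and objects you collect, and the "busy man (or woman)" warning becomes very literal.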

Most important is that you know what your workloads are, what their IO needs are, and what your facilitating networking and storage components are capable of. Suppose you don't know what the VD image for a certain group of users is built of, or what those users require. How will you know whether a VD from this group requesting 45 IOPS is good or bad? On the other hand, if you put all your management, infrastructure and VDs on the same storage, how are you going to separate the cumulative counters from the specific workload?

Hey, you said something about vSphere in the title; let's see what is available as standard at the vSphere level.

– VM level; in-guest Windows Perfmon counters or Linux guest statistics. The latter depend heavily on what you put in your distribution, but think of top, htop, atop, vmstat, mpstat et al.
VMware Tools supplements the Windows Perfmon counters with some VM insights. There are a lot of counters available, so know what you want to measure. Use data collector sets to group them and keep them as reference/repeatable sets (with scheduled data collection).

– Host level; esxtop or vscsiStats. esxtop is a great tool for performance analysis of all types. Duncan Epping has an excellent post about esxtop metrics and usage, which you can find here: http://www.yellow-bricks.com/esxtop// esxtop can be used in interactive or batch mode. In batch mode, you can load your output file into Windows Perfmon or into esxplot (http://labs.vmware.com/flings/esxplot). Use VisualEsxtop (http://labs.vmware.com/flings/visualesxtop) for enhancements to the esxtop command line and a nice GUI. On the vMA you can use resxtop to get the esxtop stats remotely. vscsiStats is used when you want to collect SCSI statistics or get storage information that esxtop cannot show. And of course PowerCLI can be an enormous help here.
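Batch-mode esxtop output is a perfmon-style CSV with one very wide header row. For example, esxtop -b -d 5 -n 12 > esxtop.csv produces twelve samples five seconds apart. Besides loading it in Perfmon or esxplot, a few lines of Python can pull out just the counters you care about. The sample rows below are a tiny hand-made stand-in with shortened headers. Real files contain thousands of columns, and header text can differ between ESXi versions, so match on a substring you have checked in your own file.

```python
import csv
import io
from statistics import mean
from typing import Dict, List, TextIO


def column_averages(stream: TextIO, match: str) -> Dict[str, float]:
    """Average every column whose header contains `match` (case-insensitive)."""
    reader = csv.reader(stream)
    header = next(reader)
    wanted = [i for i, name in enumerate(header) if match.lower() in name.lower()]
    values: Dict[int, List[float]] = {i: [] for i in wanted}
    for row in reader:
        for i in wanted:
            if i < len(row) and row[i].strip():
                values[i].append(float(row[i]))
    return {header[i]: mean(v) for i, v in values.items() if v}


# Tiny stand-in for batch-mode output (headers shortened, values made up).
sample = io.StringIO(
    '"(PDH-CSV 4.0) (UTC)(0)","\\\\esx01\\Physical Cpu(_Total)\\% Util Time",'
    '"\\\\esx01\\Memory\\Free MBytes"\n'
    '"05/12/2014 10:00:00","35.0","2048"\n'
    '"05/12/2014 10:00:05","45.0","2040"\n'
    '"05/12/2014 10:00:10","40.0","2044"\n'
)
print(column_averages(sample, "% Util Time"))
```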

– vCenter level; statistics collection, which depends on your statistics level. Graphs can be shown for several components in the vSphere Web Client. The counters can also be read via the vSphere API, or, again, extracted with PowerCLI. For an overview of the metrics at each level, please read this document http://pubs.vmware.com/vsphere-55/topic/com.vmware.ICbase/PDF/vsphere-esxi-vcenter-server-55-monitoring-performance-guide.pdf or check the documentation center for your version.

– vCenter™ Operations Management Suite (vCOPS). Well, "standard": you still have the option not to include Operations in your environment. But then you miss out on some of the automated (interactive/proactive) performance monitoring, reporting and insight options for your environment. Root cause analysis is part of the suite, rather than down to your own understanding and analytic skills. If you are working at the previous levels, your life could be simpler with the vCOPS suite.

Next up

These standard tools need to be supplemented with counters for specific applications, networking (hops and other components passed) and storage (what are the storage processors up to; is latency building up in the device itself?).

– Happy measuring!