So after the Cloud Connector Edition (CCE) V2.1.0 announcement I wanted to look at, set up and play around with OMS, as I've never really looked into it before. So here are my steps.
Started here
https://azure.microsoft.com/en-gb/pricing/details/log-analytics/
Clicked Try for Free
So here the question is: do you have an Azure tenant already? If not, you need to sign up for one and enter card details. For OMS I'm using the free plan, but I believe you still need payment details.
I had an existing tenant, although when I started to add an OMS workspace the subscription had expired, so I had to add a pay-as-you-go subscription. There are free trials out there to be had, though.
If you already have an Azure Subscription you can log in and start here
Within the Azure portal, after I had logged in
I clicked the plus New button
I then went
Monitoring > Management
Clicked on Log Analytics
Then I was asked to create a new OMS workspace
More details on workspaces here
I wanted to understand a little about what an OMS workspace is. The link above helped, but this summarized it nicely:
A workspace is an Azure resource and is a container where data is collected, aggregated, analyzed, and presented in the Azure portal.
I selected Create New and entered a workspace name, then selected the subscription I was using (yours may be different), created a new resource group and selected a location.
For pricing I stayed on the free tier, but there are paid options.
Clicked OK and it went off to create.
Awesome!
Now I went to All resources from the Azure portal left-hand menu, found my workspace and clicked on it.
I opened the workspace, and for administration I needed the OMS Portal, so I clicked the OMS Portal icon.
Click OMS Portal
This opened what seems to be a blank dashboard, so you need to configure it.
Click Settings at the top right
Now we need to look at connected sources > Windows Computers
We need to find the Workspace ID and Keys we will configure on CCE
Configure Cloud Connector to use OMS
You’ll need to configure your Cloud Connector on-premises environment to use OMS.
*** Please note: I found that without connecting a source you could not create alerts. To work around this for this blog I added my Surface Pro: I downloaded the Windows agent, installed it and entered the workspace ID and primary key.
CCE instructions on Configuring CCE side
This part I haven't done yet, as I'm waiting for access to the CCE in our lab. An update should follow very soon, but the steps from TechNet are here for the moment.
*** Updated 08/12/2018 with some screenshots from existing CCE in our Lab, special thanks to Darren Ellis for assisting ****
From https://technet.microsoft.com/en-us/library/mt828598.aspx
- If you are installing a new Cloud Connector appliance or you want to re-deploy an appliance, follow these steps before you run Install-CcAppliance:
1. In the CloudConnector.ini file [Common] section, set the OMSEnabled parameter to True. Each time Cloud Connector is deployed or upgraded, it will try to install the OMS agent automatically onto the VMs. Enable this feature so the OMS agent can survive the Cloud Connector automatic update.
2. To configure the OMS ID and key, run Set-CcCredential -AccountType OMSWorkspace.
- If you are installing an OMS agent onto an existing Cloud Connector appliance, follow these steps:
1. In the CloudConnector.ini file [Common] section, set OMSEnabled=true.
Located the CloudConnector.ini file
Opened in Notepad and added OMSEnabled=true under [Common]
Saved and closed notepad.
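If you would rather script the edit than open Notepad, here is a minimal Python sketch of the same change. It only assumes the file is a plain ini-style text file with a [Common] section, as described above; the function name and the sample content are made up for illustration, and it works line by line so that any comments in the file are left alone.

```python
def set_ini_value(text: str, section: str, key: str, value: str) -> str:
    """Return ini text with key=value set inside [section], keeping all other lines."""
    out = []
    in_section = False
    done = False
    header = f"[{section}]".lower()
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving the target section without finding the key: add it here.
            if in_section and not done:
                out.append(f"{key}={value}")
                done = True
            in_section = stripped.lower() == header
        elif in_section and not done and not stripped.startswith((";", "#")):
            name = stripped.split("=", 1)[0].strip()
            if "=" in stripped and name.lower() == key.lower():
                # Key already present: overwrite its value.
                line = f"{key}={value}"
                done = True
        out.append(line)
    if in_section and not done:
        # Target section was the last one in the file.
        out.append(f"{key}={value}")
        done = True
    if not done:
        # Section missing entirely: append it.
        out.extend([f"[{section}]", f"{key}={value}"])
    return "\n".join(out) + "\n"


# Hypothetical sample content, not a real CloudConnector.ini.
sample = "[Common]\nSomeSetting=value\n\n[Network]\nOther=1\n"
print(set_ini_value(sample, "Common", "OMSEnabled", "true"))
```

To use it on a real file you would read the file, pass the text through the function and write it back, ideally after taking a backup copy.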
2. Run Import-CcConfiguration.
Opened PowerShell as Administrator and ran Import-CcConfiguration
We accepted the message to redeploy after we ran the cmdlet. This was on an AudioCodes CCE appliance.
We then went to run the next cmdlet to install the OMS agents
3. Run Install-CcOMSAgent.
I was asked to enter the OMS workspace ID first.
I got the OMS workspace key from the workspace ID and keys page in the OMS Portal.
After I grabbed the ID and key, I first entered the ID and then the key when prompted.
We ran the cmdlet, were asked to enter the OMS workspace key as shown above, and then to reconfirm it.
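Since the ID and key are pasted by hand, a quick sanity check before running the cmdlet can catch a bad copy. This is a small Python sketch under two assumptions: that the workspace ID is a GUID and that the key is a base64 string. The function names and sample values are hypothetical, and the check says nothing about whether the values are actually valid for your workspace.

```python
import base64
import binascii
import uuid


def looks_like_workspace_id(value: str) -> bool:
    """True if the value parses as a GUID (assumed format of the workspace ID)."""
    try:
        uuid.UUID(value.strip())
        return True
    except ValueError:
        return False


def looks_like_workspace_key(value: str) -> bool:
    """True if the value is non-empty, well-formed base64 (assumed key format)."""
    try:
        return len(base64.b64decode(value.strip(), validate=True)) > 0
    except (binascii.Error, ValueError):
        return False


# Made-up sample values for illustration only.
sample_id = "00000000-0000-0000-0000-000000000000"
sample_key = base64.b64encode(b"x" * 64).decode()
print(looks_like_workspace_id(sample_id), looks_like_workspace_key(sample_key))
```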
It then went off to download the latest OMS agent
Once downloaded it started to install on first CCE VM
Once installed finished it went onto the next VM
After the agent was installed on each VM and the Hyper-V host, it looked like below
And that's our existing CCE configured and the OMS agents deployed.
Let's go check the OMS portal and see if the CCE VMs and host are connected.
I had to wait a little while and refresh the OMS Portal, but I now have 5 connected Windows computers
Let's see what they are
Yep, all the VMs and the host are here
- If you want to update the OMS workspace ID or key in a Cloud Connector appliance that has already installed an OMS agent:
1. To configure the OMS ID and key, run Set-CcCredential -AccountType OMSWorkspace.
2. To apply the updates, run Install-CcOMSAgent.
For all scenarios, verify that the agents are connected as follows:
- In the OMS portal, go to Settings -> Connected Sources -> Windows Servers. You will see a list of connected machines.
Now that the OMS workspace is created and CCE is pointing to it, it's time to configure OMS.
Configure OMS
Back in OMS portal settings
Settings->Data->Windows Event logs, and add event logs for:
- Lync Server
- Application
You must manually enter Lync Server in the text box. It does not appear as an option in the drop-down list.
Click Save
Settings->Data-> Windows Performance Counters
Here I clicked “Add the selected performance counters” before adding the new ones.
Total active calls:
- LS:MediationServer - Inbound Calls(_Total)\- Current
- LS:MediationServer - Outbound Calls(_Total)\- Current
Total active media bypass calls:
- LS:MediationServer - Inbound Calls(_Total)\- Active media bypass calls
- LS:MediationServer - Outbound Calls(_Total)\- Active media bypass calls
I then had a big list of counters to cover the OS and CCE
Click Save
I then had to save the configuration before moving on to create alerts.
Create Alerts
First off, you should consider the following when creating alerts:
- Make sure the alert is a Number-of-results alert, which is the default selection.
- The demo queries require that “Number of results” is set to “Greater than 0”.
- It is recommended that you set both Time window and Alert frequency to 5 minutes.
- It is recommended that you do not enable “Suppress alerts” for demo alerts.
- For typical alert scenarios, Microsoft recommends creating a pair of alerts: one error alert and one reset alert. For the error alert, select severity level Critical; for the reset alert, select severity level Informational.
The alerts come in pairs: one alert for the error state and one for the reset back to normal, which makes sense, so you know when it's broken and when it's back to normal.
I found this Azure documentation on alerts, which was useful to read before creating alerts as it's not mega easy to start with.
https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-alerts-creating
I tried under Settings below but couldn't see how to create one. I think alerts are only listed here once they have been created.
It does mention, though:
You can create rules in Search and manage them here in Settings.
Over to search then
Under Log Search by clicking the search magnifier glass on the left menu
I pasted the first query into the box and clicked search
Alert button on top menu
Now I was in Add Alert Rule. There must be a better way, but I'm there.
Here I copied the sample CCE alerts from the TechNet link, but I also spotted what looks like an error in one of them. Please read below.
https://technet.microsoft.com/en-us/library/mt828598.aspx
RTCMEDSRV is NOT running in Mediation Servers
Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25003
You need to update the server name though, as mentioned in the TechNet link: this query looks for servers whose name contains "MediationServer".
Create an alert pair: "RTCMEDSRV is NOT running in Mediation Servers" and "RTCMEDSRV is back in running in Mediation Servers"
The query for the error alert is:
Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25003
Clicked Save
Next I created the other alert in the pair. This time it was the reset alert, so I set the severity to Informational.
The query for the reset alert is:
Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25002
On TechNet there is an error: the event ID in the reset query is missing the 2 on the end.
Create an alert pair: "Too many concurrent calls in Mediation Servers" and "Concurrent calls fall back to normal load"
The query for the error alert is:
Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer - Outbound Calls" or ObjectName == "LS:MediationServer - Inbound Calls")
| summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer
| summarize TotalCalls = sum(CounterValue) by Computer | where TotalCalls >= 500
The query for the reset alert is:
Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer - Outbound Calls" or ObjectName == "LS:MediationServer - Inbound Calls")
| summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer
| summarize TotalCalls = sum(CounterValue) by Computer | where TotalCalls < 500
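To make the logic of these two queries easier to follow, here is a rough Python sketch of what they appear to do: keep the latest sample per computer and counter object, add the inbound and outbound values per computer, and compare the total with the threshold. The records and function names are invented for illustration and this is not how OMS evaluates the query internally.

```python
from collections import defaultdict


def total_calls_per_computer(records):
    """Sum the most recent CounterValue of each ObjectName, per computer."""
    latest = {}
    for rec in records:
        key = (rec["Computer"], rec["ObjectName"])
        # ISO-8601 timestamps in the same format compare correctly as strings.
        if key not in latest or rec["TimeGenerated"] > latest[key]["TimeGenerated"]:
            latest[key] = rec
    totals = defaultdict(float)
    for (computer, _object_name), rec in latest.items():
        totals[computer] += rec["CounterValue"]
    return dict(totals)


def classify(totals, threshold=500):
    """'error' matches the error query (>= threshold), 'reset' the reset query."""
    return {c: ("error" if t >= threshold else "reset") for c, t in totals.items()}


# Hypothetical sample data.
records = [
    {"Computer": "MediationServer1", "ObjectName": "LS:MediationServer - Inbound Calls",
     "TimeGenerated": "2017-12-08T10:00:00Z", "CounterValue": 300},
    {"Computer": "MediationServer1", "ObjectName": "LS:MediationServer - Inbound Calls",
     "TimeGenerated": "2017-12-08T10:05:00Z", "CounterValue": 320},
    {"Computer": "MediationServer1", "ObjectName": "LS:MediationServer - Outbound Calls",
     "TimeGenerated": "2017-12-08T10:05:00Z", "CounterValue": 200},
    {"Computer": "MediationServer2", "ObjectName": "LS:MediationServer - Inbound Calls",
     "TimeGenerated": "2017-12-08T10:05:00Z", "CounterValue": 10},
]
print(classify(total_calls_per_computer(records)))
```

In this made-up data MediationServer1 ends up at 520 calls (the older 300 sample is ignored), so it would match the error query, while MediationServer2 would match the reset query.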
Create an alert: "CPU usage > 90 or RTCMEDIARELAY stopped in Servers"
The query will get all processor usage counter and service stop event from all computers and return one log if either processor usage exceeds 90% or service is ever stopped.
search *| where Computer contains "MediationServer" | where (Type == "Perf" or Type == "Event") | where ((ObjectName ==
"Processor" and CounterName == "% Processor Time") or EventLog == "Lync Server") | where (CounterValue > 90 or EventID == 22003)
Recommended minimal monitoring set from Microsoft.
So it looks like we need to work this out on our own, so I'll give it a go. I don't know if these are correct, but perhaps they will help someone.
Let's start with the first table
The following table lists the services that Microsoft recommends monitoring by listing the stop and start event IDs:
You need to update the server names though, as mentioned in the TechNet link: these queries look for servers whose name contains "MediationServer" or "EdgeServer".
Mediation Server
Service Name – RTCMEDSRV
Please note these were already covered in the TechNet examples above, but I think TechNet has missed the 2 off the end of the start event ID.
Here they are again
The query for the error alert is:
Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25003
The query for the reset alert is:
Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25002
Edge Server
Service Name – RTCSRV
The query for the error alert is:
Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 12288 or EventID == 12289)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 12289
The query for the reset alert is:
Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 12288 or EventID == 12289)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 12288
Service Name – RTCMRAUTH
The query for the error alert is:
Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 19002 or EventID == 19003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 19003
The query for the reset alert is:
Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 19002 or EventID == 19003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 19002
Service Name – RTCMEDIARELAY
The query for the error alert is:
Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 22002 or EventID == 22003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 22003
The query for the reset alert is:
Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 22002 or EventID == 22003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 22002
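All of the service queries above follow the same pattern, with only the server name fragment and the start/stop event IDs changing. As a sketch, the pairs could be generated from a small table instead of being typed out by hand. The function name is made up, and the event IDs are simply the ones used in the queries above, so they carry the same caveat about whether they are correct.

```python
def build_alert_pair(name_fragment: str, start_id: int, stop_id: int) -> dict:
    """Build the error/reset query pair for one service from its start and stop event IDs."""
    base = (
        f'Event | where Computer contains "{name_fragment}" '
        f'| where EventLog == "Lync Server" and (EventID == {start_id} or EventID == {stop_id})\n'
        "| summarize arg_max(TimeGenerated, EventID) by Computer "
    )
    return {
        # Error: the latest event for the computer is the stop event.
        "error": base + f"| where EventID == {stop_id}",
        # Reset: the latest event for the computer is the start event.
        "reset": base + f"| where EventID == {start_id}",
    }


# (service name, server name fragment, start event ID, stop event ID), as used above.
SERVICES = [
    ("RTCMEDSRV", "MediationServer", 25002, 25003),
    ("RTCSRV", "EdgeServer", 12288, 12289),
    ("RTCMRAUTH", "EdgeServer", 19002, 19003),
    ("RTCMEDIARELAY", "EdgeServer", 22002, 22003),
]

for service, fragment, start_id, stop_id in SERVICES:
    pair = build_alert_pair(fragment, start_id, stop_id)
    print(f"{service} error alert:\n{pair['error']}\n")
    print(f"{service} reset alert:\n{pair['reset']}\n")
```

Remember to swap the name fragments for whatever your own servers are called before pasting the output into Log Search.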
Now let's look at the second table.
The following table lists the network issues that Microsoft recommends monitoring:
Monitor Name
Mediation Server to gateway connectivity failure
The query for the error alert is:
Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25061 or EventID == 25062 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25061
The query for the reset alert is:
Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25061 or EventID == 25062 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where (EventID == 25062 or EventID == 25002)
Mediation Server to gateway call completion failure
The query for the error alert is:
Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25063 or EventID == 25064 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25063
The query for the reset alert is:
Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25063 or EventID == 25064 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where (EventID == 25064 or EventID == 25002)
Critical network problems
The query for the error alert is:
Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 14624 or EventID == 14353 or EventID == 12288)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 14624
The query for the reset alert is:
Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 14624 or EventID == 14353 or EventID == 12288)
| summarize arg_max(TimeGenerated, EventID) by Computer | where (EventID == 14353 or EventID == 12288)
Next up are the call capacity counters
The following lists the call capacity counters that should be monitored. These numbers should be less than 500 for Cloud Connector standard edition; less than 50 for Cloud Connector minimum edition.
- LS:MediationServer - Inbound Calls(_Total)\- Current
- LS:MediationServer - Outbound Calls(_Total)\- Current
- LS:MediationServer - Inbound Calls(_Total)\- Active media bypass calls
- LS:MediationServer - Outbound Calls(_Total)\- Active media bypass calls
I believe these were all created in the examples, but here they are again
Create an alert pair: "Too many concurrent calls in Mediation Servers" and "Concurrent calls fall back to normal load"
To create this alert:
The query for the error alert is:
Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer - Outbound Calls" or ObjectName == "LS:MediationServer - Inbound Calls")
| summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer
| summarize TotalCalls = sum(CounterValue) by Computer | where TotalCalls >= 500
The query for the reset alert is:
Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer - Outbound Calls" or ObjectName == "LS:MediationServer - Inbound Calls")
| summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer
| summarize TotalCalls = sum(CounterValue) by Computer | where TotalCalls < 500
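The queries above hard-code 500, which is the standard edition limit. For a minimum edition appliance the limit quoted earlier is 50, so here is a small Python sketch that builds the same pair with the threshold picked by edition. The function and dictionary names are made up, and the object names are written with a plain hyphen on the assumption that this is how the counters are registered on the servers.

```python
# Capacity limits quoted in this post: 500 for standard edition, 50 for minimum edition.
EDITION_LIMITS = {"standard": 500, "minimum": 50}


def concurrent_call_queries(edition: str, name_fragment: str = "MediationServer") -> dict:
    """Build the 'too many concurrent calls' error/reset query pair for an edition."""
    limit = EDITION_LIMITS[edition]
    base = (
        f'Perf | where Computer contains "{name_fragment}" '
        '| where (ObjectName == "LS:MediationServer - Outbound Calls" '
        'or ObjectName == "LS:MediationServer - Inbound Calls")\n'
        "| summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer\n"
        "| summarize TotalCalls = sum(CounterValue) by Computer "
    )
    return {
        "error": base + f"| where TotalCalls >= {limit}",
        "reset": base + f"| where TotalCalls < {limit}",
    }


queries = concurrent_call_queries("minimum")
print(queries["error"])
print(queries["reset"])
```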
Now if I go back into Settings > Alerts I have quite a few
Analyze the alerts in your Log Analytics repository
A section I skipped over, but will look at now, is analysing the alerts in the Log Analytics repository.
So the CCE OMS TechNet article sends me to
https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-solution-alert-management
Reading this, it looks like I should add the Alert Management solution to my OMS workspace. What's Alert Management?
When you add the Alert Management solution to your OMS workspace, the Alert Management tile is added to your OMS dashboard. This tile displays a count and graphical representation of the number of currently active alerts that were generated within the last 24 hours. You cannot change this time range.
Found this detailing how to add
https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-add-solutions
Let's try and add it
Back in the OMS Portal, click the bag icon on the left-hand menu
This is the Solutions Gallery! Wow, there's a lot here
I'm going to select Alert Management
Click Add
I looked at a few other solutions to add that may be useful, like
Agent Health
You can also then see the solutions added in settings > solutions
Summary
It seems like quite a bit of work, but once it's all set up I can see the power of OMS, and alerting for CCE will be awesome.
I just need to hook up a CCE to my OMS workspace and get testing this, which I hope to do very soon, so I'll update. Can't wait to play around more with OMS and use it for more than just CCE monitoring and management!
Hopefully this will be useful to someone setting it up.
**** Updates 08/12/2017 ****
Alerting Examples
After playing around with the alerts and thresholds, I found that at first I wasn't getting any alerts, so I had to tweak the settings I had first used, as getting 5 errors in 5 minutes didn't seem like it would ever trigger an alert.
I had to set “Number of results” to “Greater than 0” to get alerts to work.
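A possible explanation, sketched in Python under my own assumptions: the service queries keep only the latest event per computer, so they return at most one row per server no matter how many stop events were logged. With only a handful of servers, a "Number of results" threshold of 5 could then never be exceeded. The event tuples and function names below are invented for illustration.

```python
def error_alert_results(events, stop_id):
    """Mimic the error query: latest event per computer, kept only if it is the stop event."""
    latest = {}
    for computer, time_generated, event_id in events:
        if computer not in latest or time_generated > latest[computer][0]:
            latest[computer] = (time_generated, event_id)
    return [c for c, (_t, event_id) in latest.items() if event_id == stop_id]


def alert_fires(result_count, greater_than):
    """A Number-of-results alert fires when the count exceeds the threshold."""
    return result_count > greater_than


# Hypothetical: one Mediation Server logging six stop events (25003) in a row.
events = [("MediationServer1", minute, 25003) for minute in range(6)]
rows = error_alert_results(events, stop_id=25003)
print(len(rows), alert_fires(len(rows), 5), alert_fires(len(rows), 0))
```

With one row per server, "Greater than 0" is the setting that lets a single stopped service raise the alert.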
Here’s an example of an email alert from the alerts I had set up.
As I continue to play more, I will update this post.
References
Monitor Cloud Connector using Operations Management Suite (OMS)
https://technet.microsoft.com/en-us/library/mt828598.aspx
Working with alert rules in Log Analytics
https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-alerts-creating
Alert Management solution in Operations Management Suite (OMS)
https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-solution-alert-management
Add Azure Log Analytics management solutions to your workspace
https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-add-solutions