Operations Management Suite Setup walkthrough for Cloud Connector Edition (CCE) V2.1.0

So after the Cloud Connector Edition (CCE) V2.1.0 announcement, I wanted to look at, set up and play around with OMS, as I've never really looked into it before. So here are my steps.

Started here

https://azure.microsoft.com/en-gb/pricing/details/log-analytics/

Clicked Try for Free

image

So the first question is: do you already have an Azure tenant? If not, you need to sign up for one and enter card details. I'm using the free plan for OMS, but I believe you still need to provide payment details.

I had an existing tenant, but when I started to add an OMS workspace I found it had expired, so I had to add a Pay-As-You-Go subscription. There are free trials out there to be had, though.

If you already have an Azure subscription, you can log in and start from this link.

Within the Azure portal, after I had logged in, I clicked the + New button.

image

I then went to Monitoring > Management and clicked on Log Analytics.

image

Then I was asked to create a new OMS workspace.

More details on workspaces here

I wanted to understand a little about what an OMS workspace is. The link above helped, but this summarized it nicely:

A workspace is an Azure resource and is a container where data is collected, aggregated, analyzed, and presented in the Azure portal.

image

I selected Create New, entered a workspace name, selected the subscription I was using (yours may be different), created a new resource group and selected a location.

For pricing, I stayed on the Free tier, but there are paid options.

image

I clicked OK and it went off to create the workspace.

image

image

image

Awesome!

image

Now I went to All resources in the Azure portal's left-hand menu, found my workspace and clicked on it.

image

I opened the workspace. For administration I needed the OMS portal, so I clicked the OMS Portal icon.

image

Click OMS Portal

image

This opened what seems to be a blank dashboard, so you need to configure it.

Click Settings (top right).

image

image

Now we need to look at Connected Sources > Windows Computers.

This is where we find the workspace ID and keys that we will configure on CCE.

Configure Cloud Connector to use OMS

You’ll need to configure your Cloud Connector on-premises environment to use OMS.

Screenshot for Cloud Connector OMS

*** Please note: I found that without a connected source you could not create alerts. To work around this for this blog, I added my Surface Pro: I downloaded the Windows agent, installed it and entered the workspace ID and primary key.

CCE instructions for configuring the CCE side

This part I haven't done yet, as I'm waiting for access to the CCE in our lab. An update should follow very soon, but the steps from TechNet are here for the moment.

*** Updated 08/12/2017 with some screenshots from an existing CCE in our lab. Special thanks to Darren Ellis for assisting ***

From https://technet.microsoft.com/en-us/library/mt828598.aspx

  • If you are installing a new Cloud Connector appliance or you want to re-deploy an appliance, follow these steps before you run Install-CcAppliance:

    1. In the CloudConnector.ini file [Common] section, set the OMSEnabled parameter to True.

      Each time Cloud Connector is deployed or upgraded, it will try to install the OMS agent automatically onto the VMs. Enable this feature so the OMS agent can survive the Cloud Connector automatic update.

    2. To configure the OMS ID and key, run Set-CcCredential -AccountType OMSWorkspace.

If you are installing an OMS agent onto an existing Cloud Connector appliance, follow these steps:

  1. In the CloudConnector.ini file [Common] section, set OMSEnabled=true.

I located the CloudConnector.ini file.

image

I opened it in Notepad and added OMSEnabled=true under [Common].

image

I saved and closed Notepad.
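If you want to double-check the edit before running Import-CcConfiguration, here is a minimal Python sketch. It assumes CloudConnector.ini is plain INI syntax ([Section] headers and key=value lines) with no text before the first section; the oms_enabled helper is my own, not part of CCE, and I haven't run it against a real appliance.

```python
import configparser


def oms_enabled(ini_text: str) -> bool:
    """Return True if [Common] OMSEnabled is set to a true value.

    Hypothetical helper: assumes CloudConnector.ini is standard INI syntax.
    """
    # interpolation=None: treat '%' characters in other values literally.
    # strict=False: tolerate duplicate keys instead of raising an error.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(ini_text)
    # Key lookups are case-insensitive by default ("omsenabled" also matches).
    # fallback=False covers a missing [Common] section or a missing key.
    return parser.getboolean("Common", "OMSEnabled", fallback=False)
```

To use it, read the file with pathlib, for example oms_enabled(Path(your_ini_path).read_text(encoding="utf-8-sig")); the utf-8-sig encoding skips a byte-order mark if Notepad saved one.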

2. Run Import-CcConfiguration.

I opened PowerShell as Administrator and ran Import-CcConfiguration.

After running the cmdlet, we accepted the prompt to redeploy. This was on an AudioCodes CCE appliance.

image

We then ran the next cmdlet to install the OMS agents.

3. Run Install-CcOMSAgent.

I was asked to enter the OMS workspace ID first.

I got the OMS workspace ID and key from the OMS portal (Connected Sources > Windows Computers).

image

After I grabbed the ID and key, I entered the ID first and then the key when prompted.

image

We ran the cmdlet and were then asked to enter the OMS workspace key, as shown above, and then to reconfirm it.

clip_image001

clip_image001[5]

It then went off to download the latest OMS agent

image

Once downloaded, it started to install on the first CCE VM.

clip_image001[7]

Once that install finished, it moved on to the next VM.

clip_image001[9]

After each VM and the Hyper-V host had the agent installed, it looked like this:

clip_image001[11]

And that's our existing CCE configured and the OMS agents deployed.

Let's go and check the OMS portal to see whether the CCE VMs and host are connected.

I had to wait a little while and refresh the OMS portal, but I now have 5 connected Windows computers.

image

Let's see what they are.

Yep, all the VMs and the host are here.

image

 

  • If you want to update the OMS workspace ID or key in a Cloud Connector appliance that has already installed an OMS agent:

    1. To configure the OMS ID and key, run Set-CcCredential -AccountType OMSWorkspace.

    2. To apply the updates, run Install-CcOMSAgent.

       

      For all scenarios, verify that the agents are connected as follows:

  • In the OMS portal, go to Settings -> Connected Sources -> Windows Servers. You will see a list of connected machines.

Now that the OMS workspace is created and CCE is pointing to it, it's time to configure OMS.

 

Configure OMS

Back in the OMS portal settings:

Settings->Data->Windows Event logs, and add event logs for:

  • Lync Server

  • Application

image

You must manually enter Lync Server in the text box. It does not appear as an option in the drop-down list.

image

Click Save

image

Settings->Data-> Windows Performance Counters

Here I clicked "Add the selected performance counters" before adding the new ones.

image

 

Total active calls:

  • LS:MediationServer - Inbound Calls(_Total)\- Current

  • LS:MediationServer - Outbound Calls(_Total)\- Current

Total active media bypass calls:

  • LS:MediationServer - Inbound Calls(_Total)\- Active media bypass calls

  • LS:MediationServer - Outbound Calls(_Total)\- Active media bypass calls
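The alert queries later on filter on ObjectName (and, for the CPU alert, CounterName) rather than on the full counter path. As far as I can tell, Log Analytics stores each Perf sample with separate ObjectName, InstanceName and CounterName columns, so it helps to see how a path splits up. A small Python sketch; split_counter_path is a hypothetical helper, just for illustration:

```python
import re
from typing import Optional, Tuple

# Object name, optional "(instance)", then a backslash and the counter name.
COUNTER_PATH = re.compile(
    r"^(?P<object>[^(\\]+)(?:\((?P<instance>[^)]*)\))?\\(?P<counter>.+)$"
)


def split_counter_path(path: str) -> Tuple[str, Optional[str], str]:
    """Split 'Object(Instance)\\Counter' into (object, instance, counter)."""
    match = COUNTER_PATH.match(path.strip())
    if match is None:
        raise ValueError(f"not a counter path: {path!r}")
    return (
        match.group("object").strip(),
        match.group("instance"),  # None when the path has no instance
        match.group("counter").strip(),
    )


print(split_counter_path(r"LS:MediationServer - Inbound Calls(_Total)\- Current"))
print(split_counter_path(r"Processor(_Total)\% Processor Time"))
```

The object part has to match the query text character for character, so watch out for the hyphen in "LS:MediationServer - Inbound Calls" being turned into an en dash when copying from a web page.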

image

I then had a big list of counters covering the OS and CCE.

image

Click Save

I then had to save the configuration before moving on to create alerts.

Create Alerts

First off, TechNet lists some points to bear in mind:

You should consider the following when creating alerts:

  • Make sure the alert is a Number-of-results alert, which is the default selection.

  • The demo queries require that “Number of results” is set to “Greater than 0”.

  • It is recommended that you set both Time window and Alert frequency to 5 minutes.

  • It is recommended that you do not enable “Suppress alerts” for demo alerts.

  • For typical alert scenarios, Microsoft recommends creating a pair of alerts: one error alert and one reset alert. For the error alert, select severity level Critical; for the reset alert, select severity level Informational.

Alerts look to come in pairs: one alert for the error state and one for the reset back to normal. That makes sense, as you know both when something is broken and when it's back to normal.

I found this Azure documentation on alerts useful to read, as creating alerts isn't mega easy to start with:

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-alerts-creating

I tried under Settings (below) but couldn't see how to create one; I think alerts are only listed there once created.

It does mention, though:

You can create rules in Search and manage them here in Settings.

image

Over to Search, then.

I opened Log Search by clicking the magnifying glass on the left menu.

image

I pasted the first query into the box and clicked Search.

image

I then clicked the Alert button on the top menu.

image

Now I was in Add Alert Rule. There must be a better way, but I'm there!

image

Here I copied the sample CCE alerts from the TechNet link, but I also noticed what looks like an error in one of them; please read below.

https://technet.microsoft.com/en-us/library/mt828598.aspx

RTCMEDSRV is NOT running in Mediation Servers


You need to update the server name, though: as mentioned in the TechNet link, this query looks for servers whose names contain "MediationServer".

Create an alert pair: "RTCMEDSRV is NOT running in Mediation Servers" and "RTCMEDSRV is back in running in Mediation Servers"

The query for the error alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25003

image

Clicked Save

image

Next, I created the other alert in the pair. This time it was the reset alert, so I set the severity to Informational.

The query for the reset alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer  | where EventID == 25002

On TechNet there is an error here: the reset query is missing the 2 on the end of 25002.
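To convince myself how the pair works, here is a rough Python imitation of the query logic (not Log Analytics itself): keep the newest start/stop event per computer, then the error alert returns computers whose newest event is the stop ID, and the reset alert those whose newest event is the start ID. The computer names and timestamps are made up, and service_alert_rows is a hypothetical helper. It also shows why a reset query checking 2500 would never return anything.

```python
from datetime import datetime


def service_alert_rows(events, computer_filter, start_id, stop_id, alert_on):
    """Mimic the service alert-pair query on a list of event dicts.

    alert_on=stop_id behaves like the error alert, alert_on=start_id like the reset.
    """
    latest = {}  # Computer -> newest matching event
    for e in events:
        # where Computer contains "..." (KQL contains is case-insensitive)
        if computer_filter.lower() not in e["Computer"].lower():
            continue
        # where EventLog == "Lync Server" and (EventID == start or EventID == stop)
        if e["EventLog"] != "Lync Server" or e["EventID"] not in (start_id, stop_id):
            continue
        # summarize arg_max(TimeGenerated, EventID) by Computer
        newest = latest.get(e["Computer"])
        if newest is None or e["TimeGenerated"] > newest["TimeGenerated"]:
            latest[e["Computer"]] = e
    # where EventID == alert_on
    return sorted(c for c, ev in latest.items() if ev["EventID"] == alert_on)


def event(computer, hour, minute, event_id):
    # Made-up Lync Server event for illustration.
    return {"Computer": computer, "EventLog": "Lync Server", "EventID": event_id,
            "TimeGenerated": datetime(2017, 12, 8, hour, minute)}


events = [
    event("LAB-MediationServer1", 10, 0, 25003),  # RTCMEDSRV stopped...
    event("LAB-MediationServer1", 10, 5, 25002),  # ...and started again
    event("LAB-MediationServer2", 10, 3, 25003),  # stopped and still down
    event("LAB-EdgeServer1", 10, 4, 12289),       # dropped by the Computer filter
]

print(service_alert_rows(events, "MediationServer", 25002, 25003, alert_on=25003))  # error
print(service_alert_rows(events, "MediationServer", 25002, 25003, alert_on=25002))  # reset
print(service_alert_rows(events, "MediationServer", 25002, 25003, alert_on=2500))   # typo
```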

image

Create an alert pair: "Too many concurrent calls in Mediation Servers" and "Concurrent calls fall back to normal load"

The query for the error alert is:

Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer - Outbound Calls" or ObjectName == "LS:MediationServer - Inbound Calls")
| summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer | summarize TotalCalls = sum(CounterValue) by Computer | where TotalCalls >= 500

image

The query for the reset alert is:

Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer - Outbound Calls" or ObjectName == "LS:MediationServer - Inbound Calls")
| summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer | summarize TotalCalls = sum(CounterValue) by Computer | where TotalCalls < 500

image

Create an alert: "CPU usage > 90 or RTCMEDIARELAY stopped in Servers" alert

The query gets the processor usage counters and service stop events from the matching computers and returns a row whenever processor usage exceeds 90% or the service is stopped.

search * | where Computer contains "MediationServer" | where (Type == "Perf" or Type == "Event")
| where ((ObjectName == "Processor" and CounterName == "% Processor Time") or EventLog == "Lync Server") | where (CounterValue > 90 or EventID == 22003)

image

Recommended minimal monitoring set from Microsoft.

So it looks like we need to work these out on our own, so I'll give it a go. I don't know if these are correct, but perhaps they will help someone.

Let's start with the first table.

The following table lists the services that Microsoft recommends monitoring by listing the stop and start event IDs:

image

As mentioned in the TechNet link, you need to update the server names: these queries look for servers whose names contain "MediationServer" or "EdgeServer".

Mediation Server

Service Name – RTCMEDSRV

Please note these were already added in the TechNet examples earlier, but I think TechNet has missed the 2 off the end of the start event ID (25002).

Here they are again

The query for the error alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25003

The query for the reset alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer  | where EventID == 25002

 

Edge Server

Service Name – RTCSRV

The query for the error alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 12288 or EventID == 12289)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 12289

The query for the reset alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 12288 or EventID == 12289)
| summarize arg_max(TimeGenerated, EventID) by Computer  | where EventID == 12288

Service Name – RTCMRAUTH

The query for the error alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 19002 or EventID == 19003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 19003

The query for the reset alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 19002 or EventID == 19003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 19002

Service Name – RTCMEDIARELAY

The query for the error alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 22002 or EventID == 22003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 22003

The query for the reset alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 22002 or EventID == 22003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 22002

 

Now let's look at the second table.

The following table lists the network issues that Microsoft recommends monitoring:

image

Monitor Name

Mediation Server to gateway connectivity failure

The query for the error alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25061 or EventID == 25062 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25061

The query for the reset alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25061 or EventID == 25062 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where (EventID == 25062 or EventID == 25002)

 

Mediation Server to gateway call completion failure

The query for the error alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25063 or EventID == 25064 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25063

The query for the reset alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25063 or EventID == 25064 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where (EventID == 25064 or EventID == 25002)

 

Critical network problems

The query for the error alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 14624 or EventID == 14353 or EventID == 12288)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 14624

The query for the reset alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 14624 or EventID == 14353 or EventID == 12288)
| summarize arg_max(TimeGenerated, EventID) by Computer | where (EventID == 14353 or EventID == 12288)
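All the service and network queries above follow the same shape: filter on the computer name and a set of Lync Server event IDs, keep the newest event per computer, then alert on the "bad" ID (error) or the "good" IDs (reset). A small hypothetical Python helper that builds both queries from one list of IDs keeps each pair consistent; writing them by hand is how a reset query can end up checking an ID its own filter never lets through. The function name and the MONITORS table are my own, built from the event IDs in this post; paste the printed strings into Log Search as usual.

```python
def alert_pair_queries(computer_filter, error_id, reset_ids):
    """Build the (error, reset) Log Analytics queries for one monitor.

    Hypothetical helper: mirrors the query pattern used in this post.
    """
    watched = [error_id, *reset_ids]
    id_filter = " or ".join(f"EventID == {i}" for i in watched)
    base = (
        f'Event | where Computer contains "{computer_filter}" '
        f'| where EventLog == "Lync Server" and ({id_filter}) '
        "| summarize arg_max(TimeGenerated, EventID) by Computer"
    )
    reset_filter = " or ".join(f"EventID == {i}" for i in reset_ids)
    return (
        f"{base} | where EventID == {error_id}",
        f"{base} | where ({reset_filter})",
    )


# Monitors from the tables above: (computer filter, error ID, reset IDs)
MONITORS = {
    "RTCMEDSRV": ("MediationServer", 25003, [25002]),
    "RTCSRV": ("EdgeServer", 12289, [12288]),
    "RTCMRAUTH": ("EdgeServer", 19003, [19002]),
    "RTCMEDIARELAY": ("EdgeServer", 22003, [22002]),
    "Gateway connectivity failure": ("MediationServer", 25061, [25062, 25002]),
    "Gateway call completion failure": ("MediationServer", 25063, [25064, 25002]),
    "Critical network problems": ("EdgeServer", 14624, [14353, 12288]),
}

for name, (computer, error_id, reset_ids) in MONITORS.items():
    error_query, reset_query = alert_pair_queries(computer, error_id, reset_ids)
    print(f"{name} (error):\n{error_query}\n{name} (reset):\n{reset_query}\n")
```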

 

Next, it looks at call capacity counters.

The following lists the call capacity counters that should be monitored. These numbers should be less than 500 for Cloud Connector standard edition and less than 50 for Cloud Connector minimum edition.

  • LS:MediationServer - Inbound Calls(_Total)\- Current

  • LS:MediationServer - Outbound Calls(_Total)\- Current

  • LS:MediationServer - Inbound Calls(_Total)\- Active media bypass calls

  • LS:MediationServer - Outbound Calls(_Total)\- Active media bypass calls

I believe these were all created in the examples earlier, but here they are again.

Create an alert pair: "Too many concurrent calls in Mediation Servers" and "Concurrent calls fall back to normal load"

To create this alert:

  • The query for the error alert is:

Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer - Outbound Calls" or ObjectName == "LS:MediationServer - Inbound Calls")
| summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer | summarize TotalCalls = sum(CounterValue) by Computer | where TotalCalls >= 500

The query for the reset alert is:

Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer - Outbound Calls" or ObjectName == "LS:MediationServer - Inbound Calls")
| summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer | summarize TotalCalls = sum(CounterValue) by Computer | where TotalCalls < 500

 

Now if I go back into Settings > Alerts, I have quite a few.

image

 

Analyze the alerts in your Log Analytics repository

A section I skipped over, but will look at now, is analysing the alerts in the Log Analytics repository.

The CCE OMS TechNet article sends me to:

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-solution-alert-management

Reading this, it looks like I should add the Alert Management solution to my OMS workspace. What's Alert Management?

When you add the Alert Management solution to your OMS workspace, the Alert Management tile is added to your OMS dashboard. This tile displays a count and graphical representation of the number of currently active alerts that were generated within the last 24 hours. You cannot change this time range.

I found this article detailing how to add it:

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-add-solutions

Let's try and add it.

Back in the OMS portal, click the bag icon on the left-hand menu.

image

This is the Solutions Gallery! Wow, there's a lot here.

image

I'm going to select Alert Management.

image

Click Add

image

image

image

image

I also looked at a few other solutions that may be useful to add:

Agent Health

You can then also see the added solutions in Settings > Solutions.

image

 

Summary

It seems like quite a bit of work, but once it's all set up I can see the power of OMS, and alerting for CCE will be awesome.

I just need to hook up a CCE to my OMS workspace and get testing, which I hope to do very soon, so I'll update this post. Can't wait to play around more with OMS and use it for more than just CCE monitoring and management!

Hopefully this will be useful to someone setting it up.

 

**** Updates 08/12/2017 ****

Alerting Examples

After playing around with the alerts and thresholds, I found at first that I wasn't getting any alerts, so I had to tweak the settings I had first used: requiring 5 errors in 5 minutes didn't seem like it would ever trigger an alert.

I had to set "Number of results" to "Greater than 0" to get alerts to work.

Here's an example of an email alert from the alerts I had set up.

image

As I continue to play, I will update this post.

References

Monitor Cloud Connector using Operations Management Suite (OMS)

https://technet.microsoft.com/en-us/library/mt828598.aspx

Working with alert rules in Log Analytics

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-alerts-creating

Alert Management solution in Operations Management Suite (OMS)

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-solution-alert-management

Add Azure Log Analytics management solutions to your workspace

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-add-solutions
