Operations Management Suite Setup walkthrough for Cloud Connector Edition (CCE) V2.1.0

After the Cloud Connector Edition (CCE) V2.1.0 announcement I wanted to look at, set up and play around with OMS, as I’ve never really looked into it before. So here are my steps.

Started here

https://azure.microsoft.com/en-gb/pricing/details/log-analytics/

Clicked Try for Free

image

The question here is whether you already have an Azure tenant. If not, you need to sign up for one and enter card details. For OMS I’m using the free plan, but I believe you still need to provide payment details.

I had an existing tenant, but when I started to add an OMS workspace I found its subscription had expired, so I had to add a pay-as-you-go subscription. There are free trials out there to be had, though.

If you already have an Azure Subscription you can log in and start here

this link

Within the Azure portal, after I had logged in,

I clicked the plus New button

image

I then went

Monitoring > Management

Clicked on Log Analytics

image

Then I was asked to create a new OMS Workspace

More details on workspaces here

I wanted to understand a little about what an OMS Workspace is. The link above helped, but this summarized it nicely:

A workspace is an Azure resource and is a container where data is collected, aggregated, analyzed, and presented in the Azure portal.

image

I selected Create New and entered a workspace name, selected the subscription I was using (yours may be different), created a new resource group and selected a location.

For pricing I stayed on the free tier, but there are paid options.

image

Clicked ok and it went off to create.

image

image

image

Awesome!

image

Now I went to All resources from the Azure portal left-hand menu, found my workspace and clicked on it.

image

I opened the workspace. For administration I needed the OMS Portal, so I clicked the OMS Portal icon.

image

Click OMS Portal

image

This opened what seems to be a blank dashboard, so you need to configure it.

Click Settings at the top right

image

image

Now we need to look at Connected Sources > Windows Computers

Here we find the Workspace ID and keys that we will configure on CCE

Configure Cloud Connector to use OMS

You’ll need to configure your Cloud Connector on-premises environment to use OMS.

Screen shot for Cloud Connector OMS

*** Please note: I found that without connecting a source you could not create alerts. To work around this for this blog I added my Surface Pro: I downloaded the Windows agent, installed it and entered the workspace ID and primary key.

CCE instructions on Configuring CCE side

This part I haven’t done yet as I’m waiting for access to the CCE in our lab. An update should follow very soon, but the steps from TechNet are here for the moment.

*** Updated 08/12/2018 with some screenshots from an existing CCE in our lab. Special thanks to Darren Ellis for assisting ****

From https://technet.microsoft.com/en-us/library/mt828598.aspx

  • If you are installing a new Cloud Connector appliance or you want to re-deploy an appliance, follow these steps before you run Install-CcAppliance:

    1. In the CloudConnector.ini file [Common] section, set the OMSEnabled parameter to True.

      Each time Cloud Connector is deployed or upgraded, it will try to install the OMS agent automatically onto the VMs. Enable this feature so the OMS agent can survive the Cloud Connector automatic update.

    2. To configure the OMS ID and key, run Set-CcCredential -AccountType OMSWorkspace.

If you are installing an OMS agent onto an existing Cloud Connector appliance, follow these steps:

  1. In the CloudConnector.ini file [Common] section, set OMSEnabled=true.

Located the CloudConnector.ini file

image

Opened in Notepad and added OMSEnabled=true under [Common]

image

Saved and closed notepad.
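The Notepad edit above could also be scripted. Here is a minimal Python sketch, assuming the file is a plain INI with a [Common] section as TechNet describes. The SiteName and [Network] entries in the sample are made up for illustration and may not match a real CloudConnector.ini. It edits the text line by line rather than going through an INI parser, so any comments and the key casing in the file are left alone.

```python
def enable_oms(ini_text: str) -> str:
    """Return ini_text with OMSEnabled=true set inside the [Common] section."""
    out = []
    in_common = False   # True while we are inside [Common]
    done = False        # True once OMSEnabled has been written
    for line in ini_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [Common] without having seen the key: add it first.
            if in_common and not done:
                out.append("OMSEnabled=true")
                done = True
            in_common = stripped.lower() == "[common]"
        elif in_common and stripped.split("=", 1)[0].strip().lower() == "omsenabled":
            # Key already present (e.g. OMSEnabled=false): overwrite it.
            line = "OMSEnabled=true"
            done = True
        out.append(line)
    if in_common and not done:
        # [Common] was the last section in the file.
        out.append("OMSEnabled=true")
        done = True
    if not done:
        raise ValueError("no [Common] section found")
    return "\n".join(out) + "\n"


# Hypothetical file content, for illustration only.
sample = "[Common]\nSiteName=Site1\n\n[Network]\nFoo=bar\n"
print(enable_oms(sample))
```

Running it a second time on its own output changes nothing, so it is safe to re-run. You would still need to run Import-CcConfiguration afterwards, as in the next step.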

2. Run Import-CcConfiguration.

Opened PowerShell as Administrator and ran Import-CcConfiguration

After running the cmdlet we accepted the prompt to redeploy. This was on an AudioCodes CCE appliance.

image

We then ran the next cmdlet to install the OMS agents

3. Run Install-CcOMSAgent.

I was asked to enter the OMS workspace ID first.

I got the OMS workspace ID and key from the OMS Portal.

image

After I grabbed the ID and key, I entered the ID first and then the key when prompted.

image

We ran the cmdlet and were then asked to enter the OMS workspace key, as shown above, and then to reconfirm it.

image

image

It then went off to download the latest OMS agent

image

Once downloaded it started to install on the first CCE VM

image

Once that install finished it moved on to the next VM

image

After the agent was installed on each VM and the Hyper-V host it looked like below

image

And that’s our existing CCE configured and the OMS agents deployed.

Let’s go check the OMS portal and see if the CCE VMs and host are connected.

I had to wait a little while and refresh the OMS Portal, but I now have 5 connected Windows computers

image

Let’s see what they are

Yep, all the VMs and the host are here

image

 

  • If you want to update the OMS workspace ID or key in a Cloud Connector appliance that has already installed an OMS agent:

    1. To configure the OMS ID and key, run Set-CcCredential -AccountType OMSWorkspace.

    2. To apply the updates, run Install-CcOMSAgent.

       

      For all scenarios, verify that the agents are connected as follows:

  • In the OMS portal, go to Settings -> Connected Sources -> Windows Servers. You will see a list of connected machines.

Now that the OMS workspace is created and CCE is pointing to it, it’s time to configure OMS.

 

Configure OMS

Back in OMS portal settings

Settings->Data->Windows Event logs, and add event logs for:

  • Lync Server

  • Application

image

You must manually enter Lync Server in the text box. It does not appear as an option in the drop-down list.

image

Click Save

image

Settings->Data-> Windows Performance Counters

Here I clicked “Add the selected performance counters” before adding the new ones.

image

 

Total active calls:

  • LS:MediationServer – Inbound Calls(_Total)\- Current

  • LS:MediationServer – Outbound Calls(_Total)\- Current

Total active media bypass calls:

  • LS:MediationServer – Inbound Calls(_Total)\- Active media bypass calls

  • LS:MediationServer – Outbound Calls(_Total)\- Active media bypass calls

image

I then had a big list of counters to cover the OS and CCE

image

Click Save

I then had to save the configuration before moving on to create alerts.

Create Alerts

First off we need to consider the following

You should consider the following when creating alerts:

  • Make sure the alert is a Number-of-results alert, which is the default selection.

  • The demo queries require that “Number of results” is set to “Greater than 0”.

  • It is recommended that you set both Time window and Alert frequency to 5 minutes.

  • It is recommended that you do not enable “Suppress alerts” for demo alerts.

  • For typical alert scenarios, Microsoft recommends creating a pair of alerts: one error alert and one reset alert. For the error alert, select severity level Critical; for the reset alert, select severity level Informational.

Alerts look to come in pairs: one alert for the error state and one for the reset back to normal. That makes sense, as you then know both when something is broken and when it has recovered.

I found this Azure documentation on alerts useful to read before creating any, as it’s not mega easy to start with.

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-alerts-creating

I tried under Settings (below) but couldn’t see how to create one. I think alerts are only listed here once created.

It does mention, though:

You can create rules in Search and manage them here in Settings.

image

Over to search then

Open Log Search by clicking the magnifying glass on the left menu

image

I pasted the first query into the box and clicked search

image

Then I clicked the Alert button on the top menu

image

Now I was in Add Alert Rule. There must be a better way, but I’m there.

image

Here I copied the sample CCE alerts from the TechNet link, but I think I also spotted an error in one of them. Please read below.

https://technet.microsoft.com/en-us/library/mt828598.aspx

RTCMEDSRV is NOT running in Mediation Servers

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25003

You need to update the server name, though. As mentioned in the TechNet link, this query looks for servers whose name contains “MediationServer”.

Create an alert pair: "RTCMEDSRV is NOT running in Mediation Servers" and "RTCMEDSRV is back in running in Mediation Servers"

The query for the error alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25003

image

Clicked Save

image

Next I created the other alert in the pair. This time it was the reset alert, so I set the severity to Informational.

The query for the reset alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer  | where EventID == 25002

On TechNet there is an error in this reset query: the final event ID is missing the 2 on the end (it should be 25002, as shown here).
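To get my head around what the error/reset pair is doing, here is a rough Python sketch of the same logic. It is only an illustration with made-up sample events, not how Log Analytics runs the query: keep the most recent event per computer (the arg_max step), then keep the computers whose latest event is the stop ID (25003) or the start ID (25002).

```python
from datetime import datetime

# Hypothetical sample events: (TimeGenerated, Computer, EventID).
# 25002 = RTCMEDSRV started, 25003 = RTCMEDSRV stopped (IDs from the queries above).
events = [
    (datetime(2017, 12, 8, 9, 0), "MediationServer1", 25002),
    (datetime(2017, 12, 8, 9, 5), "MediationServer1", 25003),
    (datetime(2017, 12, 8, 9, 0), "MediationServer2", 25003),
    (datetime(2017, 12, 8, 9, 7), "MediationServer2", 25002),
]


def latest_event_per_computer(rows):
    """Mimic `summarize arg_max(TimeGenerated, EventID) by Computer`."""
    latest = {}
    for time_generated, computer, event_id in rows:
        if computer not in latest or time_generated > latest[computer][0]:
            latest[computer] = (time_generated, event_id)
    return {computer: event_id for computer, (_, event_id) in latest.items()}


def computers_in_state(rows, event_id):
    """Mimic the trailing `| where EventID == ...` filter."""
    current = latest_event_per_computer(rows)
    return sorted(c for c, e in current.items() if e == event_id)


print(computers_in_state(events, 25003))  # error alert: latest event is a stop
print(computers_in_state(events, 25002))  # reset alert: latest event is a start
```

With “Number of results” set to “Greater than 0”, the error alert would fire whenever the first list is not empty, and the reset alert whenever the second one is not empty.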

image

Create an alert pair: " Too many concurrent calls in Mediation Servers" and “Concurrent calls fall back to normal load”

The query for the error alert is:

Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer – Outbound Calls" or ObjectName
== "LS:MediationServer – Inbound Calls") | summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer | summarize
TotalCalls = sum(CounterValue) by Computer| where TotalCalls >= 500

image

The query for the reset alert is:

Perf  | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer – Outbound Calls" or ObjectName ==
"LS:MediationServer – Inbound Calls") | summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer | summarize
TotalCalls = sum(CounterValue) by Computer| where TotalCalls < 500
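The concurrent calls pair has one more step than the service alerts, so here is another rough Python sketch of what I understand the query to do. The sample values are invented, timestamps are simplified to integers, and "inbound"/"outbound" are just stand-ins for the two ObjectName values: take the latest sample per object and computer, add them up per computer, then compare against the 500 threshold.

```python
# Hypothetical Perf samples: (TimeGenerated, Computer, ObjectName, CounterValue).
# Timestamps are plain integers and the object names are shortened stand-ins.
samples = [
    (1, "MediationServer1", "inbound", 200),
    (2, "MediationServer1", "inbound", 260),
    (2, "MediationServer1", "outbound", 250),
    (2, "MediationServer2", "inbound", 40),
    (2, "MediationServer2", "outbound", 30),
]


def total_calls(rows):
    """Latest value per (ObjectName, Computer), then summed per Computer."""
    latest = {}
    for time_generated, computer, object_name, value in rows:
        key = (object_name, computer)
        if key not in latest or time_generated > latest[key][0]:
            latest[key] = (time_generated, value)
    totals = {}
    for (_, computer), (_, value) in latest.items():
        totals[computer] = totals.get(computer, 0) + value
    return totals


def overloaded(rows, threshold=500):
    """Computers the error alert (TotalCalls >= threshold) would return."""
    return sorted(c for c, total in total_calls(rows).items() if total >= threshold)


def back_to_normal(rows, threshold=500):
    """Computers the reset alert (TotalCalls < threshold) would return."""
    return sorted(c for c, total in total_calls(rows).items() if total < threshold)


print(total_calls(samples))
print(overloaded(samples))
print(back_to_normal(samples))
```

The threshold is a parameter here only so it is easy to try other values; the queries themselves hard-code 500.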

image

Create an alert: "CPU usage > 90 or RTCMEDIARELAY stopped in Servers" alert

The query will get all processor usage counter and service stop event from all computers and return one log if either processor usage exceeds 90% or service is ever stopped.

search *| where Computer contains "MediationServer" | where (Type == "Perf" or Type == "Event") | where ((ObjectName ==
"Processor" and CounterName == "% Processor Time") or EventLog == "Lync Server") | where (CounterValue > 90 or EventID == 22003)

image

Recommended minimal monitoring set from Microsoft.

It looks like we need to work these queries out on our own, so I’ll give it a go. I don’t know if these are correct, but perhaps they will help someone.

Let’s start with the first table

The following table lists the services that Microsoft recommends monitoring by listing the stop and start event IDs:

image

You need to update the server name, though. As mentioned in the TechNet link, these queries look for servers whose name contains “MediationServer” or “EdgeServer”.

Mediation Server

Service Name – RTCMEDSRV

Please note these were already covered in the TechNet examples earlier, but I think TechNet has missed the 2 off the end of the start event ID.

Here they are again:

The query for the error alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25003

The query for the reset alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25002 or EventID == 25003)
| summarize arg_max(TimeGenerated, EventID) by Computer  | where EventID == 25002

 

Edge Server

Service Name – RTCSRV

The query for the error alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 12288 or EventID == 12289)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 12289

The query for the reset alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 12288 or EventID == 12289)
| summarize arg_max(TimeGenerated, EventID) by Computer  | where EventID == 12288

Service Name – RTCMRAUTH

The query for the error alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 19002 or EventID == 19003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 19003

The query for the reset alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 19002 or EventID == 19003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 19002

Service Name – RTCMEDIARELAY

The query for the error alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 22002 or EventID == 22003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 22003

The query for the reset alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 22002 or EventID == 22003)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 22002
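All of the service queries above follow the same pattern, with only the name fragment and the start/stop event IDs changing. As a sketch (this is just string building in Python, and the helper name is my own), the pairs could be generated from the table values like this, which helps avoid copy and paste slips in the event IDs:

```python
# (computer name fragment, service, start event ID, stop event ID), as listed above.
SERVICES = [
    ("MediationServer", "RTCMEDSRV", 25002, 25003),
    ("EdgeServer", "RTCSRV", 12288, 12289),
    ("EdgeServer", "RTCMRAUTH", 19002, 19003),
    ("EdgeServer", "RTCMEDIARELAY", 22002, 22003),
]


def alert_pair(name_fragment, start_id, stop_id):
    """Build the error and reset query text for one service."""
    base = (
        f'Event | where Computer contains "{name_fragment}" '
        f'| where EventLog == "Lync Server" '
        f'and (EventID == {start_id} or EventID == {stop_id})\n'
        f'| summarize arg_max(TimeGenerated, EventID) by Computer '
    )
    return {
        "error": base + f"| where EventID == {stop_id}",   # latest event is a stop
        "reset": base + f"| where EventID == {start_id}",  # latest event is a start
    }


for fragment, service, start_id, stop_id in SERVICES:
    pair = alert_pair(fragment, start_id, stop_id)
    print(f"--- {service} error alert ---\n{pair['error']}")
    print(f"--- {service} reset alert ---\n{pair['reset']}")
```

Remember to swap the name fragments for whatever your own servers are called before pasting the output into Log Search.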

 

Now let’s look at the second table.

The following table lists the network issues that Microsoft recommends monitoring:

image

Monitor Name

Mediation Server to gateway connectivity failure

The query for the error alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25061 or EventID == 25062 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25061

The query for the reset alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25061 or EventID == 25062 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where (EventID == 25062 or EventID == 25002)

 

Mediation Server to gateway call completion failure

The query for the error alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25063 or EventID == 25064 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 25063

The query for the reset alert is:

Event | where Computer contains "MediationServer" | where EventLog == "Lync Server" and (EventID == 25063 or EventID == 25064 or EventID == 25002)
| summarize arg_max(TimeGenerated, EventID) by Computer | where (EventID == 25064 or EventID == 25002)

 

Critical network problems

The query for the error alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 14624 or EventID == 14353 or EventID == 12288)
| summarize arg_max(TimeGenerated, EventID) by Computer | where EventID == 14624

The query for the reset alert is:

Event | where Computer contains "EdgeServer" | where EventLog == "Lync Server" and (EventID == 14624 or EventID == 14353 or EventID == 12288)
| summarize arg_max(TimeGenerated, EventID) by Computer | where (EventID == 14353 or EventID == 12288)

 

Next it’s on to the call capacity counters

The following lists the call capacity counters that should be monitored. These numbers should be less than 500 for Cloud Connector standard edition; less than 50 for Cloud Connector minimum edition.

  • LS:MediationServer – Inbound Calls(_Total)\- Current

  • LS:MediationServer – Outbound Calls(_Total)\- Current

  • LS:MediationServer – Inbound Calls(_Total)\- Active media bypass calls

  • LS:MediationServer – Outbound Calls(_Total)\- Active media bypass calls

I believe these were all created in the examples earlier, but here they are again:

Create an alert pair: " Too many concurrent calls in Mediation Servers" and “Concurrent calls fall back to normal load”

To create this alert:

  • The query for the error alert is:

Perf | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer – Outbound Calls" or ObjectName
== "LS:MediationServer – Inbound Calls") | summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer | summarize
TotalCalls = sum(CounterValue) by Computer| where TotalCalls >= 500

The query for the reset alert is:

Perf  | where Computer contains "MediationServer" | where (ObjectName == "LS:MediationServer – Outbound Calls" or ObjectName ==
"LS:MediationServer – Inbound Calls") | summarize arg_max(TimeGenerated, CounterValue) by ObjectName, Computer | summarize
TotalCalls = sum(CounterValue) by Computer| where TotalCalls < 500

 

Now if I go back into Settings > Alerts I have quite a few

image

 

Analyze the alerts in your Log Analytics repository

A section I skipped over, but will look at now, is analysing the Log Analytics repository.

The CCE OMS TechNet article sends me to

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-solution-alert-management

Reading this, it looks like I should add the Alert Management solution to my OMS workspace. What’s Alert Management?

When you add the Alert Management solution to your OMS workspace, the Alert Management tile is added to your OMS dashboard. This tile displays a count and graphical representation of the number of currently active alerts that were generated within the last 24 hours. You cannot change this time range.

I found this page detailing how to add it:

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-add-solutions

Let’s try and add it

Back in the OMS Portal, click the bag icon on the left-hand menu

image

This is the Solutions Gallery! Wow, there’s a lot here

image

I’m going to select Alert Management

image

Click Add

image

image

image

image

I also looked at a few other solutions that may be useful to add, for example:

Agent Health

You can then also see the added solutions in Settings > Solutions

image

 

Summary

It seems like quite a bit of work, but once it’s all set up I can see that the power of OMS and alerting for CCE will be awesome.

I just need to hook up a CCE to my OMS workspace and get testing, which I hope to do very soon, so I’ll update this post. I can’t wait to play around more with OMS and use it for more than just CCE monitoring and management!

Hopefully this will be useful to someone setting it up.

 

**** Updates 08/12/2017 ****

Alerting Examples

After playing around with the alerts and thresholds, I found at first that I wasn’t getting any alerts. I had to tweak the settings I had first used, as requiring 5 errors in 5 minutes didn’t seem like it would ever trigger an alert.

I had to set “Number of results” to “Greater than 0” to get alerts to work.

Here’s an example of an email alert from the alerts I had set up.

image

As I continue to play more I will update this post.

References

Monitor Cloud Connector using Operations Management Suite (OMS)

https://technet.microsoft.com/en-us/library/mt828598.aspx

Working with alert rules in Log Analytics

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-alerts-creating

Alert Management solution in Operations Management Suite (OMS)

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-solution-alert-management

Add Azure Log Analytics management solutions to your workspace

https://docs.microsoft.com/en-us/azure/log-analytics/log-analytics-add-solutions

Cloud Connector Edition CCE v2.1.0 Released – Adds Operation Management Suite (OMS) support

Over the weekend I noticed, thanks to Twitter and Tom Arbuthnot, that on Friday 17/11/2017 Microsoft seem to have released CCE v2.1 for Skype for Business Online 🙂

Before Ignite, and before the news that SfBO is becoming Teams, Microsoft had announced that we should see coexistence in CCE, so that it could be deployed alongside Lync Server or Skype for Business Server. Currently this is not possible, and I’ve had many requests from customers for it, as moving from an existing Lync Server deployment means ripping that out first and then deploying CCE. After Ignite there were rumors that coexistence with CCE will never come, even in SfB Server 2019, and it seems this is confirmed as it is not included in V2.1 😦

Microsoft now recommend configuring Hybrid on your existing Lync or SfB Server deployment, or looking at deploying a third-party appliance, which is a bit of a U-turn on the previous announcement. Perhaps the shift to Teams has meant this may be too much work, as the focus is on Teams. Microsoft did comment on the Tech Community post that they will release more information on Hybrid voice, so we will have to see. The post is below. This follows a post on the Tech Community saying v2.1 would come mid-November; check that out here

According to the Tech Community post, V2.1 focuses on cloud management with Operations Management Suite:

Our primary goal for this release focused on improving the ability to cloud manage CCE via the Microsoft Operations Management Suite.

Tenant administrators care about monitoring the state of several indicators when managing CCE:

  • Key services to insure the solution is green and available
  • Hardware utilization for the virtual machines
  • Key statistics such in-bound and out bound calls to allow for fine tuning of resources

I’ve been looking for release notes but can’t seem to find anything as yet, only the Tech Community posts and community blog posts. I highly recommend Tom Arbuthnot’s posts below:

Updating to CCE v2.1

Update options include Automatic and Manual. Which one applies depends on how you have configured AutoUpdate and the maintenance time window on your tenant, or whether you have turned off AutoUpdate or are starting new.

Remember, if you have turned off AutoUpdate you need to update to the latest release yourself. I think you have 60 days before you’re classed as out of support, so you need to schedule this update to stay in support.

I found the support statement:

Microsoft supports the previous version of Cloud Connector Edition for 60 days after the release of a new version. Microsoft will support version 2.0.1 for 60 days after the release of 2.1 to allow you time to upgrade. All versions previous to 2.0.1 are no longer supported.
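To put a date on that, here is a quick bit of date arithmetic in Python. It assumes the 60 days are plain calendar days counted from the 17/11/2017 release date mentioned in this post; Microsoft may count the window differently, so treat the result as a rough guide only.

```python
from datetime import date, timedelta

RELEASE_2_1 = date(2017, 11, 17)     # v2.1 release date as noted in this post
SUPPORT_WINDOW = timedelta(days=60)  # assumption: plain calendar days


def previous_version_supported_until(release_date, window=SUPPORT_WINDOW):
    """Last day the previous version would be supported under the assumption above."""
    return release_date + window


print(previous_version_supported_until(RELEASE_2_1).strftime("%d/%m/%Y"))
```

On that assumption, anyone who has turned off AutoUpdate would want version 2.0.1 upgraded by mid-January 2018.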

OMS?

I’ve heard of OMS but to be honest I don’t know a lot about it, so let’s have a quick look.

What is OMS?

Operations Management Suite is a collection of cloud management services entirely hosted in Azure.

It is made up of a number of services, shown below, but for CCE I think it will leverage Log Analytics.

image

https://technet.microsoft.com/en-us/library/mt828598.aspx

OMS Pricing Details

image

I wondered what the pricing would be for OMS, as it’s Azure and most things have a price tag. It seems there is a free plan, but it is limited to 500 MB of data per day and a retention period of 7 days.

If you reach the 500 MB limit, data analysis will stop and resume at the start of the next day.

So it looks like we can use the free offering for OMS, or pay for the paid plan to get no daily limit and up to 1 month of retention. If you want to keep data for longer, you can pay for that too.

https://azure.microsoft.com/en-gb/pricing/details/log-analytics/

OMS Log Analytics

Log Analytics provides monitoring services for OMS by collecting data from managed resources into a central repository. So we deploy an agent to the virtual machines running on CCE and then configure what to look for, set alerts on the data we collect, and view and report on the information. Looks nice and easy: just set, collect, alert and report. It seems we can collect not only from on-premises Windows or Linux machines, but also from Azure services and the Data Collector API, so OMS can be useful for much more than just CCE if you’re not already using it.

image

https://docs.microsoft.com/en-us/azure/operations-management-suite/operations-management-suite-overview

OMS Configuration and Configuring CCE to use OMS

You will need

  • Confirm Prereqs to use OMS
  • Configure CCE for OMS
  • Configure OMS
  • Create Alerts
  • Microsoft have provided Recommended minimal monitoring set so check it out on the url above.

First you need to meet the prereqs for OMS: an Azure tenant with an OMS workspace, CCE v2.1, and Log Analytics configured with the new log search.

Configuring your CCE depends on whether you’re deploying new or upgrading an existing appliance. In both cases you need to update the CloudConnector.ini file to enable OMS.

Then configure OMS and specify the alerts, event logs and performance counters.

I think it’s great that you can use cloud monitoring, and I wonder if SfB Server 2015 will take advantage of OMS in a similar way, given it was announced that SfB Server 2019 will take advantage of cloud services.

Check out the detailed config information here

https://technet.microsoft.com/en-us/library/mt828598.aspx

Further Info and Useful Links

Version Number 2.1.0

Released 17/11/2017

Upgrade Information

https://technet.microsoft.com/en-us/library/mt740656.aspx

Further Information on OMS for CCE v2.1

https://technet.microsoft.com/en-us/library/mt828598.aspx

After upgrading it is recommended to validate the upgrade, so check here

https://technet.microsoft.com/en-us/library/mt740653.aspx

Manual Download Link

https://aka.ms/CloudConnectorInstaller

Planning Information for new deployments of CCE

https://technet.microsoft.com/EN-US/library/mt605227.aspx

What is OMS?

https://docs.microsoft.com/en-us/azure/operations-management-suite/operations-management-suite-overview

OMS Pricing

https://azure.microsoft.com/en-gb/pricing/details/log-analytics/