For this post I’m going to scratch the surface of a relatively new Microsoft product, Operations Management Suite (yea, MOMS). While I will be using Skype for Business as the example, this post is more of an introduction to the OMS product itself.
What is Microsoft OMS?
Microsoft Operations Management Suite is somewhat difficult to explain. It’s a mixture of monitoring, auditing, log aggregation, and automation that targets both on-premises and cloud infrastructure. OMS fits into the same category as other log collection and monitoring suites, such as ELK, Splunk, and SCOM.
The most obvious and interesting comparison to be made is with SCOM. Right now, OMS does not fully replace the functionality of SCOM but rather is used to complement and extend it, as the two can be easily integrated.
There are plenty of differences between SCOM and OMS, some of the most important being:
- There is minimal skill and time required to set up OMS; I had it set up and providing value within 15 minutes
- Administration is much simpler; you don’t need multiple skilled sysadmins just to keep it running
- It is only offered as a cloud solution — no on-prem option is available
- It’s incredibly fast compared to SCOM — which is generally a bit sluggish
- There’s a beautiful mobile app for it
- It can be easily and seamlessly used to monitor both on-prem and cloud infrastructure
OMS is free for up to 500 MB/day. All you need to get rolling is an Azure account, since OMS must be associated with an Azure tenant.
To begin, you need to create your OMS account and associate it with your Azure account. You can do this by going here.
Once the OMS account is created, you will need to install the OMS agent on the server to be monitored. In this example, I’ll be installing the agent on an on-prem Skype for Business server.
Start by logging in to OMS. Once logged in, go to Settings by clicking one of the gear icons.
Then go to Connected Sources and then Windows Servers. Click Download Windows Agent (64-bit). Keep this window open as you will need the other information soon.
Copy the agent to your server. You can run the installer as you would any installer, or you can install it from the command line (for example, in PowerShell). Either way you will need the Workspace ID and the Primary Key.
To install via command line:
MMASetup-AMD64.exe /Q:A /R:N /C:"setup.exe /qn ADD_OPINSIGHTS_WORKSPACE=1 OPINSIGHTS_WORKSPACE_ID=<your workspace ID> OPINSIGHTS_WORKSPACE_KEY=<your workspace key> AcceptEndUserLicenseAgreement=1"
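If you are installing the agent on more than a couple of servers, it can help to generate that command rather than paste the ID and key by hand. The following is a minimal Python sketch that only assembles the command string shown above. The function name and the placeholder values are hypothetical, and the flags are taken as-is from the command in this post.

```python
def build_agent_install_command(workspace_id: str, workspace_key: str) -> str:
    """Return the silent-install command line for the OMS agent.

    The switches mirror the command shown in this post; nothing is executed.
    """
    if not workspace_id or not workspace_key:
        raise ValueError("both the workspace ID and the workspace key are required")
    setup_args = (
        "setup.exe /qn ADD_OPINSIGHTS_WORKSPACE=1 "
        f"OPINSIGHTS_WORKSPACE_ID={workspace_id} "
        f"OPINSIGHTS_WORKSPACE_KEY={workspace_key} "
        "AcceptEndUserLicenseAgreement=1"
    )
    return f'MMASetup-AMD64.exe /Q:A /R:N /C:"{setup_args}"'


if __name__ == "__main__":
    # Placeholder values for illustration only; use the ones from OMS settings.
    print(build_agent_install_command("EXAMPLE-WORKSPACE-ID", "EXAMPLE-WORKSPACE-KEY"))
```

The sketch deliberately stops at building the string, so you can review the result before running it on a server.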
To install via MSI, go through the normal steps of clicking Next a number of times, but make sure to connect the agent to OMS.
On the next screen, add the Workspace ID and Workspace Key, which are shown in the same place as the agent installer within OMS settings.
If you are curious, you can check out the new entry within the Control Panel called Microsoft Monitoring Agent.
This is where you can see all of the settings for the client, such as the connection parameters for OMS and whether or not it’s talking to SCOM.
This is the only configuration required. The endpoint should now be reporting into OMS. On the main dashboard it will show up as a connected source. Since I have 3 servers currently reporting to OMS, it lists 3 connected data sources.
Now we’ll tell OMS what data we are looking for. In this case we want to grab certain Windows event logs and some performance data. Go to Settings->Data. Now click on Windows Event Logs
This is where we will specify which event logs we are looking for.
Let’s say I want to gather all events from Microsoft/Windows/All-User-Install-Agent/Admin as seen here
To add that specific log, just start typing the name of each item, separating the levels with ‘-’. It will autocomplete the entry for you.
That makes it really easy, when it works. But a lack of auto-completion doesn’t necessarily mean the log name is invalid. In this case, I want to add the Lync Server events. This did not auto-complete.
I just typed it, clicked the plus sign, and checked the boxes to collect all Error and Warning events from Lync Server.
We are now collecting those event logs. Let’s collect some performance counters as well.
Click on Windows Performance Counters.
It will suggest some basic counters to get you started.
Auto-complete works well here too, so if you want additional counters, just start typing the name (memory, system, processor, etc.) and you will likely see what you are looking for.
Set the interval at which you would like to collect the data and then we’re done. I left mine at the default.
Log Searches and Dashboards
Alright, now that we are collecting the data, it’s time to do something with it. This is where Log Search comes in.
Use the magnifying glass on the left bar and you will be brought to a search bar.
The auto-complete works incredibly well here. In fact, they show you right away which format to use and give you samples.
They really do make it as easy as possible to get started. However, instead of learning the search syntax up front, I found it much easier to look at a dashboard and then pick up the queries by using pre-built filters.
Under the magnifying glass on the left-hand navbar is the dashboard menu. It will start blank, so let’s add a pre-canned display. Click the gear icon.
This will bring up the filters that you can add to the dashboard. Underneath the Log Management category, add “All Events” by clicking the plus sign next to it.
Click the customize gear again to get out of dashboard edit mode.
There will now be an All Events tile on this dashboard.
Go ahead and click on the new tile to get details on all events. You can see filters on the left, events in the main pane, and a search bar at the top.
Clicking Show More will show more details of the event, and clicking on nearly anything there will apply another filter. As you select more filters, you can look at the search bar and see the query that is being run. This is great for learning the syntax.
For example, I’m going to click “error” on the left to show only events in that category. I’ll also switch to the table view rather than the default list view.
This applies another filter, which is visible in the filter bar. We are now selecting all data of the Event type, and only those events categorized as errors. The table view gives a more compact display.
The syntax for this is very straightforward.
We could also look for trends by using the minify view.
In this scenario, I am able to quickly see which errors are reported most often. This is an incredibly powerful method for discovering systemic issues.
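To make the idea concrete, here is a rough Python sketch of "which errors show up most often". It is not how OMS implements the minify view. It simply counts identical error messages, and the field names and sample events are made up for illustration.

```python
from collections import Counter


def most_frequent_errors(events, top=3):
    """Count error events by message and return the most common ones."""
    counts = Counter(
        event["message"] for event in events if event.get("level") == "Error"
    )
    return counts.most_common(top)


# Hypothetical sample events, not real OMS output.
sample_events = [
    {"level": "Error", "message": "no response to OPTIONS"},
    {"level": "Warning", "message": "certificate expiring"},
    {"level": "Error", "message": "no response to OPTIONS"},
    {"level": "Error", "message": "service stopped"},
]

print(most_frequent_errors(sample_events))
```

Grouping only exact duplicates is the simplest possible choice here; a real tool would likely cluster messages that differ in small details such as timestamps or IDs.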
When I was looking through the errors, I noticed a potentially important error. The mediation server was not receiving a response from the session border controller after sending SIP OPTIONS. This could be indicative of an issue or it could be transient.
This is a great opportunity to create an alert with a threshold. If I click on the Show More button to expand the error, I can see all kinds of different fields. One of those fields is called Rendered Description, and it gives a good description of the issue.
Now what I can do is simply add the text to my query (adding the computer name is optional).
Event EventLog="Lync Server" EventLevelName="Error" Computer="LYNC.home.lab" "no response from a Trunk to an OPTIONS request sent by the Mediation Server"
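The query above follows a simple pattern: the Event type, a series of field="value" filters, and then a quoted phrase to match. As a minimal sketch, assuming that pattern holds, a small Python helper can assemble the same string. The helper name is hypothetical, and it does not escape quotes inside values.

```python
def build_event_query(filters: dict, phrase: str = "") -> str:
    """Build a search string: Event, field="value" filters, optional quoted phrase."""
    parts = ["Event"]
    for field, value in filters.items():
        parts.append(f'{field}="{value}"')
    if phrase:
        parts.append(f'"{phrase}"')
    return " ".join(parts)


query = build_event_query(
    {
        "EventLog": "Lync Server",
        "EventLevelName": "Error",
        "Computer": "LYNC.home.lab",  # optional, as noted above
    },
    "no response from a Trunk to an OPTIONS request sent by the Mediation Server",
)
print(query)
```

Dropping the Computer entry from the dictionary gives the broader version of the query that matches the error on any server.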
Now I want to make use of this data and create an alert
This is where you configure the details of the alert. I’ll give the alert a name, set it to send an email, set the subject of the email, the content of the email, as well as the exact parameters of the query.
Additionally, for this example I’m going to run this check every 5 minutes and generate an alert based on just one instance of a match on this query.
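The threshold logic is worth spelling out. The sketch below models it in Python under the settings described here: a check every 5 minutes and an alert on a single match. The class and field names are hypothetical stand-ins for the alert form, not an OMS API.

```python
from dataclasses import dataclass


@dataclass
class AlertRule:
    """Hypothetical stand-in for the settings on the alert form."""
    name: str
    query: str
    interval_minutes: int = 5  # how often the saved query is re-run
    threshold: int = 1         # matches needed before an alert is raised

    def should_fire(self, match_count: int) -> bool:
        """Return True when one check finds enough matches to alert."""
        return match_count >= self.threshold


rule = AlertRule(
    name="No OPTIONS response from SBC",  # illustrative name
    query='Event EventLog="Lync Server" EventLevelName="Error"',
)
for matches in (0, 1, 4):
    print(matches, rule.should_fire(matches))
```

Raising the threshold is the obvious knob to turn if a single transient error turns out to be too noisy.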
Now if I hit Save, it will activate the alert. You can see all active alerts by going to Settings->Alerts.
From here you can turn alerts on and off, edit them, or remove them entirely. I’ll leave this one on.
I’m going to use this same query to do something else useful: create a dashboard item. Currently the only dashboard tile I have is for all events. I’m going to enter that query again in log search, but this time I’m going to hit Save.
This will bring up a new menu which will ask you to enter a name for the saved query as well as a category. I’ll use No Options from SBC as the name and PSTN Connectivity Monitoring as the category.
Now if I go back to the dashboard section in OMS I can go to Customize, and I’ll see a new category called PSTN Connectivity Monitoring with a single saved query underneath it: No Options from SBC. All I need to do is click the plus sign to add it to my dashboard.
I could then keep building this out until I have a complete dashboard. I just added a couple more as an example.
At this point I can now look at this dashboard and see a high-level overview of the health of that system. Additionally, alerts are being sent for the specific logs that I really care about.
OMS also has a marketplace for all sorts of different modules. All that I’ve shown is the most basic log gathering and alerting, but there is much more than that. There are solutions for change management, security and auditing, update tracking, orchestration, and more.
Adding a solution is as simple as clicking on it and clicking Add.
I haven’t played with many of these yet, but I’m excited to see what kind of solutions are released.
I’m just getting my feet wet with OMS. So far it feels a bit half-baked and limited in its abilities. There are not a ton of solutions in the marketplace yet, and many of those listed are labeled “Coming Soon”. It also does not seem to be particularly customizable at this point. But all of this is expected given how new it is to the market.
However, it has enormous potential. The ease of use, speed, and cloud focus are going to make it a key player in this market.
I’m really excited about this service and it’s definitely something worth keeping on your radar.