Following Ignite there’s a ton of awesome content and session recordings to watch, so today I watched Thomas Binder’s session on “Understanding Media Flows in Microsoft Teams and Skype for Business” and thought this should be a goodie.
Great session by Thomas Binder, with a ton of awesome information and tips on media flows, media and transport relays, and the differences between Skype for Business and Teams. It’s amazing just how much happens under the hood that users never see: how SfB and Teams find the best media path and codecs to set up the best quality call possible, with clients connected anywhere. Towards the end there are great tips on tools for reading logs and traffic, and on troubleshooting.
Hot tip for Teams logs: towards the bottom, highlighted in yellow, is how to format Teams logs with line breaks. The literal “\r\n” marks a line break, so replace it with a real line break.
Thank you Thomas for this great session! There was a lot of applause at the end, and well deserved!
Reference URL – https://www.youtube.com/watch?v=aD5mUg2ZzLQ
Thomas has done this session a couple of times for SfB before, and opens with questions for the audience
- understand peer-to-peer traffic,
- it is great to have local internet breakout rather than sending all traffic to central locations,
- stress UDP ports 3478 and 3479: these are critical
Not talking about signalling; it’s all about media
A candidate is a combination of IP address and port that allows the other peer to connect
ICE uses two techniques: STUN, which helps to traverse a NAT device, and TURN, a relay technique. There are two types of relay: media relay and transport relay.
Two endpoints that need to communicate
First they need Signalling to say “Hey I’m here”
Here we have signalling via Office365
Call could be audio, video or desktop sharing
If they want a call we want to send media as directly as possible; they could be at the same site, in the same office or across the floor, as long as the network is directly routable.
There may be devices in between that don’t allow direct calls. This is a problem.
Then there’s Charlie, outside the network as well
Firewalls also may not allow direct communication from external clients on the internet to internal clients, for example Charlie to Alice
Now we need some logic that helps to establish all the different call flows
Let’s break it down
NAT – Network Address translation
Example: at home you can have lots of different devices (Xboxes, PlayStations, PCs) with internal IP addresses, all sharing a single public IP address. Your router does the NAT. This is great as it also provides security: unknown traffic to your IP gets dropped if it was not requested.
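To make the home router example concrete, here is a toy sketch of a NAT table. It is my own illustration, not something shown in the session: the class name `SimpleNat`, the starting port 40000 and the 203.0.113.10 public address are all made up, and a real NAT also tracks the remote address and times mappings out.

```python
from itertools import count


class SimpleNat:
    """Toy source-NAT table: many private endpoints share one public IP."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._next_port = count(first_port)
        self._out = {}  # (private_ip, private_port) -> public_port
        self._in = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Translate an outgoing packet's source, creating a mapping if needed."""
        key = (private_ip, private_port)
        if key not in self._out:
            port = next(self._next_port)
            self._out[key] = port
            self._in[port] = key
        return self.public_ip, self._out[key]

    def inbound(self, public_port):
        """Return the private endpoint for an incoming packet, or None (dropped)."""
        return self._in.get(public_port)


nat = SimpleNat("203.0.113.10")          # hypothetical public address
xbox = nat.outbound("192.168.1.20", 5000)
pc = nat.outbound("192.168.1.110", 5000)
```

Both devices show up on the internet with the same public IP but different ports, and a packet arriving on a port nobody asked for (`nat.inbound(50000)`) has no mapping and is dropped.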
- Control traffic that’s coming
- Additional features, deep packet inspections and proxies
- Sharing of IP Addresses
HTTP proxy servers
Now HTTP proxies
- Bad for Teams and SfB, as a proxy doesn’t allow UDP, only HTTP, which always uses TCP
- UDP preferred for real time
- may corrupt packets
- block traffic or slow down
- real time may not be real time if any latency added
The solution is ICE, STUN and TURN!
First there’s signalling that goes via the Cloud
- For SfB signalling is done via SIP
- For Teams it is not SIP: it’s a REST API via HTTPS, with web sockets for more persistent comms. No more SIP
In terms of ICE it is very similar
- Now we have STUN and TURN servers. These function as relays: if a client wants to talk to someone but can’t reach them directly, it can use the STUN and TURN servers as relays
- At the same time they help us find our public IP address and get the NAT to allow incoming traffic
- The client sends a packet to the relay server, which allocates candidates and sends a packet back reporting the client’s public IP. The client then knows its public IP and that it may be able to accept traffic there
- Calls to PSTN via Office 365 use ICE
- ICE is used for all real time modalities
- In Teams, files are uploaded to OneDrive for Business
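The "send a packet and learn my public IP" step can be sketched with plain STUN as defined in RFC 5389. This is an assumption on my part for illustration: the SfB and Teams relays speak Microsoft's own variants, so this is not what the clients literally send. The function names are mine; the constants come from the RFC.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined in RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding request with no attributes."""
    tid = transaction_id if transaction_id is not None else os.urandom(12)
    if len(tid) != 12:
        raise ValueError("transaction ID must be 12 bytes")
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid


def parse_xor_mapped_address(response):
    """Extract (ip, port) from XOR-MAPPED-ADDRESS (IPv4 only), else None."""
    if len(response) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", response[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", response[pos:pos + 4])
        value = response[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and attr_len >= 8 and value[1] == 0x01:
            # Port and address are XORed with the magic cookie on the wire.
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    return None
```

The address returned is what becomes the reflexive candidate: the client's own traffic as seen from the outside of its NAT.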
Relay: very important for ICE negotiations
Two types of relay
- Media relays
- Transport relays
The media relay component was built for Skype for Business Server, where it was the Edge Server. It was moved to the cloud but wasn’t built for the cloud, so a cloud solution was born
Transport relay: built for scale and more flexibility
The media relay is static in one DC: if you’re in Orlando and your media relay is in Europe, traffic travels back to Europe to use the relay.
Transport relays are much smarter and use dynamic discovery via anycast
If I travel to Orlando I can use a transport relay in the US, not Europe.
Local internet breakout is important here, as otherwise you may not be able to take advantage of the transport relay and keep traffic local.
View the other two ignite sessions as well
- Media relay uses the same UDP ports
- Transport relay uses a different UDP port per workload
Skype for Business uses the media relay
Transport relay support is in progress for SfB but is already in use with Teams
Teams always uses the transport relay!
- One IP for all anycast servers
- the closest server, with the fewest hops, is always used
- based on endpoint location and privacy boundaries
- US government cloud uses only the US
- Tenant in EMEA
- all traffic is encrypted with a key
Based on ECMP, which makes it easy to distribute load
Super easy to manage
5 phases of ICE
1. request credentials
2. candidate discovery: once I know where I can be reached I send that to the other client
3. candidate exchange and try to establish connection
4. connectivity checks
5. candidate promotion selects the best media path
Sign in to the service and learn from signalling that a relay is configured for me
Option 1: SfB Online using the media relay, or Lync 2010; Lync 2010 always uses the media relay
Option 2: SfB Online, Lync 2013 or newer
Teams always uses TRAP!
Shows the different SIP dialogs, with the SIP headers on the left and the details on the right
Look for MRAS
First incoming 200 OK: in-band provisioning
Learn the audio port range
We are interested in MRAS; here we have a relay configured. Office 365 should always have this!
Next is the service request: there is a relay configured, with credentials
Valid for 480 minutes, i.e. 8 hours (SfB)
Teams: valid for 24 hours
Next is the credential response
Here are the credentials, created using its own certificate; if the relay is used, these will be presented
Media relay list
Learn what the media relay is, plus the username, password and ports to use
Only one relay is listed, and Office 365 will only show the external media relay
That was for SfB, but for Teams it’s more tricky!
For Teams there are no nice tools to read logs; all traffic is HTTPS and sometimes web socket. You need a local proxy that does a man in the middle attack, and you need to trust its certificate.
Charles web proxy: Charles has a sequence view and a structure view
The address is not an FQDN, it’s an IP address, which is different to the media relay
The IP is given directly, so it is faster
- Now I need to discover my IP addresses
- the first candidate is always the local interface address
- then I ask the relay, and it allocates a candidate for me
- and then the relay sends its candidates
Then the same for TCP
Always prefer UDP, but TCP can be used as it’s better than no call at all!
3478 no matter the workload in Teams at the moment! 4478 listed above should be 3478; this is a mistake on the slides
Some SfB workloads always use TCP: 1:1 file transfer and desktop sharing via RDP
- send a message to the peer I want to talk to
- then the other endpoint will do the same with where they can be reached
- then the person picks up and this is the endpoint we’re talking to.
Let’s look at these logs
Back to Snooper
We can see here that Martin calls Thomas, from the INVITE
We can see this was an audio call, and the candidates
Scroll down and there’s more information
We can see the codecs Martin supports
Let’s look at the candidates again
The first values are 1 and 1; candidates come in pairs, one for RTP and one for RTCP
Then priority: the higher the number, the more I want to use this candidate
Then IP address
This is the IP of this actual candidate
Here we have host, so we know this is the local IP address of the endpoint!
There are other interesting types
There’s srflx with raddr: this is where I send a packet to the relay and the relay tells me what address it came from
The raddr IP address matches the host address, and the relay says that when you send messages from 192.168.1.110 they appear to come from 126.96.36.199
Then the relay address
If I can’t establish a direct connection or use the srflx address, others may be able to talk to my relay address
Also IPv6 candidates
TCP passive and active candidates
TCP passive will be able to receive traffic as well; active and passive will match each other
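The fields walked through above can be pulled apart with a few lines of code. This is a minimal sketch assuming the standard RFC 5245 layout of an `a=candidate:` line; the SfB and Teams lines may carry extra or differently ordered fields, and the sample lines below are made up by me (203.0.113.7 is a documentation address, not from the session).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    foundation: str
    component: int            # 1 = RTP, 2 = RTCP
    transport: str            # e.g. "UDP", "TCP-PASS", "TCP-ACT"
    priority: int             # higher means more preferred
    address: str
    port: int
    type: str                 # "host", "srflx", "relay" or "prflx"
    related_address: Optional[str] = None   # raddr, if present
    related_port: Optional[int] = None      # rport, if present


def parse_candidate(line):
    """Parse one 'a=candidate:' SDP line in the RFC 5245 layout."""
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        raise ValueError("not a candidate line")
    fields = line[len(prefix):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("unexpected candidate layout")
    foundation, component, transport, priority, address, port = fields[:6]
    cand = Candidate(foundation, int(component), transport.upper(),
                     int(priority), address, int(port), fields[7])
    extras = fields[8:]
    # Remaining fields are key/value pairs such as 'raddr <ip> rport <port>'.
    for key, value in zip(extras[0::2], extras[1::2]):
        if key == "raddr":
            cand.related_address = value
        elif key == "rport":
            cand.related_port = int(value)
    return cand


host = parse_candidate(
    "a=candidate:1 1 UDP 2130706431 192.168.1.110 50018 typ host")
srflx = parse_candidate(
    "a=candidate:3 1 UDP 1694498815 203.0.113.7 50018 typ srflx "
    "raddr 192.168.1.110 rport 50018")
```

In the made-up sample the srflx candidate's `raddr` points back at the host address, which is the relationship described in the notes, and its priority is lower than the host candidate's.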
Now there’s the 183 Session Progress, back from the called party: here is my information
There are two here, but why?
We see one from Skype for Business
and the other coming from SfB but on an Android phone
If the user has more than one device we establish a media session with all of them
Now in the incoming packets there are no more pairs
Here we have rtcp-mux (multiplexing), so I send the old version and say hey, I know the new version as well.
Another interesting thing is the encryption: here we can see crypto, with the suite and key. This is how the two endpoints encrypt the traffic. Via the secure signalling channel they let each other know which cipher to use, so only the two endpoints know how to encrypt the traffic; the relay never sees this and just passes the packets on.
MRAS allows endpoints to allocate candidates
No encryption of traffic
Now each one knows where the other can be reached and will determine all possible UDP and TCP port pairings
IPv4 and IPv6
For SfB the relay cannot bridge TCP and UDP: if one SfB client can only talk TCP and the other can talk UDP and TCP, the whole call needs to be TCP.
In Teams one can talk UDP and the other TCP, and the relay will translate
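A rough sketch of "determine all possible pairings": every local candidate is combined with every remote candidate, keeping only pairs that share IP version and transport. This is a simplification of mine (real ICE also matches TCP active with TCP passive, and the Teams relay translating between UDP and TCP is not modelled). The `Endpoint` type and all the addresses are hypothetical.

```python
import ipaddress
from itertools import product
from typing import NamedTuple


class Endpoint(NamedTuple):
    address: str
    port: int
    transport: str   # "udp" or "tcp"


def form_pairs(local, remote):
    """Return every (local, remote) pairing that shares IP version and transport."""
    pairs = []
    for mine, theirs in product(local, remote):
        same_family = (ipaddress.ip_address(mine.address).version
                       == ipaddress.ip_address(theirs.address).version)
        if same_family and mine.transport == theirs.transport:
            pairs.append((mine, theirs))
    return pairs


alice = [
    Endpoint("192.168.1.110", 50018, "udp"),
    Endpoint("192.168.1.110", 50019, "tcp"),
    Endpoint("2001:db8::10", 50018, "udp"),
]
bob = [
    Endpoint("10.0.0.5", 50030, "udp"),
    Endpoint("203.0.113.7", 3478, "udp"),
    Endpoint("203.0.113.7", 443, "tcp"),
]
pairs = form_pairs(alice, bob)
```

Each pair that survives this filter is then something the connectivity checks can actually try.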
We find out which candidate pairs work, prioritise them, and the most optimal one is the one we use for the call
We cannot see this in Snooper or Charles
After the other person picks up and the best candidate is identified, we can see which one
IPv4 over IPv6
UDP over TCP
Prefer more direct path
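The three preferences can be expressed as a sort key. Treat this as a sketch only: the notes list the preferences but not how they rank against each other, so the order used here (IP version first, then transport, then number of relays) is my assumption, and `WorkingPair` is a hypothetical type. Real ICE works from the numeric candidate priorities instead.

```python
from typing import NamedTuple


class WorkingPair(NamedTuple):
    ip_version: int       # 4 or 6
    transport: str        # "udp" or "tcp"
    relays_in_path: int   # 0 = direct, 1 or 2 = relayed


def preference_key(pair):
    """Lower sorts first: IPv4 before IPv6, UDP before TCP, fewer relays first."""
    return (pair.ip_version != 4, pair.transport != "udp", pair.relays_in_path)


def promote(pairs):
    """Pick the preferred pair among those that passed connectivity checks."""
    return min(pairs, key=preference_key) if pairs else None


working = [
    WorkingPair(4, "tcp", 0),
    WorkingPair(4, "udp", 1),
    WorkingPair(4, "udp", 0),
    WorkingPair(6, "udp", 0),
]
best = promote(working)
```

With this ordering a relayed UDP path still beats a direct TCP path, which matches the "UDP over TCP" message repeated throughout the session.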
See the re-INVITE in the logs: there’s only one candidate, the one that will be used for this call
TCP is a very good protocol as it protects against lost packets and lost information: if I send a packet I get an acknowledgement, and if I don’t get it I wait and then resend the packet. But this takes time, and in real time comms we want to make sure traffic gets there as fast as possible. We don’t like lost packets, but a packet may contain 20 ms of voice; you may not hear that, and codecs are smart and can recover
TCP adds lost packets , delays and can cause
UDP’s fire and forget approach is ideal for real time communications
Let’s look at the final candidates
Before that let’s look at Teams candidates
In Charles search for a=candidate
It’s one super long line!
The literal \r\n is a line break
Copy and paste into a text editor and replace \r\n with line breaks, and this gets you the below
Not super nice to read, but
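If you would rather script the tip than use a text editor, something like the following does the same replacement. The sample string is invented by me to show the shape of the problem; it is not a real Teams capture.

```python
def expand_line_breaks(raw):
    """Turn literal backslash-r-backslash-n sequences into real newlines."""
    return raw.replace("\\r\\n", "\n")


def candidate_lines(raw):
    """Return only the a=candidate lines from the expanded text."""
    return [line for line in expand_line_breaks(raw).split("\n")
            if line.startswith("a=candidate")]


# Made-up one-line blob with literal \r\n sequences, as pasted from the proxy.
sample = (r"v=0\r\nm=audio 50018 RTP/AVP 0\r\n"
          r"a=candidate:1 1 UDP 2130706431 192.168.1.110 50018 typ host\r\n"
          r"a=rtcp-mux")

for line in candidate_lines(sample):
    print(line)
```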
Scroll down and we can see info on codecs
It looks similar BUT
for relay candidates it will use ports based on workloads
Here we have 3480, not the high ports
The other interesting thing is that all relay candidates come with an MTURN ID. This is security and controls who can access my service. In SfB we use the huge port range: when someone wanted to allocate, we randomly picked a port that was only open for a short time, which gave some additional security. If we use the same port for all connections, others could try to go there, but they can’t, as they need an MTURN ID to connect to that port.
Back to Snooper for the final candidate for SfB
Search for a=remote candidate
It contains 1 candidate
and it’s the prflx candidate, meaning peer reflexive: whoever I’m talking to is talking to my NAT device. The IP is the same as the reflexive address but the port is different.
If we look at the 200 OK
we can see here the remote candidate is the relay: this client is talking to the relay.
We have traffic from the calling person to the relay of the called person, and there’s one relay in the media path. We can understand how traffic is flowing.
As mentioned before, for a 1:1 call we want to send as directly as possible; it is different for a meeting, as the cloud needs to mix
We have two SfB clients and their own relays with 443, 3478-81
Both connect to their relay and allocate candidates on port 443 TCP or 3478 UDP; for UDP it will then be redirected per workload, to 3479 for audio
Next they try to establish a direct call as the best option
At the same time they try to talk via the relay
Now the calling client tries to connect to the called client’s relay on the 50K port range, as that was the candidate allocated
Then we do the same for the other relay
If all work then fantastic, and we can pick direct
If direct doesn’t work we pick the relay of the called client, or if that doesn’t work we use the calling client’s relay
If both don’t work then the relays need to talk to each other! This is why it’s still useful for SfB if the 50K range is open: with the 50K port range open, calls can establish over one relay. If you close the 50K port range, as Microsoft recently said it’s not required anymore, then you have two media relays in the media path
Looking at the difference in quality if you close 50K, it’s not that big a difference; call setup may be quicker. If you don’t have the ports open it seems not essential, BUT if they are already open there is no reason to close them.
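The fallback order for an SfB 1:1 call described above boils down to a small decision function. This is only my summary of the notes in code form; the function name and the returned labels are mine, and the last branch assumes each client can at least reach its own relay.

```python
def pick_media_path(direct_ok, called_relay_ok, calling_relay_ok):
    """Return the media path using the fallback order from the session notes."""
    if direct_ok:
        return "direct"
    if called_relay_ok:
        return "via called client's relay"
    if calling_relay_ok:
        return "via calling client's relay"
    # Neither client can reach the other side's relay (e.g. 50K range closed),
    # so both relays end up in the media path.
    return "relay to relay"


# With the 50K range closed, neither remote relay check succeeds.
path_with_50k_closed = pick_media_path(False, False, False)
path_with_50k_open = pick_media_path(False, True, True)
```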
They connect to the relay on 443 TCP or 3478 UDP; to their own relay they always talk 3478
Then to the other one via its relay on 3479-3481, depending on workload
The other relay will be tested
and if all of that doesn’t work they could still talk to each other
SfB and Teams side by Side for 1:1 (Peer to Peer)
SfB – Client to Service
Mediation server or conferencing server
Mediation server is on the right side as it’s internal to the network
Client talks to its own relay on 443 TCP, 3478-81 UDP
Server does the same
Now the client will try to talk directly to the server; if not firewalled this may be possible but it can’t be guaranteed
If it doesn’t work then we would use the relay of the called endpoint, which is the server’s
If that doesn’t work we can talk to the relay of the end user
You should not see two relays, as the 50K port range is open on the cloud service side
Teams: Client to Service
The Teams client allocates candidates
The service will never allocate candidates: as we know the service can talk to its relays, it doesn’t need its own relay
Again we try a direct connection, if direct works
The Teams client talks to its assigned transport relay and the service component will talk to the same relay
Bring that all together in a single table!
On the left we have the workloads: allocate candidate, audio, video, desktop sharing
Across: Teams, SfB, service port for media relay and transport relay
SfB client port: while I allocate candidates it will honour the client ports per workload. All of this goes to the media relay on 3478 UDP, or to the transport relay, or to 443 TCP, and is then redirected. Once SfB establishes audio it sends to 443 TCP / 3478 UDP on the media relay, or 3479 UDP for audio on the transport relay.
Teams client source port will always be 1024 and up. The plan is to change this to be similar to SfB, so you can look at traffic and see which workload it is
Teams client to transport relay will always be UDP 3478. The plan is to change this so you can tell the workload from the source and destination ports. Still working on this.
Direct connectivity is required: every client needs to connect directly to Office 365 so they can establish the media path and talk directly to the transport or media relay
- no proxy
- no shaping
- no deep packet inspection
- If possible use local internet breakout, take the shortest route to the transport relay and route over the Microsoft network.
- Prefer UDP over TCP: better for real time
- TCP can be used as backup, and in SfB it is used for some scenarios
- Important to look at the documented list of IPs and FQDNs to open your environment to
- it is quite a list and is updated a lot, so subscribe to the RSS feed!
- Open UDP ports
If people deployed SfB a year ago, for media they opened 443 (not changed) or 3478, but in the past we didn’t need 3479-3481 UDP so these may not be open
A problem seen with transport relays: the client tries 3478 and it works, then it allocates candidates and talks to this IP BUT on port 3479, 3480 or 3481, which could be blocked. The firewall may block this and UDP will FAIL! Now media will go over TCP! No one will call and say calls won’t work, but quality may be worse!
Be sure all UDP ports ARE OPENED!
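A quick way to sanity check a firewall rule export against the port list is sketched below. It is not a live probe, just a checklist over ports you already know to be open. The audio mapping for 3479 comes from the session; the video and sharing labels for 3480 and 3481 are my recollection of Microsoft's published port guidance, so verify them against the current documentation.

```python
# UDP ports discussed in the session, with what each is used for.
REQUIRED_UDP_PORTS = {
    3478: "relay allocation (and all Teams media at the time of the session)",
    3479: "audio",
    3480: "video (assumed from Microsoft's port guidance)",
    3481: "sharing (assumed from Microsoft's port guidance)",
}


def missing_udp_ports(open_ports):
    """Return the required UDP ports that are absent from the given list."""
    allowed = set(open_ports)
    return sorted(port for port in REQUIRED_UDP_PORTS if port not in allowed)


# Hypothetical firewall that was opened for SfB a year ago.
old_sfb_rules = [443, 3478]
for port in missing_udp_ports(old_sfb_rules):
    print(f"UDP {port} not open: {REQUIRED_UDP_PORTS[port]}")
```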
For Skype for Business Hybrid, your on-premises servers need to talk to Office 365. They don’t need the new ports 3479-3481; those are just for client to service.
Edge server will still talk 3478
Tools and Troubleshooting
SfB is super easy! Uccapilog.log and Snooper
Teams: not so easy!
You need to do a trick with a local proxy (a man in the middle attack) to collect traffic; examples are Fiddler and Charles proxy.
SfB: turn on logs
You may need to delete logs, sign out and sign back in to start with clean logs
When you reproduce the problem and want to see a=candidate, it may take 7-10 seconds after someone answers, so the recommendation is to leave the call running for 20 seconds and then disconnect, to make sure the final candidates are there.
The reason is that when the other person picks up, the call may not start over the optimal candidate; in the background the clients may keep looking for a better connection and then switch to it. Once the final candidate pair is listed it won’t change.
Tips to configure: web sockets can be very persistent, and in testing it was hard to capture them each time; after closing and starting Teams, sometimes you would see them and sometimes not.
This is how Teams does it today, but it may CHANGE!
Also CQD, the Call Quality Dashboard: after every call, the call quality experience, IPs and ports are logged over signalling
You can look at the data, create filters and look at UDP calls and TCP calls; you shouldn’t see a lot of TCP calls
Practical guidance on CQD.
Filters created in this example as below
Then the report is created
Lots of TCP, but that’s on app sharing, so that’s expected in SfB
Very few sessions are using VBSS and it seems there is a lot of RDP going on; that could be giving control, or old clients.
You can investigate client types and check if a client supports only RDP
Another report, with filters applied on the left
Subnets replaced to hide customer data
You can compare subnets by number of TCP and UDP calls
Find the top offending subnets and find out why there is so much TCP traffic
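The "top offending subnets" idea can be reproduced on exported data with a small aggregation. This sketch assumes you already have (subnet, transport) rows from somewhere; it does not use any CQD API, and the subnets and counts below are invented.

```python
from collections import defaultdict


def rank_subnets_by_tcp(streams):
    """Rank subnets by share of TCP streams, highest first.

    streams: iterable of (subnet, transport), transport being "TCP" or "UDP".
    Returns a list of (subnet, tcp_count, total_count, tcp_share).
    """
    totals = defaultdict(lambda: {"tcp": 0, "udp": 0})
    for subnet, transport in streams:
        totals[subnet][transport.lower()] += 1  # KeyError on unknown transport
    ranked = []
    for subnet, counts in totals.items():
        total = counts["tcp"] + counts["udp"]
        ranked.append((subnet, counts["tcp"], total, counts["tcp"] / total))
    return sorted(ranked, key=lambda row: (-row[3], -row[1], row[0]))


# Invented sample rows standing in for an exported report.
streams = [
    ("10.1.0.0/24", "UDP"), ("10.1.0.0/24", "UDP"), ("10.1.0.0/24", "TCP"),
    ("10.2.0.0/24", "TCP"), ("10.2.0.0/24", "TCP"),
    ("10.3.0.0/24", "UDP"),
]
ranked = rank_subnets_by_tcp(streams)
```

The subnets at the top of `ranked` are the ones worth checking first for blocked UDP ports or a proxy in the path.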
Test that ports are open
The SfB Network Assessment Tool sends real media to the transport relay and collects information on jitter, delay and packet loss.
However, SOON a new version will be available to test connectivity for TCP and UDP ports! Run it from a PC to find out whether it can connect to the required ports
It tests all the ports against a set of IPs that is downloaded at run time, so the IPs are always up to date. For any connectivity issue this tool is great to run on a PC
There might be situations where connectivity is working but something in the way corrupts packets
If the tool worked, then perhaps trace a call
Resources and summary
- Now we understand the challenges
- find most optimum media path
- use tools
- Traffic peer to peer
- client to server
- Leverage local internet if possible
- Open 3478-3481 UDP on the firewall!