
An Overview of Network Telemetry

Level 9

Network monitoring has historically relied on SNMP as a primary means of gathering granular statistics. SNMP works on a pull model: the network monitoring station reaches out, pulls values from OIDs, and then reacts to that data. There are also monitoring options where a network device pushes statistics to data collectors such as network management stations, flow collectors, or syslog engines.
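To make the pull/push distinction concrete, here is a minimal Python sketch. It does not talk SNMP or any real protocol. `FakeDevice`, `poll`, and `Collector` are hypothetical names, and the `ifInOctets.1` key is just an in-memory counter standing in for an OID.

```python
from typing import Callable, Dict, List, Tuple


class FakeDevice:
    """Hypothetical stand-in for a network device; not a real SNMP agent."""

    def __init__(self, name: str) -> None:
        self.name = name
        # One made-up counter; the key plays the role of an OID.
        self.counters: Dict[str, int] = {"ifInOctets.1": 0}
        self._subscribers: List[Callable[[str, str, int], None]] = []

    def get(self, key: str) -> int:
        """Pull model: the monitoring station asks, the device answers."""
        return self.counters[key]

    def subscribe(self, callback: Callable[[str, str, int], None]) -> None:
        """Push model: a collector registers once, the device sends on its own."""
        self._subscribers.append(callback)

    def receive_traffic(self, octets: int) -> None:
        """Simulate traffic arriving, then push the new value to subscribers."""
        self.counters["ifInOctets.1"] += octets
        for notify in self._subscribers:
            notify(self.name, "ifInOctets.1", self.counters["ifInOctets.1"])


def poll(device: FakeDevice, key: str) -> int:
    """One polling cycle, as a pull-based station would run on a timer."""
    return device.get(key)


class Collector:
    """Receives pushed samples and stores them in arrival order."""

    def __init__(self) -> None:
        self.samples: List[Tuple[str, str, int]] = []

    def on_update(self, device: str, key: str, value: int) -> None:
        self.samples.append((device, key, value))


device = FakeDevice("sw1")
collector = Collector()
device.subscribe(collector.on_update)

device.receive_traffic(100)
device.receive_traffic(50)

# The poller only sees the latest value; the collector saw every change.
polled = poll(device, "ifInOctets.1")
```

In this toy setup the poller ends up with a single reading taken whenever it happened to ask, while the collector holds one sample per change. That difference in timing and granularity is the practical gap between the two models.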


In researching Ethernet switches, I've run across the term telemetry that describes datasets coming from these devices. Vendors are positioning telemetry as if it is some new feature that you need to be on the lookout for.


So, is telemetry something new? In digging through vendor literature, watching presentations, and talking to one of my vendor contacts specifically, I’ve concluded network telemetry represents both old and new forms of network statistics, and new ways of gathering and exposing data.


First, in presentations, it's clear that some networking vendors use the term “telemetry” generically. As they work through their demo, they display, for example, sFlow and syslog data. Those are not new data formats to network engineers. We know flow data in its various formats, including sFlow. We also know syslog well. And we also know that those formats typically contain information pushed to our data collectors in real-time or near real-time.


However, I do think that for some vendors, telemetry is more than a fancy way to describe the same old data. For instance, Juniper Networks shared several facts with me about their Junos Telemetry Interface that are a bit different than what network engineers might be used to. Here are the more relevant points:


  • Junos telemetry is streamed in a push model, like syslog or flow data.
  • Juniper uses Google's Protobuf message format to stream the data. Protobuf is interesting. The big idea according to Google is to define your message format and fields, and then compile code optimized to read that data stream. This means that Juniper doesn't have to shoehorn telemetry into a format that might be ill-suited to the data. They can build their structured message format and optimize it however they like, and extend as they go.
  • Juniper is not exposing every conceivable value via their telemetry interface (which proprietary SNMP vendor MIBs tend to do). Rather, they’ve focused on performance management data: I/O & error counters, queue statistics, and so on.
  • The Junos telemetry interface is open to anyone that wants to parse the data. Therefore, any vendor that wishes to create a custom application for end users could work with Juniper, get the data format details, and go to town.
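The "define the message format first, then read it with code built for that format" idea from the Protobuf point above can be sketched without Protobuf itself. The example below uses Python's standard `struct` module as a stand-in. It is not the Protobuf wire format and not the Junos schema; the field names and layout are assumptions made up for illustration.

```python
import struct
from typing import NamedTuple

# Hypothetical fixed layout: network byte order, a 16-byte interface name,
# then three unsigned 64-bit counters. This is NOT the Junos schema and
# NOT the Protobuf wire format, only an illustration of a declared layout.
_LAYOUT = struct.Struct("!16sQQQ")


class InterfaceStats(NamedTuple):
    name: str
    in_octets: int
    out_octets: int
    in_errors: int


def encode(stats: InterfaceStats) -> bytes:
    """Pack one sample into the fixed binary layout (name is null-padded)."""
    return _LAYOUT.pack(
        stats.name.encode("ascii"),
        stats.in_octets,
        stats.out_octets,
        stats.in_errors,
    )


def decode(payload: bytes) -> InterfaceStats:
    """Unpack one sample; any reader that knows the layout can do this."""
    raw_name, in_octets, out_octets, in_errors = _LAYOUT.unpack(payload)
    name = raw_name.rstrip(b"\x00").decode("ascii")
    return InterfaceStats(name, in_octets, out_octets, in_errors)


sample = InterfaceStats("xe-0/0/0", 123456, 654321, 2)
wire = encode(sample)
roundtrip = decode(wire)
```

Real Protobuf messages use tagged, variable-length fields rather than a fixed layout, which is what lets a schema be extended without breaking older readers. The sketch only shows the narrower point that sender and receiver agree on a structure up front.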


Other vendors that come up when talking about telemetry include Cisco with their ACI data center fabric, and Arista with the telemetry interface in their EOS operating system. While I don't have specific details on how Cisco and Arista telemetry interfaces might differ from the Junos telemetry interface, they all seem to emphasize the near real-time pushing of descriptive network data to a collector that can aggregate the data and present it to a network operator.


So whether the term telemetry is being used generically to mean "data from the network" or specifically to mean "pushing specific network metrics to a data collector," I believe it's a term we're going to see used more and more.


While the data gathered via telemetry might be familiar, I believe the method used to gather the data, as well as what's being done with that data, is where the magic lies.


This raises another question. Could network telemetry be the end of SNMP? While my crystal ball remains murky, I believe SNMP has a long run still ahead of it. To supplant the familiar and ubiquitous SNMP, vendors will need to get their heads together on just exactly what this new telemetry format should be.


From what I can tell looking at just three vendors (Cisco, Juniper, and Arista), network telemetry is implemented differently by each of them. Differences slow technology adoption, as the variant solutions place monitoring vendors in the unenviable position of having to pick and choose which telemetry solutions to align themselves with.


Whatever SNMP's shortcomings might be, all you have to do is sort out the OIDs. The industry has already agreed upon the rest.


22 Comments
MVP

Actually I believe there is a place in the world for both.

Some data lends itself well to telemetry based models...while other data is still well suited to an occasional poll based model.

So in the case of telemetry, interface rates and other critical traffic-based info would be a good fit and give you more of a near-real-time view of the environment.  Physical device metrics (temperature, power, etc.) as well as other metrics could be poll based.  Ideally you, the customer, would be able to determine which bits of information are sent as telemetry and then poll whatever else you need.  It could reduce traffic, especially in the case of switches with large numbers of interfaces and having to pull the interface table.  Once you get a baseline, the device only sends updates to the appropriate metrics as they change, so you are not getting full datasets (tables) each time.
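The "baseline first, then only what changed" idea could look something like the Python sketch below. It is an illustration under assumptions, not a description of how any vendor's telemetry actually behaves; `changed_metrics` and the interface names are made up.

```python
from typing import Dict

Metrics = Dict[str, int]


def changed_metrics(previous: Metrics, current: Metrics) -> Metrics:
    """Return only the entries in `current` that are new or differ from `previous`."""
    return {key: value for key, value in current.items() if previous.get(key) != value}


# Hypothetical counters for three interfaces.
baseline = {"Gi0/1.in_octets": 1000, "Gi0/2.in_octets": 500, "Gi0/3.in_octets": 0}
latest = {"Gi0/1.in_octets": 1800, "Gi0/2.in_octets": 500, "Gi0/3.in_octets": 0}

# The first message carries everything, because nothing has been sent yet.
first_message = changed_metrics({}, baseline)

# Later messages carry only the counters that moved.
update = changed_metrics(baseline, latest)
```

On a switch with hundreds of mostly idle ports, the update would stay small while a full interface-table poll would not.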

Level 11

I agree.  There is a need for both, and I see push data expanding.  I like the idea of Juniper telemetry, and I would like to see Cisco expand theirs.

MVP

Agreed, there needs to be both or at least SNMP should remain.

I've not heard the term telemetry being used other than in motor racing.

MVP

It is used a lot in other industries...public water systems, oil and gas industry, hydrologic monitoring (river and lake levels), national weather service (monitoring of various sites for temp, wind, humidity, etc.) and so on.  NASA uses it extensively on space missions.

Level 14

Always looking for another method of gathering network statistical data.  What Juniper has delivered looks promising.

Level 17

I think SW gives me a lot of the Telemetry that we need! Specific measurements and polling of neighbors and other end/connected devices allow for one device to alert on the other that is currently inaccessible  

Telemetry is an automated communications process by which measurements are made and other data collected at remote or inaccessible points and transmitted to receiving equipment for monitoring. The word is derived from Greek roots: tele = remote, and metron = measure.

Level 16

If that "push" model takes over from SNMP, SolarWinds needs to start rebuilding Orion now 🙂

They are lucky that they have "the parts on the shelf".  It should be a Linux VM with the LEM engine - flash + API

MVP

It sounds like eventually we could all be calling any type of data collecting, telemetry.

Level 12

I am in agreement with you Jfrazier!  Need for both exists and will until something better comes along, but the adoption will be a long time coming, if at all.  SNMP has been around since the late 1980s and has its good and bad points.

I like what SW has done with their products and incorporating all of the players, SNMP, SNMP-TRAP, WMI and added the NPM agent to assist with attaining full insight into a server/device and how it functions.  Having alternative methods to monitor is very intuitive and the best available option for monitoring and reporting on and in all OS and hardware vendor worlds.

MVP

You could simulate a telemetry lite sort of thing with SNMP traps.  The drawback is that you can't pull data from the trap and update a component with said value.....easily.  I can think of some kludgy ways, but Orion needs to be able to update components and custom property values from traps and log file entries.  If the trap receiver/alerting engine were merged, then that would be a possibility and allows the product to do more. 

Level 12

The other caveat about SNMP polls and SNMP traps being current and accurate is that they are UDP based, so delivery is not guaranteed.

Totally agree with the integration and it appears that there is a possibility of this moving forward as it is on the "Road Map".  Unfortunately, nothing on the "Road Map" is guaranteed to happen, so we will just have to wait and see....

Our hospitals' Biomed groups have been using "telemetry" for (it seems like) 50 years to track patient health status via radio.  They still refer to anything that provides patient health status in real-time as "telemetry".

Health Systems have had serious challenges keeping and using a viable radio signal in various frequencies, some reserved/protected, some not.  When I saw their vendors moving towards 802.11b I tried to give them good advice:  don't do it!  That was back when any script kiddie could shut down communications to an access point with a flood of DeAuth packets.

Unfortunately the vendors follow the money, not my advice.  Just because we've upgraded WLC code and AP hardware/code to make it more challenging to kill an AP remotely doesn't mean it's a great use for "telemetry".

Changing the meaning to include snmp-like data, or to make it mean anything over wired or wireless, isn't significant in my opinion.  Finding a way to make that data safe, predictably reliable, and 100% secure will be a significant change that might be worth coining a new term for, whatever it may be.  "S-Telemetry" (for "Secure")?  "Sec-Tel"?  Something more intuitive and catchy?

Level 21

I have read a lot of discussions that touch on several of the subjects raised here: SNMP being antiquated, the need for a better data collection model, etc.  I think instead of re-inventing the wheel it may make more sense to apply a different approach to an already existing technology (or set of technologies).  The pony I would pick for this task is Logs.

I imagine a system where logs are sent out from the different end-points and then parsed by management systems which can include numeric values, classification information, instructions, etc.  This data can then be presented in pretty much any format based on the type of log being received.

The benefits of using logs include the following: just about every device and/or system is already capable of logging, logs are a push-based technology, logs are relatively low overhead, and parsing logs is easy and well understood.  Imagine that instead of pulling SNMP every 5 minutes for something such as CPU utilization, the device just sends a log that identifies itself as CPU Utilization and includes a numeric value.  That value is sent to the database as a CPU value for the device and can later be graphed.

Now I realize that this isn't significantly different than SNMP traps; however, going to logs would allow a more open format for vendors (like they do now) to use pretty much whatever syntax they want, and as long as the management system can be told how to parse them via some type of connector, the system will work.  It would also allow for SNMP to go the way of the Dodo Bird.
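As a rough sketch of what such a "connector" might be, the Python below maps two made-up log syntaxes onto the same (host, CPU value) data point using regular expressions.  Both log formats and the names `CONNECTORS` and `parse_cpu_log` are hypothetical; no real vendor's syslog output is being described.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical per-vendor "connectors": each pattern maps a made-up log
# syntax onto the named groups `host` and `value`. Neither format is a
# real vendor's syslog output.
CONNECTORS: Dict[str, re.Pattern] = {
    "vendor_a": re.compile(r"^(?P<host>\S+) CPU_UTIL value=(?P<value>\d+(?:\.\d+)?)$"),
    "vendor_b": re.compile(r"^cpu\|(?P<host>[^|]+)\|(?P<value>\d+(?:\.\d+)?)%$"),
}


def parse_cpu_log(line: str) -> Optional[Tuple[str, float]]:
    """Return (host, cpu_percent) if any connector matches, else None."""
    text = line.strip()
    for pattern in CONNECTORS.values():
        match = pattern.match(text)
        if match:
            return match.group("host"), float(match.group("value"))
    return None


incoming = [
    "core-sw1 CPU_UTIL value=37.5",
    "cpu|edge-rtr2|82%",
    "link up on port 7",  # unrelated log line, no connector matches
]

# Each parsed point could then be written to a database and graphed.
points = [parse_cpu_log(line) for line in incoming]
```

Adding support for another vendor's syntax would mean adding one more pattern to the table, which is the "tell the management system how to parse it" step.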

This is just my thought and I am sure I haven't considered every angle on the issue.  I am always interested to hear what others have to say on these topics so thanks for sharing!

MVP

Interesting theory. The only downside I see with using logs is that the device will either have to send out every bit of information in the log file, or it will need to be configured to send out only the logs for the information you wish to see, such as CPU utilization.

There may be overhead issues if the device has to send out every single bit of information it has. If not, then this could be a good option.

Alternatively if the devices need to be configured to receive certain bits of information, then more people may need full access to it or have to pay each time you wish to monitor something else. Currently we get any statistics we want for our routers that are managed externally all via SNMP. But if the external company needed to make changes each time we wanted to log something else, it could become a costly exercise (as we have to pay for changes).

ethan.banks makes some good points and Jfrazier, rschroeder, and byrona provide good data, but at the end of the day SNMP, WMI, or whatever protocol/process or "Telemetry" we have would be of no use if there was not a good way to display the data so that all levels of the IT organization can use and react to it.  Executives want a view, Engineers need a separate view, and our honorable Service Desk professionals need theirs.  I believe that SW is making strides along with all of our dedicated Thwackers in helping the masses take the data, easily configure views, and display so that value is given to the data and our lives go on......

~~~ still thinking on this one.....

Level 7

Are there any new standards being developed for this streaming telemetry?

Level 7

You have touched a very important point about visualization. Available solutions today give only a small part of the end-to-end picture. Would you say that there is something at the top of your mind that you would love to have?

Level 13

Let's face it, the more tools at our disposal, the better. We can then implement the ones that suit our individual environment. My concern is, and always is: why can't manufacturers get together to ensure consistent standards and interoperability?

MVP

Because that will limit the amount of money they can continue to get.

If you have to buy another tool of theirs to use their metrics the better for them...then you focus on one vendor...them.

We the consumers want consistent standards and interoperability, but the vendors' business is to sell software...consistent standards and interoperability don't always make them money.

That is why you see suites of products...you'll have it at a vendor level but not always at an industry level.

Level 17

The moment a vendor uses an industry standard rather than the proprietary model they have developed the door is open to competition. No more do you need 'their specific solution' and now any tool works... very bad for future sales, but very good for market competition and other smaller 'market forces' to enter the game and offer solutions. When a vendor develops a new tech watch the time frame - usually it gets close to or just after the end of their patent and once other developers can get their hands on it and customize the market begins to see advancement. Before that point you've got to pay the $$$ for that new tech.  As in SSD and other new tech, just wait for everyone to get their hands on it and then things get good.

MVP

cahunt​ said it a lot more eloquently than I did...

Level 7

Is telemetry just a fancy name for log messages?

About the Author
Network architect who rules packets with an iron console. Introvert. Writer. Packet Pushers podcast host.