One of the things I like most about the writing challenge is that we’ve set it at a time when many of us are either “off” (because how many of us in tech are ever REALLY “off”?) or at least find ourselves with a few extra compute cycles to devote to something fun. This week, more than any so far, has shown this to be true.
Despite a conspicuous absence of references to brightly colored interlocking plastic blocks, our ELI5 imaginations ran wild, from tin can telephones to poetry (with and without illustration) to libraries.
I’m thrilled with how the challenge has gone so far, and I’m excited to see what other examples are yet to come as we finish strong next week.
Kevin Sparenberg—former SolarWinds customer, master of SWUG™ ceremonies, semi-official SolarWinds DM, and owner of many titles both official and fictitious—takes the idea of routing back to its most fundamental and builds it back up from there.
I love this challenge—all these definitions that I can now use when a non-geek asks me what I do.
For adults (so not five-year-olds), I always revert to the analogy of sending a letter through the post office, which seems to cover it.
I use that same analogy when I explain this to my unfortunate Sales Agents who work in neighborhoods with a shared DSL line. I always get calls around the holidays that the internet in their model home has suddenly started crawling. It’s too hard for me to explain shared DSL lines, that we have no control over what’s put under our feet when we build houses, and that ISPs with older cable lines will “store” up a certain amount of data per neighborhood, so depending on how heavily it’s used during the day, it can make all the difference with those “speeds UP TO 75 Mbps.”
If I tried to explain how the old neighborhood DSL lines route and “borrow” data when it’s not being used by someone else, their heads would explode.
So, I use our highway as an example.
“You know how the internet’s called an Information Highway? Well, in your neighborhood it works a lot like that. During the day, your speeds are okay because most people are at work and school. They’re not on your ‘information highway.’ But when the holidays hit, you got kids at home streaming and gaming, and suddenly your own internet’s gonna drop because now many more people are on your ‘highway.’ Just like when you get on the highway to go home: if you don’t have many people on the road, you can go the 60 to 70 mph that you’re allowed on the posted signs. But if it’s rush hour, and cars are jammed for miles, it doesn’t matter if the posted signs say ‘60 mph.’ You’re gonna go the same crawling 30 mph that everyone else is, because it’s jammed.
“Right now, you got a heavy ‘rush hour’ on your DSL line because there’s a lot of people in your neighborhood on it.”
What I wouldn’t give to have us all on fiber.
But such is the life of a home builder.
- Speedtest.net ... you’re my only friend.
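The highway analogy above can be put into rough numbers. The following is a minimal sketch, assuming the shared neighborhood line splits its capacity evenly among whoever is online (real ISPs do not necessarily share capacity this way). The function name, the 300 Mbps neighborhood capacity, and the user counts are all hypothetical and chosen only for illustration.

```python
def per_user_speed_mbps(advertised_mbps, shared_capacity_mbps, active_users):
    """Estimate one user's speed on a shared line, assuming an even split."""
    if active_users < 1:
        raise ValueError("active_users must be at least 1")
    # Everyone online gets an equal slice of the shared "highway".
    fair_share = shared_capacity_mbps / active_users
    # The advertised "up to" speed is the posted limit: you never exceed it,
    # even when the road is empty.
    return min(advertised_mbps, fair_share)


# Hypothetical numbers: an "up to 75 Mbps" plan on a 300 Mbps neighborhood line.
quiet_weekday = per_user_speed_mbps(75, 300, active_users=3)
holiday_rush = per_user_speed_mbps(75, 300, active_users=20)

print(quiet_weekday)  # 75: capped by the posted limit
print(holiday_rush)   # 15.0: the "rush hour" crawl
```

With few neighbors online the posted limit is what holds you back; once enough people pile on, the shared capacity does.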
I also think about it as getting to work. I know the preferred path, but due to insane drivers outside of my control, I’m sometimes forced to take alternate paths to get to the same place. If there’s an accident on the main road I take, US 301, then I might have to take Interstate 75, which is often flowing smoothly until it gets backed up, and then I might need to take the turnpike. Luckily for me, there’s almost no time difference in getting from home to work and vice versa, but at the end of the day, I’m only able to measure the difference in distance traveled. It’s more miles to use I-75 and/or the turnpike. So, my routes are within minutes of each other, but the distance traveled to get there is much greater when I don’t get to use my preferred path.
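The commute above can be sketched as a tiny route picker. This is only an illustration of choosing among alternate paths by a metric, not how any particular routing protocol works. The minutes and miles are made-up numbers, and `best_route` is a hypothetical helper.

```python
# Hypothetical commute options: times are close, distances are not.
ROUTES = [
    {"name": "US 301", "minutes": 30, "miles": 18},
    {"name": "I-75", "minutes": 32, "miles": 24},
    {"name": "Turnpike", "minutes": 33, "miles": 27},
]


def best_route(routes, metric, blocked=()):
    """Return the open route with the lowest value for the chosen metric."""
    candidates = [r for r in routes if r["name"] not in blocked]
    if not candidates:
        raise ValueError("no route available")
    return min(candidates, key=lambda r: r[metric])


# Normal day: the preferred path wins.
print(best_route(ROUTES, "minutes")["name"])  # US 301

# Accident on US 301: fall back to the next-best open path.
detour = best_route(ROUTES, "minutes", blocked={"US 301"})
print(detour["name"], detour["miles"])  # I-75 24
```

The detour costs only a couple of minutes in this made-up data, but the mileage shows what the alternate path really adds.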
When a few folks here at SolarWinds began talking about “NetFlowetry”—mostly as a silly idea—we had no idea how it would take off. THWACK MVP Thomas Iannelli’s entry shows how much the idea has caught on, and how well it can be used to make a challenging concept seem accessible.
This is so wonderful! This is so great for non-experts in this subject. Your poem is full of visual words for a visual learner like me. Thank you!
We rely on ping for a lot, but we as Network Analysts understand much about pings that many other folks may not. For example, a switch or router may be humming along, working perfectly, forwarding and routing packets for users without a single issue. But pinging that switch or router may not be the best way to discover latency between that switch and any other device. This is because ICMP isn’t as important to forward or respond to as TCP traffic.
A switch or router “knows” its primary job is to forward data; replying to pings as fast as possible just isn’t as important as moving TCP packets between users. So, a perfectly good network and set of hardware may serve users quite well, but might simultaneously show varying amounts of latency. It’s because we may be monitoring a switch that’s busy doing other things; when it gets a free microsecond, it might reply to our pings. Or it might not. And users aren’t experiencing slowness or outages when the switch starts showing higher latency than it did when there was very little traffic going through it.
It’s important to not place excessive reliance on pings “to” routers or switches for this very reason.
However, you might just find pings more valuable if you ping from endpoint to endpoint instead of from a monitoring station to a switch or router. The switch or router will forward the ICMP traffic nicely, and may do so much better than it will REPLY to the pings.
So, ping from a workstation to another workstation, or to a server, or server to server, instead of from a workstation to a router or switch that might have better things to do with its processing resources than reply to your ping quickly.
Give me a ping, Vasili. One ping only, please.
When explaining the speed of reads and writes, most people wouldn’t think about libraries. But THWACK MVP Jake Muszynski isn’t like most people, and his example was brilliantly, elegantly simple.
Reads and writes per second
Is a metric measured here?
Where is the system bottleneck
Of our data we hold so dear?
IOPS: reading and writing without any latency. Most of us want the data on our screen in a split second, and IOPS contributes to that. We would always love to keep this as healthy as possible, and with data pouring in, we need to keep these things at scale: IOPS, data storage, data retrieval, and throughput.
Well said sir... I love the book analogy.
It’s also like a puzzle... except that you get the same picture but the number of pieces in the box changes... sometimes 50, others 100, others 500, and some even 1000 pieces. Same view, just more to consider.
Or when I mentioned your post to an accountant friend of mine.... debits=credits!!!!
24. Virtual Private Network (VPN)
THWACK MVP Matthew Reingold finds what is perhaps the most amazing, simplest, and most accurate ELI5 explanation for virtual private networks I’ve ever seen. You can bet I will be adding it to my mental toolbox.
We use VPNs for everything from connecting to the office to protecting our torrent downloads from nosy ISPs. Everyone uses a VPN these days to encrypt and protect their information from prying eyes.
You could also describe it as using a water hose through a pool. You get to go through the pool, but the hose hides your data and what comes out of the pool is only what has gone through your hose.
Open a connection between me and you
Encrypt the data before it goes through,
Then the only people who can see
The flowing data is you and me.
The word telemetry is still obscured by a healthy dose of “hand wavium” from companies and individuals who don’t understand it but want to sound impressive. – Josh Biggley, who has devoted a good portion of his career to both building systems to gather and present telemetry data and clarifying what the word means.
To me, telemetry is a way to reach a point or milestone that a normal, generic process or procedure can’t, be it collecting data, monitoring, issuing instructions, or any other possible thing.
If telemetry can tell a pit crew in NASCAR exactly how the race car is behaving, it can do the same (and possibly more) for us. The idea is, as mentioned by the author, to remove the noise. What do we care about? What matters? What is measurable vs. what is observable? Finally, how do we put that into a dashboard that we can use to have an overview of everything at once? That’s where telemetry really is useful, the combined overview of all our metrics.
I’ve known people who worked on Boeing’s Delta program and the SpaceX Falcon 9 program. In rocketry, a lot of telemetry data is the difference between “it exploded” and “here’s what went wrong.”
26. Key Performance Indicator (KPI)
If Senior UX researcher Rashmi Kakde ever thought about a second career, I’d suggest writing and illustrating tech books for kids. Her poetic story about KPI is something I plan to print and use often.
I have started working with KPIs that I track for the Orion® Platform. As I delegate work to others, or if (when) I get distracted, I need an easy way to verify that the Orion Platform is doing what I expect it to. I have overall system health from App monitors and the “my Orion deployment” page, but what about all those things that are more like house cleaning? Things like custom properties. Unknown devices. Nodes missing polls. I build out dashboards and reports to let me know how the processes I have in place (both automated and human) are getting things done. I pull them into a PowerShell monitor from SAM via SWQL queries.
Did I have a spike in unmanaged devices? Do I need to find out why?
Do all my Windows servers have at least one disk?
Are there disks that need to be removed?
Not all of them are important, at least not right now. But once I gather stats on what we need to clean up to be current, then I choose a few significant metrics to improve. Those are my KPIs. I look at the number for a quarter and try to improve the process and the automation to make sure stuff doesn’t fall between the cracks. And having stats over time means that I can see if things change and need my attention. If I make a few things better, and other stuff suffers, I change my KPIs.
I love, love, love this one, especially the pictures
For me the most important part of KPIs is to try to refer to them by their full name rather than the TLA.
I’ve seen several “service desk” systems that try to label ticket close rates as a KPI.
That is a case where something is measured because it was easy to measure, not because it indicates how well the service desk is being run.
It isn’t a Key Performance Indicator any more than MPG is a KPI for how comfortable a car is.
It’s just an interesting statistic, not a KPI.
We need to remember which Performance our Indicator is Key to highlighting, and why it is important enough to be called “Key.”
The trick with KPIs is figuring out what is actually “key” and observable to system performance. Of course, one must begin by asking what it means that the system is performing well.
I was laid off years ago because someone “upstairs” decided to change what was key without telling anyone. I was laid off for handling fewer tickets than my colleagues. For months, if not a couple of years, I had been an unofficial escalation point—working high-priority tickets and customers. That took—with explicit approval from managers with whom I shared space—more time than ordinary tickets, so I handled fewer overall. I also would help colleagues if they had questions.
Well above those folks, it was decided that my group would have one KPI: number of tickets processed. On the Friday going into Labor Day Weekend that year, I was working with a customer, who thanked me profusely, when I heard my manager (two levels above me) getting rather upset. I found out later that was when higher-ups told him I was getting laid off. I found out about 20 to 30 minutes later.
So, was processing tickets quickly the KPI? Should it have been combined with, say, customer satisfaction, perhaps measured via survey? What about some sort of metric in which the severity or difficulty of the tickets was taken into account? What was really key to the support desk’s performance?
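One way to picture the last of those questions is a severity-weighted ticket score. The following is only a sketch, assuming tickets carry a severity label and that harder tickets deserve more credit. The weights, labels, and ticket counts are hypothetical, not taken from any real service desk.

```python
# Hypothetical weights: how much credit each severity level earns.
SEVERITY_WEIGHTS = {"low": 1, "medium": 2, "high": 5}


def raw_count(tickets):
    """The single KPI from the story: number of tickets processed."""
    return len(tickets)


def weighted_score(tickets):
    """Alternative metric: each ticket counts according to its severity."""
    return sum(SEVERITY_WEIGHTS[severity] for severity in tickets)


# Made-up quarter: an escalation point versus a colleague on routine tickets.
escalation_agent = ["high"] * 10
frontline_agent = ["low"] * 25

print(raw_count(escalation_agent), raw_count(frontline_agent))            # 10 25
print(weighted_score(escalation_agent), weighted_score(frontline_agent))  # 50 25
```

By raw count the escalation point looks like the weakest performer; by weighted score the ranking flips. Neither number is automatically the right KPI, which is the point of asking what “key” means first.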
27. Root Cause Analysis
Principal UX researcher Kellie Mecham is trying to inspire an entire new generation of UX/UI folks with her explanation, by pointing out that the ability to ask questions is a core skill. By way of example, she shows how enough “why” questions can uncover the root cause of any situation.
Root cause analysis is critical to understanding the past and why something happened. Along with RCA, I like to include the “how” questions: How can we prevent that in the future? How can we use this information to make things better, faster, and more resilient? When asked for the root cause, I like to provide not just the answer, but the value obtained from that answer.
Getting to the bottom of things
Is what we are looking for,
Diagnose the disease
Leave the symptoms at the door.
Unfortunately, I have worked with people who would then take it to the level of: “Why do we need to pay? Why can’t we just have?”
A root cause analysis can only go so far, and some people have difficulty with reasonable limits.