
Geek Speak

TiffanyNels

There will be blood...

Posted by TiffanyNels Mar 30, 2015

That is it folks, the Mischievous round is done. Once again, I am personally devastated that the Doctor Who representatives (The Master and the Daleks) did not make it beyond the first round. Thanks to those (looking in your direction, sqlrockstar and @emoore@empireiron.com) for keeping the dream alive. Perhaps we will take this as a powerful message. Next year, no Doctor Who. Humpf.

 

Who else avoided a first round elimination? Our official results will post shortly, but here is a recap of some of the most popular and hotly contested match ups. 

  • We thought that this match up would be closer by far, but the power of the Dark Side and a choke hold were too much for Voldemort's weak geek cred
  • Bahlkris thinks these two are a winning sitcom combination (dark sense of humor, indeed)... the pilot episode could be a rematch to determine who gets the bigger bedroom. In the meantime, this round goes to Khan over Hannibal
  • And Skeletor's Cinderella story continues, with a complete trouncing of Jaws.

 

So, we have a new set of match-ups, likely more random than before.

 

Voting in the Rotten round starts around 10 am CT today. We still need your help to determine the most infamous of the baddies.

We are a solid three months into 2015, the first quarter is nearly out, and we should start seeing the seeds, or even some fruit, of the various ‘2015 Predictions’ made by “industry” folks. Rather than wait until the end of the year to see what might happen versus what might not: what are *you* seeing? You are the community, you are the people for whom the predictions toll, and most importantly you are the ones who actually decide whether they come to fruition or not.

 

We’ve seen predictions around cloud adoption and the implementation of Software Defined Data Centers, whether through components of Software Defined Storage, Software Defined Networking, or other similar capabilities.  I don’t even want to bring up the fact that each of the past 10 years has been declared “The Year of VDI”; every year it is predicted, and every year sadness follows because the lack of wide-scale adoption keeps people saying, “one more time…”

 

So what are you seeing? Let’s say we forget the analysts, the trade press, even my own predictions, because they mean nothing if they don’t actually happen. (No, this isn’t some kind of obscure Nostradamus stuff :)) The best kind of predictions play out as self-fulfilling prophecies, and you are in the driver’s seat of IT to see things start happening, or to *want* things to start happening.

 

So now that we have a few months under our belt, which predictions DO you see happening, and which would you *like* to see happen?  Fortunately, the industry and marketplace are mature enough that a majority of the solutions already exist to adopt; it’s just a matter of…

 

Which ones are resonating with you?

Management wants to be able to track employees' productivity and performance for use in periodic employee evaluations. Performance improvements can lead to keeping your job, bonuses, and pay increases, while declines in performance can lead to pay decreases, black marks in your HR file, and possibly losing your job. When accurate metrics are used in evaluations, they can be beneficial to both the organization and the individual.

 

One type of metric is the Service Level Agreement, or SLA. This defines how IT, as a group, will respond to an issue that a user reports. Some organizations define the SLA as the time it takes to resolve the issue. While some issues can be resolved quickly, like a password change, other issues can take a long time to diagnose and correct. The SLA can also be measured by how long it takes to make the initial response to the request. While I believe this is a better metric than time to resolution, it can be abused: the person assigned the ticket can easily make initial contact but then make no progress in diagnosing or resolving the issue.
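
To make the difference concrete, here is a minimal sketch, not tied to any particular ticketing system, of how the same tickets can score very differently on time-to-first-response versus time-to-resolution. The field names and sample data are made up for illustration.

    from datetime import datetime
    from statistics import mean

    # Hypothetical ticket records; a real system would pull these from the
    # help desk database or its API.
    tickets = [
        {"opened": datetime(2015, 3, 2, 9, 0),
         "first_response": datetime(2015, 3, 2, 9, 5),
         "resolved": datetime(2015, 3, 4, 16, 30)},
        {"opened": datetime(2015, 3, 3, 13, 0),
         "first_response": datetime(2015, 3, 3, 13, 45),
         "resolved": datetime(2015, 3, 3, 14, 0)},
    ]

    def hours(delta):
        # Convert a timedelta to hours for easier reading.
        return delta.total_seconds() / 3600.0

    time_to_first_response = [hours(t["first_response"] - t["opened"]) for t in tickets]
    time_to_resolution = [hours(t["resolved"] - t["opened"]) for t in tickets]

    print("Avg time to first response: %.1f hours" % mean(time_to_first_response))
    print("Avg time to resolution:     %.1f hours" % mean(time_to_resolution))

A ticket that gets a quick acknowledgment but then sits unresolved for days looks great on the first metric and terrible on the second, which is exactly the abuse described above.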

 

Another metric is customer satisfaction. One way of getting this metric is through surveys sent to the requestor after the issue is resolved. Not all surveys make it to their destination, and when they do, many get ignored or deleted. Survey questions are written in multiple choice for easy analysis, but they don’t provide much room for collecting real feedback from the requestor. And if an issue was handled by multiple people, who does the survey reflect upon, and does the requestor realize this?

 

Managers want to know how well their employees are doing, and they want a way to measure them accurately. Ticketing systems have some metrics that can be used to track how well employees are doing. How do you accurately measure employees, in particular help desk employees?  What are the metrics that really matter? Can all of these metrics be tracked in one system? How would you like to be measured?

Amit Panchal was one of the Virtualization Field Day 4 delegates we recently presented to in Austin, and I got to thinking we should get to know our friend from “Blighty” a bit better. So, for this month’s IT Blogger Spotlight series we caught up with him.


SW: Tell us a bit about yourself. How did you get your start in IT?

 

AP: I started in IT as a Helpdesk Analyst for a large solutions provider. I actually completed an accountancy degree, but my passion was in IT, so naturally I worked my way up from there.


SW: I guess you could say your interest in all things IT naturally led you to start blogging?

 

AP: I started blogging because I wanted to share what I was learning in the industry with the wider IT community. I’m particularly interested in virtualization so my blog (Amit’s Technology Blog: Demystifying the World of Virtualisation & Technology) allows me to share my own thoughts, and helps me raise my profile in the community.


SW: So, what keeps you busy during the week?


AP: Monday to Friday is hectic with work and managing a team/projects and ad hoc jobs for a fast-paced global manufacturing company. Outside work, I'm very busy in the evenings juggling home life and carving out time to follow the latest tech trends and keeping my social circles buzzing mainly via Twitter (@amitpanchal76). I also try to make time to stay up-to-date with fellow bloggers by listening to different podcasts.

 

SW: Wow, you’re a busy guy! What blogs are you following these days?

 

AP: Some of my favorite blogs are Yellow Bricks, The Saffa Geek and I regularly follow Frank Denneman, Mike Preston and Hans De Leenheer.


SW: I’d say you have your finger on the “IT” pulse. Do you have any favorite tools you use?

 

AP: I use a number of different tools in my day-to-day, but if I had to pick my top tools I’d say vSphere Ops Manager, System Center Operations Manager, and Quest Active Roles Server. I’ve also used SolarWinds Storage Manager. It’s an excellent product for trending, reporting, and real-time analysis, and it has recently been followed up with SolarWinds Storage Resource Monitor.


SW: OK, time to talk football (soccer). I see you’re a Manchester United Fan. I’ve got two questions for you. David Moyes or Louis van Gaal?

 

AP: Louis van Gaal all the way! He's made some good moves this season and hasn’t struggled in the way that Moyes did. He has a good track record, so I have high hopes for him.


SW: Ruud van Nistelrooy or Robin van Persie?

 

AP: Ruud van Nistelrooy is my pick as he was lethal in attack and could score in many different ways. He was also less prone to injury!


SW: Other than football, what else do you enjoy when you’re not “plugged in”?

 

AP: When I'm not working or blogging, I'm usually trying to keep control of my two boys. I love traveling and regularly go abroad with my family to escape the UK weather and catch some sun, sea, and sand. I'm also an avid reader and love both fiction and non-fiction.


SW: Escaping this winter weather sounds good right about now. Changing gears for the final question, what trends are you seeing in the industry?

 

AP: I’ve seen a recent explosion in hyper-converged solutions (1 box solutions that are scalable), SaaS apps, hybrid cloud (as companies seek to experiment with workloads off-premise), and flash storage in the data center.

Incident responders: Build or buy?

There is far more to security management than technology. In fact, one could argue that the human element is more important in a field where intuition is just as valuable as knowledge of the tech. In the world of security management, I have not seen a more hotly debated non-technical issue than the figurative “build or buy” question when it comes to hiring incident responders. The polarized camps are the obvious ones:

  • Hire for experience. In this model, the desirable candidate is a mid-career or senior-level, experienced incident responder. The pros and cons are debatable:
    • More expensive
    • Potentially set in their ways
    • Can hit the ground running
    • Low management overhead
  • Hire for ability. In this model, a highly motivated but less experienced engineer is hired and molded into as close to what the enterprise requires as they can get. Using this methodology, the caveats and benefits are a bit different, as it is a longer-term strategy:
    • Less expensive
    • “Blank Slate”
    • Requires more training and attention
    • Initially less efficient
    • More unknowns due to lack of experience
    • Can potentially become exactly what is required
    • May leave after a few years

In my stints managing and being involved with hiring, I have found it a difficult task to find a qualified, senior-level engineer or incident responder who has the personality traits conducive to melding seamlessly into an existing environment. That is not to say it isn’t possible, but soft skills are a lost art in technology, and especially so in development and security. In my travels, sitting on hiring committees and writing job descriptions, I have found that the middle ground is the key. Mid-career, still-hungry incident responders who have a budding amount of intuition have been the blue chips in the job searches and hires I have been involved with. They tend to have the fundamentals and a formed gut instinct that makes them incredibly valuable and, at the same time, very open to mentorship. Now, the downside is that 40% of the time they’re going to move on just when they’re really useful, but the 60% that stick around a lot longer? They are often the ones who think outside the box and keep the team fresh.

Better Network Configuration Management promises a lot. Networks that are more reliable, and can respond as quickly as the business needs. But it’s a big jump from the way we've run traditional networks. I'm wondering what’s holding us back from making that jump, and what we can do to make it less scary.


We've all heard stories about the amazing network configuration management at the Big Players (Google, Facebook, Twitter, Amazon, etc). Zero Touch Provisioning, Google making 30,000 changes per month, auto-magic fine-grained path management, etc. The network is a part of a broader system, and managed as such. The individual pieces aren't all that important - it's the overall that matters.


Meanwhile, over here in the real world, most of us are just scraping by. I've seen many networks that didn't even have basic automated network device backups. Even doing something like automated VLAN deployment is crazy talk. Instead we're stuck in a box-by-box mentality, configuring each device independently. We need to think of the network as a system, but we're just not in a place to do that.
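
As a taste of what moving beyond box-by-box configuration can look like, here is a minimal sketch that pushes the same VLAN to a handful of switches, assuming a Python library such as netmiko and SSH-reachable devices. The hostnames, credentials, and VLAN are hypothetical, and a real deployment would add error handling and change control around it.

    # Minimal sketch: apply one VLAN definition to several devices instead of
    # configuring each box by hand. Everything device-specific here is made up.
    from netmiko import ConnectHandler

    devices = ["sw-access-01.example.net", "sw-access-02.example.net"]
    vlan_commands = ["vlan 110", "name user-data"]

    for host in devices:
        conn = ConnectHandler(device_type="cisco_ios", host=host,
                              username="netops", password="********")
        output = conn.send_config_set(vlan_commands)   # push the same change everywhere
        print(host)
        print(output)
        conn.disconnect()

Even something this small starts treating the network as a system rather than a pile of individual boxes.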


Why is that? What's stopping us from moving ahead? I think it’s a combination of being nervous of change, and of not yet having a clear path forward.


Are we worried about greater automation because we're worried about a script replacing our job? Or do we have genuine concerns about automation running amok? I hear people say things like "Oh, our Change Management team would never let us do automated changes. They insist we make manual changes." But is that still true? For server management, we've had tools like Group Policy, DRS, and Puppet/Chef/Ansible for years now. No reasonably sized organisation would dream of managing each of its servers by hand. Change Management got used to that, so why couldn't we do the same for networking? Maybe we're just blaming Change Management as an excuse?


Maybe the problem is that we need to learn new ways of working, and change our processes, and that’s scary. I’m sure that we can learn new things - we’ve done it before. But A) do we want to? and B) do we even know where to start?


If you’re building an all-new network today you’d bake in some great configuration management. But we, as a wider industry, need to figure out how to improve the lot of existing networks. We can’t rip & replace. We’ve got legacy gear, often with poor interfaces that don’t work well with automation toolsets. We need to figure out transition plans - for both technology & people.


Have you started changing the way you approach network configuration management? Or are you stuck? What’s holding you back? Or if you have changed, what steps did you take? What worked, and what didn’t?

I talk (“brag”) a lot about being a SolarWinds® Head Geek®. Everyone on the Head Geek team (Patrick Hubbard, Thomas LaRock, Kong Yang, Don Jacob, and Praveen Manohar) often gushes about how we feel this is the best job in the world.

 

But what does being a Head Geek really mean? Is it all tweets about bacon, sci-fi references on video, and Raspberry Pi-connected coffee machines?

 

Honestly, all that stuff is part of the job. More to the point, being excited about tech, engaging in geek and nerd culture, and being openly passionate about things is part of what makes people want to listen, friend, and follow us Head Geeks.

 

First and foremost, Head Geeks are experts in, and champions of, monitoring and automation, regardless of the tools, techniques, or technology. We know our stuff. We are proud of our accomplishments. And we love to share everything we know.

 

Our core message is, “Monitoring and automation are awesome, and companies can derive a huge benefit both financially and operationally from having effective tools in place. I’d love to tell you more about it. Oh, and by-the-way, the company where I work happens to make some cool stuff to help you get that done.” Education and inspiration first—sales second.

 

In addition, being a Head Geek means:

  • Finding and building your voice as a trusted and credible source of information about monitoring and automation for both companies and IT professionals.
  • Providing information about new technologies, developments, and techniques, and showing how they impact the current state of technology.
  • Building solid opinions about technology based on facts and experience, and then engaging in dialogue and meaningful interactions to share those opinions with the public.
  • Being a proponent not just for pure technology, but also for cultural issues and initiatives within the IT community.
  • Enjoying the thrill of digging into new products and features (regardless of vendor) and exploring their impact to the IT landscape.
  • Mentoring IT pros on honing their skills in using monitoring and automation tools as well as fostering significant and meaningful value, stability, and reliability.
  • Using our platform to shine a light on the awesome achievements of other IT pros, not ourselves.

 

But how does that look day-to-day?

 

We have a love of writing. Our idea of a great day includes cobbling together 800-2000 words for a magazine or blog post; helping tweak the phrasing on a customer campaign email; and scripting demonstrations that teach customers how to fix a common (or not-so-common) issue, use a feature, or gain insight into the way things work.

 

Our love of writing goes hand-in-hand with a love of reading. We consume bushels of blogs, piles of podcasts, and multitudes of magazine articles.

 

We are at ease in front of a crowd. Head Geeks appear in videos (SolarWinds Lab as well as creative videos, tutorials, and demos). We lead webcasts, participate in podcasts, and perform product demonstrations for potential customers. We staff booths and do presentations at trade shows like VMworld®, MS-Ignite, SQL Pass, and Cisco® Live where we get to interact with thousands of IT pros.

 

We are social media savvy. That goes beyond “the big three.” We seek out the hotbeds of IT discussions—from StackExchange and reddit® to thwack® and SpiceWorks®—and we get involved in the conversations happening there. We contribute, ask, answer, teach, and learn.

 

But there is even more that happens behind the scenes.

 

As much as we are a voice of experience and credibility in the IT professional community, our opinions are equally sought after inside SolarWinds.

  • Our experience helps set the direction for new features and even new products.
  • We interact with the development teams and serve as the voice of the customer.
  • We take issues we hear about on the street and make sure they get attention.
  • We contribute insight during the design of user interfaces and process flows.
  • We push product managers and developers to include features that may be hard to implement, but will honestly delight our users.

 

Head Geeks are true “technical creatives.” We are critical to many departments—especially the marketing arm of our organization. We make sure that our message speaks with a sincere, passionate, human voice to our customers. We ensure that marketing and sales literature is technically accurate, but also crafted to highlight the features we know customers will be excited to use.

 

We love a challenge, and we know that the name of the game is change. Can we film two SolarWinds Lab episodes in one day? Let’s give it a shot! Quick, what can we do to help our European team build awareness and excitement? Let’s host a “mega lab” presented in their time zone. How about attending Cisco Live and then doing a week of SolarWinds Lab filming back to back? We’ll be exhausted, but we’re going to get some great material.

 

Being a Head Geek isn’t for everyone. Not because it takes some magical alchemical mixture of traits, nor because it’s a job that has to be earned with years of sweat and toil. It’s not for everyone because not everyone finds the things I just described to be exciting, rewarding, or fun.

 

But for those who do—and you can put me firmly in that camp—it’s the job of a lifetime.

What seems like a lifetime ago, I worked for a few enterprises doing various things like firewall configurations, email system optimizations, and hardening of NetWare, NT4, AIX, and HP-UX servers. There were three good-sized employers: a bank and two huge insurance companies that both had financial components. While working at each and every one of them, I was subject to their security policy (one of which I helped to craft, but that is a different path altogether), and none of those policies really addressed data retention. When I left those employers, they archived my home directories, remaining email boxes, and whatever other artifacts I left behind. None of this was really an issue for me, as I never brought any personal or sensitive data in, and everything I generated on site was theirs by the nature of what it was. What did not occur to me then, though, was that this was essentially a digital trail of breadcrumbs that could exist indefinitely. What else was left behind, and was it also archived? Mind you, this was in the 1990s and network monitoring was fairly clunky, especially at scale, so the likely answer to that question is "nothing," but I assert that the answer has changed significantly in this day and age.

Liability is a hard pill for businesses to swallow. Covering bases is key, and that is where data retention is a double-edged sword. Thinking like I am playing a lawyer on TV: keeping data on hand is useful for forensic analysis of potentially catastrophic data breaches, but it can also be a liability in that it can prove culpability in employee misbehavior on corporate time, with corporate resources, or on the corporation's behalf. Is it worth it?

Since that time oh so long ago, I have found that the benefit of retaining the information has far outweighed the risk, especially for traffic data such as proxy logs, firewall logs, and network flows.  The real issues I have, as noted in previous posts, are the correlation of said data and, more often than not, the archival method for what can amount to massive amounts of disk space.

If I can offer one nugget of advice, learned through years of having to decide what goes, what stays, and for how long, it is this: buy the disks. Procure the tape systems. Do whatever you need to do to keep as much of the data as you can get away with, because once it is gone, it is highly unlikely that you will ever get it back.

Double double, toil and trouble… Something wicked this way comes. Our third annual Bracket Battle is upon us.

 

Mwuh huh huh huh ha!

 

On March 23, thirty-three infamous individuals begin the battle to rip worlds apart and crush each other until only one remains as the most despicable of all time. It’s VILLAINS time, people!

 

These head-to-head, villain-versus-villain matchups should once again spark controversy. Trust us… every year, we get something stirred up, whether it is the absence of someone or the overrating of another. Y’all are a hard group to please!

 

We have included a wide range of our worst enemies, including cunning and depraved villains who have tried to rule Middle-earth, Castle Greyskull, Asgard, Springfield and beyond. Draw your weapons; it’s time to decide:

 

  • Demi-god or Immortal?
  • Lightsaber or Wand?
  • Clown Prince or Dr. Fava Bean?
  • The Dragon versus The Ring?

 

We came up with the field and decided where we would start, but the power is yours to decide who will be foiled again and which single scoundrel, in the end, will rule them all.

 

But, in a twist no one saw coming, we are changing things up this year.  We are setting up a little trap, ummm, no… giving you a PREVIEW of the bracket today, even though voting will not begin until Monday.

 

Dastardly plans (AKA Rules of Engagement) are outlined below…

 

MATCH UP ANALYSIS

  • For each combatant, we link to the best Wikipedia reference page; click on the NAME link in the bracket.
  • A breakdown of each match-up is available by clicking on the VOTE link.
  • Anyone can view the bracket and the match-up descriptions, but to comment and VOTE you must be a thwack member (and logged IN). 

 

VOTING

  • Again, you have to be logged in to vote and debate… 
  • You may only vote ONCE for each match up
  • Once you vote on a match, click the link to return to the bracket and vote on the next match up in the series.
  • Each vote gets you 50 thwack points!  So, over the course of the entire battle you have the opportunity to rack up 1550 points.  Not too shabby…

 

CAMPAIGNING

  • Please feel free to campaign for your favorites and debate the merits of our match ups to your heart's content in the comments section and via Twitter/Facebook/Google+, etc.
  • We even have hashtags… #swbracketbattle and #EvilLaugh… to make it a little bit easier.
  • There will be a PDF version of the bracket available to facilitate debate with your henchmen.
  • And, if you want to post pics of your bracket predictions, we would love to see them on our Facebook page!

 

SCHEDULE

  • Bracket Release is TODAY
  • Every round voting will begin at 10 am CT
  • Play-in Battle OPENS March 23
  • The Mischievous round OPENS March 25
  • The Rotten round OPENS March 30
  • The Wicked round OPENS April 2
  • The Vile round OPENS April 6
  • The Diabolical Battle OPENS April 9
  • And, finally, the one true OVERLORD will be announced on APRIL 13

 

If you have other questions… feel free to drop them below and we will get right back with you!

 

So, which of these villains we love to hate can plot and scheme their way to the top of this despicable heap? Whose dastardly plans will rip apart worlds and crush humanity?

 

OK, we will stop our monologue and let you decide.

A finely-tuned NMS is at the heart of a well-run network. But it’s easy for an NMS to fall into disuse. Sometimes that can happen slowly, without you realizing. You need to keep re-evaluating things, to make sure that the NMS is still delivering.

 

Regular Checkups

Consultants like me move around between different customers. When you go back to a customer site after a few weeks or months, you can see step changes in behavior. This usually makes it obvious whether people are using the system or not.


If you're working on the same network for a long time, you can get too close to it. If behaviors change slowly over time, it can be difficult to detect. If you use the NMS every day, you know how to get what you want out of it. You think it's great. But casual users might struggle to drive it. If that happens, they'll stop using it.


If you're not paying attention, you might find that usage is declining, and you don't realize until it’s too late. You need to periodically take an honest look at wider usage, and see if you're seeing any of the signs of an unloved NMS.

 

Signs of an unloved NMS

Here are some of the signs of an unloved NMS. Keep an eye out for these:

  1. Too many unacknowledged alarms
  2. No one has the NMS screen running on their PC; they only log in when they have to
  3. New devices not added

 

What if things aren't all rosy?

So what if you've figured out that maybe your NMS isn't as loved as you thought? What now? First, don't panic. It's recoverable. Things you can do include:

  1. Talk. Talk to everyone. Find out what they like, and what’s not working. It might just be a training issue, or it might be something more. Maybe you just need to show them how to set up their homepage to highlight key info.
  2. Check your version. Some products are evolving quickly. Stay current, and take advantage of the new features coming out. This is especially important with usability enhancements.
  3. Check your coverage: Are you missing devices? Are you monitoring all the key elements on those devices? Keep an ear to the ground for big faults and outages: Is there any way your NMS could have helped to identify the problem earlier? If people think that the NMS has gaps in its coverage, they won't trust it.

 

All NMS platforms have a risk of becoming shelfware. Have you taken a good honest look recently to make sure yours is still working for you? What other signs do you look for to check if it's loved/loathed/ignored? What do you do if you think it might be heading in the wrong direction?

This is a conversation I have A LOT with clients. They say they want "logfile monitoring," and I am not sure what they mean. So I end up having to unwind all the different things it COULD be, so we can get to what they actually need.

 

It's also an important clarification for me to make as a SolarWinds Head Geek because, depending on what the requester means, I might need to point them toward Kiwi Syslog Server, Server & Application Monitor, or Log & Event Manager (LEM).

 

Here’s a handy guide to identify what people are talking about. “Logfile monitoring” is usually applied to 4 different and mutually exclusive areas. Before you allow the speaker to continue, please ask them to clarify which one they are talking about:

  1. Windows Logfile
  2. Syslog
  3. Logfile aggregation
  4. Monitoring individual text files on specific servers

 

More clarification on each of these areas below:

Windows Logfile

Monitoring in this area refers specifically to the Windows event log, which isn’t actually a log “file” at all, but a database unique to Windows machines.

 

In the SolarWinds world, the tool that does this is Server & Application Monitor (SAM). Or if you are looking for a small, quick, and dirty utility, the Eventlog Forwarder for Windows will take Eventlog messages that match a search pattern and pass them via Syslog to another machine.

 

Syslog

Syslog is a protocol that describes how to send a message from one machine to another over UDP port 514. The messages must fit a pre-defined structure. Syslog is different from SNMP traps. It is most often found when monitoring *nix (Unix, Linux) systems, although network and security devices send out their fair share as well.
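
For a feel of how simple the sending side is, here is a minimal sketch that emits a syslog message over UDP 514 using only the Python standard library; the collector address is a placeholder.

    import logging
    import logging.handlers

    # Hypothetical collector address; SysLogHandler uses UDP by default.
    handler = logging.handlers.SysLogHandler(address=("192.0.2.50", 514))

    logger = logging.getLogger("demo")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)

    # Anything logged now goes out as a syslog message on UDP 514.
    logger.info("interface GigabitEthernet0/1 changed state to down")

Point it at a Kiwi Syslog Server (or any collector listening on UDP 514) and the message should show up there.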

 

In terms of products, this is covered natively by Network Performance Monitor (NPM), but as I've said often, you shouldn't send syslog or traps directly to your NPM primary poller. You should send them through a syslog/trap "filtration" layer first, and that would be the Kiwi Syslog Server (or its freeware cousin).

 

Logfile aggregation

This technique involves sending (or pulling) log files from multiple machines and collecting them on a central server. This collection is done at regular intervals. A second process then searches across all the collected logs, looking for trends or patterns in the enterprise. When the audit and security groups talk about “logfile monitoring,” this is usually what they mean.
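
A minimal sketch of the "second process" described above might look like this: walk a central collection directory and search every gathered log for a pattern. The directory layout and search pattern are hypothetical.

    import glob
    import re

    # Hypothetical pattern of interest across the whole enterprise.
    PATTERN = re.compile(r"authentication failure", re.IGNORECASE)

    # Hypothetical central collection directory: one sub-directory per host.
    for path in glob.glob("/srv/logcollect/*/*.log"):
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, 1):
                if PATTERN.search(line):
                    print("%s:%d: %s" % (path, lineno, line.rstrip()))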

 

As you may have already guessed, the SolarWinds tool for this job is Log & Event Manager (LEM). I should point out that LEM will ALSO receive syslog and traps, so you kind of get a twofer if you have this tool. Although, I personally STILL think you should send all of your syslog and trap to a filtration layer, and then send the non-garbage messages to the next step in the chain (NPM or LEM).

 

Monitoring individual text files on specific servers

This activity focuses on watching a specific (usually plain text) file in a specific directory on a specific machine, looking for a string or pattern to appear. When that pattern is found, an alert is triggered. Now it can get more involved than that—maybe not a specific file, but a file matching a specific pattern (like a date); maybe not a specific directory, but the newest sub-directory in a directory; maybe not a specific string, but a string pattern; maybe not ONE string, but 3 occurrences of the string within a 5 minute period; and so on. But the goal is the same—to find a string or pattern within a file.
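
As a rough illustration of that last variation, here is a minimal sketch that tails a single file and raises an alert when a pattern appears three times within five minutes. The file path, pattern, and "alert" action are all placeholders, and it ignores details like log rotation.

    import re
    import time
    from collections import deque

    LOGFILE = "/var/log/myapp/app.log"            # hypothetical file to watch
    PATTERN = re.compile(r"ORA-\d+|OutOfMemoryError")
    WINDOW_SECONDS = 300                          # 5-minute window
    THRESHOLD = 3                                 # alert on 3 occurrences

    hits = deque()

    with open(LOGFILE) as fh:
        fh.seek(0, 2)                             # start at end of file, like tail -f
        while True:
            line = fh.readline()
            if not line:
                time.sleep(1)
                continue
            if PATTERN.search(line):
                now = time.time()
                hits.append(now)
                # Drop matches that have fallen out of the window.
                while hits and now - hits[0] > WINDOW_SECONDS:
                    hits.popleft()
                if len(hits) >= THRESHOLD:
                    print("ALERT: %d matches in the last %d seconds"
                          % (len(hits), WINDOW_SECONDS))
                    hits.clear()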

 

Within the context of SolarWinds, Server & Application Monitor has been the go-to solution for this type of thing. But at the moment it handles it only through a series of Perl, PowerShell, and VBScript templates.

 

We know that’s not the best way to get the job done, but that's a subject for another post.

 

The More You Know…

For now, it's important that you are able to clearly define, for yourself as well as your colleagues, customers, and consumers, which kind of "logfile monitoring" is being requested and which tool or technique you need to employ to get the job done.

Remote control software is a huge benefit to all IT staff when troubleshooting an issue. There are big benefits to using a service provider to host this functionality for you. There are also many reasons, mainly around security, not to use a service provider and to instead host this application internally. However, internally hosting a remote control application can cost more in capital expenditure and overhead.

 

When you host something in the cloud, you are giving that service provider responsibility for a significant portion of your security control. Even for something as simple as remote control software, there are concerns about security. For many solutions you have to rely on the authentication mechanism the provider built, although some will allow you to tie authentication into your internal Active Directory. The provider may or may not offer two-factor authentication. You have to rely on the provider’s encryption mechanism and trust that all signaling (setup, control, and tear-down) and data traffic is encrypted with appropriate algorithms. The remote control service provider not only services your hosts, but those of many other organizations, and you have to trust them to keep everyone separated. With all of those combined hosts, the service provider is also a larger target for an attack than your organization may be on its own. When your organization’s Internet connection goes down, you lose the ability to control any of your end hosts from the internal side of your organization’s network. And when you delete an end host or discontinue service from the provider, your data might not be completely deleted.

 

Hosting a remote control application within your own organization can be difficult in itself. You have to have the infrastructure to host the application. If you want redundancy, the application has to support it, and you need even more infrastructure. Then you need to keep the application up to date on your server(s), on top of ensuring the end hosts are up to date, which requires planning, testing, and change control. If you expose your internal remote control application to the Internet, like a service provider would, then you need to monitor it for potential intrusions and attacks, and defend against them. That may require additional infrastructure and add complexity. If your organization’s Internet connection goes down and you are on the inside of your organization, you lose connectivity to all of the remote hosts. If you are external, then you lose connectivity to all of the internal hosts.

 

There is no one solution that fits everyone’s needs. As a consultant I have seen many different solutions and have ones that I prefer. Do you use a remote control solution from a service provider or do you have one you host yourself? Why did your organization choose that one?

I’ve had a few members ask about my silly Pi Day TwitterBot hack, and why someone would even want to do such a thing.  The real answer is a geek compulsion, but the thinking went something like this:

 

Pi Day 2015 was going to give us Pi via the date to 10 digits, and if you included the milliseconds you could go to 13 digits: 3/14/15 at 9:26:53.589 spells out 3.141592653589.  Well, to be fair, you could go to 1,000,000 digits if you had a sufficiently accurate timer to produce a decimal second with enough precision.  But let’s face it; true millisecond accuracy in IT gear is unlikely anyway.  Are you happy with the clarification, realtime programmers?!

 


I realized that if I could loop tight enough to trigger at a discrete millisecond boundary I could do something at that fateful moment.  And because a geek can, a geek should.  So what should happen at that moment? Do a SQL update, save a file, update a config?  No, there was only one thing to do: tweet.  The next trick was to create a bot. I use Raspberry Pi’s for just about all maker projects now.  After years of playing with microcontrollers I finally switched over.  Pi’s are cheaper than Arduinos when you consider adding IO, they run a full Linux OS, and many add-on boards work with them. (And yes I monitor my Pi’s at home with Orion, so be sure to check out Wednesday’s SolarWinds Lab which is all about monitoring Linux.)

 

But one thing neither Arduinos nor Pi's have is a real-time clock, which is a little bit of a problem if you’re planning to do time-sensitive processing.  So here’s the general setup for the project, and I’ll save you the actual code because 1) I made it in < 20 minutes, 2) no one will ever need it again, and 3) mostly because it’s so ugly I’m embarrassed.  I used Python because there were libs aplenty.

 

The Hack

 

  1. Find “real enough time,” i.e. an accurate offset.  I’m too cheap to buy a GPS module, so I used NTP. However, single NTP syncs aren’t nearly enough to get millisecond-ish accuracy, plus the Raspberry’s system (CPU) clock drifts a bit.  So first, we need to keep a moving average of the offset, and I used ntplib:

     

    >>> import ntplib
    >>> c = ntplib.NTPClient()
    >>> response = c.request('europe.pool.ntp.org', version=3)
    >>> response.offset
    -0.143156766891
    >>> response.root_delay
    0.0046844482421875

     

     

    Next, poll 30 times in a minute and deposit the results into a collections.deque.  It’s a double-ended buffer object, meaning you can add or remove items from either end.  (It’s easier to implement than a circular buffer.)  Adjusting the overall length in 30-sample increments lets you expand the running average beyond a single update cycle.  (A minimal sketch of this rolling average appears after this list.)

     

  2. Keep an eye on clock drift.  The actual trigger loop on the Raspberry would need to hammer the CPU and I didn’t want to get into a situation where I’d be about to trigger on the exact millisecond but get hit with an NTP update pass.  To do that I’d need to fire based on a best guess of the accumulated drift since the previous sync.  So, whenever the NTP sync fired, I saved the previous average offset delta from the internal clock, also into a deque.  On average the Pi was drifting 3.6 secs/day or 0.0025 secs/min.  Because it was constantly recalculating this value I corrected for thermal effects and other physical factors and the drift was remarkably stable.

     

  3. oAuth, the web, and Twitter.  Twitter is REST-based, and if I were building an app to make some cash, I’d probably either be really picky about choosing a client library or implement something myself.  But there was no need for it here, so I checked the Twitter API docs and picked tweepy.

     

    import tweepy

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.secure = True
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)

    # If the authentication was successful, you should
    # see the name of the account print out.
    print(api.me().name)

    # If the application settings are set for "Read and Write" then
    # this line should tweet out the message to your account's timeline.
    api.update_status('Updating using OAuth authentication via Tweepy!')

     

    I gave my app permission to my feed, including updates (DANGER!), generated the keys, and that was about it.  Tweepy makes it really easy to tweet, and pretty nicely hides the oAuth foo.

     

  4. The RESTful bit.  As sloppy as NTP really is, it’s nothing compared to the highly variable latency of web transactions.  With a REST call, especially to a SaaS service, there are exactly 10^42 things that can affect round-trip times.  The solution was twofold.  First, make sure the most variable transaction, the oAuth, happened well in advance of the actual tweet. Second, you need to know what the average LAN -> gateway -> internet -> Twitter REST service delay is.  Turns out, you guessed it, it’s easy to use a third deque object to do some test polls and keep a moving average to at least guesstimate future web delay.

     

  5. Putting it all together - the ugly bit. The program pseudocode looked a little something like this:

     

    // For all code: twitterTime = time.time() - {offset rolling average} - {predicted accumulating drift}
    // Gives the corrected network time rather than the actual CPU time.
    Do the oAuth
    While (twitterTime < sendTime - 20)
    {
       Do the NTP moving average poll
       Update the clock drift moving average
       Update the REST transaction latency moving average
       Wait 10 minutes
    }
    While (twitterTime < sendTime - 2)
    {
       Do the NTP moving average poll
       Update the clock drift moving average
       Wait 1 minute
    }
    While (twitterTime < sendTime - {REST latency moving average})
    {
       Sleep 1 tick // tight loop
    }
    Send Tweet
    Write tweet time and debug info to a file
    End

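For the deque-based rolling average described in step 1, a minimal sketch might look like the following. It assumes the ntplib package is installed; the polling cadence follows the description above, the time correction uses the same sign convention as the pseudocode, and everything else is illustrative.

    import time
    from collections import deque

    import ntplib

    SAMPLES_PER_CYCLE = 30
    offsets = deque(maxlen=SAMPLES_PER_CYCLE * 2)   # keep roughly two update cycles

    client = ntplib.NTPClient()

    for _ in range(SAMPLES_PER_CYCLE):
        response = client.request("europe.pool.ntp.org", version=3)
        offsets.append(response.offset)
        time.sleep(2)                               # ~30 polls in a minute

    avg_offset = sum(offsets) / len(offsets)
    corrected_now = time.time() - avg_offset        # same convention as the pseudocode above
    print("average offset: %.6f s, corrected time: %.3f" % (avg_offset, corrected_now))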
 

Move Every Tweet, For Great Justice

 

I watched my Twitter feed Saturday morning from the bleachers at kickball practice, and sure enough at ~9:26 am, there it was.  This morning with a little JSON viewing I confirmed it was officially received in the 53rd second of that minute.

 

Why do geeks do something like this?  Because it’s our mountain, it’s there and we must climb it.  There won’t be another Pi day like this, making it singular and special and in need of remembrance.  So, we do what we do. The only question is how closely did I hit the 589th millisecond?  Maybe if I ask Twitter, really nicely...

Support centers in organizations are under constant pressure due to an increasing volume of service tickets and a growing number of end-users to manage. The complexity and diversity of support cases make it all the more difficult to provide timely resolution given lean support staff and tight deadlines. So, how can help desk admins increase the efficiency of the help desk process and ultimately deliver service faster? Considering all the things you do, the question to ask next is: “Where can I save time in all my daily goings-on?” Conserving time on repetitive, less important, and menial tasks can help you gain that time back for actual ticket resolution.

 

Here are 5 useful time-saving strategies for improved help desk productivity:

 

#1 DON’T GO INVENTING FIXES. SOMEONE MIGHT HAVE ALREADY DONE THAT.

Not all service tickets are unique. It is very common for different users to have faced the same issue in the past. The smart move here is to track repeating help desk tickets and their technician assignments, and to capture the best resolution applied in an internal knowledge base. This way, it will never be a new issue to deal with from scratch. Any new technician can look up the fix and resolve the issue quickly.

 

#2 KNOW WHAT YOU ARE DEALING WITH.

Before jumping the gun, assuming you know exactly what problem you are dealing with, and starting to fix it, make sure you have elicited all the details about the issue from the end-user. Sometimes it might just be that the user doesn’t know how to use something, or it is such a simple fix that the user can do it themselves. So, don’t settle for vague ticket descriptions. Make sure you get as many details from the user about the issue as you can before you start providing the solution.

 

#3 PROMOTE END-USER SELF-SERVICE.

If your user base is growing and you are receiving tickets for commonplace issues with easy fixes, it is time to start thinking about building an internal self-service portal with up-to-date how-to’s and FAQs to help users resolve Level 0 issues themselves. Password reset is still a top call driver for support teams; automating it through a self-service portal will free up a fair share of IT admins’ time.

 

#4 ESCALATE WHEN YOU CAN’T RESOLVE.

While you might feel capable of resolving any level of support ticket, there will be times when you face technical challenges. Finding the cause of slow database response time may not be your forte. That pesky VM always reports memory exhaustion no matter what you do. These are the times you must act on judgment and escalate the issue to another technician or your IT manager. Staying worked up over the same issue (going only off a hunch) will not only delay resolution, but will result in more tickets piling up. Make sure your help desk has proper escalation, de-escalation, and automated ticket routing functionality for cases where SLAs are not met.

 

#5 DO IT REMOTELY.

Yes, personal human contact is the best possible means of communication. However, it can cost you handsomely in time and money if you start visiting your end-users one by one for desktop support. Many service tickets can be resolved remotely if you open a remote session to the user’s system. And if you have additional remote administration tools, you can master the art of telecommuting for IT support.

 

What other tricks of the trade do you have up your sleeve to help fellow IT pros speed up customer support?

Leon Adato

Sour Notes in iTunes

Posted by Leon Adato Mar 11, 2015

On Monday, iTunes was down. But we all expected that because Apple was holding its “Spring Ahead” event, and was poised to announce a slate of new products.

 

Today, iTunes was down again (or at least parts of it) and this was very NOT expected.

 

The first report of the outage appeared on TheNextWeb.com. They noted that iTunes Connect was down, that you could see music but not buy it, and that several app pages were dead when you clicked them.

 

As is the case with most short-term outages (Apple responded and resolved it within an hour or two) we will likely never know what really happened. And that’s fine. I’m not on the iTunes internal support team so I don’t need the ugly details.

 

But it's always fun to guess, right? Armchair quarterbacking an outage is the closest to sports that some of us IT pros get.

 

First, I ruled out security. A simple DDoS or other targeted hack would have defaced the environment, taken out entire sections (or the whole site), and made a much larger mess of things.

 

Second, I took simple network issues off the list. Having specific apps, song purchasing, and individual pages die is not the profile of a failure in routing, bandwidth, or even load balancing.

 

My first choice was storage: if the storage devices that contain the actual iTunes songs as well as the app downloads were affected, that would explain why we saw failures once we got past those initial pages. It could also have explained why the failure was geographic (UK and US) while we didn't hear about failures in other parts of the world.

 

My runner-up vote went to database: corrupt records in the database that houses the CMS, which undoubtedly drives the entire iTunes site. Having specific records corrupted would explain why some pages worked and others didn't.

 

Then CNBC published a statement from Apple apologizing for the outage and explaining it was an internal DNS problem.

 

Whatever the reason, this failure underscores why today’s complex, inter-connected, cloud and hybrid cloud environments need monitoring that is both specific and holistic.

 

Specific because it needs to pull detailed data about disk and memory IOPS, errored packets, application pool member status, critical service status (like DNS), synthetic tests against key elements (like customer purchase actions), and more.

 

Holistic because we now need a way to view how write errors on a single disk in an array affect the application running on a VM that uses that array in its datastore. We need to see when a DNS resolution fails (before the customer tries it) and correlate that to the systems that depend on those name resolutions.
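
As a tiny example of the "specific" side, here is a minimal sketch of a synthetic DNS check that times a lookup of a name your service depends on and flags slow or failed resolutions before a customer hits them. The hostname and threshold are placeholders.

    import socket
    import time

    NAME = "itunes.apple.com"   # hypothetical name your service depends on
    SLOW_THRESHOLD = 0.5        # seconds before we call the lookup "slow"

    start = time.time()
    try:
        addresses = socket.getaddrinfo(NAME, 443)
    except socket.gaierror as err:
        # Resolution failed outright; this is the case you want to catch first.
        print("DNS FAIL for %s: %s" % (NAME, err))
    else:
        elapsed = time.time() - start
        status = "SLOW" if elapsed > SLOW_THRESHOLD else "OK"
        print("DNS %s for %s: %.3f s, %d records" % (status, NAME, elapsed, len(addresses)))

Run something like this on a schedule, feed the results into your monitoring platform, and a DNS hiccup shows up as an alert instead of a customer complaint.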

 

That means monitoring that can take in the entire environment top to bottom.

 

Yes, I mean AppStack.

 

Hey, Apple internal support: If you want us to set up a demo for you, give us a call.
