
Geek Speak


Double double, toil and trouble… Something wicked this way comes. Our third annual Bracket Battle is upon us.

 

Mwuh huh huh huh ha!

 

On March 23, thirty-three infamous individuals begin the battle to rip worlds apart and crush each other until only one remains as the most despicable of all time. It’s VILLAINS time, people!

 

These head-to-head, villain-versus-villain matchups should once again spark controversy. Trust us… every year, we get something stirred up – whether it is the absence of someone or the overrating of another. Y’all are a hard group to please!

 

We have included a wide range of our worst enemies, including cunning and depraved villains who have tried to rule Middle-earth, Castle Greyskull, Asgard, Springfield and beyond. Draw your weapons; it’s time to decide:

 

  • Demi-god or Immortal?
  • Lightsaber or Wand?
  • Clown Prince or Dr. Fava Bean?
  • The Dragon versus The Ring?

 

We came up with the field and decided where we would start, but the power is yours to decide who will be foiled again and which single scoundrel, in the end, will rule them all.

 

But, in a twist no one saw coming, we are changing things up this year.  We are setting up a little trap, ummm, no… giving you a PREVIEW of the bracket today, even though voting will not begin until Monday.

 

Dastardly plans (AKA Rules of Engagement) are outlined below…

 

MATCH UP ANALYSIS

  • For each combatant, we offer a link to the best Wikipedia reference page; just click the NAME link in the bracket.
  • A breakdown of each match-up is available by clicking on the VOTE link.
  • Anyone can view the bracket and the match-up descriptions, but to comment and VOTE you must be a thwack member (and logged IN). 

 

VOTING

  • Again, you have to be logged in to vote and debate… 
  • You may only vote ONCE for each match up
  • Once you vote on a match, click the link to return to the bracket and vote on the next match up in the series.
  • Each vote gets you 50 thwack points!  So, over the course of the entire battle you have the opportunity to rack up 1550 points.  Not too shabby…

 

CAMPAIGNING

  • Please feel free to campaign for your favorites and debate the merits of our match-ups to your heart’s content in the comments section and via Twitter/Facebook/Google+, etc.
  • We even have hashtags… #swbracketbattle and #EvilLaugh… to make it a little bit easier.
  • There will be a PDF version of the bracket available to facilitate debate with your henchmen.
  • And, if you want to post pics of your bracket predictions, we would love to see them on our Facebook page!

 

SCHEDULE

  • Bracket Release is TODAY
  • Voting for every round will begin at 10 am CT
  • Play-in Battle OPENS March 23
  • The Mischievous round OPENS March 25
  • The Rotten round OPENS March 30
  • The Wicked round OPENS April 2
  • The Vile round OPENS April 6
  • The Diabolical Battle OPENS April 9
  • And, finally, the one true OVERLORD will be announced on APRIL 13

 

If you have other questions… feel free to drop them below and we will get right back with you!

 

So, which of these villains we love to hate can plot and scheme their way to the top of this despicable heap? Whose dastardly plans will rip apart worlds and crush humanity?

 

OK, we will stop our monologue and let you decide.

A finely-tuned NMS is at the heart of a well-run network. But it’s easy for an NMS to fall into disuse. Sometimes that can happen slowly, without you realizing. You need to keep re-evaluating things, to make sure that the NMS is still delivering.

 

Regular Checkups

Consultants like me move around different customers. When you go back to a customer site after a few weeks/months, you can see step changes in behavior. This usually makes it obvious if people are using the system or not.


If you're working on the same network for a long time, you can get too close to it. If behaviors change slowly over time, it can be difficult to detect. If you use the NMS every day, you know how to get what you want out of it. You think it's great. But casual users might struggle to drive it. If that happens, they'll stop using it.


If you're not paying attention, you might find that usage is declining, and you don't realize until it’s too late. You need to periodically take an honest look at wider usage, and see if you're seeing any of the signs of an unloved NMS.

 

Signs of an unloved NMS

Here are some of the signs of an unloved NMS. Keep an eye out for these:

  1. Too many unacknowledged alarms
  2. No one has the NMS screen running on their PC - they only log in when they have to
  3. New devices aren't being added

 

What if things aren't all rosy?

So what if you've figured out that maybe your NMS isn't as loved as you thought? What now? First, don't panic. It's recoverable. Things you can do include:

  1. Talk. Talk to everyone. Find out what they like, and what’s not working. It might just be a training issue, or it might be something more. Maybe you just need to show them how to set up their homepage to highlight key info.
  2. Check your version. Some products are evolving quickly. Stay current, and take advantage of the new features coming out. This is especially important with usability enhancements.
  3. Check your coverage: Are you missing devices? Are you monitoring all the key elements on those devices? Keep an ear to the ground for big faults and outages: Is there any way your NMS could have helped to identify the problem earlier? If people think that the NMS has gaps in its coverage, they won't trust it.

 

All NMS platforms have a risk of becoming shelfware. Have you taken a good honest look recently to make sure yours is still working for you? What other signs do you look for to check if it's loved/loathed/ignored? What do you do if you think it might be heading in the wrong direction?

This is a conversation I have A LOT with clients. They say they want "logfile monitoring," and I am not sure what they mean. So I end up having to unwind all the different things it COULD be, so we can get to what it is they actually need.

 

It's also an important clarification for me to make as SolarWinds Head Geek because, depending on what the requester means, I might need to point them toward Kiwi Syslog Server, Server & Application Monitor, or Log & Event Manager (LEM).

 

Here’s a handy guide to identify what people are talking about. “Logfile monitoring” is usually applied to four different and mutually exclusive areas. Before you allow the speaker to continue, please ask them to clarify which one they are talking about:

  1. Windows Logfile
  2. Syslog
  3. Logfile aggregation
  4. Monitoring individual text files on specific servers

 

More clarification on each of these areas below:

Windows Logfile

Monitoring in this area refers specifically to the Windows event log, which isn’t actually a log “file” at all, but a database unique to Windows machines.

 

In the SolarWinds world, the tool that does this is Server & Application Monitor (SAM). Or if you are looking for a small, quick, and dirty utility, the Eventlog Forwarder for Windows will take Eventlog messages that match a search pattern and pass them via Syslog to another machine.

 

Syslog

Syslog is a protocol that describes how to send a message from one machine to another on UDP port 514. The messages must fit a pre-defined structure. Syslog is different from SNMP traps. It is most often found when monitoring *nix (Unix, Linux) systems, although network and security devices send out their fair share as well.
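If it helps to see that structure, here's a minimal sketch (mine, not from any SolarWinds product) of emitting a syslog message over UDP 514 using nothing but the Python standard library. The collector address, tag, and message are placeholders; point it at whatever box is doing your syslog collection or filtering.

    import socket

    def send_syslog(message, host="192.0.2.50", port=514, facility=1, severity=6):
        # RFC 3164-style framing: "<PRI>TAG: message", where PRI = facility * 8 + severity.
        # Facility 1 = user-level, severity 6 = informational.
        pri = facility * 8 + severity
        payload = "<{0}>myapp: {1}".format(pri, message)
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            sock.sendto(payload.encode("utf-8"), (host, port))
        finally:
            sock.close()

    send_syslog("Link down on interface Gi0/1")

(Python's logging.handlers.SysLogHandler will do the same framing for you if you'd rather not roll your own.)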

 

In terms of products, this is covered natively by Network Performance Monitor (NPM), but as I've said often, you shouldn't send syslog or traps directly to your NPM primary poller. You should send them through a syslog/trap "filtration" layer first. And that would be the Kiwi Syslog Server (or its freeware cousin).

 

Logfile aggregation

This technique involves sending (or pulling) log files from multiple machines and collecting them on a central server. This collection is done at regular intervals. A second process then searches across all the collected logs, looking for trends or patterns in the enterprise. When the audit and security groups talk about “logfile monitoring,” this is usually what they mean.

 

As you may have already guessed, the SolarWinds tool for this job is Log & Event Manager (LEM). I should point out that LEM will ALSO receive syslog and traps, so you kind of get a twofer if you have this tool. Although, I personally STILL think you should send all of your syslog and trap to a filtration layer, and then send the non-garbage messages to the next step in the chain (NPM or LEM).

 

Monitoring individual text files on specific servers

This activity focuses on watching a specific (usually plain text) file in a specific directory on a specific machine, looking for a string or pattern to appear. When that pattern is found, an alert is triggered. Now it can get more involved than that—maybe not a specific file, but a file matching a specific pattern (like a date); maybe not a specific directory, but the newest sub-directory in a directory; maybe not a specific string, but a string pattern; maybe not ONE string, but 3 occurrences of the string within a 5 minute period; and so on. But the goal is the same—to find a string or pattern within a file.
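To make that concrete, here's a minimal sketch of the "3 occurrences within a 5 minute period" case in Python. The log path, pattern, and alert action are placeholders I've made up for illustration; the script simply tails the file and fires when the threshold is crossed.

    import re
    import time
    from collections import deque

    LOGFILE = "/var/log/myapp/app.log"        # placeholder path
    PATTERN = re.compile(r"ERROR|timed out")  # placeholder pattern
    THRESHOLD, WINDOW = 3, 300                # 3 matches within 5 minutes

    def alert(line):
        # Placeholder action: in real life this might send a syslog message or open a ticket.
        print("ALERT: %d matches in %ds, last line: %s" % (THRESHOLD, WINDOW, line))

    hits = deque()                            # timestamps of recent matches
    with open(LOGFILE) as f:
        f.seek(0, 2)                          # start at the end of the file, like tail -f
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)                 # nothing new yet; wait and poll again
                continue
            if PATTERN.search(line):
                now = time.time()
                hits.append(now)
                while hits and now - hits[0] > WINDOW:
                    hits.popleft()            # forget matches older than the window
                if len(hits) >= THRESHOLD:
                    alert(line.strip())
                    hits.clear()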

 

Within the context of SolarWinds, Server & Application Monitor has been the go-to solution for this type of thing. But, at the moment, that's only possible through a series of Perl, PowerShell, and VBScript templates.

 

We know that’s not the best way to get the job done, but that's a subject for another post.

 

The More You Know…

For now, it's important that you are able to clearly define—for both you and your colleagues, customers, and consumers—which kind of "logfile monitoring" is being discussed, and which tool or technique you need to employ to get the job done.

Remote control software is a huge benefit to all IT staff when troubleshooting an issue. There are big benefits to using a service provider to host this functionality for you. There are also many reasons, mainly security, to not use a service provider and instead host this application internally. However, internally hosting a remote control application can cost more in capital expenditure and overhead.

 

When you host something in the cloud you are giving that service provider responsibility for a significant portion of your security control. Even for something as simple as remote control software there are concerns about security. For many solutions you have to rely on the authentication mechanism the provider built, although some will allow you to tie authentication into your internal Active Directory. The provider may allow for two-factor authentication. You have to rely on the provider’s encryption mechanism and trust that all signaling (setup, control, and tear down) and data traffic is encrypted with appropriate algorithms. The remote control service provider not only services your hosts but those of many other organizations, and you have to trust them to keep everyone separated. Also, with all of those combined hosts, the service provider is a larger target for an attack than your organization may be on its own. When your organization’s Internet connection goes down, you lose the ability to control any of your end hosts from the internal side of your organization’s network. And when you delete an end host or discontinue service from the provider, your data might not be completely deleted.

 

Hosting a remote control application within your own organization can be difficult in itself. You have to have the infrastructure to host the application. Then if you want redundancy, the application has to support redundancy and you have to have more infrastructure. Then you need to make sure you update the application on your server(s), on top of ensuring the end hosts are up to date, which requires planning, testing, and change control. If you expose your internal remote control application to the Internet, like a service provider would, then you need to monitor it for potential intrusions and attacks, and defend against them. That may require additional infrastructure and add complexity. If your organization’s Internet connection goes down and you are on the inside of your organization, then you lose connectivity to all of the remote hosts. If you are external, then you lose connectivity to all of the internal hosts.

 

There is no one solution that fits everyone’s needs. As a consultant I have seen many different solutions and have ones that I prefer. Do you use a remote control solution from a service provider or do you have one you host yourself? Why did your organization choose that one?

I’ve had a few members ask about my silly Pi Day TwitterBot hack, and why someone would even want to do such a thing.  The real answer is a geek compulsion, but the thinking went something like this:

 

Pi Day 2015 was going to give us Pi via the date to 10 digits, but if you included the milliseconds you could go to 13 digits.  Well, to be fair, you could go to 1,000,000 digits if you had a sufficiently accurate timer to produce a decimal second with enough accuracy.  But let’s face it; true millisecond accuracy in IT gear is unlikely anyway.  Are you happy with that clarification, realtime programmers?!

 


I realized that if I could loop tight enough to trigger at a discrete millisecond boundary I could do something at that fateful moment.  And because a geek can, a geek should.  So what should happen at that moment? Do a SQL update, save a file, update a config?  No, there was only one thing to do: tweet.  The next trick was to create a bot. I use Raspberry Pis for just about all maker projects now.  After years of playing with microcontrollers I finally switched over.  Pis are cheaper than Arduinos when you consider adding IO, they run a full Linux OS, and many add-on boards work with them. (And yes, I monitor my Pis at home with Orion, so be sure to check out Wednesday’s SolarWinds Lab, which is all about monitoring Linux.)

 

But one thing neither Arduinos nor Pis have is a real-time clock, which is a little bit of a problem if you’re planning to do time-sensitive processing.  So here’s the general setup for the project, and I’ll save you the actual code because 1) I made it in < 20 minutes, 2) no one will ever need it again, and 3) mostly because it’s so ugly I’m embarrassed.  I used Python because there were libs aplenty.

 

The Hack

 

  1. Find “real enough time,” i.e., an accurate offset.  I’m too cheap to buy a GPS module, so I used NTP. However, single NTP syncs aren’t nearly enough to get millisecond-ish accuracy, plus the Raspberry’s system (CPU) clock drifts a bit.  So first, we need to keep a moving average of the offset, and I used ntplib:

     

    c = ntplib.NTPClient()
    response = c.request('europe.pool.ntp.org', version=3)
    >>> response.offset
    -0.143156766891
    >>> response.root_delay
    0.0046844482421875
    
    
    
    
    

     

     

    Next, poll 30 times in a minute and deposit the results into a collections.deque.  It’s a double-ended buffer object, meaning you can add or remove items from either end.  (It’s easier to implement than a circular buffer.)  Adjusting the overall length in 30-sample increments lets you expand the running average beyond a single update cycle.  (There’s a rough sketch of this rolling average right after the steps below.)

     

  2. Keep an eye on clock drift.  The actual trigger loop on the Raspberry would need to hammer the CPU, and I didn’t want to get into a situation where I’d be about to trigger on the exact millisecond but get hit with an NTP update pass.  To manage that, I’d need to fire based on a best guess of the accumulated drift since the previous sync.  So, whenever the NTP sync fired, I saved the previous average offset delta from the internal clock, also into a deque.  On average the Pi was drifting 3.6 secs/day, or 0.0025 secs/min.  Because this value was constantly being recalculated, it corrected for thermal effects and other physical factors, and the drift was remarkably stable.

     

  3. OAuth, the web, and Twitter.  Twitter is REST-based, and if I were building an app to make some cash, I’d probably either be really picky about choosing a client library or implement something myself.  But there was no need for that here, so I checked the Twitter API docs and picked tweepy.

     

    import tweepy

    # consumer_key, consumer_secret, access_token, and access_token_secret
    # come from your Twitter app's settings.
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.secure = True
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    # If the authentication was successful, you should
    # see the name of the account print out.
    print(api.me().name)
    # If the application settings are set for "Read and Write" then
    # this line should tweet out the message to your account's timeline.
    api.update_status('Updating using OAuth authentication via Tweepy!')
    
    
    
    
    

     

    I gave my app permission to my feed, including updates (DANGER!), generated keys, and that was about it.  Tweepy makes it really easy to tweet, and pretty nicely hides the OAuth foo.

     

  4. The RESTful bit.  As sloppy as NTP really is, it’s nothing compared to the highly variable latency of web transactions.  With a REST call, especially to a SaaS service, there are exactly 10^42 things that can affect round trip times.  The solution was twofold.  First, make sure the most variable transaction – OAuth – happened well in advance of the actual tweet. Second, you need to know what the average LAN -> gateway -> internet -> Twitter REST service delay is.  Turns out, you guessed it, it’s easy to use a third deque object to do some test polls and keep a moving average to at least guesstimate future web delay.

     

  5. Putting it all together - the ugly bit. The program pseudocode looked a little something like this:

     

    // For all code twitterTime = time.time() – {offset rolling average} – {predicted accumulating drift}
    // Gives the corrected network time rather than the actual CPU time.
    Do the oAuth
    While (twitterTime < sendTime - 20)
    {
       Do the NTP moving average poll
       Update the clock drift moving average
       Update the REST transaction latency moving average 
       Wait 10 minutes     
    }
    While (twitterTime < sendTime - 2)
    {
       Do the NTP moving average poll
       Update the clock drift moving average
       Wait  1 minute
    }
    While (twitterTime < sendTime – {RESTlatency moving average})
    {
       Sleep 1 tick // tight loop
    }
    Send Tweet
    Write tweet time and debug info to a file
    End
    
    
    
    
    
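For the curious, here’s that rough sketch of the deque-based rolling offset from step 1. This is my own reconstruction for illustration, not the actual bot code; it just mirrors the pseudocode’s “corrected network time” idea using ntplib and the pool server from the snippet above.

    import time
    from collections import deque

    import ntplib

    OFFSETS = deque(maxlen=30 * 4)        # keep up to 4 update cycles of 30 samples each

    def sample_offset(client, server='europe.pool.ntp.org'):
        response = client.request(server, version=3)
        OFFSETS.append(response.offset)   # skew between NTP time and the CPU clock, in seconds

    def corrected_time():
        # "twitterTime" from the pseudocode: CPU time adjusted by the rolling average offset.
        avg_offset = sum(OFFSETS) / len(OFFSETS) if OFFSETS else 0.0
        return time.time() - avg_offset

    client = ntplib.NTPClient()
    for _ in range(30):                   # one update cycle: 30 polls spread over a minute
        sample_offset(client)
        time.sleep(2)
    print(corrected_time())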

 

Move Every Tweet, For Great Justice

 

I watched my Twitter feed Saturday morning from the bleachers at kickball practice, and sure enough at ~9:26 am, there it was.  This morning with a little JSON viewing I confirmed it was officially received in the 53rd second of that minute.

 

Why do geeks do something like this?  Because it’s our mountain, it’s there and we must climb it.  There won’t be another Pi day like this, making it singular and special and in need of remembrance.  So, we do what we do. The only question is how closely did I hit the 589th millisecond?  Maybe if I ask Twitter, really nicely...

Support centers in organizations are under constant pressure due to an increasing volume of service tickets and a growing number of end-users to manage. The complexity and diversity of support cases make it all the more difficult to provide timely resolution, considering the lean support staff and tight deadlines. So, how can help desk admins increase the efficiency of the help desk process and ultimately deliver service faster? Considering all the things you do, the question to ask next is: “Where can I save time in all my daily goings-on?” Conserving time on repetitive, less important, and menial tasks can help you gain that time for actual ticket resolution.

 

Here are 5 useful time-saving strategies for improved help desk productivity:

 

#1 DON’T GO INVENTING FIXES. SOMEONE MIGHT HAVE ALREADY DONE THAT.

Not all service tickets are unique and different from one another. It is highly common for different users to have faced the same issue in the past. The smart way here is to track repeating help desk tickets and their technician assignments, and capture the best resolution applied in an internal knowledge base. This way, it will never be a new issue to deal with from scratch. Any new technician can look up the fix and resolve the issue quickly.

 

#2 KNOW WHAT YOU ARE DEALING WITH.

Before jumping the gun, assuming you know exactly what problem you are dealing with, and starting to fix it, make sure you have elicited all the details about the issue from the end-user. Sometimes it might just be that the user doesn’t know how to use something, or it is such a simple fix that the user can do it themselves. So, don’t settle for vague descriptions from tickets. Make sure you get as many details as you can from the user about the issue before you start providing the solution.

 

#3 PROMOTE END-USER SELF-SERVICE.

If your user base is growing and you are receiving tickets for commonplace issues with easy fixes, it is time you start thinking about building an internal self-service portal with updated how-to’s and FAQs to help users resolve Level 0 issues themselves. Password resets are still a top call driver for support teams; automating them through a self-service portal will free up a fair share of IT admins’ time.

 

#4 ESCALATE WHEN YOU CAN’T RESOLVE.

While you might feel capable of resolving any level of support ticket, there will be times when you face technical challenges. Finding the cause of slow database response times may not be your forte. That pesky VM always reports memory exhaustion no matter what you do. These are times you must act on judgment and escalate the issue to another technician or your IT manager. Getting all worked up over the same issue (going only on a hunch) will not only delay resolution, but will also result in more tickets piling up. Make sure your help desk has proper escalation, de-escalation, and automated ticket routing functionality for cases where SLAs are not met.

 

#5 DO IT REMOTELY.

Yes, personal human contact is the best possible means of communication. However, it can cost you handsomely in time and money if you start visiting your end-users one by one for desktop support. Many service tickets can be resolved remotely if you open a remote session to the user’s system. And if you have additional remote administration tools, you can master the art of telecommuting for IT support.

 

What other tricks of the trade do you have up your sleeve to help fellow IT pros speed up customer support?

Leon Adato

Sour Notes in iTunes

Posted by Leon Adato Mar 11, 2015

On Monday, iTunes was down. But we all expected that because Apple was holding its “Spring Ahead” event, and was poised to announce a slate of new products.

 

Today, iTunes was down again (or at least parts of it) and this was very NOT expected.

 

The first report of the outage appeared on TheNextWeb.com. They noted that iTunes Connect was down, you could see music but not buy it, and several app pages were dead when you clicked them.

 

As is the case with most short-term outages (Apple responded and resolved it within an hour or two) we will likely never know what really happened. And that’s fine. I’m not on the iTunes internal support team so I don’t need the ugly details.

 

But it's always fun to guess, right? Armchair quarterbacking an outage is the closest to sports that some of us IT pros get.

 

First, I ruled out security. A simple DDoS or other targeted hack would have defaced the environment, taken out entire sections (or the whole site), and made a much larger mess of things.

 

Second, I took simple network issues off the list. Having specific apps, song purchasing, and individual pages die is not the profile of a failure in routing, bandwidth, or even load balancing.

 

My first choice was Storage – if the storage devices that contain the actual iTunes songs as well as app downloads were affected, that would explain why we saw failures once we got past those initial pages. It could also have explained why the failure was geographic (UK and US) while we didn't hear about failures in other parts of the world.

 

My runner-up vote went to Database – corrupt records in the database that houses the CMS, which undoubtedly drives the entire iTunes site. Having specific records corrupted would explain why some pages worked and others didn’t.

 

Then CNBC published a statement from Apple apologizing for the outage and explaining it was an internal DNS problem.

 

Whatever the reason, this failure underscores why today’s complex, inter-connected, cloud and hybrid cloud environments need monitoring that is both specific and holistic.

 

Specific because it needs to pull detailed data about disk and memory IOPS, errored packets, application pool member status, critical service status (like DNS), synthetic tests against key elements (like customer purchase actions), and more.

 

Holistic because we now need a way to view how write errors on a single disk in an array affect the application running on a VM that uses the array in its datastore. We need to see when a DNS resolution fails (before the customer tries it) and correlate that to the systems that depend on those name resolutions.
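As a tiny example of the "specific" side, here's a sketch of a synthetic DNS check in Python. The hostnames are placeholders I've invented; the point is simply to resolve the names your services depend on, time the lookups, and alert before a customer trips over a failure.

    import socket
    import time

    CRITICAL_NAMES = ["store.example.com", "itunes.example.com"]   # placeholder names

    for name in CRITICAL_NAMES:
        start = time.time()
        try:
            records = socket.getaddrinfo(name, 443)
            elapsed_ms = (time.time() - start) * 1000
            print("%s resolved to %d records in %.1f ms" % (name, len(records), elapsed_ms))
        except socket.gaierror as err:
            print("ALERT: %s failed to resolve: %s" % (name, err))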

 

That means monitoring that can take in the entire environment top to bottom.

 

Yes, I mean AppStack.

 

Hey, Apple internal support: If you want us to set up a demo for you, give us a call.

Users only call the HelpDesk with problems. Some of the issues, like password resets, are easy to resolve. Other issues can get very complex; then add into the mix a user who can't properly describe the issue they are having, or exactly what the error message on their screen says. When helping a user with an issue, have you ever asked them to click on something here or there and let you know what pops up on the screen? How long did you wait before asking if anything different was on the screen, only to have the user say that something was displayed several minutes ago?

 

 

I am a very visual person and I need to see the error, or see how long it took for the error message to pop up. An error message that comes back right away could mean something completely different than one that took a few seconds; users cannot really convey that timing well. Years ago, when I first started working in IT, I used a product called PCAnywhere that would let me remote control another machine. I could even do it remotely from home via dial-up!

 

 

The ability to remotely see what is happening on a user’s machine makes a huge difference. Today I use a variety of these applications, depending on what my client will support, but they all have a large set of features beyond just remotely controlling the machine. SolarWinds DameWare lets you remotely reboot machines, start and stop processes, and view logs, and it adds AD integration and mobile remote control. Other than remote control of a machine, what other features do you use? Which features make it easier for you to troubleshoot issues from wherever you are?

As an admin, how do you ensure that you don’t run out of disk space? In my opinion, thin provisioning is the best option. It reduces the amount of storage that needs to be purchased for any application to start working. Also, monitoring thin provisioning helps you understand the total available free space and thus you can allocate more storage dynamically (when needed). In a previous blog I wrote, I explained how thin provisioning works and the environment it can be useful in. Now I’d like to discuss the different approaches for converting from fat volume to thin.


Once you’ve decided to move forward with thin provisioning, you can start implementing all your new projects with minimum investment. With thin provisioning, it’s very important to account for your active data (in the fat volume) and to be aware of challenges you might encounter. For example, when you do a regular copy of existing data from fat volumes to thin, all the blocks associated with the fat volume will be copied to the thin one, ultimately wasting any benefits from thin provisioning.


There are several ways to approach copying existing data. Let’s look at a few:


File copy approach

This is the oldest approach for migrating data from fat volume to thin volume. In this method, the old fat data is backed up at the file level and restored as new thin data. The disadvantage of this type of backup and restore is that it’s very time consuming. In addition, this type of migration can cause interruption to the application. However, an advantage to the file copy approach is that it marks the zero value blocks as available to be overwritten.


Block-by-block copy approach

Another common practice is using a tool that does a block-by-block copy from an old array (fat volume) to a new thin volume. This method offers much higher performance compared to the file copy method. However, the drawback to this method is the zero-detection issue: fat volumes will have unused capacity filled with zeros, awaiting the eventual probability of an application writing data to it. So, when you do a general migration by copying data block-by-block from the old array to the new, you receive no benefit from thin provisioning. The copied data will include the unused zero-blocks, and you end up with wasted space.


Zero-detection

A tool that can handle zero block detection can also be used. The tool should remove the zero-valued blocks while copying the old array to the new. This zero-detection technology can be software based or hardware based. Both software and hardware based fat-to-thin conversions can help remove zero blocks. However, the software based fat-to-thin conversion has a disadvantage: the software needs to be installed on a server. That means it will consume a large amount of server resources and will impact other server activities. The hardware based fat-to-thin conversion also has a disadvantage: it’s on the expensive side.
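To illustrate the software-based idea, here is a rough sketch of zero detection during a block copy. It reads the fat volume in fixed-size chunks and seeks past any chunk that is entirely zeros, so a file-backed thin target never allocates space for them. The paths and chunk size are placeholders, and a real migration tool has far more to worry about (consistency, alignment, and so on).

    SOURCE = "/dev/old_fat_volume"     # placeholder source (fat volume)
    TARGET = "/data/new_thin_volume"   # placeholder target (sparse file / thin volume)
    CHUNK = 1024 * 1024                # copy in 1 MB chunks

    with open(SOURCE, "rb") as src, open(TARGET, "wb") as dst:
        while True:
            chunk = src.read(CHUNK)
            if not chunk:
                break
            if chunk.count(b"\x00") == len(chunk):
                dst.seek(len(chunk), 1)   # all zeros: skip ahead, leaving a hole instead of writing
            else:
                dst.write(chunk)
        dst.truncate()                    # set the final size so trailing holes are preserved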


As discussed, all the methods to convert from fat volumes to thin have advantages and disadvantages. But you cannot continue using traditional (fat) provisioning for storage, since fat provisioning wastes money and results in poor storage utilization. Therefore, I highly advise using thin provisioning in your environment, but make sure you convert your old fat volumes to thin ones before you do.

 

After you have implemented thin provisioning, you can start the over-committing of storage space. Be sure to keep an eye out for my upcoming blog where I will discuss the over-commitment of storage. 

If you haven’t heard already, SolarWinds’ Head Geeks are available for daily live chat, Monday-Thursday for the month of March at 1:00PM CT.  kong.yang, adatole, sqlrockstar and yes me too, will be online to help answer any questions you may have about products, best practices, or general IT.  Though unlikely, some chump stumpage may occur, so we’ll also have experts from support and dev to make sure we have the best answer for anything you can throw at us.  You’ll find us here http://thwack.com/officehours on the Office Hours event page in thwack.

 

My Question

 

Daily Office Hours is part of a thwack & Head Geek experiment to test new ways for the community to reach product experts.   I’m also testing a new tag line for our fortnightly web TV show, SolarWinds Lab http://lab.solarwinds.com, and would love your feedback before I rebuild the show header graphic.

 

What do you think of: What Will You Solve Next?

 

It means something specific to me, but I’d love to get your feedback before I say what I think it means.  Please leave some comments below.  Do you like it?  Is it the kind of thing we ask each other on thwack?  Is it something SolarWinds does/should ask?  Am I taking liberties with the tm?  After you all chime in and let me know what you think, I’ll reply with what I think it means.

 

Thanks as always, we hope to see you in March for Office Hours!

Of all of the security techniques, few garner more polarized views than interception and decryption of trusted protocols. There are many reasons to do it, and a great deal of legitimate concern about compromising the integrity of a trusted protocol like SSL. SSL is the most common protocol to intercept, unwrap, and inspect, and accomplishing this has become easier and requires far less operational overhead than it did even 5 years ago. Weighing those concerns against the information that can be ascertained by cracking it open and looking at its content is often a struggle for enterprise security engineers because of the privacy implied. In previous lives I have personally struggled to reconcile this, but ultimately decided that the ethics involved in what I consider a violation of implied security outweighed the benefit of SSL intercept.

With other options being few, blocking protocols that obfuscate their content seems to be the next logical step. However, with the prolific increase in SSL-enabled sites over the last 18 months, even that option seems unrealistic and, frankly, clunky. Exfiltration of data, anything from personally identifiable information to trade secrets and intellectual property, is becoming a more and more common "currency," and much more desirable and lucrative to transport out of businesses and other entities. These are hard problems to solve.

Are there options out there that make better sense? Are large and medium sized enterprises doing SSL intercept? How is the data being analyzed and stored?

Is User Experience (UX) monitoring going to be the future of network monitoring? I think that the changing nature of networking is going to mean that our devices can tell us much more about what’s going on. This will change the way we think about network monitoring.


Historically we’ve focused on device & interface stats. Those tell us how our systems are performing, but don't tell us much about the end-user experience. SNMP is great for collecting device & interface counters, but it doesn't say much about the applications.


NetFlow made our lives better by giving us visibility into the traffic mix on the wire. But it couldn't say much about whether the application or the network was the pain point. We need to go deeper into analysing traffic. We've done that with network sniffers, and tools like SolarWinds Quality of Experience help make it accessible. But we could only look at a limited number of points in the network. Typical routers & switches don't look deep into the traffic flows, and can't tell us much.


This is starting to change. The new SD-WAN (Software-Defined WAN) vendors do deep inspection of application performance. They use this to decide how to steer traffic. This means they’ve got all sorts of statistics on the user experience, and they make this data available via API. So in theory we could also plug this data into our network monitoring systems to see how apps are performing across the network. The trick will be in getting those integrations to work, and making sense of it all.
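Since those APIs aren't standardised yet, any integration is necessarily vendor-specific, but the shape of it looks something like the sketch below. The controller URL, endpoint, and response fields are entirely hypothetical; the idea is just to pull per-application experience stats over REST and feed them into whatever monitoring you already run.

    import json
    import urllib.request

    BASE_URL = "https://sdwan-controller.example.com/api/v1"   # hypothetical controller
    TOKEN = "changeme"                                          # hypothetical API token

    req = urllib.request.Request(
        BASE_URL + "/app-performance?site=branch-042",          # hypothetical endpoint
        headers={"Authorization": "Bearer " + TOKEN},
    )
    with urllib.request.urlopen(req) as resp:
        stats = json.loads(resp.read().decode("utf-8"))

    for app in stats.get("applications", []):                   # assumed response shape
        print("%s: latency=%sms loss=%s%% score=%s" % (
            app.get("name"), app.get("latency_ms"),
            app.get("loss_pct"), app.get("experience_score")))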


There are many challenges in making this all work. Right now all the SD-WAN vendors will have their own APIs and data exchange formats. We don't yet have standardised measures of performance either. Voice has MOS, although there are arguments about how valid it is. We don't yet have an equivalent for apps like HTTP or SQL.


Standardising around SNMP took time, and it can still be painful today. But I'm hopeful that we'll figure it out. How would it change the way you look at network monitoring if we could measure the user experience from almost any network device? Will we even be able to make sense of all that data? I sure hope so.

kong.yang

A Geek's Guide to AppStack

Posted by kong.yang Mar 5, 2015

What is the Geek's Guide to AppStack? Simply put, it's the central repository for all tech content involving the AppStack. If data and applications are important to you, the AppStack Dashboard is for you. The AppStack Management Bundle enables agility and scalability in monitoring and troubleshooting applications. And this blog post will continue to be updated as new AppStack content is created. So bookmark and favorite this post as your portal to all things AppStack. Also, if there is anything that you would like to discuss around AppStack, please comment below.


 

AppStack Is

AppStack How-to


 

Application Relationship Mapping for Fast Root Cause Analysis: Application Stack Dashboard

 

Application Stack Dashboard: How to Set Up, Use and Customize


Download the Application Stack Management Bundle:


AppStack Reference

Hear what the Head Geeks had to say in the AppStack Blog Series:

Ambassador Blogs

Application-Centric Monitoring:

AppStack Social Trifecta:

Helpful AppStack Resources:

SolarWinds Tech Field Day Coverage:

There are two ways to get things done: the hard way or the easy way. The same holds true with help desk management. There are many micro to small businesses who do not have the resources to manage their help desk, and they end up spending more time and effort doing the job manually: tracking tickets via email, updating statuses on spreadsheets, walking over to the customer’s desk to resolve tickets, etc. This is a tedious and time-consuming process, highly ineffective, and it causes delays and SLA lapses.

 

Streamlining the help desk process and employing automation to simplify tasks and provide end-user support is the other way, the smart way. If you know what tools to use, and how to get the best benefits from them, you can achieve help desk automation cost-effectively.

 

Here are a few things you should automate:

  • Ticketing management: from ticket creation, technician assignment, tracking, to resolution
  • Asset Management: scheduled asset discovery, association of assets to tickets, managing inventory (PO, warranty, parts, etc.)
  • Desktop Support: ability to connect to remote systems from the help desk ticket for accelerated support

 

Take a look at this Infographic from SolarWinds to understand the benefits of centralized and organized help desk management.


 

Learn how to effectively manage IT service requests »

If you’ve worked in IT for any amount of time, you are probably aware of this story: an issue arises, the application team blames the database, the database admin blames the systems, the systems admin blames the network, and the network team blames the application. A classic tale of finger pointing!

 

But, it’s not always the admins’ fault. We can’t forget about the users, who are often the weakest link in the network.

 

Over the years, I think I’ve heard it all. Here are some interesting stories that I’ll never forget:

 

Poor wireless range


User:     Since we moved houses, my laptop isn’t finding my wireless signal.

Me:        Did you reconfigure your router at the new location?

User:     Reconfigure…what router?

 

The user had been using their neighbor’s signal at their previous house. I guess they just assumed they had free Wi-Fi?  However, this was almost a decade ago, when people were unaware that they could secure their Wi-Fi.

 

Why isn’t my Wireless working?


User:     So, I bought a wireless router and configured it, but my desktop isn’t picking up the signal.

Me:        Alright, can you go to ‘Network Connections’ and check if your wireless adapter is enabled?

User:     Wait, I need a wireless adapter?

 

Loop lessons


I was at work with one of my coworkers…let’s call him the hyper-enthusiastic newbie. Anyway, the test lab was under construction, lab devices were being configured, and the production network wasn’t connected to the lab yet. After hours of downtime, the hyper-enthusiastic newbie came to me and said:

 

Newbie:               I configured the switch, and then I wanted to test it.

Me:                        And?

Newbie:               I connected port 1 from our lab switch to a port on the production switch. It worked.

Me:                        Great.

Newbie:               And then to test the 2nd port, I connected it to another port on the production switch.

 

This was a practical lesson in what switching loops can do to a network.

 

Not your average VoIP trouble


A marketing team member’s VoIP phone went missing. An ARP lookup showed that the phone was on a sales rep’s desk. That user had decided to borrow the phone for her calls because hers wasn’t working. Like I said, not your average VoIP trouble.

 

One of my personal favorites: Where's my email?


User:     As you can see I haven’t received any email today.

Admin: Can you try expanding the option which says today?

 

Well, at least it was a simple fix.


Dancing pigs over reading warning messages


So, a user saw wallpaper of a ‘cute dog’ online. They decided to download and install it despite the 101 warning signs that their system threw at them. Before they knew it…issues started to arise: malware, data corruption, and soon every system was down. Oh my!

 

Bring your own wireless


The self-proclaimed techie user plugs in his wireless travel router, which also has DHCP enabled. That rogue DHCP server can respond to a client asking for an IP before the real one does. As you all know, this can lead to complete mayhem and is very difficult to troubleshoot.

 

Excuse me, the network is slow


I hear it all the time and for a number of reasons:

 

Me:        What exactly is performing slowly?

User:     This download was fine. But, after I reached the office, it has stopped.

Me:        That is because torrents are blocked in our network.

 

That was an employee with very high expectations.

 

Monitor trouble!


Often, our office provides a larger monitor to users who are not happy with their laptop screen size. That said:

User:     My extra monitor displays nothing but the light is on.

Me:       Er, you need to connect your laptop to the docking station.

User:     But I am on wireless now!

 

Due to all these instances, user education has been a priority at work. However, these situations still continue to happen. What are your stories? We’d love to hear them.
