
Geek Speak


Within the government, particularly the U.S. Defense Department, video traffic—more specifically, videoconference calling—is often considered mission critical.

 

The Defense Department uses video for a broad range of communications. One of the most critical uses is video teleconference (VTC) inside and outside of the United States and across multiple areas of responsibility. Daily briefings—via VTC over an Internet protocol (IP) connection—help keep everyone working in sync to accomplish the mission. So, you can see why it is so important for the network to be configured and monitored to ensure that VTCs operate effectively.

 

VTC and network administration tasks boil down to a few key points:

 

  • Ensuring the VTC system is up and operational (monitoring).
  • Setting up the connections to other endpoints (monitoring).
  • Ensuring that the VTC connection operates consistently during the call (quality of service).
  • Troubleshooting at the VTC system level (VTC administration); once the call reaches the network, the network administrator takes over to ensure that the connection stays alive (monitoring/configuration).

 

Ensuring Quality of Service for Video over IP

 

The DOD has developed ways to ensure successful live-traffic streaming over an IP connection. These requirements focus on ensuring that video streaming has the low latency and high throughput needed among all endpoints of a VTC. Configuring the network to support effective VTCs is challenging, but it is done through implementing quality of service (QoS).

 

You can follow these four steps:

 

Step 1: Establish priorities. VTC traffic needs the highest priority. Email would likely have the lowest priority, and one-way streaming video (as opposed to VTC) would also rank low.

 

Step 2: Test your settings. Have you set up your QoS settings so that VTC traffic has the highest priority?

 

Step 3: Implement your settings. Consider an automated configuration management tool to speed the process and eliminate errors.

 

Step 4: Monitor your network. Once everything is in place, monitor to make sure policies are being enforced as planned and learn about network traffic.

 

Configuring and Monitoring the Network

 

Network configuration is no small task. Initial configuration and subsequent configuration management ensure that routers are configured properly, traffic is prioritized as planned, and video traffic is flowing smoothly.

 

Network configuration management software that automates the configuration tasks of implementing complex QoS settings can be useful, and should support the automation of:

 

  1. Pushing out QoS settings to the routers. QoS settings are fairly complex, and implementing them manually invites errors, so automation is important here.
  2. Validating that the changes have been made correctly. After the settings are implemented on a router, it is important to back up and verify the configuration.
  3. Configuration change notification.

 

Network monitoring tools help validate critical network information, and should provide you with the following information:

 

  1. When and where is my network infrastructure busy?
  2. Who is using the network at those hot spots and for what purpose?
  3. When is the router dropping traffic, and what types of packets are being dropped?
  4. Are the VTC systems on your side and the far side of the call up and operational?
  5. Do node and interface baselines reveal abnormal spikes during the day?

 

What are your best practices for ensuring video traffic gets through? Do you have any advice you can share?

 

Find the full article on Signal.

In the first post of this series we took a look at the problems that current generation WANs don’t have great answers for.  In the second post of the series we looked at how SD-WAN is looking to solve some of the problems and add efficiencies to your WAN.

 

If you haven’t had a chance to do so already, I would recommend starting with the linked posts above before moving on to the content below.

 

In this third and final post of the series, we are going to take a look at the pitfalls an SD-WAN implementation might introduce and some items you should consider if you're looking to implement SD-WAN in your networks.

 

Proprietary Technology

 

We've grown accustomed to having the ability to deploy openly developed protocols in our networks, and SD-WAN takes a step backwards when it comes to openness.  Every vendor currently in the market has a significant level of lock-in when it comes to their technology.  There is no interoperability between SD-WAN vendors, and nothing on the horizon suggests this will change.  If you commit to Company X's solution, you will need to implement the Company X product in every one of your offices if you want SD-WAN-level features available.  Essentially, we are trading one type of lock-in (service-provider-run MPLS networks or private links) for another (the SD-WAN overlay provider). You will need to decide which lock-in is more limiting to your business and your budget.  Which is more difficult to replace, the MPLS underlay or the proprietary overlay?

 

Cost Savings

 

The cost savings argument is predicated on the idea that you will be willing to drop your expensive SLA-backed circuits and replace them with generic Internet bandwidth.  What happens if you are unwilling to drop the SLA? Well, the product isn't likely to come out as a cost savings at all.  There is no doubt that you will have access to features that you don't have now, but your organization will need to evaluate whether those features are worth the cost and lock-in that implementing SD-WAN incurs.

 

Vendor Survivability

 

We are approaching (and might be over at this point) 20 vendors claiming to provide SD-WAN solutions. There is no question that it is one of the hottest networking trends at the moment, and many vendors are looking to capture the market.  Where will they be in a year?  Five years? Will this fancy new solution that you implemented be bought out by a competitor, only to be discarded a year or two down the line?  How do you pick winners and losers in a market as contested as SD-WAN currently is?  I can't guarantee an answer here, but there are some clear leaders in the space and a handful of companies that haven't fully committed to the vision.  If you are going to move forward with an SD-WAN deployment, you will need to factor in the organizational viability of the options you are considering.  Unfortunately, not every technical decision gets to be made on the merit of the technical solution alone.

 

Scare Factor

 

SD-WAN is a brave new world, with a lot of concepts that network engineering tradition tells us to be cautious of.  Full automation and traffic re-routing have not been seamlessly implemented in previous iterations.  Controller-based networks are a brand new concept on the wired side of the network. It's prudent for network engineers to take a hard look at the claims and verify the questionable ones before going all in.  SD-WAN vendors by and large seem willing to provide proofs of concept and technical labs to back up their claims.  Take advantage of these programs and put the tech through its paces before committing to an SD-WAN strategy.

 

It's New

 

Ultimately, it's a new approach, and nobody likes to play the role of guinea pig.  The feature set is constantly evolving and improving.  What you rely on today as a technical solution may not be available in future iterations of the product.  The tools you have to solve a problem a couple of months from now may be wildly different than the tools you currently use.  These deployments also aren't as well tested as our traditional routing protocols.  There is a lot about SD-WAN that is new and needs to be proven.  Your tolerance for the risks of running new technology has to be taken into account when considering an SD-WAN deployment.

 

Final Thoughts

 

It's undeniable that there are problems in our current generation of networks that traditional routing protocols haven't effectively solved for us.  The shift from a localized perspective on decision making to a controller-based network design is significant enough to solve some of these long-standing and nagging issues.  While the market is new, and a bit unpredictable, there is little doubt that controller-based networking is the direction things are moving, both in the data center and the WAN.  Also, if you look closely enough, you'll find that these technologies don't differ wildly from the controller-based wireless networks many organizations have been running for years.  Because of this, I think it makes a lot of sense to pay close attention to what is happening in the SD-WAN space and consider what positive or negative impacts an implementation could bring to your organization.


 

First, a quick definition and example for those that don’t know what deadlocks are inside of a database.

 

A deadlock happens when two (or more) transactions block each other by holding locks on resources that each of the transactions also need.

 

For example:

 

Transaction 1 holds a lock on Table A.

Transaction 2 holds a lock on Table B.

Transaction 1 now requests a lock on Table B, and is blocked by Transaction 2.

Transaction 2 now requests a lock on Table A, and is blocked by Transaction 1.

 

Transaction 1 cannot complete until Transaction 2 is complete, and Transaction 2 cannot complete until Transaction 1 is complete. This is a cyclical dependency and results in what is called a deadlock. Deadlocks can involve more than two transactions, but two is the most common scenario.
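To make the cycle concrete, here is a minimal T-SQL sketch of the example above, assuming two hypothetical tables (dbo.TableA and dbo.TableB, each with an Id and a Val column) standing in for Table A and Table B. Run the statements in two separate sessions in the order shown, and SQL Server will detect the cycle and kill one session as the deadlock victim:

-- Session 1
BEGIN TRANSACTION;
UPDATE dbo.TableA SET Val = 1 WHERE Id = 1;   -- Session 1 now holds a lock on TableA

-- Session 2
BEGIN TRANSACTION;
UPDATE dbo.TableB SET Val = 1 WHERE Id = 1;   -- Session 2 now holds a lock on TableB

-- Session 1 (this statement is blocked by Session 2's lock on TableB)
UPDATE dbo.TableB SET Val = 2 WHERE Id = 1;

-- Session 2 (this closes the cycle; the engine chooses one session as the
-- deadlock victim and rolls it back with error 1205)
UPDATE dbo.TableA SET Val = 2 WHERE Id = 1;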

 

If you scrub the intertubz for deadlock information you will find a common theme: most people will write that deadlocks cannot be avoided in a multi-user database. They will also write about how you need to keep your transactions short, and to some that means keeping your stats and indexes up to date, rather than having a good discussion about what a normalized database would look like.

 

(And before I go any further, let me offer you some advice. If you believe that constantly updating your stats is a way to prevent deadlocks in SQL Server, then you should find a new line of work. Actually, stay right where you are. That way people like me will continue to have jobs, cleaning up behind people such as yourself. Thanks.)

 

What causes deadlocks?

The database engine does not seize up and start deadlocking transactions because it happens to be tired that day. Certain conditions must exist in order for a deadlock to happen, and all of those conditions require someone, somewhere, to be using the database.

 

Deadlocks are the result of application code combined with a database schema that results in an access pattern that leads to a cyclical dependency.

 

That’s right. I said it. Application code causes deadlocks.

Therefore it is up to the database administrator to work together with the application developer to resolve deadlocks.

 

Another thing worth noting here is that deadlocking is not the same as blocking. I think that point is often overlooked. I have had several people explain that their database is suffering from blocking all the time. When I try to explain that a certain amount of blocking is to be expected, I am usually met with, "Yeah, yeah, whatever. Can you just update the stats and rebuild my indexes so it all goes away?"

 

A better response would be, "Yeah, I know I need to look at my design, but can you rebuild the indexes for me right now, and see if that will help for the time being?" My answer would be, “Yes. Can I help you with your design?"

 

Oh, and you do not need large tables with indexes to facilitate a deadlock. Blocking and deadlocks can happen on small tables, as well. It really is a matter of application code, design, access patterns, and transaction isolation levels.

 

Look, no one likes to be told they built something horrible. And chances are when it was built it worked just fine, but as the data changes, so could the need for an updated design. So if you are a database developer, do not be offended if someone says, "We need to examine the design, the data, and the code.” It is just a simple fact that things change over time.

 

Finding deadlocks

Here is a link to Bart Duncan's blog series that helps to explain deadlocking as well as the use of trace flag T1222. If you are experiencing deadlocks and want to turn this on now, simply issue the following statement:

 

DBCC TRACEON (1222, -1)

 

The flag will be enabled and will start logging detailed deadlock information to the error log. The details from this trace flag are much easier to understand than the original Klingon returned by T1204. Unfortunately, because it was enabled with DBCC, this trace flag will be lost after the next service restart. If you want the trace flag always enabled and running against your instance, you need to add -T1222 as a startup parameter to your instance.
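If you want to confirm whether the flag is active (for example, after a restart), a quick check looks something like this small sketch using standard DBCC commands:

-- Is trace flag 1222 enabled globally?
DBCC TRACESTATUS (1222, -1);

-- To turn it back off without a restart:
-- DBCC TRACEOFF (1222, -1);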

 

Another method for seeing detailed deadlock information is to query the default Extended Event system health session. You can use the following code to examine deadlock details:

 

-- Pull the XML deadlock graphs captured by the default system_health Extended Events session
SELECT XEvent.query('(event/data/value/deadlock)[1]') AS DeadlockGraph
FROM ( SELECT XEvent.query('.') AS XEvent
       FROM ( SELECT CAST(target_data AS XML) AS TargetData
              FROM sys.dm_xe_session_targets st
              JOIN sys.dm_xe_sessions s
                   ON s.address = st.event_session_address
              WHERE s.name = 'system_health'
                AND st.target_name = 'ring_buffer'
            ) AS Data
            CROSS APPLY TargetData.nodes
                ('RingBufferTarget/event[@name="xml_deadlock_report"]')
                AS XEventData ( XEvent )
     ) AS src;

 

There are additional ways to discover if deadlocks are happening, such as using SQL Server Profiler (or a server trace), as well as Performance Monitor (i.e., Perfmon) counters. Each of the methods above will be reset upon a server restart, so you will need to manually capture the deadlock details for historical purposes, if desired.
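If you just want a running total rather than the full graph, the Perfmon deadlock counter is also exposed through the DMVs. Here is a quick sketch (the value is cumulative since the last instance restart):

-- Cumulative number of deadlocks since the instance last started
SELECT cntr_value AS DeadlocksSinceRestart
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Number of Deadlocks/sec'
  AND instance_name = '_Total';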

 

Resolving deadlocks

Resolving a deadlock requires an understanding of why the deadlocks are happening in the first place. Even if you know a deadlock has happened, and you are looking at the deadlock details, you need to have an idea about what steps are possible.

 

I've collected a handful of tips and tricks over the years that I use to minimize the chances that deadlocks happen. Always consult with the application team before making any of these changes.

 

  1. Using a covering index can reduce the chance of a deadlock caused by bookmark lookups.
  2. Creating indexes that match your foreign key columns can reduce your chances of having deadlocks caused by cascading referential integrity.
  3. When writing code, it is useful to keep transactions as short as possible and access objects in the same logical order when it makes sense to do so.
  4. Consider using one of the row-version based isolation levels READ COMMITTED SNAPSHOT or SNAPSHOT.
  5. The DEADLOCK_PRIORITY session setting specifies how likely the current session is to be chosen as the deadlock victim when it is deadlocked with another session.
  6. You can trap for the deadlock error number using TRY…CATCH logic and then retry the transaction (see the sketch after this list).
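To illustrate items 5 and 6 together, here is a minimal sketch of retry logic, assuming a hypothetical dbo.SomeTable and an update you are willing to retry; the DEADLOCK_PRIORITY setting simply volunteers this session as the victim if a deadlock does occur:

SET DEADLOCK_PRIORITY LOW;        -- prefer this session as the victim in a deadlock

DECLARE @retries INT = 3;

WHILE @retries > 0
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE dbo.SomeTable          -- hypothetical table and predicate
        SET SomeColumn = SomeColumn + 1
        WHERE SomeKey = 42;

        COMMIT TRANSACTION;
        SET @retries = 0;             -- success, exit the loop
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0
            ROLLBACK TRANSACTION;

        IF ERROR_NUMBER() = 1205 AND @retries > 1   -- 1205 = chosen as deadlock victim
        BEGIN
            SET @retries -= 1;
            WAITFOR DELAY '00:00:01'; -- brief pause before retrying
        END
        ELSE
        BEGIN
            THROW;                    -- not a deadlock (or out of retries), so re-raise
                                      -- (THROW needs SQL Server 2012+; use RAISERROR on older versions)
        END
    END CATCH
END;

The retry count and the one-second back-off are arbitrary choices for the sketch; tune them to your workload, and always agree on the approach with the application team.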

 

Summary

The impact of a deadlock on end-users is a mixture of confusion and frustration. Retry logic is helpful, but having to retry a transaction simply results in longer end-user response times. This leads to the database being seen as a performance bottleneck, and pressures the DBA and application teams to track down the root cause and fix the issue.

 

As always, I hope this information helps you when you encounter deadlocks in your shop.

SolarWinds Network Performance Monitor 12 (NPM12) launched last week to great fanfare. In the past few weeks, I've written quite a bit about the problems that network professionals face. NPM12 solves quite a few of these issues nicely. But what I wanted to talk about today is how NPM12 solves an even more common problem: arguments.

Not my fault

How many times have you called a support line for your service provider to report an issue? Odds are good it's more than once. As proficient professionals, we usually do a fair amount of troubleshooting ahead of time. We try to figure out what went wrong and where it is so it can be fixed. After troubleshooting our way to a roadblock, we often find out the solution lies somewhere outside of our control. That's when we have to interact with support personnel. Yet, when we call them to get a solution to the problem, the quick answer is far too frequently, "It's not our fault. It has to be on your end."

Service providers don't like being the problem. Their resources aren't dedicated to fixing issues. So it's much easier to start the conversation with a defensive statement that forces the professional on the other end to show their work. This often gives the service provider time to try and track down the problem and fix it. Wouldn't it be great to skip past all this mess?

That's why I think that NPM12 and NetPath have the ability to give you the information you need before you even pick up the phone. Imagine being able to start a support call with, "We're seeing issues on our end, and it looks like your AWS-facing interface on R1 has high latency." That kind of statement immediately points the service provider in the right direction instead of putting you on the defensive.

Partly cloudy

This becomes even more important as our compute resources shift to the cloud. Using on-premises tools to fix problems inside your data center is an easy task. But as soon as the majority of our network traffic is outbound to the public cloud we lose the ability to solve problems in the same way.

NetPath collects information to give you the power to diagnose issues outside your direct control. You can start helping every part of your network get better and faster and keep your users happy as they find their workloads moving to Amazon, Microsoft Azure, and others. Being able to ensure that the network will work the same way no matter where the packets are headed puts everyone at ease.

Just as the cloud helps developers and systems administrators understand their environments and gives them the data they need to be more productive and efficient, so too can NPM12 and NetPath give you the tools that you need to ensure that the path between your data center and the cloud is problem free and as open as the clear blue sky.

The end result of NPM12 and NetPath is that the information it can provide to you stops the arguments before they start. You can be a hero for not only your organization but for other organizations as well. Who knows? After installing NPM12, maybe other companies will start calling you for support?

sqlrockstar

The Actuator - June 15th

Posted by sqlrockstar Employee Jun 15, 2016

The pool cover came off last week, which means my kids will want to go swimming immediately even though the water is 62 degrees Fahrenheit. Ah, to be young and foolish again. Each time they jump in, it reminds me of that time I told my boss, "Sure, I'd love to be the DBA despite those other guys quitting just now."

 

Anyway, here is this week's list of things I find amusing from around the internet, enjoy!

 

Microsoft to Acquire LinkedIn for $26.2 Billion

Looks like Satya found a way to get LinkedIn to stop sending him spam emails.

 

Man dressed as Apple Genius steals $16,000 in iPhones from Apple Store

I know it isn't fair to compare retail theft to Apple security practices in general...but...yeah.

 

Twitter Hack Reminds Us Even Two-Factor Isn’t Enough

I've seen similar stories in the past but it would seem this type of attack is becoming more common. Maybe the mobile carriers can find a way to avoid social engineering attacks. If not, then this is going to get worse before it gets better.

 

This Year’s Cubs Might Be Better Than The Incredible ’27 Yankees

Or, they might not. Tough to tell right now, but since they are the Cubs I think I have an idea how this will end.

 

Identifying and quantifying architectural debt

There is a lot to digest in this one, so set aside some time and review. I promise it will help you when you sit in on design meetings, as you will be armed with information about the true cost of projects and designs.

 

Cold Storage in the Cloud: Comparing AWS, Google, Microsoft

Nice high-level summary of the three major providers of cold storage options. I think this is an area of importance in coming years, and the provider that starts making advances in functionality and usability will be set to corner a very large market.

 

The Twilight of Traceroute

As a database professional, I've used traceroute a handful of times. It's technology that is 30 years old and in need of a makeover.

 

Warm weather is here and I suddenly find myself thinking about long summer days at the beach with my friends and family.

 


The federal technology landscape has moved from secure desktops and desk phones to the more sprawling environment of smartphones, tablets, personal computers, USB drives and more. The resulting “device creep” can often make it easier for employees to get work done – but it can also increase the potential for security breaches.

 

Almost half of the federal IT professionals who responded to our cyber survey last year indicated that the data that is most at risk resides on employee or contractor personal computers, followed closely by removable storage tools and government-owned mobile devices.

 

Here are three things federal IT managers can do to mitigate risks posed by these myriad devices:

 

1. Develop a suspicious device watch list.

 

As a federal IT manager, you know which devices are authorized on your network – but, more importantly, you also know which devices are not. Consider developing a list of unapproved devices and have your network monitoring software automatically send alerts when one of them attempts to access the network.

 

2. Ban USB drives.

 

The best bet is to ban USB drives completely, but if you’re not willing to go that far, invest in a USB defender tool. A USB defender tool, in combination with a security information and event management (SIEM) tool, will allow you to correlate USB events with other potential system usage and/or access violations to alert against malicious insiders.

 

USB events can also be matched to network logs, which helps connect malicious activities with a specific USB drive and its user. These tools can also completely block USB use and disable user accounts if necessary. This type of tool is a very important component in protecting against USB-related issues.

 

3. Deploy a secure managed file transfer (MFT) system.

 

Secure managed file transfer systems can meet your remote storage needs with less risk.

 

File Transfer Protocol (FTP) used to get a bad rap as being unsecure, but that’s not necessarily the case. Implementing an MFT system can build a high level of security around FTP, while still allowing employees to access files wherever they may be and from any government-approved device.

 

MFT systems also provide IT managers with full access to files and folders so they can actively monitor what data is being accessed, when and by whom. What’s more, they eliminate the need for USBs and other types of remote storage devices.

 

Underlying all of this, of course, is the need to proactively monitor and track all network activity. Security breaches are often accompanied by noticeable changes in network activity – a spike in after-hours traffic here, increased login attempts to access secure information there.

 

Network monitoring software can alert you to these red flags and allow you to address them before they become major issues. Whatever you do, do not idly sit back and hope to protect your data. Instead, remain ever vigilant and on guard against potential threats, because they can come from many places – and devices.

 

Find the full article on Government Computer News.

The other day I was having a conversation with someone new to IT (they had chosen to pursue the programming track of IT, which can be an arduous path for the uninitiated!). The topic of teaching, education, and learning to program came up, and I’d like to share an analogy that not only works for programming, but that I’ve also found pretty relevant to all aspects of an IT management ecosystem.

 

The act of programming, like any process-driven methodology, is an iterative series of steps, one where you will learn tasks one day that may ultimately remain relevant to future tasks. And then there are tasks you’ll need to perform that are not only absolutely nothing like what you did the previous day, but unlike anything you’ve ever seen or imagined in your whole career, whether you’re just starting out or have been banging at the keys all your life. The analogy I chose for this was cooking. I know, I know, but the relevance hopefully resonates with you!

 

When you’re getting started out cooking, correct me if I’m wrong, but you’re usually not expected to prepare a soufflé that rises perfectly without falling. No, not at all.  That would be a poor teaching practice, one that sets someone up for failure.  Instead, let’s start out somewhere simple, like boiling water. You can mess up boiling water, but once you start to understand the basic premise of it, you can use it for all kinds of applications: sterilizing water, cooking pasta or potatoes, the sky is the limit!  Chances are, once you learn how to boil water you’re not going to forget how to do it, and perhaps you’ll get even better at doing it, or find even more applications. The same is true systematically: once you start navigating PowerShell, Bash, Python, or basic batch scripts, what you did once you’ll continue to do over and over again, because you understand it and have it down pat!

 

The day will come, however, when you’re asked to do something you didn’t even think about the day prior. No longer are you asked to write the basic PowerShell script you could whip up in a single line of code to dump a user’s last login (boiling water). Instead, you’re asked to parse an XLS or CSV file and make a series of iterative changes throughout your Active Directory. (For a practical use case: query Active Directory for workstations which haven’t logged into AD or authenticated in the past 60 days, dump that into a CSV file, compare it against a defined whitelist you keep in a separate CSV, omit specific OUs, then perform a mass ‘disable’ of those computer accounts while moving them into a temporary ‘Purge in 30 days’ OU, and generate a report to review.) Oh, and we also want this script to run daily, but it can’t have any errors or impact any production machines which don’t meet these criteria.   Let’s, for the sake of argument, call this our soufflé.

 

Needless to say, that’s a pretty massive undertaking for anyone who was great at scripting the things they’ve been doing a million times before.   That is what is great about IT and cooking, though: everything is possible, so long as you have the right ingredients and a recipe you can work off of.   In the scenario above, performing every one of those steps all at once might seem like reaching for the moon, but you may find that if you can break it down into a series of steps (a recipe) and perform each of those individually, it is much more consumable to solve the bigger problem and tie it all together.

 

What is great about IT as a community is just that: we are a community of people who have either done these things before, or have done portions of them before, and are often willing to share.    Add to that the fact that we’re also a sharing-is-caring kind of community that will often share our answers to complex problems, or work actively to help solve a particular problem.  I’m actually really proud to be a part of IT and of how much we want each other to succeed, while we continually fight the same problems irrespective of the size of our organizations or where in the world we are.

 

I’m sure every single one of us who has a kitchen has a cookbook or two on the shelf, with recipes and ‘answers’ to that inevitable question of how we go about making certain foods or meals.   What are some of the recipes you’ve discovered or solved over the course of your career to help bring about success?  I personally always enjoyed taking complex scripts for managing VMware and converting them into ‘one-liners’ that were easy to understand and manipulate, both as a way for others to learn how to shift and change them, and so I could run reports that were VERY specific to my needs at the moment while managing hundreds and thousands of datacenters.

 

I’d love it if you’d share some of your own personal stories, recipes, or solutions, and whether this analogy has been helpful, if not for explaining what we do in IT to family who may not understand, then maybe for cracking the code on your next systems management or process challenge!

 

(For a link to my One-Liners post check out; PowerCLI One-Liners to make your VMware environment rock out! )

vinod.mohan

Application Apocalypse

Posted by vinod.mohan Jun 13, 2016

Apologies for the doomsday reference, but I think it’s important to draw attention to the fact that business-critical application failures are creating apocalyptic scenarios in many organizations today. As businesses become increasingly reliant on the IT infrastructure for hosting applications and Web services, the tolerance for downtime and degraded performance has become almost nil. Everyone wants 100% uptime with superior performance of their websites. Whether applications are hosted on-premises or in the cloud, whether they are managed by your internal IT teams or outsourced to managed service providers, maintaining high availability and application performance is a top priority for all organizations.

 

Amazon Web Services® (AWS) experienced a massive disruption of its services in Australia last week. Across the country, websites and platforms powered by AWS, including some major banks and streaming services, were affected. Even Apple® experienced a widespread outage in the United States last week, causing popular services, including iCloud®, iTunes® and iOS® App Store to go offline for several hours.

 

I’m not making a case against hosting applications in the cloud versus hosting on-premises. Regardless of where applications are running—on private, public, hybrid cloud, or co-location facilities—it is important to understand the impact and aftermath of downtime. Take a look at these statistics that give a glimpse of the dire impact due to downtime and poor application performance:

  • The average hourly cost of downtime is estimated to be $212,000.
  • 51% of customers say slow site performance is the main reason they would abandon a purchase online.
  • 79% of shoppers who are dissatisfied with website performance are less likely to buy from the same site again.
  • 78% of consumers worry about security if site performance is sluggish.
  • A one-second delay in page response can result in a 7% reduction of conversions.

*All statistics above sourced from Kissmetrics, Brand Perfect, Ponemon Institute, Aberdeen Group, Forrester Research, and IDG.

 

Understanding the Cost of Application Downtime

  • Financial losses: As seen in the stats above, customer-facing applications that perform unsatisfactorily affect online business and potential purchases, often resulting in customers taking their business to other competitors.
  • Productivity loss: Overall productivity will be impacted when applications are down and employees are not able to perform their job or provide customer service.
  • Cost of fixing problems and restoring services: IT departments spend hours and days identifying and resolving application issues, which involves labor costs, and time and effort spent on problem resolution.
  • Dent in brand reputation: When there is a significant application failure, customers will start having a negative perception of your organization and its services, and lose trust in your brand.
  • Penalty for non-compliance: MSPs with penalty clauses included in service level agreements will incur additional financial losses.

 

Identifying and Mitigating Application Problems

Applications are the backbone of most businesses. Having them run at peak performance is vital to the smooth execution of business transactions and service delivery. Every organization has to implement an IT policy and strategy to:

  1. Implement continuous monitoring to proactively identify performance problems and indicators.
  2. Identify the root cause of application problems, apply a fix, and restore services as soon as possible, while minimizing the magnitude of damage.

 

It is important to have visibility, and monitor application health (on-premises and in the cloud) and end-user experience on websites, but it is equally important to monitor the infrastructure that supports applications, such as servers, virtual machines, storage systems, etc. There are often instances where applications perform sluggishly due to server resource congestion or storage IOPS issues.

 

Share your experience of dealing with application failures and how you found and fixed them.

 

Check out this ebook "Making Performance Pay - How To Turn The Application Management Blame Game Into A New Revenue Stream" by NEXTGEN, an Australia-based IT facilitator, in collaboration with SolarWinds.


Wendy Abbott

6/9 Atlanta SWUG Recap

Posted by Wendy Abbott Administrator Jun 13, 2016

On Thursday, we had our first Atlanta #SWUG (SolarWinds User Group).

THWACKsters from the greater Atlanta area and as far as New Jersey (shout out to sherndon1!) came together to celebrate THWACK’s 13th birthday & to unite in our love of SolarWinds.

 


 

We did things a little differently this time by starting with a live announcement broadcasted to the THWACK homepage and our SolarWinds Facebook page.

In case you missed it, the live announcement & our gift to you is:

1) #THWACKcamp is coming back (September 14th & 15th)!

2) Registration is now open>> THWACKcamp 2016

3) If you register during the month of June you will be entered to win a trip to come to Austin for the live event for you + 1! (USA, CAN, UK & Germany are eligible)

 

Here’s a brief recap of the presentations & resources from the SWUG:

 

MC for this SWUG: patrick.hubbard, SolarWinds Head Geek

 

'NetPath™: How it works'

cobrien, Sr. Product Manager (Networks)

Slides: https://s3-us-west-2.amazonaws.com/swug/Atlanta/How+NetPath+Works.pptx

 

Chris O’Brien gave an overview and live demonstration of NetPath (a feature in the latest release of NPM 12). See it in action:

 

 

'Systems Product Roadmap Discussion + Q&A'

stevenwhunt, Sr. Product Manager (Systems)

 

What we’re working on for Systems products:

 

'Customer Spotlight: Custom Alerts for your Environment'

njoylif, SolarWinds MVP

Slides: https://s3-us-west-2.amazonaws.com/swug/Atlanta/RH_SWUG.pptx

 

Larry did an excellent job of talking about all the ways you can leverage custom properties in your environment.

He also connected the dots and demonstrated how alerting and custom properties go hand in hand.

 


 

'SolarWinds Best Practices & Thinking Outside of the Box'

Dez, Technical Product Manager

KMSigma, Product Manager

 

Reach out to Dez or KMSigma for additional questions on the topics discussed:

  1. SAM as UnDP and TCP port monitoring
     • OIDs, and MIBs, and SNMP – oh, my.
  2. NCM approval process with RTN for security needs
     • Manager & Multistage Approval, Real-time Notification, and Compliance
  3. Alerting best practices
     • Leveraging custom properties to reduce alert noise
  4. NetFlow alerting with WPM
     • Using WPM for alerts that are not currently available within NTA
  5. Optimizing your SolarWinds installation
     • Building Orion servers based on role
  6. Upgrade Advisor
     • Upgrade paths simplified

 

'Customer Spotlight: HTML is Your Friend: Leveraging HTML in Alert Notifications'

bsciencefiction.tv, SolarWinds MVP

mikegale

adamlboyd (Not in attendance, but may be able to help with any follow up questions)

Slides: https://s3-us-west-2.amazonaws.com/swug/Atlanta/SWUG_Presentation_HTML.pptx

 

Kevin & Michael did a great job of showing us how they are using HTML to customize their alerts.

Specifically, they showed us how they were able to customize their email alerts which solved for:

  • Teams paying attention to their alerts
  • Easy to digest
  • Easy to recognize issues
  • HTML 5 is responsive on Mobile so their alerts look good on different devices.


 

SolarWinds User Experience "Dear New Orion User..."

meech, User Experience

 

 

Thank you to everyone who attended the event! We enjoyed meeting and talking with each of you.

We hope you’ll keep the conversation going on this post and on THWACK.

 

And last (but not least), thank you to our sponsor and professional services partner, Loop1Systems (BillFitz_Loop1) for hosting the happy hour!

Additional information on Loop1Systems: https://s3-us-west-2.amazonaws.com/swug/Atlanta/Loop1_SWUG.pptx

 

**If you left without filling out a survey, please help us out by telling us how we can make SWUGs even better>> http://thwack.SWUG-Feedback.sgizmo.com/s3/

 

If you are interested in hosting a SWUG in your area, please fill out a host application here>> http://thwack.swug-host.sgizmo.com/s3/

 

If you’re attending Cisco Live this year, RSVP to attend the SWUG we’re hosting during the event!

Virtualization admins have many responsibilities and wear many IT hats. These responsibilities can be aggregated into three primary buckets: planning, optimizing, and maintaining. Planning covers design decisions as well as discovery of virtual resources and assets. Optimizing encompasses tuning your dynamic virtual environment so that it's always running as efficiently and effectively as possible, given all the variables of your applications, infrastructure, and delivery requirements. Maintaining is creating the proper alerts and intelligent thresholds to match your constantly changing and evolving virtual data center. To do all these things well at various scales, virtualization admins need to monitor with discipline and a proper tool.

 

And sometimes you need a helping hand. Join me and jhensle on June 15th at 1PM CT as we cover planning, optimization, and maintenance with Virtualization Manager to deliver optimal performance in your virtual data center.

 


sqlrockstar

The Actuator - June 8th

Posted by sqlrockstar Employee Jun 8, 2016

Summer is here! Well, for those of us in the Northern hemisphere at least. The unofficial start of summer for many of us here is the last weekend in May. For me, the official start to summer is when I first place my feet into the ocean surf after the last weekend in May. And since that happened this past weekend, I declare summer open for business.

 

Anyway, here is this week's list of things I find amusing from around the Internet...

 

How every #GameOfThrones episode has been discussed on Twitter

Because apparently this is a show worth watching, I guess, but I haven't started yet. It's about a throne or something?

 

Mark Zuckerberg's Twitter and Pinterest password was 'dadada'

Makes sense, since Zuckerberg doesn't care much about our privacy and security, that he probably didn't care too much about his own.

 

It's Time To Rethink The Password. Yes, Again

Yes, it's worth reminding all of you about data security again, because it seems that the message isn't getting through, yet.

 

Edward Snowden: Three years on

Since I'm already on the security topic, here's the obligatory mention of Snowden.

 

NFL Players' Medical Information Stolen

And a nice reminder that this is the league that cannot figure out how to accurately measure the air pressure in footballs, so don't be surprised they don't know how to encrypt laptops.

 

Where people go to and from work

Wonderful visualization of where people commute to and from work around the USA.

 

Solar FREAKIN' Roadways!

In case you haven't seen this yet, and since Summer is here, a nice reminder of how we could be using technology to raise our quality of life.

 

Here's to the start of Summer, a sunset at Watch Hill last weekend:


Every toolbox has a tool that is used on problems even though it's well past retirement age. Perhaps it's an old rusty screwdriver that has a comfortable handle. Or a hammer with a broken claw that has driven hundreds of nails. We all have tools that we fall back on even when better options are available. In the world of building and repairing physical things that means the project will take more time. But in the networking world old tools can cause more problems than they solve.

Traceroute is a venerable tool used for collecting information about network paths. Van Jacobson created it in 1987 to help measure the paths that packets take in the network and the delays along those paths. It sends UDP packets with increasing Time To Live (TTL) values, starting at 1, so that each router along the path in turn expires the packet and sends a special ICMP message back to the source. These ICMP messages help the originating system build a table of the hops in the path as well as the latency to each hop.

Traceroute has served the networking world well for a number of years. It has helped many of us diagnose issues in service provider networks and figure out routing loops. It's like the trusty screwdriver or hammer in the bottom of the toolbox that's always there waiting for us to use them to solve a problem. But Traceroute is starting to show its age when it comes to troubleshooting.

  • Traceroute requires ICMP messages to be enabled on the return path to work correctly. The ICMP Time Exceeded message is the way that Traceroute works magic. If a firewall in the path blocks that message from returning, everything on the other side of that device is a black hole as far as Traceroute is concerned. Even though networking professionals have been telling security pros for years to keep ICMP enabled on firewalls, there are still a surprising number of folks that turn it off to stay "safe".
  • Traceroute doesn't work well with multiple paths. Traceroute was written in the day when one path was assumed to exist between two hosts. That works well when you have a single high speed path. But today's systems can take advantage of multiple paths to different devices across carrier networks. The devices in the middle can find the most optimum path for traffic and steer it in that direction. Traceroute is oblivious to path changes.
  • Traceroute can only provide two pieces of information. As long as all you really want to know about a network is a hop-by-hop analysis of a path and the delay on that path, Traceroute is your tool. But if you need other information about that path, like charting the latency over time or knowing the best time to pick a specific path through the network, then Traceroute's utility becomes significantly limited.

In a modern network, we need more information than we can get from a simple tool like Traceroute. We need to collect all kinds of data that we can't get from a thirty-year-old program written to solve a specific kind of issue. What we need is a better solution built into something that can collect the data we need from multiple points in the network and give us instant analysis about paths and how our packets should be getting from Point A to Point B as quickly as possible.

That would be something, wouldn't it?

Full disclosure: In my work life, I spend my time at OnX Enterprise Solutions, where we’re a key partner with HPE (Hewlett Packard Enterprise). The following are some thoughts regarding some of the aggressive steps recently taken by Meg Whitman and the board. To me, these seem bold and decisive. They have been pivoting to set themselves up for the future of IT.

 

 

  1. The changing of the focus for Helion. In recent travel and conversations with customers, I’ve heard a distinct misconception regarding the stance of Hewlett Packard Enterprise and the Helion Platform. The misconception arose when HPE made the statement that Helion would no longer be a Public Cloud platform. The goal here was not to do away with the OpenStack approach, but rather to focus those efforts on Hybrid and Private solutions.
    1. Helion is not just a fully supported OpenStack solution, but rather an entire suite of products based on the Helion OpenStack platform. Included are products like Eucalyptus, Cloud Service Automation, Content Depot, Helion CloudSystem and quite a few more. For further information on the suite, check out this link.
    2. The key here is that while the press stated that Helion was done-for, the fact is that HPE simply narrowed the market for the product. Instead of necessarily competing with the likes of IBM, Amazon, and Microsoft, it now becomes a more secure and functional extension of, or replacement for, your datacenter’s approach.
  2. The splitting of the organization from one umbrella company into two distinct elements: HPE and HP Inc. The former entails all enterprise-class solutions, including storage, servers, and enterprise applications; the latter supplies printers, laptops, and, most profitably, supplies like printer ink and toner.
    1. The split offers up the opportunity for salespeople and technical resources to align with the proper silos of internal resources, thereby expediting the dissemination of information about offerings through their channels.
    2. There’s little doubt that the split of the organization creates a level of operational and cost efficiency with which the company had some struggles.
    3. And, just within the last couple days, the Professional Services organization has made a fairly significant announcement to join in with CSC (Computer Sciences Corporation) professional services to become if not a single organization, at minimum a fully collaborative one. As many know, the services arm of HPE was bolstered by the addition of EDS in 2009. While I’m not sure that this will have a lot of impact on the quality of the professional services delivered, it leaves very little doubt that the sheer magnitude of resources from which to draw will be quite a bit larger.
  3. The emergence of Converged and HyperConverged as well as Composable Infrastructure platforms as a strategic move forward.
    1. Converged platforms like vBlock and FlexPod have been around for years. In addition, the more pod-like solutions like Simplivity and Nutanix, as well as many others have become quite popular. HPE has been working diligently on these platforms, with a more full-scope approach. Whether your organization has one of the smaller 250’s or the larger 380 series hyperconverged devices, and/or some of the larger systems, they’ve been designed to exist within a full ecosystem of product. All part of one environment. The HPE convergence platforms have come together quite nicely as an overall “Stack” for infrastructure regardless of sizing. Augmented by the utilization of HPE’s StorVirtual (the old LeftHand product, which has been expanded quite a bit in the last couple years), OpenView (For all intents and purposes, the most robust, along with the most mature tool of its ilk in the space), these offerings make the creation, deployment and management of physical and virtual machines a much more agile process.
    2. HPE Synergy, the HPE approach to a fully composable infrastructure, seeks to extend the model of “frictionless” IT to leverage a DevOps approach. Scripting is built in, fully customizable, and with plans to offer up a “GitHub”-like approach to the code involved. For a full overview, refer to this webpage. Synergy will involve a series of both hardware and software components dedicated to the purpose, so that it can serve as a recipe for large to huge enterprises, supporting the entire infrastructure from soup to nuts. Again, I’ve not seen a portfolio of products that rivals this set of solutions for a global approach to the data center. I have seen a number of attempts to move in this direction, and feel that we’re finally achieving the x86 mainframe approach that has been directional in enterprise infrastructure for years.

 

All in all, I would have to say that I’m impressed with the way that HPE, which in past years has not always been seen as taking the most forward-thinking approaches, is making strategic maneuvers toward the betterment of the company. They’re deep in the market space, with R&D in practically every area of IT. OpenStack, all-flash arrays, hybrid storage, federated management, backup and recovery, new memory tech like 3D XPoint, and many other pieces of the technology landscape are being explored, utilized, and incorporated into the product set. I feel that this venerable company is proving that the startup world is not the only place where innovation occurs in IT, and that a large organization can accomplish some amazing things as well.

Today’s federal IT infrastructure is built in layers of integrated blocks, and users have developed a heavy reliance on well-functioning application stacks. App stacks are composed of application code and all of the software and hardware components needed to effectively and reliably run the applications. These components are very tightly integrated; if one area has a problem, the whole stack is affected.

 

It’s enough to frustrate even the most hardened federal IT manager. But don’t lose your cool. Instead, take heart, because there are better ways to manage this complexity. Here are areas to focus on, along with suggested methodologies and tools:

 

1. Knock down silos and embrace a holistic viewpoint.

 

Thanks to app stacks, the siloed approach to IT is quickly becoming irrelevant. Instead, managing app stacks requires realizing that each application serves to support the entire IT foundation.

 

That being said, you’ll still need to be able to identify and address specific problems when they come up. But you don’t have to go it alone; there are tools that, together, can help you get a grasp on your app stack.

 

2. Dig through the code and use performance monitoring tools to identify problems.

 

There are many reasons an app might fail. Your job is to identify the cause of the failure. To do that you’ll need to look closely at the application layer and keep a close eye on key performance metrics using performance monitoring tools. These tools can help you identify potential problems, including memory leaks, service failures and other seemingly minor issues that can cause an app to nosedive and take the rest of the stack with it.

 

3. Stop manually digging through your virtualization layers.

 

It’s likely that you have virtualization layers buried deep in your app stack. These layers probably consist of virtual machines that are frequently migrated from one physical server to another and storage that needs to be reprovisioned, reallocated and presented to servers.

 

Handling this manually can be extremely daunting, and identifying a problem in this environment can seem impossible. Consider integrating an automated VM management approach with the aforementioned performance monitoring tools to gain complete visibility of these key app stack components.

 

4. Maximize and monitor storage capabilities.

 

Storage is the number one catalyst behind application failures. The best approach here is to ensure that your storage management system helps you monitor performance, automate capacity management, and report regularly so you can ensure applications continue to run smoothly.

 

You’ll be able to maintain uptime, leading to consistent and perhaps increased productivity throughout the organization. And you’ll be able to keep building your app stack – without the fear of it all tumbling down.

 

Find the full article on Government Computer News.


It's hard to contain my excitement or stop wishing away the hours until my flight tomorrow, as I prepare for DevOpsDays DC this week.

 

After all the discussions from my trip to Interop, I've got my convention go-bag ready. The venue is awesome. The schedule looks stellar. And we were even able to throw some free tickets your way so we could be sure SolarWinds was represented in the audience (looking at you, Peter!).

 

Whenever we venture out to a DevOps Days get-together, the conversation among the team is, "What stories do we want to tell?" The truth is, each of us has our own personal stories that we gravitate to. I like to talk about the THWACK community and how the time has come for monitoring to become its own specialty within IT, like storage, networking, or infosec. Connie has a great time discussing ways our monitoring tools can be leveraged for DevOps purposes. And, of course, Patrick spends his DevOps Days spreading the gospel truth about SWIS, SWQL, and the Orion SDK.

 

However, this week, the story is going to be NPM 12. It really couldn't be about anything else, when you think about it.

 

It's the story I want to tell because it fills a gap that I think will resonate with the DevOps community: As you strive to create ever more scalable systems that can be deployed quickly and efficiently - which almost by definition includes cloud and hybrid-cloud deployments - how can you be sure that everything in between you (and the user) and the system is operating as expected?

 

When everything was on-prem that question was, if not easy to ascertain, at least simple to define. A known quantity of systems all under the company's control had to be monitored and managed.


But how do we manage it now, when so much relies on systems that sit just past the ISP demarc?

 

As you know, NPM 12 and NetPath address exactly that question. And hopefully the attendees at DevOps Days DC will appreciate both the question and the answer we're providing.

 

For now, I'm biding my time, packing my bags, and looking wistfully at the door wondering if time will keep dragging along until I can get on the road and start telling my story.
