
Geek Speak


The public sector frequently provides services and information via websites, and it’s important that these websites are up and running properly. And that’s not just for citizen-facing websites. Federal IT managers face the same challenge with internal sites such as intranets and back-end resource sites.

 

So what can federal IT pros do to keep ahead of the challenge, catch critical issues before they impact the user, and keep external and internal sites running at optimal performance?

 

The answer is three-fold:

 

  1. Monitor key performance metrics on the back-end infrastructure that supports the website.
  2. Track customer experience and front-end performance from the outside.
  3. Integrate back- and front-end information to get a complete picture.

 

Performance monitoring

 

Federal IT pros understand the advantages of standard performance monitoring, but monitoring in real time is just not enough. To truly optimize internal and external site performance, the key is to have performance information in advance.

 

This advance information is best gained by establishing a baseline, then comparing activity against that standard. With a baseline in place, the system can be configured to alert when activity strays from the baseline, so troubleshooting can start immediately and the root cause can be uncovered before it impacts customers. By anticipating an impending usage spike that will push capacity limits, the IT team can act proactively and avoid a slowdown.
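
As a simple illustration of baseline-driven alerting, here is a minimal Python sketch; the metric, history, and threshold are invented for illustration, and real monitoring platforms compute baselines per metric and per time of day.

```python
from statistics import mean, stdev

def check_against_baseline(samples, current, threshold=3.0):
    """Flag a metric value that strays too far from its historical baseline.

    `samples` is a list of historical readings (e.g., response times in ms),
    `current` is the latest reading, and `threshold` is how many standard
    deviations from the mean we tolerate before alerting.
    """
    baseline = mean(samples)
    spread = stdev(samples)
    if spread == 0:
        return current != baseline, baseline
    deviation = abs(current - baseline) / spread
    return deviation > threshold, baseline

# Example: a handful of hourly response-time readings (ms) and a new sample.
history = [120, 118, 125, 130, 122, 119, 127, 124]
is_anomalous, baseline = check_against_baseline(history, 310)
if is_anomalous:
    print(f"ALERT: response time 310 ms deviates from baseline {baseline:.0f} ms")
```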

 

That historical baseline will also help allocate resources more accurately and enable capacity planning. Capacity planning analysis lets IT managers configure alerts that fire when historical trends point toward future capacity limits.

 

Automation is also a critical piece of performance monitoring. If the site crashes over the weekend, automated tools can restart it and send an alert once it is back up so the team can begin troubleshooting.

 

End-user experience monitoring

 

Understanding the customer experience is a critical piece of ensuring optimal site performance. Let’s say the back-end performance looks good, but calls are coming in from end-users that the site is slow. Ideally, IT staff would be able to mimic a user’s experience, from wherever that user is located, anywhere around the world. This allows the team to isolate the issue to a specific location.
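
One low-tech way to approximate this kind of outside-in check is to script timed HTTP probes from each location of interest. The sketch below uses Python's requests library; the URLs are placeholders, and commercial tools add full browser rendering, scripted transactions, and geographically distributed probes.

```python
import requests  # third-party HTTP library (pip install requests)

# Placeholder URLs; in practice you would run this probe from several
# locations (branch offices, cloud regions) and compare the timings.
SITES = ["https://intranet.example.gov/", "https://www.example.gov/"]

def probe(url, timeout=10):
    """Return (status_code, seconds) for a single synthetic page request."""
    resp = requests.get(url, timeout=timeout)
    return resp.status_code, resp.elapsed.total_seconds()

for url in SITES:
    try:
        status, seconds = probe(url)
        print(f"{url}: HTTP {status} in {seconds:.2f}s")
    except requests.RequestException as exc:
        print(f"{url}: FAILED ({exc})")
```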

 

It is important to note that federal IT pros face a unique challenge in monitoring the end-user experience. Many monitoring tools are cloud-based, and therefore cannot reach sites behind the firewall. If this is the case, be sure to find a tool that works inside the firewall and monitors internal and external sites equally well.

 

Data integration

 

The ultimate objective is to bring all this information together to provide visibility across the front end and back end alike, so you know where to start looking for any anomaly, no matter where it originates.

 

The goal is to improve visibility in order to optimize performance. The more data IT pros can muster, the greater their power to optimize performance and provide customers with the optimal experience.

 

Find the full article on Government Computer News.

networkautobahn

Hard to monitor

Posted by networkautobahn Jul 3, 2016

With monitoring, we try to achieve end-to-end visibility for our services. So everything that runs business-critical applications needs to be watched. For the usual suspects like switches, servers, and firewalls we have great success with that. But in every environment you have these black spots on the map that nobody is taking care of. There are two main reasons why something is not monitored: the organisational ("not my department") and the technical.

 

 

 

Not my Department Problem

In IT, the different departments sometimes look after only the devices they are responsible for. Nobody has established a view over the complete infrastructure. That silo mentality ends up with a lot of finger pointing and ticket ping-pong. Even more problematic are devices that are under the control of a 3rd-party vendor or non-IT people. For example, the power supply of a building is the responsibility of the facility management. In the mindset of the facility management, monitoring has a completely different meaning to the one we have in IT. We have built up fully redundant infrastructures. We have put a lot of money and effort into making sure that every device has a redundant power supply. Only to find that it all ends up in a single power cord that runs to a single diesel power generator that was built in the 1950s. The facility management's idea of monitoring is to walk to the generator twice a day and take a look at the front panel of the machine.

 

 

TECHNICALLY HARD TO MONITOR

And then you have the technical problems that can be a reason why something is not monitored. Here are some examples of why it is sometimes hard to implement monitoring from a technical perspective. Ancient devices: like the aforementioned diesel power generator, there are old devices that come from an era without any connectors that can be used for monitoring. Or it is a very old Unix or host machine. I have found all sorts of tech that was still important for a specific task. If it couldn't be decommissioned, it is still a dependency for a needed application or task. And if it is still that important, then we have to find a way to monitor it. Ideally we find a way to connect to it like we do with SNMP or an agent. If the device simply supports none of these connections, we can try to watch the service that is delivered through the device, or add an extra sensor that can be monitored. In the example of the power generator, maybe we cannot watch the generator directly, but we can insert a device like a UPS that can be watched over SNMP and shows the current power output. With intelligent PDUs in every rack you can achieve even more granularity on the power consumption of your components. Often all the components of a rack are changed nearly every two years, but the rack and the power strip have been in use for 10+ years. The same is true for the cooling systems. There are additional sensor bars available that feed your monitoring with data in case the cooling plant cannot deliver that data itself. With good monitoring you can react before something happens.
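
To make the indirect approach concrete, here is a minimal Python sketch that polls a UPS for its output power over SNMP using the pysnmp library. It assumes a UPS that implements the standard UPS-MIB (RFC 1628); the address, community string, and OID are illustrative and should be verified against your own hardware.

```python
# Poll a "hard to monitor" dependency indirectly: instead of the generator
# itself, query the UPS in front of it over SNMP and feed the value into
# your monitoring baseline.
from pysnmp.hlapi import (
    getCmd, SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity,
)

UPS_HOST = "10.0.0.50"                            # placeholder address
OUTPUT_POWER_OID = "1.3.6.1.2.1.33.1.4.4.1.4.1"   # upsOutputPower, line 1 (verify for your device)

error_indication, error_status, _, var_binds = next(
    getCmd(
        SnmpEngine(),
        CommunityData("public", mpModel=1),        # SNMP v2c, placeholder community
        UdpTransportTarget((UPS_HOST, 161)),
        ContextData(),
        ObjectType(ObjectIdentity(OUTPUT_POWER_OID)),
    )
)

if error_indication or error_status:
    print(f"SNMP poll failed: {error_indication or error_status}")
else:
    for var_bind in var_binds:
        # Prints "<oid> = <value>"; the value is the current output power in watts.
        print(" = ".join(x.prettyPrint() for x in var_bind))
```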

 

 

IT IS PASSIVE

Another case is passive technologies like CWDM/DWDM or antennas. These can only be monitored indirectly, through other components that are capable of proper monitoring. With GBICs that have an active measurement/digital diagnostics (DDM) interface, you have access to real-time data that can be fed into the monitoring. Once you have this data in your monitoring, you have a baseline and know what the attenuation across your CWDM/DWDM fibres should look like. As a final thought, try to take a step back and figure out what is needed so that your services can run. Think in all directions and take nothing as a given. Include everything you can think of, from climate and power to all the dependencies of storage, network, and applications. And with that in mind, take a look at the monitoring and check whether you cover everything.

In previous posts, I've talked about the importance of having a network of trusted advisors. I've also discussed the importance of honing your DART-SOAR skills. Now I'd like us to explore one of those soft and cloudy topics that every IT professional deals with, but is reluctant to address directly. And that is the business of breaking down personal silos of inefficiency, particularly as it pertains to IT knowledge and expertise.

 

As an IT professional, I tend to put all the pressure of knowing and doing everything on myself, aka Team Me. I've been rewarded for this behavior, but it has also proven to be ineffective at times. This is because the incentives could influence me to not seek help from anyone outside the sphere of me.


The majority of my struggle was trust-related: I thought that admitting I knew little or nothing about something would be a sign of weakness. Oh, how naïve my green, professional self was. This modus operandi did harm to me, my team, and my organization, because its inefficiencies created friction where there didn’t need to be any.

 

It wasn’t until I turned Me into We that I started truly owning IT. Believing in that core tenet and putting it into practice opened doors to new communities, industry friends, and opportunities. I was breaking down silos by overcoming the restrictions I had placed on myself. I was breaking my mold, learning cool new stuff, and making meaningful connections with colleagues who eventually became friends.

 

It reminds me of my WoW days. I loved playing a rogue and being able to pick off opponents in PvP Battlegrounds. But I had to pick my battles, because though I could DPS the life out of you, I didn’t have the skills to self-heal over time, or tank for very long. So engagements had to be fast and furious. It wasn't until I started running in a team with two druids (a tank and a healer), that we could really start to own our PvP competition. My PvP teammates also played rogues and shared their tips and tricks, which included Rogue communities with game play strategies. As a result, I really learned how to optimize my DPS and my other unique set of skills toward any given goal.


Do you stand on the IT front and try to win alone? Have you found winning more gratifying when you win as a team? Let me know in the comment section below.

In my previous post, I reviewed the 5 Infrastructure Characteristics that will be included as a part of a good design. The framework is laid out in the great work IT Architect: Foundations in the Art of Infrastructure Design. In this post, I’m going to continue that theme by outlining the 4 Considerations that will also be a part of that design.

 

While the Characteristics could also be called “qualities” and can be understood as a list of ways by which the design can be measured or described, Considerations could be viewed as the box that defines the boundaries of the design. Considerations set things like the limits and scope of the design, as well as explain what the architect or design team will need to be true of the environment in order to complete the design.

 

Design Considerations

I like to think of the four considerations as the four walls that create the box that the design lives in. When I accurately define the four different walls, the design that goes inside of them is much easier to construct. There are fewer “unknowns,” and I leave myself less exposed to faults or holes in the design.

 

Requirements – Although they’re all very important, I would venture to say that Requirements is the most important consideration. “Requirements” is a list – either identified directly by the customer/business or teased out by the architect – of things that must be true about the delivered infrastructure. Some examples listed in the book are a particular Service Level Agreement metric that must be met (like uptime or performance) or governance or regulatory compliance requirements. Other examples I’ve seen include usability/manageability requirements dictating how the system(s) will be interfaced with, or a requirement that a certain level of redundancy must be maintained – for example, that the configuration must allow for N+1, even during maintenance.

 

Constraints – Constraints are the considerations that determine how much liberty the architect has during the design process. Some projects have very little in the way of constraints, while others are extremely narrow in scope once all of the constraints have been accounted for. Examples of constraints from the book include budgetary constraints or the political/strategic choice to use a certain vendor regardless of other technically possible options. More examples that I’ve seen in the field include environmental considerations like “the environment is frequently dusty and the hardware must be able to tolerate poor environmentals” and human resource constraints like “it must be able to be managed by a staff of two.”

 

Risks – Risks are the architect’s tool for vetting a design ahead of time and showing the customer/business the potential technical shortcomings of the design imposed by the constraints. It also allows the architect to show the impact of certain possibilities outside the control of either the architect or the business. A technical risk could be that N+1 redundancy actually cannot be maintained during maintenance due to budgetary constraints. In this case, the risk is that a node fails during maintenance and puts the system into a degraded (and vulnerable) state. A less technical risk might be that the business is located within a few hundred yards of a river, and flooding could cause a complete loss of the primary data center. When risks are purposely not mitigated in the design, listing them shows that the architect thought through the scenario, but due to cost, complexity, or some other business justification, the choice has been made to accept the risk.

 

Assumptions – For lack of a better term, an assumption is a C.Y.A. statement. Listing assumptions in a design shows the customer/business that the architect has identified a certain component of the big picture that will come into play but is not specifically addressed in the design (or is not technical in nature). A fantastic example listed in the book is an assumption that DNS infrastructure is available and functioning. I’m not sure if you’ve tried to do a VMware deployment recently, but pretty much everything beyond ESXi will fail miserably if DNS isn’t properly functioning. Although a design may not include specifications for building a functioning DNS infrastructure, it will certainly be necessary for many deployments. Calling it out here ensures that it is taken care of in advance (or in the worst case, the architect doesn’t look like a goofball when it isn’t available during the install!).

 

If you work these four Considerations (and the 5 Characteristics I detailed in my previous post) into any design documentation you’re putting together, you’re sure to have a much more impressive design. Also, if you’re interested in working toward design-focused certifications, many of these topics will come into play. Specifically, if VMware certification is of interest to you, VCIX/VCDX work will absolutely involve learning these factors well. Good luck on your future designs!

sqlrockstar

The Actuator - June 29th

Posted by sqlrockstar Employee Jun 29, 2016

Well, Britain has voted to leave the EU. I have no idea why, or what that means other than my family vacation to London next month just got a whole lot cheaper.

 

Anyway, here is this week's list of things I find amusing from around the Internet. Enjoy!

 

EU Proposal Seeks To Adjust To Robot Workforce

Maybe this is why the UK wanted to leave, because they don't want their robots to be seen as "electronic persons with specific rights and obligations."

 

Real-time dashboards considered harmful

This is what adatole and I were preaching about recently. To me, a dashboard should compel me to take action. Otherwise it is just noise.

 

Many UK voters didn’t understand Brexit, Google searches suggest

I won't pretend to know much about what it means, either. I'm hoping there will be a "#Brexit for Dummies" book available soon.

 

UK Must Comply With EU Privacy Law, Watchdog Argues

A nice example of how the world economy, and corporate business, is more global than people realize. Just because Britain wants to leave the EU doesn't mean they won't still be bound by EU rules should they wish to remain an economic partner.

 

Hacking Uber – Experts found dozen flaws in its services and app

Not sure anyone needed more reasons to distrust Uber, but here you go.

 

History and Statistics of Ransomware

Every time I read an article about ransomware I take a backup of all my files to an external drive because as a DBA I know my top priority is the ability to recover.

 

Blade Runner Futurism

If you are a fan of the movie, or sci-fi movies in general, set aside the time to read through this post. I like how the film producers tried to predict things like the cost of a phone call in the future.

 

Here's a nice reminder of the first step in fixing any issue:

 


The Pareto Principle

 

The Pareto principle, also known as the 80-20 principle, says that 20% of the issues will cause you 80% of the headaches. This principle is also known as The Law of the Vital Few. In this post, I'll describe how the Pareto principle can guide your work to provide maximum benefit. I'll also describe a way to question the information at hand using a technique known as 5 Whys.

 

The 80-20 rule states that when you address the top 20% of your issues, you'll remove 80% of the pain. That is a bold statement. You need to judge its accuracy yourself, but I've found it to be uncannily accurate.

 

The implications of this principle can take a while to sink in. On the positive side, it means you can make a significant impact if you address the right problems. On the down side, if you randomly choose which issues to work on, it's quite likely you're working on a low-value problem.

 

Not quite enough time

 

When I first heard of the 80-20 rule I was bothered by another concern: What about the remaining problems? You should hold high standards and strive for a high-quality network, but maintaining the illusion of a perfect network is damaging. If you feel that you can address 100% of the issues, there's no real incentive to prioritize. I heard a great quote a few months back:

 

     "To achieve great things, two things are needed; a plan, and not quite enough time." - Leonard Bernstein

 

We all have too much to do, so why not focus our efforts on the issues that will produce the most value? This is where having Top-N reports from your management system is really helpful. Sometimes you need to see the full list of issues, but only occasionally. More often, this restricted view of the top issues is a great way to get started on your Pareto analysis.
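
If you want to run this kind of Pareto cut over your own top-N data, a minimal Python sketch might look like the following; the issue names and ticket counts are invented purely for illustration.

```python
def pareto_top_issues(issue_impact, coverage=0.80):
    """Return the smallest set of issues whose combined impact reaches `coverage`.

    `issue_impact` maps an issue name to a measure of pain (ticket count,
    minutes of downtime, wasted bandwidth, ...).
    """
    total = sum(issue_impact.values())
    ranked = sorted(issue_impact.items(), key=lambda kv: kv[1], reverse=True)
    selected, running = [], 0.0
    for issue, impact in ranked:
        selected.append((issue, impact))
        running += impact
        if running / total >= coverage:
            break
    return selected

# Made-up ticket counts standing in for a real top-N report.
tickets_per_issue = {
    "printer driver": 120, "VPN drops": 45, "password resets": 30,
    "slow intranet": 15, "monitor flicker": 5, "misc": 10,
}
for issue, count in pareto_top_issues(tickets_per_issue):
    print(f"{issue}: {count} tickets")
```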

 

3G WAN and the 80-20 rule

 

A few years back, I was asked to design a solution for rapid deployment warehouses in remote locations. After an analysis of the options I ran a trial using a 3G-based WAN. We ran some controlled tests, cutting over traffic for 15 minutes, using some restrictive QoS policies. The first tests failed with a saturated downlink.

 

When I analyzed the top-talkers report for the site I saw something odd. It seemed that 80% of the traffic to the site was print traffic. It didn't make any sense to me, but the systems team verified that the shipping label printers use an 'inefficient' print driver.

 

At this point I could have ordered WAN optimizers to compress the files, but we did a 5 Whys analysis instead. Briefly, '5 Whys' is a problem solving technique that helps you identify the true root cause of issues.

 

  • Why is the bandwidth so high? - Printer traffic taking 80% of bandwidth
  • Why is printer traffic such a high percentage? - High volume of large transactions
  • Why is the file size so large? - Don't know - oh yeah we use PostScript (or something)
  • Why can't we use an alternative print format? - We can, let's do it, yay, it worked!
  • Why do we need to ask 5 whys? - We don't, you can stop when you solve the problem

 

The best form of WAN optimization is to suppress or redirect the demand. We don't all have the luxury of a software engineer to modify their code and reduce bandwidth, but in this case it was the most elegant solution. We were able to combine a trial, reporting, top-N and deep analysis with a flexible team. The result was a valuable trial and a great result.

 

Summary

 

Here's a quick summary of what I covered in this post:

 

  • The 80/20 principle can help you get real value from your efforts.
  • Top-N reports are a great starting point to help you find that top 20%.
  • The 5 Whys principle can help you dig deeper into your data and choose the most effective actions.

 

Of course a single example doesn't prove the rule.  Does this principle ring true for you, or perhaps you think it is nonsense? Let me know in the comments.

Let’s face it! We live in a world now with a heavy reliance on software instead of hardware. With Software Defined Everything popping up all over the place, we are seeing traditional hardware-oriented tasks being built into software – this provides an extreme amount of flexibility and portability in how we choose to deploy and configure the various pieces of our environments.

 

With this software management layer taking hold of our virtualized datacenters, we are going through a phase where technologies such as private and hybrid cloud are now within our grasp. As the cloud descends upon us, there is one key player we need to focus on – the automation and orchestration that quietly executes in the background, the key component in providing the flexibility, efficiency, and simplicity that we as sysadmins are expected to deliver to our end users.

 

To help drive home the importance of and reliance on automation, let’s take a look at a simple task – deploying a VM. When we do this in the cloud, mainly the public cloud, it’s just a matter of swiping a credit card, providing some information regarding a name and network configuration, waiting a few minutes or seconds, and away we go. Our end users can have a VM set up almost instantaneously!

 

The ease of use and efficiency of the public cloud in scenarios like the one above is putting extended pressure on IT within their respective organizations – we are now expected to create, deliver, and maintain these flexible, cloud-like services within our businesses, and do so with the same efficiency and simplicity that the cloud brings to the table. Virtualization certainly provides a decent starting point, but it is automation and orchestration that will take us to the finish line.

 

So how do we do it?

 

Within our enterprises I think we can all agree that we don’t simply create a VM and call it “done”! There are many other steps that come after we power up that new VM. We have server naming to contend with, and networking configuration (IP, DNS, firewall, etc.). We have monitoring solutions that need to be configured in order to properly monitor and respond to outages and issues that may pop up, and I’m pretty certain we will want to include our newly created VM in some sort of backup or replication job in order to protect it. With more and more software vendors exposing public APIs, we are now living in a world where it’s possible to tie all of these different pieces of our datacenter together.
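
To illustrate the idea of chaining those post-provisioning steps, here is a toy Python sketch. Every helper in it is a made-up stand-in that just prints what a real integration (hypervisor, IPAM/DNS, monitoring, backup) would do; the point is the orchestration flow, not any particular product's API.

```python
# All function names and values below are invented placeholders.

def create_vm(name, network):
    print(f"[hypervisor] creating VM {name} on {network}")
    return {"name": name, "network": network}

def register_dns_and_ipam(vm):
    ip = "10.20.30.40"  # placeholder; a real IPAM system would allocate this
    print(f"[ipam/dns] registered {vm['name']} -> {ip}")
    return ip

def add_to_monitoring(vm, templates):
    print(f"[monitoring] applied templates {templates} to {vm['name']}")

def add_to_backup_job(vm, job):
    print(f"[backup] added {vm['name']} to job '{job}'")

def provision_vm(name, network):
    """Walk the whole life cycle instead of stopping at 'VM created'."""
    vm = create_vm(name, network)
    register_dns_and_ipam(vm)
    add_to_monitoring(vm, templates=["OS", "Web"])
    add_to_backup_job(vm, job="nightly-replication")
    return vm

provision_vm("web-prod-07", "dmz-vlan-120")
```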

 

Automation and orchestration don’t stop at just creating VMs either – there’s room for them throughout the whole VM life cycle. The concept of the self-healing datacenter comes to mind: having scripts and actions performed automatically by monitoring software in an effort to fix issues within your environment as they occur – this is all made possible by automation.

 

So with this I think we can all conclude that automation is a key player within our environments, but the question always remains – should I automate task X? Meaning, will the time savings and benefits of creating the automation outweigh the effort and resources it will take to create the process? With all this in mind I have a few questions: Do you use automation and orchestration within your environment? If so, what tasks have you automated thus far? Do you have a rule of thumb that dictates when you will automate a certain task? Believe it or not, there are people in this world who are somewhat against automation, whether out of fear for their jobs or simply out of resistance to change – how do you help “push” these people down the path of automation?

Government information technology administrators long have been trained to keep an eye out for the threats that come from outside their firewalls. But what if the greatest threats actually come from within?

 

According to a federal cybersecurity survey we conducted last year, that is a question that many government IT managers struggle to answer. In fact, a majority of the 200 respondents said they believe malicious insider threats are just as damaging as malicious external threats.

 

The threat of a careless user storing sensitive data on a USB drive left on a desk can raise just as much of a red flag as an anonymous hacker. Technology, training and policies must be consistently deployed, and work together, to ensure locked-down security.

 

Technology

 

Manual network monitoring is no longer feasible, and respondents identified tools pertaining to identity and access management, intrusion prevention and detection, and security incident and event management or log management as “top tier” tools to prevent internal and external threats.

 

Each solution offers continuous and automatic network monitoring, and alerts. Problems can be traced to individual users and devices, helping identify the root cause of potential insider threats. Most importantly, administrators can address potential issues far more quickly.

 

However, tools are just that—tools. They need to be supported with proper procedures and trained professionals who understand the importance of security and maintaining constant vigilance. 

 

Training

 

According to the survey, 53 percent of respondents claim careless and untrained insiders are the largest threat at federal agencies, while 35 percent stated “lack of IT training” is a key barrier to insider threat detection. IT personnel should be trained on technology protocols and the latest government security initiatives and policies and receive frequent and in-depth information on agency-specific initiatives that could impact or change the way security is handled throughout the organization.

 

All employees should be aware of the dangers and costs of accidental misuse of agency information or rogue devices. Forty-seven percent of survey respondents stated employee or contractor computers were the most at-risk sources for data loss. Human error often can prove far more dangerous than explicit intent.

 

Policies

 

When it comes to accidental or careless insider threats, 56 percent of survey respondents were somewhat confident in their security policies, while only 31 percent were “very confident.” 

 

Agency security policies, combined with federal policies, serve as a security blueprint and are therefore extremely important. They should plainly outline the agency’s overall security approach and include specific details such as authorized users and use of acceptable devices.

 

As one of the survey respondents said: “Security is a challenge, and the enemy is increasingly sophisticated.” More and more, the enemy attacks from all fronts—externally and internally. Federal IT managers clearly need to be prepared to combat the threat using their own three-pronged attack of technology, training and policies.

 

Find the full article on Signal.

The short answer to the question in the title is NO, backup speed and restore speed are no longer related.  There are a number of reasons why this is the case.

 

Let's go back in time to understand the historical reasons behind this question.  Historically, backup was a batch process sent to a serial device.  Various factors led to the commonly used rule of thumb that restores took 50% to 100% longer than the full backup that created them.  This started with the fact that a restore began with reading the entire full backup, which at a minimum would take the same amount of time as creating it.  Then multiple incremental backups had to be read, each of which added time to the restore due to the time involved in loading multiple tapes.  (It wasn't that long ago that all backups went to tape.)  Also, because backups were sent to tape, it was not possible to do the kind of parallel processing that today's restores are capable of.

 

The first reason why backup and restore speed are no longer related is actually negative.  Today's backups are typically sent to a device that uses deduplication.  While deduplication comes with a lot of benefits, it can also come with one challenge.  The "dedupe tax," as it's referred to, is the difference between a device's I/O speed with and without deduplication.  Depending on how dedupe is done, backup can be much faster than restore and vice versa.

 

The second -- and perhaps more important -- reason why backup and restore speed are unrelated is that backups and restores don't always use the same technology any more.  Where historically both backups and restores were a batch process that simply copied everything from A to B, today's backups and restores can actually be very different from each other.  A restore may not even happen, for example.  If someone uses a CDP or near-CDP product, a "restore" may consist of pointing the production app to the backup version of that app until the production version of that app can be repaired.  Some backup software products also have the ability to do a "reverse restore" that identifies the blocks or files that have been corrupted and only transfer and overwrite those blocks or files.  That would also be significantly faster than a traditional restore.
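
As a simplified, in-memory illustration of how a "reverse restore" might identify and overwrite only the corrupted blocks, consider the Python sketch below; real products work at the volume or object level and with far more sophistication, so treat this purely as a conceptual example.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size

def block_hashes(data):
    """Hash fixed-size blocks so corrupted regions can be identified."""
    return [hashlib.sha256(data[i:i + BLOCK_SIZE]).hexdigest()
            for i in range(0, len(data), BLOCK_SIZE)]

def reverse_restore(current, backup):
    """Overwrite only the blocks that differ from the backup copy."""
    cur = bytearray(current)
    repaired = 0
    for idx, (h1, h2) in enumerate(zip(block_hashes(cur), block_hashes(backup))):
        if h1 != h2:
            start = idx * BLOCK_SIZE
            cur[start:start + BLOCK_SIZE] = backup[start:start + BLOCK_SIZE]
            repaired += 1
    return bytes(cur), repaired

backup_image = bytes(64 * 1024)          # pristine copy: all zeros
corrupted = bytearray(backup_image)
corrupted[5000:5016] = b"X" * 16         # simulate a small corrupted region
restored, blocks = reverse_restore(corrupted, backup_image)
print(f"repaired {blocks} block(s); intact again: {restored == backup_image}")
```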

 

One thing hasn't changed: the only way you can know the speed at which a restore will run is to test it.  Sometimes the more things change the more they stay the same.

It is a general rule to have one backup methodology or product if that is possible.  But it is also true that this is not always possible, or even advisable, in any given situation.

 

The proliferation of virtualization is a perfect example.  For too many years, the virtualization backup capabilities of the leading backup products could be described as anything but "leading."  This led to an entire sub-category of products designed specifically for backing up virtual systems.  Unfortunately, these same products have generally eschewed support for physical systems.  Since most customers have both virtual and physical servers, this ends up requiring them to purchase and manage multiple backup products.

 

Another reason for a customer purchasing multiple backup products may be that they have an application that requires the capabilities of a CDP or near-CDP product.  These products can provide very low-impact backups and extremely fast "instant" restores, so many companies have started using them for applications with tight RTOs and RPOs.  But they're not necessarily ready to replace all of the backup software they've already purchased in favor of this new way of doing backup.  This again leaves them with the requirement to manage multiple products.

 

There are many challenges with using multiple backup products, the first of which is that they all behave differently and should be configured differently.  One of the most common ways to make a backup system work poorly is to configure it as if it were a different product.  TSM and NetBackup couldn't be more different from one another, but many people move from one of these to the other -- and still try to configure the new product like it is the old product.  The solution to this is simple: get training on the new product and consider hiring -- at least temporarily -- the services of a specialist in that product to make sure you are configuring it the way it likes to be configured.

 

Another challenge is that each product reports on how it is performing in different ways.  They may use different metrics, values, and terms.  They also use different delivery mechanisms: one may use email, where another may use XML or HTML to report backup success or failure.  The key here is to use a third-party reporting system that can collect the output of the various products, analyze it, and normalize it into a single reporting view.
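
A normalization layer of that kind can be surprisingly simple at its core. The Python sketch below maps two differently shaped job records into one common schema; the field names are invented stand-ins, not the actual report formats of NetBackup or TSM.

```python
from datetime import datetime

def normalize_product_a(row):
    """Map a made-up 'product A' record into the common schema."""
    return {
        "product": "A",
        "client": row["client_name"],
        "succeeded": row["status_code"] == 0,
        "finished": datetime.fromisoformat(row["end_time"]),
        "bytes": row["kilobytes"] * 1024,
    }

def normalize_product_b(row):
    """Map a made-up 'product B' record into the same schema."""
    return {
        "product": "B",
        "client": row["NODE_NAME"],
        "succeeded": row["RESULT"] == "COMPLETED",
        "finished": datetime.fromisoformat(row["END_TIME"]),
        "bytes": row["BYTES"],
    }

reports = [
    normalize_product_a({"client_name": "db01", "status_code": 0,
                         "end_time": "2016-06-28T02:10:00", "kilobytes": 52428800}),
    normalize_product_b({"NODE_NAME": "web02", "RESULT": "FAILED",
                         "END_TIME": "2016-06-28T03:05:00", "BYTES": 0}),
]
failures = [r for r in reports if not r["succeeded"]]
print(f"{len(failures)} failed job(s): {[r['client'] for r in failures]}")
```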

 

Avoid having multiple backup products when you can.  When you can't, employ significant amounts of training and look into a third-party reporting tool that can handle all of the products you are using.

This is one of the common questions IT pros face in their job every day. Whether yours is a large organization with different teams for network and application management, or a small IT department with one IT guy trying to figure out where the problem is, the challenge of identifying the root cause is the same. Though the network and servers are part of the same IT infrastructure, they are two very different entities when it comes to monitoring and management.

 

Take the example of end-users complaining of slow Web connectivity and performance. Where do you think the problem is?

  • Is there a problem on the in-house network?
  • Is it a fault on the ISP’s side?
  • What about the IIS server?
  • Is it the virtual environment hosting IIS?

 

There could be many more possibilities to investigate. Making a list of questions and manually checking each part of the infrastructure is definitely what you don’t want to do. Visibility into the network and the application side of the house is key. Integrated visibility of network and application performance is the best thing an IT admin could ask for in this scenario.

 

As you probably know, Network Performance Monitor (NPM) v12 is out with NetPath™ technology. The power of NPM can be enhanced when integrated with Server & Application Monitor (SAM). In the image below, I’m making a modest attempt to explain the value of end-to-end visibility that you get across the service delivery path, from inside your network to outside your network, from on-premises LAN across the WAN and cloud networks.

 

PROBLEM/USE CASE

Users in both the main and branch offices are reporting slow performance of intranet sites. What is causing the problem and why? Is it the network, or the systems infrastructure that needs troubleshooting, or both?

[Image: NPM + SAM end-to-end visibility across the service delivery path]

HOW THE ROOT CAUSE WAS FOUND USING NPM and SAM

  • Using hop-by-hop critical path analysis, available with NetPath in NPM, to isolate the problem to the ISP network.
  • Using application and server monitoring in SAM to pinpoint performance issue in the Web server environment.
  • The Quality of Experience (QoE) dashboard available in and common to both NPM and SAM analyzes network packets to provide insight into network and application response times for specific application traffic. This helps confirm the symptoms of the problem at both sides—network and the application.

 

BENEFITS USING NPM and SAM TOGETHER

  • One unified Web interface and NOC for monitoring and troubleshooting networks and applications infrastructure.
  • Eliminate finger-pointing between network and system teams. Find out exactly where the root cause is for faster troubleshooting.
  • NPM and SAM are built on the Orion® Platform, sharing common services such as centralized alerting, reporting, and management.
  • Easy to install, set up, and use. Automatically discover your environment and start monitoring.

 

If you are already a user of NPM and SAM, please comment below on how you have benefited from using them together.

[Photos: Kansas City VMUG UserCon and Carolina VMUG UserCon]

 

As I promised, I am posting the presentation from my speaking session at the two VMUG UserCons in June – Carolina and Kansas City. I am thankful to all of the VMUG community members who stopped by to converse with chrispaap and jhensle about SolarWinds and monitoring with discipline. I am so appreciative of the packed rooms of attendees at both events who decided to join me in my speaking session. I hope that I gave you something meaningful and valuable, especially since the community has given me so much.

 

Attached is my slide deck. Let me know what you think in the comment section below. Thanks.

Every day, help desk pros stay busy by tracking down tickets, organizing them, assigning resources, and updating statuses. Have you ever wondered if being so busy is a good thing? Are you doing the right things at the right time?

 

Today, the increasing adoption of evolving technologies, such as BYOD and enterprise mobility, and the growing mobile workforce, require help desk pros to be super-productive in delivering IT support anywhere, anytime. To meet today’s rapidly growing end-user needs, help desk pros need to save time by cutting down trivial tasks, such as organizing tickets, and spend more time resolving issues and delivering real value to customers.

 

A typical IT service request lifecycle looks like this:

 

[Diagram: a typical IT service request lifecycle]

 

In general, help desk technicians spend more time in the first half of the lifecycle, when they should be focusing more on the latter half, which drives results and delivers value to customers. Here are a few simple tips you can follow to help save time in your daily help desk operations:

 

  1. Ticket funneling: Create a centralized help desk system that can let your users submit tickets easily via email or online, auto-populate information provided by users to help technicians determine the severity of the issue, and automatically alert users about the nature of the issue and estimated completion time.
  2. Ticket prioritization: Configure your help desk system to automatically categorize tickets based on their criticality, end-user priority, technical expertise, and more. This will help you instantly identify the nature of the issue, understand the business impact, set Service Level Agreements (SLAs), and prioritize tickets that need your time today.
  3. Ticket routing: End-users often blame help desk pros when their issues aren’t quickly resolved. But the fact is, one can’t expect a help desk admin to simultaneously fix a network issue, replace a faulty projector, and help with a password reset. Based on issue type and criticality, you need to assign tickets to technicians who have expertise in handling those specific issues. This can be achieved by setting up automated workflows in your help desk system that route trouble tickets and assign them to the right technician at the right time (a simple routing sketch follows this list).
  4. Reduce time-to-resolution: Clearly, end-users want their issues resolved as soon as possible. To do this, the IT pro may need to access the end-user’s PC remotely, get more information from users, restart servers, etc. Ideally, your help desk should seamlessly integrate with remote support and communication and troubleshooting tools to help you get all the information you need quickly to resolve issues faster.
  5. Asset mapping: Gathering asset details, licensing information, data about the hardware and software installed on end-user computers, etc. is the most time-consuming task in help desk support. It is much easier to use a help desk system to automatically scan and discover installed IT assets, procure asset details, manage IT inventory, map assets with associated tickets, etc.
  6. Encourage self-service: The most effective way to resolve trivial issues is to help end-users learn how to resolve such things on their own. Minor issues, such as password resets, software updates, etc. can be fixed by end-users if proper guidance is provided. Shape your help desk as a self-service platform where users can find easy-to-fix solutions for common issues and resolve them without submitting help desk tickets.
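
As promised in the ticket-routing tip above, here is a toy Python sketch of rule-based routing; the queue names, categories, and priorities are invented, and a real help desk system would drive this from its own workflow engine rather than a hard-coded table.

```python
# Illustrative routing table: (category, priority) -> queue. All names invented.
ROUTING_RULES = {
    ("network", "high"): "network-oncall",
    ("network", "normal"): "network-team",
    ("hardware", "normal"): "desktop-support",
    ("password", "normal"): "self-service-portal",
}

def route_ticket(ticket):
    """Assign a ticket to a queue based on its category and priority."""
    key = (ticket["category"], ticket.get("priority", "normal"))
    queue = ROUTING_RULES.get(key, "triage")   # unknown combinations go to triage
    return {**ticket, "assigned_queue": queue}

tickets = [
    {"id": 101, "category": "network", "priority": "high", "summary": "WAN link down"},
    {"id": 102, "category": "password", "summary": "reset request"},
    {"id": 103, "category": "projector", "summary": "flickering image"},
]
for t in tickets:
    routed = route_ticket(t)
    print(f"#{routed['id']} ({routed['summary']}) -> {routed['assigned_queue']}")
```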

 

By following these simple tips, you can save time and deliver more value to your end-users. If you want more information, check out this white paper that reviews major tasks performed by help desk analysts and IT support staff, and discusses how to simplify and automate those tasks.

 

How have you simplified your IT support tasks?

 

Share your help desk and remote support best practices so we can all benefit.

In the world of information technology, those of us who are tasked with providing and maintaining the networks, applications, storage, and services for the rest of the organization are increasingly under pressure to provide more accurate, or at least more granular, service-level guarantees. The standard quality of service (QoS) mechanisms we have used in the past are becoming more and more inadequate for properly handling the disparate types of traffic we see on the wire today. Continuing to successfully provide services in a guaranteed, deliberate, measurable, and ultimately very accurate manner is going to require different tools and additional focus on increasingly all-encompassing ecosystems. Simply put: our insular fiefdoms are not going to cut it in the future. So, what are we going to do about the problem? What can we do to increase our end-to-end visibility, tracking, and service-level guarantees?

 

One of the first things we ought to do is make certain that we have, at the very least, implemented some baseline quality of service policies. Things like separating video, voice, regular data, high priority data, control plane data, etc., seem like the kind of thing that should be a given, but every day I am surprised by another network that has very poorly deployed what QoS they do have. Often I see video and voice in the same class, and I see no class for control plane traffic; my guess is no policing either, but that is another topic for another day. If we cannot succeed at the basics, we most certainly should not be attempting anything more grandiose until we can fix the problems of today.

 

I have written repeatedly on the need to break down silos in IT, to get away from the artificial construct that says one group of people only controls one area of the network and has only limited interaction with other teams. Many times, as a matter of fact, I see such deep and ingrained silos that the different departments do not actually converge, from a leadership perspective, until the CIO. This unnecessarily obscures the full network picture from pretty much everyone. Server teams know what they have and control, storage teams are the same, and on down the line it goes, with nobody really having an overall picture of things until you get far enough into the upper management layer that the fixes become political, and die by the proverbial committee.

 

In order to truly succeed at providing visibility to the network, we need to merge the traditional tools, services, and methodologies we have always used, with the knowledge and tools from other teams. Things like application visibility, hooks into virtualized servers, storage monitoring, wireless and security and everything in between need to be viewed as one cohesive structure on which service guarantees may be given. We need to stop looking at individual pieces, applying policy in a vacuum, and calling it good. When we do this it is most certainly not good or good enough.

 

We really don’t need QoS, we need full application visibility from start to finish. Do we care about the plumbing systems we use day to day? Not really, we assume they work effectively and we do not spend a lot of time contemplating the mechanisms and methodologies of said plumbing. In the same way, nobody for whom the network is merely a transport service cares about how things happen in the inner workings of that system, they just want it to work. The core function of the network is to provide a service to a user. That service needs to work all of the time, and it needs to work as quickly as it is designed to work. It does not matter to a user who is to blame when their particular application quits working, slows down, or otherwise exhibits unpleasant and undesired tendencies, they just know that somewhere in IT, someone has fallen down on the job and abdicated one of their core responsibilities: making things work.

 

I would suggest that one of the things we should certainly be implementing, looking at, etc., is a monitoring solution that can not only tell us what the heck the network routers, switches, firewalls, etc., are doing at any given time, but one in which applications, their use of storage, their underlying hardware—virtual, bare metal, containers—and their performance are measured as well. Yes, I want to know what the core movers and shakers of the underlying transport infrastructure are doing, but I also want visibility into how my applications are moving over that structure, and how that data becomes quantifiable as relates to the end user experience.

 

If we can get to a place where this is the normal state of affairs rather than the exception, using an application framework bringing everything together, we’ll be one step closer to knowing what the heck else to fix in order to support our user base. You can’t fix what you don’t know is a problem, and if all groups are in silos, monitoring nothing but their fiefdoms, there really is not an effective way to design a holistic, network-wide solution to the quality of service challenges we face day to day. We will simply do what we have always done and deploy small solutions, in a small way, to larger problems, then spend most of our time tossing crap over the fence to another group with a “it’s not the network” thrown in as well. It’s not my fault, it must be yours. And at the end of the day, the users are just wanting to know why the plumbing isn’t working and the toilets are all backed up.

Leon Adato

DevOpsDays DC Denoument

Posted by Leon Adato Expert Jun 22, 2016

The SolarWinds booth at DevOpsDays DC represents SolarWinds' third appearance at the event (after Columbus and Austin (https://thwack.solarwinds.com/community/solarwinds-community/geek-speak_tht/blog/2016/05/23/devopsdays-daze) ). I could play up the cliche and say that the third time was the charm, but the reality is that those of us who attended - myself, Connie (https://thwack.solarwinds.com/people/ding), Patrick (https://thwack.solarwinds.com/people/patrick.hubbard), and Andy Wong - were charmed from the moment we set foot in the respective venues.

 

While Kong (https://thwack.solarwinds.com/people/kong.yang) and Tom (https://thwack.solarwinds.com/people/sqlrockstar) - my Head-Geeks-In-Arms - are used to more intimate gatherings, like VMUGs and SQL Saturdays, I'm used to the big shows: CiscoLive, InterOp, Ignite, VMWorld, and the like. DevOps Days is a completely different animal, and here's what I learned:

 

 

Focus

The people coming to DevOpsDays are focused. As much as I love to wax philosophical about all things IT, and especially about all things monitoring, the people who I spoke with wanted to stay on topic. That meant cloud, continuous delivery, containers, and the like. While it might have been a challenge for an attention-deficit Chatty Kathy like me, it was also refreshing.

 

 

There was also focus of purpose. DevOpsDays is a place where attendees come to learn, not to be marketed to (or worse, AT). So there are no scanners, no QR codes on the badge, nothing. People who come to DevOpsDays can't be guilted or enticed into giving vendors their info unless they REALLY mean it, and then it's only the info THEY want to give. Again, challenging, but also refreshing.

 

 

Conversations

That focus reaps rewards in the form of real conversations. We had very few drive-by visitors. People who approached the table were genuinely interested in hearing what SolarWinds was all about. They might not personally be using our software (although many were), but they were part of teams and organizations that had use for monitoring. More than once, someone backed away from the booth, saying, "Hang on. I gotta see if my coworkers know about this."

 

 

The conversations were very much a dialogue, as opposed to a monologue. Gone was the typical trade show 10-second elevator pitch. We got to ask questions and hear real details about people's environments, situations, and challenges. That gave us the opportunity to make suggestions, recommendations, or just commiserate.

 

Which meant I had a chance to really think about...

 

 

The SolarWinds (DevOps) Story

"So how exactly does SolarWinds fit into DevOps?" This was a common question, not to mention a perfectly valid one given the context. My first reaction was to talk about the Orion SDK  and how SolarWinds can be leveraged to do all the things developers don't really want to recreate when it comes to monitoring-type activities. Things like:

 

  • A job scheduler to perform actions based on date or periodicity.
  • A built-in account database that hands out username/password combinations without exposing them to the user.
  • The ability to push code to remote systems, execute it, and pull back the result or return code.
  • Respond with an automatic action when that result or return code is not what was expected.

 

But as we spoke to people and understood their needs, some other stories emerged:

 

  • Using the Orion SDK to automatically add a system that was provisioned by Chef, Jenkins, or similar tools into monitoring (a rough sketch follows this list).
  • Perform a hardware scan of that system to collect relevant asset and hardware inventory information.
  • Feed that information into a CMDB for ongoing tracking.
  • Scan that system for known software.
  • Automatically apply monitoring templates based on the software scan.
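
As a rough sketch of that first story - adding a freshly provisioned system into monitoring - the Python snippet below is loosely based on the public Orion SDK samples (https://github.com/solarwinds/orionsdk-python). The hostname, credentials, and node properties are placeholders, and the exact properties to set should be verified against the SDK documentation for your NPM version.

```python
from orionsdk import SwisClient

# Placeholder connection details for the Orion server.
swis = SwisClient("orion.example.com", "admin", "changeme")

# Minimal, illustrative property set for the VM that chef/jenkins just built;
# check the SDK docs for the properties your NPM version actually requires.
new_node = {
    "IPAddress": "10.20.30.40",
    "EngineID": 1,
    "ObjectSubType": "SNMP",
    "SNMPVersion": 2,
    "Community": "public",
}
node_uri = swis.create("Orion.Nodes", **new_node)
print(f"Added node to monitoring: {node_uri}")

# Confirm the node exists by querying it back out with SWQL.
result = swis.query(
    "SELECT NodeID, Caption, Status FROM Orion.Nodes WHERE IPAddress = @ip",
    ip=new_node["IPAddress"],
)
print(result["results"])
```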

 

 

This is part of a continuous delivery model that I hadn't considered until digging into the DevOpsDays scene, and I'm really glad I did.

 

 

Attending the conferences and hearing the talks, I also believe strongly that traditional monitoring - fault, capacity, and performance - along with alerting and automation, are still parts of the culture that DevOps advocates and practitioners don't hear about often enough. And I'm submitting CFP after CFP until I have a chance to tell that story.

 

 

Is SolarWinds a hardcore DevOps tool? Of course not. If anything, it's a hardcore supporter in the "ops" side of the DevOps arena. Even so, SolarWinds tools have a valid, rightful place in the equation, and we're committed to being there for our customers. "There" in terms of our features, and "there" in terms of our presence at these conferences.

 

 

So come find us. Tell us your stories. We can't wait to see you there!
