
Geek Speak


     When it comes to application performance management, the main focus is the application, because that is all end-users care about. They do not care about network performance, server performance, or any other metric that can be measured; they only care about the application they are trying to use. Development teams and server admins are usually the main parties involved in monitoring and managing applications, but all of that traffic runs over a network. To take performance management to the next level, it only makes sense to bring every party to the table.

 

First Steps: The Internal Network

 

     When examining the role of the network in an application’s performance, the first step is the internal network. If your application is internal, this may be the only step you need to focus on. Whether it is east/west traffic in your server and user environments or north/south traffic to your firewalls, network monitoring needs to be involved. Let’s see if this sounds familiar...

 

Application is having issues -> Dev team receives a trouble ticket -> They consult with the server team to find the root cause -> Once all of their options are exhausted, the network team is consulted as the next step.

 

     After this long process, when the network team is finally consulted, they find a small issue on the uplink switch. After a quick fix, everything is back to normal. That is where the issue lies in a lot of environments: a tiered approach adds unneeded steps to the troubleshooting process of application performance management. I like to think of it almost as a hub-and-spoke environment, with the application as the hub and each supporting team as a spoke.

 

Hub and Spoke Application Monitoring Structure

 

     By doing this, all parties are included in the process of application performance management. Each of them could be the first alerted party if there is an issue, and address the problem directly. This sets a good base for ensuring uptime and optimal performance for an application.

 

Monitoring External Network Performance

 

     Creating a strong structure for application performance management is the first step. Once this is mastered on the internal network, the network team has the additional task of monitoring external network performance. This is why it is crucial that the network team is included in monitoring applications. For example, there could be a routing issue between ISPs causing latency for external users accessing your application. Server analytics would show performance within acceptable tolerances and the development team would not see any errors either, yet users could be having a poor experience with the application. If proper steps were taken to monitor the external network, this issue could easily be detected, resolved, and communicated to all affected users. One example of managing the external network is a toolset that allows multiple endpoints all over the world to be used for testing. Reported stats could be anything from ping latency to overall bandwidth from each of those locations.
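
As a rough illustration of the idea, here is a minimal Python sketch that measures average ping latency against a handful of probe targets. The hostnames are placeholders and it assumes a POSIX ping binary is available; a real deployment would use a distributed monitoring toolset rather than a script, but the measurement is the same.

```python
import re
import subprocess

# Hypothetical targets standing in for probes/endpoints in different regions.
TARGETS = ["app.example.com", "eu-probe.example.com"]

def ping_rtt_ms(host, count=5):
    """Return the average round-trip time in ms, or None if the host is unreachable."""
    result = subprocess.run(
        ["ping", "-c", str(count), host],
        capture_output=True, text=True,
    )
    # ping summary line looks like: "rtt min/avg/max/mdev = 9.1/10.2/12.4/0.8 ms"
    match = re.search(r"= [\d.]+/([\d.]+)/", result.stdout)
    return float(match.group(1)) if match else None

for target in TARGETS:
    avg = ping_rtt_ms(target)
    print(f"{target}: {'unreachable' if avg is None else f'{avg:.1f} ms avg'}")
```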

 

The fact is that utilizing the network team in application performance management is a no-brainer. Reducing troubleshooting and problem-resolution times is something any technical team can get behind. Next time you are planning a management and monitoring structure, be sure to focus on the network as well as the application itself.

I’m in Antwerp this week for Techorama. It’s a wonderful event in a great location, the Kinepolis, a 24-screen theater that can hold about 9,000 people in total. If you are in or around Antwerp this week, stop by and say hello.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

One-of-a-Kind Private Train Takes On Florida’s Traffic Nightmare

As much as I love autonomous vehicles, I know that having a modern rail system would be even better for our country. Here’s hoping Florida can get it done and lead the way for others to follow.

 

Are Low-skilled Jobs More Vulnerable to Automation?

For anyone that has ever been involved in a theoretical discussion regarding what jobs automation will replace next. Sometimes the jobs we think are easiest or best for machines are not. And some jobs (like automated index tuning for databases) are a lot easier for a robot than a human.

 

AI and Compute

Interesting analysis into the volume of AI projects in use now compared to six years ago. Consumption appears to double every 3.5 months. This rise in consumption is something a legacy data center would never be able to keep pace with, and is an example of where the cloud shines as an infrastructure provider.

 

Reaching Peak Meeting Efficiency

Yes, meetings are a necessary and important part of corporate life. And no one likes them. It’s time we all underwent some ongoing training on how to make meetings an efficient use of our time.

 

'Sexiest Job' Ignites Talent Wars as Demand for Data Geeks Soars

At first these salaries seem silly. But there is a dearth of people that can analyze data properly in the world. And the value a company gains from such insights makes up for such high salaries. Face it folks, the future isn’t in databases, it’s in the data.

 

The Internet of Trash: IoT Has a Looming E-Waste Problem

Here’s the real garbage collection problem with technology today: billions of IoT devices with short lifespans. Might be wise to invest in companies that specialize in cleanup and recycling of these devices.

 

Uh, Did Google Fake Its Big A.I. Demo?

I think the word “faked” is meant for a clickbait headline here. Chances are Google did some editing to make it presentable. The bigger issue, of course, is just how human the interaction seemed. And that has people more upset than if it was faked entirely.

 

The Techorama Welcome Kit left in my room was a nice touch, and they almost spelled my name correctly:

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Despite ploughing more than £50m into the digital transformation of the NHS, evidence suggests only a handful of trusts have adopted the Government’s “cloud first” policy.

 

NHS Digital spent more than £32m on digital transformation consultancy services, and £23m with cloud, software, and hardware providers between April and December 2017.

 

But its bosses must be questioning why, when less than a third of the NHS trusts surveyed in January had adopted any level of public cloud and mistrust remains high, according to recent findings from IT management software provider SolarWinds.

 

The research questioned more than 200 NHS trusts and revealed that, while respondents were aware of the government’s policy, less than a third have begun the transition.

 

Of those who have yet to adopt any level of public cloud, 64% cited security concerns, 57% blamed legacy tech, and 52% said budgets were the biggest barriers.

 

However, for respondents who had adopted some public cloud, budget registered as far more of a barrier (66%), followed by security and legacy technology (59% each).

 

Eight percent of NHS trusts not using public cloud admitted they were using 10 or more monitoring tools to try and control their environment, compared to just 5% of NHS trusts with public cloud.

 

In addition, monitoring and managing the public cloud remains an issue, even after adoption, with 49% of trusts with some public cloud struggling to determine suitable workloads for the environment.

 

Other issues included visibility of cloud performance (47%) and protecting and securing cloud data (45%).

 

Six percent of NHS trusts still expect to see no return on investment at all from public cloud adoption.

 

Speaking exclusively to BBH about the findings, SolarWinds’ chief technologist of federal and national government Paul Parker said, “Cloud is this wonderful, ephemeral term that few people know how to put into a solid thought process.

 

“From the survey results, what we have seen is that the whole “cloud first” initiative has no real momentum and no one particularly driving it along.

 

“While there are a lot of organizations trying to get off of legacy technology, and push to modernize architecture, they are missing out.”

 

Improvements to training are vital, he added.

 

“People tend to believe there’s this tremendous return on investment to moving into the cloud when, in reality, you are shifting the cost from capital to operating expenses. It is not necessarily a cost saving in terms of architecture, and people need to recognize that. Rather than owning the infrastructure, with cloud you are leasing it, so it doesn’t change the level of investment much.”

 

There are savings to be had, though, and they come from converging job roles and improving access to medical records, for example.

 

Parker said, “With cloud, you do not need a monitoring team for every piece of architecture, so while there are savings to be had, they are more operational. With a cloud infrastructure, everything is ready at the click of a button.”

 

Moving forward, he says,

 

“It’s that old adage of ‘evolution, not revolution.’ There needs to be technology training, and the NHS needs to have an overarching goal, rather than simply moving to the cloud.

 

“The first thing trusts need to do is look at their current environment and determine what’s there, what’s critical and what’s non-critical. That will enable them to focus on moving the non-critical things into a cloud environment without jeopardizing anyone’s health or life or affecting security. That will help everyone to better understand the benefits of the cloud and to build trust.

 

“I would also like to see a top-down approach in terms of policy and direction. The ultimate goal of healthcare services is to make sure people have a better quality of life; and the goal of IT is to make sure they can deliver that. As IT experts, it is our job to try and help them to do that and make IT easier and more affordable.”

 

Find the full article on Building Better Healthcare.

One of the most interesting changes that I have observed in my career is Microsoft shifting from being purely a development organization to truly becoming a DevOps team, in the case of the SQL Server team. The product code developers are the operations support for the cloud service hosting Azure SQL Database. Many other large development organizations have this model, but my relationships and experience with SQL Server have allowed me to observe this product much more closely. My main observation is that dev cycles turn so much faster because problems that in traditional on-premises software might have taken months between patches, or even years between full releases, now get fixed in the course of a few weeks. This has really changed the way Microsoft releases SQL Server on-premises--after a major release, there is a cumulative update released every month for the first year. This means more problems get fixed faster, and if you are using the cloud service, the release cadence is even faster.

 

I know most of us do not work for hyper-scale public cloud providers with massive development resources, but I used this example because it is a very visible, real-world illustration of the way a DevOps model can transform an organization. So how does this apply to running our infrastructure organizations? I think even in the most off-the-shelf traditional shop, you should be thinking about how to automate the following work:

 

  • Manual
  • Repetitive
  • Work with no enduring value
  • Work that doesn’t scale with your application

 

As a system admin, your main job task is to keep the lights (and more importantly, the servers) on in your organization. By adopting a mindset of trying to automate as many of these tasks as possible, you will enjoy your job more, and have more time to focus on tasks that are more strategic to your organization. You can take advantage of frameworks like PowerShell and Python scripting in your environments (yes, you will have to write some code) to bring this all together, but this mindset will change the way you view system administration.

 

Where do you start with automation? Identify the common tasks that consume the most time--for example, if you are using VM templates to deploy your operating system environments (that’s a really fancy way of saying VMs), you have already started down this path. A logical next step might be to automate the process for keeping your templates up to date with patches, or to use scripting to ensure that any post-deployment configuration tasks happen without human intervention. To do this really well, you should adopt a developer mindset--all of your scripts should live in source control, you should develop a unit testing process (the one downside of automation is that if you break something badly, you can break a lot of things badly, and quickly), and you should borrow some aspects of a development methodology to keep the progress moving forward.
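
To make that concrete, here is a minimal sketch of an idempotent post-deployment task in Python. The config file path, NTP server name, and service name are placeholder assumptions, not a prescription; the point is that the script can be run repeatedly, kept in source control, and covered by a simple test.

```python
import subprocess
from pathlib import Path

# Hypothetical post-deployment settings; adjust for your own environment.
NTP_CONF = Path("/etc/chrony.conf")
REQUIRED_LINE = "server time.internal.example.com iburst"

def ensure_ntp_configured():
    """Idempotently ensure the internal NTP server is configured.

    Returns True if a change was made, False if the host was already compliant.
    Assumes it runs with enough privileges to edit the file and restart the service.
    """
    text = NTP_CONF.read_text() if NTP_CONF.exists() else ""
    if REQUIRED_LINE in text:
        return False  # already compliant, nothing to do
    NTP_CONF.write_text(text.rstrip("\n") + "\n" + REQUIRED_LINE + "\n")
    subprocess.run(["systemctl", "restart", "chronyd"], check=True)
    return True

if __name__ == "__main__":
    changed = ensure_ntp_configured()
    print("configuration updated" if changed else "already compliant")
```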

 

Moving into this mindset is a big shift for many organizations. However, in modern IT, influences like cloud and the aforementioned rapid development cycles mean that everything just moves faster. The other benefit of this automation, which should be a major influence on what tasks you automate, is the reduction in alert pages. If you can automate away your most common pager responses, your entire team and organization will benefit. The final, largest benefit of automation is that you know a script will run the same every time it's executed, as opposed to your junior admin, who may or may not get a complex task sequence correct each time.

“In the beginning was the word” … oops, “the cloud."

 

Long ago, "cloud" was a buzzword: a cool-sounding word that no one knew the exact meaning of. Many companies made forward-looking investments in high-profile technologies, betting on the success of the cloud. Other companies steered their investments the opposite way, toward more reliable traditional solutions. Today the attention is focused on moving these old-fashioned but profitable (and fragile) applications to a cloud environment. The challenge is how to provide these applications the right amount of resources, and this is difficult because it isn’t simply a matter of quantity of resources. Legacy applications are built to run on a traditional architecture, usually monolithic and rigid. That kind of traditional “create, install, run, and forget it” solution is the domain of highly vertical specialists, and it seems to be going away.

 

Migrate or replace?

Who should make the decision on this migration? Perhaps a more important decision is whether it is worth re-coding these applications, or whether it is more convenient in terms of time, money, and features to code them from scratch, or simply to move to a cloud-native application instead. I think the second option would be the better choice. It is true that it will require more resources, but once in production it will bring several benefits: weekly (or even daily) upgrades to fix bugs or deliver new features, better performance from running natively on cloud components, flexibility, and accessibility from several different device types (tablets, phones, different OS/browsers).

 

All these benefits could also be gained by re-coding the old legacy app, but would it be worth it in the long run? The core of the app is still the original one, built with a traditional architecture in mind. And are we sure that the cost of re-coding will be less than the one required to move to a new native app? Let’s consider the training that users will need to use the new app as a cost that will be offset by hyper-specialists no longer being required to maintain a legacy app.

 

Usually the companies most resistant to replacing a legacy app with a cloud-native one are banking institutions, insurance companies, and healthcare organizations. These organizations are all bound to solid, long time-tested applications, no matter how inefficient, but tested. By tested I mean bugs included (known ones and accepted ones), because solving them would mean a huge effort in development cost and in downtime for the services involved.

Educate the management team

If you can make these operators understand the benefits of a cloud-native application, especially by involving the management team, since they lead the technical teams below them, you have done the best job you could. After that it is a matter of time, because word moves fast within their vertical environments. Outside of those verticals, things tend to slow down.

 

For the cases where companies still reject this kind of app, there is the so-called “lift and shift”: keeping legacy apps as they are and moving them onto a virtualized infrastructure. It is a smooth first step toward the “new world.” From that starting point it becomes simpler to show management the benefits of virtualization and, going forward, to present scenarios they could move to in the future, such as a serverless environment. But this last step should be taken only if the people you are talking to are technologically advanced; otherwise they may simply be scared of the unknown and decide to move everything back to where it was.

Be sober

In any case, our task is like walking on eggshells: presenting new solutions, keeping an eye on the emotional side of the people we are talking to, and not being too enthusiastic, but looking professional, responsible, and balanced, speaking not only about benefits but also (and in some cases especially) about risks, while keeping a mitigation for each of those risks close at hand.

A storage system on its own is not useful. Sure, it can store data, but how are you going to put any data on it? Or read back the data that you just stored? You need to connect clients to your storage system. For this post, let’s assume that we are using block protocols like iSCSI or traditional block storage systems. This article also applies to file protocols (like NFS and SMB) and to some extent even to hyper-converged infrastructure, but we will get back to that later.

 

Direct attaching clients to the storage system is an option. There is no contention between clients on the ports, and it is cheap. In fact, I still see direct attached solutions in cases where low cost wins over client scalability. However, direct attaching your clients to a storage system does not really scale well in number of clients. Front-end ports on a storage array are expensive and limited.

 

Add some network

Therefore, we add some sort of network. For block protocols, that is a SAN. The two most commonly used protocols are the FC protocol (FCP) and iSCSI. Both protocols use SCSI commands, but the network equipment is vastly different: FC switches vs. Ethernet switches. Both have their advantages and disadvantages, and IT professionals will usually have a strong preference for one of the two.

 

Once you have settled on a protocol, the switch line speed is usually the first thing that comes up. FC commonly uses 16Gbit switches, with 32Gbit switches entering the market lately. Ethernet, however, is making bigger jumps, with 10Gbit being standard within a rack or wiring closet and 25/40/100Gbit commonly used for uplinks to the data center cores.

 

The current higher speeds of Ethernet networks are often one of the arguments why “Ethernet is winning over FC.” 100Gbit Ethernet has already been on the market for quite some time, and the next obvious iteration of FC is “only” going to achieve 64Gbit.

 

Oversubscription

Once you start attaching more clients to a storage system than it has storage ports, you start oversubscribing. 100 servers attached to 10 storage ports means you have on average 10 servers on each storage port. Even worse, if those servers are hypervisors running 30 virtual machines each, you will now have 300 VMs competing for resources on a single port.

 

Even the most basic switch will have some sort of bandwidth/port monitoring functionality. If it does not have a management GUI that can show you graphs, third-party software can pull that data out of the switch using SNMP. As long as traffic in/out does not exceed 70% you should be OK, right?

 

The challenge is that this is not the whole truth. Other, more obscure limitations might ruin your day. For example, you might be sending a lot of very small I/O to a storage port. Storage vendors often brag about 4KB I/O performance specs, but 25,000 4KB IOps only accounts for roughly 100MB/s, or 800Mbit (excluding overhead). So, while your SAN port shows a meager 50% utilization, your storage port or HBA could still be overloaded.
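
The back-of-the-envelope arithmetic from the oversubscription and small-I/O examples above fits in a few lines of Python, so you can plug in your own port counts and I/O sizes:

```python
# Oversubscription: 100 servers sharing 10 storage ports, 30 VMs per server.
servers, storage_ports, vms_per_server = 100, 10, 30
print(servers / storage_ports, "servers per storage port")               # 10.0
print(servers * vms_per_server / storage_ports, "VMs per storage port")  # 300.0

# Small-I/O throughput: 25,000 IOps at 4 KB each (decimal units, as vendors quote them).
iops, io_size_kb = 25_000, 4
mb_per_s = iops * io_size_kb / 1000   # 100 MB/s
mbit_per_s = mb_per_s * 8             # 800 Mbit/s, before protocol overhead
print(f"{mb_per_s:.0f} MB/s = {mbit_per_s:.0f} Mbit/s")
```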

 

It becomes more complex once you start connecting SAN switches and distributing clients and storage systems across this network of switches. It is hard to keep track of how much storage and client traffic traverses the ISLs (Inter-Switch Links). In this case, it is a smart move to keep your SAN topology simple and to be careful with oversubscription ratios. Do the oversubscription math, and look beyond the standard bandwidth graphs. Check error counters, and in an FC SAN that has long-distance links, check whether the Buffer-to-Buffer credits deplete on a port.

 

Ethernet instead of FC

The same principles apply to Ethernet. One argument why a company chooses an Ethernet-based SAN is that it already has LAN switches in place. In these cases, be extra vigilant. I am not opposed to sharing a switch chassis between SAN and normal client traffic. However, ports, ISLs, and switch modules/ASICs are prime contention points. You do not want your SAN performance to drop because a backup, restore, or large data transfer starts between two servers, and both types of traffic start fighting for the available bandwidth.

 

Similarly, hyper-converged infrastructure solutions like VxRail and other VMware vSAN-based systems place high demands on the Ethernet uplinks. Ideally, you would want to ensure that VMware vSAN uses dedicated, high-speed uplinks.

Which camp are you in? FC or Ethernet, or neither? And how do you ensure that the SAN doesn’t become a bottleneck? Comment below!

Welcome to another edition of the Actuator. I hope everyone is enjoying some warm spring weather. It's nice to be able to sit outside for an hour at the end of the day.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Alexa and Siri Can Hear This Hidden Command. You Can’t.

Fun fact for you: There is no law against sending subliminal messages to humans, or machines. The practice is discouraged and *may* be considered an invasion of privacy (for humans, not for machines). Another example of where the laws lag far behind the technology.

 

Digital Photocopiers Loaded With Secrets

Not mentioned in the article: the embarrassing photos from office Christmas parties.

 

Stanford Study Shows the Astonishing Productivity Boost of Working From Home

Glad to see we are putting some data into the productivity levels for people working from home. I’ve been doing it for eight years now. I know it’s made me more productive, happier, and healthier. I can’t go back to having a real job, ever.

 

Don't Skype Me: How Microsoft Turned Consumers Against a Beloved Brand

“[Using Skype] is like Tim Tebow trying to be a baseball player.” Ouch.

 

Amazon’s Fake Review Problem

I’ve been frustrated for years with the reviews on Amazon. I find many of them to be fake. These days I focus on the three-star ratings and do my best to discern the truth. To be fair, Amazon is not the only company with an online review problem.

 

Are My Friends Really My Friends?

Interesting analysis showing that despite being surrounded by constant interactions, we are more alone now than ever before.

 

Security researchers discover critical flaw in PGP encryption that reveals plaintext

Everything is terrible.

 

Seems legit:

Public cloud providers have greatly simplified the process of creating a backup, but the challenge has always been managing backups at scale: retention policies, simple granular file-level restores, and regulation-focused dashboards. That is the value added by many of the backup management solutions discussed in the following post, and it becomes critical once an environment scales past a few instances and databases.

 

The benefits of managed backup management are:

  1. Simplified Management - The backup tools offered by public cloud providers are generally account- or subscription-focused and don't offer a holistic view of the entire environment; a dedicated backup management solution can provide that single view.
  2. Scalability - Fully managed backup and SaaS solutions have been built to scale to the largest environments without any performance impact or major concern for running out of storage space. This eliminates the need to re-architect the backup management deployment to scale with the needs of the organization for things such as keeping data for twice as long because of a new mandate.
  3. Multi-Cloud Support - Many of the legacy backup products that are available on the market only support backing up data to public cloud providers or only support backing up workloads in a single cloud provider. More and more companies are implementing multi-cloud strategies and a solution that supports multiple clouds is essential to simplifying operations.

 

Unmanaged Deployment

The following solutions are unmanaged deployments. This means the software is available for the customer to install, or has been packaged in a native cloud format such as an AWS AMI, but is not available in any of the cloud providers' marketplaces.

 

Rubrik Cloud Data Management

Rubrik Cloud Data Management is a software appliance that can be deployed to AWS, Azure and GCP. The Cloud Data Management platform supports policy based snapshot management along with advanced analytics to generate operational insights.

 

Managed Deployment

The following solutions are managed deployments. This means the backup management software company has added a deployment solution to the respective cloud provider's marketplace to allow the infrastructure to be provisioned with the click of a button.

 

Veritas CloudPoint

Veritas CloudPoint is a backup management solution that supports automated deployment into Azure but supports backing up workloads on AWS, Azure, and GCP. In addition to IaaS workloads across the three major clouds, CloudPoint also supports application-level backups such as Microsoft SQL Server, MongoDB, and AWS Aurora.

 

SaaS Deployment

The following solutions are Software as a Service (SaaS) deployments. This means the backup management software company hosts the software for its customers.

 

Druva Apollo

Druva Apollo is a SaaS solution that provides data protection of AWS EC2, RDS, S3, EBS, and Glacier. Druva Apollo also includes SLA-based snapshot retention policies in addition to tiering to reduce costs as older snapshots are moved to cheaper storage and eventually deleted.

 

Rubrik Polaris

Rubrik Polaris is a SaaS solution that integrates with Rubrik's Cloud Data Management hardware and software appliances to provide a unified management platform for both on-premises and cloud-based workloads.

 

CloudRanger Backup and Recovery

CloudRanger Backup and Recovery is a SaaS solution that provides backup management of AWS EC2, RDS, and Redshift instances using native AWS snapshots. Instance and file level backups are supported along with multi-region and multi-account backup restore points.

 

Fully Managed

The following solutions are fully managed backup management solutions such that the cloud provider manages backups on your behalf.

 

Built-in Snapshots

Public cloud providers allow administrators to create snapshots of virtual machine instances, databases, etc. This doesn't provide a robust feature set in terms of management, but it does allow administrators to back up and restore to a given point in time.
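
As a rough illustration of that built-in approach, here is a minimal sketch using AWS's boto3 SDK to snapshot a single EBS volume and list existing snapshots. The volume ID and description are placeholders, and it assumes credentials and a region are already configured; retention policies, cross-account views, and restore workflows are exactly what the managed products above layer on top.

```python
import boto3

# Assumes AWS credentials and a default region are already configured.
ec2 = boto3.client("ec2")

# Create a point-in-time snapshot of one EBS volume (placeholder volume ID).
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="nightly snapshot - app01 data volume",
)
print("created", snapshot["SnapshotId"])

# Listing snapshots is about as far as the native tooling goes on its own.
for snap in ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]:
    print(snap["SnapshotId"], snap["StartTime"], snap.get("Description", ""))
```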

 

Backup management is an unsexy topic to most, but it has tremendous value when there is a disaster, and many of the new backup management solutions are becoming much more than a wrapper around the public cloud providers' native snapshot APIs.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Here is an interesting article from my colleague Joe Kim, in which he discusses what we can expect to see in the future as federal IT professionals.

 

Over the past couple of years, administrators have confronted a rapidly changing landscape that seems to shift overnight. Public vs. private cloud is an old argument; now, administrators are grappling with the reality of implementing and managing hybrid IT infrastructures.

 

As such, their skill sets are being tested like never before. Sixty-two percent of respondents to a recent SolarWinds IT trends survey indicated that hybrid IT has required that they acquire new skills, while 11 percent said it has altered their career path. Meanwhile, 57 percent of public sector organizations have already hired or reassigned IT personnel, or plan to do so, for the specific purpose of managing cloud technologies.

 

The skills that IT administrators are learning today will have a large impact on what IT management will look like ten years from now.

 

From service managers to service consumers

 

Our survey found that 96 percent of respondents have moved at least some applications and aspects of critical infrastructure to the cloud. This migration has caused federal administrators to sharpen their “as-a-service” skills, since many of the tools they are using have become software-defined and exist both on-premises and in hosted environments.

 

Federal IT professionals have gone from being service managers to service consumers who work with cloud providers to manage their infrastructures. In this service-oriented world, administrators are finding themselves interacting more with software than they are with hardware switches and routers. These interactions are precursors to IT practitioners’ inevitable evolution from traditional network managers into areas that may be more familiar to developers, and toward becoming service brokers, rather than service managers.

 

From network manager to network developer

 

Administrators previously needed to be savvy about command lines and hands-on management of network components, but the move toward hybrid IT and Software-as-a-Service (SaaS) applications has greatly reduced the need for these types of skills. Administrators must now be able to manage the different pieces of code that make up applications and allow those programs to work with each other.

 

Tomorrow’s network administrators will be familiar with application program interfaces (APIs) — essentially app building blocks — and how they can be used to solve common problems, from network management to security challenges. They will create highly customized and dynamic networks that fit the unique needs of their agencies. Furthermore, they’ll have a greater amount of control over these networks, as they will be able to tap into APIs to dictate policy, rules, user access, and more.
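
To make the idea concrete, here is a small, entirely hypothetical sketch using the requests library: the controller URL, endpoint, and policy payload below are invented for illustration only, since real controllers expose their own REST APIs and schemas, but it shows the kind of API-driven policy push described above.

```python
import requests

# Hypothetical network-controller endpoint, token, and payload schema.
CONTROLLER = "https://controller.example.gov/api/v1"
TOKEN = "replace-with-a-token-from-your-secrets-store"

policy = {
    "name": "contractor-guest-access",
    "vlan": 210,
    "acl": ["permit tcp any any eq 443", "deny ip any any"],
}

resp = requests.post(
    f"{CONTROLLER}/policies",                      # hypothetical endpoint
    json=policy,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=10,
)
resp.raise_for_status()
print("policy created:", resp.json())
```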

 

From servicing people to self-service

 

They’ll also move from being service managers to service brokers. Instead of provisioning more storage or spending their time clicking around user interfaces, they’ll be assigning applications and access rights to individual users so that those users can easily set up services on their own. The standard practice of a user submitting an online request for access to a new application will be a rare occurrence; everything that a person needs or is authorized to use will be at their fingertips in this self-service environment.

 

Network administrators will also have more opportunity to add strategic value to their agencies. Today, administrators spend a lot of time servicing users. Moving toward self-service will allow users to check their own boxes, download their own applications, and authorize their own access, all without having to go through their system administrators. In turn, future administrators will have more time to work on higher value services, such as developing plans for stronger security measures or using predictive analytics to anticipate and remediate network issues.

 

From the future to the past

 

Despite all of these changes, administrators will still need to focus on the “bread and butter” aspects of network management, including performance, availability, and compliance. To ensure success in each of these areas, administrators will need to use many of the same tools and processes that are commonplace right now.

 

Indeed, some of these tools will be even more important than they are today. For instance, network performance monitoring will be critical, particularly as IT becomes increasingly hybrid- and application-based. Agencies will need solutions that provide automated and unfettered insight into the performance of these applications, whether they are on-premises or hosted, just as they do today.

 

Learning these and other solutions will take some work, but there are resources available. Online communities such as SolarWinds’ THWACK provide forums where administrators can exchange ideas and ask questions, and most vendors will be more than willing to offer information on best practices.

 

These resources provide administrators with a chance to hone their skills today while preparing for their future. That future undoubtedly will be challenging, but it also will present many opportunities for federal IT professionals who want to expand their horizons and add more value to their agencies.

 

Find the full article on GovLoop.

The Simple Network Management Protocol (SNMP) has been a key part of managing network devices in the data centre for some time. It really is a pretty simple protocol to work with (hence the name), and I think it’s underrated as a key tool for monitoring unusual events. Unfortunately, SNMP has had some issues over time. One of these has been sending a lot of information over the network in an insecure fashion. SNMP v3 was developed to address this. Another issue has been that Joe Sysadmin doesn’t always take the time to configure custom community strings for the devices he’s trying to manage. Instead, the default “public” community string is left configured on the devices with more access than is required. This kind of behaviour drives information security folks nuts, and has operations staff questioning whether SNMP is worth the hassle.

 

SNMP is an extremely flexible solution that provides a robust framework with which you can leverage things like vendor-specific management information base (MIB) files. You can use these to provide both read-only and write access to networked devices. The advantage to this approach is that you can feed information into your management system that provides useful insights, rather than simply showing whether the device is up or down.
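
As a small illustration, here is a minimal sketch of an SNMPv3 read using the pysnmp library. The device address, user, and passphrases are placeholders; the point is that you can pull a specific MIB value (inbound octets on an interface, in this case) into your management system instead of relying on the default community string.

```python
from pysnmp.hlapi import (
    getCmd, SnmpEngine, UsmUserData, UdpTransportTarget, ContextData,
    ObjectType, ObjectIdentity, usmHMACSHAAuthProtocol, usmAesCfb128Protocol,
)

# Placeholder device and SNMPv3 user; no "public" community string in sight.
error_indication, error_status, error_index, var_binds = next(getCmd(
    SnmpEngine(),
    UsmUserData("monitor", "auth-passphrase", "priv-passphrase",
                authProtocol=usmHMACSHAAuthProtocol,
                privProtocol=usmAesCfb128Protocol),
    UdpTransportTarget(("10.0.0.1", 161)),
    ContextData(),
    ObjectType(ObjectIdentity("IF-MIB", "ifInOctets", 1)),  # inbound octets on interface 1
))

if error_indication:
    print("poll failed:", error_indication)
else:
    for name, value in var_binds:
        print(f"{name} = {value}")
```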

 

Following on from this, alerting that aligns with your devices gives you a better chance of identifying unusual issues in your environment. You could, for example, set your devices to send a trap when local user credentials are used to log in to a device rather than directory credentials. This type of activity may indicate that someone’s up to no good in your environment.

 

Security in your environment isn’t just about people cracking device credentials, though. It’s also about having devices available to provide the appropriate services to applications and their users. Configuring devices to send meaningful information via SNMP as issues occur can be a great way to get to minor problems before they become major issues. If one of your two firewall devices has suffered a failure, your infrastructure is compromised and you need to address the problem. I’ve seen plenty of situations where internal systems failures go unnoticed for far too long, leading to reduced performance in the environment and angst for both the operations staff and end-users. But people don’t just become unhappy with the infrastructure. They start to use workarounds to get their work done, which can involve unsafe practices such as storing unsecured corporate data in personal mailboxes or on publicly accessible file sharing sites.

 

A lot of people would agree that data centre operations can be a difficult thing to do well, particularly at a large scale. There always seems to be some device or another that’s run out of capacity, has a failed component, or has simply stopped doing what it’s meant to do. That’s why tools such as SNMP and syslog can help tremendously with keeping things under control in the DC. There’s a wide range of management systems available in the marketplace that can be used to do some pretty cool stuff with SNMP. Most devices that can be deployed in a 19” rack can speak SNMP and syslog, so why not get as much information about what’s happening in your environment as you can? The investment in effort upfront can save you a lot of time and headaches down the track when things invariably go awry.

In a previous Geek Speak blog post, I talked about the viability of practice leader being a career path for IT professionals. A large part of practice leadership is being fluent in vendor technologies, i.e. products. This opens up an interesting set of paths for IT professionals: product management and product marketing.

 

Are you passionate about discovering IT pain points and finding solutions for them? Product management focuses on customers and their problems. A product manager (PM) is the voice of the entire customer spectrum; the PM delivers market metrics that guide product decisions.

 

Do you like to tell stories? If you like sharing your problem-solving knowledge by using a product in a one-to-many fashion, you'd probably enjoy working as a product marketing manager (PMM). As Peter Drucker said, product marketing focuses on the product selling itself; creating a product that people want to buy; and creating an environment that encourages people to buy.

 

Ideally, PMs work to understand real customer problems and use that knowledge to turn solutions into products, while PMMs work to smooth the friction between products and consumers in the marketplace.

 

 

Would you consider transitioning into a new role as PM or PMM? Let me know in the section below.

 

P.S. We are hiring for both PM and PMM positions.

Over the years, I’ve read more than a few articles about the qualities that are found in the best administrators. These articles focus on soft skills, but sometimes will list hard skills as well. In all those years, and in all those articles, I rarely see advice on how those skills are to be applied. It’s as if the author expects the reader to just know what to do with the skills, once acquired.

 

No matter what type of administrator you are (network, database, systems, etc.), the best way to apply your skills is by being responsive and responsible. I wrote about this in my book more than eight years ago, and the advice is still true today.

 

Being responsive means you take action on an item. It matters not if the item or task is your responsibility. For example, if a disk fails and you aren’t a member of the server team, it’s not something for you to fix, but if someone is reaching out to you for help, you must still be responsive. The person reaching out to you has no idea what tasks you are responsible for; they just need help. You want that customer to have the perception that you are responsive to their needs.

 

The hardest part of being responsive is the hours. It can be difficult to be responsive at all times of the day, and the better you get at your role, the more your services will be in demand.

 

Being responsible is taking ownership for something. It should be common sense that you would take responsibility for tasks that are central to your job role as an administrator. But you should also take responsibility for your mistakes. When something goes wrong it can be easy to deflect blame to other teams or specific people. You must resist that urge.

 

Here’s an example from my past life as a database administrator. It was 3 a.m. and I needed to rebuild a server. It failed because a LUN was erased by mistake (and that mistake replicated, quickly, which is why HA is not the same as DR, but I digress). I needed to restore the master database. We were using a third-party backup software product, and the master restore required some different syntax that didn’t want to work.

 

It took me longer than it should have to restore the master. I wasn’t happy. And I could have blamed the backup vendor, or the engineer who wiped out the LUN, or the manager who confused HA and DR. But I knew that wouldn’t help anyone. So, the next day I informed my managers that I could have done better. I outlined a training plan so that any member of my team would be able to perform the same tasks in the correct amount of time. I took responsibility.

 

I didn’t have to inform my managers about the delay. They have no idea how long it takes to restore a master database. But I wanted to show them that I was being responsible. I could have easily blamed others, and nobody would have thought twice.

 

Be responsive and responsible. I believe it pays off in the long run.

The image of a modern office has been rapidly changing in recent years. From the medical field, to sales, to technical support, users are on the road and working from just about everywhere possible. Firewalls, malware protection, and other security practices are great when a user is on-premises at your location, but what happens when they are working on an assignment at the coffee shop or showing off a report at a lunch meeting? How can you ensure these protections are installed and working correctly? The fact is that a network and systems perimeter logically cannot be defined simply by the walls of your location. Users are on the go, and this requires special accommodations from us in the support field.

 

Endpoint Protection

When a user is on location, it is obviously easier to deploy security policies against their device so it can use your network securely and in accordance with your security policies. In the past, endpoint protection meant just providing users a firewall and some antivirus software. With malware and other intrusions getting more advanced, this can no longer be the extent of endpoint security. Modern endpoint protection has progressed to the point where endpoint agents can contact an organization’s security management systems to get policies, definitions, and so on. This contrasts with the old way, where many settings could only be controlled when a user was on-site. This model allows an organization to provide maximum security to users on-site and on the road alike. It is especially useful for users who are constantly on the go, as they will still receive security updates and configuration as soon as they are published, while you -- the administrator -- stay in the loop too.

 

Making Remote Network Access Easy

Security is often the number one priority (as it should be, in my opinion), but to an end-user, ease of use is often the most important. Their focus is often just getting their work done in the easiest and most hassle-free way possible. There are multiple ways to keep a remote user (and the company network) safe and secure while allowing them to work. A couple of the most popular methods are remote or virtual desktops and VPN connections with profiling. Virtual desktops provide remote access to users as if they were sitting at your main location. This could include everything from fileshare access to applications. This makes things easy and uniform company-wide as the virtual desktop image can be maintained, and the system that a user is accessing from is irrelevant. The virtual desktop would remain secure and under your control. The other option is VPN access with profiling. Best of both worlds, right? A user could access network resources with their own device and use their own applications. The profiling aspect would allow you as the administrator to ensure malware protection is installed, the firewall is on, policies are up to date, and the device is current with operating system updates. Both solutions have their merits and a place in certain situations depending on what you might be looking for.

 

The Bottom Line

As I mentioned earlier, I believe security should be the number one focus (as a lot of you do too, I’m sure). I know in my day job, the security of the network is my main focus. In a lot of cases we can’t simply place the network and servers on lockdown to the outside world. That sure would be easier! Working to provide remote access where it's explicitly needed is something we as network and server admins will be tasked with even more going forward. It’s how we all approach that challenge that will determine the security of our organizations.

Following my previous post about logging, I'd like to talk about another tool to manage logs that is more advanced than syslog.

 

Logwatch is essentially a system log analyzer and reporter. It processes the logs that syslog simply collects. This kind of evolution simplifies the daily job of modern system and network administrators. Logs are everywhere, and almost everything produces them -- not only specific IT systems, but also the elements forming the so-called "Internet of Things," or IoT. The innovation comes from this last application: "things" that produce logs can be managed in a smarter way if those logs are analysed and reported so that the resulting behaviour can be acted upon.

This tool is very simple but, at the same time, very powerful. In its basic form it is CLI-only, with its configuration structured in directories.

 

Shaping Behaviors

The real power of Logwatch (though not the only one) is the ability to shape its behavior according to the administrator’s needs. Shaping the tool can be simple or more involved, but the goal is to let administrators carry out their daily job as easily as possible, so they can redirect their attention to solving the issues these tools reveal.

 

Customizing also means filtering, as described in my last post. Filtering is easier if the tool emphasizes the keywords to be filtered, so the first customization should be to capture those keywords using variables in the main configuration. As an example, we might want to raise an alert when a website returns a 500 error, but only after “n” occurrences rather than immediately, because we already know that this web server sits in front of an application server that can be overloaded in particular known situations. There is no need to produce tons of logs when we already know that the issue occurs in the app server (THIS is the issue to solve, not the error coming from the web server).
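
This is not Logwatch's own configuration syntax, just a short Python sketch of the "alert only after n occurrences within a window" logic described above; the threshold, window, and log format are placeholder assumptions.

```python
import re
from collections import deque
from datetime import datetime, timedelta

THRESHOLD = 20                    # alert only after this many 500s...
WINDOW = timedelta(minutes=5)     # ...within this time window
recent_500s = deque()

def on_log_line(line, now):
    """Feed each web-server access-log line (common log format) through this filter."""
    if not re.search(r'" 500 ', line):  # status code field follows the quoted request
        return
    recent_500s.append(now)
    # Drop timestamps that have fallen out of the window.
    while recent_500s and now - recent_500s[0] > WINDOW:
        recent_500s.popleft()
    if len(recent_500s) >= THRESHOLD:
        print(f"ALERT: {len(recent_500s)} HTTP 500s in the last {WINDOW}; check the app server")
        recent_500s.clear()             # avoid re-alerting on every subsequent line

# Example: on_log_line('10.0.0.5 - - [...] "GET /report HTTP/1.1" 500 1234', datetime.now())
```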

 

Preventive modeling

As you can imagine, this process implies prior analysis by the administrator. As usual, this is a tool, and it works according to your decisions; it can’t make its own decisions. Other, more advanced tools could offer input and advice, but not these basic tools.

 

Different services may have different logging configurations. Some services can be ignored, others overridden. Security level matters too: some services are not critical, so logging accuracy can be lower, while others are highly security-related and need very verbose logging. In the case of Logwatch, these per-service behaviors are written in Perl.

 

Conversely, there are cases where we need not different behaviors but a homogeneous output: for example, different IoT components, produced by different vendors and logging in various ways. For easier and more effective comprehension, customizing how logs are written and processed can “normalize” the output from any of them, helping us compare their behaviors.
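
As an illustration of that normalization step, here is a small Python sketch that maps two invented vendor log formats onto one common shape; the formats and field names are assumptions for the example, not any vendor's real output.

```python
import re
from typing import Optional

# Two invented examples of vendors logging the same reading differently.
PATTERNS = [
    # "2018-05-20 11:02:03 sensor-7 TEMP=41.5"
    re.compile(r"(?P<ts>\S+ \S+) (?P<device>\S+) TEMP=(?P<value>[\d.]+)"),
    # "sensor_12|2018-05-20T11:02:03|temperature|41.5"
    re.compile(r"(?P<device>[^|]+)\|(?P<ts>[^|]+)\|temperature\|(?P<value>[\d.]+)"),
]

def normalize(line: str) -> Optional[dict]:
    """Return {'ts', 'device', 'value'} regardless of which vendor wrote the line."""
    for pattern in PATTERNS:
        match = pattern.match(line)
        if match:
            return {**match.groupdict(), "value": float(match.group("value"))}
    return None  # unknown format: leave it for a catch-all report

print(normalize("2018-05-20 11:02:03 sensor-7 TEMP=41.5"))
print(normalize("sensor_12|2018-05-20T11:02:03|temperature|41.5"))
```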

Syslog evolution

We can compare an advanced tool like Logwatch with more basic ones like syslog mainly on scripting and the ability to choose how to behave in different situations. Syslog only lets the administrator filter the most representative lines for troubleshooting. Logwatch can build a particular structure for the line itself and then filter based on that shape, making it a friendlier helper for the advanced administrator. In this way, troubleshooting and monitoring improve, and it becomes possible to analyse logs better and develop new models to apply to how they are written.

May has arrived and so has some warmer weather. It feels good to be able to sit outside. I'm enjoying it while I can; we have only a few weeks of warm weather before the mosquitos arrive.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Twitter urges all users to change passwords after glitch

Just in case you didn’t hear about this, yet. Twitter came forward to say that a “bug” had accidentally caused user passwords to be stored in clear text. I’d like more technical details on how this bug could happen. In the meantime, change your passwords. Or don’t and use this as an excuse to tweet weird things and then say you were hacked.

 

Home DNA Kits: What Do They Tell You?

Not much, even for the dog. Only one company identified the DNA as not human. Something to think about for those folks that are thinking of spending money on such services.

 

The Gambler Who Cracked the Horse-Racing Code

On the heels of the Kentucky Derby comes this riveting story of a man who built and has used a predictive model for horse racing to win $1 billion over the past 30 years. The next time someone says data analytics isn’t a real thing, show them this story.

 

Police Tested Facial Recognition at a Major Sporting Event. The Results Were Disastrous

Then again, maybe data analytics still has a way to go. Or maybe we should put some prize money behind facial recognition efforts. If someone knew they could make a billion dollars with the right predictive model, I’m certain we would have a solution by now.

 

Unroll.me to close to EU users saying it can’t comply with GDPR

If you really want people to know that you don’t care about their privacy, just tell the world you won’t comply with the GDPR. This single act tells me everything I need to know about whether or not I can trust Unroll.me with my data. (The answer is "no.")

 

Yes, it’s Bad. Robocalls, and Their Scams, are Surging.

Robocalls are increasing. People are mad. And nothing is being done to stop the volume of spam phone calls from increasing.

 

This Lego Breakfast Machine Can Make You Bacon And Eggs

Father’s Day is coming up. #JustSayin

 

Spring is finally here, and we all know what that means:

 
