
After you’ve installed your new storage systems and migrated your data onto them, life slows down a bit. Freshly installed systems shouldn’t throw any hardware errors in the first stages of their lifecycle, apart from a drive that doesn’t fully realize it’s DOA. Software should be up to date. Maybe you’ll spend a bit more time to fully integrate the systems into your documentation and peripheral systems. Or deal with some of the migration aftermath, where new volumes were sized too small. But otherwise, it should be “business as usual.”

 

That doesn’t mean you can lie back and fall asleep. Storage vendors release new software versions periodically. The interval used to be a couple of releases a year, apart from new platforms, which might need a few extra patches to iron out early difficulties. But with the agile mindset of developers, and the constant drive to squash bugs and add new features, software is now often released monthly. So, should you upgrade or not?

 

If It Ain’t Broke…

One camp will go to great lengths to avoid upgrading storage system software. While the theory of “if it ain’t broke, don’t fix it!” has its merits up to a point, it usually comes from fear. Fear that a software upgrade will go wrong and break something. Let’s be honest though: over time, the gap between your (old) software version and the newer software only becomes bigger. If you don’t feel comfortable with an upgrade path from 4.2.0 to 4.2.3, how does an upgrade path from 4.2.0 to 5.0.1 make you feel? Especially if your system shows an uptime of 800+ days?

 

On the other hand, there’s no need to rush either. Vendors perform some degree of QA testing on their software, but it's usually a safe move to wait 30-90 days before applying new software to your critical production systems. Try it on a less critical system first, or let the new installs in the field flush out some additional bugs that slipped through the net. Code releases have been revoked more than once, and you don’t want to be hitting any new bugs while patching old bugs.

 

Target and latest revisions

Any respectable storage vendor should at the very least have a release matrix that shows release dates, software versions, adoption rates, and the suggested target release. This information can help you balance “latest features and bugfixes” versus “a few more new bugs that hurt more than the previous fixes.”

 

Again, don’t be lazy and hide behind the target release matrix. Once a new release comes out, check the release notes to see if anything in it applies to your environment. Sometimes it does really make sense to upgrade immediately, like with critical security or stability patches. Often, the system will check for the latest software release and show some sort of alert. In the last couple of months, I’ve seen patches for premature SSD media wear, overheating power supplies that can set fire to your DC, and a boatload of critical security patches. If you keep up-to-date with code and release notes, it doesn’t even take that much time to scroll through the latest fixes and feature additions.

 

One step up, there are also vendors that look beyond a simple release matrix. They will look at your specific system and configuration, and select the ideal release and hotfixes for your setup. All of this is based on data they collect from their systems at customers around the globe. And if you fall behind in upgrades and need intermediate updates, they will even select the ideal intermediate upgrades, blacklisting the ones that don’t fit your environment.

 

How often do you upgrade your storage systems? And what’s your biggest challenge with these upgrades? Let me know in the comments below!

Had a great time in Antwerp last week for Techorama, but it feels good to be home again. Summer weather is here and I'm looking forward to taking the Jeep out for a ride.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

The U.S. military is funding an effort to catch deepfakes and other AI trickery

As technology advances, it becomes easier for anyone to create deepfakes, and the U.S. military isn’t sure what it can do to solve the problem.

 

U.S. Launches Criminal Probe into Bitcoin Price Manipulation

Fake money is a scam, pure and simple. And as is often the case, our laws to protect people lag far behind the advances in technology.

 

Economic Inequality is the Norm, Not the Exception

Interesting analysis on how wealth gets distributed over time.

 

Here's how the Alexa spying scandal could become Amazon's worst nightmare

I’ve talked before about how Amazon and Microsoft deal in trust more than anything else. But this scenario isn’t so much about trust as it is the equivalent of a butt-dial. Alexa performed a task as designed. We didn’t stop using smartphones because of butt-dials, and we won’t stop using Alexa either.

 

Your Professional Growth Questionnaire

Nice summary of questions to ask yourself as part of a self-review process. I especially like the mention of a 360. I wish more companies would adopt similar methods of collecting data on employee performance.

 

The Places in the U.S. Where Disaster Strikes Again and Again

"About 90 percent of the total losses across the United States occurred in ZIP codes that contain less than 20 percent of the population." Wonderful data analysis and visualizations. It’s article like this that remind me how data is beautiful. Oh, and where not to live, too.

 

I was able to spend a few hours in Ghent last week, a beautiful city to pass the time:

Data management is one of the aspects of information technology that is regularly overlooked, but with the explosion of data in recent years and the growth of data only accelerating, organizations need to get a handle on these large amounts of data. With the digital transformation that organizations are going through, being able to filter out worthless data is critical, given that many organizations are now unlocking competitive advantages based on the data they gather and generate. In addition to the growth of data, the adoption of public cloud services makes keeping track of data even more difficult, with data sprawling across on-prem and public cloud. This poses not only an operational concern, but also a security concern, since you need to know what data lives where. Data management solutions that provide data analysis and classification make this much simpler by providing data about your data, so you can make informed decisions that span everything from capacity planning for backups to regulatory compliance around data locality.

 

The benefits of managed data management are:

  1. Simplified Deployment - The data management solutions offered by cloud providers give you a quick and easy way to start getting insight from your data. Whether delivered as SaaS or as a fully managed service, they remove much of the heavy lifting of deployment that is common with traditional data management solutions.
  2. Simplified Management - One of the challenges with data insight or big data solutions is the administrative overhead of patching and upgrading the software, as well as managing the number of servers associated with the deployment.

 

SaaS Deployment

The following solutions are Software as a Service (SaaS) deployments. This means the data management software company hosts the software for its customers.

 

Veritas Information Map

Veritas Information Map is a SaaS-based multi-cloud data management solution. Information Map provides insights into a company's data both on-prem and in public clouds such as AWS and Azure.

 

Komprise Intelligent Data Management

Komprise Intelligent Data Management is a SaaS-based data management solution. Komprise leverages existing industry-standard protocols for accessing data on a NAS or in an S3 bucket to provide insights into things such as data access time or the last person to access the data. Komprise supports gathering data insights from NFS, CIFS, SMB, Azure, AWS, GCP, and more.
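To illustrate the kind of metadata such a tool works with (this is not Komprise's implementation, just a minimal sketch against a hypothetical locally mounted share), a few lines of Python can already surface last-access times and the volume of "cold" data:

```python
# Minimal sketch: gather "coldness" metadata from a mounted NAS share.
# Illustrative only; the mount point below is hypothetical.
import os
import time

SHARE_ROOT = "/mnt/nas_share"   # hypothetical mount point of an NFS/SMB share
COLD_AFTER_DAYS = 180           # flag files untouched for roughly six months

def find_cold_files(root, cold_after_days=COLD_AFTER_DAYS):
    cutoff = time.time() - cold_after_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # file vanished or permission denied; skip it
            if st.st_atime < cutoff:  # last access time older than the cutoff
                yield path, st.st_size, st.st_atime

if __name__ == "__main__":
    total_bytes = 0
    for path, size, atime in find_cold_files(SHARE_ROOT):
        total_bytes += size
        print(f"{path}\t{size}\tlast accessed {time.ctime(atime)}")
    print(f"Cold data found: {total_bytes / 1024**3:.1f} GiB")
```

Note that file access times depend on mount options such as noatime/relatime, which is one reason dedicated products hook into the storage protocols themselves rather than relying on a filesystem walk.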

 

Fully Managed

The following solutions are fully managed: the cloud provider runs the data management platform on your behalf, enabling the IT organization to keep deriving valuable insight from the data without the management hassle.

 

AWS S3 Inventory

The AWS S3 Inventory solution provides a simple inventory of the objects in an AWS S3 bucket, along with associated metadata such as the storage class or encryption status.
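Setting up an inventory report is a one-time API call; a minimal sketch with boto3 (bucket names and the configuration ID are placeholders):

```python
# Minimal sketch: enable a daily S3 Inventory report with boto3.
# Bucket names and the configuration ID are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_inventory_configuration(
    Bucket="my-data-bucket",                      # bucket to inventory (placeholder)
    Id="daily-full-inventory",
    InventoryConfiguration={
        "Id": "daily-full-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        # Metadata fields delivered alongside each object key
        "OptionalFields": ["Size", "LastModifiedDate", "StorageClass", "EncryptionStatus"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-inventory-reports",  # destination bucket (placeholder)
                "Format": "CSV",
                "Prefix": "inventory/my-data-bucket",
            }
        },
    },
)
```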

 

AWS S3 Analytics

The AWS S3 Analytics solution provides storage class analysis and tiering recommendations: it gives you the data and insight to move infrequently accessed data to a cheaper storage class and reduce cost.

 

Azure Storage Analytics

The Azure Storage Analytics solution provides data about various Azure storage solutions such as blob storage, queues, and tables. Storage analytics allows for the creation of charts and graphs to visualize things like data access patterns, including who accessed the data or even where the data was accessed from.
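A minimal sketch of turning on logging and hourly metrics for Blob storage, assuming the azure-storage-blob v12 Python SDK (the connection string is a placeholder):

```python
# Minimal sketch: enable Storage Analytics logging and hourly metrics for Blob storage,
# assuming the azure-storage-blob v12 SDK. The connection string is a placeholder.
from azure.storage.blob import (
    BlobServiceClient,
    BlobAnalyticsLogging,
    Metrics,
    RetentionPolicy,
)

service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")

retention = RetentionPolicy(enabled=True, days=7)  # keep logs and metrics for a week

service.set_service_properties(
    analytics_logging=BlobAnalyticsLogging(
        read=True, write=True, delete=True, retention_policy=retention
    ),
    hour_metrics=Metrics(enabled=True, include_apis=True, retention_policy=retention),
)
```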

 

Data management can mean a lot of different things to different people given their specific focus for the data, but despite the different use cases, getting actionable insight from data is incredibly valuable and generally difficult. The goal of managed solutions is to help simplify and expedite the return on investment of data analysis.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Here is an interesting article from my colleague, Joe Kim, in which he points out some of the challenges of managing wireless sensor networks.

 

For several years, government network administrators have tried to turn knowledge into action to keep their networks and data centers running optimally and efficiently. For instance, they have adopted automated network monitoring to better manage increasingly complex data centers.

 

Now, a new factor has entered this equation: wireless sensor networks. These networks are composed of spatially distributed, autonomous sensors that monitor physical or environmental conditions within data centers to detect conditions such as sound, temperature, or humidity levels.

 

However, wireless sensor networks can be extraordinarily complex, as they are capable of providing a very large amount of data. This can make it difficult for managers to get an accurate read on the type of information their connected devices are capturing, which in turn can throw into question the effectiveness of an agency’s network monitoring processes. 

 

Fortunately, there are several steps federal IT managers can take to help ease the burden of managing, maintaining, and improving the efficacy of their wireless sensor networks. By following these guidelines, administrators can take the knowledge they receive from their sensor arrays and make it work for their agencies.

 

Establish a baseline for more effective measurement and security

 

Before implementing wireless sensors, managers should first monitor their wireless networks to create a baseline of activity. Only with this data will teams be able to accurately determine whether or not their wireless sensor networks are delivering the desired results.

 

Establishing a baseline allows managers to more easily identify any changes in network activity after their sensors are deployed, which, in turn, provides a true picture of network functionality. Also, a baseline provides a reference point for potential security issues.

 

Set trackable metrics to monitor performance and deliver ROI

 

Following the baseline assessment, administrators should configure trackable metrics to help them get the most out of their wireless sensor networks. For example, bandwidth monitoring that lets managers track usage over time can help them more effectively and efficiently allocate network resources. Watching monthly usage trends can also help teams better plan for future deployments and adjust budgets accordingly.

 

Metrics (along with the initial baseline) also can help agencies achieve measurable results. The goal is to know specifically what is needed from devices so that teams can get the most out of their wireless sensors. With metrics in hand, managers can understand whether or not their deployments are delivering the best return on investment.

 

Apply appropriate network monitoring tools to keep watch over sensor arrays

 

Network monitoring principles should be applied to wireless sensor networks to help ensure that they continue to operate effectively and securely. For instance, network performance and bandwidth monitoring software can be effective at identifying potential network anomalies and problematic usage patterns. These and other tools can also be used to forecast device scalability and threshold alerts, allowing managers to act on the information that sensors are sending out.

 

These tools, along with the other strategies mentioned above, are designed to do one thing: provide knowledge that can be turned into effective action. Managers can use these practices to bridge the gap between the raw data that their sensors are providing and the steps needed to keep their networks and applications running. And there is nothing scary about that.

 

Find the full article on Government Computer News.

Disasters come in many forms. I’ve walked in on my daughters when they were younger and doing craft things in their bedroom and said “this is a disaster!” When it comes to serious events though, most people think of natural disasters, like floods or earthquakes. But a disaster can also be defined as an event that has a serious impact on your infrastructure or business operations. It could be any of the following events:

  • Security-related (you may have suffered a major intrusion or breach)
  • Operator error (I’ve seen a DC go dark during generator testing because someone forgot to check the fuel levels)
  • Software faults (there are many horror stories of firmware updates taking out core platforms)

 

So how can SNMP help? SNMP traps, when captured in the right way, can be like a distress signal for your systems. If you’ve spent a bit of time setting up your infrastructure, you’ll hopefully be able to quickly recognise that something has gone wrong in your data centre and begin to assess whether you are indeed in the midst of a disaster. That’s right, you need to take a moment, look at the evidence in front of you, and then decide whether invoking your disaster recovery plan is the right thing to do.

 

Your infrastructure might be sending out a bunch of SNMP traps for a variety of reasons. This could be happening because someone in your operations team has deployed some new kit, or a configuration change is happening on some piece of key infrastructure. It’s important to be able to correlate the information in those SNMP traps with what’s been identified as planned maintenance.

 

Chances are,  if you’re seeing a lot of errors from devices (or perhaps lots of red lights, depending on your monitoring tools), your DC is having some dramas. Those last traps received by your monitoring system are also going to prove useful in identifying what systems were having issues and where you should start looking to troubleshoot. There are a number of different scenarios that play out when disaster strikes, but it’s fair to say that if everything in one DC is complaining that it can’t talk to anything in your other DC, then you have some kind of disaster on your hands.

 

What about syslog? I like syslog because it’s a great way to capture messages from a variety of networked devices and store them in a central location for further analysis. The great thing about this facility is that, when disaster strikes, you’ll (hopefully) have a record of what was happening in your DC when the event occurred. The problem, of course, is that if you only have one DC, and only have your syslog messages going to that DC, it might be tricky to get to that information if your DC becomes a hole in the ground. Like every other system you put into your DC, it’s worth evaluating how important it is and what it will cost you if the system is unavailable.
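The paragraph above makes the case for shipping syslog somewhere that survives the loss of a DC. As a minimal sketch (assuming Python's standard library and a hypothetical collector hostname), here is how an application or script could send its logs to a central syslog collector:

```python
# Minimal sketch: ship application logs to a central syslog collector so the
# record survives even if the local host (or DC) is lost. The collector
# address is a placeholder; in practice you'd point at a box in your second DC.
import logging
import logging.handlers

logger = logging.getLogger("dc-app")
logger.setLevel(logging.INFO)

# RFC 3164-style syslog over UDP/514; use TCP or a TLS-capable relay in production.
syslog = logging.handlers.SysLogHandler(
    address=("syslog.dr-site.example.com", 514),
    facility=logging.handlers.SysLogHandler.LOG_LOCAL1,
)
syslog.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(syslog)

logger.warning("Replication link to DC-2 is down")  # lands on the remote collector
```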

 

SNMP traps and syslog messages can be of tremendous use in determining whether a serious event has occurred in your DC, and in understanding what events (if any) led up to it. If you’re on the fence about whether to invest time and resources in deploying SNMP infrastructure and configuring a syslog repository, I heartily recommend you leverage these tools in your DC. They’ll likely come in extremely handy, and not just when disaster strikes.

     When it comes to application performance management, the main focus is the application. That is the main concern for end-users. They do not care about network performance, server performance, or any other metrics that can be measured. They only care about the application they are trying to use. Development teams and server admins are usually the main parties involved in the monitoring and management process for applications, but the reality is that all of this runs over a network. To take performance management to the next level, it only makes sense to bring all parties to the table.

 

First Steps: The Internal Network

 

     When examining the role of the network in an application’s performance, the first step is the internal network. If your application is internal, this may be the only step to focus on for your application. Whether it is east/west in your server and user environments or north/south to your firewalls, network monitoring needs to be involved. Let’s see if this sounds familiar...

 

Application is having issues -> Dev team receives a trouble ticket -> They consult with the server team to find the root cause -> Once all of their options are exhausted, the network team is consulted as the next step.

 

     After this long process, when the network team is finally consulted, they find a small issue on the uplink switch. After a quick fix, everything is back to normal. That is where the issue lies for a lot of environments: a tiered approach adds unneeded steps to the troubleshooting process of application performance management. I like to think of it as almost a hub-and-spoke environment, with the application being the hub and each spoke a supporting team.

 

Hub and Spoke Application Monitoring Structure

 

     By doing this, all parties are included in the process of application performance management. Each of them could be the first alerted party if there is an issue, and address the problem directly. This sets a good base for ensuring uptime and optimal performance for an application.

 

Monitoring External Network Performance

 

     Creating a strong structure for application performance management is the first step. Once this is mastered on the internal network, the network team has the additional job of monitoring external network performance. This is why it is crucial that the network team is included in monitoring applications. For example, there could be a routing issue between ISPs causing latency for external users accessing your application. Server analytics would show performance within acceptable tolerances and the development team would not see any errors either, yet users could be having a poor experience with the application. If proper steps were taken to monitor the external network, this issue could easily be detected, resolved, and communicated to all affected users. One example of managing the external network is a toolset that lets you test from multiple endpoints all over the world, reporting stats that range from ping latency to overall bandwidth from each of those locations.
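As a rough sketch of that multi-endpoint idea (with a hypothetical hostname, and plain TCP connect time standing in for a full synthetic transaction), the probe below measures connection latency from wherever it runs; running copies from several locations and feeding the results into your monitoring platform approximates the toolsets described above:

```python
# Minimal sketch of an external vantage-point check: measure TCP connect latency
# to the application's public endpoint. Host and port are placeholders.
import socket
import statistics
import time

HOST, PORT = "app.example.com", 443
SAMPLES = 5

def connect_latency_ms(host, port, timeout=3.0):
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; we only care about handshake time
    return (time.perf_counter() - start) * 1000

if __name__ == "__main__":
    results = []
    for _ in range(SAMPLES):
        try:
            results.append(connect_latency_ms(HOST, PORT))
        except OSError as exc:
            print(f"probe failed: {exc}")
        time.sleep(1)
    if results:
        print(f"{HOST}:{PORT} median {statistics.median(results):.1f} ms "
              f"over {len(results)} probes")
```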

 

The fact is that utilizing the network team in application performance management is a no-brainer. Reduced troubleshooting and problem resolution times are things any technical team can get behind. Next time you are planning a management and monitoring structure, be sure to focus on the network as well as the application itself.

I’m in Antwerp this week for Techorama. It’s a wonderful event in a great location, the Kinepolis, a 24-screen theater that can hold about 9,000 people in total. If you are in or around Antwerp this week, stop by and say hello.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

One-of-a-Kind Private Train Takes On Florida’s Traffic Nightmare

As much as I love autonomous vehicles, I know that having a modern rail system would be even better for our country. Here’s hoping Florida can get it done and lead the way for others to follow.

 

Are Low-skilled Jobs More Vulnerable to Automation?

For anyone that has ever been involved in a theoretical discussion regarding what jobs automation will replace next. Sometimes the jobs we think are easiest or best for machines are not. And some jobs (like automated index tuning for databases) are a lot easier for a robot than a human.

 

AI and Compute

Interesting analysis into the volume of AI projects in use now compared to six years ago. Consumption appears to double every 3.5 months. This rise in consumption is something a legacy data center would never be able to keep pace with, and is an example of where the cloud shines as an infrastructure provider.

 

Reaching Peak Meeting Efficiency

Yes, meetings are a necessary and important part of corporate life. And no one likes them. It’s time we all underwent some ongoing training on how to make meetings an efficient use of our time.

 

'Sexiest Job' Ignites Talent Wars as Demand for Data Geeks Soars

At first these salaries seem silly. But there is a dearth of people that can analyze data properly in the world. And the value a company gains from such insights makes up for such high salaries. Face it folks, the future isn’t in databases, it’s in the data.

 

The Internet of Trash: IoT Has a Looming E-Waste Problem

Here’s the real garbage collection problem with technology today: billions of IoT devices with short lifespans. Might be wise to invest in companies that specialize in cleanup and recycling of these devices.

 

Uh, Did Google Fake Its Big A.I. Demo?

I think the word “faked” is meant for a clickbait headline here. Chances are Google did some editing to make it presentable. The bigger issue, of course, is just how human the interaction seemed. And that has people more upset than if it was faked entirely.

 

The Techorama Welcome Kit left in my room was a nice touch, and they almost spelled my name correctly:

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Despite more than £50m being ploughed into the digital transformation of the NHS, evidence suggests only a handful of trusts have adopted the Government’s “cloud first” policy.

 

NHS Digital spent more than £32m on digital transformation consultancy services, and £23m with cloud, software, and hardware providers between April and December 2017.

 

But its bosses must be questioning why, when, according to recent findings from IT management software provider SolarWinds, less than a third of NHS trusts surveyed in January have adopted any level of public cloud and mistrust remains high.

 

The research questioned more than 200 NHS trusts and revealed that, while respondents were aware of the government’s policy, less than a third have begun the transition.

 

Of those who have yet to adopt any level of public cloud, 64% cited security concerns, 57% blamed legacy tech, and 52% said budgets were the biggest barriers.

 

However, for respondents who had adopted some public cloud, budget registered as far more of a barrier (66%), followed by security and legacy technology (59% each).

 

Eight percent of NHS trusts not using public cloud admitted they were using 10 or more monitoring tools to try and control their environment, compared to just 5% of NHS trusts with public cloud.

 

In addition, monitoring and managing the public cloud remains an issue, even after adoption, with 49% of trusts with some public cloud struggling to determine suitable workloads for the environment.

 

Other issues included visibility of cloud performance (47%) and protecting and securing cloud data (45%).

 

Six percent of NHS trusts still expect to see no return on investment at all from public cloud adoption.

 

Speaking exclusively to BBH about the findings, SolarWinds’ chief technologist of federal and national government Paul Parker said, “Cloud is this wonderful, ephemeral term that few people know how to put into a solid thought process.

 

“From the survey results, what we have seen is that the whole “cloud first” initiative has no real momentum and no one particularly driving it along.

 

“While there are a lot of organizations trying to get off of legacy technology, and push to modernize architecture, they are missing out.”

 

Improvements to training are vital, he added.

 

“People tend to believe there’s this tremendous return on investment to moving into the cloud when, in reality, you are shifting the cost from capital to operating expenses. It is not necessarily a cost saving in terms of architecture, and people need to recognize that.

“Rather than owning the infrastructure, with cloud you are leasing it, so it doesn’t change the level of investment much.”

 

There are savings to be had, though, and they come from converging job roles and improving access to medical records, for example.

 

Parker said, “With cloud, you do not need a monitoring team for every piece of architecture, so while there are savings to be had, they are more operational. With a cloud infrastructure, everything is ready at the click of a button.”

 

Moving forward, he says,

 

“It’s that old adage of ‘evolution, not revolution.’ There needs to be technology training, and the NHS needs to have an overarching goal, rather than simply moving to the cloud.

 

“The first thing trusts need to do is look at their current environment and determine what’s there, what’s critical, and what’s non-critical. That will enable them to focus on moving the non-critical things into a cloud environment without jeopardizing anyone’s health or life or affecting security. That will help everyone to better understand the benefits of the cloud and to build trust.

 

“I would also like to see a top-down approach in terms of policy and direction. The ultimate goal of healthcare services is to make sure people have a better quality of life, and the goal of IT is to make sure they can deliver that. As IT experts, it is our job to try and help them do that and make IT easier and more affordable.”

 

Find the full article on Building Better Healthcare.

One of the most interesting changes I have observed in my career is Microsoft shifting from being purely a development organization to truly becoming a DevOps team, in the case of the SQL Server group. The product code developers are the operations support for the cloud service hosting Azure SQL Database. Many other large development organizations have this model, but my relationships and experience with SQL Server have allowed me to observe this product much more closely. My main observation is that dev cycles turn so much faster in order to fix problems quickly: issues that in traditional on-premises software might have waited months for a patch, or even years for a full release, now get fixed in the course of a few weeks. This has really changed the way Microsoft releases SQL Server on-premises--after a major release, there is a cumulative update every month for the first year. This means more problems get fixed faster, and if you are using the cloud service, the release cadence is even faster.

 

I know most of us do not work for hyper-scale public cloud providers with massive development resources, but I used this example because it is a very real-world visible impact into the way a DevOps model can transform an organization. So how does this apply to running our infrastructure organizations? I think even in the most off-the-shelf traditional shop, you should be thinking about how to automate the following work:

 

  • Manual
  • Repetitive
  • Work with no enduring value
  • Work that doesn’t scale with your application

 

As a system admin, your main job task is to keep the lights (and more importantly, the servers) on in your organization. By adopting a mindset of trying to automate as many of these tasks as possible, you will enjoy your job more, and have more time to focus on tasks that are more strategic to your organization. You can take advantage of frameworks like PowerShell and Python scripting in your environments (yes, you will have to write some code) to bring this all together, but this mindset will change the way you view system administration.

 

Where do you start with automation? Identify the common tasks that consume the most time--for example, if you are using VM templates to deploy your operating system environments (that’s a really fancy way of saying VMs), you have already started down this path. A logical next step might be to automate the process for keeping your templates up to date with patches, or to use scripting to ensure that any post-deployment configuration tasks happen without human intervention. To do this really well, you should adopt a developer mindset: all of your scripts should live in source control, you should develop a unit testing process (the one downside to automation is that if you break something badly, you can break a lot of things badly, and quickly), and you should borrow some aspects of a development methodology to keep progress moving forward.
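To make that mindset concrete, here is a minimal sketch (function names and thresholds are hypothetical) of splitting an admin script into a small, testable decision function plus a pytest-style unit test that lives in source control alongside it:

```python
# Minimal sketch of the "developer mindset" for admin scripts: pure, testable logic
# (is this template overdue for a patch refresh?) separated from the code that
# talks to your hypervisor. Names and thresholds are hypothetical.
from datetime import datetime, timedelta

PATCH_INTERVAL = timedelta(days=30)

def template_needs_refresh(last_patched, now=None):
    """Return True if the template's last patch date is older than PATCH_INTERVAL."""
    now = now or datetime.utcnow()
    return (now - last_patched) > PATCH_INTERVAL


# pytest-style unit test, kept in source control next to the script
def test_template_needs_refresh():
    now = datetime(2018, 6, 1)
    assert template_needs_refresh(datetime(2018, 4, 1), now)       # 61 days old
    assert not template_needs_refresh(datetime(2018, 5, 20), now)  # 12 days old
```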

 

Moving into this mindset is a big shift for many organizations. However, in modern IT, influences like cloud and the aforementioned rapid development cycles mean that everything just moves faster. The other benefit of this automation, which should be a major influence on what tasks you automate, is the reduction in alert pages. If you can automate away your most common pager responses, your entire team and organization will benefit. The final, largest benefit of automation is that you know a script will run the same every time it's executed, as opposed to your junior admin, who may or may not get a complex task sequence correct each time.

“In the beginning was the word” … oops, “the cloud."

 

Long ago "cloud" was a buzzword, especially when it was a cool word that one knew exactly what it meant. Many companies made investments with an eye for the future focused on high-profile technologies, betting on the success of the cloud. Other companies drove their investments conversely to more reliable traditional solutions. Today the attention is focused on moving these old-fashioned but profitable (and fragile) applications to a cloud environment. The challenge is how to provide these applications the right amount of resources. This appears to be a difficult task because it isn’t simply a matter of quantity of resources. Legacy applications are built to run in a traditional architecture, usually monolithic and rigid. The kind of traditional “create, install, run, and forget it,” solution is on for a field for highly vertical specialists that seem to be going away.

 

Migrate or replace?

Who should make the decision on this migration? A more important question might be: is it worth re-coding these applications, or is it more convenient in terms of time, money, and features to build them from scratch as cloud-native applications instead? I think the second option would be the best choice. It is true that it will require more resources up front, but once in production it will bring several benefits: weekly (or even daily) upgrades to fix bugs or add new features, better performance, running natively on cloud components, flexibility, and accessibility from several different device types (tablets, phones, different OS/browser combinations).

 

All these benefits could also be gained by re-coding the old legacy app, but would it be worth it in the long run? The core of the app would still be the original one, built with a traditional architecture in mind. And are we sure the cost of re-coding would be less than the cost of moving to a new native app? As for the training users will need for the new app, that cost is offset by no longer needing hyper-specialists to maintain the legacy app.

 

The companies usually most resistant to replacing a legacy app with a cloud-native one are banking institutions, insurance companies, and healthcare organizations. They are all bound to solid, long time-tested applications, no matter how inefficient, but tested. By tested I mean bugs included (known ones and accepted ones), because fixing them would mean a huge effort in development cost and in downtime for the services involved.

Educate the management crew

If you can make these operators understand the benefits of a cloud-native application, especially by involving the management team, because of their leadership of the technical teams below them, you have done the best job you could. After that, it's a matter of time, because word moves fast within their vertical environments. Outside of those verticals, things tend to slow down.

 

For all the cases where companies still reject this kind of app, there’s still the so-called “lift and shift”: keeping the legacy apps as they are and moving them onto virtualized infrastructure. It’s a smooth first step into the “new world.” From that starting point it becomes simpler to show management the benefits of virtualization and, going forward, to present scenarios they could move into in the future, such as a serverless environment. But take this last step only if the people you’re talking to are technologically advanced; otherwise they may simply be scared of the “unknown” and decide to move everything back to the original site.

Be sober

In any case, our task is like walking on eggshells: presenting new solutions, keeping an eye on the emotional side of the people we’re talking to, not being too enthusiastic but looking professional, responsible, and balanced, and speaking not only about the benefits but also (and in some cases especially) about the risks, with a mitigation ready for each of them.

A storage system on its own is not useful. Sure, it can store data, but how are you going to put any data on it? Or read back the data that you just stored? You need to connect clients to your storage system. For this post, let’s assume that we are using block protocols, like iSCSI, on traditional block storage systems. This article also applies to file protocols (like NFS and SMB) and, to some extent, even to hyper-converged infrastructure, but we will get back to that later.

 

Direct attaching clients to the storage system is an option. There is no contention between clients on the ports, and it is cheap. In fact, I still see direct attached solutions in cases where low cost wins over client scalability. However, direct attaching your clients to a storage system does not really scale well in number of clients. Front-end ports on a storage array are expensive and limited.

 

Add some network

Therefore, we add some sort of network. For block protocols, that is a SAN. The two most commonly used protocols are the FC protocol (FCP) and iSCSI. Both protocols use SCSI commands, but the network equipment is vastly different: FC switches vs. Ethernet switches. Both have their advantages and disadvantages, and IT professionals will usually have a strong preference for one of the two.

 

Once you have settled on a protocol, the switch line speed is usually the first thing that comes up. FC commonly uses 16Gbit, with 32Gbit switches entering the market lately. Ethernet, however, is making bigger jumps, with 10Gbit being standard within a rack or wiring closet and 25/40/100Gbit commonly used for uplinks to the data center core.

 

The current higher speeds of Ethernet networks are often one of the arguments why “Ethernet is winning over FC.” 100Gbit Ethernet has already been on the market for quite some time, and the next obvious iteration of FC is “only” going to achieve 64Gbit.

 

Oversubscription

Once you start attaching more clients to a storage system than it has storage ports, you start oversubscribing. 100 servers attached to 10 storage ports means you have on average 10 servers on each storage port. Even worse, if those servers are hypervisors running 30 virtual machines each, you will now have 300 VMs competing for resources on a single port.

 

Even the most basic switch will have some sort of bandwidth/port monitoring functionality. If it does not have a management GUI that can show you graphs, third-party software can pull that data out of the switch using SNMP. As long as traffic in/out does not exceed 70% you should be OK, right?

 

The challenge is that this is not the whole truth. Other, more obscure limitations might ruin your day. For example, you might be sending a lot of very small I/O to a storage port. Storage vendors often brag about 4KB I/O performance specs. 25,000 4KB IOps only accounts for roughly 100MB/s or 800Mbit (excluding overhead). So, while your SAN port shows a meager 50% utilization, your storage port or HBA could still be overloaded.
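The arithmetic in the last two paragraphs is easy to fold into a small helper so you can run the oversubscription math for your own numbers; a quick sketch:

```python
# The back-of-the-envelope math from the paragraphs above, as a tiny calculator.
def fan_in_ratio(servers, storage_ports, vms_per_server=1):
    """How many (virtual) machines share each storage port on average."""
    return servers * vms_per_server / storage_ports

def iops_to_gbit(iops, io_size_kb=4):
    """Convert an IOps figure to payload bandwidth in Gbit/s (4 KB = 4096 bytes)."""
    bytes_per_sec = iops * io_size_kb * 1024
    return bytes_per_sec * 8 / 1e9

print(fan_in_ratio(100, 10))                      # 10 servers per storage port
print(fan_in_ratio(100, 10, vms_per_server=30))   # 300 VMs per storage port
print(f"{iops_to_gbit(25_000):.2f} Gbit/s")       # ~0.8 Gbit/s for 25k x 4KB IOps
```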

 

It becomes more complex once you start connecting SAN switches together and distributing clients and storage systems across this network of switches. It is hard to keep track of how much client and storage traffic traverses the ISLs (Inter-Switch Links). In this case, it is a smart move to keep your SAN topology simple and to be careful with oversubscription ratios. Do the oversubscription math, and look beyond the standard bandwidth graphs. Check error counters, and in an FC SAN that has long-distance links, check whether the Buffer-to-Buffer credits deplete on a port.

 

Ethernet instead of FC

The same principles apply to Ethernet. One argument why a company chooses an Ethernet-based SAN is that it already has LAN switches in place. In these cases, be extra vigilant. I am not opposed to sharing a switch chassis between SAN and normal client traffic. However, ports, ISLs, and switch modules/ASICs are prime contention points. You do not want your SAN performance to drop because a backup, restore, or large data transfer starts between two servers and both types of traffic start fighting for the available bandwidth.

 

Similarly, hyper-converged infrastructure solutions like VxRail and other VMware vSAN-based systems place high demands on the Ethernet uplinks. Ideally, you would want to ensure that vSAN uses dedicated, high-speed uplinks.

Which camp are you in? FC or Ethernet, or neither? And how do you ensure that the SAN doesn’t become a bottleneck? Comment below!

Welcome to another edition of the Actuator. I hope everyone is enjoying some warm spring weather. It's nice to be able to sit outside for an hour at the end of the day.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Alexa and Siri Can Hear This Hidden Command. You Can’t.

Fun fact for you: There is no law against sending subliminal messages to humans, or machines. The practice is discouraged and *may* be considered an invasion of privacy (for humans, not for machines). Another example of where the laws lag far behind the technology.

 

Digital Photocopiers Loaded With Secrets

Not mentioned in the article: the embarrassing photos from office Christmas parties.

 

Stanford Study Shows the Astonishing Productivity Boost of Working From Home

Glad to see we are putting some data into the productivity levels for people working from home. I’ve been doing it for eight years now. I know it’s made me more productive, happier, and healthier. I can’t go back to having a real job, ever.

 

Don't Skype Me: How Microsoft Turned Consumers Against a Beloved Brand

“[Using Skype] is like Tim Tebow trying to be a baseball player.” Ouch.

 

Amazon’s Fake Review Problem

I’ve been frustrated for years with the reviews on Amazon. I find many of them to be fake. These days I focus on the three-star ratings and do my best to discern the truth. To be fair, Amazon is not the only company with an online review problem.

 

Are My Friends Really My Friends?

Interesting analysis showing that despite being surrounded by constant interactions, we are more alone now than ever before.

 

Security researchers discover critical flaw in PGP encryption that reveals plaintext

Everything is terrible.

 

Seems legit:

Public cloud providers have greatly simplified the process of creating a backup, but the challenge has always been managing backups at scale: retention policies, simple granular file-level restores, regulatory-focused dashboards. This is the value added by many of the backup management solutions discussed in this post, and it becomes critical once an environment scales past a few instances and databases.

 

The benefits of managed backup management are:

  1. Simplified Management - The native backup tools offered by public cloud providers are generally account- or subscription-focused and don't offer a holistic view of the entire environment; managed backup solutions add that single, holistic view across accounts and clouds.
  2. Scalability - Fully managed backup and SaaS solutions have been built to scale to the largest environments without any performance impact or major concern for running out of storage space. This eliminates the need to re-architect the backup management deployment to scale with the needs of the organization for things such as keeping data for twice as long because of a new mandate.
  3. Multi-Cloud Support - Many of the legacy backup products that are available on the market only support backing up data to public cloud providers or only support backing up workloads in a single cloud provider. More and more companies are implementing multi-cloud strategies and a solution that supports multiple clouds is essential to simplifying operations.

 

Unmanaged Deployment

The following solutions are unmanaged deployments. This means the software is installed by the customer, or is packaged in a native cloud format such as an AWS AMI but is not available in the cloud provider's marketplace.

 

Rubrik Cloud Data Management

Rubrik Cloud Data Management is a software appliance that can be deployed to AWS, Azure and GCP. The Cloud Data Management platform supports policy based snapshot management along with advanced analytics to generate operational insights.

 

Managed Deployment

The following solutions are managed deployments. This means the backup management software company has added a deployment solution to the respective cloud provider's marketplace to allow the infrastructure to be provisioned with the click of a button.

 

Veritas CloudPoint

Veritas CloudPoint is a backup management solution that supports automated deployment into Azure, but supports backing up workloads on AWS, Azure, and GCP. In addition to IaaS workloads across the three major clouds, CloudPoint also supports application-level backups such as Microsoft SQL Server, MongoDB, and AWS Aurora.

 

SaaS Deployment

The following solutions are Software as a Service (SaaS) deployments. This means the backup management software company hosts the software for its customers.

 

Druva Apollo

Druva Apollo is a SaaS solution that provides data protection of AWS EC2, RDS, S3, EBS, and Glacier. Druva Apollo also includes SLA-based snapshot retention policies in addition to tiering to reduce costs as older snapshots are moved to cheaper storage and eventually deleted.

 

Rubrik Polaris

Rubrik Polaris is a SaaS solution that integrates with Rubrik's Cloud Data Management hardware and software appliances to provide a unified management platform for both on-premises and cloud-based workloads.

 

CloudRanger Backup and Recovery

CloudRanger Backup and Recovery is a SaaS solution that provides backup management of AWS EC2, RDS, and Redshift instances using native AWS snapshots. Instance and file level backups are supported along with multi-region and multi-account backup restore points.

 

Fully Managed

The following solutions are fully managed backup management solutions such that the cloud provider manages backups on your behalf.

 

Built-in Snapshots

Public cloud providers allow administrators to create snapshots of virtual machine instances, databases, and so on. This doesn't provide a robust feature set in terms of management, but it does allow administrators to back up and restore to a given point in time.
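As a minimal sketch of that built-in capability (assuming boto3 and a placeholder volume ID), this is the kind of native snapshot call that the managed solutions above wrap with policy, retention, and reporting:

```python
# Minimal sketch: a "built-in" backup using the cloud provider's native snapshot API.
# The volume ID is a placeholder.
import datetime
import boto3

ec2 = boto3.client("ec2")

snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",  # placeholder EBS volume
    Description="nightly backup " + datetime.date.today().isoformat(),
    TagSpecifications=[{
        "ResourceType": "snapshot",
        "Tags": [{"Key": "retention-days", "Value": "30"}],
    }],
)
print("Started snapshot:", snapshot["SnapshotId"])
```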

 

Backup management is an unsexy topic to most, but it has tremendous value when there is a disaster, and many of the new backup management solutions are becoming much more than a wrapper around the public cloud providers' native snapshot APIs.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Here is an interesting article from my colleague Joe Kim, in which he discusses what we can expect to see in the future as federal IT professionals.

 

Over the past couple of years, administrators have confronted a rapidly changing landscape that seems to shift overnight. Public vs. private cloud is an old argument; now, administrators are grappling with the reality of implementing and managing hybrid IT infrastructures.

 

As such, their skill sets are being tested like never before. Sixty-two percent of respondents to a recent SolarWinds IT trends survey indicated that hybrid IT has required that they acquire new skills, while 11 percent said it has altered their career path. Meanwhile, 57 percent of public sector organizations have already hired or reassigned IT personnel, or plan to do so, for the specific purpose of managing cloud technologies.

 

The skills that IT administrators are learning today will have a large impact on what IT management will look like ten years from now.

 

From service managers to service consumers

 

Our survey found that 96 percent of respondents have moved at least some applications and aspects of critical infrastructure to the cloud. This migration has caused federal administrators to sharpen their “as-a-service” skills, since many of the tools they are using have become software-defined and exist both on-premises and in hosted environments.

 

Federal IT professionals have gone from being service managers to service consumers who work with cloud providers to manage their infrastructures. In this service-oriented world, administrators are finding themselves interacting more with software than they are with hardware switches and routers. These interactions are precursors to IT practitioners’ inevitable evolution from traditional network managers into areas that may be more familiar to developers, and toward becoming service brokers, rather than service managers.

 

From network manager to network developer

 

Administrators previously needed to be savvy about command lines and hands-on management of network components, but the move toward hybrid IT and Software-as-a-Service (SaaS) applications has greatly reduced the need for these types of skills. Administrators must now begin to be able to manage the different pieces of code that comprise applications and allow those programs to work with each other.

 

Tomorrow’s network administrators will be familiar with application program interfaces (APIs) — essentially app building blocks — and how they can be used to solve common problems, from network management to security challenges. They will create highly customized and dynamic networks that fit the unique needs of their agencies. Furthermore, they’ll have a greater amount of control over these networks, as they will be able to tap into APIs to dictate policy, rules, user access, and more.

 

From servicing people to self-service

 

They’ll also move from being service managers to service brokers. Instead of provisioning more storage or spending their time clicking around user interfaces, they’ll be assigning applications and access rights to individual users so that those users can easily set up services on their own. The standard practice of a user submitting an online request for access to a new application will be a rare occurrence; everything that a person needs or is authorized to use will be at their fingertips in this self-service environment.

 

Network administrators will also have more opportunity to add strategic value to their agencies. Today, administrators spend a lot of time servicing users. Moving toward self-service will allow users to check their own boxes, download their own applications, and authorize their own access, all without having to go through their system administrators. In turn, future administrators will have more time to work on higher value services, such as developing plans for stronger security measures or using predictive analytics to anticipate and remediate network issues.

 

From the future to the past

 

Despite all of these changes, administrators will still need to focus on the “bread and butter” aspects of network management, including performance, availability, and compliance. To ensure success in each of these areas, administrators will need to use many of the same tools and processes that are commonplace right now.

 

Indeed, some of these tools will be even more important than they are today. For instance, network performance monitoring will be critical, particularly as IT becomes increasingly hybrid- and application-based. Agencies will need solutions that provide automated and unfettered insight into the performance of these applications, whether they are on-premises or hosted, just as they do today.

 

Learning these and other solutions will take some work, but there are resources available. Online forums such as SolarWinds’ THWACK provide forums where administrators can exchange ideas and ask questions, and most vendors will be more than willing to offer information on best practices.

 

These resources provide administrators with a chance to hone their skills today while preparing for their future. That future undoubtedly will be challenging, but it also will present many opportunities for federal IT professionals who want to expand their horizons and add more value to their agencies.

 

Find the full article on GovLoop.

The Simple Network Management Protocol (SNMP) has been a key part of managing network devices in the data centre for some time. It really is a pretty simple protocol to work with (hence the name), and I think it’s underrated as a key tool for monitoring unusual events. Unfortunately, SNMP has had some issues over time. One of these has been sending out a lot of information over the network in an insecure fashion. SNMP v3 was developed to address this. Another issue has been that Joe Sysadmin doesn’t always take the time to configure custom strings to use in the environment with the devices he’s trying to manage. Instead, the default “public” community string is left configured on the devices with more access than is required. This kind of behaviour drives information security folks nuts, and has operations staff questioning whether SNMP is worth the hassle.

 

SNMP is an extremely flexible solution that provides a robust framework with which you can leverage things like vendor-specific management information base (MIB) files. You can use these to provide both read-only and write access to networked devices. The advantage to this approach is that you can feed information into your management system that provides useful insights, rather than simply showing whether the device is up or down.

 

Following on from this, alerting that aligns with your devices gives you a better chance of identifying unusual issues in your environment. You could, for example, set your devices to send a trap when local user credentials are used to log in to a device rather than directory credentials. This type of activity may indicate that someone’s up to no good in your environment.
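As a rough sketch of the receiving side (assuming the pysnmp library rather than any particular monitoring product, and a placeholder community string), a trap listener takes only a few lines; in practice you would match the incoming OIDs against the vendor notifications you care about, such as a local-login trap:

```python
# Minimal sketch of an SNMPv2c trap listener using the pysnmp library (an assumed
# dependency, not any particular monitoring product). It prints the varbinds of
# every trap it receives. Community string is a placeholder; don't use "public".
from pysnmp.entity import engine, config
from pysnmp.carrier.asyncore.dgram import udp
from pysnmp.entity.rfc3413 import ntfrcv

snmp_engine = engine.SnmpEngine()

# Listen on UDP/162 (needs privileges, or pick a high port and redirect)
config.addTransport(
    snmp_engine, udp.domainName,
    udp.UdpTransport().openServerMode(("0.0.0.0", 162)),
)
config.addV1System(snmp_engine, "my-devices", "s3cr3t-community")

def on_trap(engine_, state_ref, context_engine_id, context_name, var_binds, cb_ctx):
    for oid, value in var_binds:
        print(f"trap varbind: {oid.prettyPrint()} = {value.prettyPrint()}")

ntfrcv.NotificationReceiver(snmp_engine, on_trap)

snmp_engine.transportDispatcher.jobStarted(1)   # keep the dispatcher running
try:
    snmp_engine.transportDispatcher.runDispatcher()
finally:
    snmp_engine.transportDispatcher.closeDispatcher()
```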

 

Security in your environment isn’t just about people cracking device credentials, though. It’s also about having devices available to provide the appropriate services to applications and their users. Configuring devices to send meaningful information via SNMP as issues occur can be a great way to get to minor problems before they become major issues. If one of your two firewall devices has suffered a failure, your infrastructure is compromised and you need to address the problem. I’ve seen plenty of situations where internal systems failures go unnoticed for far too long, leading to reduced performance in the environment and angst for both the operations staff and end-users. But people don’t just become unhappy with the infrastructure. They start to use workarounds to get their work done, which can involve unsafe practices such as storing unsecured corporate data in personal mailboxes or on publicly accessible file sharing sites.

 

A lot of people would agree that data centre operations can be a difficult thing to do well, particularly at a large scale. There always seems to be some device or another that’s run out of capacity, has a failed component, or has simply stopped doing what it’s meant to do. That’s why tools such as SNMP and syslog can help tremendously with keeping things under control in the DC. There’s a wide range of management systems available in the marketplace that can be used to do some pretty cool stuff with SNMP. Most devices that can be deployed in a 19” rack can speak SNMP and syslog, so why not get as much information about what’s happening in your environment as you can? The investment in effort upfront can save you a lot of time and headaches down the track when things invariably go awry.
