In the past year I have earned a couple of certificates from the Microsoft Professional Program. One certificate was in Data Science, the other in Big Data. I’m currently working on a third certificate, this one in Artificial Intelligence.

 

You might be wondering why a database guy would be spending so much time on data science, analytics, and AI. Well, I’ll tell you.

 

The future isn’t in databases, but in the data.

 

Let me explain why.

 

Databases Are Cheap and Plentiful

Take a look at the latest DB-Engines rankings. You will find 342 distinct databases listed there, 138 of which are relational. And I’m not sure that’s a complete list, either. But it should help make my point: you have no idea which one of those 342 databases is the right one. It could be none of them. It could be all of them.

 

Sure, you can narrow the list of options by looking at categories. You may know you want a relational, or a key-value pair, or even a graph database. Each category will have multiple options, and it will be up to you to decide which one is the right one.

 

So, a decision is made to go with whatever is easiest. And “easiest” doesn’t always mean “best.” It just means you’ve made a decision that allows the project to move forward.

 

Here’s the fact I want you to understand: Data doesn’t care where or how it is stored. Neither do the people curating the data. Nobody ever stops and says “wait, I can’t use that, it’s stored in JSON.” If they want (or need) the data, they will take it, no matter what format it starts out in.

 

And the people curating the data don’t care about endless debates on MAXDOP and NUMA and page splits. They just want their processing to work.

 

And then there is this #hardtruth - It’s often easier to throw hardware at a problem than to talk to the DBA.

 

Technology Trends Over the Past Ten Years

Let’s break down a handful of technology trends over the past ten years. These trends are the technology drivers for the rise of data analytics during that time.

 

Business Intelligence software – The ability to analyze and report on data has become easier with each passing year. The Undisputed King of all business analytics, Excel, is still going strong. Tableau shows no signs of slowing down. Power BI has burst onto the scene in just the past few years. Data analytics is embedded into just about everything. You can even run R and Python through SQL Server.

 

Real-time analytics – Software such as Hadoop, Spark, and Kafka allow for real-time analytic processing. This has allowed companies to gather quality insights into data at a faster rate than ever before. What used to take weeks or months can now be done in minutes.
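
To make that concrete, here is a minimal PySpark Structured Streaming sketch that reads events from a Kafka topic and keeps a running count per event type. The broker address, topic name, and message layout are hypothetical placeholders, not a reference to any particular environment.

```python
# Minimal sketch: streaming aggregation over a Kafka topic with PySpark.
# Assumes Spark with the spark-sql-kafka package available; the broker and
# topic names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName("realtime-counts").getOrCreate()

schema = StructType([StructField("event_type", StringType())])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")   # placeholder broker
    .option("subscribe", "events")                      # placeholder topic
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("e"))
)

# Running count of events per type, updated as new messages arrive.
counts = events.groupBy("e.event_type").count()

query = counts.writeStream.outputMode("complete").format("console").start()
query.awaitTermination()
```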

 

Data-driven decisions – Companies can use real-time analytics and enhanced BI reporting to build a culture that is truly data-driven. We can move away from “hey, I think I’m right, and I found data to prove me right” to a world of “hey, the data says we should make a change, so let’s make the change and not worry about who was right or wrong.” In other words, we can remove the human factor from decision making, and let the data help guide our decisions instead.

 

Cloud computing – It’s easy to leverage cloud providers such as Microsoft Azure and Amazon Web Services to allocate hardware resources for our data analytic needs. Data warehousing can be achieved on a global scale, with low latency and massive computing power. What once cost millions of dollars to implement can be done for a few hundred dollars and some PowerShell scripts.

 

Technology Trends Over the Next Ten Years

Now, let’s break down a handful of current trends. These are the trends that will affect the data industry for the next ten years.

 

Predictive analytics – Artificial intelligence, machine learning, and deep learning are just starting to become mainstream. AWS is releasing DeepLens this year. Azure Machine Learning makes it easy to deploy predictive web services. Azure Workbench lets you build your own facial recognition program in just a few clicks. It’s never been easier to develop and deploy predictive analytic solutions.

 

DBA as a service – Every company that makes database software (Microsoft, AWS, Google, Oracle, etc.) is actively building automation for common DBA tasks. Performance tuning and monitoring, disaster recovery, high availability, low latency, auto-scaling based upon historical workloads, the list goes on. The current DBA role, where lonely people work in a basement rebuilding indexes, is ending, one page at a time.

 

Serverless functions – Serverless functions are also hip these days. Services such as IFTTT make it easy for a user to configure an automated response to whatever trigger they define. Azure Functions and AWS Lambda are where the hipster programmers hang out, building automated processes to help administrators do more with less.
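
For a sense of how small these building blocks can be, here is a minimal sketch of a Python AWS Lambda handler that reacts to an S3 event notification. The bucket and the "downstream automation" are hypothetical; a real function would kick off whatever automated response you have defined.

```python
# Minimal sketch of a Python AWS Lambda handler triggered by S3 events.
# The processing here is hypothetical; a real function would run whatever
# automated response the administrator has defined.
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):
    # An S3 event notification carries one or more records describing
    # the bucket and object key that triggered the function.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        logger.info("New object arrived: s3://%s/%s", bucket, key)
        # ... trigger downstream automation here ...
    return {"statusCode": 200, "body": json.dumps("processed")}
```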

 

More chatbots – We are starting to see a rise in the number of chatbots available. It won’t be long before you are having a conversation with a chatbot playing the role of a DBA. The only way you’ll know it is a chatbot and not a DBA is because it will be a pleasant conversation for a change. Chatbots are going to put a conversation on top of the automation of the systems underneath. As new people enter the workforce, interaction with chatbots will be seen as the norm.

 

Summary

There is a dearth of people that can analyze data today.

 

That’s the biggest growth opportunity I see for the next ten years. The industry needs people that can collect, curate, and analyze data.

 

We also need people that can build data visualizations. Something more than an unreadable pie chart. But that’s a rant for a different post.

 

We are always going to need an administrator to help keep the lights on. But as time goes on, we will need fewer and fewer of them. This is why I’m advocating a shift for data professionals to start learning more about data analytics.

 

Well, I’m not just advocating it, I’m doing it.

After you’ve installed your new storage systems and migrated your data onto them, life slows down a bit. Freshly installed systems shouldn’t throw any hardware errors in the first stages of their lifecycle, apart from a drive that doesn’t fully realize it’s DOA. Software should be up to date. Maybe you’ll spend a bit more time to fully integrate the systems into your documentation and peripheral systems. Or deal with some of the migration aftermath, where new volumes were sized too small. But otherwise, it should be “business as usual.”

 

That doesn’t mean you can lie back and fall asleep. Storage vendors release new software versions periodically. The cadence used to be a couple of releases a year, apart from the new platforms that might have a few extra patches to iron out the early difficulties. But with the agile mindset of developers, and the constant drive to squash bugs and add new features, software is now often released monthly. So, should you upgrade or not?

 

If It Ain’t Broke…

One camp will go to great lengths to avoid upgrading storage system software. While the theory of “if it ain’t broke, don’t fix it!” has its merits up to some point, it usually comes from fear. Fear that a software upgrade will go wrong and break something. Let’s be honest though: over time, the gap between your (old) software version and the newer software only becomes bigger. If you don’t feel comfortable with an upgrade path from 4.2.0 to 4.2.3, how does an upgrade path from 4.2.0 to 5.0.1 make you feel? Especially if your system shows you an uptime of 800+ days?

 

On the other hand, there’s no need to rush either. Vendors perform some degree of QA testing on their software, but it's usually a safe move to wait 30-90 days before applying new software to your critical production systems. Try it on a less critical system first, or let the new installs in the field flush out some additional bugs that slipped through the net. Code releases have been revoked more than once, and you don’t want to be hitting any new bugs while patching old bugs.

 

Target and latest revisions

Any respectable storage vendor should at the very least have a release matrix that shows release dates, software versions, adoption rates, and the suggested target release. This information can help you balance “latest features and bugfixes” versus “a few more new bugs that hurt more than the previous fixes.”

 

Again, don’t be lazy and hide behind the target release matrix. Once a new release comes out, check the release notes to see if anything in it applies to your environment. Sometimes it really does make sense to upgrade immediately, like with critical security or stability patches. Often, the system will check for the latest software release and show some sort of alert. In the last couple of months, I’ve seen patches for premature SSD media wear, overheating power supplies that can set fire to your DC, and a boatload of critical security patches. If you keep up-to-date with code and release notes, it doesn’t even take that much time to scroll through the latest fixes and feature additions.

 

One step up, there are also vendors that look beyond a simple release matrix. They will look at your specific system and configuration, and select the ideal release and hotfixes for your setup. All this will be based on a bunch of data they collect from their systems at customers around the globe. And if you fall behind in upgrades and need intermediate updates, they will even select the ideal intermediate upgrades, blacklisting the ones that don’t fit your environment.

 

How often do you upgrade your storage systems? And what’s your biggest challenge with these upgrades? Let me know in the comments below!

Had a great time in Antwerp last week for Techorama, but it feels good to be home again. Summer weather is here and I'm looking forward to taking the Jeep out for a ride.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

The U.S. military is funding an effort to catch deepfakes and other AI trickery

As technology advances, it becomes easier for anyone to create deepfakes, and the U.S. military isn’t sure what they can do to solve the problem.

 

U.S. Launches Criminal Probe into Bitcoin Price Manipulation

Fake money is a scam, pure and simple. And as is often the case, our laws to protect people lag far behind the advances in technology.

 

Economic Inequality is the Norm, Not the Exception

Interesting analysis on how wealth gets distributed over time.

 

Here's how the Alexa spying scandal could become Amazon's worst nightmare

I’ve talked before about how Amazon and Microsoft deal in trust more than anything else. But this scenario isn’t as much about trust as it is the equivalent of a butt-dial. Alexa performed a task as designed. We didn’t stop using smartphones because of butt-dials, and we won’t stop using Alexa either.

 

Your Professional Growth Questionnaire

Nice summary of questions to ask yourself as part of a self-review process. I especially like the mention of a 360. I wish more companies used similar methods of collecting data regarding employee performance.

 

The Places in the U.S. Where Disaster Strikes Again and Again

"About 90 percent of the total losses across the United States occurred in ZIP codes that contain less than 20 percent of the population." Wonderful data analysis and visualizations. It’s article like this that remind me how data is beautiful. Oh, and where not to live, too.

 

I was able to spend a few hours in Ghent last week, a beautiful city to pass the time:

Data management is one of the aspects of information technology that is regularly overlooked, but with the explosion of data in recent years, and the growth of data only accelerating, organizations need to get a handle on these large amounts of data. With the digital transformation that organizations are going through, being able to filter out worthless data is critical, given that many organizations are now unlocking competitive advantages based on the data they are gathering and generating.

In addition to the growth of data, the adoption of public cloud services makes keeping track of data even more difficult, with data sprawled across on-prem and public cloud. This poses not only an operational concern, but also a security concern, because you need to know what data lives where. Data management solutions that provide data analysis and classification make this much simpler by providing data about your data, so you can make informed decisions that range from capacity planning for backups to regulatory compliance around data locality.

 

The benefits of managed data management are:

  1. Simplified Deployment - The data management solutions offered by cloud providers give you a quick and easy way to start getting insight from your data, either as a SaaS offering or as a fully managed service, which removes much of the heavy lifting that comes with deploying data management solutions yourself.
  2. Simplified Management - One of the challenges with data insight or big data solutions is the administrative overhead of patching and upgrading the software, and of managing the number of servers associated with the deployment. Managed offerings take that overhead off your plate.

 

SaaS Deployment

The following solutions are Software as a Service (SaaS) deployments. This means the data management software company hosts the software for its customers.

 

Veritas Information Map

Veritas Information Map is a SaaS-based multi-cloud data management solution. Information Map provides insight into a company’s data both on-prem and in public clouds such as AWS and Azure.

 

Komprise Intelligent Data Management

Komprise Intelligent Data Management is a SaaS-based data management solution. Komprise leverages existing industry-standard protocols for accessing data from a NAS or S3 bucket to provide insights into things such as when data was last accessed and by whom. Komprise supports gathering data insights from NFS, CIFS, SMB, Azure, AWS, GCP, and more.

 

Fully Managed

The following solutions are fully managed, meaning the cloud provider runs the data management platform on your behalf, so the IT organization can keep deriving valuable insight from its data without the management hassle.

 

AWS S3 Inventory

The AWS S3 Inventory solution is a simple inventory of the objects in an AWS S3 bucket along with associated metadata such as the storage class or encryption status.
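
As a sketch of how that inventory gets switched on, the boto3 call below configures a daily CSV inventory report for a bucket. The bucket names and destination ARN are hypothetical placeholders, and the optional fields you request will depend on what you want to report on.

```python
# Sketch: enable a daily S3 inventory report with boto3.
# Bucket names and the destination ARN are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_inventory_configuration(
    Bucket="my-data-bucket",                      # bucket to inventory
    Id="daily-inventory",
    InventoryConfiguration={
        "Id": "daily-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "Current",
        "Schedule": {"Frequency": "Daily"},
        # Metadata columns to include alongside each object key.
        "OptionalFields": ["Size", "StorageClass", "EncryptionStatus"],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-inventory-reports",  # placeholder
                "Format": "CSV",
            }
        },
    },
)
```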

 

AWS S3 Analytics

The AWS S3 Analytics solution is a storage class tiering recommendation solution that provides data and insight into moving data between storage classes to reduce cost by moving infrequently accessed data to a cheaper storage tier.

 

Azure Storage Analytics

The Azure Storage Analytics solution provides data about various Azure storage solutions such as blob storage, queues, and tables. Storage analytics allows for the creation of charts and graphs to visualize things like data access patterns, including who accessed the data or even where the data was accessed from.

 

Data management can mean a lot of different things to different people given their specific focus for the data, but despite the different use cases, getting actionable insight from data is incredibly valuable and generally difficult. The goal of managed solutions is to help simplify and expedite the return on investment of data analysis.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Here is an interesting article from my colleague, Joe Kim, in which he points out some of the challenges of managing wireless sensor networks.

 

For several years, government network administrators have tried to turn knowledge into action to keep their networks and data centers running optimally and efficiently. For instance, they have adopted automated network monitoring to better manage increasingly complex data centers.

 

Now, a new factor has entered this equation: wireless sensor networks. These networks are composed of spatially distributed, autonomous sensors that monitor physical or environmental conditions within data centers to detect conditions such as sound, temperature, or humidity levels.

 

However, wireless sensor networks can be extraordinarily complex, as they are capable of providing a very large amount of data. This can make it difficult for managers to get an accurate read on the type of information their connected devices are capturing, which in turn can throw into question the effectiveness of an agency’s network monitoring processes. 

 

Fortunately, there are several steps federal IT managers can take to help ease the burden of managing, maintaining, and improving the efficacy of their wireless sensor networks. By following these guidelines, administrators can take the knowledge they receive from their sensor arrays and make it work for their agencies.

 

Establish a baseline for more effective measurement and security

 

Before implementing wireless sensors, managers should first monitor their wireless networks to create a baseline of activity. Only with this data will teams be able to accurately determine whether or not their wireless sensor networks are delivering the desired results.

 

Establishing a baseline allows managers to more easily identify any changes in network activity after their sensors are deployed, which, in turn, provides a true picture of network functionality. Also, a baseline provides a reference point for potential security issues.
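
As a loose illustration of the idea, the sketch below builds a simple baseline (mean and standard deviation) from historical samples and flags new readings that fall well outside it. The sample values and the three-sigma threshold are made up; a real deployment would feed this from your monitoring data.

```python
# Sketch: build a simple statistical baseline from historical samples and
# flag new readings that deviate from it. Sample data and the 3-sigma
# threshold are illustrative only.
from statistics import mean, stdev

def build_baseline(samples):
    """Return (mean, standard deviation) for a list of numeric samples."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, sigmas=3.0):
    """True if the value sits more than `sigmas` deviations from the mean."""
    mu, sd = baseline
    return abs(value - mu) > sigmas * sd

# Hypothetical pre-deployment bandwidth samples (Mbps) used as the baseline.
history = [112, 120, 108, 115, 119, 111, 117, 114]
baseline = build_baseline(history)

for reading in [116, 118, 240]:   # 240 Mbps would stand out post-deployment
    print(reading, "anomalous" if is_anomalous(reading, baseline) else "normal")
```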

 

Set trackable metrics to monitor performance and deliver ROI

 

Following the baseline assessment, administrators should configure trackable metrics to help them get the most out of their wireless sensor networks. For example, bandwidth monitoring that lets managers track usage over time can help them more effectively and efficiently allocate network resources. Watching monthly usage trends can also help teams better plan for future deployments and adjust budgets accordingly.

 

Metrics (along with the initial baseline) also can help agencies achieve measurable results. The goal is to know specifically what is needed from devices so that teams can get the most out of their wireless sensors. With metrics in hand, managers can understand whether or not their deployments are delivering the best return on investment.

 

Apply appropriate network monitoring tools to keep watch over sensor arrays

 

Network monitoring principles should be applied to wireless sensor networks to help ensure that they continue to operate effectively and securely. For instance, network performance and bandwidth monitoring software can be effective at identifying potential network anomalies and problematic usage patterns. These and other tools can also be used to forecast device scalability and threshold alerts, allowing managers to act on the information that sensors are sending out.

 

These tools, along with the other strategies mentioned above, are designed to do one thing: provide knowledge that can be turned into effective action. Managers can use these practices to bridge the gap between the raw data that their sensors are providing and the steps needed to keep their networks and applications running. And there is nothing scary about that.

 

Find the full article on Government Computer News.

Disasters come in many forms. I’ve walked in on my daughters when they were younger and doing craft things in their bedroom and said “this is a disaster!” When it comes to serious events though, most people think of natural disasters, like floods or earthquakes. But a disaster can also be defined as an event that has a serious impact on your infrastructure or business operations. It could be any of the following events:

  • Security-related (you may have suffered a major intrusion or breach)
  • Operator error (I’ve seen a DC go dark during generator testing because someone forgot to check the fuel levels)
  • Software faults (there are many horror stories of firmware updates taking out core platforms)

 

So how can SNMP help? SNMP traps, when captured in the right way, can be like a distress signal for your systems. If you’ve spent a bit of time setting up your infrastructure, you’ll hopefully be able to quickly recognise that something has gone wrong in your data centre and begin to assess whether you are indeed in the midst of a disaster. That’s right, you need to take a moment, look at the evidence in front of you, and then decide whether invoking your disaster recovery plan is the right thing to do.

 

Your infrastructure might be sending out a bunch of SNMP traps for a variety of reasons. This could be happening because someone in your operations team has deployed some new kit, or a configuration change is happening on some piece of key infrastructure. It’s important to be able to correlate the information in those SNMP traps with what’s been identified as planned maintenance.
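
A minimal sketch of that correlation step might look like the following: given a list of planned maintenance windows, it sorts incoming trap events into "expected" and "needs attention." The event and window structures are hypothetical; a real trap collector carries far more detail.

```python
# Sketch: separate SNMP trap events into "expected" (covered by a planned
# maintenance window) and "needs attention". Data structures are hypothetical.
from datetime import datetime

maintenance_windows = [
    # (device, start, end) of approved change windows -- placeholders.
    ("core-sw-01", datetime(2018, 6, 1, 22, 0), datetime(2018, 6, 2, 2, 0)),
]

def in_maintenance(device, timestamp):
    return any(dev == device and start <= timestamp <= end
               for dev, start, end in maintenance_windows)

def triage(traps):
    expected, attention = [], []
    for device, timestamp, message in traps:
        (expected if in_maintenance(device, timestamp) else attention).append(
            (device, timestamp, message))
    return expected, attention

traps = [
    ("core-sw-01", datetime(2018, 6, 1, 23, 15), "linkDown ge-0/0/1"),
    ("edge-fw-02", datetime(2018, 6, 1, 23, 20), "linkDown outside"),
]
expected, attention = triage(traps)
print("needs attention:", attention)
```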

 

Chances are,  if you’re seeing a lot of errors from devices (or perhaps lots of red lights, depending on your monitoring tools), your DC is having some dramas. Those last traps received by your monitoring system are also going to prove useful in identifying what systems were having issues and where you should start looking to troubleshoot. There are a number of different scenarios that play out when disaster strikes, but it’s fair to say that if everything in one DC is complaining that it can’t talk to anything in your other DC, then you have some kind of disaster on your hands.

 

What about syslog? I like syslog because it’s a great way to capture messages from a variety of networked devices and store them in a central location for further analysis. The great thing about this facility is that, when disaster strikes, you’ll (hopefully) have a record of what was happening in your DC when the event occurred. The problem, of course, is that if you only have one DC, and only have your syslog messages going to that DC, it might be tricky to get to that information if your DC becomes a hole in the ground. Like every other system you put into your DC, it’s worth evaluating how important it is and what it will cost you if the system is unavailable.
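
As a small sketch of feeding that central repository, Python's standard library can forward messages to a remote syslog collector. The collector address below is a placeholder; in practice most messages will come from network gear and servers configured to log remotely rather than from scripts.

```python
# Sketch: forward log messages to a central syslog collector using the
# Python standard library. The collector address below is a placeholder.
import logging
import logging.handlers

logger = logging.getLogger("dc-events")
logger.setLevel(logging.INFO)

# UDP port 514 is the traditional syslog port; many collectors also accept TCP.
handler = logging.handlers.SysLogHandler(address=("syslog.example.local", 514))
handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.warning("Generator test started in DC-A")
logger.error("Replication link to DC-B lost")
```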

 

SNMP traps and syslog messages can be of tremendous use in determining whether a serious event has occurred in your DC, and in understanding what events (if any) led up to it. If you’re on the fence about whether to invest time and resources in deploying SNMP infrastructure and configuring a syslog repository, I heartily recommend you look to leverage these tools in your DC. They’ll likely come in extremely handy, and not just when disaster strikes.

     When it comes to application performance management, the main focus is the application. That is what end-users care about. They do not care about network performance, server performance, or any other metrics that can be measured. They only care about the application they are trying to use. Development teams and server admins are usually the main parties involved in the monitoring and management process for applications, but the reality is that all of this runs over a network. To take performance management to the next level, it only makes sense to bring all parties to the table.

 

First Steps: The Internal Network

 

     When examining the role of the network in an application’s performance, the first step is the internal network. If your application is internal, this may be the only step to focus on. Whether traffic flows east/west in your server and user environments or north/south to your firewalls, network monitoring needs to be involved. Let’s see if this sounds familiar...

 

Application is having issues -> Dev team receives a trouble ticket -> They consult with the server team to find the root cause -> Once all of their options are exhausted, the network team is consulted as the next step.

 

     After this long process, when the network team is finally consulted, they discover a small issue on the uplink switch. After a quick fix, everything is back to normal. That is where the issue lies for a lot of environments. A tiered approach adds unneeded steps to the troubleshooting process of application performance management. I like to think of it as almost a hub-and-spoke environment, with the application as the hub and each supporting team as a spoke.

 

Hub and Spoke Application Monitoring Structure

 

     By doing this, all parties are included in the process of application performance management. Each of them could be the first alerted party if there is an issue, and address the problem directly. This sets a good base for ensuring uptime and optimal performance for an application.

 

Monitoring External Network Performance

 

     Creating a strong structure for application performance management is the first step. Once this is mastered on the internal network, the network team has the additional job of monitoring external network performance. This is why it is crucial that the network team is included in monitoring applications. For example, there could be a routing issue between ISPs causing latency for external users accessing your application. Server analytics would show performance within acceptable tolerances, and the development team would not see any errors either, yet users could be having a poor experience with the application. If proper steps were taken to monitor the external network, this issue could easily be detected, resolved, and communicated to all affected users. One way to manage the external network is with toolsets that let you test from multiple endpoints all over the world, reporting anything from ping latency to overall bandwidth from each of those locations.
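
A bare-bones sketch of that kind of external check is shown below: it measures TCP connect latency to an application endpoint, the sort of probe you would run from several locations and feed into a shared dashboard. The hostnames are placeholders.

```python
# Sketch: measure TCP connect latency to an application endpoint from the
# probe's location. Hostnames below are placeholders; in practice you would
# run this from multiple regions and report the results centrally.
import socket
import time

def connect_latency_ms(host, port=443, timeout=5.0):
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None   # unreachable from this probe location

for endpoint in ["app.example.com", "app-eu.example.com"]:
    latency = connect_latency_ms(endpoint)
    status = f"{latency:.1f} ms" if latency is not None else "unreachable"
    print(f"{endpoint}: {status}")
```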

 

The fact is that utilizing the network team in application performance management is a no-brainer. Reduced troubleshooting and problem resolution times are things any technical team can get behind. Next time you are planning a management and monitoring structure, be sure to focus on the network as well as the application itself.

I’m in Antwerp this week for Techorama. It’s a wonderful event in a great location, the Kinepolis, a 24-screen theater that can hold about 9,000 people in total. If you are in or around Antwerp this week, stop by and say hello.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

One-of-a-Kind Private Train Takes On Florida’s Traffic Nightmare

As much as I love autonomous vehicles, I know that having a modern rail system would be even better for our country. Here’s hoping Florida can get it done and lead the way for others to follow.

 

Are Low-skilled Jobs More Vulnerable to Automation?

For anyone that has ever been involved in a theoretical discussion regarding what jobs automation will replace next. Sometimes the jobs we think are easiest or best for machines are not. And some jobs (like automated index tuning for databases) are a lot easier for a robot than a human.

 

AI and Compute

Interesting analysis into the volume of AI projects in use now compared to six years ago. Consumption appears to double every 3.5 months. This rise in consumption is something a legacy data center would never be able to keep pace with, and is an example of where the cloud shines as an infrastructure provider.

 

Reaching Peak Meeting Efficiency

Yes, meetings are a necessary and important part of corporate life. And no one likes them. It’s time we all underwent some ongoing training on how to make meetings an efficient use of our time.

 

'Sexiest Job' Ignites Talent Wars as Demand for Data Geeks Soars

At first these salaries seem silly. But there is a dearth of people that can analyze data properly in the world. And the value a company gains from such insights makes up for such high salaries. Face it folks, the future isn’t in databases, it’s in the data.

 

The Internet of Trash: IoT Has a Looming E-Waste Problem

Here’s the real garbage collection problem with technology today: billions of IoT devices with short lifespans. Might be wise to invest in companies that specialize in cleanup and recycling of these devices.

 

Uh, Did Google Fake Its Big A.I. Demo?

I think the word “faked” is meant for a clickbait headline here. Chances are Google did some editing to make it presentable. The bigger issue, of course, is just how human the interaction seemed. And that has people more upset than if it was faked entirely.

 

The Techorama Welcome Kit left in my room was a nice touch, and they almost spelled my name correctly:

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Despite ploughing more than £50m into the digital transformation of the NHS, evidence suggests only a handful of trusts have adopted the Government’s “cloud first” policy.

 

NHS Digital spent more than £32m on digital transformation consultancy services, and £23m with cloud, software, and hardware providers between April and December 2017.

 

But its bosses must be questioning why, when less than a third of NHS trusts surveyed in January have adopted any level of public cloud and mistrust continues to soar, according to recent findings from IT management software provider SolarWinds.

 

The research questioned more than 200 NHS trusts and revealed that, while respondents were aware of the government’s policy, less than a third have begun the transition.

 

Of those who have yet to adopt any level of public cloud, 64% cited security concerns, 57% blamed legacy tech, and 52% said budgets were the biggest barriers.

 

However, for respondents who had adopted some public cloud, budget registered as far more of a barrier (66%), followed by security and legacy technology (59% each).

 

Eight percent of NHS trusts not using public cloud admitted they were using 10 or more monitoring tools to try and control their environment, compared to just 5% of NHS trusts with public cloud.

 

In addition, monitoring and managing the public cloud remains an issue, even after adoption, with 49% of trusts with some public cloud struggling to determine suitable workloads for the environment.

 

Other issues included visibility of cloud performance (47%) and protecting and securing cloud data (45%).

 

Six percent of NHS trusts still expect to see no return on investment at all from public cloud adoption.

 

Speaking exclusively to BBH about the findings, SolarWinds’ chief technologist of federal and national government Paul Parker said, “Cloud is this wonderful, ephemeral term that few people know how to put into a solid thought process.

 

“From the survey results, what we have seen is that the whole “cloud first” initiative has no real momentum and no one particularly driving it along.

 

“While there are a lot of organizations trying to get off of legacy technology, and push to modernize architecture, they are missing out.”

 

Improvements to training are vital, he added.

 

“People tend to believe there’s this tremendous return on investment to moving into the cloud when, in reality, you are shifting the cost from capital to operating expenses. It is not necessarily a cost saving in terms of architecture, and people need to recognize that. Rather than owning the infrastructure, with cloud you are leasing it, so it doesn’t change the level of investment much.”

 

There are savings to be had, though, and they come from converging job roles and improving access to medical records, for example.

 

Parker said, “With cloud, you do not need a monitoring team for every piece of architecture, so while there are savings to be had, they are more operational. With a cloud infrastructure, everything is ready at the click of a button.”

 

Moving forward, he says,

 

“It’s that old adage of ‘evolution, not revolution.’ There needs to be technology training, and the NHS needs to have an overarching goal, rather than simply moving to the cloud.

 

“The first thing trusts need to do is look at their current environment and determine what’s there, what’s critical, and what’s non-critical. That will enable them to focus on moving the non-critical things into a cloud environment without jeopardizing anyone’s health or life or affecting security. That will help everyone to better understand the benefits of the cloud and to build trust.

 

“I would also like to see a top-down approach in terms of policy and direction. The ultimate goal of healthcare services is to make sure people have a better quality of life, and the goal of IT is to make sure they can deliver that. As IT experts, it is our job to try and help them to do that and make IT easier and more affordable.”

 

Find the full article on Building Better Healthcare.

One of the most interesting changes I have observed in my career is Microsoft shifting from being just a development organization to truly becoming a DevOps team, at least in the case of the SQL Server team. The product code developers are the operations support for the cloud service hosting Azure SQL Database. Many other large development organizations have this model, but my relationships and experience with SQL Server have allowed me to observe this product much more closely. My main observation is that dev cycles turn so much faster in order to fix problems: issues that in traditional on-premises software might have waited months for a patch, or even years for a full release, now get fixed in the course of a few weeks. This has really changed the way Microsoft releases SQL Server on-premises--after a major release, there is a cumulative update released every month for the first year. This means more problems get fixed faster, and if you are using the cloud service, the release cadence is even faster.

 

I know most of us do not work for hyper-scale public cloud providers with massive development resources, but I used this example because it is a very visible, real-world illustration of the way a DevOps model can transform an organization. So how does this apply to running our infrastructure organizations? I think even in the most off-the-shelf traditional shop, you should be thinking about how to automate the following work:

 

  • Manual
  • Repetitive
  • Work with no enduring value
  • Work that doesn’t scale with your application

 

As a system admin, your main job task is to keep the lights (and more importantly, the servers) on in your organization. By adopting a mindset of trying to automate as many of these tasks as possible, you will enjoy your job more, and have more time to focus on tasks that are more strategic to your organization. You can take advantage of frameworks like PowerShell and Python scripting in your environments (yes, you will have to write some code) to bring this all together, but this mindset will change the way you view system administration.

 

Where do you start with automation? Identify your most common tasks that consume the most time--for example, if you are using VM templates to deploy your operating system environments (that’s a really fancy way of saying VMs), you have already started down this path. A logical next step might be to automate the process for keeping your templates up to date with patches, or to use scripting to ensure that any post-deployment configuration tasks happen without human intervention. To do this really well, you should adopt a developer mindset--all of your scripts should use source control, you should work to develop a unit testing process (the one negative to automation is that if you break something badly, you can break a lot of things badly, and quickly), and you should adopt some aspects of a development methodology to keep the progress moving forward.
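
As a hedged illustration of that post-deployment step, the sketch below runs through a small checklist of configuration items and reports which ones still need attention. The individual checks (an NTP server entry, a monitoring agent, a running service) are hypothetical stand-ins for whatever your environment actually requires.

```python
# Sketch: a post-deployment configuration checklist that reports drift.
# The individual checks are hypothetical stand-ins for real requirements;
# the point is that the checks are scripted, repeatable, and testable.
import shutil
import subprocess
from pathlib import Path

def ntp_configured(conf_path="/etc/chrony.conf", server="ntp.example.local"):
    path = Path(conf_path)
    return path.exists() and server in path.read_text()

def package_installed(name):
    # Presence of the binary on PATH is a cheap proxy for "installed".
    return shutil.which(name) is not None

def service_active(name):
    # Assumes a systemd host; swap in whatever your platform uses.
    result = subprocess.run(["systemctl", "is-active", "--quiet", name])
    return result.returncode == 0

CHECKS = {
    "ntp configured": ntp_configured,
    "monitoring agent installed": lambda: package_installed("telegraf"),
    "ssh service running": lambda: service_active("sshd"),
}

def run_checks():
    return [name for name, check in CHECKS.items() if not check()]

if __name__ == "__main__":
    failed = run_checks()
    print("all checks passed" if not failed else f"needs attention: {failed}")
```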

 

Moving into this mindset is a big shift for many organizations. However, in modern IT, influences like cloud and the aforementioned rapid development cycles mean that everything just moves faster. The other benefit of this automation, which should be a major influence on what tasks you automate, is the reduction in alert pages. If you can automate away your most common pager responses, your entire team and organization will benefit. The final, largest benefit of automation is that you know a script will run the same every time it's executed, as opposed to your junior admin, who may or may not get a complex task sequence correct each time.

“In the beginning was the word” … oops, “the cloud."

 

Long ago, "cloud" was a buzzword: a cool word that no one knew the exact meaning of. Many companies made investments with an eye on the future, focused on high-profile technologies and betting on the success of the cloud. Other companies steered their investments the opposite way, toward more reliable, traditional solutions. Today the attention is focused on moving these old-fashioned but profitable (and fragile) applications to a cloud environment. The challenge is how to provide these applications the right amount of resources. This is a difficult task because it isn’t simply a matter of quantity of resources. Legacy applications are built to run in a traditional architecture, usually monolithic and rigid. The traditional “create, install, run, and forget it” approach is a field for highly vertical specialists, and it seems to be going away.

 

Migrate or replace?

Who should make the decision on this migration? A more important decision might be: is it worth it to re-code these applications, or is it more convenient in terms of time, money, and features to code them from scratch as cloud-native applications instead? I think the second option is the better choice. It is true that it will require more resources up front, but once in production it will bring several benefits: weekly (or even daily) upgrades to fix bugs or provide new features, better performance from running natively on cloud components, flexibility, and accessibility from several different device types (tablets, phones, different OS/browsers).

 

All these benefits could also be gained by re-coding the old legacy app, but would it be worth it in the long run? The core of the app would still be the original one, built with a traditional architecture in mind. And are we sure that the cost of re-coding will be less than the cost of moving to a new native app? Consider, too, that the training users will need on the new app is a cost that is offset by no longer needing hyper-specialists to maintain the legacy one.

 

Usually the companies most resistant to replacing a legacy app with a cloud-native one are banking institutions, insurance companies, and healthcare organizations. They are all bound to solid, long-tested applications, no matter how inefficient, because they are tested. And by tested I mean including their bugs (known and accepted ones), because fixing them would mean a huge effort in development cost and in downtime for the services involved.

Educate the management crew

If you can make these operators understand the benefits of a cloud-native application, especially by involving the management team, who lead the technical teams below them, you have done the best job you can. After that, it’s a matter of time, because word moves fast within their vertical environments. Outside of those verticals, things tend to slow down.

 

For all the cases where companies still reject this kind of app, there’s still the so-called “lift and shift”: keeping the legacy apps and moving them onto virtualized infrastructure. It’s a smooth step that pushes them toward the “new world.” From that starting point it becomes simpler to show management the benefits of virtualization and, going forward, to present a scenario for how they could move into the future, such as a serverless environment. But this last step should be taken only if the people you’re talking to are technologically advanced; otherwise they may simply be scared of the “unknown” and decide to move everything back to the original site.

Be sober

In any case, our task is like walking on eggshells: presenting new solutions, keeping an eye on the emotional side of the people we’re talking to, and not being too enthusiastic, but instead coming across as professional, responsible, and balanced, speaking not only about the benefits but also (and in some cases especially) about the risks, with a mitigation ready for each of them.

A storage system on its own is not useful. Sure, it can store data, but how are you going to put any data on it? Or read back the data that you just stored? You need to connect clients to your storage system. For this post, let’s assume that we are using block protocols, such as iSCSI, against traditional block storage systems. This article also applies to file protocols (like NFS and SMB) and to some extent even to hyper-converged infrastructure, but we will get back to that later.

 

Direct attaching clients to the storage system is an option. There is no contention between clients on the ports, and it is cheap. In fact, I still see direct attached solutions in cases where low cost wins over client scalability. However, direct attaching your clients to a storage system does not really scale well in number of clients. Front-end ports on a storage array are expensive and limited.

 

Add some network

Therefore, we add some sort of network. For block protocols, that is a SAN. The two most commonly used protocols are the FC protocol (FCP) and iSCSI. Both protocols use SCSI commands, but the network equipment is vastly different: FC switches vs. Ethernet switches. Both have their advantages and disadvantages, and IT professionals will usually have a strong preference for either of the two.

 

Once you have settled on a protocol, the switch line speed is usually the first thing that comes up. FC commonly uses 16Gbit, with 32Gbit switches entering the market lately. Ethernet, however, is making bigger jumps, with 10Gbit being standard within a rack or wiring closet and 25/40/100Gbit commonly used for uplinks to the data center cores.

 

The current higher speeds of Ethernet networks are often one of the arguments why “Ethernet is winning over FC.” 100Gbit Ethernet has already been on the market for quite some time, and the next obvious iteration of FC is “only” going to achieve 64Gbit.

 

Oversubscription

Once you start attaching more clients to a storage system than it has storage ports, you start oversubscribing. 100 servers attached to 10 storage ports means you have on average 10 servers on each storage port. Even worse, if those servers are hypervisors running 30 virtual machines each, you will now have 300 VMs competing for resources on a single port.

 

Even the most basic switch will have some sort of bandwidth/port monitoring functionality. If it does not have a management GUI that can show you graphs, third-party software can pull that data out of the switch using SNMP. As long as traffic in/out does not exceed 70% you should be OK, right?

 

The challenge is that this is not the whole truth. Other, more obscure limitations might ruin your day. For example, you might be sending a lot of very small I/O to a storage port. Storage vendors often brag about 4KB I/O performance specs, but 25,000 4KB IOps only accounts for roughly 100MB/s or 800Mbit (excluding overhead). So, while your SAN port shows a meager 50% utilization, your storage port or HBA could still be overloaded.

 

It becomes more complex once you start connecting SAN switches and distributing clients and storage systems across this network of switches. It is hard to keep track of how much client and storage traffic traverses the ISLs (Inter-Switch Links). In this case, it is a smart move to keep your SAN topology simple and to be careful with oversubscription ratios. Do the oversubscription math, and look beyond the standard bandwidth graphs. Check error counters, and in an FC SAN that has long-distance links, check whether the Buffer-to-Buffer credits deplete on a port.
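
To make that math concrete, here is a small sketch using the numbers quoted above: a fan-in ratio from client and storage port counts, and the bandwidth actually implied by a small-block IOps figure.

```python
# Sketch: basic oversubscription and small-block throughput math using the
# figures quoted above. Adjust the inputs for your own environment.

client_ports = 100          # server HBA/NIC ports attached to the fabric
storage_ports = 10          # front-end ports on the array
vms_per_server = 30

fan_in = client_ports / storage_ports
print(f"fan-in ratio: {fan_in:.0f} servers per storage port "
      f"({fan_in * vms_per_server:.0f} VMs per port)")

# 25,000 IOps of 4 KB blocks is only ~100 MB/s on the wire.
iops = 25_000
block_kb = 4
mb_per_s = iops * block_kb / 1024
gbit_per_s = mb_per_s * 8 / 1000
print(f"{iops} x {block_kb} KB IOps ~= {mb_per_s:.0f} MB/s "
      f"(~{gbit_per_s:.1f} Gbit/s, before protocol overhead)")
```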

 

Ethernet instead of FC

The same principles apply to Ethernet. One argument why a company chooses an Ethernet-based SAN is that it already has LAN switches in place. In these cases, be extra vigilant. I am not opposed to sharing a switch chassis between SAN and normal client traffic. However, ports, ISLs, and switch modules/ASICs are prime contention points. You do not want your SAN performance to drop because a backup, restore, or large data transfer starts between two servers, and both types of traffic start fighting for the available bandwidth.

 

Similarly, hyper-converged infrastructure solutions like VxRail and other VMware vSAN-based offerings place high demands on the Ethernet uplinks. Ideally, you would want to ensure that VMware vSAN uses dedicated, high-speed uplinks.

Which camp are you in? FC or Ethernet, or neither? And how do you ensure that the SAN doesn’t become a bottleneck? Comment below!

Welcome to another edition of the Actuator. I hope everyone is enjoying some warm spring weather. It's nice to be able to sit outside for an hour at the end of the day.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Alexa and Siri Can Hear This Hidden Command. You Can’t.

Fun fact for you: There is no law against sending subliminal messages to humans, or machines. The practice is discouraged and *may* be considered an invasion of privacy (for humans, not for machines). Another example of where the laws lag far behind the technology.

 

Digital Photocopiers Loaded With Secrets

Not mentioned in the article: the embarrassing photos from office Christmas parties.

 

Stanford Study Shows the Astonishing Productivity Boost of Working From Home

Glad to see we are putting some data into the productivity levels for people working from home. I’ve been doing it for eight years now. I know it’s made me more productive, happier, and healthier. I can’t go back to having a real job, ever.

 

Don't Skype Me: How Microsoft Turned Consumers Against a Beloved Brand

“[Using Skype] is like Tim Tebow trying to be a baseball player.” Ouch.

 

Amazon’s Fake Review Problem

I’ve been frustrated for years with the reviews on Amazon. I find many of them to be fake. These days I focus on the three-star ratings and do my best to discern the truth. To be fair, Amazon is not the only company with an online review problem.

 

Are My Friends Really My Friends?

Interesting analysis showing that despite being surrounded by constant interactions, we are more alone now than ever before.

 

Security researchers discover critical flaw in PGP encryption that reveals plaintext

Everything is terrible.

 

Seems legit:

Public cloud providers have greatly simplified the process of creating a backup, but the challenge has always been managing that at scale, with things like retention policies, simple granular file-level restores, or regulatory-focused dashboards. This is the value added by many of the backup management solutions discussed in the following post, and it becomes critical once an environment scales past a few instances and databases.

 

The benefits of managed backup management are:

  1. Simplified Management - The backup management solutions offered by public cloud providers are generally account- or subscription-focused and don’t offer a holistic view of the entire environment; managed solutions give you that single view across accounts.
  2. Scalability - Fully managed backup and SaaS solutions have been built to scale to the largest environments without any performance impact or major concern for running out of storage space. This eliminates the need to re-architect the backup management deployment to scale with the needs of the organization for things such as keeping data for twice as long because of a new mandate.
  3. Multi-Cloud Support - Many of the legacy backup products that are available on the market only support backing up data to public cloud providers or only support backing up workloads in a single cloud provider. More and more companies are implementing multi-cloud strategies and a solution that supports multiple clouds is essential to simplifying operations.

 

Unmanaged Deployment

The following solutions are unmanaged deployments. This means the software is available to be installed by the customer, or has been packaged in a native cloud format such as an AWS AMI, but is not available in the cloud providers’ marketplaces.

 

Rubrik Cloud Data Management

Rubrik Cloud Data Management is a software appliance that can be deployed to AWS, Azure, and GCP. The Cloud Data Management platform supports policy-based snapshot management along with advanced analytics to generate operational insights.

 

Managed Deployment

The following solutions are managed deployments. This means the backup management software company has added a deployment solution to the respective cloud provider's marketplace to allow the infrastructure to be provisioned with the click of a button.

 

Veritas CloudPoint

Veritas CloudPoint is a backup management solution that supports automated deployment into Azure but can back up workloads on AWS, Azure, and GCP. In addition to IaaS workloads across the three major clouds, CloudPoint also supports application-level backups for workloads such as Microsoft SQL Server, MongoDB, and Amazon Aurora.

 

SaaS Deployment

The following solutions are Software as a Service (SaaS) deployments. This means the backup management software company hosts the software for its customers.

 

Druva Apollo

Druva Apollo is a SaaS solution that provides data protection of AWS EC2, RDS, S3, EBS, and Glacier. Druva Apollo also includes SLA-based snapshot retention policies in addition to tiering to reduce costs as older snapshots are moved to cheaper storage and eventually deleted.

 

Rubrik Polaris

Rubrik Polaris is a SaaS solution that integrates with Rubrik's Cloud Data Management hardware and software appliances to provide a unified management platform for both on-premises and cloud-based workloads.

 

CloudRanger Backup and Recovery

CloudRanger Backup and Recovery is a SaaS solution that provides backup management of AWS EC2, RDS, and Redshift instances using native AWS snapshots. Instance and file level backups are supported along with multi-region and multi-account backup restore points.

 

Fully Managed

The following solutions are fully managed backup management solutions such that the cloud provider manages backups on your behalf.

 

Built-in Snapshots

Public cloud providers allow administrators to create snapshots of virtual machine instances, databases, etc. This doesn’t provide a robust feature set in terms of management, but it does allow administrators to back up and restore to a given point in time.
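
As a rough sketch of how little is involved, the boto3 calls below create an EBS snapshot for a volume and then list the snapshots the account owns. The volume ID is a placeholder, and everything beyond this point (retention, tagging, cross-account copies) is exactly what the managed solutions above layer on top.

```python
# Sketch: create and list EBS snapshots with boto3. The volume ID is a
# placeholder; retention and tagging policy are left to you (or to one of
# the managed solutions described above).
import boto3

ec2 = boto3.client("ec2")

snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",          # placeholder volume
    Description="nightly snapshot (sketch)",
)
print("started snapshot:", snapshot["SnapshotId"])

# List snapshots owned by this account.
for snap in ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]:
    print(snap["SnapshotId"], snap["StartTime"], snap["State"])
```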

 

Backup management is an unsexy topic to most, but of course it has tremendous value when there is a disaster, and many of the new backup management solutions are becoming much more than just a wrapper around the public cloud providers’ native snapshot APIs.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Here is an interesting article from my colleague Joe Kim, in which he discusses what we can expect to see in the future as federal IT professionals.

 

Over the past couple of years, administrators have confronted a rapidly changing landscape that seems to shift overnight. Public vs. private cloud is an old argument; now, administrators are grappling with the reality of implementing and managing hybrid IT infrastructures.

 

As such, their skill sets are being tested like never before. Sixty-two percent of respondents to a recent SolarWinds IT trends survey indicated that hybrid IT has required that they acquire new skills, while 11 percent said it has altered their career path. Meanwhile, 57 percent of public sector organizations have already hired or reassigned IT personnel, or plan to do so, for the specific purpose of managing cloud technologies.

 

The skills that IT administrators are learning today will have a large impact on what IT management will look like ten years from now.

 

From service managers to service consumers

 

Our survey found that 96 percent of respondents have moved at least some applications and aspects of critical infrastructure to the cloud. This migration has caused federal administrators to sharpen their “as-a-service” skills, since many of the tools they are using have become software-defined and exist both on-premises and in hosted environments.

 

Federal IT professionals have gone from being service managers to service consumers who work with cloud providers to manage their infrastructures. In this service-oriented world, administrators are finding themselves interacting more with software than they are with hardware switches and routers. These interactions are precursors to IT practitioners’ inevitable evolution from traditional network managers into areas that may be more familiar to developers, and toward becoming service brokers, rather than service managers.

 

From network manager to network developer

 

Administrators previously needed to be savvy about command lines and hands-on management of network components, but the move toward hybrid IT and Software-as-a-Service (SaaS) applications has greatly reduced the need for these types of skills. Administrators must now be able to manage the different pieces of code that comprise applications and allow those programs to work with each other.

 

Tomorrow’s network administrators will be familiar with application program interfaces (APIs) — essentially app building blocks — and how they can be used to solve common problems, from network management to security challenges. They will create highly customized and dynamic networks that fit the unique needs of their agencies. Furthermore, they’ll have a greater amount of control over these networks, as they will be able to tap into APIs to dictate policy, rules, user access, and more.

 

From servicing people to self-service

 

They’ll also move from being service managers to service brokers. Instead of provisioning more storage or spending their time clicking around user interfaces, they’ll be assigning applications and access rights to individual users so that those users can easily set up services on their own. The standard practice of a user submitting an online request for access to a new application will be a rare occurrence; everything that a person needs or is authorized to use will be at their fingertips in this self-service environment.

 

Network administrators will also have more opportunity to add strategic value to their agencies. Today, administrators spend a lot of time servicing users. Moving toward self-service will allow users to check their own boxes, download their own applications, and authorize their own access, all without having to go through their system administrators. In turn, future administrators will have more time to work on higher value services, such as developing plans for stronger security measures or using predictive analytics to anticipate and remediate network issues.

 

From the future to the past

 

Despite all of these changes, administrators will still need to focus on the “bread and butter” aspects of network management, including performance, availability, and compliance. To ensure success in each of these areas, administrators will need to use many of the same tools and processes that are commonplace right now.

 

Indeed, some of these tools will be even more important than they are today. For instance, network performance monitoring will be critical, particularly as IT becomes increasingly hybrid- and application-based. Agencies will need solutions that provide automated and unfettered insight into the performance of these applications, whether they are on-premises or hosted, just as they do today.

 

Learning these and other solutions will take some work, but there are resources available. Online communities such as SolarWinds’ THWACK provide forums where administrators can exchange ideas and ask questions, and most vendors will be more than willing to offer information on best practices.

 

These resources provide administrators with a chance to hone their skills today while preparing for their future. That future undoubtedly will be challenging, but it also will present many opportunities for federal IT professionals who want to expand their horizons and add more value to their agencies.

 

Find the full article on GovLoop.

The Simple Network Management Protocol (SNMP) has been a key part of managing network devices in the data centre for some time. It really is a pretty simple protocol to work with (hence the name), and I think it’s underrated as a key tool for monitoring unusual events. Unfortunately, SNMP has had some issues over time. One of these has been sending out a lot of information over the network in an insecure fashion. SNMP v3 was developed to address this. Another issue has been that Joe Sysadmin doesn’t always take the time to configure custom strings to use in the environment with the devices he’s trying to manage. Instead, the default “public” community string is left configured on the devices with more access than is required. This kind of behaviour drives information security folks nuts, and has operations staff questioning whether SNMP is worth the hassle.

 

SNMP is an extremely flexible solution that provides a robust framework with which you can leverage things like vendor-specific management information base (MIB) files. You can use these to provide both read-only and write access to networked devices. The advantage to this approach is that you can feed information into your management system that provides useful insights, rather than simply showing whether the device is up or down.
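As a small illustration, here is a polling sketch using the pysnmp library (my assumption; any SNMP library or management system will do). It reads a couple of MIB objects from a device at a placeholder address, using a custom v2c community string rather than “public”:

# Sketch: poll a device for a couple of MIB objects with pysnmp (assumed installed).
# The address and community string are placeholders for your own environment.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget, ContextData,
                          ObjectType, ObjectIdentity, getCmd)

errorIndication, errorStatus, errorIndex, varBinds = next(
    getCmd(SnmpEngine(),
           CommunityData('ops-readonly', mpModel=1),        # SNMP v2c, custom community
           UdpTransportTarget(('192.0.2.10', 161)),
           ContextData(),
           ObjectType(ObjectIdentity('SNMPv2-MIB', 'sysName', 0)),
           ObjectType(ObjectIdentity('SNMPv2-MIB', 'sysUpTime', 0)))
)

if errorIndication or errorStatus:
    print('SNMP query failed:', errorIndication or errorStatus.prettyPrint())
else:
    for varBind in varBinds:
        print(' = '.join(x.prettyPrint() for x in varBind))

Feeding values like these into your management system is what turns SNMP from a simple up/down check into a source of real insight.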

 

Following on from this, alerting that aligns with your devices gives you a better chance of identifying unusual issues in your environment. You could, for example, set your devices to send a trap when local user credentials are used to log in to a device rather than directory credentials. This type of activity may indicate that someone’s up to no good in your environment.

 

Security in your environment isn’t just about people cracking device credentials, though. It’s also about having devices available to provide the appropriate services to applications and their users. Configuring devices to send meaningful information via SNMP as issues occur can be a great way to get to minor problems before they become major issues. If one of your two firewall devices has suffered a failure, your infrastructure is compromised and you need to address the problem. I’ve seen plenty of situations where internal systems failures go unnoticed for far too long, leading to reduced performance in the environment and angst for both the operations staff and end-users. But people don’t just become unhappy with the infrastructure. They start to use workarounds to get their work done, which can involve unsafe practices such as storing unsecured corporate data in personal mailboxes or on publicly accessible file sharing sites.

 

A lot of people would agree that data centre operations can be a difficult thing to do well, particularly at a large scale. There always seems to be some device or another that’s run out of capacity, has a failed component, or has simply stopped doing what it’s meant to do. That’s why tools such as SNMP and syslog can help tremendously with keeping things under control in the DC. There’s a wide range of management systems available in the marketplace that can be used to do some pretty cool stuff with SNMP. Most devices that can be deployed in a 19” rack can speak SNMP and syslog, so why not get as much information about what’s happening in your environment as you can? The investment in effort upfront can save you a lot of time and headaches down the track when things invariably go awry.

In a previous Geek Speak blog post, I talked about the viability of practice leader as a career path for IT professionals. A large part of practice leadership is being fluent in vendor technologies, i.e. products. This opens up an interesting set of paths for IT professionals: product management and product marketing.

 

Are you passionate about discovering IT pain points and finding solutions for them? Product management focuses on customers and their problems. A product manager (PM) is the voice of the entire customer spectrum; the PM delivers market metrics that guide product decisions.

 

Do you like to tell stories? If you like sharing your problem-solving knowledge by using a product in a one-to-many fashion, you'd probably enjoy working as a product marketing manager (PMM). As Peter Drucker suggested, product marketing focuses on making the product sell itself: creating a product that people want to buy and creating an environment that encourages them to buy it.

 

Ideally, PMs work to understand real customer problems and use that knowledge to turn solutions into products, while PMMs work to smooth the friction between products and consumers in the marketplace.

 

 

Would you consider transitioning into a new role as PM or PMM? Let me know in the comments section below.

 

P.S. We are hiring for both PM and PMM positions.

Over the years, I’ve read more than a few articles about the qualities that are found in the best administrators. These articles focus on soft skills, but sometimes will list hard skills as well. In all those years, and in all those articles, I rarely see advice on how those skills are to be applied. It’s as if the author expects the reader to just know what to do with the skills, once acquired.

 

No matter what type of administrator you are (network, database, systems, etc.), the best way to apply your skills is by being responsive and responsible. I wrote about this in my book more than eight years ago, and the advice is still true today.

 

Being responsive means you take action on an item. It matters not if the item or task is your responsibility. For example, if a disk fails and you aren’t a member of the server team, it’s not something for you to fix, but if someone reaches out to you about it, you must still be responsive: acknowledge the request and help route it to the right team. The person reaching out to you has no idea what tasks you are responsible for; they just need help. You want that customer to have the perception that you are responsive to their needs.

 

The hardest part of being responsive is the hours. It can be difficult to be responsive at all times of the day. And the better you get at your role, the more your services will be in demand.

 

Being responsible is taking ownership of something. It should be common sense that you would take responsibility for tasks that are central to your job role as an administrator. But you should also take responsibility for your mistakes. When something goes wrong it can be easy to deflect blame to other teams or specific people. You must resist that urge.

 

Here’s an example from my past life as a database administrator. It was 3 a.m. and I needed to rebuild a server. A LUN had been erased by mistake (and that mistake replicated, quickly, which is why HA is not the same as DR, but I digress), so I needed to restore the master database. We were using a third-party backup software product, and restoring master required some different syntax that didn’t want to work.

 

It took me longer than it should have to restore the master. I wasn’t happy. And I could have blamed the backup vendor, or the engineer that wiped out the LUN, or the manager that confused HA and DR. But I knew that wouldn’t help anyone. So, the next day I informed my managers that I could have done better. I outlined a training plan so that any member of my team would be able to perform the same tasks in a reasonable amount of time. I took responsibility.

 

I didn’t have to inform my managers about the delay. They have no idea how long it takes to restore a master database. But I wanted to show them that I was being responsible. I could have easily blamed others, and nobody would have thought twice.

 

Be responsive and responsible. I believe it pays off in the long run.

The image of a modern office has been rapidly changing in recent years. From the medical field, to sales, to technical support, users are on the road and working from just about everywhere possible. Firewalls, malware protection, and other security practices are great when a user is on-premises at your location, but what happens when they are working on an assignment at the coffee shop or showing off a report at a lunch meeting? How can you ensure these protections are installed and working correctly? The fact is that a network and systems perimeter logically cannot be defined simply by the walls of your location. Users are on the go, and this requires special accommodations from us in the support field.

 

Endpoint Protection

When a user is on location, it is obviously easier to deploy security policies against their device so that it uses your network in accordance with your security policies. In the past, endpoint protection meant just providing users a firewall and some antivirus software. With malware and other intrusions getting more advanced, this can no longer be the extent of endpoint security. Modern endpoint protection has progressed to the point where endpoint agents can contact an organization’s security management systems to get their policies, definitions, and so on. This contrasts with the old approach, where much of this could only be controlled when a user was on-site. This kind of protection allows an organization to provide maximum security to users whether they are on-site or on the road. It can be especially useful for users who are constantly on the go, as they still receive security updates and configuration as soon as they are published, while you -- the administrator -- stay in the loop too.

 

Making Remote Network Access Easy

Security is often the number one priority (as it should be, in my opinion), but to an end-user, ease of use is often what matters most. Their focus is just getting their work done in the easiest, most hassle-free way possible. There are multiple ways to keep a remote user (and the company network) safe and secure while allowing them to work. Two of the most popular methods are remote or virtual desktops and VPN connections with profiling. Virtual desktops provide remote access to users as if they were sitting at your main location. This could include everything from fileshare access to applications. It makes things easy and uniform company-wide, because the virtual desktop image can be centrally maintained and the system a user connects from is irrelevant; the virtual desktop remains secure and under your control. The other option is VPN access with profiling -- the best of both worlds, right? A user could access network resources with their own device and use their own applications, while the profiling aspect lets you as the administrator ensure malware protection is installed, the firewall is on, policies are up to date, and the device is current with operating system updates. Both solutions have their merits and a place in certain situations depending on what you might be looking for.

 

The Bottom Line

As I mentioned earlier, I believe security should be the number one focus (as a lot of you do too, I’m sure). I know in my day job, the security of the network is my main focus. In a lot of cases we can’t simply place the network and servers on lockdown to the outside world. That sure would be easier! Working to provide remote access where it's explicitly needed is something we as network and server admins will be tasked with even more going forward. How we all approach that challenge will determine the security of our organizations.

Following my previous post about logging, I'd like to talk about another tool to manage logs that is more advanced than syslog.

 

Logwatch is essentially a system log analyzer and reporter. It processes and summarizes the logs that syslog simply collects. This kind of evolution is simplifying the daily job of modern system and network administrators. Logs are everywhere and almost everything produces them -- not only traditional IT systems, but also the elements forming the so-called "Internet of Things," or IoT. That last application is where the innovation lies: "things" producing logs can be managed in a smarter way if those logs are analyzed and reported, so that their behavior can be understood and acted on.

The tool is very simple but, at the same time, very powerful. In its basic form it is CLI-only, with its configuration structured in directories.

 

Shaping Behaviors

The real power of Logwatch (though not the only one) is the ability to shape its behavior according to the administrator’s needs. That shaping can be simple or more elaborate, but the aim is to let the administrator carry out their daily job as easily as possible, so they can redirect their attention to solving the issues these tools reveal.

 

Customizing also means filtering, as described in my last post. Filtering is easier if the tool emphasizes the keywords to be filtered, so the first customization should be capturing those keywords using variables in the main configuration. As an example, we might want to raise an alert when a website returns a 500 error, but only after "n" occurrences, not immediately, because we already know this web server sits in front of an application server that can be overloaded in particular, known situations. There is no need to produce tons of logs when we already know the issue occurs in the app server (THAT is the issue to solve, not the error coming from the web server).
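A quick sketch of that kind of threshold, written in Python rather than Logwatch's own configuration, just to show the logic. The log path, log format, and threshold value are assumptions:

# Sketch: only alert on HTTP 500s after a threshold is crossed, to avoid noise
# when we already know the app server behind the web server is the real culprit.
import re

THRESHOLD = 5                                   # the "n" from the example above
status_re = re.compile(r'"\s(\d{3})\s')         # naive match on the HTTP status field
errors = 0

with open("/var/log/httpd/access_log") as log:  # path is an assumption
    for line in log:
        m = status_re.search(line)
        if m and m.group(1) == "500":
            errors += 1

if errors >= THRESHOLD:
    print(f"ALERT: {errors} HTTP 500 responses seen; check the app server first.")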

 

Preventive modeling

As you can imagine, this process implies prior analysis by the administrator. As usual, this is a tool, and it works according to your decisions; it can’t make its own. Other, more advanced tools may offer input and advice, but not basic tools like these.

 

Different services may have different logging configurations. Some services can be ignored entirely, others overridden. Security level matters as well: for services that aren’t so critical, logging accuracy can be lower, while high-security services need much more verbose logging. In the case of Logwatch, these different behaviors are written in Perl.

 

Conversely, there are cases where we need not different behavior but a homogeneous output: for example, IoT components produced by different vendors that log in various ways. For easier and more effective comprehension, customizing how logs are written and processed can "normalize" the logs produced by any of them, helping you compare behavior across devices.
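Here is a rough sketch of that normalization idea in Python (Logwatch itself would do this in its Perl service scripts). The two vendor formats and the field names are invented for illustration:

# Sketch: normalize differently formatted vendor logs into one common record shape,
# so behavior can be compared across devices. Patterns and fields are illustrative.
import re

# Two invented vendor formats for the same kind of reading:
#   vendor A: "2018-05-04 10:01:22 sensor-12 TEMP=71"
#   vendor B: "sensor-7|2018-05-04T10:01:25|temperature|69"
PATTERNS = [
    re.compile(r"(?P<ts>\S+ \S+) (?P<device>\S+) TEMP=(?P<value>\d+)"),
    re.compile(r"(?P<device>[^|]+)\|(?P<ts>[^|]+)\|temperature\|(?P<value>\d+)"),
]

def normalize(line):
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return {"timestamp": m.group("ts"),
                    "device": m.group("device"),
                    "temp_f": int(m.group("value"))}
    return None  # unknown format: leave it for a human (or add a new pattern)

for raw in ("2018-05-04 10:01:22 sensor-12 TEMP=71",
            "sensor-7|2018-05-04T10:01:25|temperature|69"):
    print(normalize(raw))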

Syslog evolution

We can compare an advanced tool like Logwatch with more basic ones like syslog mainly on the scripting and the ability to choose how to behave in different situations. Syslog only allows the administrator to filter for the most representative lines when troubleshooting. Logwatch can impose a particular structure on the line itself and then filter based on that shape, making it a friendlier helper for the advanced administrator. In this way, troubleshooting and monitoring improve, along with the analysis needed to develop new models for how logs are written.

May has arrived and so has some warmer weather. It feels good to be able to sit outside. I'm enjoying it while I can; we have only a few weeks of warm weather before the mosquitos arrive.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Twitter urges all users to change passwords after glitch

Just in case you didn’t hear about this, yet. Twitter came forward to say that a “bug” had accidentally caused user passwords to be stored in clear text. I’d like more technical details on how this bug could happen. In the meantime, change your passwords. Or don’t and use this as an excuse to tweet weird things and then say you were hacked.

 

Home DNA Kits: What Do They Tell You?

Not much, even for the dog. Only one company identified the DNA as not human. Something to think about for those folks that are thinking of spending money on such services.

 

The Gambler Who Cracked the Horse-Racing Code

On the heels of the Kentucky Derby comes this riveting story of a man who built and has used a predictive model for horse racing to win $1 billion over the past 30 years. The next time someone says data analytics isn’t a real thing, show them this story.

 

Police Tested Facial Recognition at a Major Sporting Event. The Results Were Disastrous

Then again, maybe data analytics still has a way to go. Or maybe we should put some prize money behind facial recognition efforts. If someone knew they could make a billion dollars with the right predictive model, I’m certain we would have a solution by now.

 

Unroll.me to close to EU users saying it can’t comply with GDPR

If you really want people to know that you don’t care about their privacy, just tell the world you won’t comply with the GDPR. This single act tells me everything I need to know about whether or not I can trust Unroll.me with my data. (The answer is "no.")

 

Yes, it’s Bad. Robocalls, and Their Scams, are Surging.

Robocalls are increasing. People are mad. And nothing is being done to stop the volume of spam phone calls from increasing.

 

This Lego Breakfast Machine Can Make You Bacon And Eggs

Father’s Day is coming up. #JustSayin

 

Spring is finally here, and we all know what that means:

 

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

It’s no secret that public cloud is becoming an increasingly popular option for enterprises to adopt as a means of storing data in a way that is easily accessible. To encourage the public sector in the UK to take the steps required to adopt public cloud, the UK government introduced the “Cloud First” policy in 2013; the policy states that when making technology decisions, all public sector organizations should consider using the public cloud before other options.

 

A recent Freedom of Information request that SolarWinds conducted found that, despite this policy, less than two thirds (61%) of central government departments have adopted public cloud in their organization. Of the departments with 25% or less public cloud usage, 65% attributed their lack of adoption to their legacy technology, half blamed security concerns, and 35% claimed that a lack of skills prevented them from using public cloud more.

 

The research also revealed that over a third (35%) of central government departments with low public cloud adoption are trying to monitor their on-premises technology and their cloud services with different monitoring tools, making it nearly impossible to manage the whole landscape accurately.

 

Lost in legacy tech

 

In recent years, the UK government has paid for on-premises solutions, and there is little incentive to move away from this technology. Although that technology is fit for purpose now, departments will need to embrace the public cloud to ensure they are able to maintain their services.

 

Full cloud adoption is unrealistic for most departments, but adopting a hybrid cloud environment brings added complexity of its own. One method of easing the transition would be to strategically plan a smooth integration between cloud and legacy technology. Another would be to use specially designed monitoring tools that can manage across both environments and reduce the complexity.

 

No penalties means no need

 

One of the reasons that 39% of departments haven’t yet implemented public cloud over the last five years is that there are no consequences for not complying with the policy.

 

What needs to change?

 

One step that should be taken sooner rather than later is to implement incentives for adhering to the policy. Organizations across the UK public sector that proactively adopt public cloud could receive benefits, such as additional funding or special training, to reward their efforts. Alternatively, those who haven’t demonstrated an effort to consider cloud adoption could suffer a loss of budget or resources.

 

In the United States, cloud initiatives are supported by a US Federal Government certification called FedRAMP. This certification provides a common set of controls under which public cloud providers are judged to be secure by the government and a Third-Party Assessment Organization (3PAO). This allows agencies and bureaus to leverage the cloud environment with an assurance of availability and security. With a similar initiative in the UK, the public sector may feel more confident about adopting public cloud, and therefore would potentially be able to realize many of the benefits that come from proper cloud adoption.

 

Find the full article on Open Access Government.

Building a monitoring and alerting system should always be driven by your business needs. There is always a debate between the IT organization, which tends to focus on granular measures, and the business users, who would like to see more of an end-to-end picture of the organization. An example of this would be uptime--as a DBA, if my database is available and servicing requests, I feel as though I’ve met my uptime goals, whatever they may be. However, if a load balancer goes down and takes away access to the application tier, the application is unavailable to users, and that is all that matters. Building a monitoring solution that looks at systems holistically is challenging, and sometimes requires working backwards from desired monitoring objectives (is the system up?) to choosing indicators (is the database service available and writable?), and then building a target.

 

Defining Service Level Objectives

 

You want to focus on what your users care about, and not necessarily what is easy to measure. There are two main areas you will want to use to define these objectives--performance and uptime. One concept that comes from Google’s Site Reliability Engineering is the error budget--the rate at which these service level objectives are allowed to be missed. Having an error budget also allows you to be more aggressive with upgrades and with resolving technical debt. While evaluating projects and change control efforts, you know that if you are well ahead of your SLO budget you can be more aggressive with rollout; if you are behind the curve, you may curtail some migration efforts.
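A back-of-the-envelope sketch of an error budget calculation, using an assumed 99.9% objective and made-up downtime numbers:

# Sketch: how much of a monthly error budget has been consumed against a 99.9% SLO.
SLO = 0.999                                   # agreed service level objective
minutes_in_month = 30 * 24 * 60               # 43,200 minutes
error_budget = minutes_in_month * (1 - SLO)   # roughly 43 minutes of allowed downtime

observed_downtime = 12.0                      # minutes of downtime measured so far

burn = observed_downtime / error_budget
print(f"Error budget consumed: {burn:.0%}")
if burn < 0.5:
    print("Well ahead of budget: safe to be aggressive with upgrades and rollouts.")
else:
    print("Budget is tight: curtail risky changes until availability recovers.")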

 

Target Values for SLOs

 

Target values will be a negotiation between IT and the business. From an IT perspective it is important not to overpromise--for example, if you only have one physical server in your stack, you probably aren’t going to reach 99.99% uptime. This is important for a few reasons, but in my opinion the biggest is helping the business users understand the correlation between resource cost and availability. In the one-server example above, if the business wants that application to deliver 99.99% uptime, it is going to have to invest in redundancy at several levels. There are a few other tenets to think about:

 

  • Past performance isn’t a predictor of future performance--While building a performance target off of your historic baseline is a good start, it does not address the problem of a system that performs well at its current level, but that will fall off a cliff without a major reengineering effort.
  • Don’t Overthink Your Targets--While it may be tempting to bring in someone from the data science team to create your new targets using a complex machine learning K-means clustering algorithm, you are better off creating simple targets like percentage uptime and throughput. If you can’t explain your target in a sentence, it is likely too complex.
  • Absolutes are bad--The notion of a system that is always available and can scale infinitely is completely unrealistic. Even hyperscale cloud providers have difficulties delivering 99.999% uptime. It’s better to promise what you can deliver and make the business understand what the cost of delivering more is.

 

This process allows you to set clear expectations with your business and reduces some of the finger pointing during outages. It does require a strong relationship between IT management and senior leadership, but in the end it delivers IT that can be kept up to date while meeting the business needs of the organization.


 

At some point, your first storage system will be “full.” I’m writing it as “full” because the system might not actually be 100% occupied with data at that exact point in time. The system could be full for another technical reason. For example, shared components in the system (e.g. CPUs) may be overloaded before you ever install the maximum number of drives, and upgrading those components would be too expensive. Or an administrative decision may have led you to stop handing out new capacity from an existing system. For example, you’re expecting rapid organic growth of several thin provisioned volumes, which would soon fully utilize the capacity headroom of the current system.

 

The fact that a single system has reached its maximum capacity, either for technical or administrative reasons, does not mean you need to turn away customers. IT should be a facilitator to the business. If the business needs to store additional data, there’s often a good reason for it. In health care it could be storing medical images. For a service/cloud provider, hosting more (paying) customers. So instead of communicating “Sorry, we’re full, go somewhere else!”, we should say something along the lines of “Yes, we can store your data, but it’s going to land on a different system”. In fact, just store the data and leave out the system part!

 

More of the same?

When your first system is full and you’re buying another one, you could buy a similar system and install it next to the original one. It might be a bit faster, or a bit more tuned for capacity. Or completely identical, if you were happy with the previous one.

 

On the other hand, this might also be a good moment to differentiate between the types of data in your company. For example, if you’ve started out with a block storage, maybe this is the time to buy a NAS and offload some of the file data to it.

 

Regardless of type, introducing a second system will create a couple of challenges for the IT department. First, you’ll now have to decide which system you want to land new data on. With identical systems, it might be a fill-and-spill principle, where you fill up the first system and then move over to the second box.
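Here is a minimal sketch of a fill-and-spill placement decision; the system names, capacities, and headroom policy are illustrative assumptions:

# Sketch: land new volumes on the first system that still has headroom,
# and spill over to the next one when it's administratively "full."
SYSTEMS = [
    {"name": "array-01", "capacity_tb": 200, "used_tb": 188},
    {"name": "array-02", "capacity_tb": 200, "used_tb": 35},
]
HEADROOM = 0.10   # keep 10% free for growth of thin-provisioned volumes

def place_volume(size_tb):
    for system in SYSTEMS:
        usable = system["capacity_tb"] * (1 - HEADROOM)
        if system["used_tb"] + size_tb <= usable:
            system["used_tb"] += size_tb
            return system["name"]
    return None   # every system is "full": time to buy the next box

print(place_volume(5))    # spills to array-02 because array-01 has no headroom left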

 

Once you introduce different types and speeds of systems, though, you need to differentiate between types of storage and the capabilities of systems. Some data might be better suited to land on a NAS, other data on a spinning disk SAN, and another flavor of data on an all-flash SAN array. And you need to keep track of which clients/devices are attached to which systems, so documentation and a clear naming convention are paramount.

 

Keeping it running

Then there’s the challenge of keeping all the storage systems running. You can probably monitor a handful of systems with the in-box GUI, but that doesn’t scale well. At some point, you need to add at least central monitoring software, to group all the alerts and activities in a single user interface. Even better would be central management, so you don’t have to go back to the individual boxes to allocate LUNs and shares.

 

With an increasing number of storage systems comes an increasing number of attached servers and clients. Ensuring that all clients, interconnects and systems are on the right patch levels is a vertical task across all these layers. You should look at the full stack to ensure you don’t break anything by patching it to a newer level.

 

If you glue too many systems together, you’ll end up with a spaghetti of shared systems that make patch management difficult, if not impossible. Some clients will be running old software that prevents other layers (like the SAN or storage array) from being patched to the newest levels. Other attached clients might rely on these newer codes because they run a newer hypervisor. You’ll quickly end up with a very long string of upgrades that need to be performed, before you’re fully up to date and compliant. So, it’s probably best to create building blocks of some sort.
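As a rough illustration of why this matters, here is a small sketch (with made-up component names and version numbers) of checking attached hosts against an array's supported minimum versions before scheduling an upgrade:

# Sketch: check attached hosts against a storage array's interoperability minimums
# before patching the array. The matrix and host inventory are illustrative assumptions.
SUPPORTED_MINIMUMS = {"hba_driver": (4, 2), "multipath": (2, 8), "hypervisor": (6, 5)}

hosts = [
    {"name": "esx-01", "hba_driver": (4, 3), "multipath": (2, 9), "hypervisor": (6, 7)},
    {"name": "esx-07", "hba_driver": (3, 9), "multipath": (2, 9), "hypervisor": (6, 0)},
]

for host in hosts:
    blockers = [component for component, minimum in SUPPORTED_MINIMUMS.items()
                if host[component] < minimum]
    if blockers:
        print(f"{host['name']}: upgrade {', '.join(blockers)} before patching the array")
    else:
        print(f"{host['name']}: ready")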

 

How do you approach the “problem” of growing data? Do you throw more systems at it, or upgrade capacity/performance of existing systems? And how do you ensure that the infrastructure can be managed and patched? Let me know!

Patch management at any sort of scale has always been a mundane and time-consuming task that most administrators would like to avoid at all costs. With the proliferation of DevOps methodologies and the public cloud, the practice of immutable infrastructure has eliminated the need for patch management in the eyes of some, given the fact that there would be no long-living servers. In contrast to that notion, most environments have long-living servers that are still around and will be for the foreseeable future due to various reasons. The public cloud and DevOps are the new flavors of the month in IT for many valid reasons, but patch management is still a critical aspect of securing IT environments that can be made easier through the use of managed solutions.

 

The benefits of managed patch management are:

  1. Simplified Management - The patch management solutions offered by cloud providers present a single management interface to simplify operations. In addition to the proverbial single pane of glass, most cloud providers offer a simplified way to deploy the patch management agents to instances, which helps speed up deployment.
  2. Scalability - Fully managed solutions have been built to scale to the largest of environments without any performance impact. This eliminates the need to rearchitect the patch management deployment to scale with the needs of the organization.
  3. Managed Upgrades - One of the advantages of utilizing a fully managed patch management solution is the fact that the system for managing patches is automatically patched itself. This is a major win for many organizations that are already short on IT staff.

 

Managed Deployment

The following solutions are managed deployments. This means the patch management software company has added a deployment solution to the respective cloud provider's marketplace to allow the infrastructure to be provisioned with the click of a button.

 

ManageEngine Patch Manager Plus

ManageEngine Patch Manager Plus is a patch management solution that supports Windows, Linux and Mac OS endpoints. This solution is only available on AWS as a marketplace deployment option.

 

SaaS Deployment

The following solutions are Software as a Service (SaaS) deployments. This means the patch management software company hosts the software for its customers.

 

Kaseya VSA

Kaseya VSA is a remote monitoring and management (RMM) platform created by Kaseya that includes patch management functionality. The patch management solution includes support for Windows, Mac OS X and 3rd party software.

 

Automox

Automox is a next generation patch management platform hosted in AWS that aims to provide a unified platform for managing patches across all environments. The patch management solution includes support for Windows, Mac OS X, Linux and 3rd party software.

 

Fully Managed

The following solutions are fully managed: the cloud provider manages your patch management platform on your behalf, allowing engineers to focus on ensuring that instances are up to date with their patches.

 

AWS Systems Manager (Patch Manager)

Patch Manager is AWS' managed patch management solution that rolls up underneath AWS Systems Manager. Patch Manager supports both Linux and Windows operating systems as well as on-premises workloads.
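For illustration, here is a small sketch (assuming boto3 and AWS credentials are configured) that pulls patch compliance counts from Systems Manager. The instance IDs are placeholders, and the current boto3 documentation is the authority on the full response shape:

# Sketch: report missing/failed patch counts for a couple of instances via SSM.
import boto3

ssm = boto3.client("ssm", region_name="us-east-1")

response = ssm.describe_instance_patch_states(
    InstanceIds=["i-0123456789abcdef0", "i-0fedcba9876543210"]   # placeholders
)

for state in response.get("InstancePatchStates", []):
    print(state["InstanceId"],
          "missing:", state.get("MissingCount", 0),
          "failed:", state.get("FailedCount", 0))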

 

Azure Automation (Update Management)

Update Management is Azure's managed patch management solution that rolls up underneath Azure Automation. Azure Automation Update Management supports both Linux and Windows operating systems.

 

Patch management for many is simply a necessary evil that often goes overlooked, but it has a critical impact on the security posture of every environment. Leveraging a managed solution for patch management makes life that much easier for administrators. Patch management doesn't provide direct business value for most organizations, but it has to be done, lest the organization become another headline about a security breach caused by unpatched systems.

Striking a balance between good visibility into infrastructure events and too much noise is difficult. I’ve worked in plenty of enterprise environments where multiple tools are deployed (often covering the same infrastructure elements) to monitor for critical infrastructure events. They are frequently deployed with minimal customization and seen as a pain to deal with by the teams who aren’t directly responsible for them. They’re infrequently updated, and it’s a lot of hard work to get all of the elements of your infrastructure covered. Some of this is due to the tools being difficult to deploy, and some of it is due to a lack of available resources, but mostly the problem is people.

 

Information security is a similar beast to deal with. There are many tools available to help monitor your environment for security breaches and critical issues, and lots of data centers with multiple tools installed. Yet, enterprises continue to suffer from having poor visibility into their environments. 

 

Syslog is an underrated tool. I call it the blue-collar worker of the DC. It will happily sit in your environment and collect your log data for you, ready to dish up information about problems and respond to issues at a moment’s notice. All you have to do is tell your devices to send information in its direction and it takes care of the rest. It catches the messages, stores them, analyzes them, and sends you information when there’s something you should probably look at.

 

But it’s not just a useful way to view events in your DC. It can also be a very useful tool when it comes to managing security incidents. 

 

First, set up sensible alerting in your syslog environment. An important consideration is understanding what you want to alert on. Turn on too much and the alerts will get sent to someone’s trash folder rather than read. Turn on too little and you won’t catch the minor issues before they become major ones. You should also think about the right mechanism to use for alerting. Some people work well with email, while others prefer messages or dashboard access.

 

The second thing is to understand what you want to look for in your events. Events such as logins using local credentials when your devices use directory services are an example of something that should trigger an alert. The key is to understand what’s normal in your environment and alert on things that aren’t. It can take some time to understand what is normal, but it’s worth the effort.
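As a simple sketch of that idea, here is how a script might scan collected syslog messages for logins that used local credentials instead of directory accounts. The log path, message format, and account list are assumptions for illustration:

# Sketch: flag logins by accounts that aren't in the directory (possible local creds).
import re

DIRECTORY_ACCOUNTS = {"jsmith", "mjones", "svc_backup"}   # assumed directory users

alerts = []
with open("/var/log/network-devices.log") as log:         # centralized syslog file (assumed)
    for line in log:
        match = re.search(r"user=(\S+)", line)
        if "login" in line.lower() and match:
            user = match.group(1)
            if user not in DIRECTORY_ACCOUNTS:
                alerts.append(line.strip())

for alert in alerts:
    print("Possible local-credential login:", alert)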

 

Third, you must understand how to respond when an alert comes your way. Syslog is a great tool to have running in the DC because it centralizes your infrastructure logging and provides a single place to look when things go awry. But you still need to evaluate the severity of the problem, the available resolutions, and the impact on the business.

 

The key to managing security events with syslog is to have the right information at hand. Syslog gives you the information you need in terms of what, when, and who. It can’t always tell you how it happened, but having that information in one place makes working that out easier.

 

Infrastructure operations can be challenging, particularly when there’s been some kind of security incident. Having the right tools in place gives you a better chance of getting through those events without too many problems and your sanity intact.

This week's Actuator comes to you from the month of May. As you know, May is located right next to June. That means 2018 is almost half over. Now is a good time to check in on your goals for the year and mark your progress.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Amazon cloud business just keeps rolling along

Hard to believe a company that just made $5 billion in revenue for Q1 is the same company that felt the need to jack up the price of their Prime subscription by 20%, from $99 to $119.

 

GDPR Kills the American Internet: Long Live the Internet!

May is here, and on the 25th GDPR compliance becomes a thing. Here, Cringely talks about how ICANN is not currently GDPR compliant, and raises an interesting discussion about the data privacy rights of individuals.

 

Transcription Service Leaked Medical Records

Business suffers a ransomware attack and then follows up by exposing patient data online. Saying “sorry” isn’t enough anymore, folks. We need to start imposing penalties for this lack of basic security. We need to do something to force people and companies to treat data as if lives depended on it.

 

$35 Million Penalty for Not Telling Investors of Yahoo Hack

Yes, exactly like this, except that is too small a fine for a company that size.

 

MacOS monitoring the open source way

I enjoy when tech companies openly disclose how they tackle the same problems we all face. Here Dropbox shares their approach, allowing everyone else to compare and contrast their methods. In the end, it is through this open disclosure and knowledge sharing that we all get a little better in how we administer our own enterprise.

 

You know how HTTP GET requests are meant to be idempotent?

This sounds like something Patrick would discover.

 

Bacoin, the First-Ever Cryptocurrency Backed by the Gold Standard of Bacon

Finally, a cryptocurrency I can sink my teeth into.

 

I need one of these for my front yard:

 

Here is an interesting article from my colleague Joe Kim, in which he explores database performance management.

 

Databases are complex, multifaceted, and vital to the health of every agency. They are the heart of every data center and arguably one of the most important components of an agency’s technology infrastructure, whether on-premises, in the cloud, or within a hybrid IT environment.

 

For these reasons, optimizing database performance is critical to enabling an optimized data center. There are five things a federal IT manager can do to meet this goal:

 

  1. Ensure that databases are healthy
  2. Gain visibility into data and metrics
  3. Put data and metrics in context
  4. Track optimization plans
  5. Create and maintain a performance baseline

 

Let’s look at each of these steps individually to understand how they all fit together:

 

Ensure database health

 

When it comes to databases, health and performance are two different things; databases must be healthy before they can be optimized. Signifiers of database health include things like CPU utilization, I/O statistics, and memory pressure. Collectively, these metrics can indicate whether a database can perform well.

 

Gain visibility

 

The next step is to begin the performance optimization process, ensuring that queries can execute quickly and throughput can be maximized. This starts with gaining full visibility into the data and metrics needed to assess database performance. For example, the ability to drill down into granular metrics like resource contention and a database’s workload is key in helping to identify and mitigate the root cause of a performance problem.
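For example, here is a quick sketch (assuming SQL Server and the pyodbc module) that surfaces the top queries by CPU time from the plan cache; the connection string is a placeholder:

# Sketch: pull the top queries by total CPU time to help pinpoint resource contention.
import pyodbc

QUERY = """
SELECT TOP (5)
       qs.total_worker_time / 1000 AS total_cpu_ms,
       qs.execution_count,
       SUBSTRING(st.text, 1, 100)  AS query_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
"""

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=db01;DATABASE=master;Trusted_Connection=yes;"
)
for cpu_ms, executions, text in conn.cursor().execute(QUERY):
    print(f"{cpu_ms:>10} ms  {executions:>8} execs  {text}")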

 

Put data in context

 

Agencies should make sure that the data delivered by their monitoring tools is structured and presented in a way that gives their teams the real insights necessary to fix problems and optimize performance. Specifically, the data should help quickly identify and resolve the root cause of performance issues, and not lead you down a “rat hole” of unnecessary, second-level research and analysis.

 

Track optimization plans

 

There are things your team will do to test and optimize performance, such as running optimization queries. Make sure all queries and tests are tracked, and that the results are carefully correlated with the tests being performed.

 

Create and maintain a performance baseline

 

It’s nearly impossible to tell when a database is underperforming if you don’t have a daily baseline “normal” to measure against. The best approach is to implement a comprehensive series of management and monitoring tools.
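A minimal sketch of a baseline: compute the mean and spread of a historical metric and flag values that drift too far from "normal" (the numbers are illustrative):

# Sketch: build a simple daily baseline from historical samples and flag outliers.
import statistics

history = [412, 398, 405, 420, 388, 415, 402]   # e.g., avg batch requests/sec per day
baseline = statistics.mean(history)
spread = statistics.stdev(history)

today = 540
if abs(today - baseline) > 2 * spread:
    print(f"Today's value {today} deviates from baseline {baseline:.0f} +/- {spread:.0f}")
else:
    print("Within normal range")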

 

Make sure your tooling also gives you the ability to view down, up, and across; it should let you drill down into the database, across database technologies, and across deployment methods (including cloud). And, finally, make sure the tools you choose allow you to establish a historical record of performance metrics.

 

All this information, coupled with the ability to create a baseline, will help ensure that your IT teams have the tools they need to optimize the health and performance of their database.

 

To learn more about a tool that can help ensure that your database is performing at optimal levels, watch the Federal Webinar: Getting to know SolarWinds® Database Performance Analyzer.

 

Find the full article on Federal Technology Insider.
