
Geek Speak


By Paul Parker, SolarWinds Federal & National Government Chief Technologist


Despite ploughing more than £50m into the digital transformation of the NHS, evidence suggests only a handful of trusts have adopted the Government’s “cloud first” policy.


NHS Digital spent more than £32m on digital transformation consultancy services, and £23m with cloud, software, and hardware providers between April and December 2017.


But its bosses must be questioning why, given that fewer than a third of NHS trusts surveyed in January have adopted any level of public cloud. Mistrust is high, according to recent findings from IT management software provider SolarWinds.


The research questioned more than 200 NHS trusts and revealed that, while respondents were aware of the government’s policy, less than a third have begun the transition.


Of those who have yet to adopt any level of public cloud, 64% cited security concerns, 57% blamed legacy tech, and 52% said budgets were the biggest barriers.


However, for respondents who had adopted some public cloud, budget registered as far more of a barrier (66%), followed by security and legacy technology (59% each).


Eight percent of NHS trusts not using public cloud admitted they were using 10 or more monitoring tools to try and control their environment, compared to just 5% of NHS trusts with public cloud.


In addition, monitoring and managing the public cloud remains an issue, even after adoption, with 49% of trusts with some public cloud struggling to determine suitable workloads for the environment.


Other issues included visibility of cloud performance (47%) and protecting and securing cloud data (45%).


Six percent of NHS trusts still expect to see no return on investment at all from public cloud adoption.


Speaking exclusively to BBH about the findings, SolarWinds’ chief technologist of federal and national government Paul Parker said, “Cloud is this wonderful, ephemeral term that few people know how to put into a solid thought process.


“From the survey results, what we have seen is that the whole “cloud first” initiative has no real momentum and no one particularly driving it along.


“While there are a lot of organizations trying to get off of legacy technology, and push to modernize architecture, they are missing out.”


Improvements to training are vital, he added.


“People tend to believe there’s this tremendous return on investment to moving into the cloud when, in reality, you are shifting the cost from capital to operating expenses. It is not necessarily a cost saving in terms of architecture and people need to recognize that.


“Rather than owning the infrastructure, with cloud you are leasing it, so it doesn’t change the level of investment much.”


There are savings to be had, though, and they come from converging job roles and improving access to medical records, for example.


Parker said, “With cloud, you do not need a monitoring team for every piece of architecture, so while there are savings to be had, they are more operational. With a cloud infrastructure, everything is ready at the click of a button.”


Moving forward, he says,


“It’s that old adage of ‘evolution, not revolution.’ There needs to be technology training, and the NHS needs to have an overarching goal, rather than simply moving to the cloud.


“The first thing trusts need to do is look at their current environment and determine what’s there, what’s critical and what’s non-critical. That will enable them to focus on moving the non-critical things into a cloud environment without jeopardizing anyone’s health or life or affecting security. That will help everyone to better understand the benefits of the cloud and to build trust.


“I would also like to see a top-down approach in terms of policy and direction. The ultimate goal of healthcare services is to make sure people have a better quality of life; and the goal of IT is to make sure they can deliver that. As IT experts, it is our job to try and help them to do that and make IT easier and more affordable.”


Find the full article on Building Better Healthcare.

One of the most interesting changes that I have observed in my career is Microsoft shifting from just being a development organization to truly becoming a DevOps team, in the case of the SQL Server team. The product code developers are the operations support for the cloud service hosting Azure SQL Database. Many other large development organizations have this model, but my relationships and experience with SQL Server have allowed me to observe this product much more closely. My main observation is that dev cycles turn so much faster largely in order to fix problems: issues that in traditional on-premises software might have taken months between patches, or even years between full releases, now get fixed in the course of a few weeks. This has really changed the way Microsoft releases SQL Server on-premises--after a major release, there is a cumulative update released every month for the first year. This means more problems get fixed faster, and if you are using the cloud service, the release cadence is even faster.


I know most of us do not work for hyper-scale public cloud providers with massive development resources, but I used this example because it is a visible, real-world illustration of the way a DevOps model can transform an organization. So how does this apply to running our infrastructure organizations? I think even in the most off-the-shelf traditional shop, you should be thinking about how to automate the following kinds of work:


  • Manual
  • Repetitive
  • Work with no enduring value
  • Work that doesn’t scale with your application


As a system admin, your main job task is to keep the lights (and more importantly, the servers) on in your organization. By adopting a mindset of trying to automate as many of these tasks as possible, you will enjoy your job more, and have more time to focus on tasks that are more strategic to your organization. You can take advantage of frameworks like PowerShell and Python scripting in your environments (yes, you will have to write some code) to bring this all together, but this mindset will change the way you view system administration.


Where do you start with automation? Identify your most common tasks that consume the most time--for example, if you are using VM templates to deploy your operating system environments (that’s a really fancy way of saying VMs), you have already started down this path. A logical next stop might be to automate the process for keeping your templates up to date with patches, or to use scripting to ensure that any post-deployment configuration tasks happen without human intervention. To do this really well, you should adopt a developer mindset--all of your scripts should use source control, you should work to develop a unit testing process (the one negative to automation is that if you break something badly, you can break a lot of things badly, and quickly), and some aspects of a development methodology to keep the progress moving forward.
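
As a rough illustration of the "developer mindset" described above, here is a minimal Python sketch of a post-deployment configuration step written so that it is safe to run repeatedly, together with a tiny unit test. The function name `ensure_line`, the file name `ntp.conf`, and the server name `time.example.org` are all hypothetical and chosen only for this example; a real environment would more likely use whatever configuration tooling is already in place.

```python
from pathlib import Path
import tempfile


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to the file at `path` unless it is already present.

    Returns True if the file was changed, False if it was already compliant.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # nothing to do: running the script again is harmless
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


def test_ensure_line_is_idempotent() -> None:
    """A tiny unit test: the second run must be a no-op."""
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "ntp.conf"  # hypothetical config file
        assert ensure_line(cfg, "server time.example.org") is True
        assert ensure_line(cfg, "server time.example.org") is False
        assert cfg.read_text() == "server time.example.org\n"


if __name__ == "__main__":
    test_ensure_line_is_idempotent()
    print("ok")
```

Keeping a script like this in source control alongside its test is one way to catch a bad change before it is rolled out to every server at once.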


Moving into this mindset is a big shift for many organizations. However, in modern IT, influences like cloud and the aforementioned rapid development cycles mean that everything just moves faster. The other benefit of this automation, which should be a major influence on what tasks you automate, is the reduction in alert pages. If you can automate away your most common pager responses, your entire team and organization will benefit. The final, largest benefit of automation is that you know a script will run the same every time it's executed, as opposed to your junior admin, who may or may not get a complex task sequence correct each time.

“In the beginning was the word” … oops, “the cloud.”


Long ago "cloud" was a buzzword, a cool word, especially as no one knew exactly what it meant. Many companies made investments with an eye for the future focused on high-profile technologies, betting on the success of the cloud. Other companies instead directed their investments toward more reliable traditional solutions. Today the attention is focused on moving these old-fashioned but profitable (and fragile) applications to a cloud environment. The challenge is how to provide these applications with the right amount of resources. This appears to be a difficult task because it isn’t simply a matter of quantity of resources. Legacy applications are built to run in a traditional architecture, usually monolithic and rigid. The traditional “create, install, run, and forget it” kind of solution is a field for highly vertical specialists, who seem to be going away.


Migrate or replace?

Who should make the decision on this migration? A more important decision to make might be: is it worth it to re-code these applications, or is it more convenient in terms of time, money, and features to code them from scratch? Or just move to a cloud-native application instead? I think that the second option would be the best choice. It is true that it will require more resources, but once in production it will present several benefits: weekly (or even daily) upgrades to fix any bugs or provide new features, performance, running natively on cloud components, flexibility, and accessibility from several different device types (tablets, phones, different OS/browsers).


All these benefits could also be gained by re-coding the old legacy app, but would it be worth it in the long run? The core of the app is still the original one, built with a traditional architecture in mind. And are we sure that the cost of re-coding will be less than the one required to move to a new native app? Let’s consider the training that users will need to use the new app as a cost that will be offset by hyper-specialists no longer being required to maintain a legacy app.


Usually the companies most resistant to a cloud native app replacing a legacy one are banking institutions, insurance companies, and healthcare. These organizations are all bound to solid, long-tested applications; however inefficient, they are proven. By tested I mean bugs included (known and accepted ones), because solving them would mean a huge effort in terms of development cost and time lost while services are down.

Educate management crew

If you help these operators understand the benefits of a cloud native application, and especially if you involve the management team, who lead the technical teams below them, you have done the best job you could. After this, it's a matter of time, because word moves fast within their vertical environments. Outside of those verticals, things tend to slow down.


For all the cases where companies still reject this kind of app, there’s still the so-called “lift and shift”: keeping legacy apps and moving them to a virtualized infrastructure. It’s a smooth step to push them to the “new world.” From that starting point, it could be simpler to show management the benefits of virtualization and, going forward, to present a scenario they could move to in the future, such as a serverless environment. But this last step should be taken only if the people you’re talking to are technologically advanced; otherwise they could be scared of the “unknown” and simply decide to move everything back to the original site.

Be sober

In any case, our task is like walking on eggshells: presenting new solutions, keeping an eye on the emotions of the people you’re talking to, and not being too enthusiastic, but looking professional, responsible, and balanced. Speak not only about benefits, but also (and in some cases, especially) about risks, keeping a solution at hand for each of those risks.

A storage system on its own is not useful. Sure, it can store data, but how are you going to put any data on it? Or read back the data that you just stored? You need to connect clients to your storage system. For this post, let’s assume that we are using block protocols like iSCSI or traditional block storage systems. This article also applies to file protocols (like NFS and SMB) and to some extent even to hyper-converged infrastructure, but we will get back to that later.


Direct attaching clients to the storage system is an option. There is no contention between clients on the ports, and it is cheap. In fact, I still see direct attached solutions in cases where low cost wins over client scalability. However, direct attaching your clients to a storage system does not really scale well in number of clients. Front-end ports on a storage array are expensive and limited.


Add some network

Therefore, we add some sort of network. For block protocols, that is a SAN. The two most commonly used protocols are the FC protocol (FCP) and iSCSI. Both protocols use SCSI commands, but the network equipment is vastly different: FC switches vs. Ethernet switches. Both have their advantages and disadvantages, and IT professionals will usually have a strong preference for one of the two.


Once you have settled on a protocol, the switch line speed is usually the first thing that comes up. FC commonly uses 16Gbit, with 32Gbit switches entering the market lately. Ethernet, however, is making bigger jumps, with 10Gbit being standard within a rack or wiring closet and 25/40/100Gbit commonly used for uplinks to the data center cores.


The current higher speeds of Ethernet networks are often one of the arguments why “Ethernet is winning over FC.” 100Gbit Ethernet has already been on the market for quite some time, and the next obvious iteration of FC is “only” going to achieve 64Gbit.



Once you start attaching more clients to a storage system than it has storage ports, you start oversubscribing. 100 servers attached to 10 storage ports means you have on average 10 servers on each storage port. Even worse, if those servers are hypervisors running 30 virtual machines each, you will now have 300 VMs competing for resources on a single port.
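
The oversubscription arithmetic above can be written down as a small helper. This is only a sketch in Python; the function name `oversubscription` and its parameters are made up for illustration, and it assumes clients are spread evenly across the storage ports, which is rarely exactly true in practice.

```python
def oversubscription(clients: int, storage_ports: int,
                     vms_per_client: int = 1) -> tuple[float, float]:
    """Return (clients per storage port, VMs per storage port).

    Assumes clients are distributed evenly over the storage ports.
    """
    if storage_ports <= 0:
        raise ValueError("storage_ports must be positive")
    clients_per_port = clients / storage_ports
    return clients_per_port, clients_per_port * vms_per_client


# The numbers from the text: 100 hypervisors, 10 storage ports, 30 VMs each.
servers_per_port, vms_per_port = oversubscription(100, 10, vms_per_client=30)
print(f"{servers_per_port:.0f} servers and {vms_per_port:.0f} VMs per storage port")
# prints "10 servers and 300 VMs per storage port"
```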


Even the most basic switch will have some sort of bandwidth/port monitoring functionality. If it does not have a management GUI that can show you graphs, third-party software can pull that data out of the switch using SNMP. As long as traffic in/out does not exceed 70% you should be OK, right?
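
For readers pulling counters themselves, the sketch below shows how a utilization percentage is typically derived from two samples of an interface octet counter (for example `ifHCInOctets` from the standard IF-MIB, which is a 64-bit counter; the older `ifInOctets` is 32-bit). It is a Python sketch that assumes you already have the two counter readings and the polling interval; the function name and parameters are hypothetical, and no SNMP library is involved.

```python
def port_utilization(octets_start: int, octets_end: int, interval_s: float,
                     link_speed_bps: float, counter_bits: int = 64) -> float:
    """Return average link utilization (0.0 to 1.0) between two counter samples."""
    if interval_s <= 0 or link_speed_bps <= 0:
        raise ValueError("interval_s and link_speed_bps must be positive")
    delta = octets_end - octets_start
    if delta < 0:
        # The counter wrapped around between the two samples.
        delta += 2 ** counter_bits
    # Octets are bytes, so multiply by 8 to compare against bits per second.
    return (delta * 8) / (interval_s * link_speed_bps)


# 52.5 GB transferred in 60 seconds on a 10 Gbit/s port.
util = port_utilization(0, 52_500_000_000, interval_s=60, link_speed_bps=10e9)
print(f"{util:.0%}")  # prints "70%"
```

Note that this is an average over the polling interval, so short bursts that saturate the port can hide behind a comfortable-looking figure.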


The challenge is that this is not the whole truth. Other, more obscure limitations might ruin your day. For example, you might be sending a lot of very small I/O to a storage port. Storage vendors often brag about 4KB I/O performance specs. 25,000 4KB IOps only account for roughly 100MB/s or 800Mbit (excluding overhead). So, while your SAN port shows a meager 50% utilization, your storage port or HBA could still be overloaded.
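
The small-I/O arithmetic can be checked with a few lines of Python. This is a sketch only: it uses decimal units (1 MB = 1000 KB), ignores protocol overhead as the text does, and the 16 Gbit/s figure is simply a nominal line rate picked for comparison, not a measured limit.

```python
def iops_to_throughput(iops: int, io_size_kb: float) -> tuple[float, float]:
    """Return (MB/s, Mbit/s) for a given I/O rate, using 1 MB = 1000 KB."""
    mb_per_s = iops * io_size_kb / 1000
    return mb_per_s, mb_per_s * 8


mb_s, mbit_s = iops_to_throughput(25_000, 4)
print(f"{mb_s:.0f} MB/s, {mbit_s:.0f} Mbit/s")  # prints "100 MB/s, 800 Mbit/s"

# Share of a nominal 16 Gbit/s link consumed by that workload.
link_share = mbit_s / 16_000
print(f"{link_share:.0%}")  # prints "5%"
```

A bandwidth graph would show this port as nearly idle, even though the device behind it is handling 25,000 operations every second.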


It becomes more complex once you start connecting SAN switches and distributing clients and storage systems across this network of switches. It is hard to keep track of how many storage and client ports send traffic across the ISLs (Inter Switch Links). In this case, it is a smart move to keep your SAN topology simple and to be careful with oversubscription ratios. Do the oversubscription math, and look beyond the standard bandwidth graphs. Check error counters, and in an FC SAN that has long distance links, check whether the Buffer-to-Buffer credits deplete on a port.


Ethernet instead of FC

The same principles apply to Ethernet. One reason a company chooses an Ethernet-based SAN is that it already has LAN switches in place. In these cases, be extra vigilant. I am not opposed to sharing a switch chassis between SAN and normal client traffic. However, ports, ISLs, and switch modules/ASICs are prime contention points. You do not want your SAN performance to drop because a backup, restore, or large data transfer starts between two servers, and both types of traffic start fighting for the available bandwidth.


Similarly, hyper-converged infrastructure solutions such as VxRail and others based on VMware vSAN place high demands on the Ethernet uplinks. Ideally, you would want to ensure that VMware vSAN uses dedicated, high-speed uplinks.

Which camp are you in? FC or Ethernet, or neither? And how do you ensure that the SAN doesn’t become a bottleneck? Comment below!

Welcome to another edition of the Actuator. I hope everyone is enjoying some warm spring weather. It's nice to be able to sit outside for an hour at the end of the day.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Alexa and Siri Can Hear This Hidden Command. You Can’t.

Fun fact for you: There is no law against sending subliminal messages to humans, or machines. The practice is discouraged and *may* be considered an invasion of privacy (for humans, not for machines). Another example of where the laws lag far behind the technology.


Digital Photocopiers Loaded With Secrets

Not mentioned in the article: the embarrassing photos from office Christmas parties.


Stanford Study Shows the Astonishing Productivity Boost of Working From Home

Glad to see we are putting some data into the productivity levels for people working from home. I’ve been doing it for eight years now. I know it’s made me more productive, happier, and healthier. I can’t go back to having a real job, ever.


Don't Skype Me: How Microsoft Turned Consumers Against a Beloved Brand

“[Using Skype] is like Tim Tebow trying to be a baseball player.” Ouch.


Amazon’s Fake Review Problem

I’ve been frustrated for years with the reviews on Amazon. I find many of them to be fake. These days I focus on the three-star ratings and do my best to discern the truth. To be fair, Amazon is not the only company with an online review problem.


Are My Friends Really My Friends?

Interesting analysis showing that despite being surrounded by constant interactions, we are more alone now than ever before.


Security researchers discover critical flaw in PGP encryption that reveals plaintext

Everything is terrible.


Seems legit:

Public cloud providers have greatly simplified the process of creating a backup, but the challenge has always been managing that at scale with things like policies for retention or simple granular file level restores or regulatory focused dashboards. This is the value added by many of the backup management solutions discussed in the following post and becomes critical once an environment scales past a few instances and databases.


The benefits of managed backup management are:

  1. Simplified Management - The backup management solutions offered by public cloud providers are generally account or subscription focused and don't offer a holistic view of the entire environment, which a dedicated backup management solution can provide.
  2. Scalability - Fully managed backup and SaaS solutions have been built to scale to the largest environments without any performance impact or major concern for running out of storage space. This eliminates the need to re-architect the backup management deployment to scale with the needs of the organization for things such as keeping data for twice as long because of a new mandate.
  3. Multi-Cloud Support - Many of the legacy backup products that are available on the market only support backing up data to public cloud providers or only support backing up workloads in a single cloud provider. More and more companies are implementing multi-cloud strategies and a solution that supports multiple clouds is essential to simplifying operations.


Unmanaged Deployment

The following solutions are unmanaged deployments. This means that the software is available to be installed by the customer or has been packaged in the native cloud format, such as an AWS AMI, but is not available in any of the cloud providers' marketplaces.


Rubrik Cloud Data Management

Rubrik Cloud Data Management is a software appliance that can be deployed to AWS, Azure and GCP. The Cloud Data Management platform supports policy based snapshot management along with advanced analytics to generate operational insights.


Managed Deployment

The following solutions are managed deployments. This means the backup management software company has added a deployment solution to the respective cloud provider's marketplace to allow the infrastructure to be provisioned with the click of a button.


Veritas CloudPoint

Veritas CloudPoint is a backup management solution that supports automated deployment into Azure but can back up workloads on AWS, Azure and GCP. In addition to IaaS workloads across the three major clouds, CloudPoint also supports application level backups such as Microsoft SQL, MongoDB and AWS Aurora.


SaaS Deployment

The following solutions are Software as a Service (SaaS) deployments. This means the backup management software company hosts the software for its customers.


Druva Apollo

Druva Apollo is a SaaS solution that provides data protection of AWS EC2, RDS, S3, EBS, and Glacier. Druva Apollo also includes SLA-based snapshot retention policies in addition to tiering to reduce costs as older snapshots are moved to cheaper storage and eventually deleted.
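
To make the idea of retention with tiering concrete, here is a generic Python sketch of an age-based policy. It is not how Druva or any other vendor implements it; the function name `classify_snapshot`, the tier names, and the 30-day and 365-day thresholds are all assumptions chosen purely for illustration.

```python
from datetime import date


def classify_snapshot(taken: date, today: date,
                      hot_days: int = 30, retain_days: int = 365) -> str:
    """Decide what to do with a snapshot based only on its age.

    Returns "keep" (stay on primary storage), "archive" (move to cheaper
    storage), or "delete" (past the retention period).
    """
    age_days = (today - taken).days
    if age_days > retain_days:
        return "delete"
    if age_days > hot_days:
        return "archive"
    return "keep"


today = date(2018, 6, 1)
for taken in (date(2018, 5, 20), date(2018, 1, 15), date(2017, 3, 1)):
    print(taken.isoformat(), classify_snapshot(taken, today))
# prints:
# 2018-05-20 keep
# 2018-01-15 archive
# 2017-03-01 delete
```

A new mandate to keep data twice as long would then be a one-parameter change (`retain_days=730`) rather than a redesign, which is the kind of flexibility policy-based tools aim for.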


Rubrik Polaris

Rubrik Polaris is a SaaS solution that integrates with Rubrik's Cloud Data Management hardware and software appliances to provide a unified management platform for both on-premises and cloud-based workloads.


CloudRanger Backup and Recovery

CloudRanger Backup and Recovery is a SaaS solution that provides backup management of AWS EC2, RDS, and Redshift instances using native AWS snapshots. Instance and file level backups are supported along with multi-region and multi-account backup restore points.


Fully Managed

The following solutions are fully managed backup management solutions, in which the cloud provider manages backups on your behalf.


Built-in Snapshots

Public cloud providers allow administrators to create snapshots of virtual machine instances, databases, etc. This doesn't provide a robust feature set in terms of management but does allow administrators to back up and restore to a given point in time.


Backup management is an unsexy topic to most, but it has tremendous value when there is a disaster. Many of the new backup management solutions are also becoming much more than just a way to create a snapshot via the public cloud providers' native snapshot APIs.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist


Here is an interesting article from my colleague Joe Kim, in which he discusses what we can expect to see in the future as federal IT professionals.


Over the past couple of years, administrators have confronted a rapidly changing landscape that seems to shift overnight. Public vs. private cloud is an old argument; now, administrators are grappling with the reality of implementing and managing hybrid IT infrastructures.


As such, their skill sets are being tested like never before. Sixty-two percent of respondents to a recent SolarWinds IT trends survey indicated that hybrid IT has required that they acquire new skills, while 11 percent said it has altered their career path. Meanwhile, 57 percent of public sector organizations have already hired or reassigned IT personnel, or plan to do so, for the specific purpose of managing cloud technologies.


The skills that IT administrators are learning today will have a large impact on what IT management will look like ten years from now.


From service managers to service consumers


Our survey found that 96 percent of respondents have moved at least some applications and aspects of critical infrastructure to the cloud. This migration has caused federal administrators to sharpen their “as-a-service” skills, since many of the tools they are using have become software-defined and exist both on-premises and in hosted environments.


Federal IT professionals have gone from being service managers to service consumers who work with cloud providers to manage their infrastructures. In this service-oriented world, administrators are finding themselves interacting more with software than they are with hardware switches and routers. These interactions are precursors to IT practitioners’ inevitable evolution from traditional network managers into areas that may be more familiar to developers, and toward becoming service brokers, rather than service managers.


From network manager to network developer


Administrators previously needed to be savvy about command lines and hands-on management of network components, but the move toward hybrid IT and Software-as-a-Service (SaaS) applications has greatly reduced the need for these types of skills. Administrators must now begin to be able to manage the different pieces of code that comprise applications and allow those programs to work with each other.


Tomorrow’s network administrators will be familiar with application program interfaces (APIs) — essentially app building blocks — and how they can be used to solve common problems, from network management to security challenges. They will create highly customized and dynamic networks that fit the unique needs of their agencies. Furthermore, they’ll have a greater amount of control over these networks, as they will be able to tap into APIs to dictate policy, rules, user access, and more.


From servicing people to self-service


They’ll also move from being service managers to service brokers. Instead of provisioning more storage or spending their time clicking around user interfaces, they’ll be assigning applications and access rights to individual users so that those users can easily set up services on their own. The standard practice of a user submitting an online request for access to a new application will be a rare occurrence; everything that a person needs or is authorized to use will be at their fingertips in this self-service environment.


Network administrators will also have more opportunity to add strategic value to their agencies. Today, administrators spend a lot of time servicing users. Moving toward self-service will allow users to check their own boxes, download their own applications, and authorize their own access, all without having to go through their system administrators. In turn, future administrators will have more time to work on higher value services, such as developing plans for stronger security measures or using predictive analytics to anticipate and remediate network issues.


From the future to the past


Despite all of these changes, administrators will still need to focus on the “bread and butter” aspects of network management, including performance, availability, and compliance. To ensure success in each of these areas, administrators will need to use many of the same tools and processes that are commonplace right now.


Indeed, some of these tools will be even more important than they are today. For instance, network performance monitoring will be critical, particularly as IT becomes increasingly hybrid- and application-based. Agencies will need solutions that provide automated and unfettered insight into the performance of these applications, whether they are on-premises or hosted, just as they do today.


Learning these and other solutions will take some work, but there are resources available. Online forums such as SolarWinds’ THWACK provide a place where administrators can exchange ideas and ask questions, and most vendors will be more than willing to offer information on best practices.


These resources provide administrators with a chance to hone their skills today while preparing for their future. That future undoubtedly will be challenging, but it also will present many opportunities for federal IT professionals who want to expand their horizons and add more value to their agencies.


Find the full article on GovLoop.

The Simple Network Management Protocol (SNMP) has been a key part of managing network devices in the data centre for some time. It really is a pretty simple protocol to work with (hence the name), and I think it’s underrated as a key tool for monitoring unusual events. Unfortunately, SNMP has had some issues over time. One of these has been sending out a lot of information over the network in an insecure fashion. SNMP v3 was developed to address this. Another issue has been that Joe Sysadmin doesn’t always take the time to configure custom strings to use in the environment with the devices he’s trying to manage. Instead, the default “public” community string is left configured on the devices with more access than is required. This kind of behaviour drives information security folks nuts, and has operations staff questioning whether SNMP is worth the hassle.
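
As a small illustration of the community string problem, the Python sketch below flags devices that still list a well-known default community. The inventory format (a dictionary of device name to configured community strings) and the device names are hypothetical, and the sketch assumes you can already export that information from your configuration management system; it does not talk to any device.

```python
# Community strings that are widely used as factory defaults.
DEFAULT_COMMUNITIES = {"public", "private"}


def find_default_communities(inventory: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return the devices that still use a default community string.

    `inventory` maps a device name to the community strings configured on it.
    """
    flagged: dict[str, list[str]] = {}
    for device, communities in inventory.items():
        hits = sorted(c for c in communities if c.lower() in DEFAULT_COMMUNITIES)
        if hits:
            flagged[device] = hits
    return flagged


# Hypothetical inventory export.
inventory = {
    "core-sw-01": ["n0c-r3ad0nly"],
    "edge-fw-02": ["public", "n0c-r3ad0nly"],
    "san-sw-03": ["Private"],
}
print(find_default_communities(inventory))
# prints "{'edge-fw-02': ['public'], 'san-sw-03': ['Private']}"
```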


SNMP is an extremely flexible solution that provides a robust framework with which you can leverage things like vendor-specific management information base (MIB) files. You can use these to provide both read-only and write access to networked devices. The advantage to this approach is that you can feed information into your management system that provides useful insights, rather than simply showing whether the device is up or down.


Following on from this, alerting that aligns with your devices gives you a better chance of identifying unusual issues in your environment. You could, for example, set your devices to send a trap when local user credentials are used to log in to a device rather than directory credentials. This type of activity may indicate that someone’s up to no good in your environment.


Security in your environment isn’t just about people cracking device credentials, though. It’s also about having devices available to provide the appropriate services to applications and their users. Configuring devices to send meaningful information via SNMP as issues occur can be a great way to get to minor problems before they become major issues. If one of your two firewall devices has suffered a failure, your infrastructure is compromised and you need to address the problem. I’ve seen plenty of situations where internal systems failures go unnoticed for far too long, leading to reduced performance in the environment and angst for both the operations staff and end-users. But people don’t just become unhappy with the infrastructure. They start to use workarounds to get their work done, which can involve unsafe practices such as storing unsecured corporate data in personal mailboxes or on publicly accessible file sharing sites.


A lot of people would agree that data centre operations can be a difficult thing to do well, particularly at a large scale. There always seems to be some device or another that’s run out of capacity, has a failed component, or has simply stopped doing what it’s meant to do. That’s why tools such as SNMP and syslog can help tremendously with keeping things under control in the DC. There’s a wide range of management systems available in the marketplace that can be used to do some pretty cool stuff with SNMP. Most devices that can be deployed in a 19” rack can speak SNMP and syslog, so why not get as much information about what’s happening in your environment as you can? The investment in effort upfront can save you a lot of time and headaches down the track when things invariably go awry.

In a previous Geek Speak blog post, I talked about the viability of practice leader being a career path for IT professionals. A large part of practice leadership is being fluent in vendor technologies, i.e. products. This opens up an interesting set of paths for IT professionals: product management and product marketing.


Are you passionate about discovering IT pain points and finding solutions for them? Product management focuses on customers and their problems. A product manager (PM) is the voice of the entire customer spectrum; the PM delivers market metrics that guide product decisions.


Do you like to tell stories? If you like sharing your problem-solving knowledge by using a product in a one-to-many fashion, you'd probably enjoy working as a product marketing manager (PMM). As Peter Drucker said, product marketing focuses on the product selling itself; creating a product that people want to buy; and creating an environment that encourages people to buy.


Ideally, PMs work to understand real customer problems and use that knowledge to turn solutions into products, while PMMs work to smooth the friction between products and consumers in the marketplace.



Would you consider transitioning into a new role as PM or PMM? Let me know in the comments section below.


P.S. We are hiring for both PM and PMM positions.

Over the years, I’ve read more than a few articles about the qualities that are found in the best administrators. These articles focus on soft skills, but sometimes will list hard skills as well. In all those years, and in all those articles, I rarely see advice on how those skills are to be applied. It’s as if the author expects the reader to just know what to do with the skills, once acquired.


No matter what type of administrator you are (network, database, systems, etc.), the best way to apply your skills is by being responsive and responsible. I wrote about this in my book more than eight years ago, and the advice is still true today.


Being responsive means you take action on an item, whether or not the item or task is your responsibility. For example, if a disk fails and you aren’t a member of the server team, it’s not something for you to fix. But if someone is reaching out to you for help, you must still be responsive. The person reaching out has no idea what tasks you are responsible for; they just need help. You want that customer to have the perception that you are responsive to their needs.


The hardest part of being responsive is the hours. It can be difficult to be responsive at all times of the day. And the better you get at your role, the more your services will be in demand.


Being responsible is taking ownership for something. It should be common sense that you would take responsibility for tasks that are central to your job role as an administrator. But you should also take responsibility for your mistakes. When something goes wrong it can be easy to deflect blame to other teams or specific people. You must resist that urge.


Here’s an example from my past life as a database administrator. It was 3 a.m. and I needed to rebuild a server. It failed because a LUN was erased by mistake (and that mistake replicated, quickly, which is why HA is not the same as DR, but I digress). I needed to restore the master database. We were using a third-party backup software product, and the master restore required some different syntax that didn’t want to work.


It took me longer than it should have to restore the master. I wasn’t happy. And I could have blamed the backup vendor, or the engineer that wiped out the LUN, or the manager that confused HA and DR. But I knew that wouldn’t help anyone. So, the next day I informed my managers that I could have done better. I outlined a training plan so that any member of my team would be able to perform the same tasks in the correct amount of time. I took responsibility.


I didn’t have to inform my managers about the delay. They have no idea how long it takes to restore a master database. But I wanted to show them that I was being responsible. I could have easily blamed others, and nobody would have thought twice.


Be responsive and responsible. I believe it pays off in the long run.

The image of a modern office has been rapidly changing in recent years. From the medical field, to sales, to technical support, users are on the road and working from just about everywhere possible. Firewalls, malware protection, and other security practices are great when a user is on-premises at your location, but what happens when they are working on an assignment at the coffee shop or showing off a report at a lunch meeting? How can you ensure these protections are installed and working correctly? The fact is that a network and systems perimeter logically cannot be defined simply by the walls of your location. Users are on the go, and this requires special accommodations from us in the support field.


Endpoint Protection

When a user is on location, it is obviously easier to deploy security policies against their device to allow it to use your network in a secure manner in accordance with your security policies. In the past, endpoint meant just providing users a firewall and some antivirus software. With malware and other intrusions getting more advanced, this can no longer be the extent of endpoint security. Modern endpoint protection has progressed to a point where endpoint agents now can contact an organization’s security management systems to get its policies, definitions, etc. This contrasts with the old way where a lot of things were only able to be controlled when a user was on-site. This version of protection will allow an organization to provide maximum security to users when on-site or on the road alike. This can be especially useful for users who are constantly on the go as they will still receive security updates and configuration as soon as they are published while you -- the administrator -- can stay in the loop too.


Making Remote Network Access Easy

Security is often the number one priority (as it should be, in my opinion), but to an end-user, ease of use is often the most important. Their focus is often just getting their work done in the easiest and most hassle-free way possible. There are multiple ways to keep a remote user (and the company network) safe and secure while allowing them to work. A couple of the most popular methods are remote or virtual desktops and VPN connections with profiling. Virtual desktops provide remote access to users as if they were sitting at your main location. This could include everything from fileshare access to applications. This makes things easy and uniform company-wide as the virtual desktop image can be maintained, and the system that a user is accessing from is irrelevant. The virtual desktop would remain secure and under your control. The other option is VPN access with profiling. Best of both worlds, right? A user could access network resources with their own device and use their own applications. The profiling aspect would allow you as the administrator to ensure malware protection is installed, the firewall is on, policies are up to date, and the device is current with operating system updates. Both solutions have their merits and a place in certain situations depending on what you might be looking for.
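
To make the profiling idea a little more concrete, here is a minimal Python sketch of the kind of checks described above. It is not tied to any particular VPN product; the `DevicePosture` fields and both function names are hypothetical, and a real agent would gather these facts from the device itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DevicePosture:
    """Facts a (hypothetical) VPN agent reports about a connecting device."""
    malware_protection_installed: bool
    firewall_enabled: bool
    policies_current: bool
    os_updates_current: bool


def failed_checks(posture: DevicePosture) -> List[str]:
    """Return the names of the profiling checks the device does not pass."""
    checks = {
        "malware protection installed": posture.malware_protection_installed,
        "firewall on": posture.firewall_enabled,
        "policies up to date": posture.policies_current,
        "operating system updates current": posture.os_updates_current,
    }
    return [name for name, passed in checks.items() if not passed]


def allow_vpn_access(posture: DevicePosture) -> bool:
    """Grant access only when every profiling check passes."""
    return not failed_checks(posture)
```

Returning the list of failed checks, rather than a bare yes or no, gives the end-user something actionable ("turn your firewall on") instead of a silent refusal.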


The Bottom Line

As I mentioned earlier, I believe security should be the number one focus (as a lot of you do too, I’m sure). I know in my day job, the security of the network is my main focus. In a lot of cases we can’t simply place the network and servers on lockdown to the outside world. That sure would be easier! Working to provide remote access where it's explicitly needed is something we as network and server admins will be tasked with even more going forward. It’s how we all approach that challenge that will determine the security of our organizations going forward.

Following my previous post about logging, I'd like to talk about another tool to manage logs that is more advanced than syslog.


Logwatch is essentially a system log analyzer and reporter. It processes the logs that syslog simply collects. This kind of evolution is simplifying the daily job of modern system and network administrators. Logs are everywhere and almost everything produces logs -- not only specific IT systems, but also the elements forming the so-called "Internet of Things," or IoT. The innovation comes from this last application: "things" producing logs could be managed in a smarter way if those logs are analysed and reported, so that the right action can be taken in response.

This tool is a very simple one, but, at the same time, is very powerful. In its basic form it is CLI-only and is organized in directories.


Shaping Behaviors

The real power (but not the only one) of Logwatch is the ability to shape its behavior according to the administrator’s needs. The customization can be simple or more elaborate, but its goal is to let administrators carry out their daily job as easily as possible, so they can direct their attention to solving the issues that these tools reveal.


Customizing also means filtering, as described in my last post. Filtering would be easier if the tool emphasized the keywords to be filtered. So, the first customization should be catching these keywords using variables in the main configuration. As an example, we could be interested in raising an alert when a website returns a 500 error, but only after “n” times, and not immediately, because we already know that this webserver is supported by an application server, which could be overloaded in particular known situations. No need to produce tons of logs when we already know that the issue occurs in the app server (THIS is the issue to solve, not the error coming from the webserver).
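
The “only after n times” rule can be sketched in a few lines. This is not Logwatch configuration (Logwatch’s own scripts are Perl); it is a Python illustration of the idea, and the log format, the threshold value, and the function name are all assumptions made for the example.

```python
from collections import defaultdict

ERROR_THRESHOLD = 3  # the "n" from the text; the value 3 is an arbitrary assumption


def alerts_for(log_lines, threshold=ERROR_THRESHOLD):
    """Yield one alert per host, at the moment its count of 500 responses reaches the threshold.

    Each line is assumed to look like "<host> <status>", e.g. "web01 500".
    """
    counts = defaultdict(int)
    for line in log_lines:
        host, status = line.split()
        if status != "500":
            continue  # only 500 errors count toward the threshold
        counts[host] += 1
        if counts[host] == threshold:
            yield f"ALERT: {host} returned 500 {threshold} times"
```

Because the alert fires only when the count equals the threshold, further 500s from the same host stay quiet, which is the point: one signal that the app server needs attention, not a flood of webserver errors.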


Preventive modeling

This process implies a previous analysis by the administrator, as you can understand. As usual, this is a tool, and it works according to your decisions. It can’t make its own decisions. Other more advanced tools could give inputs and advice, but not these basic tools.


Different services may have different logging configurations. At the same time, some services could be ignored and others overridden. Security level matters as well: some services are not so critical, so their logging detail can be lower, while others are security-sensitive and need a very high volume of logs. In Logwatch, these different behaviors are written in Perl.


Conversely, there are cases when we need not different behavior but instead a homogeneous output: the case of different IoT components, produced by different vendors and logging in various ways. For easier and more effective comprehension, customization of log writing and processing can “normalize” the logs written by any of them, which helps compare their behaviors.
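
As a rough illustration of what “normalizing” could mean, the Python sketch below maps two made-up vendor log formats onto one common record. Both formats, the vendor names, and the field names are hypothetical; real devices will differ, and in Logwatch this kind of logic would live in its Perl scripts.

```python
from datetime import datetime, timezone

# Map vendor-specific severity spellings onto one vocabulary (assumed aliases).
SEVERITY_ALIASES = {"warn": "warning", "err": "error"}


def _severity(raw):
    raw = raw.lower()
    return SEVERITY_ALIASES.get(raw, raw)


def normalize_vendor_a(line):
    """Hypothetical format: '<ISO timestamp> <device> <LEVEL> <message>'."""
    timestamp, device, level, message = line.split(" ", 3)
    return {"timestamp": timestamp, "device": device,
            "severity": _severity(level), "message": message}


def normalize_vendor_b(line):
    """Hypothetical format: '<device>|<epoch seconds>|<level>|<message>'."""
    device, epoch, level, message = line.split("|", 3)
    # Convert epoch seconds to the same ISO-style UTC timestamp vendor A uses.
    timestamp = datetime.fromtimestamp(int(epoch), tz=timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%SZ")
    return {"timestamp": timestamp, "device": device,
            "severity": _severity(level), "message": message}
```

Once every device’s lines come out with the same keys and the same severity vocabulary, comparing behavior across vendors becomes a matter of filtering one list of records.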

Syslog evolution

We can compare an advanced tool like Logwatch with more basic ones like syslog based mainly on the scripting and the ability to choose how to behave in different situations. Syslog only allows the administrator to filter the most representative lines for troubleshooting. Logwatch can build a particular structure of the line itself, and then filter it based on this shape, which makes it a friendlier helper for the advanced administrator. In this way, troubleshooting and monitoring will improve, as will analysis and the development of new models to apply to log writing.

May has arrived and so has some warmer weather. It feels good to be able to sit outside. I'm enjoying it while I can; we have only a few weeks of warm weather before the mosquitos arrive.


As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!


Twitter urges all users to change passwords after glitch

Just in case you didn’t hear about this yet: Twitter came forward to say that a “bug” had accidentally caused user passwords to be stored in clear text. I’d like more technical details on how this bug could happen. In the meantime, change your passwords. Or don’t and use this as an excuse to tweet weird things and then say you were hacked.


Home DNA Kits: What Do They Tell You?

Not much, even for the dog. Only one company identified the DNA as not human. Something to think about for those folks that are thinking of spending money on such services.


The Gambler Who Cracked the Horse-Racing Code

On the heels of the Kentucky Derby comes this riveting story of a man who built and has used a predictive model for horse racing to win $1 billion over the past 30 years. The next time someone says data analytics isn’t a real thing, show them this story.


Police Tested Facial Recognition at a Major Sporting Event. The Results Were Disastrous

Then again, maybe data analytics still has a way to go. Or maybe we should put some prize money behind facial recognition efforts. If someone knew they could make a billion dollars with the right predictive model, I’m certain we would have a solution by now.


Unroll.me to close to EU users saying it can’t comply with GDPR

If you really want people to know that you don’t care about their privacy, just tell the world you won’t comply with the GDPR. This single act tells me everything I need to know about whether or not I can trust Unroll.me with my data. (The answer is "no.")


Yes, it’s Bad. Robocalls, and Their Scams, are Surging.

Robocalls are increasing. People are mad. And nothing is being done to stop the volume of spam phone calls from increasing.


This Lego Breakfast Machine Can Make You Bacon And Eggs

Father’s Day is coming up. #JustSayin


Spring is finally here, and we all know what that means:


By Paul Parker, SolarWinds Federal & National Government Chief Technologist


It’s no secret that public cloud is becoming an increasingly popular option for enterprises to adopt as a means of storing data in a way that is easily accessible. To encourage the public sector in the UK to take the steps required to adopt public cloud, their government introduced the “Cloud First” policy in 2013; the policy states that when making technology decisions, all public sector organizations should consider using the public cloud before other options.


A recent Freedom of Information request that SolarWinds conducted found that, despite this policy, less than two thirds (61%) of central government departments have adopted public cloud in their organization. Of the departments with 25% or less public cloud usage, 65% attributed their lack of adoption to their legacy technology, half blamed security concerns, and 35% claimed that a lack of skills prevented them from using public cloud more.


The research also revealed that over a third (35%) of central government departments with low public cloud adoption are trying to monitor their on-premises technology and their cloud services with different monitoring tools, making it nearly impossible to manage the whole landscape accurately.


Lost in legacy tech


In recent years, the UK government has paid for on-premises solutions and there is little incentive to move away from this technology. Although this technology is fit for purpose now, departments will need to embrace the public cloud to ensure that they are able to maintain their services.


Full cloud adoption is unrealistic, but part of the problem with only adopting a hybrid cloud environment is that there can be added complexity. One method of easing the transition would be to strategically plan a smooth integration between cloud and legacy technology. Another would be to use specially designed monitoring tools that can manage across both environments and reduce the complexity.


No penalties means no need


One of the reasons that 39% of departments haven’t yet implemented public cloud over the last five years is that there are no consequences for not complying with the policy.


What needs to change?


One step that should be taken sooner rather than later is to implement incentives for adhering to the policy. Organizations across the public sector of the UK who proactively adopt public cloud usage could receive benefits, such as additional funding or special training, to reward their efforts. As an alternative, those who haven’t shown a demonstrated effort to consider cloud adoption could suffer from a loss of budget or resources.


The United States supports its cloud initiatives with a federal government certification called FedRAMP. This certification provides a common set of controls under which public cloud providers have been judged to be secure by the government and a Third-Party Assessment Organization (3PAO). This allows agencies and bureaus to leverage the cloud environment with an assurance of availability and security. With a similar initiative in the UK, the public sector may feel more confident about adopting public cloud, and therefore would potentially be able to realize many of the benefits that come from proper cloud adoption.


Find the full article on Open Access Government.

Building a monitoring and alerting system should always be driven by your business needs. There is always a tension here: the IT organization tends to focus on granular measures, whereas business users would like to see more of an end-to-end picture of the organization. An example of this would be uptime--as a DBA, if my database is available and servicing requests, I feel as though I’ve met my uptime goals, whatever they may be. However, if a load balancer goes down, taking away access to the application tier, the application is unavailable to users, and that is all that matters. Building a monitoring solution that looks at systems holistically is challenging, and sometimes requires working backwards from desired monitoring objectives (is the system up) to choosing indicators (is the database service available and writeable), and then building a target.


Defining Service Level Objectives


You want to focus on what your users care about, and not necessarily what is easy to measure. There are two main areas you will want to use to define these objectives--performance and uptime. One notion that comes from Google’s Site Reliability Engineering is the error budget--a rate at which these service level objectives can be missed. Having an error budget can also allow you to be more aggressive with upgrades and resolving technical debt. While evaluating projects and change control efforts, you know that if you are well ahead of your SLO budget you can be more aggressive with rollout. If you are behind the curve, you may curtail some migration efforts.
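
The arithmetic behind an error budget is simple enough to sketch. The Python below assumes an availability-style SLO expressed as a fraction (for example 0.999) and a 30-day measurement window; the function names and the window length are illustrative choices, not anything prescribed by the SRE material.

```python
def error_budget_minutes(slo_target, period_days=30):
    """Minutes of allowed unavailability in the period for an SLO target such as 0.999."""
    if not 0 < slo_target < 1:
        # A target of 1.0 leaves no budget at all, so reject it outright.
        raise ValueError("slo_target must be strictly between 0 and 1")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - slo_target)


def budget_remaining(slo_target, downtime_minutes, period_days=30):
    """Fraction of the error budget still unspent; negative means the SLO was missed."""
    budget = error_budget_minutes(slo_target, period_days)
    return (budget - downtime_minutes) / budget
```

With a 99.9% target over 30 days the budget works out to roughly 43 minutes. A team that has spent only a quarter of that can afford a riskier rollout; a team that has gone negative should probably hold back on migrations.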


Target Values for SLOs


Target values will be a negotiation between IT and the business. From an IT perspective it is important to not overpromise--for example, if you only have one physical server in your stack, you probably aren’t going to reach 99.99% uptime. This is important for a few reasons, but in my opinion the biggest is helping the business users understand the correlation between resource cost and availability. In the one-server example above, if the business wants that application to deliver 99.99% uptime, it is going to have to invest in redundancy at several levels. There are a few other tenets to think about:


  • Past performance isn’t a predictor of future performance--While building a performance target off of your historic baseline is a good start, it does not address the problem of a system that performs well at its current level, but that will fall off a cliff without a major reengineering effort.
  • Don’t Overthink Your Targets--While it may be tempting to bring in someone from the data science team to create your new targets using a complex machine learning K-means clustering algorithm, you are better off creating simple targets like percentage uptime and throughput. If you can’t explain your target in a sentence it is likely too complex.
  • Absolutes are bad--The notion of a system that is always available and can scale infinitely is completely unrealistic. Even hyperscale cloud providers have difficulties delivering 99.999% uptime. It’s better to promise what you can deliver and make the business understand what the cost of delivering more is.
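
To help business users see the link between resource cost and availability, a back-of-the-envelope model can be useful. The Python sketch below uses the standard formula for redundant components, 1 - (1 - a)^n, and it assumes that failures are independent and that any single surviving copy can carry the load. Both are simplifications, and the function names are only illustrative.

```python
def combined_availability(single, copies):
    """Availability of `copies` redundant components, assuming independent failures."""
    return 1 - (1 - single) ** copies


def copies_needed(single, target):
    """Smallest number of redundant copies whose combined availability meets the target."""
    if not (0 < single < 1 and 0 < target < 1):
        raise ValueError("availabilities must be strictly between 0 and 1")
    copies = 1
    while combined_availability(single, copies) < target:
        copies += 1
    return copies
```

Under these assumptions, two independent servers that are each 99% available work out to about 99.99% together. That makes the cost conversation concrete: each extra “nine” is paid for in additional hardware, and the same reasoning has to be repeated for every other layer (load balancers, storage, network) that can take the application down.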


This process allows you to set clear expectations with your business and reduces some of the finger pointing during outages. It does require a strong relationship between IT management and senior leadership of your organization, but in the end delivers IT that can be kept up to date while meeting the business needs of the organization.



