Geek Speak

June 2018

In my last post, I talked about microservices and how well-designed interaction between them can deliver a quality application.

 

Now, I want to move up a layer: from interaction between microservices to interaction between applications. I would call it architecture applied to applications, where each application's behavior is tightly connected to, and influences, the others. As with microservices, the goal is a better user experience.

 

Architecting a Good Product

The architecture is built to scale and to manage all of the individual pieces as a whole. Every application should play like an instrument in an orchestra. The applications need to follow rules, strategies, and logical and technical design. If one of them doesn't play as expected, the whole concert becomes a low-quality performance. If the direction is improvised, the applications won't play in harmony. The entire user experience relies on the whole architecture.

 

A good design describes all of the interactions between applications in detail; that is the minimum requirement for an acceptable user experience. This user experience (UX) design is the goal of the final product.

 

Different Users, Different Designs

Happiness is extremely subjective. Consequently, the UX is also very subjective. The design should consider how different users react based on age, skills, expectations, and so on. There may be several different designs, with different interactions and even different applications involved. For each type of user, the product should be easily usable in a reasonable time. Analyzing user behavior helps here. The best way to accomplish this is to offer the final tool to a limited number of users as a beta and collect their feedback to build a model of the interactions between the components in the UX.

 

Who Wants to be a Pilot?

Complexity of a GUI doesn't necessarily mean a cryptic interface. To drive a sports car, you need a good understanding of the theory behind driving and practical experience. But sports cars aren't built just for car enthusiasts, so they need a sophisticated electronic layer to manage all the critical behavior and correct the mistakes that a "normal" driver could make. This complex environment should be simple and intuitive for the user, to help them focus on driving. So, with a glance, the driver should be able to use the navigation system, enable the cruise control, activate fog lights, and so on. All of these components are the applications in a UX architecture.

 

Is it Nice? Not Enough

Aesthetics can't replace usability. You can spend all of your development time making the best icons and the nicest possible animations. However, if the UX is bad, the whole system won't succeed. The opposite is also true: if the applications are well built and their interaction makes the end user happy, even a less-polished look is acceptable, and the product can still be successful.

 

Standardize It

The Interaction Design Association (IxDA, https://ixda.org/) focuses on the design of interactions between users and services, as well as interactions inside services and between applications. IxDA offers guidelines to follow so that a product can be standardized. For users, this means a product built to those standards is easier to use, because similar products behave in the same way.

 

And again – feedback, feedback, feedback! Not only in beta testing, but also when the UX is generally available. The more info you get from the users, the better you can tune applications’ interactions, and the better the overall user experience.

I hope everyone had an enjoyable Father’s Day this past weekend. For me, Father’s Day is the official start to summer. It’s also an excuse to grill as much meat as legally possible. By the time you read this, I will be on my way to Germany, to eat my weight in German grilled meat at SQL Grillen.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Apple Update Will Hamper Police Device Crackers

This update closes a loophole exploited by companies such as Grayshift that specialize in helping police departments crack open locked phones. But by announcing the upcoming patch, Apple has given Grayshift plenty of notice to come up with alternatives. Look for this dance to continue for some time.

 

GPAs don’t really show what students learned. Here’s why.

This post was written by someone that took one class in statistics and had a 2.8 GPA. All kidding aside, I do like the idea of modifying, or eliminating, the use of GPA as a measuring stick.

 

Inside Amazon’s $3.5 Million Competition to Make Alexa Chat Like a Human

A 20-minute conversation with a robot sounds amazing and awful at the same time.

 

Blockchain explained

A visual representation to help you understand that Blockchain is a linked list with horrible latency and performance.

 

Unbreakable smart lock devastated to discover screwdrivers exist

I have no words.

 

The Death of Supply Chain Management

You can swap out the subject of this article for almost any other and the theme is the same: humans of today need to be prepared for the machines of tomorrow that will be taking their jobs.

 

Ancient Earth globe shows where you were located 750 million years ago

Because I’m a geek who loves things like this and I think you should, too.

 

TFW you go to get a cup of coffee and find your Jeep twin:

 

No matter how much automation, redundancy, and protection you build into your systems, things are always going to break. It might be a change breaking an API to another system. It might be a change in a metric. Perhaps you just experienced a massive hardware failure. Many IT organizations have traditionally had a postmortem, or root cause analysis, process to try to improve the overall quality of their processes. The major problem with most postmortem processes is that they devolve into circular finger-pointing matches. The database team blames the storage team, who in turn blames the network team, and everyone walks out of the meeting angry.

 

As I'm writing this article, I'm working on a system where someone restarted a database server in the middle of a large operation, causing database corruption. This is a classic example of an event that might trigger a postmortem. In this scenario, we moved to new hardware and no one tested the restore times of the largest databases. This is currently problematic, as the database restore is still running a few hours after I started this article. Other scenarios would be any situation where you have unexpected data loss, on-call pages, or a monitoring failure that didn't capture a major system fault.

 

How can we do a better postmortem? The first step is to make postmortems blameless. This process assumes that everyone involved in an incident had good intentions and acted on the best information available. The technique originates in medicine and aviation, where human lives are at stake. Instead of assigning blame to any one person or team, the situation is analyzed with an eye toward figuring out what happened. Writing a blameless postmortem can be hard, but the outcome is more openness in your organization. You don't want engineers trying to hide outages to avoid an ugly, blame-filled process.

 

Some common talking points for your postmortems include (a minimal record sketch follows the list):

 

  • Was enough data collected to determine the root cause of the incident?
  • Would more monitoring data help with the process analysis?
  • Is the impact of the incident clearly defined?
  • Was the outcome shared with stakeholders?
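
To make those questions harder to skip, it can help to capture every postmortem in the same structure. Here is a minimal sketch in Python; the field names are illustrative, not a prescribed format:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Postmortem:
        """Minimal blameless postmortem record; field names are illustrative."""
        title: str
        impact: str                     # who and what was affected, and for how long
        timeline: List[str]             # timestamped events from detection to resolution
        root_causes: List[str]          # contributing factors, never names of people
        monitoring_gaps: List[str]      # data we wish had been collected
        action_items: List[str]         # follow-ups, each with an owner and due date
        shared_with: List[str] = field(default_factory=list)  # stakeholders who received it

    def ready_to_share(pm: Postmortem) -> bool:
        """A postmortem is only worth sharing once impact, causes, and follow-ups are filled in."""
        return bool(pm.impact and pm.root_causes and pm.action_items)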

 

In the past, many organizations did not share a postmortem outside of the core engineering team. This is a process that has changed in recent years. Many organizations like Microsoft and Amazon, because of the nature of their hosting businesses, have made postmortems public. By sharing with the widest possible audience, especially in your IT organization, you can garner more comments and deeper insights into a given problem.

 

One scenario referenced in Site Reliability Engineering by Google is the notion of integrating postmortems into disaster recovery activities. By incorporating these real-world failures, you make your disaster recovery testing as real as possible.

 

If your organization isn't currently conducting postmortems, or only conducts them for major outages, consider introducing them more frequently for smaller problems. As mentioned above, starting with paged incidents is a good way to begin. It gets you thinking about how to automate responses to common problems and helps ensure the process can be followed correctly, so that when a major issue occurs, you're focused not on how to conduct the postmortem, but on finding the real root cause of the problem.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Cloud and hybrid IT will be one of the top five most important technologies for U.K. public sector IT professionals in the next three to five years. This is the view held by 88% of the industry surveyed in the SolarWinds IT Trends Report 2018.

 

Further insights from the report uncover that, even with challenges implementing cloud services, the sector remains positive about the opportunities the cloud presents, both today and in the near future.

 

Despite cloud and hybrid IT representing one of the top challenges in rollout and performance (with 62% of respondents ranking it in the top three challenges), the public sector continues to see the benefits of the cloud, primarily for creating efficiencies (72%*), ROI (81%*), and productivity (79%*).

 

Barriers to success

 

At the same time, over half (58%) of public sector respondents say their IT systems are not performing at optimum levels, and a further quarter (23%) are not sure. Three-quarters (73%) of public sector IT professionals spend more than 25% of their time reactively working to optimize performance, while just 47% spend a similar amount of time proactively optimizing. When asked what was causing barriers to performance, nearly half (44%) cite inadequate infrastructure, while a staggering 56% cite a lack of organizational strategy.

 

“With these results, we see continued public sector commitment to the cloud and hybrid IT,” commented Paul Parker, chief technologist of federal and national government, SolarWinds. “However, what is most striking for me is the amount of time IT professionals spend retroactively fixing their systems. I would liken it to trying to fix a car as you drive it down the motorway. Trying to maintain systems while they are still in use is near-impossible, and ineffective in the long run. Without having the time, strategy, and budget to get core systems working first, it is no surprise that public sector entities are taking longer to adopt and benefit from the Government’s Cloud First policy than perhaps was originally hoped.”

 

The promise of emerging technologies

 

Alongside their focus on the cloud, public sector respondents also voice interest in emerging technologies. While automation and AI prove more popular among for-profit organizations than the public sector, there is still strong interest in embracing these advances. Over half (56%) rate automation as a top three technology with the greatest potential of improving productivity and efficiency, and a third (33%) see the same potential in AI.

 

When it comes to Software-defined Everything (SDx) and big data analytics, the public sector is more optimistic than its for-profit counterparts; over half (56%) of public sector respondents think SDx is one of the top three technologies in terms of ROI potential, while a similar percentage (51%) rank big data analytics among the top three technologies with the greatest potential to provide productivity and efficiency benefits. This is compared to equivalent responses from for-profit organizations, where just 17% rate SDx and 34% rate big data analytics similarly.

 

Further findings from the research are as follows:

 

  • 77% of public sector respondents think the cloud is the most important technology in their IT strategy today, compared to 62% of U.K. for-profit organizations
  • 72% of public sector respondents consider the cloud to be one of the top three technologies for creating and increasing efficiencies, and 79% consider it among the top three in terms of productivity benefits
  • In the next 3-5 years, the public sector expects AI to be one of the top five most important technologies in their strategy, according to 51% of respondents
  • 81% rate cloud and hybrid IT as one of the top three technologies in terms of ROI potential, compared to just 60% of for-profit organizations

 

*percentage of U.K. public sector respondents who rated the cloud as one of the top three technologies with the greatest potential in these areas

 

Find the full article on Open Access Government.

Two weeks ago, I talked about upgrading storage systems. More specifically, how to determine the right upgrade interval for your storage systems. What the previous post did not cover is the fact that your storage system is part of a larger ecosystem. It attaches to a SAN and LAN, and is accessed by clients. A number of technologies such as backup, snapshots, and replication protect data. Monitoring systems ensure you are aware of what is going on with all the individual components. This list can go on and on…

 

Each component in that ecosystem will receive periodic updates. Some components are relatively static: a syslog server may receive a few patches, but it is unlikely that these patches fundamentally change how the syslog server works. Moreover, in case of a bad update, it is relatively easy to roll back to a previous version. As a result, it is easy to keep a syslog server up to date.

 

Other components change more frequently or have a larger feature change between patches. For example, hyper-converged infrastructure, which is still a growing market, receives many new features to make it more attractive to a wider audience. It is more of a gamble to upgrade these systems: new features might break old functions that your peripheral systems rely on.

 

Finally, do not forget the systems that sit like a spider in the web, such as hypervisors. They run on hardware that needs to be on a compatibility list. Backup software talks to them, using snapshots or features like Changed Block Tracking to create backups and restore data. Automation tools talk to them to create new VMs. Plus, the guest OS in a VM receives VM hardware and tools upgrades. These systems are again more difficult to upgrade, simply because so many aspects of the software are exposed to other components of the ecosystem.

 

So how can you keep this ecosystem healthy, without too many “Uh-oh, I broke something!” moments with the whole IT stack collapsing like a game of Jenga?

 

Reading, testing, and building blocks

Again, read! Release notes, compatibility lists, advisories, etc. Do not just look for changes in the product itself, but also for changes to APIs or peripheral components. A logical drawing of your infrastructure helps: visualize which systems talk with other systems.

 

Next is testing. A vendor tests upgrade paths and compatibility, but no environment is like your own. Test diligently. If you cannot afford a test environment, then at the very least test your upgrades on a production system of lesser importance. After the upgrade, test again: does your backup still run? No errors? We had a “no upgrades on Friday afternoon” policy at one customer: it avoids having to pull a weekender to fix issues, or missing three backups because nobody noticed something was broken.

 

As soon as you find the ideal combination of software and hardware versions, create a building block out of it. TOGAF can help: it is a lightweight and adaptable framework for IT architecture. You can tailor it to your own specific needs and capabilities. Moreover, you do not need to do “all of it” before you can reap the benefits: you can pick the components you like.

 

Let us assume you want to run an IaaS platform. It consists of many systems: storage, SAN, servers, hypervisor, LAN, etc. You have read the HCLs and done the required testing, so you are certain that a combination of products works for you. Whatever keeps your VMs running! This is your solution building block.

 

Some components in the solution building block may need careful specification. For example, Cisco UCS firmware 3.2(2d) with VMware ESXi 6.5U1 needs FNIC driver Y. Others are more loosely specified: syslogd, any version.

 

Next, track the life cycle of these building blocks, starting with the building block that you're currently running in production: the active standard. Think ESXi 6.5U1 with UCS blades on firmware 3.2(2d), on a VMAX with SRDF/Metro for replication and Veeam for backup and recovery. Again, specify versions or version ranges where required.

 

You might also be testing a new building block: the proposed or provisional standard. That could be with newer versions of software (like vSphere 6.7) or different components. It could even be completely different and use HCI infrastructure such as VXRail.

 

Finally, there are the old building blocks, either phasing-out or retired. The difference between these versions is the amount of effort you will put in upgrading or removing them from your landscape. A building block with ESXi 5.5 could be “phasing-out” in late 2017, which means you will not deploy new instances of it, but you also do not actively retire it. Now though, with the EOL of ESXi 5.5 around the corner, that building block should transition to retired. You need to remove it from your environment because it is an impending support risk.
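
If it helps to picture it, a building block and its lifecycle state can be captured in something as small as the sketch below. The structure is an assumption for illustration; the component names and versions are simply the ones mentioned above:

    from dataclasses import dataclass
    from enum import Enum
    from typing import Dict

    class Lifecycle(Enum):
        PROPOSED = "proposed"          # still being tested
        ACTIVE = "active"              # the current production standard
        PHASING_OUT = "phasing-out"    # no new deployments, not yet actively removed
        RETIRED = "retired"            # must be removed from the environment

    @dataclass
    class BuildingBlock:
        name: str
        components: Dict[str, str]     # component -> version (or version range)
        state: Lifecycle

    # Roughly the "active standard" described above; "any" marks a loosely specified component.
    active_standard = BuildingBlock(
        name="IaaS platform",
        components={
            "Cisco UCS firmware": "3.2(2d)",
            "VMware ESXi": "6.5U1",
            "Replication": "VMAX SRDF/Metro",
            "Backup": "Veeam",
            "syslogd": "any",
        },
        state=Lifecycle.ACTIVE,
    )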

 

By doing the necessary legwork before you upgrade, and by documenting the software and hardware versions you use, upgrades should become less of a game of Jenga where one small upgrade brings down the entire stack.

I attended the 2018 Americas' SAP Users' Group (ASUG) SAPPHIRE NOW conference in Orlando this past week. This was the largest conference in its 25+ year history, with over 21,000 SAP customers attending. Throw in SAP employees, vendors, exhibitors, and so forth, and the overall attendance exceeded 30,000. What an amazing time. I wrote about my experiences during last year's conference, "When Hasso Plattner Speaks, Everyone Listens," and I kept the momentum going this year. The theme of this year's conference, and this applies to all business innovation and not just SAP customers, is the "Customer Revolution."

 

The evolution of ecommerce and online retail that we have all been witnessing over the past 20 years has driven this customer revolution. Traditional customer buying trends have been uprooted and cast aside, and brand loyalty hardly exists anymore. Customers have at their fingertips the power to alter entire industries and to start and stop trends in days, if not hours. Much of this change is evident when comparing the top 20 businesses of 10 years ago with today's. Google, Amazon, and many others are near the top, when 10 years ago they were barely in the top 50. So how does this apply to us IT professionals? I will forgo the typical "IT Is Always Changing Like No Other Time Before, And So Should You!" article (I've been reading those articles for 25 years). Instead, what dawned on me during this conference, while listening to so many amazing keynote speakers (SAP CEO Bill McDermott, Olympian Lindsey Vonn, Jon Bon Jovi, Condoleezza Rice, President Barack Obama, and too many more to list), is IT's role in capitalizing on this revolution.

 

For one of my presentations during this conference, I spoke about how a specific SAP report, designed to identify innovation and business transformation opportunities within your SAP ERP landscape, can be used as a conversation starter with your business stakeholders on their participation in upcoming key IT initiatives and projects. I've given this presentation several times already this year, and my audience has been predominantly IT. After my last presentation, I was approached by someone in Finance from a popular global shoe brand. Pym attended my presentation looking for inspiration on how to engage his company's IT department and get their participation in Finance's initiatives and projects. It appears that Pym's IT department is reluctant to listen. Honestly, I was caught off guard by this juxtaposition, and I found myself with little in the way of advice to offer Pym. So this had me thinking… what is IT's role?

 

There are many tangible factors in this evolved IT landscape: virtualization, cloud, mobility, agile, etc. But these are all really just logical conclusions of trying to meet the demands of an insatiable customer base, whether internal or external. No matter how much faster, smarter, and more predictive IT products and services become, the customer will always expect more from them. This journey to IT nirvana never ends. Reaching the summit of one complex IT project is only followed by the prospect of another, and another after that. I've been in IT in various capacities for 25 years. There are two absolutes that I have come to realize: IT will always have more work than it can handle, and the customer's love affair with technology is tumultuous at best.

 

So back to Pym. I am sure his company's IT department is really busy and resource constrained. Honestly, what department isn't? So I asked Pym, "Does your IT view themselves as a customer-focused group?" Pym laughed. "No. But not for lack of working hard. They are busy doing their things and working within their processes," Pym replied. I followed up with questions on the company's culture, IT's mission as defined by the CIO, and his opinion of how well IT communicates. Pym's responses were typical of a customer who is on the outside and disconnected. Pym's perception was that his IT was failing to meet the demands of its customers. How does this change? "Change comes from the top!" is the standard response. But that type of culture shift takes valuable time if the staff is waiting for the CIO to drive it. Each savvy, career-minded IT professional needs to recognize the Customer Revolution and the importance of being ahead of it.

The IT realm serves as the conduit for business transformation. Technology fuels automation and the analysis of huge volumes of data to identify opportunities, which leads to unimaginable improvements in productivity (take a quick pause and reminisce about how you were working, and living, 22 years ago) and positive outcomes that continue to move our planet forward. Your role as that savvy, career-minded professional is to take your company's mission statement to heart. Know your customers and be empathetic to their situation. Be creative in your solutions to solving their problems and removing their roadblocks to success. If you find yourself working in an IT department that is "…too busy doing their things and working within their processes," then I hate to tell you this: that department is not customer focused and not aligned with the company's goals. Be a positive voice for change and a customer advocate in your department. Improve your critical thinking skills and your emotional IQ. As my former high school principal often said during his morning announcements, "If it is to be, it is up to me!" Take ownership of issues, turn them into opportunities, and work for your customers.

 

I realize that this is a lot to ask of you without appearing to ask anything of your customers. Take comfort in the fact that you will be rewarded by knowing your customers while being an active participant in this customer revolution. I will end with some inspiring quotes from SAP CEO Bill McDermott: "Innovate your next move! Change has never moved this fast and it will never move this slowly again." And perhaps my favorite, "We cannot let anxiety detract from opportunity. The new wave of growth in this economy will be at the intersection of the speed of the machines and the judgement of the human." Now get out there and know your customers. Allow their success to be your success. Viva la Revolución!

Welcome to another edition of The Actuator! This week we are talking about drones, data centers, and why no one cares about your emails.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

Why Microsoft Wants to Put Data Centers at the Bottom of the Ocean

No word yet on how to do maintenance for these units, but I suspect they would simply raise them out of the water. It seems crazy now, but in 30 years this could be the norm.

 

Yahoo Messenger Will Be Discontinued on July 17, 2018, After 20 Years of Service

I didn’t even realize this was still a thing. Once upon a time, Yahoo Messenger was a corporate standard. Too bad Yahoo didn’t invest in the platform a bit more; they could have invented Slack a decade early.

 

The damage from Atlanta’s huge cyberattack is even worse than the city first thought

NARRATOR’S VOICE: “It was even worse than they second thought, too.”

 

Drones are now being trained to spot violent people in crowds

As long as the drones aren’t collecting data about the individuals, I have no problem with using machines to detect possible violence in crowds. It’s no different than police using CCTV to monitor a crowd. If it helps keep people safe, sign me up.

 

The Race to Build Autonomous Delivery Robots Rolls On

Now let’s imagine that same drone that spots violent acts in crowds can deliver pizza and beer to your seat. Welcome to the future of multi-tasking.

 

Genealogy database used to identify suspect in 1987 homicide

The suspect didn't give his DNA to the genealogy website, but relatives did. I am both impressed and horrified by this detective work. DNA is not 100% accurate; it is just another piece of circumstantial evidence.

 

The Most Extreme Out-of-Office Message

As a person who dislikes email, and abhors OOO messages, this article hammers home all the reasons that electronic messaging is destroying lives each and every day.

 

The future for DBAs is closer than you think:

Malware prevention is a very hot topic due to the recent ransomware attacks that have completely crippled several companies and organizations. For most smaller companies, being able to hire a full-time security engineer is a pipe dream at best, and even larger companies just don't see the need to spend money on a dedicated security resource. The need for malware prevention is undeniable given the number of threats that are out there on the internet and the incentive that hackers have to continue to create more and more malware. Malware prevention and endpoint protection software at an enterprise level have traditionally been cumbersome and unwieldy to manage and maintain.

 

The benefits of managed malware prevention are:

  1. Simplified Deployment - Malware prevention management solutions are notorious for being a pain to install and get started. This becomes less desirable based on the fact that malware prevention is a true business value add in the eyes of most on the business side of a company.
  2. Simplified Scalability - One of the challenges with managing malware prevention or endpoint protection servers is the storage for packages and maintaining the database. Leveraging the SaaS offering can offload that work and allow management of 10,000 hosts to be similar to that of a back-end environment of 10 hosts. Not having to worry about re-architecting the deployment when reaching a certain number of nodes is a major win for operational efficiency.
  3. Faster Access to Technological Advancements - Malware prevention is a constant game of playing catch-up to prevent the latest form of malware that is smarter and more advanced than the last one. SaaS offerings enable administrators faster access to advances that security companies create. An example is a more advanced threat detection engine that utilizes machine learning but would require an upgrade for non-SaaS implementations.

 

Managed Deployment

The following solutions are managed deployments. This means the malware prevention management software company has added a deployment solution to the respective cloud provider's marketplace to allow the infrastructure to be provisioned with the click of a button.

 

     Palo Alto Networks VM-Series Next-Generation Firewall Bundle

     Palo Alto Networks VM-Series Next-Generation Firewall is a virtual instance that is deployed as a traditional perimeter firewall, just like a physical Palo Alto firewall, and includes the ability to detect and prevent malware at the network level. The firewall is available in both the AWS and Azure marketplaces.

 

     Fortinet FortiGate Next-Generation Firewall

     Fortinet FortiGate Next-Generation Firewall Bundle is a virtual instance that is typically deployed as a traditional perimeter firewall, but the bundle also includes the ability to detect and prevent malware at the network level. The firewall is also available in both the AWS and Azure marketplaces.

 

SaaS Deployment

The following solutions are Software as a Service (SaaS) deployments. This means the malware prevention management software company hosts the software for its customers.

 

     Trend Micro Deep Security as a Service

     Trend Micro Deep Security is an agent-based Software as a Service (SaaS) solution that supports dynamic inventory validation for AWS EC2 instance workloads and can be used to secure other native AWS services like WAF and Inspector.

 

     Symantec Cloud Workload Protection

     Symantec Cloud Workload Protection is an agent-based Software as a Service (SaaS) solution that supports instances as well as containers. Cloud Workload Protection supports dynamic inventory/discovery for the three major public clouds (AWS, GCP, Azure).

 

     Symantec Cloud Workload Protection for Storage

     Symantec Cloud Workload Protection for Storage is a Software as a Service (SaaS) based deployment that is used to scan AWS S3 buckets for malicious objects. This solution integrates with Symantec Protection Engine to provide a single management interface for both malware protection of AWS EC2 instances as well as AWS S3 buckets.

 

     Arc4dia SNOW

     Arc4dia SNOW is a lightweight endpoint detection/response sensor that feeds data into the SNOW Cloud where advanced anomaly detection and deep analysis are performed on the data gathered from the sensors.

 

Malware prevention, much like many of the other topics covered in this Cloud Native Operational Solutions series, is an area of IT that very often goes overlooked and uncared for until a major crisis brings the organization to its knees due to a system outage.

Disaster recovery can be a complicated beast to tame. If you're attempting to recover your IT infrastructure from some kind of disaster, you're already faced with a number of challenges. First, there's been an event serious enough to warrant disaster recovery. Whether it's a flood, a plague of locusts, or someone cutting the wrong fibre cables, things aren't going swimmingly. As a result, people are going to be a little stressed out. Invariably, there will be a lot to contend with, not just from an infrastructure perspective, but also from a people and process point of view. If there's been a disaster, people will be worried about their families, and possibly their homes. Some staff may not be able to make it to the recovery site (if you have one) to help with the recovery.

 

You’ll also likely be faced with the reality that your data centre is a complicated environment with a lot of moving parts. Some of that technical debt that you hoped wouldn’t be a problem any time soon has come back to bite you. And when was the last time you tested your recovery process? Did you failover a few non-production workloads during the day and run a few ping tests? That’s probably not enough to see how things really behave in a disaster.

 

I’m a big fan of trying to keep things simple. But data centre operations aren’t always a simple thing. And the recovery of a data centre is usually less so. You can help yourself by leveraging event monitoring tools to understand your progress at any given time, and how far you've still got to go. It seems odd that something as simple as syslog could be useful in the event of a disaster. But keep in mind you have a whole lot of moving pieces, slightly stressed out staff members, and various upset business units to deal with. A tool like syslog can provide insights into what has happened previously as well as what point you’re up to in the recovery process. It’s one thing to follow a checklist or run sheet during a recovery. It’s another thing to be able to validate your progress during the recovery by looking at actual log files generated by hosts and applications coming back online. In my opinion, this is why leveraging tools such as syslog and SNMP is so critical to achieving a level of sanity when managing and operating data centre infrastructure.
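
As a rough illustration of that idea, a few lines of script can turn a syslog file into a recovery progress check. The host names, log path, and message pattern below are assumptions, not part of any real run sheet:

    import re

    # Hosts we expect to see come back online, roughly in run-sheet order (assumed names).
    EXPECTED_HOSTS = {"db01", "app01", "web01"}
    BOOT_PATTERN = re.compile(r"^\S+\s+\S+\s+\S+\s+(?P<host>\S+)\s+.*(boot|started)", re.I)

    def recovery_progress(syslog_path):
        """Return the set of expected hosts that have logged a boot/start message."""
        seen = set()
        with open(syslog_path) as log:
            for line in log:
                match = BOOT_PATTERN.search(line)
                if match and match.group("host") in EXPECTED_HOSTS:
                    seen.add(match.group("host"))
        return seen

    recovered = recovery_progress("/var/log/syslog")   # path is an assumption
    print("Recovered:", sorted(recovered))
    print("Still waiting on:", sorted(EXPECTED_HOSTS - recovered))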

 

Beyond validation, tools like these give you a way to prove to concerned business units, other infrastructure staff, and (potentially) vendors what happened with the infrastructure. This is particularly useful when a recovery activity has gone awry and people are left scratching their heads as to why that’s the case. Grabbing the current bundle of logs after the machine has come up is one thing, but if you can go back through the events of the last few hours, there may be some additional insights that can be had.

 

Disaster recovery is no fun at the best of times, and people rightfully try to avoid it if they can. But you can make things at least a little easier on yourself and your business by investing time and effort in tools that can give you the right information when you need it most. Sure, it’s a bad thing that your primary data centre is under 3 feet of water now, but at least you’ll have some clarity about what happened when your applications came up for air in your secondary data centre.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Every federal IT pro understands the importance of network monitoring, systems management, database performance monitoring, and other essential functions. The IT infrastructure must be working optimally to ensure overall performance.

 

What about application performance?

 

The reality is, even with a lightning-fast infrastructure, if application performance is poor, then end-users will have a poor experience. Proper application performance management (APM) is vital for identifying application performance issues and ensuring that applications maintain an expected level of service.

 

Let’s discuss the five most important elements of application performance management.

 

The Five Elements

 

End-User Experience Monitoring

 

This should be the primary focus of a federal IT pro’s APM efforts. End-user experience monitoring tools gather information on the user’s interaction with the application and help identify any problems that are having a negative impact on the end-user’s experience.

 

As government has embraced the cloud, it’s important to find a tool that can monitor both on-premises and hosted applications. It’s also helpful to consider a tool that allows for instant changes to network links or external servers if either are compromising the end-user experience.

 

Runtime Application Architecture Discovery

 

This piece of APM looks at the hardware and software components involved in application execution to help pinpoint problems and establish their scope.

 

With the complexity of today’s networks, discovering and displaying all the components that contribute to application performance is a hefty task. As such, it is important to choose a monitoring tool that provides real-time insight into the application delivery infrastructure. The best tools will also visualize this application architecture on the same console that provides insight to the end-user experience.

 

User-Defined Transaction Profiling

 

Understanding user-defined transactions as they traverse the architecture will help ensure two things. First, it will allow federal IT pros to trace events as they occur across the various components. Second, it will provide an understanding of where and when events are occurring, and whether they are occurring as efficiently as possible.

 

Component Deep-Dive Monitoring

 

This step provides an in-depth understanding of the components and pathways discovered in previous steps. In a nutshell, the federal IT pro conducts in-depth monitoring of the resources used by, and events occurring within, the application performance infrastructure.

 

Analytics

 

APM analytics tools allow federal IT pros to:

 

• Set a performance baseline that provides an understanding of current and historical performance, and set expectations for a "normal" application workload (a minimal baseline sketch follows this list)

• Quickly identify, pinpoint, and eliminate application performance issues based on historical baseline data

• Anticipate and mitigate potential future issues through actionable patterns

• Identify areas for improvement by mapping infrastructure changes to performance changes
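
As a minimal sketch of the baseline idea mentioned in the first bullet (the sample values are invented, and a real APM tool does far more than this), a baseline can be as simple as a mean and standard deviation over historical measurements, with anomalies flagged when a new value falls well outside that range:

    import statistics

    def baseline(samples):
        """Baseline from historical response times: mean and standard deviation."""
        return statistics.mean(samples), statistics.pstdev(samples)

    def is_anomalous(value, mean, stdev, threshold=3.0):
        """Flag a measurement more than `threshold` standard deviations above the baseline."""
        return stdev > 0 and (value - mean) / stdev > threshold

    # Hypothetical historical response times in milliseconds, then two new measurements.
    history = [120, 132, 118, 125, 140, 129, 131, 122]
    mean, stdev = baseline(history)
    print(is_anomalous(210, mean, stdev))   # True: well outside the "normal" workload
    print(is_anomalous(135, mean, stdev))   # False: within the expected range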

 

Conclusion

 

When choosing APM tools, consider the agency’s current technical environment. A lag in performance necessitates a prompt response. Be sure your APM tools provide continuous monitoring as well as real-time, transaction-oriented information.

 

It is equally important to choose a set of APM tools that integrate with one another. Having visibility across all pieces of the application environment is critical to having a complete understanding of application performance and helping ensure application optimization.

 

Find the full article on our partner DLT’s blog Technically Speaking.

Leon Adato

Footloose at CLUS

Posted by Leon Adato, Jun 8, 2018

CiscoLive! US ("CLUS") is literally right around the corner, set to open in sunny Orlando in just a couple of days. So it's time for me to run down the things I'm hoping to see and do while I'm hanging with 24,000+ of my closest friends and associates!

 

First, based on the recent enhancements to Network Insight in NPM and NCM, I've got a solid reason to dive deep into Nexus technology and see what treasures are there for me to find. As a monitoring engineer, I find that I often approach new technology "backward" that way--I'm interested in learning more about it once I have the capability to see inside. So now that the world of VDCs, vPCs, PACLs (port-based ACLs), VACLs (VLAN-based ACLs), etc. are open to me, I want to know more about it.

 

And that takes me to the second point. I'm really interested to see the reaction of attendees when we talk about some of the new aspects of our flagship products. The scalability improvements will definitely satisfy folks who have come to our booth year after year talking about their super-sized environments. If folks aren't impressed with the Orion Mapping feature, I think I'll check for a pulse. Orion Service Manager is one of those hidden gems that answers the question "who's monitoring my monitoring?" And by the end of the show, Kevin and I will either have the "Log" song fully harmonized, or our co-workers will have us locked in a closet with duct tape over our mouths. This, of course, is in honor of the new log monitoring tool, Log Manager.

 

Something that has become more and more evident, especially with the rise of Cisco DevNet, is the "intersectionality" of monitoring professionals. Once upon a time, we'd go to CiscoLive and talk to folks who cared about monitoring and cared about networks (but didn't care so much about servers, applications, databases, storage, etc.). We'd go to other conventions, such as Microsoft Ignite, and talk to folks who cared about monitoring and cared about applications/servers (but didn't care as much about networks, etc.). Now, however, the overlap has grown. We talk about virtualization at SQL Saturdays. We discuss networking at Microsoft Ignite. And we talk about application tracing at CiscoLive. Or at least, we've started to. So one of the things I'm curious about is how this trend will continue.

 

Another theory I want to test is the pervasiveness of SDN. I'm seeing more of it "in the wild" and while I believe I understand what's contributing, I'm going to hold that card close to my chest just now until after CiscoLive 2018 is over. We'll see if my theory tests out as true.

 

Believe it or not, I'm excited to talk to as many of the 24,000 attendees as I can. As I wrote recently, meeting people and collecting stories is one of the real privileges of being a Head Geek, and I'm looking forward to finding so many people and stories in one place.

 

On the other side of the convention aisle, I'm also looking forward to hanging out with all my SolarWinds colleagues in an environment where we're not all running from meeting to meeting and trying to catch up during lunch or coffee breaks. Sure, we'll all be talking to folks (if past years are any indication, more or less non-stop). But in those quiet moments before the expo floor opens or when everyone has run off to attend classes, we'll all have a chance to re-sync the way that can only be done at conventions like this.

 

Speaking of catching up, there's going to be a SWUG again, and that means I'll get to meet up with SolarWinds users who are local to the area as well as those who traveled in for the convention. SWUGs have become a fertile ground for deep conversations about monitoring, both the challenges and the triumphs. I'm looking forward to hearing about both.

 

And then there's the plain goofy fun stuff. Things like Kilted Monday; folks risking tetanus as they dig through our buckets of buttons for ones they don't have yet (there are three new ones this year, to boot!); roving bands of #SocksOfCLUS enthusiasts; and more.

 

I'm just relieved that my kids are going to lay off the shenanigans this year. They caused quite a stir last year, and I could do without the distraction of mattress-surfing, blowtorch-wielding, chainsaw-swinging teenagers at home.

   

When it comes to networking specifically, software-defined networking is a model in which programmability allows IT professionals to increase performance and improve their ability to accurately monitor the network. The same can be seen in server environments as well. By harnessing the ability to program specific custom modules and applications, users can take the standard functions of their systems and drastically increase the range of what they are able to do. These abilities can generally be summarized into three major areas: monitoring, management, and configuration.

 

Monitoring

 

Monitoring network and system performance and uptime is something that admins and engineers alike are no strangers to. In the past, monitoring tools were limited to using ICMP to detect whether a system or device was still on the network and accessible. Software-defined IT expands the possibilities of your monitoring. This can be done with a standard, modern monitoring toolset or something that you custom code yourself. Here's an example. Say you have a web application. It remains accessible via the network with no interruption. A database error pops up, causing the application to crash, but the device itself is still online and responding to ICMP traffic. By using a software-defined mentality, you can program tools to check for an HTTP response code or whether a certain block of code loaded on a web application. This information is far more valuable than simply knowing whether a device responded to ICMP traffic. Admins could even program their scripts to restart an application service if the application failed, potentially taking the need for human intervention out of the loop.
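
Here is a minimal sketch of that example in Python. The health URL and service name are assumptions, and `requests` plus `systemctl` are used only as familiar stand-ins for whatever your environment provides:

    import subprocess
    import requests

    APP_URL = "http://app.example.internal/health"   # assumed health-check endpoint
    SERVICE_NAME = "webapp"                          # assumed service name

    def app_is_healthy(url):
        """Check the application itself, not just whether the host answers ping."""
        try:
            response = requests.get(url, timeout=5)
            return response.status_code == 200
        except requests.RequestException:
            return False

    if not app_is_healthy(APP_URL):
        # Attempt an automated recovery before paging a human.
        subprocess.run(["systemctl", "restart", SERVICE_NAME], check=False)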

 

Management

 

This is an example that I think is already here in a lot of networks today. Take the virtual machine environments that a lot of enterprises utilize. The software systems themselves can manage virtual servers without the need for human intervention. If a physical server becomes overloaded, virtual servers can be moved to another physical host seamlessly based on preset variables such as CPU and memory usage. Using software-defined techniques allows the management of systems and devices to take place without the need for an admin to a) recognize the issue and b) respond accordingly. Now, in the time it would take for an admin to recognize an issue, the system can respond automatically with preconfigured response actions.
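
A rough sketch of that logic is below. The thresholds and host data are made up, and the "migration" is simulated in a list rather than calling a real hypervisor API:

    CPU_LIMIT = 85.0   # percent; illustrative thresholds, not vendor defaults
    MEM_LIMIT = 90.0

    def rebalance(hosts):
        """Move one VM off any host whose preset CPU or memory limits are exceeded."""
        for host in hosts:
            if host["cpu"] > CPU_LIMIT or host["mem"] > MEM_LIMIT:
                target = min(hosts, key=lambda h: h["cpu"])
                if target is not host and host["vms"]:
                    vm = host["vms"].pop()
                    target["vms"].append(vm)
                    print(f"Would migrate {vm} from {host['name']} to {target['name']}")

    hosts = [
        {"name": "esx01", "cpu": 92.0, "mem": 70.0, "vms": ["vm-a", "vm-b"]},
        {"name": "esx02", "cpu": 35.0, "mem": 40.0, "vms": ["vm-c"]},
    ]
    rebalance(hosts)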

 

Configuration

 

The last category where software-defined techniques can help admins and engineers is configuration. Here's another example. Your company is now using a new NTP server for all of your network devices. Normally, you would be responsible for logging in to every device and pointing each one to the new server. With the modern software-defined networking tools that are available, admins can send the needed commands to each network device with very little interaction. Tasks like this, which could take hours depending on the number of devices, can now be done in minutes.
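
A minimal sketch of that task is below, using the paramiko SSH library purely as a familiar example. The device names, credentials, and NTP address are assumptions, and real network gear is usually driven through a vendor-aware tool instead:

    import paramiko

    NEW_NTP_SERVER = "10.0.0.50"                      # assumed address
    DEVICES = ["switch01", "switch02", "router01"]    # assumed device names

    def set_ntp(host, username, password):
        """Push the new NTP server to one device over SSH; exact CLI syntax varies by platform."""
        client = paramiko.SSHClient()
        client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
        client.connect(host, username=username, password=password, timeout=10)
        try:
            client.exec_command(f"ntp server {NEW_NTP_SERVER}")
        finally:
            client.close()

    for device in DEVICES:
        set_ntp(device, username="netadmin", password="changeme")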

 

The fact is, admins and engineers are still responsible for the ultimate control of everything. The tools that are available cannot run on their own without some help. During the initial configuration and programming of any scripts that are needed, the admins and engineers are responsible for knowing the ins and outs of their systems. Because of this, the common argument that software-defined IT will cut the need for jobs is easily negated. Anybody who has configured a software-defined toolset can attest to this. These tools are simply there to streamline and assist with everyday operations.

In a distributed tracing architecture, we need to define the microservices that work inside it. We also need to distinguish "component" behavior from "user" behavior and experience – similar words, but totally different concepts.

Think of the multitude of microservices that constitute a whole infrastructure. Each of these microservices keeps a trace of what it's doing (its behavior) and provides it to the next microservice, and so on, so that nothing in the overall design gets lost in the middle.

Let's also look at user behavior: by dividing an application into microservices, each of them can adapt its behavior to the user's habits and improve their experience of the application.

 

Component and user behavior

Imagine this as a self-learning platform: microservices learn from user habits and change their behavior accordingly.

Let's imagine an e-commerce website with its own engine. It will be composed of microservices for product display, product suggestions, payment management, delivery options, and so on. Each of these microservices will learn from the users browsing the site. So, the suggestion microservice will understand from user input that this user isn't interested in eBooks, but instead prefers traditional books, and it passes this info to the next microservice, which will compose a showcase of traditional books. The user chooses to pay by PayPal rather than credit card, so next time the payment microservice will set PayPal as the default payment option and not display credit card options. Last, after the user decides where to deliver the item and how (mail, courier), the related microservices will be activated with the default address and the last-used delivery method.

 

Tracing methodology

To achieve this user experience, every microservice must get info from the previous one and send its own info to the next one: this is microservice behavior. The architecture has another benefit too: every action is more agile, not monolithic, because the system automatically uses only the service required. The system does not need to parse and query all the options it could offer. This is especially true for SQL queries: in the previous example, the database will be split into many tables instead of just a few, so a query is performed only on the smaller tables assigned to that particular microservice.

Distributed tracing is performed using traces, of course, but also spans. A trace follows the request received by a microservice from the "previous" module, in sequential mode, and is actively passed along to the next module. A span is a part of that trace and keeps detailed information about each single activity performed within a module.
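
As a bare-bones sketch of how that context might be passed along, the snippet below propagates a trace ID and parent span ID in HTTP headers. The header names and helper functions are assumptions for illustration; production systems normally rely on a tracing library and backend for this:

    import time
    import uuid
    import requests

    def start_span(incoming_headers=None):
        """Start a span, reusing the incoming trace ID if one was passed along."""
        headers = incoming_headers or {}
        return {
            "trace_id": headers.get("X-Trace-Id", uuid.uuid4().hex),  # shared by the whole request
            "span_id": uuid.uuid4().hex,                              # unique to this unit of work
            "parent_id": headers.get("X-Span-Id"),
            "start": time.time(),
        }

    def call_next_service(url, span, payload):
        """Forward the trace context so the next microservice can attach its own span."""
        headers = {"X-Trace-Id": span["trace_id"], "X-Span-Id": span["span_id"]}
        return requests.post(url, json=payload, headers=headers, timeout=5)

    def finish_span(span):
        span["duration"] = time.time() - span["start"]
        print(span)   # in practice this would be shipped to a tracing backend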

 

User’s experience

Let's look at the user's experience. Segmenting an application into a multitude of services makes troubleshooting simpler. Developers can get a better understanding of which service is responsible for a slow response to the user. This lets them decide whether to upgrade and tune that service, or perhaps recode it. Anomalies and poor performance are quickly spotted and solved. Tracing is even more important in wide architectures, such as microservices working in different sessions or, worse, on different hosts.

Finally, we can consider distributed tracing a "logging" method that offers a better user experience in a shorter time, based both on component behavior and on the user's choices.

The big news this week is Microsoft agreeing to purchase GitHub for $7.5 billion USD. Microsoft continues to push into areas where it can be viable for the next 20 years or more. Amazon and Microsoft are slowly cornering the market in infrastructure hosting and development, leaving Google and Apple behind.

 

As always, here are some links from the Intertubz that I hope will hold your interest. Enjoy!

 

What You Should Know About GitHub—And Why Microsoft Is Buying It

A quick summary to help make sense of how and why this deal happened. Seriously, it’s the right move for all parties involved. Microsoft now has the opportunity to position themselves as the number one maker of software development tools for decades to come.

 

Microsoft is now more valuable than Alphabet

In case you did not hear the news, Microsoft is now the third-largest company in terms of market capitalization, trailing only Apple and Amazon. Not bad for a company that can’t make a smartphone. Clearly they must be doing something right.

 

Coca-Cola Suffers Breach at the Hands of Former Employee

Yet another example of a former employee making off with personal employee data. Corporations need to have strict policies and guidelines on removing access from employees the moment notice is given.

 

Mercedes Gone In 20 Seconds As Thieves Use Keyless Signal Cloning Tech

I’m surprised that this is possible with a late model car. Keyless entry has been around for a long time, and I could understand this vulnerability existing for an older model. But a late model Mercedes should be better than this.

 

An Alphabet spinoff company can cut a home’s energy bills by digging a deep hole in the backyard

I wish our country would spend more time and effort on making homes efficient. I’ve been looking into solar panels for my home, and now I’m looking into geothermal heating and cooling.

 

Woman found guilty of distracted driving despite claiming she was checking the time on her Apple Watch

And yet, she had to check multiple times. I would guess she was using the watch to do more than just check the time. Maybe next time she will check for a decent excuse.

 

Price’s Law: Why Only A Few People Generate Half Of The Results

Similar to Pareto’s Principle, Price’s Law talks about how only a few people are responsible for the majority of the output for the entire group. I am fascinated by this and will continue to look for examples (and counter-examples) in everything.

 

A cheat sheet for you to decipher GitHub comments:

“The price of reliability is the pursuit of the utmost simplicity.” C.A.R. Hoare, Turing Award lecture.

 

Software and computers in general are inherently dynamic and not in a state of stasis. The only way IT, servers, software, or anything else made of 1s and 0s can be perfectly stable is if it exists in a vacuum. If we think about older systems that were offline, we frequently had higher levels of stability--the trade-off was fewer updates and new features, and longer development and feedback cycles, which meant you could wait years for simple fixes to a relatively basic problem. One of the goals of IT management should always be to keep these two forces--agility and stability--in check.

 

Agile’s Effect on Enterprise Software

 

Widespread adoption of Agile frameworks across development organizations has meant that even enterprise-focused organizations like Microsoft have shortened release cycles on major products to (in some cases) less than a year, and if you are using cloud services, as short as a month. If you work in an organization that does a lot of custom development, you may be used to daily or even hourly builds of application software. This causes a couple of challenges for traditional IT organizations: supporting new releases of enterprise software like Windows or SQL Server, and also supporting developers in their organization who are employing continuous integration/continuous deployment (CI/CD) methodologies.

 

How This Changes Operations

 

First, let's talk about supporting new releases of enterprise software like operating systems and relational database management systems (RDBMS). I was recently speaking at a conference where I was asked, "How are large enterprises with general patch management teams supposed to keep up with a monthly patch cycle for all products?" This was a hard answer to deliver, but since the rest of the world has changed, your processes need to change along with it. Just as you shifted from physical machines to virtual machines, you need to be able to adjust your operations processes to deal with more frequent patching cycles. It's not just about the new functionality that you are missing out on. The array and depth of security threats mean software is patched more frequently than ever, and if you aren't patching your systems, they are vulnerable to threats from both internal and external vectors.

 

How Operations Can Help Dev

 

While as an admin I still get nervous about pushing out patches on day one, the important thing is to develop a process that applies updates in near real-time to dev/test environments, with automated error checking, and then relatively quickly moves the same patches into QA and production environments. If you lack development environments, you can patch your lower-priority systems first, before moving on to higher-priority systems.
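
One way to picture such a process is a staged rollout, where each ring must pass automated checks before the next one is touched. The ring names, hosts, and validation hook below are placeholders, not a recommended tool:

    # Illustrative patch rings, ordered from lowest to highest impact if something breaks.
    RINGS = [
        ("dev/test", ["dev01", "test01"]),
        ("qa",       ["qa01"]),
        ("prod",     ["prod01", "prod02"]),
    ]

    def patched_ok(host):
        """Placeholder for automated error checking (service health, event logs, backups)."""
        return True

    def roll_out(apply_patch):
        for ring_name, hosts in RINGS:
            for host in hosts:
                apply_patch(host)
                if not patched_ok(host):
                    raise RuntimeError(f"Patch failed validation on {host}; stopping before the next ring")
            print(f"Ring '{ring_name}' patched and validated")

    roll_out(lambda host: print(f"Patching {host}"))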

 

Supporting internal applications is a little bit of a different story. As your development teams move to more frequent build processes, you need to maintain infrastructure support for them. One angle for this can be to move to a container-based deployment model--the advantage there is that the developers become responsible for shipping the libraries and other OS requirements their new features require, as they are shipped with the application code. Whatever approach you take, you want to focus on automating your responses to errors that are generated by frequent deployments, and work with your development teams to do smaller releases that allow for easier isolation of errors.

 

Summary

 

The IT world (and the broader world in general) has shifted to a cycle of faster software releases and faster adoption of features. This all means IT operations has to move faster to support both vendor and internally developed applications, which can be a big shift for many legacy IT organizations. Automation, smart administration, and more frequent testing will be how you make this happen in your organization.

By Paul Parker, SolarWinds Federal & National Government Chief Technologist

 

Hybrid IT presents SecOps challenges

 

The Department of Defense (DoD) has long been at the tip of the spear when it comes to successfully melding IT security and operations (SecOps). Over the past few decades, the DoD has shown consistent leadership through a commitment to bringing security awareness into just about every facet of its operations. The growing popularity of hybrid IT poses a challenge to the DoD’s well-honed approach to SecOps.

 

An increasing number of public sector agencies are moving at least some of their services and applications to the cloud while continuing to maintain critical portions of their infrastructures on-site. This migration is hampered by increased security concerns as agency teams grapple with items like the disconcerting concept of relinquishing control of their data to a third party, or documenting a system access list without knowing everyone behind the cloud provider’s infrastructure.

 

Here are five strategies teams can employ to help ensure balance and maintain the DoD’s reputation as a model for SecOps success.

 

Foster an agency-wide commitment to high security standards

 

The secure-by-design concept does not just apply to the creation of software; it must be a value shared by workers throughout the agency. Everyone, from the CIO down, should be trained on the agency’s specific security protocols and committed to upholding the agency’s high security standards.

 

Establish clear visibility into hybrid IT environments

 

Gaining clear visibility into applications and data as they move on- and off-premises is essential. Therefore, agencies should employ next-generation monitoring capabilities that allow SecOps teams to monitor applications wherever they may be. Tools can also be used to help ensure that they have established the appropriate network perimeters and to keep tabs on overall application performance for better quality of service. System and application monitors should be able to provide a complete environmental view to help identify recent and historic trends.

 

Rely on data to identify potential security holes

 

Identifying vulnerabilities requires complete data visualization across all networking components, whether they exist on-site or off. Teams should be able to select different sets of metrics of their choice, and easily view activity spikes or anomalies that correspond to those metrics. A graphical representation of the overlaid data can help pinpoint potential issues that deserve immediate attention.

 

Stay patched and create a software inventory whitelist

 

Software should be routinely updated to fortify it against the latest viruses and vulnerabilities. Ensure that you track the release of your patches, and make certain you have a documented and tested plan and rollout strategy. The ease of an automated patch management system can quickly become your biggest nightmare if you haven’t done proper validation.

 

SecOps teams should also collaborate on the creation of a software inventory whitelist. Teams should carefully research the software that is available to them and create a list of solutions that fit their criteria and agency security parameters. The NIST Guide to Application Whitelisting is a good starting point.
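
As a small sketch of what a whitelist check could look like (the package names and version patterns are invented for illustration):

    import fnmatch

    # Approved software and acceptable version patterns (names and patterns are invented).
    APPROVED = {"openssh-server": "7.*", "postgresql": "10.*"}

    def violations(installed):
        """Return installed packages that are missing from, or outside, the whitelist."""
        bad = []
        for name, version in installed.items():
            pattern = APPROVED.get(name)
            if pattern is None or not fnmatch.fnmatch(version, pattern):
                bad.append((name, version))
        return bad

    print(violations({"openssh-server": "7.6", "nmap": "7.70"}))
    # [('nmap', '7.70')] -- nmap is not on the whitelist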

 

Hybrid IT is challenging the DoD to up its admirable SecOps game. The organization will need to make some strategic adjustments to overcome the challenges that hybrid IT poses, but doing so will undoubtedly yield beneficial results. Agencies will be able to reap the many benefits of hybrid IT while also improving their security postures. That is a win/win for both security and operations teams.

 

Find the full article on SIGNAL.
