
Geek Speak


I'm in Seattle this week at the PASS Summit. If you don't know PASS, check out sqlpass.org for more information. This will be my 13th consecutive Summit and I'm as excited as if it was my first. I am looking forward to seeing my #sqlfamily again.

 

Anyway, here's a bunch of stuff I found on the Intertubz in the past week that you might find interesting, enjoy!

 

DDoS attack against DNS provider knocks major sites offline

Yet another reminder that the internet is fragile, and you can't rely on any service provider to be up 100% of the time.

 

10 Modern Software Over-Engineering Mistakes

A bit long but worth the read. I am certain you will identify (painfully at times) with at least eight of the items on this list.

 

What will AI make possible that's impossible today?

Also a bit long, but as someone who enjoys history lessons, I found this worth sharing.

 

Go serverless for the enterprise with Microsoft Azure Functions

One of the sessions from Microsoft Ignite I felt worth sharing here. I am intrigued by the 'serverless' future that appears to be on the horizon. I am also intrigued as to how this future will further enable DDoS attacks because basic security seems to be hard for most.

 

Bill Belichick is done using the NFL’s Microsoft Surface tablet he hates so much

On the bright side, the announcers keep calling it an iPad, so there's that.

 

Here's how I handle online abuse

Wonderful post from Troy Hunt on how he handles online abuse. Those of us who put ourselves out there often get negative feedback from the cesspool of misery known as the internet, and Troy shares his techniques for dealing with it.

 

Michael Dell Outlines Framework for IT Dominance

"Work is not a location. You don't go to work, you do work." As good as that line is, they forgot the part about work also being a never ending day.

 

Microsoft allows Brazil to inspect its source code for ‘back doors’

I've never heard of this program before, but my first thought is how would anyone know if Microsoft was showing them everything?

 

I managed to visit some landmarks while in Barcelona last week, here's Legodaughter in front of Sagrada Familia:

[Image: lego - 1.jpg]

Email is at the center of life for almost every business in the world. When email is down, businesses cannot communicate; productivity suffers, and lost productivity means lost dollars, which in the end is not good.

Daily monitoring of Exchange involves many aspects of the environment and the surrounding infrastructure. Simply turning on monitoring will not get you very far. The first question you should ask yourself is, "What do I need to monitor?" Not knowing what to look for could inundate you with alerts, which is not going to be helpful.

One of the first places to look when troubleshooting mail slowness or other email issues is the network. Therefore, it is a good idea to monitor some basic network counters on the Exchange servers. These counters will help guide you toward the root cause of the issue.

 

Network Counters

The following table displays acceptable thresholds and information about common network counters.

 

| Counter | Description | Threshold |
| --- | --- | --- |
| Network Interface(*)\Packets Outbound Errors | Indicates the number of outbound packets that couldn't be transmitted because of errors. | Should be 0 at all times. |
| TCPv6\Connection Failures | Shows the number of TCP connections for which the current state is either ESTABLISHED or CLOSE-WAIT. The number of TCP connections that can be established is constrained by the size of the nonpaged pool. When the nonpaged pool is depleted, no new connections can be established. | Not applicable |
| TCPv4\Connections Reset | Shows the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state. | An increasing number of resets or a consistently increasing rate of resets can indicate a bandwidth shortage. |
| TCPv6\Connections Reset | Shows the number of times TCP connections have made a direct transition to the CLOSED state from either the ESTABLISHED state or the CLOSE-WAIT state. | An increasing number of resets or a consistently increasing rate of resets can indicate a bandwidth shortage. |
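If you want a quick, scriptable way to spot-check these counters between full monitoring cycles, the sketch below polls them with the built-in Windows typeperf utility. This is a minimal illustration rather than a monitoring solution: it assumes it runs locally on the Exchange server, uses the counter paths from the table above, and only flags the outbound-errors threshold.

```python
"""Minimal sketch: sample the network counters from the table above with
typeperf (built into Windows) and flag outbound packet errors.
Assumes the script runs locally on the Exchange server."""
import csv
import subprocess

COUNTERS = [
    r"\Network Interface(*)\Packets Outbound Errors",
    r"\TCPv4\Connections Reset",
    r"\TCPv6\Connections Reset",
]

def sample_counters():
    # -sc 1 collects a single sample; typeperf prints CSV with a header row.
    out = subprocess.run(["typeperf"] + COUNTERS + ["-sc", "1"],
                         capture_output=True, text=True, check=True).stdout
    rows = [r for r in csv.reader(out.splitlines()) if r]
    header, values = rows[0][1:], rows[1][1:]   # drop the timestamp column
    return dict(zip(header, (float(v) for v in values)))

if __name__ == "__main__":
    for path, value in sample_counters().items():
        flag = "  <-- investigate" if "Packets Outbound Errors" in path and value > 0 else ""
        print(f"{path} = {value}{flag}")
```

For the reset counters, the useful signal is the trend over time rather than any single value, so in practice you would log these samples and watch the rate of change.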

 

 


Monitoring Beyond the Exchange Environment


When you're monitoring Exchange, you are not only monitoring for performance; you also want to monitor outside factors such as the network, Active Directory, and any external connections such as mobile device management. All of these external factors affect the health of your Exchange environment.

In order to run Exchange, you need a network; yes, routers and switches can impact Exchange. As the Exchange admin you don't need to be aware of every single network event, but a simple alert about a network outage or blip can be helpful. Sometimes all it takes is a slight blip in the network to affect your Exchange DAG by causing databases to fail over.

If you are not responsible for the network, then I would suggest you coordinate with your network team on which notifications you should be made aware of in terms of network outages. Some key items to be informed or notified of are:

  • Planned outages between datacenters
  • Planned outages for network equipment
  • Network equipment upgrades and/or changes that would affect the subnet your Exchange servers reside on
  • Unplanned outages of network equipment and between datacenters
  • If your servers are virtualized, you should be informed of any host changes and/or virtual switch changes
  • Planned or unplanned DNS server changes because DNS issues can be a real nightmare

Preventing Bigger Headaches

Exchange monitoring is a headache and can be time consuming, but if you know what you are looking for and have the right tools in hand, it is not so bad. If the Exchange DAG is properly designed, a network blip or outage should not take down email for your company; that is the whole point of having an Exchange DAG (high availability design). What you may get is help desk calls when users see that their Outlook has disconnected briefly. Being informed of potential network outages can help you prepare in advance if you need to manually switch active copies of databases or when you need to do mailbox migrations. A network that is congested or having outages can cause mailbox migrations to fail, cause Outlook clients to disconnect, and even impact the speed of email delivery. Knowing ahead of time allows you to be prepared and have fewer headaches.

 


Forget about writing a letter to your congressman – today citizens use the web, email and social media to make their voices heard on the state, local and federal levels.

 

Much of this participation is due to the ubiquity of mobile devices. People can do just about everything with a smartphone or tablet and they expect their interactions with the government to be just as easy.

 

Unfortunately, according to a January 2015 report by the American Customer Satisfaction Index, citizen satisfaction with federal government services continued to decline in 2014. This, despite Cross-Agency Priority Goals that require federal agencies to “utilize technology to improve the customer experience.”

 

IT pros need to design services that allow users to easily access information and interact with their governments using any type of device. Then, they must monitor these services to ensure they continue to deliver optimal experiences.

 

Those who wish to avoid the ire of the citizenry would do well to add automated end-user monitoring to their IT toolkit. End-user monitoring allows agency IT managers to continuously observe the user experience without having to manually check to see if a website or portal is functioning properly. It can help ensure that applications and sites remain problem-free and enhance a government's relationship with its citizens.

 

There are three types of end-user monitoring solutions IT professionals can use to ensure their services are running at peak performance.

 

First, there is web performance monitoring, which can proactively identify slow or non-performing websites that could hamper the user experience. Automated web performance monitoring tools can also report load-times of page elements so that administrators can adjust and fix pages accordingly.
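As a rough illustration of what such a tool automates, here is a minimal synthetic probe that times a full page fetch and complains when it is slow or failing. The URL and threshold are hypothetical placeholders; a real web performance monitoring product measures individual page elements, runs from multiple locations, and keeps history.

```python
"""Minimal sketch of a synthetic web-performance probe.
The endpoint URL and threshold below are hypothetical placeholders."""
import time
import urllib.request

URL = "https://services.example.gov/status"   # placeholder endpoint
SLOW_THRESHOLD_S = 2.0

def probe(url):
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()                           # pull the full body, as a browser would
        return resp.status, time.monotonic() - start

if __name__ == "__main__":
    status, elapsed = probe(URL)
    if status != 200 or elapsed > SLOW_THRESHOLD_S:
        print(f"ALERT: {URL} returned {status} in {elapsed:.2f}s")
    else:
        print(f"OK: {URL} responded in {elapsed:.2f}s")
```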

 

Synthetic end-user monitoring (SEUM) allows IT administrators to run simulated tests on different scenarios to anticipate the outcome of certain events. For example, in the days leading up to an election or critical vote on the Hill, agency IT professionals may wish to test certain applications to ensure they can handle spikes in traffic. Depending on the results, managers can make adjustments to handle the influx.

 

Likewise, SEUM allows for testing of beta applications or sites, so managers can gauge the positive or negative aspects of the user experience before the services go live.

 

Finally, real-time end-user monitoring effectively complements its synthetic partner. It is a passive monitoring process that gathers actual performance data as end users are visiting and interacting with the web application in real time, and it will alert administrators to any sort of anomaly.

 

Using these monitoring techniques, IT teams can address user experience issues from certain locations – helping to ascertain why a response rate from a user in Washington, D.C., might be dramatically different from one in Austin, Texas.

 

Today, governments are trying to become more agile and responsive and are committed to innovation. They’re also looking for ways to better service their customers. The potent combination of synthetic, real-time and web performance monitoring can help them achieve all of these goals by greatly enhancing end-user satisfaction and overall citizen engagement.

 

Find the full article on Government Computer News.

Without a doubt, the biggest trend in IT storage over the past year, and moving forward, is the concept of Software Defined Storage (SDS). It's more than just a buzzword in the industry, but, as I've posted before, a true definition has yet to be settled on. I've written previously about this same thought; here's what I wrote.

 

Ultimately, I talked about different brands with different approaches and definitions, so at this point I'm not going to rehash the details. At a high level, though, the value as I see it has to do with decoupling the hardware layer from the management plane. In my view, leveraging the horsepower of commodity hardware in reference builds, plus a management layer optimized for that hardware build, lets the users/IT organization bring costs down and, potentially, customize the hardware choices for the use case. Typically your choices revolve around read/write IOPS, disc redundancy, tiering, compression, deduplication, number of paths to disc, failover, and of course, with the use of x86 architecture, the amount of RAM and speed of processors in the servers. Comparing these to traditional monolithic storage platforms makes for a compelling case.

 

However, there are other approaches. I've had conversations with customers who only want to buy a "premixed/premeasured" solution. And while that doesn't lock out SDS models such as the one above, it does change the game a bit. Toward that end, many storage companies will align with a specific server and disc model. They'll build architectures very tightly bound to a hardware stack, even though they're relying on commodity hardware, and allow customers to purchase the storage in much the same way as more traditional models. They often take it a step further and put a custom bezel on the front of the device, so it may be Dell behind, but it's "Someone's Storage Company" in front. After all, the magic all takes place at the software layer, which is what makes the solution unique… so why not?

 

Another category that I feel is truly trending in storage is a recent category in backup dubbed CDM, or Copy Data Management. Typically, these are smaller bricks of storage that act as gateway-type devices, holding onto some backup data but also pointing to the real target as defined by the lifecycle policy on the data. There are a number of players here; I am thinking specifically of Rubrik, Cohesity, Actifio, and others. As these devices are built on storage bricks but made functional purely through superior software, I would also consider them valid examples of Software Defined Storage.

 

Backup and disaster recovery are key pain points in the management of an infrastructure. Traditional methods consisted of some level of scripted software moving your backup data onto a tape mechanism (maybe robotic), which often required manual filing and then the rotation of tapes to places like Iron Mountain. Restores have been tricky: time is spent waiting for the restore, and corruption of files on those tapes has been a reasonably consistent risk. Tools like these, as well as other styles of backup including cloud service providers and even public cloud environments, have made things far more viable. These CDM solutions take much of the legwork out of the process and can enable near-zero recovery point and recovery time objectives, regardless of where the current file is located, and by that I mean the CDM device, local storage, or even some cloud vendor's storage. It shouldn't matter, as long as the metadata points to that file.

 

I have been very interested in changes in this space for quite some time, as these are key process changes pushing upward into the data center. I’d be curious to hear your responses and ideas as well. 


PCI DSS 3.2 is coming!

Posted by joshberman Employee Oct 24, 2016

PCI DSS 3.1 expires October 31st this year. But don't panic. If you don't have a migration plan for 3.2 yet, you have until Feb. 1, 2018, before the new requirements become mandatory. For most merchants, the changes are not onerous. If you are a service provider, however, there are more substantial changes, some of which are already in effect. In this post we focus on the requirements for merchants and present a quick overview of the required changes so that you can identify any gaps you may still need to remediate.

 

The main changes in PCI DSS 3.2 are around authentication, encryption, and other minor clarifications. We will primarily discuss authentication and encryption in this article.

 

Authentication

The PCI Council's review of data breaches over the years has confirmed that current authentication methods are not sufficient to mitigate the risk of cardholder data breaches. Even though cardholder data is maintained on specific network segments, it is nearly impossible to prevent a data breach if the authentication mechanisms are vulnerable as well. Specifically, relying only on passwords, challenge questions, and the like has proven to be a weakness in the overall security of cardholder networks.

 

The new 3.2 requirements state that multi-factor authentication (MFA) is required for all individual non-console administrative access, and for all remote access, to the cardholder data environment.[1] The new standard is clear. Multi-factor[2] means at least two of the following:

 

  • Something you know, such as a password or passphrase
  • Something you have, such as a token device or smart card
  • Something you are, such as a biometric

 

Multi-factor does not mean a password and a set of challenge questions, or two passwords.

 

Implementing such multifactor authentication takes time and planning, as there are almost 200 different types of multifactor solutions on the market.

Multifactor authentication is also complex because:

  1. It likely needs to integrate with your single sign-on solution
  2. Not all IT systems can support the same types of MFA, especially cloud solutions
  3. MFA resets are more complex (especially if a dongle is lost)
  4. Most MFA solutions require rollout and management consoles on top of your built-in authentication

 

Encryption

In recent years, SSL, the workhorse for securing e-commerce and credit card transactions, has been through the wringer. From Shellshock, Heartbleed, and POODLE to the man-in-the-middle AES CBC padding attack reported this May, SSL, OpenSSL, and all the derivative implementations that have branched from OpenSSL have been experiencing one high-severity vulnerability after another. [SIDEBAR: A list of all vulnerabilities in OpenSSL can be found here: List of OpenSSL Vulnerabilities; other SSL vulnerabilities can be found on vendors' websites or at the National Vulnerability Database.] Of particular concern are SSL vulnerabilities in Point of Sale solutions and other embedded devices, as those systems may be harder to upgrade and patch than general-purpose computers. In response, the PCI Council has issued guidance on the use of SSL.[3]

 

The simplest approach to achieving PCI DSS 3.2 compliance is not to use SSL, or early versions of TLS, period. [SIDEBAR: TLS, Transport Layer Security, is the name of the IETF[4] RFC that was the follow-on to SSL.] In fact, any version of TLS prior to 1.2 and all versions of SSL 1.0-3.0 are considered insufficiently secure to protect cardholder data. As a merchant, you may still use these older versions if you have a plan in place to migrate by June 2018, and a risk mitigation plan to compensate for the risks of using such older versions. Risk mitigation plans might include, for example, additional breach detection controls, documented patches for all known vulnerabilities, carefully selected cipher suites that permit no weak ciphers, or all of the above. If you have a Point of Sale or Point of Interaction system that does not have any known vulnerabilities, you may run those systems using older versions of SSL/TLS. The PCI Council reserves the right to change this guidance if new vulnerabilities associated with a particular POS or POI become known that jeopardize the cardholder environment.
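If you want to verify what a given endpoint actually negotiates, a quick check along these lines can help. This is a sketch only, using Python's standard ssl module (3.7 or later) with a placeholder hostname, and it should only be run against systems you are responsible for.

```python
"""Sketch: confirm a server negotiates TLS 1.2 or better, per the guidance above.
The hostname is a placeholder; check only endpoints you are responsible for."""
import socket
import ssl

HOST, PORT = "payments.example.com", 443      # placeholder endpoint

context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2   # refuse SSL 3.0 / TLS 1.0 / TLS 1.1

with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        print(f"{HOST} negotiated {tls.version()} using {tls.cipher()[0]}")
```

If the handshake fails with this context, the server is still offering only pre-1.2 protocols and belongs on your migration list.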

 

If you are concerned about the risk to your e-commerce or mobile environment from upgrading your SSL to TLS 1.2 or higher, you can ask your online marketing department what the oldest versions of iOS, Android, Windows®, and Mac® are that connect to your systems. Android has supported TLS 1.2 since version 4.1, although it is not enabled by default; as of version 4.4, KitKat, released Oct. 2013, TLS 1.2 has been enabled by default. iOS 5 has supported TLS 1.2 since Oct. 2011. Windows 8 is the first Microsoft OS to support TLS 1.2 by default; prior to that, you needed to manually configure TLS 1.2 support. A complete TLS table for Windows is available here: TLS version support. For Mac OS®, you need to reach Mavericks (Oct. 2013) to find a Mac computer with TLS 1.2 enabled by default.

 

If all this versioning seems daunting, not to worry. Most modern browsers, which auto-upgrade, have been supporting TLS 1.2 for a long time.[5] The net/net is that most organizations have nothing to fear from upgrading their secure communications layer to TLS 1.2.

 

Minor Clarifications

There are some additional nuances regarding the correct cipher suites to use to meet PCI DSS 3.2 requirements, which we will cover in a future post on using managed file transfer in cardholder environments.

 

If you are a PCI pro, you now have a good overview of the major changes coming in PCI DSS 3.2. Note that we didn't address the additional changes a service provider needs to comply with, nor did we walk through all 12 requirements of PCI DSS and how they apply to IT organizations. For some of that, you should check out the Simplifying PCI Compliance for IT Professionals White Paper.

 

Questions about PCI DSS?

If you have questions regarding the latest version of PCI DSS, or what’s generally required from an IT perspective to maintain compliance, we urge you to join in the conversation on THWACK!

 


[1] Requirement 8.3

[2] Requirement 8.2

[3] PCI DSS 3.2 Appendix 2

[4] Internet Engineering Task Force, RFC 2246 et seq., https://www.ietf.org/rfc/rfc2246.txt

[5] https://en.wikipedia.org/wiki/Template:TLS/SSL_support_history_of_web_browsers

Master of Your Virtual IT Universe: Trust but Verify at Any Scale

A Never Ending IT Journey around Optimizing, Automating and Reporting on Your Virtual Data Center


Automation

Automation is a skill that requires detailed knowledge, including comprehensive experience around a specific task. This is because you need that task to be fully encapsulated in a workflow script, template, or blueprint. Automation, much like optimization, focuses on understanding the interactions of the IT ecosystem, the behavior of the application stack, and the interdependencies of systems to deliver the benefits of economies of scale and efficiency to the overall business objectives. And it embraces the do-more-with-less edict that IT professionals have to abide by.

 

Automation is the culmination of a series of brain dumps covering the steps that an IT professional takes to complete a single task. These are steps that the IT pro is expected to complete multiple times with regularity and consistency. The singularity of regularity is a common thread in deciding to automate an IT process.

 

Excerpted from Skillz To Master Your Virtual Universe SOAR Framework

 

Automation in the virtual data center spans workflows. These workflows can encompass management actions such as provisioning or reclaiming virtual resources, setting up profiles and configurations in a one-to-many manner, and reflecting best practices in policies across the virtual data center in a consistent and scalable way.

 

Embodiment of automation

Scripts, templates, and blueprints embody IT automation. They are created from an IT professional’s best practice methodology - tried and true IT methods and processes. Unfortunately, automation itself cannot differentiate between good and bad. Therefore, automating bad IT practice will lead to unbelievable pain at scale across your virtual data centers.

 

To keep that from happening, keep automation stupid simple. First, automate at a controlled scale, following the mantra, “Do no harm to your production data center environment.” Next, monitor the automation process from start to finish in order to ensure that every step executes as expected. Finally, analyze the results and use your findings to make necessary adjustments to optimize the automation process.
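As a toy illustration of that "monitor every step" discipline, the sketch below runs a hypothetical provisioning workflow one step at a time, logs each step, and halts on the first failure instead of ploughing ahead at scale. The step functions are placeholders for whatever your hypervisor or orchestration tooling actually does.

```python
"""Toy workflow runner: log every step and stop on the first failure.
The step functions are placeholders, not real provisioning calls."""
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")

def clone_vm_from_template():
    ...  # placeholder: call your hypervisor's provisioning API here

def configure_networking():
    ...  # placeholder: apply port group / VLAN settings

def validate_guest_health():
    ...  # placeholder: confirm the VM responds before declaring success

STEPS = [clone_vm_from_template, configure_networking, validate_guest_health]

def run_workflow():
    for step in STEPS:
        logging.info("starting %s", step.__name__)
        try:
            step()
        except Exception:
            logging.exception("step %s failed; halting before doing harm", step.__name__)
            return False
        logging.info("finished %s", step.__name__)
    return True

if __name__ == "__main__":
    run_workflow()
```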

 

Automate with purpose

Start with an end goal in mind. What problems are you solving for with your automation work? If you can’t answer this question, then you’re not ready to automate any solution.

 

This post is a shortened version of the eventual eBook chapter. Stay tuned for the full version in the eBook. Next week, I will cover reporting in the virtual data center.

I'm in Barcelona this week. It's been ten years since I've been here so it's hard for me to tell what has changed, but everything seems new. It still has a wonderful feel, and wonderful food. Yesterday I presented my session with David Klee to a packed room. I loved the energy of the session, and VMworld as a whole. I hope I get a chance to come back again.

 

Anyway, here's a bunch of stuff I found on the Intertubz in the past week that you might find interesting, enjoy!

 

You've been hacked. What are you liable for?

A nice reminder for those that aren't concerned about being hacked. You may have some liability you were not expecting.

 

SQL Server 2016 Express Edition in Windows containers

This may seem like a minor blip on the radar for some, but for those of us in the Microsoft Fanbois circles this is a huge step forward to what we believe will be total assimilation of all things data.

 

Amazon Partners With VMware to Extend Its Computing Cloud

Of course, SQL Server Express in a container pales in comparison to the big news of the week, and that is VMware partnering with Amazon to go head on with Microsoft for the enterprise hybrid cloud market.

 

How an IT Pro Kicked Hackers Off Surveillance Cameras

Score one for the good guys, I guess. Maybe we can get Rich a job working for the companies that continue to deploy unsecured IoT devices.

 

S.A.R.A. Seeks To Give Artificial Intelligence People Skills

Something tells me that maybe letting college students decide what is appropriate behavior for AI isn't the best idea.

 

Work Counts

Ever wonder how many people are just like you? Well, wonder no more!

 

Because I'll be away on Election day, I already filled out my absentee ballot and wanted to show you who I voted for:

 

[Image: bacon - 1.jpg]

Whether seeking solutions to pressing problems, networking with like-minded professionals or researching products, professionals across nearly all industries benefit from participating in communities every day, and it’s no different for federal IT professionals.

 

Actively engaging in online IT communities can make all the difference by enabling connections with other federal IT pros who are encountering similar issues, providing educational content, offering insights from experts and providing a channel to share valuable feedback.

 

Peer-to-Peer Collaboration

 

Online IT communities offer diverse feedback on creative ideas, and productive discussions for problem-solving. You may find your “unique” problem is actually more common than originally thought.

 

For example, GovLoop features educational blogs, forums about everything from career to citizen engagement, training both online and in person, and group spaces to engage with content around topics such as technology, levels of government and occupations.

 

Another popular community is Reddit, which often generates reams of answers from users who have experienced similar problems or are able to recommend resources to fix the issue. Reddit also features up-voting and down-voting on replies, making it easy to identify trusted answers from top-ranked users.

 

This sort of organic, peer-to-peer collaboration can help you solve problems quickly and with the confidence of having your answers come from trusted sources—professionals just like you from a wide variety of backgrounds and with vast ranges of expertise.

 

Direct Line to Vendors

 

Online communities can also connect you with the vendors whose products you rely on to get your job done. As you become more involved, you might transition from solely an information-seeker to an information-provider.

 

For example, the highly engaged end users who participate in SolarWinds’ more than 130,000-member-strong IT community, THWACK, have influenced product directions and even go-to-market strategy based on their direct feedback and general discussion of industry trends and the company’s products. THWACK also features a Federal and Government space, which caters specifically to challenges that are unique to federal IT pros.

 

Career Development

 

Online IT communities also enable education that lends itself to career development. For example, many consider the SpiceWorks How-to forum a reliable place to develop IT skills and learn about the industry. Forums such as this, and those found on THWACK, provide a venue for community members to learn best practices, get access to advice from experts in various fields, and research developing trends that could impact their careers, all in an environment where you have the option to ask questions and engage in conversations.

 

The Power of the Masses at Your Fingertips

 

In summary, the access to a wide audience of peers that online IT communities provide can be invaluable to you as a federal IT pro. Membership and active participation within such communities can provide quick answers to problems, foster collaboration to ensure vendors are creating products that meet your needs and create new opportunities for your career development.

 

Find the full article on our partner DLT’s blog, TechnicallySpeaking.


The process of expanding operations or adopting new technology within an IT organization is sometimes met with caution. Rightfully so, given what’s at stake. Even the smallest configuration error, which can happen when you introduce new software or systems into a network, can spell disaster. Whether it leads to downtime, loss of data, the advent of security vulnerabilities, or compliance violations, the costs can be great for businesses of all sizes.

 

It’s not surprising, then, that some IT pros are hesitant to try out new software or test the latest SolarWinds offerings. But what it really boils down to is the fact that some IT pros don’t have the resources available to test a solution effectively without fearing these negative consequences.

 

What if I told you it was possible to test a solution like SolarWinds® Log & Event Manager (LEM) in a manner that was both safe and free for your business? Would you consider adding a powerful SIEM solution to your arsenal that tackles IT security and compliance? Well, you can! Introducing the LEM + GNS3® Integration Guide.

 

What is GNS3?

 

GNS3 is a multi-vendor tool that allows you to build, design, and test network configurations and software in a risk-free virtual environment. This technology eliminates the need for expensive physical testing by offering a network-attached or stand-alone virtual test bed, free of charge. With real-time network emulation, users can conduct proof-of-concept testing and troubleshooting on dynamic network configurations.

 

Download the GNS3 and SolarWinds LEM Integration Guide

 

Whether you’re a seasoned GNS3 pro who’s new to LEM, or a LEM user who’s interested in building a lab to experience the full functionality of the product within a safe and secure virtualized instance for testing or troubleshooting, this guide has something for you. In addition to instructing you on how to get started with VMware®, Hyper-V®, GNS3, and LEM, this guide will help you understand some of the LEM basics to help ensure that you hit the ground running with this advanced security solution.

 

Click here for access to the guide and free 30-day trial of LEM!

 

To learn more about the partnership we’ve formed with GNS3, check out the GNS3 Group on THWACK: https://thwack.solarwinds.com/groups/gns3

Master of Your Virtual IT Universe: Trust but Verify at Any Scale

A Never Ending IT Journey around Optimizing, Automating and Reporting on Your Virtual Data Center

Optimization

Optimization is a skill that requires a clear end-goal in mind. Optimization focuses on understanding the interactions of the IT ecosystem, the behavior of the application stack, and the interdependencies of systems inside and outside their sphere of influence in order to deliver success in business objectives.

 

If one were to look at optimization from a theoretical perspective, each instantiation of optimization would be a mathematical equation with multi-variables. Think multivariate calculus as an IT pro tries to find the maxima as other variables change with respect to one another.

 

Excerpted from Skillz To Master Your Virtual Universe SOAR Framework

 

Optimization in the virtual data center spans virtual data center health across resource utilization and saturation, while encompassing resource capacity planning and resource elasticity. Utilization, saturation, and errors play key roles in the optimization skill. The key question is: what needs to be optimized in the virtual data center?

 

Resource scalability

Similar to other IT disciplines, optimization in the virtual environment boils down to optimizing resources, i.e., doing more with less. This oftentimes produces an over-commitment of resources and the eventual contention issues that follow in the saturated state. If the contention persists over an extended period of time or comes too fast and too furious, errors usually crop up. And that’s when the “no-fun” time begins.

 

Resource optimization starts with tuning compute (vCPUs), memory (vRAM), network and storage. It extends to the application and its tunable properties through the hypervisor to the host and cluster.

 

Sub-optimal scale

 

vCPU and vRAM penalties manifest as saturation and errors, which lead to slow application performance and tickets being opened. There are definite costs to oversizing and undersizing virtual machines (VMs). Optimization seeks to find the fine line with respect to the entire virtual data center environment.

 

To optimize compute cycles, look at vCPU utilization counters as well as processor queue length. For instance, in VMware, the CPU counters to examine are %USED, %RDY, and %CSTP. %USED shows how much time the VM spent executing CPU cycles on the physical CPU. %RDY defines the percentage of time a VM wanted to execute but had to wait to be scheduled by the VMkernel. %CSTP is the percentage of time that an SMP VM was ready to run but incurred delay because of co-vCPU scheduling contention. The performance counters in Microsoft are System\Processor Queue Length, Process\% Processor Time, Processor\% Processor Time, and Thread\% Processor Time.

 

To optimize memory, look for memory swapping, guest-level paging, and overall memory utilization. For VMware, the counters are SWP/s and SWW/s, while for Microsoft the counter is Pages/sec. For Linux VMs, leverage vmstat and the swap counters si and so (swap in and swap out, respectively).
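For the Linux case, a small script can watch those vmstat columns for you. The sketch below samples vmstat a few times and reports any swap-in/swap-out activity; it assumes vmstat (procps) is installed in the guest and is meant as an illustration, not a replacement for proper monitoring.

```python
"""Sketch: sample vmstat inside a Linux VM and flag swap-in/swap-out (si/so)
activity, which indicates memory pressure in the guest. Assumes procps vmstat."""
import subprocess

def report_swap_activity(samples=3, interval=2):
    # 'vmstat <interval> <count>' prints group and column headers, then one row per sample.
    out = subprocess.run(["vmstat", str(interval), str(samples)],
                         capture_output=True, text=True, check=True).stdout
    lines = out.splitlines()
    header = lines[1].split()                 # r b swpd free ... si so ...
    si_col, so_col = header.index("si"), header.index("so")
    for line in lines[2:]:
        fields = line.split()
        si, so = int(fields[si_col]), int(fields[so_col])
        if si or so:
            print(f"swap activity detected: si={si} so={so}")

if __name__ == "__main__":
    report_swap_activity()
```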

 

Of course, a virtualization maestro needs to factor in hypervisor kernel optimization/reclamation techniques as well as the application stack and the layout of their virtual data center infrastructure into their optimization process. 

 

This post is a shortened version of the eventual eBook chapter. For a longer treatment, stay tuned for the eBook. Next week, I will cover automation in the virtual data center.

If you didn’t have a chance to join some 350+ of your fellow IT and Security Pros at our Shields Up Panel: Network Security Fundamentals, Fight! THWACKcamp session – you’re in luck, we took some notes.

 

Our panel comprised Eric Hodeen, Byron Anderson, our moderator Patrick Hubbard, and me, c1ph3r_qu33n.

 

Compliance vs. Security was the theme this year, and we tackled four big questions:

 

  • Have security practitioners and business owners figured out how to work with compliance schemes instead of fighting them? 
  • Are you more or less secure when you put compliance first?
  • What benefits (or harms) do compliance schemes and checklists offer?
  • If you are new to compliance, where do you start first? 

 

Our panelists felt that security and compliance teams are generally getting along better. However, there are still times when a business owner looks only at the penalties or risks of non-compliance and doesn’t consider the impact to the business of following a standard blindly. This can be especially true of highly prescriptive standards like DISA STIGs (Defense Information Systems Agency Security Technical Implementation Guides)[1] or NERC CIP (North American Electric Reliability Corporation Critical Infrastructure Protection)[2]. The challenge for IT and security pros is to effectively communicate the potential business impacts and to give the business owner the ammunition to argue for a waiver or request a compensating control. This way your organization can reach an optimum balance of compliance risk vs. business needs.

 

One of the misconceptions business owners may have is that a compliance scheme covers all of the organization's security risk, so nothing further needs to be considered. As practitioners, we know that compliance schemes are negotiated or promulgated standards that take time to change. Adjusting for changes to the threat landscape and addressing new technology innovations in a rapid fashion are challenges for compliance schemes. Furthermore, no compliance standard considers every nuance of every IT environment.

 

So that is one of the risks of taking a compliance-only approach. But no one on the panel felt compliance schemes don't have value. Like other good guidelines and checklists, such as the OWASP Top Ten[3] or the SANS Critical Security Controls[4], compliance checklists can add value to an organization, especially as assurance. The panel was divided, however, on whether you start with a checklist or end with a checklist. The answer may depend on your organization's maturity. If you've been doing security for a while, using a checklist to validate your approach may add an extra layer of assurance. If you are new to security, however, a good checklist can be a great asset as you get started in this new IT discipline.

 

Speaking of getting started, we all had different ideas about the most important first step. One of us said default passwords, which insidiously have a way of creeping back into the organization; whether it's from a new install or a reset of an existing device, default passwords still haunt us. Another panelist thought end users were the biggest challenge, and that maintaining good security requires strong user participation. Anyone who has dealt with ransomware or phishing knows how important it is to keep users informed of likely risks and good security hygiene.

 

VIDEO: Shields Up Panel: Network Security Fundamentals, Fight!

 

We all agreed that THWACKcamp was great fun and we hope to see you all next year. If you’ve got an issue you’d like to see the experts take a stab at, post your questions and we’ll put them in the idea basket for next year.

 


 


[1] http://iase.disa.mil/stigs/Pages/index.aspx

[2] http://www.nerc.com/pa/Stand/Pages/CIPStandards.aspx

[3] https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project

[4] https://www.sans.org/critical-security-controls

I'm heading to VMworld in Barcelona next week, so if you are there let me know, as I would love to talk data and databases with you. I'm co-presenting with David Klee, and we are going to be talking about virtualizing your database server. I have not been to Barcelona in 10 years, and I'm looking forward to seeing the city again, even briefly.

 

Here's a bunch of stuff I found on the Intertubz in the past week that you might find interesting, enjoy!

 

Cloud by the Megawatt: Inside IBM’s Cloud Data Center Strategy

If you are like me you will read this article and think to yourself "Wait, IBM has a Cloud?"

 

VMware, AWS Join Forces in Battle for Enterprise Cloud Market

This partnership marks an important shift in the market and should cause some concern for Microsoft. That being said, I also see this partnership as a last-ditch effort to keep VMware relevant before being absorbed completely by Dell.

 

Here are the 61 passwords that powered the Mirai IoT botnet

Proving once again that the cobbler's children have no shoes, we have an army of devices built by people that should know better, but don't put into practice the basics of security.

 

Twitter, Microsoft, Google and others say they haven’t scanned messages like Yahoo

I feel I have heard this story before, and I think I know how it ends.

 

Are microservices for you? You might be asking the wrong question

"Change for the sake of change is rarely a sensible use of time." If only they taught this at all the charm schools known as the MBA.

 

Latency numbers every programmer should know

Not a bad start at a complete list, and I would argue that more than just programmers should know these numbers. I've had to explain the speed of light to managers before.

 

7 Times Technology Almost Destroyed The World

Here's hoping the robots can act a bit more like humans when it counts the most.

 

Autumn has arrived here in New England, and that means apple picking is in full swing:

[Image: apple - 1.jpg]

One of the questions that comes up during the great debate on the security of the Internet of Things (IoT) is the responsibility of device manufacturers to support those devices. When we buy a refrigerator or a toaster, we expect that device to last through the warranty date and well beyond. Assuming it is a well-made unit, it may last for a long time. But what about devices that only live as long as someone else wants them to?

Time's Up!

Remember Revolv? The smart hub for your home that was purchased by Nest? They stopped selling the device in October 2014 after being purchased, but a year and a half later they killed the service entirely. The Internet was noticeably upset and cried out that Google and Nest had done a huge disservice to their customers by killing the product. The outcry was so fierce that Nest ended up offering refunds for devices.

The drive to create new devices for IoT consumption is huge. Since most of them require some kind of app or service to operate correctly, it also stands to reason that these devices are reliant on the app to work properly. In the case of Revolv, once the app was shut down the device was no longer able to coordinate services and essentially became a huge paperweight. A few companies have chosen to create a software load that allows devices to function in isolated mode, but those are few and far between.

The biggest security concern here is what happens when devices that have been abandoned but still function are left to their own ends. A fair number of the devices used in the recent IoT botnet attacks were abandonware cameras running their last software update. Those devices aren't going to have security holes patched or get new features. The fact that they work at all owes more to them being IP-based devices than anything else.

Killing In The Name Of IoT

However, if those manufacturers had installed a kill switch instead of allowing the cameras to keep working, it would have prevented some of the chaos from the attack. Yes, the buyers of those cameras would have been irritated that the functionality was lost. But it could have made a massive security issue easier to deal with.

Should manufacturers be responsible for installing a software cut-out that allows a device to be remotely disabled when the support period expires? That's a thorny legal question. It opens the manufacturer up to lawsuits and class action filings about creating products with known defects. But it also raises the question of whether or not these same manufacturers should have a greater duty to the safety of the Internet.

And this isn't taking into account the huge issues with industrial IoT devices. Could you imagine what might happen if an insulin pump or an electrical smart meter was compromised and used as an attack vector? The damage could be catastrophic. Worse yet, even with a kill switch or cut-out to prevent transmission, neutering those devices renders them non-functional and potentially hazardous. Medical devices that stop working cause harm and possibly death. Electrical meters that go offline create hazards for people living in homes.

Decisions, Decisions

There's no easy answer to all these problems. Someone is going to be mad no matter what we decide. Either these devices live on in their last known configuration and can be exploited, or they get neutered when they shut down. The third option, having manufacturers support devices forever, isn't feasible either. So we have to make some choices here. We have to stand up for what we think is right and make it happen.

Make sure your IoT policy spells out what happens to out-of-support devices. Make sure your users know that you are going to implement a traffic kill switch if your policy spells it out. Knowledge is key here. Users will understand your reasons if they're communicated ahead of time. And using tools from SolarWinds to track those devices and keep tabs on them helps you figure out when it's time to implement those policies. Better to have it sorted out now than to have to deal with a problem when it happens.

[Image: 6622814393_7fbe9569da_o.jpg]

Image courtesy of Spirit-Fire on Flickr

 

I think I'd like to mirror a session title from the recent THWACKcamp and subtitle this particular post "Don't Hate Your Monitoring." We all face an enormous challenge in monitoring our systems and infrastructure, and in part that's caused by an underlying conflict:

 

[Images: monitor_all.jpg, Color_Overload.jpg]

Image Courtesy D Sharon Pruitt

 

This is a serious problem for everybody. We want to monitor everything we possibly can. We NEED to monitor everything we can, because heaven help us if we miss something important because we don't have the data available. At the same time, we cannot possibly cope with the volume of information coming into our monitoring systems; it's overwhelming, and manually sifting through it to find the alerts or data that actually matter to your business is a losing battle. And then we wonder why people are stressed, and why we have a love/hate relationship with our monitoring systems!

 

How can the chaos be minimized? Well, some manual labor is required up front, and after that it will be an iterative process that's never truly complete.

 

Decide what actually needs to be monitored

It's tempting to monitor every port on every device, but do you really need to monitor every access switch port? Even if you want to maintain logs for those ports for other reasons, you'll want to filter alerts for those ports so that they don't show up in your day-to-day monitoring. If somebody complains about bad performance, then digging into the monitoring and alerting is a good next step (maybe the port is fully utilized, or spewing errors constantly), but that's not business critical, unless perhaps it's your CEO's switch port.

 

Focus on which alerts you generate in the first place

  • Use Custom Properties to label related systems so that alerts can be generated in an intelligent way.
  • Before diving into the Alert Suppression tab to keep things quiet, look carefully at Trigger Conditions and try to add intelligent queries in order to minimize the generation of alerts in the first place. The trigger conditions allow for some quite complex nested logic which can really help make sure that only the most critical alerts hit the top of your list.
  • Use trigger conditions to suppress downstream alerts (e.g., if a site router is down, don't trigger alerts from devices behind that router that are now inaccessible); a generic sketch of this downstream logic follows after this list.
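The bullet above is about configuration rather than code, but the underlying logic is simple enough to sketch. The example below is generic Python, not the Orion alerting engine or its API: it just shows the idea of dropping alerts for nodes whose upstream site router is already known to be down, using a made-up topology.

```python
"""Generic sketch of downstream alert suppression (not the Orion API).
The topology and alert shapes below are invented for illustration."""
from dataclasses import dataclass

# Hypothetical mapping of devices to the site router they sit behind.
PARENT_ROUTER = {"site-a-sw01": "site-a-rtr01", "site-a-srv07": "site-a-rtr01"}

@dataclass
class Alert:
    node: str
    message: str

def filter_alerts(alerts, down_nodes):
    """Keep an alert only if its upstream router is not already down."""
    kept = []
    for alert in alerts:
        if PARENT_ROUTER.get(alert.node) in down_nodes:
            continue        # downstream noise; the router's own alert covers it
        kept.append(alert)
    return kept

if __name__ == "__main__":
    down = {"site-a-rtr01"}
    incoming = [Alert("site-a-rtr01", "node down"),
                Alert("site-a-sw01", "node unreachable")]
    for a in filter_alerts(incoming, down):
        print(a.node, "->", a.message)   # only the router's own alert survives
```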

 

Suppress Alerts!

I know I just said not to dive into Alert Suppression, but it's still useful as the cherry on top of the cream that is carefully managed triggers.

  • It's better in general to create appropriate rules governing when an alert is triggered than to suppress it afterwards. Alert suppression is in some ways rather a blunt tool; if the condition is true, all alerts are suppressed.
  • One way to achieve downstream alert suppression is to add a suppression condition to devices on a given site that queries for the status of that site's edge router; if the router status is not "Up", the condition becomes true, and it should suppress the triggered alerts from that end device. This could also be achieved using Trigger Conditions, but it's cleaner in my opinion to do it in the Alert suppression tab. Note that I said "not Up" for the node status rather than "Down"; that means that the condition will evaluate to true for any status except Up, rather than explicitly requiring it to be only "Down". The more you know, etc...

 

Other features that may be helpful

  • Use dependencies! Orion is smart enough to know the implicit dependencies of, say, CPU and Memory on the Host in which they are found, but site or application-level dependencies are just a little bit trickier for Orion to guess. The Dependencies feature allows you to create relationships between groups of devices so that if the 'parent' group is down, alerts from the 'child' group can be automatically suppressed. This is another way to achieve downstream alert suppression at a fairly granular level.
  • Time-based monitoring may help for sites where the cleaner unplugs the server every night (or the system has a scheduled reboot), for example.
  • Where appropriate, consider using the "Condition must exist for more than <x> minutes" option within Trigger Conditions to avoid getting an alert for every little blip in a system. This theoretically slows down your notification time, but can help clear out transient problems before they disturb you.
  • Think carefully about where each alert type should be sent. Which ones are pager-worthy, for example, versus ones that should just be sent to a file for historical record keeping?

 

Performance and Capacity Monitoring

  • Baselining. As I discussed in a previous post, if you don't know what the infrastructure is doing when things are working correctly, it's even harder to figure out what's wrong when there's a problem. This might apply to element utilization, network routing issues, and more. This information doesn't have to be in your face all the time, but having it to hand is very valuable.

 

BUT!

 

Everything so far talks about how to handle alerting when events occur. This is "reactive" monitoring, and it's what most of us end up doing. However, to achieve true inner peace we need to look beyond the triggers and prevent the event from happening in the first place. Obviously there's not much that can be done about power outages or hardware failures, but in other ways we can help ourselves by being proactive.

 

Proactive monitoring basically means preempting avoidable alerts. SolarWinds software offers a number of features to forecast and plan for capacity issues before they become alerts. For example, Virtualization Manager can warn of impending doom for VMs and their hosts; Storage Resource Monitor tracks capacity trends for storage devices; Network Performance Monitor can forecast exhaustion dates on the network; User Device Tracker can monitor switch port utilization. Basically, we need to use the forecasting/trending tools provided to look for any measurement that looks like it's going to hit a threshold, check with the business to determine any additional growth expected, then make plans to mitigate the issue before it becomes one.
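The forecasting those products do can be approximated crudely with a linear trend. The sketch below fits a straight line to evenly spaced utilization samples and estimates how many days remain before a threshold is hit; it is a back-of-the-envelope illustration, not how any particular SolarWinds product calculates its forecasts.

```python
"""Back-of-the-envelope capacity forecast: fit a straight line to evenly spaced
daily samples and estimate days until a threshold is crossed. Illustrative only."""

def days_until_threshold(samples, threshold):
    n = len(samples)
    xs = range(n)
    mean_x, mean_y = (n - 1) / 2, sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    slope = num / den
    if slope <= 0:
        return None                    # flat or shrinking usage: no exhaustion in sight
    intercept = mean_y - slope * mean_x
    return (threshold - intercept) / slope - (n - 1)   # days past the last sample

if __name__ == "__main__":
    disk_used_pct = [61, 62, 64, 65, 67, 68, 70]       # one sample per day (example data)
    print(f"~{days_until_threshold(disk_used_pct, 90):.0f} days until 90% full")
```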

 

Hating Our Monitoring

 

We don't have to hate our monitoring. Sadly, the tools tend to do exactly what we tell them to, and we sometimes expect a little too much from them in terms of having the intelligence to know which alerts are important, and which are not. However, we have the technology at our fingertips, and we can make our infrastructure monitoring dance, if not to our tune (because sometimes we need something that just isn't possible at the moment), then at least to the same musical genre. With careful tuning, alerting can largely be mastered and minimized. With proactive monitoring and forecasting, we can avoid some of those alerts in the first place. After all -- and without wishing to sound too cheesy -- the best alert is the one that never triggers.

For the unprepared, managing your agency’s modern IT infrastructure with all its complexity can be a little scary. Evolving mandates, the constant threat of a cyber-attack, and a connected workforce that demands access to information when they want it, where they want it, place more pressure on the government’s IT professionals than ever. And at the heart of it all is still the network.

 

At SolarWinds we know today’s government IT pro is a Bear Grylls-style survival expert. And in true Man vs. Wild fashion, the modern IT pro needs a network survival guide to be prepared for everything.

 

Assess the Network

 

Every explorer needs a map. IT pros are no different, and the map you need is of your network. Understanding your network’s capabilities, needs, and resources is the first step of network survival.

 

Ask yourself the following questions:

 

  • How many sites need to communicate?
  • Are they located on the intranet, or externally and accessed via a datacenter?
  • Is the bulk of my traffic internal, or is it all bound for the Internet? How about any key partners and contractors?
  • Which are the key interfaces to monitor?
  • Where should deep packet inspection (DPI) agents go?
  • What is the scope and scale of what needs to be monitored?

 

Acknowledge that Wireless is the Way

 

What’s needed are tools like wireless heat maps to manage over-subscribed access points and user device tracking tools that allow agencies to track rogue devices and enforce their BYOD policies. The problem is that many of these tools have traditionally been cost-prohibitive, but newer options open doors to implementing these technologies you might not be aware of.

 

Prepare for the Internet of Things

 

The government can sometimes be slower to adopt new technology, but agencies are increasingly experimenting with the Internet of Things. How do you overcome these challenges? True application firewalls can untangle the sneakiest device conversations, get IP address management under control, and get gear ready for IPv6. They can also classify and segment your device traffic; implement effective quality of service to ensure that critical business traffic has headroom; and of course, monitor flow.

 

Understand that Scalability is Inevitable

 

It is important to leverage capacity forecasting tools, configuration management, and web-based reporting to be able to predict and document scalability and growth needs so you can justify your budget requests and stay ahead of infrastructure demands.

 

Just admit it already—it’s All About the Application

 

Everything we do is because of and for the end users. The whole point of having a network is to achieve your mission and support your stakeholders. Seek a holistic view of the entire infrastructure, including the impact of the network on application issues, and don’t silo network management anymore.

 

A Man is Only as Good as His Tools

 

Having sophisticated network monitoring and management tools is an important part of arming IT professionals for survival, but let’s not overlook the need for certifications and training, so the tools can be used to effectively manage the network.

 

Revisit, Review, Revise

 

What’s needed to keep your network running at its peak will change, so your plans need to adapt to keep up. Constantly reexamine your network to be sure that you’re addressing changes as they arise. Successful network management is a cyclical process, not a one-way journey.

 

Find the full article on Federal Technology Insider.
