
Geek Speak

omri

What Does APM Mean to You?

Posted by omri Employee Apr 29, 2015

Many of us have heard the term APM, and just as many are confused as to what it truly means. Gartner put together their own definition – you’ll find it here. When writing this post I was thinking about 2 of the 5 “functional dimensions” Gartner outlines as making a proper APM solution. These are:

 

  • Runtime application architecture discovery, modeling, and display: As I like to think about it, this means discovering and providing useful metrics on the full range of paths that an application can take between software and hardware in your environment as part of proper execution.
  • User defined transaction profiling: Similar to the above, but more focused on how real users are using the application, and thus the paths their actual application requests take through the same hardware and software topology. Think about it like choosing those paths which are most critical to your actual users. This allows the solution to provide real-time metrics surrounding user experience and satisfaction.


We all have a number of web-based apps running in our IT world, many of whose topology we may not fully understand. Capabilities such as the above can be helpful in identifying, troubleshooting, and isolating the cause of user issues with those apps. After all, the issue could be anywhere in the various layers of our environment, and no one wants to play a guess-and-check game across servers, databases, and everything else while valuable time is lost. At the same time, we’re all curious how exactly users are interacting with our environment, but we don’t possess the sixth sense to tell us just which DB has the most users hitting it at any given time. With those two capabilities combined, you can begin to imagine being given visibility into which issues are impacting the most users at any given time (and thus where the most helpdesk tickets will come from), as well as the components along their application request path that could be the root cause – all in real time.

 

In thinking about all the above, I’m curious about the following:

  1. Do you or your company currently use any software today that helps provide this sort of information?
  2. If you do, what sort of problems has it helped you to solve? With what type of applications? What do you feel is missing?
  3. Would knowing your various applications’ topology be interesting to you or your company? How about the real user transaction paths within those same applications?
  4. What sort of problems do you think could be solved if you knew all of that?
  5. In general, do you wish you knew more about how your real user’s actions affect the IT environment you work hard to monitor and maintain?

 

Don’t be shy and comment! I know I’m not the only one that struggles with this and, after all, misery loves company.

In 1999 and 2000 I worked as Tier 3 UNIX Support for a colocation company. One of our largest customers had recently moved in so much new equipment that there wasn’t enough power in one rack. In what was to be a temporary solution, a power cord was left running across the aisle about two feet off the floor.

 

One day as I was going into the data center with one of our Tier 2 folks, the inevitable happened – his foot caught the cord as he stepped over it, pulling it free and crashing the customer’s server. He scrambled to plug it back in as quickly as possible, but I stopped him and instead had us find a longer cord which we then ran underneath the raised floor.

 

The outage triggered an automatic trouble ticket, which was assigned to me. My manager advised me to list the root cause as “momentary loss of power”. This in turn triggered a series of daily 8:30AM “post mortem” meetings as the customer – quite reasonably – wanted to know why it had taken so long to get the server back on line. I was instructed to not say anything that could make it appear that the outage was our fault in any way.

 

After six weeks, I couldn't take it anymore and, with my manager waving me off, I said, “I tripped over your power cord.” I then proceeded to tell the customer what had happened, making it sound like I had been alone in the data center.

 

The customer’s first response was, “Oh. You should have just said so in the first place.”

 

With the explanation that we had set things up so no one could trip over the cord again, the customer was satisfied with the resolution.

 

This incident led me to determine the three things any customer wants to know when there’s an incident:

  1. What happened.
  2. What you did to fix it.
  3. What you’re doing to make sure it doesn't happen again.

 

If you can provide (and follow through on) these three things, you’ll have satisfied customers nearly every time.

 

What are the key things you do to ensure your customers are happy with the resolution of their problems?

Based on some of the responses to my last post, it is good to see that there are some who take backing up their network configurations seriously. I would like to build on that post and discuss some ideas around network configuration management, specifically solutions and automation to handle some of the tasks required. Several of you stated that you use SolarWinds NCM, which I personally use in my environment. SolarWinds NCM is extremely easy to set up and configure to “save your bacon,” as one comment put it. NCM is a good example of a solution that can track changes, roll back changes, and handle reporting and auditing, along with many other use cases. There are also many open source products I have used previously, including the routerconfigs plugin for Cacti, simple TFTP jobs set up on each router/switch to back up nightly, and rConfig, another open source product that is a very solid solution.

However, even with a backup solution in place, how do you make sure that each and every network switch and router is in a consistent state, and is actually configured for nightly scheduled backups of its config? Do you still do this manually, or do you use automation tools such as Ansible, Chef, or Puppet to stamp out configurations? I have personally begun the journey of using Ansible to build playbooks that stamp out configurations in a consistent manner, as well as to create dynamic configurations using templates. This is also a great way to ensure that when a network device fails, you have a solid way to rebuild it from a reasonably consistent state. I would still leverage a nightly backup to restore configurations that may have drifted since the automation tool was deployed, but hopefully, as changes are made, your automation templates are also updated to reflect those changes.
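If you want a stop-gap before (or alongside) a full NCM deployment, rolling your own scheduled backup is not much code. The following is a minimal sketch, not any product's mechanism: it assumes Python with the paramiko library, devices that accept “show running-config” over a plain SSH exec channel, and placeholder hostnames and credentials you would replace with your own (ideally pulled from a vault rather than hard-coded).

```python
# Minimal sketch: pull a device's running config over SSH and save a
# timestamped copy. Assumes paramiko is installed and the device accepts
# "show running-config" over an SSH exec channel; hostnames and credentials
# below are placeholders.
import datetime
import pathlib

import paramiko

DEVICES = [
    {"host": "sw-core-01.example.net", "username": "backup", "password": "changeme"},
]
BACKUP_DIR = pathlib.Path("config-backups")


def backup_device(device):
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(device["host"], username=device["username"],
                   password=device["password"], look_for_keys=False)
    try:
        _, stdout, _ = client.exec_command("show running-config")
        config = stdout.read().decode()
    finally:
        client.close()

    BACKUP_DIR.mkdir(exist_ok=True)
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
    target = BACKUP_DIR / f"{device['host']}-{stamp}.cfg"
    target.write_text(config)
    return target


if __name__ == "__main__":
    for dev in DEVICES:
        print(f"Saved {backup_device(dev)}")
```

Run from cron or a scheduled task each night, something like this gives you the basic safety net; a dedicated tool like NCM or rConfig layers change tracking, rollback, and reporting on top of it.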

If you are heading to Chicago next week for Microsoft Ignite, come by booth #543 and say hello to me, kong.yang, and patrick.hubbard. We'd love to talk with you about, well, anything! I'm looking forward to lots of conversations about databases, performance tuning, and bacon (not necessarily in that order).

 

Click here to let us know you’re going to be at MS Ignite and see a full schedule of our activities.

 

See you there!

Brien Posey

Come Meet me at Ignite!

Posted by Brien Posey Apr 28, 2015

Microsoft Ignite is quickly approaching and although I am not going to be presenting a session this year, I am going to be doing an informal presentation on May 4th in the SolarWinds booth at the exhibit hall.

 

My presentation will focus on the challenges of server virtualization. As we all know, server virtualization provides numerous benefits, but it also introduces a whole new set of challenges. As such, I plan to spend a bit of time talking about some of the virtualization related challenges that keep administrators up at night.

 

I don’t want to give away too much information about my presentation before Ignite even starts, but I will tell you that some of the challenges that I plan to talk about are related to IP address management. Microsoft has tried to make things better for its customers by introducing a feature in Windows Server 2012 and Windows Server 2012 R2 called IP Address Management (IPAM).

 

The Windows Server IPAM feature has its good points and it has its bad points. In some situations, Microsoft IPAM works flawlessly. In other situations, it simply is not practical to use Microsoft IPAM. Unfortunately, some administrators probably do not find out about the IPAM feature’s limitations until they have taken the time to install and configure IPAM. That being the case, I plan to spend some time talking about what you can realistically expect from Microsoft IPAM.

 

So if you are interested in learning more about IP address management, or if you just want to meet me in person then please feel free to stop by the SolarWinds booth (# 543) for my presentation on May 4th at 2pm.  I am expecting a big crowd, so you may want to arrive early.

 

 

Click here to let us know you’re going to be at MS Ignite and see a full schedule of our activities.

 

 

 


Are you attending Microsoft Ignite in Chicago from May 4th-8th? Come by Booth #543 for the swag -- IT's most-sought-after buttons, stickers, and t-shirts. Stay for the in-booth demos and conversations highlighting the breadth and depth of the SolarWinds IT management product portfolio. Chat with SolarWinds subject matter experts -- Product Managers, System Engineers, and Head Geeks -- patrick.hubbard, Microsoft MVP sqlrockstar, and me.

  • Bring your IT management problems and challenges because we're bringing our IT management know-how.
  • Learn best practice tips for application stack management from the app to the virtualization and web layers extended through the physical servers and storage arrays connected via the network.
  • Delve into the IT learning experience via the SolarWinds in-booth theater presentations.
  • [UPDATED] Play SysMan Hero for fabulous prizes.
  • [UPDATED] Click here to let us know you're going to be at MS Ignite and see a full schedule of our activities.


Ignite will showcase Microsoft's vision in mobility and cloud. Expect exciting sessions and discussions on Windows 10, Windows Server vNext, and Microsoft Azure that include hot tech trends such as DevOps, security, hybrid cloud, containers, and big data analytics. These sessions highlight continuous application delivery and continuous service integration. And SolarWinds can certainly help IT pros manage the continuous app cycles with connected context.


 

Bonus round:

We invite you to meet one-on-one with Michael Thompson, SolarWinds Director of Systems Management Strategy. Mike can share SolarWinds' vision for helping IT Pros drive and deliver the performance that their business or organizations need as well as discuss your individual IT management needs. If you are attending Microsoft Ignite and would like to speak with Mike for 30 minutes, please RSVP today.


We look forward to meeting and talking to you there!

The need for disruptive innovation is driving businesses to seek better, faster, and cheaper options to internal IT. This coercive effort to find the best-fit technology is putting the squeeze on IT departments, IT budgets, and IT pros. Furthermore, new technologies are disrupting older, monolithic technologies and IT processes at a higher frequency and a grander scale. Alternatives are coming in with more velocity, more critical mass, or both. This abundance of choice is putting IT pros in a scramble-to-come-up-to-speed-on-the-changing-tech-landscape-or-find-yourself-out-of-the-IT-profession mode.

 

IT generalists may find themselves on the endangered list. A generalist is an IT pro with limited knowledge across many domains. Think broad, but not deep in any IT area. Generalists are being treated as replaceable commodities with reduction in force either through automation and orchestration or through future tech constructs like AI or machine learning.

 

But all is not lost. There are two paths IT pros can take to differentiate themselves from their fellow generalist clones: IT versatilist or IT specialist.

 

IT versatilists are embracing the challenges by leveraging their inner IT force—born of experience and expertise to meet the speed of business. IT specialists are emerging out of the data center shadows to show off the mastery of their one skill in the era of big data and security ops.

 

A versatilist is fluent in multiple IT domains. This level of expertise across many technology disciplines bridges the utility between IT ops and business ops. They are sought after to define the processes and policies that enable automation and orchestration at speed and scale.

 

A specialist is a master of one specific IT discipline. It can be an application like databases or a domain like security, storage, or networking. Businesses are looking to them to transform big data into profit-turning, tunable insights and actions. Businesses are also looking to them to protect their data from all the connected entities in the Internet of things.

 

So which IT pro are you? The revolution has already started—hybrid cloud, software-defined data center, big data, containers, microservices, and more. Have you made preparations to evolve into a specialist or a versatilist? Awaken your inner IT force, there’s plenty of opportunities to expand your IT career. Paid well, you will be.

Security is an area to which every organization should give the utmost priority. Ideally, every employee, from the end user to top-level management, is educated about the impact of a network security failure. To that end, organizations spend significant capex on securing the network. Despite all the investment in intrusion detection devices, firewalls, and access control rules, hackers and their threats continue to succeed: data is stolen, critical services are brought down, and malware manages to sneak into secured networks.

 

Akamai released their fourth quarter “State of the Internet” report last month, which provides valuable insights into, well…obviously, the state of the Internet! The security section of the report discusses the top originating country for attack traffic (no points for guessing), the most targeted port, and information about DDoS attacks.

 

As per the report, the most targeted port for attacks is the good old Telnet port. In fact, port 23 has remained the most targeted port for the third consecutive quarter, and attacks against it have increased to 32%, up from 12% in Q3 2014! This is despite the fact that most enterprises I know have shifted from Telnet to SSH to enhance security. The attacks can mostly be attributed to bots trying their luck at finding devices with port 23 open and then using the default username and password, or to brute-force attempts to gain access to the target network.


[Figure: top ports targeted by attack traffic. Source: Akamai State of the Internet report]


While the data in the report reminds the network admin not to leave unused ports open, it also shows that HTTP and HTTPS, both of which are open in most enterprise networks, are targeted as well. And the port used to target your network might not be port 23 or any of the top 10 ports listed; it could be a different, random port that you left open inadvertently or had to leave open to facilitate a business service. Of course, it is not possible to block all ingress traffic originating from the WAN to your network.

 

Firewalls and Intrusion Detection/Prevention Systems (IDS/IPS) enhance your network’s security and are a necessity. But they may not successfully protect your network every time. To name a few examples, everyone remembers what happened to Sony, Home Depot, and Target! These organizations definitely had security measures in place to protect against malware and other threats, but despite their efforts, the breaches still occurred. This shows that malware and other network threats are getting smarter every day, and that traditional security methods using firewalls and IDS/IPS alone are not sufficient. The workaround?

 

A New Security Layer:

 

In addition to firewalls and intrusion detection systems, add a 3rd layer of security that can detect threats and attacks that have breached your defense. A layer that looks at the behavior of network traffic to detect anomalies, such as malware, hacking, data theft, and DDoS attacks.

 

With Network Behavior Anomaly Detection or NBAD, it is possible to detect anomalies that get past the firewall and IDS/IPS systems. NBAD tracks traffic behavior and alerts you if there is unusual or out of the ordinary activity. For example, traffic originating from invalid IP addresses, traffic on one port from one system to many, TCP or UDP packets whose size is less than the least expected value, etc., are all network behavior anomalies. NBAD is further enhanced, when individual systems in the network are monitored for behavior anomalies.

 

Enterprises can get started with NBAD on their own using traffic flow data, network performance data, and log analysis.

 

Flow technologies, such as NetFlow, sFlow, J-Flow, or IPFIX, carry information about IP conversations with details like source and destination IP addresses, ports, protocol, volume, number of packets, etc. This data can then be used to track behavior anomalies, such as bursts of packets, traffic from invalid IP addresses, malformed packets, and so on.
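To make the idea concrete, here is a small sketch of one such behavior check: flagging a source that suddenly talks to an unusually large number of distinct destinations on a single port, a pattern typical of scanning or worm propagation. It assumes flow records have been exported to a simple CSV (src_ip, dst_ip, dst_port); in practice you would feed it records from your NetFlow/IPFIX collector, and the threshold is an arbitrary starting point to tune.

```python
# Rough sketch of a flow-based anomaly check: flag any source IP that
# contacts an unusually high number of distinct destinations on one port.
# Flow records are assumed to be exported to CSV with a header row of
# src_ip,dst_ip,dst_port; a real NBAD engine would read them from a
# NetFlow/IPFIX collector instead.
import csv
from collections import defaultdict

FANOUT_THRESHOLD = 100  # distinct destinations per (source, port) pair


def find_fanout_anomalies(flow_csv_path):
    destinations = defaultdict(set)
    with open(flow_csv_path, newline="") as f:
        for row in csv.DictReader(f):
            key = (row["src_ip"], row["dst_port"])
            destinations[key].add(row["dst_ip"])

    return [
        {"src_ip": src, "dst_port": port, "distinct_destinations": len(dsts)}
        for (src, port), dsts in destinations.items()
        if len(dsts) >= FANOUT_THRESHOLD
    ]


if __name__ == "__main__":
    for alert in find_fanout_anomalies("flows.csv"):
        print("Possible scan or propagation:", alert)
```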

 

Network performance data can also help discover network anomalies. If there are sudden voice call drops, they could be due to fully utilized links, which in turn could point to a DDoS attack.

 

While flow based analysis of traffic is the most widely used method for NBAD, log analysis from various elements in the network including user systems can add value to network behavior analysis. With a log analysis tool that analyzes logs and extrapolates information based on correlation, the admin can pin-point the source of threats within the network and take preventive measures before major damage occurs.

 

While you are still waiting to find a dedicated NBAD tool that really does what you need, leverage existing technologies and tools for your own network behavior analysis engine. So, what are you starting with? NetFlow or log analysis?

Any Help Desk is going to have to handle repeat requests (hopefully not always from the same customer – although the number of repeat requests per customer is definitely a metric worth tracking…). There are a few things you can do to keep “repetitive” from becoming “mind-numbing boredom,” while improving the level of service provided to your customers.


The first, and most obvious, thing to do is automate. Many common requests received by the Help Desk can be automated through scripts. Requests like password resets, creating new accounts, permission changes, and provisioning new resources can all be automated.
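To give a sense of how small such a script can be, here is a hedged sketch of an automated password reset for a Windows domain account. It assumes a domain-joined Windows host, an account with rights to reset passwords, and the built-in net user command; the account name is purely illustrative, and most shops would wrap something like this in their ticketing or identity tooling rather than run it by hand.

```python
# Sketch of a scripted password reset for a Windows domain account.
# Generates a temporary password and calls the built-in "net user" command;
# assumes it runs on a domain-joined Windows host with sufficient rights.
import secrets
import string
import subprocess


def temp_password(length=14):
    alphabet = string.ascii_letters + string.digits + "!@#$%"
    return "".join(secrets.choice(alphabet) for _ in range(length))


def reset_domain_password(username):
    new_password = temp_password()
    subprocess.run(
        ["net", "user", username, new_password, "/domain"],
        check=True,
    )
    return new_password


if __name__ == "__main__":
    # Hypothetical account name, for illustration only.
    print("Temporary password:", reset_domain_password("jdoe"))
```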


Who will write these scripts? I’ve always found that this is a great job for the administrators who wish they were spending less time doing break-fix. Not only will they welcome the chance to keep their skills sharp by writing some code, but they’re also most likely the folks with the most knowledge of what needs to be accomplished to meet the particular request.


The second thing to do is document. Not all repeat tasks can be easily automated. Creating “How To” documents or “run books” for your Help Desk staff can make their jobs easier and keep your customers happier. The keys to successful documentation are:

  • Keep it centralized
  • Keep it up to date
  • Make it easily and quickly searchable


A lot of organizations find that setting up an internal Wiki serves this purpose well.


Handling repetitive Help Desk tasks well, whether through automation, documentation, or training, serves both IT and its customers. Help Desk feels more valued if they can help a customer quickly without having to transfer them to someone else. Customers feel better taken care of if the first person they speak to can assist them immediately. Lastly, the fewer repeat tasks that get pushed to other staff, the more time they have to focus on improving the overall environment.


How does your organization handle repetitive requests? What tasks do you see as good candidates for automation?

Recently, the Security Team here at SolarWinds conducted a survey to gather information about the security risks you felt would be most detrimental to your network. While it was clear that the external threat will always be a risk, there was a lot more confidence in perimeter defense systems, policies, and procedures. On the flip side, there was also a significant increase in the belief that the INTERNAL threat is a much higher risk.


The following infographic provides several simple tips that can help reduce the risk of insider abuse. Below you will also find some additional best practices that you can use to create a more secure user environment.

 

 

 

1. CREATE STRONG PASSWORDS/PRACTICE PASSWORD HYGIENE

  • Configure and enforce the use of strong passwords - while your users and customers may become grumpy, your leadership and compliance auditors will breathe a sigh of relief.
  • Educate your users on the importance of passwords to create buy-in. One of the most effective ways to drive a point home is to show them how easy it is to crack simple passwords: get permission from management and run a live attack on sample passwords. The “shock and awe” factor can be a pretty effective method.
  • Use SIEM or Log Management tools to monitor and alert on odd password sets/resets, such as strange times of day or too many accounts being changed at once. This can be an early indicator of both brute force and low and slow attacks.

 

2. KEEP YOUR INBOX SAFE

  • User education is also extremely important when it comes to email.  Providing real-life examples of phishing emails would be a good way to help your user base gain a simple understanding of how emails can be used to gather information.  Most importantly, encourage them to ask questions! The old adage “If it’s too good to be true...it probably is” is a good mantra to remember when preaching email security.
  • Email content scanners are essential for scanning attachments and emails for embedded code, while SIEM and Log Management tools can also be used to monitor logs for suspicious authentications events. Look for someone logging on to another user’s inbox, “send as” events against critical inboxes, port 25 traffic that does NOT source from your email server(s), or an abnormal amount of traffic that is in fact coming from your internal email server(s).

 

3. KEEP SECURITY TOP OF MIND

  • The Department of Defense provides a decent model for creating a security culture with education tools like emailed “Security Tips”, required online or classroom based self-paced security courses, and enforcing a “Clean desk” policy. This type of consistency in education keeps users aware even if they only pay attention to half of the material, and builds accountability - to use an old military quote, users will begin to “police their own” and hold their peers responsible for a secure environment.

 

4. KEEP YOUR DEVICES SECURE

  • It’s absolutely imperative that systems and applications are kept up to date on updates and patches. Take it just a bit further and use the operating system or domain policies to limit a remote user’s capabilities within a system. This is not popular and can be difficult to manage, but the alternative is much more frightening. Once a system leaves the mother ship, the security risk grows exponentially. Once again I will mention user education (notice a theme here?). Hammering home the fact that this shiny new, expertly provisioned laptop is not a “personal device” is key to reducing the security risk.

 

5. AUDIT WHO HAS ACCESS

  • Auditing is one of the most crucial tools/features, if not the most crucial, and it should be enabled in every environment. Some of the key logs that should be audited are:
    • Access logs – Monitoring successful/failed logons at the domain AND local level can alert you to authentication-based attacks by looking for the use of privileged accounts at odd hours or a large number of failed logon attempts from the same account, and can also provide critical information for root cause analysis and forensics (a minimal sketch of this kind of check follows this list).
    • File Activity – Native operating system audit policies, File Integrity Monitoring applications, and content scanners all create audit trails on file servers and endpoints that can be used to detect data theft and suspicious file changes. In many cases these tools may also alert you to zero-day viruses and other malware.
    • Network, System, and Application logs – These logs can not only identify perimeter attacks, but also reveal outbound FTP traffic that can indicate data theft or malware, as well as critical error and change information that may alert you to site hacking, malware, and denial of service attacks sourcing from INSIDE the network.
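As promised above, here is a minimal sketch of the failed-logon check from the access-logs bullet. It assumes a Linux-style auth log with standard OpenSSH “Failed password” lines and an arbitrary alert threshold; a Windows environment would query event ID 4625 from the Security log instead, and a SIEM would run this kind of correlation for you continuously.

```python
# Sketch of a simple access-log audit: count failed logons per account and
# source, then flag anything above a threshold. Assumes an OpenSSH-style
# auth log ("Failed password for <user> from <ip>"); the log path and
# threshold are assumptions to adapt.
import re
from collections import Counter

FAILED_LOGON = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")
THRESHOLD = 10  # failed attempts before we raise a flag


def audit_failed_logons(log_path="/var/log/auth.log"):
    failures = Counter()
    with open(log_path, errors="replace") as log:
        for line in log:
            match = FAILED_LOGON.search(line)
            if match:
                user, source_ip = match.groups()
                failures[(user, source_ip)] += 1

    return [(user, ip, count) for (user, ip), count in failures.items()
            if count >= THRESHOLD]


if __name__ == "__main__":
    for user, ip, count in audit_failed_logons():
        print(f"{count} failed logons for '{user}' from {ip}")
```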

 

The risk of attacks and breaches only grows with the introduction of Bring Your Own Device (BYOD) mobile devices so implementing the right tools, policies and procedures now just might create the proper security culture within your business.

 

Avoid some of the cybersecurity pitfalls. Secure your environment with Log & Event Manager. Get started for free.

In an earlier blog, we discussed the state of technology adoption by IT organizations in various industries. Now, we’ll look at an IT budget report from Spiceworks and understand how $$$ will be spent in 2015. In this report, 600 IT pros were surveyed across NA and EMEA countries about their IT budget plan in 2015.

 

IS IT BUDGET GOING TO INCREASE IN 2015?

One-third (33%) of surveyed IT pros expect an increased spend from 2014 to 2015, while 44% feel the IT budget is not likely to change. As for the size of IT staff/headcount, 26% feel 2015 will see an increase, and 67% say there won’t be a change from 2014.

 

ALLOCATION OF IT BUDGET

Top 4 areas of spending:

  1. Hardware projects (41%)
  2. Software projects (33%)
  3. Hosted/Cloud-based services projects (12%)
  4. Managed services projects (10%)

 

Let’s dive a level deeper and look at the spend categories for each of these segments.

[Figure: State of IT – budget spend categories for each segment, from the Spiceworks report]

It’s not surprising to see desktops, servers, and operating systems coming ahead in the budget share. Email hosting and Web hosting are ahead of the curve in the Cloud investment segment. Virtualization, bagging the third spot in software spend, goes to show the increase in demand for the technology. (See The 2015 Top 10 IT Pro-dictions.)

 

You can view the full report here: http://www.spiceworks.com/marketing/it-budget-report/

 

Where do you spend most of your IT budget? What’s your perspective about IT spending at large in organizations?

Do you treat network configuration management with the same care and love as you do server backups in your environment? Or are both equally neglected? No, this is not a post about backups; that would be too boring, right? Still, I would be willing to bet that some environments put network configuration management as low on the priority list as backups.

Let’s start with something as simple as backing up the configuration of a network switch or router. Are you thinking, “Well, why would I ever do that?” Let’s hope you are not one of those organizations, but rather one that can answer, “We back up all of our routers and switches nightly.” If you do not back up the configurations of your network devices, how would you recover in the event of a device failure? Hopefully not by trying to remember what the configuration was and rebuilding a replacement device from scratch; that would be far too time consuming and almost guaranteed to fail.

Now, what about tracking changes to your network devices? Do you keep track of them in some sort of change management solution, or at the very least in a central syslog repository? That way you can be alerted when a change occurs, and you can identify a change that may have caused a network outage. Having a good change management solution in place for network devices is absolutely crucial for any environment. Without one, you will be left wishing you had one WHEN disaster strikes. So, how do you handle change management in your environment?
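Before you answer: if nightly backups are already landing in a directory, one lightweight first step toward change tracking is simply to diff the two most recent copies for each device and raise an alert when they differ. The sketch below does that with Python’s standard difflib; the directory layout and file-naming convention are assumptions carried over from a simple backup job, not features of any particular tool.

```python
# Sketch: detect configuration changes by diffing the two most recent
# backups of a device. Assumes backups are saved as
# config-backups/<hostname>-<timestamp>.cfg by a nightly job, so that
# lexical sorting of filenames matches chronological order.
import difflib
import pathlib

BACKUP_DIR = pathlib.Path("config-backups")


def latest_two_backups(hostname):
    files = sorted(BACKUP_DIR.glob(f"{hostname}-*.cfg"))
    return files[-2:] if len(files) >= 2 else []


def config_diff(hostname):
    pair = latest_two_backups(hostname)
    if len(pair) < 2:
        return []  # not enough history to compare
    older, newer = pair
    return list(difflib.unified_diff(
        older.read_text().splitlines(),
        newer.read_text().splitlines(),
        fromfile=older.name, tofile=newer.name, lineterm="",
    ))


if __name__ == "__main__":
    changes = config_diff("sw-core-01.example.net")
    if changes:
        print("\n".join(changes))  # feed this into email or syslog alerting
    else:
        print("No configuration change detected.")
```

The diff output can be mailed or pushed to syslog so a change made outside your process shows up the next morning.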

Two recent thwack conversations touched on the theme of change and generated tremendous engagement from community members. The first article, Moving Network Configuration Management Forward by lindsayhill, discussed network configuration management that enables higher reliability at the speed of business, and what is holding up the move forward. The second discussion, Why are IT people afraid of change? by optwpierce, covered a situation where an IT pro ran into IT inertia: other organizations refused to adopt a more efficient method.

 

I’ve mentioned before that the only guarantee in IT is that something will break. I’d like to amend that: the only guarantee in IT is that something will change. People, process, and technology will change. Accordingly, IT professionals need to adjust to those dynamics as well.

 

So why doesn’t IT just change? To understand the angst against change, you have to understand what changes in IT and what drives those changes. IT change management used to be characterized by IT operations only, i.e., the configurations associated with systems, applications, storage, networks, and software. IT received an annual budget that it could spend as it saw fit to support all of the business projects. Because of the steady, constant nature of the IT budget, IT could plan on a regular cadence for procurement, deployment, testing, and integration.

waves-change.jpg

Unfortunately, many times this did not meet the business requirements or the business opportunity window. This opened the door for IT-as-a-Service from someone other than IT. And guess what, others can sometimes do it better, faster, and cheaper.


So two things emerged against change: process and people. Sentiments like “We’ve always done it this way,” “It’s not broken, why fix it?” or “If we automate and orchestrate those responsibilities, what will I be doing?” arose in IT organizations. However, these excuses can’t hold off the impending IT transformation. The business is mandating that IT operations match the speed and the scale of services that it wants to use. IT change management needs to evolve to incorporate business operations alongside its day-to-day IT operations.

 

The threat to IT budgets and IT professionals is clear: either deal with change management effectively and efficiently, at scale and high velocity, or be completely disrupted out of a career. I highly recommend reading the aforementioned articles and the associated comments. And share how you and your IT organization are dealing with change in the comments section below.

At some point IT will be asked to make a business case for some expense related to the Help Desk. It could be to justify hiring new staff, laptop upgrades, training, or to avoid the axe in a time of tightening budgets. When that time comes, for whatever reason, you’ll want to be able to show the return on investment (ROI) of the Help Desk expenses.


At first, this seems like it should be easy enough to show – simply calculate the cost of the Help Desk, create some metrics to calculate the value of what the Help Desk provides to its customers (customers, not users), and then demonstrate that the second value is greater than the first.
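To make that first step concrete, here is a back-of-the-envelope version of the calculation. Every number in it is an assumption chosen purely for illustration, and getting people to agree on those numbers is, as discussed next, exactly where the trouble starts.

```python
# Back-of-the-envelope Help Desk ROI: compare the cost of running the desk
# to the value of the productivity it preserves. Every input here is an
# illustrative assumption; agreeing on these numbers is the hard part.
ANNUAL_HELP_DESK_COST = 250_000    # staff, tools, training (assumed)
TICKETS_PER_YEAR = 8_000           # tickets resolved per year (assumed)
HOURS_SAVED_PER_TICKET = 1.5       # customer time saved per ticket (assumed)
VALUE_PER_CUSTOMER_HOUR = 45       # loaded hourly value of a customer (assumed)

value_delivered = TICKETS_PER_YEAR * HOURS_SAVED_PER_TICKET * VALUE_PER_CUSTOMER_HOUR
roi = (value_delivered - ANNUAL_HELP_DESK_COST) / ANNUAL_HELP_DESK_COST

print(f"Estimated value delivered: ${value_delivered:,.0f}")
print(f"ROI: {roi:.0%}")
```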


In practice, this is often extraordinarily difficult for two reasons. The first is that it’s actually hard to get people to agree on the value of the Help Desk metrics that are easy to measure. Some examples:

  • Number of tickets/cases handled: Some will argue that a high number proves the value of Help Desk, while others maintain that it shows the environment is too fault-prone or difficult to use.
  • The cost of the customer-hours of productivity that Help Desk saves: It’s difficult to get people to agree on the monetary value of a customer-hour or on how much time a particular Help Desk action saved.
  • Average time to ticket/problem resolution: A low number here is an obvious sign of a good Help Desk, but does a closed ticket mean the problem is actually resolved?


The second reason is that the things you actually want to measure turn out to be really difficult to put a number to. What you really want to know is:

  • Are our customers happy with the service they’ve received?
  • Are our customers more productive because of us? If yes, how much more productive?
  • Was the issue actually resolved satisfactorily, or did the customer simply work around it?


What metrics are you tracking for your Help Desk? What do you wish you could track?

In my previous blogs, I provided an overview of thin provisioning and discussed moving from fat to thin with thin provisioning. Now, I’d like to talk about over-allocation, or over-committing, of storage space. Over-committing storage helps enhance application uptime and makes storage capacity management simpler.

 

What is over-committing of storage?

When VMs are assigned more storage than is actually available, it is known as over-allocation. With this mechanism, applications start their operation seeing more storage than was actually assigned. For example, say 3 applications need 50 GB each to start operation. With over-committing, all 3 can start their operation with just 50 GB of total physical storage. The remaining 100 GB or more can be added to the storage array as utilization of the existing 50 GB increases. This way, available storage arrays are utilized appropriately.

 

Some advantages of over committing storage are: 

  • It cuts down capital expenditures (capex), since very little storage space goes unused.
  • It allows you to dedicate more storage to VMs than is actually available.
  • Flexibility of storage: there is no hard storage limit; more volume can be added as and when needed.
  • No trouble forecasting storage growth: it helps you avoid having to predict volume growth accurately.

 

Some disadvantages include:

  • Applications halt or crash: When the disk group(s) reach an over-committed state (physical storage is 100% utilized), applications have no free disk to store processed or collected data, causing them to crash.
  • Adding free capacity can be time consuming: Manual interventions, like adding disk drives, are needed to increase free capacity in the disk group(s), and manual interventions take time.
  • Chances for errors: There is a high chance for errors, such as when freeing storage by deleting unwanted files or VMs that are no longer needed, which can cause the loss of a file that contained required data.
  • Rogue applications: A rogue application can completely bring down the storage by rapidly consuming the free space. Just imagine a rogue application sharing the same disk group as a business-critical application, such as CRM, ERP, etc.

 

To get the best out of over-committing with thin provisioning and avoid unnecessary risk, it’s important to be prepared. Always keep a close eye on your storage; monitoring makes over-committing and thin provisioning much easier to manage. Furthermore, manage your storage or datastore by setting alerts for over-allocation, so you quickly receive an SMS or email before something goes wrong. Finally, be sure to set your alert threshold at a sensible percentage, so you have enough time to add more disk to the existing storage volume.
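As a simple illustration of the monitoring side, the sketch below computes the over-commit ratio and physical utilization for a datastore and decides whether an alert should fire. The capacities, the 80% threshold, and the notification hook are all assumptions to adapt to your environment; in practice these numbers would come from the array or the hypervisor rather than being hard-coded.

```python
# Sketch: check a thin-provisioned datastore for over-commitment risk.
# Capacities are in GB and are illustrative assumptions; a real check would
# pull these numbers from the storage array or the hypervisor.
PHYSICAL_CAPACITY_GB = 50
PROVISIONED_TO_VMS_GB = 150         # total space promised to VMs
PHYSICAL_USED_GB = 42
ALERT_THRESHOLD = 0.80              # alert when 80% of physical space is used


def check_datastore():
    overcommit_ratio = PROVISIONED_TO_VMS_GB / PHYSICAL_CAPACITY_GB
    utilization = PHYSICAL_USED_GB / PHYSICAL_CAPACITY_GB
    print(f"Over-commit ratio: {overcommit_ratio:.1f}:1")
    print(f"Physical utilization: {utilization:.0%}")
    if utilization >= ALERT_THRESHOLD:
        # Hook your SMS or email notification in here.
        print("ALERT: add capacity before the disk group fills up")


if __name__ == "__main__":
    check_datastore()
```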

I hope this three-part blog series provided you with some helpful tips and information. If anyone has questions regarding thin provisioning, feel free to ask in the comments section.
