
Geek Speak

4 Posts authored by: jdgreen

With the pace of business today, it’s easy to lose track of what’s going on. It’s also becoming increasingly difficult to derive value from data quickly enough that the data is still relevant. Companies often find that by the time the data has been crunched and visualized in a meaningful way, the optimal window for taking action has already come and gone.


One strategy many organizations use to make sense of the vast amounts of helpful data their infrastructure generates is to collect all the logging information from various infrastructure components, crunch it by correlating time stamps and applying heuristics that take relationships between infrastructure entities into account, and present it in a report or dashboard that brings the important metrics to the surface.
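To make the idea of correlating time stamps concrete, here is a minimal Python sketch. It is not how any particular product does it; the host names, log messages, and the 10-second window are all made-up assumptions for illustration. Events from different components that occur close together in time are grouped so they can be reviewed as one incident.

```python
from datetime import datetime, timedelta

# Hypothetical log events (source, ISO time stamp, message); the data is invented.
RAW_EVENTS = [
    ("switch01", "2016-03-01T10:00:02", "port 12 link down"),
    ("esx-host-a", "2016-03-01T10:00:04", "lost path to datastore"),
    ("esx-host-a", "2016-03-01T10:07:30", "vm backup completed"),
    ("switch01", "2016-03-01T10:00:05", "port 12 link up"),
]


def correlate(events, window_seconds=10):
    """Group events that occur within window_seconds of the previous event."""
    parsed = sorted(
        (datetime.fromisoformat(ts), source, message)
        for source, ts, message in events
    )
    groups = []
    for ts, source, message in parsed:
        # Compare against the time stamp of the last event in the last group.
        if groups and ts - groups[-1][-1][0] <= timedelta(seconds=window_seconds):
            groups[-1].append((ts, source, message))
        else:
            groups.append([(ts, source, message)])
    return groups


groups = correlate(RAW_EVENTS)
for group in groups:
    sources = sorted({source for _, source, _ in group})
    print(len(group), "event(s) from", ", ".join(sources))
```

With this sample data, the switch and host events around 10:00 land in one group, which is the kind of relationship a dashboard would surface.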


ELK Stack is one way modern organizations choose to accomplish this. As the name (“stack”) implies, ELK is not actually a tool in itself, but rather a useful combination of three different tools – Elasticsearch, Logstash, and Kibana – hence ELK. All three are open source projects maintained by Elastic. The descriptions of each tool from Elastic on their website are great, so I have opted not to re-write them. Elastic says they are:


  • Elasticsearch: A distributed, open source search and analytics engine, designed for horizontal scalability, reliability, and easy management. It combines the speed of search with the power of analytics via a sophisticated, developer-friendly query language covering structured, unstructured, and time-series data.
  • Logstash: A flexible, open source data collection, enrichment, and transportation pipeline. With connectors to common infrastructure for easy integration, Logstash is designed to efficiently process a growing list of log, event, and unstructured data sources for distribution into a variety of outputs, including Elasticsearch.
  • Kibana: An open source data visualization platform that allows you to interact with your data through stunning, powerful graphics. From histograms to geomaps, Kibana brings your data to life with visuals that can be combined into custom dashboards that help you share insights from your data far and wide.


Put simply, the tools respectively provide fast searching over a large data set, collect and distribute large amounts of log data, and visualize the collected and processed data. Getting started with ELK stack isn’t too difficult, but there are ways that community members have contributed their efforts to make it even easier. Friend of the IT community Larry Smith wrote a really helpful guide to deploying a highly available ELK stack environment that you can use to get going. Given a little bit of determination, you can use Larry’s guide to get a resilient ELK stack deployment running in your lab in an evening after work!


Alternatively, if you’re looking to get going on an enterprise-class deployment of these tools and don’t have time for fooling around, you could consider whether hosted ELK stack services would meet your needs. Depending on your budget and skills, it could make sense to let someone else do the heavy lifting, and that’s where services like Qbox come in. I’ve not used the service myself and I’m not necessarily endorsing this one, but I’ve seen managed services like this one be very successful in meeting other pressing needs in the past.


If you check this out and ELK Stack doesn’t meet your data insight requirements, there are other awesome options as well. There’s also the ongoing debate about proprietary vs. open source software and you’ll find that there are log collection/search/visualization tools for both sides of the matter. If you’re looking for something different, you may want to consider:

In my previous post, I reviewed the 5 Infrastructure Characteristics that will be included as a part of a good design. The framework is laid out in the great work IT Architect: Foundation in the Art of Infrastructure Design. In this post, I’m going to continue that theme by outlining the 4 Considerations that will also be a part of that design.


While the Characteristics could also be called “qualities” and can be understood as a list of ways by which the design can be measured or described, Considerations could be viewed as the box that defines the boundaries of the design. Considerations set things like the limits and scope of the design, as well as explain what the architect or design team will need to be true of the environment in order to complete the design.


Design Considerations

I like to think of the four considerations as the four walls that create the box that the design lives in. When I accurately define the four walls, the design that goes inside the box is much easier to construct. There are fewer “unknowns” and I leave myself less exposed to faults or holes in the design.


Requirements – Although they’re all very important, I would venture to say that Requirements is the most important consideration. “Requirements” is a list – either identified directly by the customer/business or teased out by the architect – of things that must be true about the delivered infrastructure. Some examples listed in the book are a particular Service Level Agreement metric that must be met (like uptime or performance) or governance or regulatory compliance requirements. Other examples I’ve seen are usability/manageability requirements dictating how the system(s) will be interfaced with, or a requirement that a certain level of redundancy must be maintained. For example, the configuration must allow for N+1, even during maintenance.


Constraints – Constraints are the considerations that determine how much liberty the architect has during the design process. Some projects have very little in the way of constraints, while others are extremely narrow in scope once all of the constraints have been accounted for. Examples of constraints from the book include budgetary constraints or the political/strategic choice to use a certain vendor regardless of other technically possible options. More examples that I’ve seen in the field include environmental considerations like “the environment is frequently dusty and the hardware must be able to tolerate poor environmentals” and human resource constraints like “it must be able to be managed by a staff of two.”


Risks – Risks are the architect’s tool for vetting a design ahead of time and showing the customer/business the potential technical shortcomings of the design imposed by the constraints. Listing risks also allows the architect to show the impact of certain possibilities outside the control of either the architect or the business. A technical risk could be that N+1 redundancy actually cannot be maintained during maintenance due to budgetary constraints. In this case, the risk is that a node fails during maintenance and puts the system into a degraded (and vulnerable) state. A less technical risk might be that the business is located within a few hundred yards of a river and flooding could cause a complete loss of the primary data center. When risks are purposely not mitigated in the design, listing them shows that the architect thought through the scenario, but due to cost, complexity, or some other business justification, the choice has been made to accept the risk.
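The N+1 risk above comes down to simple arithmetic, which can be sketched in a few lines of Python. The function names and the 4-node cluster are hypothetical, chosen only to illustrate the point: a cluster sized with one spare node loses its N+1 margin as soon as one node is taken down for maintenance.

```python
def spare_nodes(total_nodes, nodes_needed_for_load, nodes_in_maintenance=0):
    """Return how many more nodes could fail before capacity falls short."""
    return total_nodes - nodes_in_maintenance - nodes_needed_for_load


def meets_n_plus_1(total_nodes, nodes_needed_for_load, nodes_in_maintenance=0):
    """N+1 holds when at least one spare node remains."""
    return spare_nodes(total_nodes, nodes_needed_for_load, nodes_in_maintenance) >= 1


# Hypothetical cluster: 4 nodes purchased, 3 needed to carry the workload.
normal = meets_n_plus_1(4, 3)
during_maintenance = meets_n_plus_1(4, 3, nodes_in_maintenance=1)

print("N+1 in normal operation:", normal)
print("N+1 during maintenance:", during_maintenance)
```

If the budget only allows 4 nodes, the second result is the risk to document: a failure during maintenance leaves no headroom.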


Assumptions – For lack of a better term, an assumption is a C.Y.A. statement. Listing assumptions in a design shows the customer/business that the architect has identified a certain component of the big picture that will come into play but is not specifically addressed in the design (or is not technical in nature). A fantastic example listed in the book is an assumption that DNS infrastructure is available and functioning. I’m not sure if you’ve tried to do a VMware deployment recently, but pretty much everything beyond ESXi will fail miserably if DNS isn’t properly functioning. Although a design may not include specifications for building a functioning DNS infrastructure, it will certainly be necessary for many deployments. Calling it out here ensures that it is taken care of in advance (or in the worst case, the architect doesn’t look like a goofball when it isn’t available during the install!).


If you work these four Considerations (and the 5 Characteristics I detailed in my previous post) into any design documentation you’re putting together, you’re sure to have a much more impressive design. Also, if you’re interested in working toward design-focused certifications, many of these topics will come into play. Specifically, if VMware certification is of interest to you, VCIX/VCDX work will absolutely involve learning these factors well. Good luck on your future designs!

IT infrastructure design is a challenging topic. Experience in the industry is an irreplaceable asset to an architect, but closely following that in terms of importance is a solid framework around which to base a design. In my world, this is made clear by looking at design methodology from organizations like VMware. In the VCAP-DCD and VCDX certification path, VMware takes care to instill a methodology in certification candidates, not just the ability to pass an exam.


Three VCDX certification holders (including John Arrasjid, who holds the coveted VCDX-001 certificate) recently released a book called IT Architect: Foundation in the Art of Infrastructure Design, which serves exactly the same purpose: to give the reader a framework for doing high quality design.


In this post, I’m going to recap the design characteristics that the authors present. This model closely aligns with (not surprisingly) the model found in VMware design material. Nonetheless, I believe it’s applicable to a broader segment of the data center design space than just VMware-centric designs. In a follow-up post, I will also discuss Design Considerations, which relate very closely to the characteristics that follow.


Design Characteristics

Design characteristics are a set of qualities that can help the architect address the different components of a good design. The design characteristics are directly tied to the design considerations which I’ll discuss in the future. By focusing on solutions that can be mapped directly to one (or more) of these five design characteristics and one (or more) of the four considerations that will follow, an architect can be sure that there’s actually a purpose and a justification for a design decision.


It’s dreadfully easy – especially on a large design – to make decisions just because they make sense at first blush. Unfortunately, things happen in practice that cause design decisions to have to be justified after the fact. And if they’re doing things correctly, an organization will require all design decisions to be justified before doing any work, so this bit is critical.


Here are the 5 design characteristics proposed by the authors of the book:


Availability – Every business has a certain set of uptime requirements. One of the challenges an architect faces is accurately teasing these out. Once availability requirements are defined, design decisions can be directly mapped to this characteristic.


For example, “We chose such and such storage configuration because in the event of a loss of power to a single rack, the infrastructure will remain online thus meeting the availability requirements.”
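When teasing out availability requirements, it can help to translate an uptime percentage into a downtime budget. The following Python sketch does that conversion; the targets shown are common illustrative figures, not values from the book.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(availability_percent, hours=HOURS_PER_YEAR):
    """Convert an availability target into permitted downtime over a period."""
    return hours * (1 - availability_percent / 100)


# Illustrative targets only; real requirements come from the business.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% uptime allows {allowed_downtime_hours(target):.2f} hours of downtime per year")
```

Seeing that 99.9% leaves under nine hours a year makes it easier to judge whether a design decision, such as surviving the loss of a rack, is really needed.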


Manageability – This characteristic weighs the operational impacts that a design decision will have. A fancy architecture is one thing, but being able to manage it from a day-to-day perspective is another entirely. By mapping design decisions to Manageability, the architect ensures that the system(s) can be sustainably managed with the resources and expertise available to the organization post-implementation.


For example, “We chose X Monitoring Tool over another option Y because we’ll be able to monitor and correlate data from a larger number of systems using Tool X. This creates an operational efficiency as opposed to using Y + Z to accomplish the same thing.”


Performance – As with availability, all systems have performance requirements, whether they’re explicit or implicit. Once the architect has teased out the performance requirements, design decisions can be mapped to supporting these requirements. Here’s a useful quote from the book regarding performance: “Performance measures the amount of useful work accomplished within a specified time with the available resources.”


For example, “We chose an all-flash configuration as opposed to a hybrid configuration because the performance requirements mandate that response time must be less than X milliseconds. Based on our testing and research, we believe an all-flash configuration will be required to achieve this.”


Recoverability – Failure is a given in the data center. Therefore, all good designs take into account the ease and promptness with which the status quo will be restored. How much data loss can be tolerated is also a part of the equation.


For example, “Although a 50 Mbps circuit is sufficient for our replication needs, we’ve chosen to turn up a 100 Mbps circuit so that the additional bandwidth will be available in the event of a failover or restore. This will allow the operation to complete within the timeframe set forth by the Recoverability requirements.”
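The reasoning behind the larger circuit can be checked with a rough transfer-time estimate. This Python sketch assumes a hypothetical 500 GB restore and an 80% usable-bandwidth factor; both numbers are placeholders, and real throughput depends on latency, protocol overhead, and other traffic on the link.

```python
def transfer_hours(data_gigabytes, link_mbps, efficiency=0.8):
    """Estimate hours needed to move data over a link.

    efficiency is an assumed fraction of the nominal bandwidth that is usable.
    """
    megabits = data_gigabytes * 8 * 1000  # decimal gigabytes to megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


RESTORE_SIZE_GB = 500  # hypothetical amount of data to restore

for mbps in (50, 100):
    print(f"{mbps} Mbps: about {transfer_hours(RESTORE_SIZE_GB, mbps):.1f} hours")
```

Under these assumptions, doubling the circuit halves the restore time, which is the kind of figure to compare against the recovery timeframe in the requirements.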


Security – Last, but certainly one of the most relevant today, is Security. Design decisions must be weighed against the impact they’ll have on security requirements. This can often be a tricky balance; while a decision might help improve results with respect to Manageability, it could negatively impact Security.


For example, “We have decided that all users will be required to use two-factor authentication to access their desktops. Although Manageability is impacted by adding this authentication infrastructure, the Security requirements can’t be satisfied without 2FA.”
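One way to keep every design decision tied to a characteristic and a consideration is to record the mapping and flag anything that has neither. The Python sketch below is only an illustration of that bookkeeping; the class name, fields, and sample decisions are hypothetical and are not taken from the book.

```python
from dataclasses import dataclass, field

CHARACTERISTICS = {"Availability", "Manageability", "Performance", "Recoverability", "Security"}
CONSIDERATIONS = {"Requirements", "Constraints", "Risks", "Assumptions"}


@dataclass
class DesignDecision:
    summary: str
    characteristics: set = field(default_factory=set)
    considerations: set = field(default_factory=set)

    def is_justified(self):
        """A decision is justified when it maps to at least one of each."""
        return bool(self.characteristics & CHARACTERISTICS) and bool(
            self.considerations & CONSIDERATIONS
        )


# Hypothetical decisions for illustration.
decisions = [
    DesignDecision("All-flash storage configuration", {"Performance"}, {"Requirements"}),
    DesignDecision("Use the newest hypervisor release"),
]

unjustified = [d.summary for d in decisions if not d.is_justified()]
print("Decisions still needing a justification:", unjustified)
```

A spreadsheet would do the same job; the point is that a decision with no mapping stands out before any work begins.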



I believe that although infrastructure design is as much an art as it is a science – as the name of the book I’m referencing suggests – leveraging a solid framework or lens through which to evaluate your design can help make sure there aren’t any gaps. What infrastructure design tools have you leveraged to ensure a high quality product?

All industry-changing trends have an uncomfortable period where the benefit of adoption is understood but real-world use is often exaggerated. The way the modern use of containers fundamentally changes the paradigm with which operations folks run their data centers means that the case for adoption needs to be extremely compelling before anyone will move forward.


Also, since change is hard, major industry-shifting trends come with lots of pushback from people who have built a career on the technology that is being changed, disrupted, or even displaced. In the case of containers, there exists a sizeable assembly of naysayers and not shockingly, they generally come from an Operations (and specifically virtualization) background.


To that end, I decided to dig deep into a handful of case studies and interview industry acquaintances about their experiences with containers in production. Making the case that containers can be handy for 2 developers on their laptops is easy; I was curious to find out what happens when companies adopt a container-based data center practice throughout the entire software lifecycle and at substantial scale. Here is what I found.

It’s Getting Better

One of the major challenges many people reported with containerization in the early stages with relation to products like Docker Engine and rkt was that at scale, it was very difficult to manage. Natively, these tools didn’t include any sort of single pane of glass management or higher level orchestration.


As the container paradigm has matured, tools like Docker Swarm, Kubernetes, and Cloud Foundry have helped adopters make sense of what’s happening across their entire environment and begin to more successfully automate and orchestrate the entire software development lifecycle.

Small Businesses Are Last, As Usual

As with other pivotal data center technologies like server virtualization, small businesses are sometimes least likely to see a valuable return by jumping on the bandwagon. Because of their small data center footprint, they don’t see the dramatic impact to the bottom line that enterprises do when making a change to the way their data center operates. While that’s obviously not always the case, my discussions with colleagues in the field and research into case studies seem to indicate that just like all the big shifts before it, full-steam-ahead containerization is primarily for the data centers of scale, at least for now.


One way this might change in the future is software distribution by manufacturers in a container format. While small businesses might not need to leverage containers to accelerate their software development practice, they may start getting forced into containerization by the software manufacturers they deal with. Just like many, many ISVs today deliver their offering in an OVA format to be deployed into a virtualized environment, we may begin to see lots of containers delivered as the platform for running a particular software offering.

Containers are Here to Stay

As much as the naysayers and conservative IT veterans speculate about containers being mostly hype, the anecdotal evidence I’ve collected seems to indicate that many organizations have indeed seen dramatic improvement in their operations, reduced defects, and ultimately seen the impact on their bottom line.


I try to be very careful about buying in to hype, but it doesn’t look like containers are slowing down any time soon. The ecosystem that is developing around the paradigm is quite substantial, and as a part of the overall DevOps methodology trend, I see container-based technologies enabling the overall vision as much as any other sort of technology. It will be interesting to see how the data center landscape looks with regard to containers in 2020; will it be like the difference between virtualization in 2005 and 2015?
