INFRA-AS-CODE: How do we start?

In the last post we discussed a little bit about what Infrastructure-As-Code (Infra-As-Code or IAC) meant. But we obviously didn’t go very deep into how to begin this journey and that is exactly what we will be covering in this post.

Let’s assume that in most environments today you have some sort of process that is followed to implement change on your network. Be it a new change or a change required to resolve a specific issue that needs to be addressed. But for sakes of conversation here let’s assume that our change is to change from existing manually added static routes to dynamic routing using OSPF. Me being the good network admin/architect that I may be, I bring this up in a weekly meeting and state that we need to change our routing methodology. The reason may be that our network is getting too large and difficult to manage by adding static routes manually everywhere. Your manager and peers listen to your reasoning and decide that it makes sense and for you to come up with a plan to implement this change. So you go back and write up some documentation (including a drawing) and add all of the steps required to make this significant change. The following week you present this to your manager and peers once again and receive the go ahead to proceed. So you put in a change control for approvals and attach your document that you spent all of that time writing up.  Now let’s assume that your manager is not so technical and really doesn’t understand the complexity of this change but proceeds with approving your change and you are set for 2 weeks out. You now go and write up some scripts to execute the changes based on your documentation you spent all the time on writing up (better than copy/paste right?). You soon realize that you will be making changes across 50 individual devices but you are not worried because you have scripted it all (50 scripts?). So at the next meeting you assure everyone that you have all of your scripts written out and all should be good to go. Again, your manager and your peers are good with the process you have put together for this massive change and all agree that this change will occur in 1 week. Now, let’s stop right here and evaluate what just occurred. Does this sound like your environment or environments that you are accustomed to? Are we now doing Infra-As-Code? Not even close. So where did we fall short on this so call Infra-As-Code journey? Everywhere could be easily stated.

So what was missed in the above scenario allowing us to start on this new journey? Let’s start with the most obvious first. Where was the code review and peer review? Didn’t we cover peer review discussing this in the meeting(s) with your peers and manager? Nope. Our peer review would have included our team as well as all other teams which could be affected by this change. Remember, you only know what you know and other teams will have different perspectives and uncover additional questions or concerns, in which will be different than your own. So this is a great example of beginning to tear down the silos of communication on implementing change. Up next would be code review. Who else reviewed your scripts for any typos, missing steps or possibly even something added that could break your network? And if you did employ code review did they understand the actual change? Also where was the testing process and results of those tests to review? This doesn’t even cover version control on your scripts and the fact that scripts themselves may not even be the correct automation tooling for this.

So how do we start this journey? It should actually be quite easy but it does require a completely different mentality than what you may be used to in the past. Our process should look something similar to the following.

  • - New change request – Change from manual static routing to dynamic OSPF for all networks
    • Open discussion
      • § Involve all teams and organizations and discuss the specifics and reasons on why this change should be embraced
      • § Entertain input from others on their questions and/or concerns
        • May also include additional benefits which you may not have thought of
      • § Discuss the specifics on a testing plan and what a reasonable timeline should be
    • Development phase
      • § Obtain a baseline of the state of the current routing topology
      • § Document the processes required to get you to the desired state and also include drawings
      • § Create your automation tasks and begin to version control them
        • Using automation templates vs. scripts
        • Version control – will allow you to document all change phases and ensure that your tasks can be repeatable and consistent. As well as allow you to roll back if needed.
        • Leverage API’s (if available) and/or configuration management tooling
    • Testing phase
      • § Ensure that you are testing in a like environment
        • Can also leverage virtual environments which can mimic your current production design
        • You can also use your configuration management tool to run dry runs of the configurations against production systems (No changes made but will report back on results)
      • § Leverage automation integration testing tools to actually perform configuration changes in your test environment (CI/CD)
        • Builds the testing phase documentation and results which will be used in later discussions
        • Maintains a history of the changes and when or where a failed configuration was experienced
        • Allows for a more readily available way to go back and look over all steps performed
      • § Finalize the automation tasks required to reach the desired state of the change requested
    • Code-review
      • § Leverage a code-review system
        • Ensures that the automation tasks are solid and absolutely no room for error
        • Ensures accountability
        • Ability to send back to the testing phase to correct or add tasks which may have been missed or need to be added
    • Peer-review
      • § Re-engage all teams and organizations which were involved in the open discussion phase
      • § Discuss the documentation and drawings of the current baseline established
      • § Discuss the processes and documentation on how to obtain the desired state of the change request
      • § Discuss the findings and results of the testing phase
      • § Share the phases of testing and present the details of each
      • § Discuss any further questions or concerns
      • § Agree on whether to proceed or not to the original timeline
    • Implementation phase
      • § Leverage the exact same automation integration tools and methodologies to bring your production environment to the desired state
        • This will ensure that the processes are exact. Leaving room for error to a minimum and a repeatable process.

As you can see from the above, there is a fair amount of mentality change that has to occur in order to begin this journey. But remember as each new implementation follows these same methodologies it will become much easier going forward. As well as it will provide a much more consistent and repeatable process for each implementation.

Hope you enjoyed this post and it has proven to be of some use. Up next we will discuss some of the tooling required to provide us the needed abilities discussed in the above phases.

Thwack - Symbolize TM, R, and C