To be effective at IT operations, we embrace these philosophies:
- Trustworthy ecosystem. At a high level the goal is to “keep the lights on.” At a detailed level anyone responsible for IT operations wants to run an IT ecosystem that is sufficiently secure, resilient, available, performant, usable, and environmentally friendly. Part of running a trustworthy ecosystem is monitoring running services to identify potential problems early and, ideally, avoid them before they occur. For some systems, and perhaps for your IT ecosystem as a whole, you may have service level agreements (SLAs) in place with your end users that guarantee a minimum level of trustworthiness.
- Resilient infrastructure. Resiliency requires a focus on the strategic (long-term) over the tactical (short-term). Anyone responsible for IT operations needs a very good understanding of the trade-off between the long-term implications of a decision and its short-term conveniences. For example, a solution delivery team may want to apply what they believe to be the “best” technologies to implement their system. This makes a lot of sense from the narrow viewpoint of that single solution, and it often proves to be incredibly convenient, and fun, for the developers because they get to work with new technologies. However, from an operational point of view you end up with a mishmash of technologies that must be operated and evolved over time, resulting in a potential maintenance nightmare. Yes, you will still make some short-term decisions, but you should do so intelligently. Too great a focus on the long term results in a stagnant IT ecosystem; too great a focus on the short term results in operations teams who spend all their time fighting fires. The long-term technical vision for your organization is developed by your Enterprise Architecture efforts, and the long-term business vision comes from your Product Management activities.
- Standardization without stagnation. The more standardized your IT ecosystem is, the easier it will be to run, to release new functionality into, and to find and fix problems in when they arise. However, too much standardization can lead to stagnation, where it becomes very difficult to evolve your ecosystem. You will need to work very closely with people performing enterprise architecture and product management activities to ensure that you understand the long-term vision and are working towards it.
- Regulated releases. Most DevOps strategies reflect the viewpoint of a single product team. But what about the viewpoint of your overall IT ecosystem, which may comprise hundreds of products? An interesting question to ask is: what is the work-in-progress (WIP) limit for releases across your overall ecosystem? In other words, what rate of change can your infrastructure, and your stakeholder community, bear? In the Disciplined Agile (DA) tool kit this philosophy is an important driver of the Release Management process blade. Furthermore, some regulatory compliance regimes call out a separation of concerns pertaining to release management: the people building a product are not allowed to release the product into production. Someone or something else must make that decision and do the work (even if “the work” is automatically running a script once your regression test suite passes).
- Sufficient documentation. Yes, there will be some documentation maintained about your IT ecosystem. Ideally this documentation is concise, accurate, and high-level. Common documentation includes one or more overviews of your infrastructure, release procedures (even if releases are fully automated, there is still some overview documentation and training), and high-level views of critical aspects of your infrastructure, including security, data architecture, and network architecture. Organizations that operate in regulated industries will of course need to comply with the documentation requirements of the appropriate regulations. When infrastructure components are discoverable and self-documenting there is less need for external documentation, but there is still a need. Any documentation that you do create should be maintained under configuration management (CM) control.
- Automate, automate, automate. Any IT operations process that still has people in the loop should be questioned, examined, and automated wherever possible. This will increase the predictability, resilience, and speed of your operations efforts while reducing overall cost and service-level variance.
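
To make the regulated-releases and automation philosophies above a little more concrete, here is a minimal sketch of an automated release gate. It is an illustration only, not part of the Disciplined Agile tool kit: the names `ReleaseRequest` and `can_release`, the field names, and the specific checks are all hypothetical, and a real gate would pull its inputs from your own pipeline and release-tracking tools.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ReleaseRequest:
    """Hypothetical description of one product release awaiting a decision."""
    product: str
    built_by: str        # person or team that built the product
    approved_by: str     # person or automated gate authorizing the release
    tests_passed: bool   # outcome of the regression test suite


def can_release(request: ReleaseRequest, in_progress: int, wip_limit: int) -> Tuple[bool, str]:
    """Return (allowed, reason) for a release under an assumed policy.

    in_progress: releases currently under way across the whole ecosystem.
    wip_limit:   assumed maximum number of concurrent releases the
                 infrastructure and stakeholders can bear.
    """
    if not request.tests_passed:
        return False, "regression suite failed"
    # Separation of concerns: whoever built the product may not approve it.
    if request.approved_by == request.built_by:
        return False, "builder cannot approve own release"
    # Ecosystem-wide WIP limit on concurrent releases.
    if in_progress >= wip_limit:
        return False, "ecosystem release WIP limit reached"
    return True, "ok"


if __name__ == "__main__":
    request = ReleaseRequest("billing", "team-a", "release-gate", True)
    print(can_release(request, in_progress=2, wip_limit=3))
```

The checks are ordered so that the cheapest, most local failure (a failed test suite) is reported first, and the ecosystem-wide WIP limit is checked last. The right limit and the right approver rules depend on your organization and any regulations that apply to it.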