Monday, December 29, 2008

Abstracted Administration

[The latest version of this post can be found on the ControlTier Wiki]

The idea that drives the ControlTier project is "abstracted administration". In this paradigm, you think about and develop management processes using an abstracted view, one that is independent of any particular physical node or software deployment. What drives the need for abstracted administration is rising scale and complexity. Abstracted administration is paramount in a world where:
  • the host environment is varied in kind and size (e.g., different scales in heterogeneous environments and also the growing use of "elastic" virtual machine infrastructure)
  • management processes are becoming more distributed and more easily impacted by application and environmental differences (e.g., multi-step procedures that execute across the network and across different tools)
This project supports the idea of an abstracted administration paradigm, one where you manage distributed application services and their management processes through a simplified, more standardized and abstracted view. Within this paradigm, an administrator has the choice to focus on managing operations from a higher level, and let the underlying framework coordinate operations across the actual physical environment. Of course, one can also choose to manage things at a much finer level of granularity, performing management activity on a particular host, which still remains important.

* The ControlTier project is not a VM technology. ControlTier is a technology that lets you deploy and control services hosted on operating system instances (virtualized or not).

Abstracted administration framework

Within this paradigm, the framework
  • Lets you manage many deployments, either through an abstracted, logical structure or at any distinct point within it, by modeling your process as objects in the abstracted deployment model.
  • Wraps around your scripts to produce workflow endpoints that can be combined to execute distributed multi-step processes.
Within the administration framework, your operations processes become more uniform and therefore more reusable, with environment-specific parameters and settings externalized in a collaboratively maintained model, shared across support teams. Within this paradigm exists a hierarchy of types that lets you break down operation processes into several standard layers, each focused on a specific aspect. Aspects exist for package building, staging and deployment, service run state control, mediated execution, and more.

Abstracted Nodes

For command execution, this project aims to help you abstract the physical Nodes in your environment. Commands ultimately execute on some host, but in large-scale environments, it's cumbersome to specify the particular hosts. The ControlTier software provides a couple of ways to abstract node infrastructure:

  • Node tags and attributes: The ctl-exec command lets you execute ad hoc commands by addressing target Nodes using tags and attributes rather than lists of hosts. This is a convenient way to manage host groups, and ctl-exec supports inclusion and exclusion filters to identify any subset of hosts. ctl-exec also supports parallel execution, which is important when you need to execute actions simultaneously across a large number of hosts.
  • Command dispatching: The ctl command lets you define reusable commands targeted at individual service management control actions, or at coordinated distributed actions addressed logically via a Site. The command dispatcher looks up the nodes on which commands should be executed and invokes them remotely when necessary. In this case, one stays focused on managing service actions without having to think about nodes.

Abstracting nodes from your procedures is the first step towards abstracted administration. When nodes vary between environments or when they are based on VMs and can be rescaled at any time depending on conditions, your scripts will not have to be changed to redefine node targets.
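
To make the idea of tag-based targeting concrete, here is a minimal sketch in Python. It is not how ctl-exec is implemented and it does not use ctl-exec's options; the Node class, the select_nodes function, and the host names and tags are all hypothetical, chosen only to illustrate inclusion and exclusion filters over a node inventory.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """Hypothetical inventory entry: a host name plus a set of tags."""
    hostname: str
    tags: frozenset = field(default_factory=frozenset)


def select_nodes(nodes, include=(), exclude=()):
    """Return nodes that carry every include tag and none of the exclude tags."""
    include, exclude = set(include), set(exclude)
    return [n for n in nodes if include <= n.tags and not (exclude & n.tags)]


# Illustrative inventory; these hosts and tags are made up.
inventory = [
    Node("web01", frozenset({"web", "production"})),
    Node("web02", frozenset({"web", "qa"})),
    Node("db01", frozenset({"database", "production"})),
]

# Address "all web nodes that are not in QA" without naming any host.
targets = select_nodes(inventory, include=["web"], exclude=["qa"])
print([n.hostname for n in targets])
```

If the inventory changes (say, a new VM tagged "web" comes online), the call that selects the targets stays the same; only the model data changes.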

Abstracting the Service

One of this project's primary goals is to provide a service management interface that lets you forget about Nodes during operation.
  • Long-running application components are called Services in ControlTier. You abstract your services by exposing all the physical environment differences in the Service's object model. Doing this lets you define your service management code in an abstracted way. During execution your procedures are bound to environment-specific views.
  • ControlTier's Site provides the management interface that lets you logically control a set of services, be they on one machine or many. Application components combine to form a distributed application. Where these components are hosted depends on the environment. For example, in development or QA they may all reside on one node, while in production they may be spread over many.
Exposing logical control of the many parts of a service is a further step towards abstracted administration, since at this level of abstraction not only are Nodes abstracted but so are the individual application deployments that comprise the integrated service.
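
As a rough illustration of binding one logical command to environment-specific views, consider the following Python sketch. It is an assumption-laden toy, not ControlTier's object model or dispatcher: the MODEL dictionary, the dispatch function, and every environment, service and node name in it are hypothetical.

```python
# Hypothetical model: where each service is deployed, per environment.
MODEL = {
    "qa": {"web": ["qa01"], "db": ["qa01"]},
    "production": {"web": ["web01", "web02"], "db": ["db01"]},
}


def dispatch(environment, command, services=None):
    """Resolve a logical command into (node, service, command) invocations.

    The caller names an environment and a command; the nodes are looked up
    in the model rather than written into the procedure.
    """
    deployments = MODEL[environment]
    names = services if services is not None else sorted(deployments)
    return [(node, svc, command) for svc in names for node in deployments[svc]]


# The same logical request expands differently in each environment.
qa_plan = dispatch("qa", "restart")
prod_plan = dispatch("production", "restart")
print(qa_plan)
print(prod_plan)
```

In QA both services resolve to a single node, while in production the same request fans out over three nodes, yet the calling code is identical.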

Abstracting the process

There are several service management life cycle activities common to any application service: build, stage, install, update, stop, start, configure, check, roll back, etc. Of course these activities vary depending on operating system, application platform, or environment. The last aim of the ControlTier project is this: simplify operations by obscuring environment differences that impact procedures.
  • ControlTier includes a standard set of types, each responsible for carrying out one of the life cycle steps. You can also expose your own procedures in place of the standard implementation.
  • Service management processes are carried out over multiple steps across different machines. Depending on the environment, those steps can run on different machines. ControlTier workflows allow you to define and execute processes independent of the environment, making them more reusable.
  • Besides abstracting location, service management processes can also be executed sequentially or in parallel without any code modifications. ControlTier workflows allow you to define a thread count in the object model to control parallel or sequential execution.
By exposing life cycle activities as service management workflows reusable across environments, another level of abstracted administration is achieved.
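
The thread count idea can be sketched in a few lines of Python. This is only an illustration of switching between sequential and parallel execution through a setting rather than a code change; run_step, restart and the service names are hypothetical and do not reflect how ControlTier workflows are implemented.

```python
from concurrent.futures import ThreadPoolExecutor


def run_step(step, targets, thread_count=1):
    """Apply step to every target; a thread_count of 1 means sequential."""
    if thread_count <= 1:
        return [step(t) for t in targets]
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        # Executor.map returns results in the order of the inputs.
        return list(pool.map(step, targets))


def restart(service):
    """Stand-in for a real life cycle action."""
    return f"restarted {service}"


services = ["app1", "app2", "app3"]

# Same step, same targets; only the thread count setting differs.
sequential = run_step(restart, services, thread_count=1)
parallel = run_step(restart, services, thread_count=3)
print(sequential == parallel)
```

The procedure itself (restart) never changes; the degree of parallelism is just data that could live in a model alongside the other environment settings.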

What drives the ideas behind abstracted administration is the successive layering of abstractions:
  • Abstract the nodes, for better visibility into the services.
  • Abstract the services, to gain better visibility of the management processes.
  • Abstract the processes, for standardized reusable life cycle steps and workflows.
Through the process of abstraction, all the specifics become maintained in an object model and the procedural code consolidates into common libraries. Less code means less maintenance, better re-usability, and further elimination of procedural variation, often the root cause of service management problems.

Tuesday, December 23, 2008

Focus for 2009

Hello All,

We are winding down 2008, after a good year's development that culminated in the current 3.2 release. The features of 3.2 not only evolved and matured the 3.1 functionality but also added several new fundamental capabilities. We are excited to see new 3.2-based solutions.

Besides working with consultants in the service group, we also work with community members, and we have identified several areas that need focussed effort in the coming year:

Vastly improve our documentation. The docs are currently in a woeful state. They are spread out across a hard-to-navigate structure, are written too abstractly, and are not helpful to new users. Here's how we are going to improve them:
  • Consolidate all the docs into one medium and one site.
  • Use a Wiki instead of Forrest. A Wiki is a much more fluid way to keep docs up to date. Also we can easily add community members to help contribute.
  • Make the documentation "How To" oriented: short, focussed explanations of how to use ControlTier software for a typical use case

Merge "Elements" and the ControlTier "base" libraries into a single source code module and build artifact. You may not know this, but Elements is a library of ready-to-use modules interfacing with J2EE and other application infrastructure. It is currently hosted at Moduleforge. To improve out-of-the-box productivity we'll:
  • Consolidate Elements and Base into a single controltier project "seed" and CTL extension
  • Write documentation that will assume the Elements library is already installed
  • Write tutorials revolving around Elements use cases

Improve release process. I admit it... our release process was very erratic. 2008 releases were all driven by some external project schedule and did not serve the community well. Here's how we'd like to change:
  • Roadmap and milestone based scheduling
  • Regular release points
  • Publicize releases better including mailing list, news, freshmeat, etc
  • Change lists
  • Standard use of Duke's Bank demo for QA testing

ControlTier Demo shall work out of the box. ControlTier software is a pretty general purpose process automation system which sounds great until you want to see it actually do something. In 2008, we began using the J2EE tutorial's sample application, "Duke's Bank" in our demos. This demo shows quite a breadth of ControlTier use cases and helps make its functionality and applicability more concrete. It should work out of the box. Here's what we want to do:
  • Simple demo setup that is well documented and quick to set up
  • Tutorial documentation revolves around the Duke's Bank use cases to establish a consistent set of examples
  • Supporting slide presentation that lets you run your own demos when you want to show other people in your own groups

Increase community involvement and support
  • Take better advantage of the Sourceforge features to allow contributions of all kinds. We barely scratch Sourceforge's surface now and there are some useful tools to take advantage of
  • The community really is the best authority on how to move the project forward so we want to better facilitate comments from the community discussion areas
  • Run a ControlTier IRC channel. Sometimes a person wants to ask a quick question or just bounce some ideas around.

Of course, development is not frozen for 2009 but we want to shift priorities to make sure the ControlTier project is healthy and the software usable on its own. We are looking forward to driving these improvements and welcome any feedback or helping hands.

Tuesday, December 16, 2008

Installing ControlTier 3.2 on Windows XP

[Note: These instructions are for installing 3.2 and they point to an archived version of the 3.2 manual. If you are installing a newer version of ControlTier go to the ControlTier wiki]

Here's my cheat sheet for installing the ControlTier framework on Windows XP (soon to be folded into the standard documentation!):
  • Use Firefox with the ControlTier web applications since AJAX compatibility issues preclude using IE
  • Follow the 3.2 framework installation notes on Open.ControlTier
  • Choose a %CTIER_ROOT% directory that doesn't contain spaces (e.g. "C:\ctier"), since spaces are known to break certain standard modules
  • So far as dependencies are concerned:
  1. ControlTier requires Java 1.5. I keep a zipped up version of Sun's JDK that I can unpack into "%CTIER_ROOT%\pkgs" to avoid disturbing the system-wide installation. (There are some issues associated with attempting to install multiple versions of Java on the same system using Sun's Microsoft Installer).
  2. You can find the Graphviz Windows installer on their download page (the latest version when I looked was 2.20.3). I also install Graphviz into "%CTIER_ROOT%\pkgs" since it is usually only required by ControlTier. (Note that I've found myself having to manually sort out the Path system environment variable in order to ensure the correct "dot.exe" is in %PATH% once Graphviz has been installed a couple of times.)
  • Download the Zip distribution of the latest 3.2 release of the ControlTier framework from Sourceforge. This was version 3.2.4 at the time of writing this posting. (When installing from a local Windows desktop or from an RDP session I prefer to use the Jar installer; in all other circumstances I use the Zip installer.)
  • In preparing these notes I ran through the installation (taking default values) using the Zip installation method successfully, however ...
  • ... with this release the Jar installer (which attempts to establish Jetty as a Windows service) is broken.
  • In order to start the ControlTier server:
  1. I started a new command shell and ran the "ctier.bat" script that the installer placed in my user's home directory ("C:\Documents and Settings\Anthony Shortland").
  2. I executed "%JETTY_HOME%\bin\start.bat"
  3. I picked up the ControlTier server's "Welcome" page at "http://localhost:8080" and from there launched Jobcenter, Workbench, ReportCenter and Jackrabbit ...
  4. ... authenticating as "default/default" as required.
  • In order to populate the default Workbench project with a useful set of modules, I downloaded the latest (3.2.4) release of the Elements Module Library seed Jar from the Sourceforge "Moduleforge" project ...
  • ... and loaded it via Workbench's Admin page "import seed" dialog.

At this point I had a working ControlTier framework installation, ready to push ahead and try some of the tutorials on Open.ControlTier.

Anthony Shortland.

Sunday, December 14, 2008

Keeping Workbench trim and fit!

Under the covers the Workbench model data is stored in a set of RDF XML files using the Jena Semantic Web Framework.

At the lowest level, this means that a set of files exists (by default) under "$CTIER_ROOT/workbench/rdfdata" on your ControlTier server for each project you create in Workbench:

$ cd $CTIER_ROOT/workbench/rdfdata
$ ls -lh
total 16M
-rw-rw-r-- 1 anthony anthony 6.1M Dec 8 10:53 Arch_UModules_UPioneerCycling
-rw-rw-r-- 1 anthony anthony 491K Dec 5 17:33 Arch_UObjects_UPioneerCycling
-rw-rw-r-- 1 anthony anthony 6.1M Dec 8 10:53 Arch_UTypes_UPioneerCycling
-rw-rw-r-- 1 anthony anthony 809 Dec 5 14:34 Arch_UXforms_UPioneerCycling
-rw-rw-r-- 1 anthony anthony 490K Dec 8 10:53 Map_UPioneerCycling
-rw-rw-r-- 1 anthony anthony 1022K Dec 8 10:53 Modules_UPioneerCycling
-rw-rw-r-- 1 anthony anthony 56K Dec 5 17:33 Objects_UPioneerCycling
-rw-rw-r-- 1 anthony anthony 1.2M Dec 8 10:53 Types_UPioneerCycling
-rw-rw-r-- 1 anthony anthony 1.4K Dec 5 14:35 Workbench
-rw-rw-r-- 1 anthony anthony 809 Dec 5 14:34 Xforms_UPioneerCycling

A given set of files has the project name appended (in this case "PioneerCycling") and is split into two sets: the primary files and their archives (prefixed with "Arch_").

This would all be largely academic if it were not that managing these files turns out to be critical to the responsive performance of anything but the most trivial projects. It turns out that Jena relies on file-level locking to manage updates and in the process repeatedly copies the entire file to temporary "checkpoint" copies. Of course, at the OS level, the cost of copying files of even tens of MB in size is trivial.

However, streaming the same data through the Jena library turns out to be a significant performance bottleneck, so much so that it really pays to keep the ControlTier repository trim and fit!

The primary way to do this is to navigate to the Workbench administration page, find the "Model Administration (Advanced)" section and run the five file compaction tasks.

This process minimizes the size of the primary data files and can be run as frequently as makes a difference.

Dealing with the archive files is a little more complex.

In normal operation there is no need to track the history of changes to the model so it is reasonable to remove the archive files on a regular basis. The process for achieving this is straightforward:
  1. Shut down Workbench.
  2. Remove the "Arch_" files from $CTIER_ROOT/workbench/rdfdata associated with the required project(s).
  3. Restart Workbench.
There are a few points to note about this process:
  • It is necessary to do this with Workbench stopped as the file set is cached in the JVM's heap and will simply be re-written otherwise.
  • You may wish to skip the "Modules" archive file since removing it invalidates Workbench's notion of the most recent ("head") version of the packaged modules on the WebDAV, requiring that you repackage all Deployment and Package modules, which is quite a lengthy process.
  • With a project with a stable type model, it is really only the "Objects" archive file that has an impact on performance, and so it may only be necessary to remove this file.
  • As a rule of thumb, only worry about files that are > ~20MB.
  • There have been cases where we've set this process up in cron (since Jobcenter/Ctl/Antdepo require Workbench to be available for normal operation).
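
For anyone scripting the archive cleanup, here is a minimal Python sketch of the selection logic described above. It is a sketch under assumptions, not a supported tool: the function names are hypothetical, the "Arch_U<Kind>_U<Project>" naming pattern is inferred only from the directory listing shown earlier, and the script does not stop or restart Workbench, which you must still do yourself around the removal.

```python
from pathlib import Path

# Rule of thumb from the notes above: only worry about files over ~20MB.
SIZE_THRESHOLD = 20 * 1024 * 1024


def stale_archives(rdfdata_dir, project, threshold=SIZE_THRESHOLD, skip=("Modules",)):
    """List a project's "Arch_" files that exceed the size threshold.

    Assumes names of the form Arch_U<Kind>_U<Project>, as seen in the
    listing above. Kinds named in skip (by default "Modules") are left alone.
    """
    prefix, suffix = "Arch_U", f"_U{project}"
    found = []
    for path in sorted(Path(rdfdata_dir).glob(f"{prefix}*{suffix}")):
        kind = path.name[len(prefix):-len(suffix)]
        if kind in skip:
            continue
        if path.stat().st_size > threshold:
            found.append(path)
    return found


def remove_archives(paths, dry_run=True):
    """Delete the given files, or only report them when dry_run is True."""
    for path in paths:
        if dry_run:
            print(f"would remove {path}")
        else:
            path.unlink()
```

A typical run, with Workbench already stopped, would call stale_archives on "$CTIER_ROOT/workbench/rdfdata" for a project such as "PioneerCycling", review the dry-run output of remove_archives, and only then repeat the call with dry_run=False.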
Finally, I should note that this whole issue was much more of a problem under ControlTier 3.1 and that we've done a lot to mitigate its impact on performance under ControlTier 3.2 by eliminating unnecessary model versioning. We have not dealt with the fundamental scaling issues in Jena, so it still pays to be conscious of all this.

Anthony Shortland.