Continuous Integration: Executing Remote Tasks with TeamCity, MSBuild, RemCom, and ExecParse
I spent most of last week at work improving our build process, so I thought I’d share with you, and my future self, how I accomplished this.
First, Why?
We practice continuous integration (CI) at my office. This means we have a server, specifically TeamCity, monitor our source control repository and perform various builds on a clean system (thus avoiding the “it works on my machine!” syndrome). In addition to continuous builds (triggered whenever someone checks code in), we perform full builds at regular intervals throughout the day. We also perform a more exhaustive nightly build, which compiles our suite and then deploys the built suite to our lab server for more processing. This additional processing entails running a program that creates database schemas on the three database platforms we support (SQL Server, Oracle, and Access), performs conversions from one version of the suite to another, and validates the results.
Sounds straightforward enough, right? Well, there’s one really big wrinkle here: our build servers are physical servers located in Texas, our lab servers (an app server and two database servers) are virtual machines located in Florida, and between them is, shall I say, a really crappy network connection. How crappy? Well, if we run the database creation and validation app on our lab app server, the process takes about 45 minutes. If we run the creation and validation app on our build servers, the process takes significantly longer (I’ve been told 4-6 hours). That extended length is not acceptable, as it interferes with our ability to create the MSIs that we use the next day for integration testing and QA.
To get around this, our nightly build process stopped at copying a zip archive of the compiled suite to our lab server. We then relied on a couple of scheduled tasks on the lab server to run batch files that unzipped the suite and ran the database creation and validation app. This worked, but it had a major drawback. If the creation and validation script failed, it sent an email, but only a very small number of the team received that email. Since the CI server had no knowledge of this process, it could not fail the build, and the majority of the team would go on not knowing that the suite was effectively broken in a way that would only be apparent at run-time, and only if we happened to exercise a certain bit of functionality that relied on the database schema being valid.
In other words, the risk of massive wasted team productivity was very high.
Now, How
As I mentioned before, we use TeamCity as our CI server. We build a suite of applications for Windows clients and servers, so we use MSBuild as our build system. MSBuild has a built-in task called Exec, which allows you to execute any command; if the command returns a non-zero exit code, the task is considered to have failed. This is great, except that it executes the command on the current machine (recall that the build server is in Texas, but we need the database app to run on the server in Florida).
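To make that limitation concrete, here is a minimal sketch of a plain Exec call. The target name and the batch file name are placeholders I made up for illustration; they are not from our actual build script.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <!-- Hypothetical target name, for illustration only -->
  <Target Name="CreateDbsLocally">
    <!-- Exec runs the command on the machine that is running MSBuild
         (in our case the build agent in Texas, not the lab server).
         A non-zero exit code from the command fails the target.
         CreateDbs.bat is a placeholder script name. -->
    <Exec Command="CreateDbs.bat" WorkingDirectory="$(MSBuildProjectDirectory)" />
  </Target>
</Project>
```

Nothing in the task lets you name another machine, so anything remote has to go through a command-line tool that Exec launches locally.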
PsExec: A False Start
My first thought for remotely executing commands on a Windows server was to use PsExec from the SysInternals Suite. This worked great, as long as I ran the MSBuild script from my local machine. However, as I discovered after combing the SysInternals forums, PsExec does some funky things with standard input (stdin) and standard error (stderr), such that Java processes (which TeamCity is) seem to hang when calling PsExec. That is, PsExec does not appear to return control to the spawning Java process in the manner that the process expects. So, even though the remote processes would execute properly, PsExec would neither return control to TeamCity nor provide the exit code of the remote process. As a result, not only would the build fail (we configured TeamCity to fail the build if it ran more than 2 hours), but it would fail for the wrong reason, and continue to mask the real reason from the majority of the team.
This would not do, no sir. So I set out in search of an alternative.
RemCom: A New Hope
After much searching, I came across RemCom, “the open source psexec”. RemCom performed the same function as PsExec, but it did so without the funky stdin/stderr redirection. So, I added RemCom to the build process, adjusted the build scripts accordingly, and kicked off a new nightly build in TeamCity. The build finished in about 90 minutes, and it succeeded! Except for one problem: I knew there was still a problem with the databases, so I was expecting it to fail. Sure enough, I checked the logs and found that RemCom had provided all the output of the remote commands (something PsExec never did, btw), including a line printing the exit code returned by the remote command. However, it seems that exit code was not being bubbled up as RemCom’s own exit code. So, we had a situation where RemCom’s output was telling us the remote command failed, but RemCom itself was returning a successful exit code, and so MSBuild saw the step as successful.
Exec Task: The Empire Strikes Back
It was great that I was getting the output of the remote commands, but now I needed to parse it. Unfortunately, the Exec task that comes with MSBuild simply does not provide you with the ability to capture the output and do anything meaningful with it. So, it was time to look elsewhere.
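For comparison, here is a sketch of roughly all that the stock Exec task gave me to work with at the time: the exit code of the local process. The target name and the RemComExitCode property name are illustrative assumptions, and the RemCom command line simply mirrors the shape we use in our script.

```xml
<Project xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <!-- Hypothetical target, for illustration only -->
  <Target Name="CreateDbsWithPlainExec">
    <!-- IgnoreExitCode keeps Exec from failing on its own, so the exit
         code can be inspected afterwards. -->
    <Exec Command="RemCom.exe \\$(DeployServerHost) /d:$(DeployServerLocalPath) $(DeployServerLocalPath)\CreateDbsOnServer.bat $(DeployServerLocalPath)"
          IgnoreExitCode="true">
      <!-- ExitCode is the exit code of RemCom.exe itself, not of the
           remote batch file. RemComExitCode is a made-up property name. -->
      <Output TaskParameter="ExitCode" PropertyName="RemComExitCode" />
    </Exec>
    <!-- In our case RemCom exited with 0 even when the remote command
         failed, so this check never fired. -->
    <Error Text="RemCom exited with code $(RemComExitCode)"
           Condition="'$(RemComExitCode)' != '0'" />
  </Target>
</Project>
```

The real failure signal only appeared in RemCom's console output, which is why I needed something that could read that output.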
ExecParse: The Return of the Exit Code
After a little bit of Googling, I came across ExecParse:
A custom MSBuild task that inherits from the Exec MSBuild task. The task adds a parameter to allow parsing of the output using regular expressions, and reporting to the MSBuild logger (and consequently the VS.Net IDE).
Bingo!
I then added a reference to the task library in our build script and followed the simple directions to create a configuration with a simple regular expression to grab the exit code of the remote command from RemCom.
[ccie_xml tab_size="2"]
<!-- Load the ExecParse task from its assembly -->
<UsingTask AssemblyFile="ExecParse.dll" TaskName="ExecParse.ExecParse" />
<PropertyGroup>
  <DeployServerHost>nasrvfl7004</DeployServerHost>
  <DeployServerLocalPath />
  <!-- An output line matching Search is reported as an error.
       [1-9] requires a non-zero exit code, so "returned 0" does not match. -->
  <ExecParseConfiguration>
    <Error>
      <Search>Remote command returned (-?[1-9]\d*)</Search>
      <Message>Remote command failed with error code $1</Message>
    </Error>
  </ExecParseConfiguration>
</PropertyGroup>
<Target Name="CreateDbsOnServer">
  <Message Text="Creating databases on Server $(DeployServerHost)..." />
  <!-- Fail fast if the required properties are missing -->
  <Error Text="Build and Deploy selected, but the DeployServerHost was not specified" Condition="'$(DeployServerHost)'==''" />
  <Error Text="Build and Deploy selected, but the DeployServerLocalPath was not specified" Condition="'$(DeployServerLocalPath)'==''" />
  <Copy SourceFiles="RemCom.exe;CreateDbsOnServer.bat" DestinationFolder="$(DeployLocation)" />
  <!-- Run the batch file remotely via RemCom and scan its output -->
  <ExecParse
    Command="RemCom.exe \\$(DeployServerHost) /d:$(DeployServerLocalPath) $(DeployServerLocalPath)\CreateDbsOnServer.bat $(DeployServerLocalPath)"
    Configuration="$(ExecParseConfiguration)"
    ErrorCausesFail="true" />
</Target>
[/ccie_xml]
That, right there, was the secret sauce. Now, we can execute remote tasks from our CI build, include their output in the build log, and fail the build if those commands fail. NICE!
Going forward, we’re going to investigate including the remote commands in the full builds, as they run reasonably quickly. We will probably want to investigate adding another build agent, as this would tie up the build server for 90 minutes, thus preventing the continuous builds from running in a timely manner.
Epilogue
Pavel Sher, presumably from JetBrains, contacted me via Twitter and suggested that I try the latest TeamCity EAP. Pavel tells me that PsExec now works with this version (that would be a 5.0 EAP). Though I am not in a position to go upgrading TeamCity right now, I thought I should include this bit of info in case someone reading this is.