09 June 2016

Snapshotting Git Projects into Monticello Repo (Automatically)

Some time ago I wrote a guide for Pharo and Git(Hub) Versioning [Revision 2], which ends by describing how to create Metacello configurations if you use baselines and git for versioning. While this approach hides your whole git workflow behind the Metacello facade that is familiar to Smalltalkers, the complexity of analysis and management remains high. Let me put it another way, with a concrete example. The RMoD team (especially Pavel Krivanek) is working on the minification and bootstrapping of the Pharo image, and having to manage both Monticello and GitHub projects is a big pain in the neck.

At the beginning of this week I decided to go on a crusade to get the QualityAssistant code into Monticello for Pharo integration while still versioning it with git. I succeeded, but it was painful and took a lot of time. In this blog post I will describe the strategy that I used and the lessons that I learned.


The Story

Now I have a configuration stored in a Monticello repo that points to the Monticello packages of QualityAssistant and Renraku (a sidekick project). It also has a bunch of scripts on the class side that are responsible for updating the Monticello part to the current git version. While the configuration should be present in the latest Pharo 6 image, you can also load it with:
Gofer new
  smalltalkhubUser: 'YuriyTymchuk'
  project: 'ScrapYard';
  package: 'ConfigurationOfFlatQA';
  load

Get the Versions

The main feature of a git baseline is that you can ignore individual package versions, because they are all fixed by a single git commit. Monticello, on the other hand, allows different packages to be at different versions, which is why you need to specify them for each release version.

First of all, I need to get the versions that are currently in the image, i.e. I want to take a snapshot of the current state.
"Collect the MCVersion currently loaded for each package listed in the baseline"
BaselineOfMyProj project versions first packages collect: [ :p |
  | wc versName repo |
  wc := (MCPackage named: p name) workingCopy.
  versName := wc versionInfo name.
  "find a repository of this working copy that actually contains the version"
  repo := wc repositoryGroup repositories detect: [ :r |
    r includesVersionNamed: versName ].
  repo versionFrom: versName ]
*Please note that my code may not be perfect, but this is how I managed to make it work.
As you can see, it takes a lot of effort to obtain the current version object of a package.

Get the main Repo

Now that we have the versions (which originally came from the git repo), we need to put them into some Monticello repo. The Monticello-based configuration, which is responsible for updating itself, is versioned in a Monticello repository that is suitable for all the other packages too. But we cannot simply ask a package "What is your main repo?" or "Where were you loaded from?". No, we have to do it the complicated way:

self package mcPackage workingCopy repositoryGroup repositories detect: [ :repo |
  repo location matchesRegexIgnoringCase:
    '.+smalltalkhub\.com.+', self projectName, '.+' ]
Here projectName returns the hardcoded string 'ScrapYard'.

Generating the Version Methods

We need to generate version methods from the data that we've obtained and a version string that will be provided later. This part is trivial: essentially you turn the version string into a method selector, replacing characters like dots with underscores. Then you add a pragma with the version name and generate the rest of the spec with a list of packages like:
package: 'YourPackage' with: 'YourPackage-Author.555';
populating it with the data from the current package versions. I also override the symbolic stable version to point to the most recently created one. This way I can reuse the configuration in the meta repos.
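The generation step can be sketched as a workspace script that builds the method sources with a stream. This is a minimal sketch, not the actual ConfigurationOfFlatQA code: the version string and the package/version pairs are made-up inputs, the version1_2_3: selector simply applies the dots-to-underscores rule described above, and the compile:classified: calls are left in a comment because they would modify the configuration class.

```smalltalk
| versionString packageVersions selector source stableSource |
"Hypothetical inputs: the new version string (e.g. a git tag) and
 package name -> Monticello version name pairs taken from the snapshot"
versionString := '1.2.3'.
packageVersions := {
  'YourPackage' -> 'YourPackage-Author.555'.
  'OtherPackage' -> 'OtherPackage-Author.42' }.

"'1.2.3' becomes the selector version1_2_3:"
selector := 'version' , (versionString copyReplaceAll: '.' with: '_') , ':'.

"Version method: a <version:> pragma plus one package:with: per package"
source := String streamContents: [ :s |
  s nextPutAll: selector; nextPutAll: ' spec'; cr.
  s tab; nextPutAll: '<version: '; print: versionString; nextPut: $>; cr.
  s tab; nextPutAll: 'spec for: #common do: ['; cr.
  s tab; tab; nextPutAll: 'spec'.
  packageVersions
    do: [ :each |
      s cr; tab; tab; tab;
        nextPutAll: 'package: '; print: each key;
        nextPutAll: ' with: '; print: each value ]
    separatedBy: [ s nextPut: $; ].
  s nextPutAll: ' ]' ].

"Symbolic #stable version pointing at the version just generated"
stableSource := String streamContents: [ :s |
  s nextPutAll: 'stable: spec'; cr.
  s tab; nextPutAll: '<symbolicVersion: #stable>'; cr.
  s tab; nextPutAll: 'spec for: #common version: '; print: versionString ].

"A class-side script of the configuration would then compile both, e.g.:
 ConfigurationOfFlatQA compile: source classified: 'versions'.
 ConfigurationOfFlatQA compile: stableSource classified: 'symbolic versions'."
```

Writing the names with print: lets String>>printOn: produce properly quoted literals, so no manual escaping is needed.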

Copying Versions to the Main Repo

This step is straightforward: we have the main repo and the current versions, so we just do:
repo storeVersion: version
for each of the versions.

Saving and Distributing the Configuration

The last step is to commit the changes you've made to the configuration. Here is the code that does it:
version := self package mcPackage workingCopy
  newVersionWithMessage: 'Finally a new version'
  in: mainRepo.
mainRepo storeVersion: version.
Maybe it's just my bad luck, but #newVersionWithMessage:in: didn't actually upload the version to the repo, although the Monticello browser showed that it was there. I had to call #storeVersion: explicitly to make it work. I also stored the version in some public repos: the Pharo Inbox for integration and the Pharo meta-repos for the catalog. As they are all on SmalltalkHub, I copied the main repo (which holds the username and password) and just changed the location, which deals with the credential issue:
publicRepoLocations do: [ :loc |
  "the copy keeps mainRepo's credentials; only the location changes"
  mainRepo copy
    location: loc;
    storeVersion: version ]
To wrap up:
  • find the main repo, which is needed for many operations
  • obtain the current versions
  • create the configuration methods
  • upload the versions
  • save the latest configuration changes and distribute them

The CI

Although the configuration can update itself automatically, I wanted to trigger this process on a CI server to get it completely off my mind. This was the most time-consuming part, and I want to thank Peter Uhnak both for his time and for his blog post on Deploying Pharo builds from Travis.

I've done this with Travis, and you can look at the config in the QualityAssistant repository. The setup uses the super convenient smalltalkCI framework, with most of the configuration manipulation happening in the (pre-)deploy stage. Here is the exact config that I use for that part:
before_deploy:
  - ".utility/prepareDeploy.sh"

deploy:
  provider: script
  skip_cleanup: true
  script: ".utility/flatDeploy.sh"
  on:
    smalltalk: Pharo-6.0
    condition: "$TRAVIS_OS_NAME = osx"
    all_branches: true
    tags: true
First, let's focus on the deploy:on: section. I specify the exact Smalltalk version and operating system because my build runs a matrix that results in four parallel jobs. I don't want to run four deploy processes in parallel, which is why I limited deployment to a single job by Pharo version and operating system. (Yes, it's a pity, but the deployment still happens if one of the other jobs fails, and that is complicated to cope with on Travis.)
Secondly, it is common to mark the "releases" of a git-baseline project with tags, which is why I run the deployment only when the build is on a tagged commit. It is also why I allow deployment from all branches: if I decide to tag a non-master branch, it should deploy too. This depends on your development workflow, but I may, for example, create a patch version in a hotfix branch and merge it later, while still being able to integrate the patch immediately.

Then there are two scripts: prepareDeploy.sh and flatDeploy.sh. I want to start with the latter :). It does exactly one thing:
$SMALLTALK_CI_VM $SMALLTALK_CI_IMAGE eval \
  "ConfigurationOfFlatQA makeVersion: '$TRAVIS_TAG'"
It runs the configuration scripts with the tag as the new version name.

The preparation is much more complicated. It has two steps: setting the author name and loading the latest version of the configuration package. Setting the author name is trivial:
$SMALLTALK_CI_VM $SMALLTALK_CI_IMAGE eval --save "Author fullName: 'JonDoe'"
Monticello fails to create a new version if you don't do this.

Then you have to load the configuration package, and you want to initialize its repo with a username and a password so that you can commit the changes at the end. You probably don't want to store the credentials in scripts that are open to everyone, which is why you should use Travis encrypted variables. Now you might think that this will work:
Gofer new
  repository: (
    MCSmalltalkhubRepository
      owner: 'YuriyTymchuk'
      project: 'ScrapYard'
      user: '$HUB_USER'
      password: '$HUB_PASS');
  package: 'ConfigurationOfFlatQA';
  load.
I thought so too, but then I spent a couple more hours discovering and fixing a problem. You see, if there is already a repository with the same location, Gofer ignores yours and uses the existing one. And there is one in the image, because the project is integrated from that repository. So in the end I used this ugly hack, but it works:
| repo |
repo := MCSmalltalkhubRepository
  owner: 'YuriyTymchuk'
  project: 'ScrapYard'
  user: '$HUB_USER'
  password: '$HUB_PASS'.

MCRepositoryGroup default repositories
  detect: [ :each | each = repo ]
  ifFound: [ :other | other become: repo ].

Gofer new
  repository: repo;
  package: 'ConfigurationOfFlatQA';
  load.

Conclusions

This whole process got me thinking about the state of the tools we have and the process we follow. Here are short summaries of my thoughts :)

Different Package Versions

As I've mentioned before, configurations are complicated because you can store many different versions of different packages in one Monticello repo. It's a bit like having multiple branches with different code in a git repo and trying to keep them synchronized. Maybe if we stuck to one package per repo, it would be easier.

Ugly Monticello API

All these places where you have to create a new version in a repository and then store it in that same repository… It looks like Monticello was not used much programmatically, so there were few comments and suggestions about its API. I think we should encourage people to use our frameworks in their tools, so that we find out whether our APIs are good.

Too Few Constraints!

Since I started to use git in combination with Monticello, I noticed that they are very similar, but git has many more constraints. For example, it's really hard to remove a version from the middle of the history in git, while in Monticello you can simply delete the zip file. You may say that we shouldn't mess with files… Then think about this: in Monticello you can commit a version x that has version y as its parent, even though version y is not present in the repository. If we had some simple constraints that ensured sane usage of the versioning system, things would be much easier.
The same applies to the fact that we cannot easily obtain the current version object of a loaded package, and we don't know which repository it was loaded from. If the system assumed that every package has a version and that the version came from somewhere, this functionality would exist.

Good UI Bad CLI

Pharo has a truly amazing UI, although we still have much to do. Imagine that you want to commit and the author name is not set: you are presented with a dialog asking for your author name. But what if you run this in headless mode on a CI server? For me it just crashed for no apparent reason, and I spent plenty of time tracking the issue down.

The Idea of Snapshotting

In the end I kind of like it this way. Consider other software projects: there is a source code repository for development, and then you compile and package the code for a release. Putting everything into a Monticello repository is like this compilation and packaging step. I'd still prefer to have it in a single "pack" rather than a bunch of zips in a Monticello bag, but this is the technology that we currently have.
On the other hand, I still think there should be a connection to the versioning system repo, so that you can update to the latest version, fix something, and contribute.


Let me know if you are using some other approaches for MC-git versioning. The crusade is not over!

