Thursday, April 29, 2010

Adventures with chef-server and Gentoo, part 2

Followup to part 1

Once I had chef-server and the webui set up on my dev box, the next step was to create a "stem cell" image with chef already loaded on it. Chef 0.8 uses SSL for its new validation mechanism, so the server creates public/private keypairs for validation. These are encapsulated in a Chef concept called a client. It took me a while to figure out that client was not a good choice of words; it might have been better named access or credentials.

To begin, I needed to set up knife in the environment I intended to do chef development in. In this case, I appropriated the VM running chef-server, but this is not necessary. In fact, I intend to have a dedicated chef-dev VM running in the near future. However, I didn't want to run the script I had set up to provision another Gentoo image from storage: I'd still have to change the hostname, run emerge --sync, get the packages up to date, clone my vim setup ... many of the things I intend for chef to handle. Ideally, I'd add a node with role[dev] and it would eventually give me back a VM with my dev environment set up.
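
In Chef terms, that wish is just a role. Something like this hypothetical one would do; the cookbook names are placeholders for cookbooks I have not actually written yet:

# roles/dev.rb -- hypothetical role; these recipes do not exist yet
name "dev"
description "My development environment: synced portage, up-to-date packages, my vim setup"
run_list(
  "recipe[portage-sync]",   # placeholder: emerge --sync and world updates
  "recipe[vim-config]"      # placeholder: clone my vim setup
)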

Setting up knife works like this:

$ sudo knife configure --initial
Your chef server URL? http://chef-server:4000
Your client user name? hosh
Your validation client user name? chef-validator
Path to a chef repository (or leave blank)? .
WARN: *****
WARN:
WARN: You must place your client key in:
WARN: /home/hhh/.chef/hosh.pem
WARN: Before running commands with Knife!
WARN:
WARN: *****
WARN:
WARN: You must place your validation key in:
WARN: /home/hhh/.chef/chef-validator.pem
WARN: Before generating instance data with Knife!
WARN:
WARN: *****
WARN: Configuration file written to /home/hhh/.chef/knife.rb
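
That knife.rb is just a few lines of plain Ruby. It ends up looking roughly like this, reflecting the answers above; this is a sketch, and your paths and names will differ:

# ~/.chef/knife.rb -- approximately what `knife configure --initial` generates
log_level                :info
log_location             STDOUT
node_name                "hosh"
client_key               "/home/hhh/.chef/hosh.pem"
validation_client_name   "chef-validator"
validation_key           "/home/hhh/.chef/chef-validator.pem"
chef_server_url          "http://chef-server:4000"
cookbook_path            ["./cookbooks"]   # assumption; depends on the repository path you gave it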


I have chef-server added to my /etc/hosts file with its private IP address. With that, I could list cookbooks with knife. The list was mostly empty, which is what I expected. The weird thing was seeing that I had to copy validation.pem over. That's a private key; I don't want it floating around, even if this was all on the private VM cloud on my box, because the process I'm working through now will eventually make its way to production. I wasn't so sure about the idea of dropping that validation.pem private key onto images, but for now I took it on faith until I know this system better and can re-examine the assumptions.

Now, I provisioned another VM, did the emerge --sync, pulled down the chef-overlay, unmasked the chef-client packages, installed chef-client, started it up ... and nothing. It was not appearing in my node list.

The reason was that the node did not have the authentication key. I flailed around in several pieces of online documentation and eventually came across the part that explains that clients are how you create access, and that while they are conventionally tied to a node, they don't have to be. Since I was making a template image that would, ideally, phone home and register itself on first boot, I figured I could make a special access key, call it stemcell.pem, have the image register itself with it, then delete it later.

It turns out that the smart people at OpsCode were already several steps ahead of me. This is exactly what validation.pem is for. They even provide a handy :delete_validation_key recipe to take care of the cleanup. Doh!

I copied validation.pem over, re-ran chef-client on the client machine, and what do you know? It works.
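
For reference, the client side of this is only a few lines in /etc/chef/client.rb. A minimal sketch, assuming the stock paths; the overlay's ebuild may lay things out differently:

# /etc/chef/client.rb on the stem cell image -- a sketch, not the overlay's exact default
log_level              :info
chef_server_url        "http://chef-server:4000"
validation_client_name "chef-validator"
validation_key         "/etc/chef/validation.pem"
# On first run, chef-client registers itself using validation.pem and writes its
# own key to /etc/chef/client.pem; the :delete_validation_key recipe mentioned
# above can then clean up validation.pem.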

Tuesday, April 27, 2010

Adventures with chef-server and Gentoo, part 1

I had been circling around chef-server and Gentoo for a while. There are three ways to install chef-server:

  1. Install it manually

  2. Use chef-solo to bootstrap chef-server

  3. Use system packages



Needless to say, (1) was not fun. And while (2) invokes the mystical recursive experience of Lisp-written-in-Lisp, or maybe the Stand Alone Complex, as someone who is more excited about using Chef than bootstrapping it, I just wanted to get on with it.

Option (3) depends on how good your favorite distro's packages are, and in Gentoo's case they have been somewhat lacking. Fortunately, someone has put together a chef-overlay and updated it for Chef 0.8.10. The instructions in the README work.

What does not work is if you, like me, decide to be clever and use autounmask instead of downloading the gists. After the tenth gem I had to "auto" unmask, I gave up and used the gist version. If you are building a new chef-server on Gentoo, you should too.

We'll see how this experience goes. As I dive deeper into the rabbit hole, I'll post back here. Stay tuned.

Three free Rackspace Cloud backup images? Not so fast.

Biting the bullet, I went ahead and prepared a Gentoo 10.1 "stem cell" image on Rackspace Cloud so I could use it for my nascent infrastructure. I got it built (even though I made the mistake of starting the bootstrap from a 2008.1 image, but that is OK) and had the image backed up. According to the control panel:

To move an image to Cloud Files, click the Move link in the table below. You may keep as many images in Cloud Files as you want. A storage rate of 15 cents per gigabyte will be applied. Bandwidth charges will only be applied on image download.

If you move a scheduled image to Cloud Files, the schedule will continue to save the image in any available With Server slot.


However, I could not find the link that says "Move". So I brought up a chat window and asked about it:

Hosh H: Hi. I am trying to move a cloud server backup image to cloud files.
Hosh H: It says there is a move link but I do not see it
Chris J: let me get you some info about that.
Chris J: I do apologize for any inconvenience, this feature is not fully enabled, if you would like to enable this feature, please know that there is fee of .15 cents per GB for using cloud files backups, and if you are in ORD datacenter all traffic would incur a $0.08/GB in and $0.22/GB out bandwidth charges. This feature is for the whole account and not per server. Once you change over you will not be able to go back to non cloud file backups. How would you like to continue?
Hosh H: Ok, so let me ask some questions
Hosh H: If I change over, I can still do the with-server backups, correct?
Chris J: no its either or.
Hosh H: Thats not what it says in the control panel.
Hosh H: "To move an image to Cloud Files, click the Move link in the table below. You may keep as many images in Cloud Files as you want. A storage rate of 15 cents per gigabyte will be applied. Bandwidth charges will only be applied on image download."
Hosh H: "If you move a scheduled image to Cloud Files, the schedule will continue to save the image in any available With Server slot."
Chris J: I am sorry this system is still very new, some of the wording is not very precise.
Hosh H: Ok, so I just want to make sure; the instruction on the control panel is wrong, is that correct?
Chris J: To the best of my knowledge yes.


Oh well. I was hoping to get three free server backup slots, but I was used to working with Slicehost, where backup slots cost an additional $5/month. This makes more sense from a business standpoint, and besides, I had already decided to use Cloud Files images and eat the associated costs, since I wanted the "stem cell" images to be independent of any attached servers.

I'm sure, though, that other Rackspace Cloud customers may be puzzling over this. We'll see what happens when that ticket to change me over gets completed.

Update: The ticket finally came through, and guess what? I can still create With-Server images, just like the control panel says. Maybe Rackspace is getting all the early adopters used to the idea of having to pay for With-Server images down the road. But hey, this works as I expected, and it works well.

Friday, April 23, 2010

Improving macro registration for Remarkable 4

Since one of the emerging design principles for Remarkable 4 is to shadow Rails 3, and Rails 3's biggest strength is its modularity, I've decided to split Remarkable into components following the Rails 3 modules. For now, we have remarkable (core), remarkable-activemodel, and remarkable-activerecord. I plan for remarkable-rails to be split into remarkable-rack and maybe remarkable-actioncontroller.

With all of this, however, we have now introduced the idea of macro collections depending on other macro collections, and the way these macros are loaded into Rspec does not easily support this proliferation of modules.

Currently, a macro developer has to do this:

Remarkable.include_matchers!(Remarkable::ActiveModel, Rspec::Core::ExampleGroup)


But seriously, do developers need to know they are targeting Rspec::Core::ExampleGroup? This should be a default that can be overridden by people who know what they are doing.

The other thing is that, given the modularity of Rails 3, it makes sense to have sets of macros built on top of other macros. For example, ActiveRecord macros are built on top of ActiveModel macros. Not everyone will want to include the entire Rails stack just to use the macros, and we certainly don't want to expose a target (at least, not without having to dig).

Here is a mock API of what I'm thinking of:

The developer adds this:

Remarkable.register_macros(:activemodel, Remarkable::ActiveModel)

# or, explicitly overriding the target:
Remarkable.register_macros(:activemodel, Remarkable::ActiveModel,
  :target => Rspec::Core::ExampleGroup)

Remarkable.register_macros(:activerecord, Remarkable::ActiveRecord,
  :depends_on => :activemodel)

This makes it easy for plugin developers:

Remarkable.register_macros(:foreigner, Remarkable::Foreigner,
  :depends_on => :activerecord)
Remarkable.register_macros(:paperclip, Remarkable::Paperclip,
  :depends_on => :activerecord)

If you are using only pieces, then you can activate the macros:

Remarkable.activate!(:activerecord)
Remarkable.activate!(:paperclip)

This will idempotently include the macros and their dependencies into Rspec2.

Remarkable::Rails, being the gregarious gem it is, will call:

Remarkable.activate!

... which will load up all of the macros. This would be in keeping with the expected convention for Rails while still providing modularity.
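
To make that concrete, here is a rough sketch of how the registry might work; this is mock code to illustrate the semantics I have in mind, not the actual implementation:

module Remarkable
  class << self
    def macro_registry
      @macro_registry ||= {}
    end

    def activated_macros
      @activated_macros ||= []
    end

    def register_macros(name, mod, options = {})
      macro_registry[name] = {
        :module     => mod,
        :target     => options[:target] || Rspec::Core::ExampleGroup,
        :depends_on => Array(options[:depends_on])
      }
    end

    def activate!(*names)
      names = macro_registry.keys if names.empty?
      names.each do |name|
        next if activated_macros.include?(name)          # idempotent across calls
        activated_macros << name
        entry = macro_registry.fetch(name)
        entry[:depends_on].each { |dep| activate!(dep) } # pull in dependencies first
        include_matchers!(entry[:module], entry[:target])
      end
    end
  end
end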

I've opened up this issue on GitHub to discuss this further.

Thursday, April 22, 2010

Remarkable 4.0.0.alpha1

I finally got Remarkable 4 alpha1 out. Here is how to use it in your project:

gem install remarkable remarkable_activemodel remarkable_activerecord --pre


There is no 4.0.0.alpha1 release for remarkable_rails. Please upvote and comment on this github issue if this is important to you.

To install:

# In your Gemfile
group :test do
  gem 'rspec'
  gem 'rspec-rails'
  gem 'remarkable', '>=4.0.0.alpha1'
  gem 'remarkable_activemodel', '>=4.0.0.alpha1'
  gem 'remarkable_activerecord', '>=4.0.0.alpha1'
end


and add this to your spec/spec_helper.rb:

Remarkable.include_matchers!(Remarkable::ActiveModel, Rspec::Core::ExampleGroup)
Remarkable.include_matchers!(Remarkable::ActiveRecord, Rspec::Core::ExampleGroup)


Please note that you will need rspec 2.0.0.alpha7 to run this, either installed in your system gems or in the bundle.
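
If the bundle is managing it, pinning the prerelease in the :test group above works; the exact constraint here is just my assumption based on the alpha7 requirement:

# In the Gemfile's :test group, replacing the plain gem 'rspec' line above
gem 'rspec', '2.0.0.alpha7'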

Caveats


While I ported over as many of the features of Remarkable 3.3 as I could, some things were dropped:


  • support_attributes and passing attributes into a describe or context block no longer work. Rspec2 provides metadata functionality that allows you to pass metadata into examples and example groups, and this conflicts with the semantics of support_attributes. Quite frankly, I never even knew this feature existed until I attempted this port; it isn't something I ever needed, since I do not write models in a way where their behavior changes drastically based on state. When it does change, I typically write out the descriptions long-hand.

    If there are enough upvotes for this issue, or someone who cares enough decides to submit a patch, we can add this back in. Any implementation will use the Rspec2 metadata, though.


  • pending example groups are baked into Rspec2, so I took them out of Remarkable. The specs for the macros show that they work.

  • I took out the validations that were implemented in ActiveModel and moved them over to a brand-new, shiny remarkable_activemodel gem. You will need both remarkable_activemodel and remarkable_activerecord to use the validations. I plan on making the packaging better so you can easily activate the ActiveModel macros whenever you load in ActiveRecord models. This will allow people to use the standard validation macros against any ActiveModel::Validations-compliant model, which I expect the vast majority of the NoSQL and ActiveResource models to implement; see the sketch below this list.
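
To illustrate that last point, here is what that could look like once the ActiveModel matchers are included in spec_helper.rb as shown above. This is a sketch; Subscription is a made-up model:

# A plain Ruby class with no database behind it
class Subscription
  include ActiveModel::Validations
  attr_accessor :email
  validates_presence_of :email
end

# spec/models/subscription_spec.rb
describe Subscription do
  it { should validate_presence_of :email }
end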



Please post any issues to the Remarkable issue tracker.

Tuesday, April 20, 2010

Taking over the Remarkable gem

Remarkable is an awesome gem. If you use Rspec for a Rails project, this macro framework is certainly something to consider.

The classic way of using Rspec to test your ActiveRecord model looks like this:

class Person < ActiveRecord::Base
  belongs_to :organization

  validates_presence_of :name
  validates_presence_of :email
  validates_uniqueness_of :email
end

# In spec/models/person_spec.rb
describe Person do
  before(:each) do
    @valid_attributes = {
      :name => "Name",
      :email => "email@example.com"
    }
  end

  it "should create a new instance given valid attributes" do
    Person.create!(@valid_attributes)
  end

  it "should validate presence of :name" do
    @person = Person.new(@valid_attributes.except(:name))
    @person.should_not be_valid
    @person.errors.on(:name).should_not be_nil
  end

  it "should validate presence of :email" do
    @person = Person.new(@valid_attributes.except(:email))
    @person.should_not be_valid
    @person.errors.on(:email).should_not be_nil
  end

  it "should validate uniqueness of :email" do
    @original_person = Person.create!(@valid_attributes)
    @person = Person.new(@valid_attributes)
    @person.should_not be_valid
    @person.errors.on(:email).should_not be_nil
  end
end

[ See Gist ]

What this does is create a template model that has all the required attributes. We have an example that makes sure it is valid. Then for every validation, we create a modified attribute set so we can test each attribute in isolation.

This gets old very fast. Trust me, I've written many tests like this since Rspec got popular in 2008.

So how can we make this easier?

We can try using fixtures. The problem with fixtures is that (1) we have to manage our own primary keys and argue with the database, and (2) we would have to load the fixture in, create a clone, and attempt to test each of the validations.

We can try mocking. That would give us isolation when we stub out the mock model, but it's essentially another way of writing valid_attributes. Besides, now we're digging deep into the internals of ActiveRecord.

We can use factories. I like factories; in particular, I currently use Machinist in combination with the Faker and Forgery gems. This gives me the advantage of having a blueprints.rb file I can include in db/seeds.rb to generate data I want to drop into my views. However, we are still essentially testing the ActiveRecord framework.
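
For context, the blueprints file I mean looks roughly like this; a sketch from memory, with the Sham definitions as placeholders:

# spec/blueprints.rb -- a minimal sketch of the Machinist + Faker combination
require 'machinist/active_record'
require 'sham'
require 'faker'

Sham.name  { Faker::Name.name }
Sham.email { Faker::Internet.email }

Person.blueprint do
  name  { Sham.name }
  email { Sham.email }
end

# Person.make gives me a valid, saved record in a spec; the same file can be
# required from db/seeds.rb to drop sample data into development.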

This is where Remarkable comes in. The above Rspec condenses down to:

describe Person do
  it { should validate_presence_of :name }
  it { should validate_presence_of :email }
  it { should validate_uniqueness_of :email }
end

Done.

[ See Gist]

Notice that these macros shadow the ActiveRecord declarations. This is true for all the other macros, including association macros such as should have_many and should belong_to. Instead of writing boilerplate test examples, you focus on writing examples for custom code and really weird business logic.
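
For instance, the belongs_to :organization declaration on the Person model above gets a one-line counterpart:

describe Person do
  it { should belong_to :organization }
end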

Remarkable has come a long way from its beginnings as a "Shoulda for Rspec". Since version 3.0, Remarkable has been refactored into a generic framework for creating Rspec macros, with Remarkable::ActiveRecord and Remarkable::Rails as extensions built on top of it. As a result, we have seen other macro collections: a remarkable for Paperclip, for Mongo, and for DataMapper. There is also an internationalization package that lets you internationalize your specs.

However, with the big refactoring of Rails 3, followed by the big refactoring of Rspec 2, Remarkable needs some work. Carlos Brando, the founding author of the Remarkable gem, has been looking for a new maintainer and has passed the torch to me. As I stated on the Remarkable mailing list and on rails-talk, my plan is to release a Remarkable 4.0 that is compatible with Rails 3 and Rspec 2; however, it will not be backwards compatible with Rails 2 and Rspec 1. Remarkable 3.3 will remain around for that.

If you use Remarkable and plan on using it for your Rails 3 project, please speak up now. We're going to take advantage of the mass refactoring to do some refactoring of our own.

Monday, April 12, 2010

Chef and Gentoo Binary Repository

Ever since I watched Seth Chisamore's presentation on Chef at the Atlanta Ruby User's Group, I have had to rethink the idea of Gentoo Linux in production. The first consequence of declaring your infrastructure as source code is that the virtual machine becomes disposable. Although Chef is distribution-agnostic, you do want to standardize on a single distribution to make debugging easier. And since Chef, not the system administrator, is the primary agent building your servers, the distribution and package manager that work best for disposable servers win.

Now, I like Gentoo. This feeling of "liking" Gentoo is very much a sunk cost fallacy. I cut my teeth on Slackware and Redhat 3 back in the day as a hobbyist, feeding a 486 box 5.25" floppies. Redhat was fun, but hunting down RPMs in the pre-yum days was not. When a friend showed me apt-get on Debian, it solved my problem of having to manually hunt down packages. Very quickly, however, I found out how outdated Debian's packages were. Ubuntu came around and I switched over to that, yet I still found myself wanting to compile from source. At least with RPM-based distributions, source RPMs were easy to work with: there were plenty of examples, and the dev and build environments for RPMs were sensible and practical. The build environment for Debian was horrific nonsense, and Ubuntu was no better. To this day, building a .deb package is deep magic that I have never once succeeded in performing. I ended up compiling from source, like the good ol' days of Slackware.

So when I seriously looked at Gentoo, it solved yet another problem for me: I wanted a package manager that builds from source. Gentoo's Portage system also has some features not found in its ancestor, the BSD ports system, which most Rails developers would know better through Darwin and MacPorts. Portage understands "USE" semantics, which map to the ./configure --enable-XXX flags during the build process. MacPorts has the concept of "variants", but you cannot simply flip them on the way you can USE flags. Furthermore, Gentoo lets you selectively mask and unmask packages. You can also selectively overlay other repositories, so you do not have to wait for the package maintainers to catch up to your bleeding-edge packages. This worked wonderfully back when CouchDB, RabbitMQ, and pals were not readily available anywhere.

USE flags bypass the fundamental flaw of binary-only package managers: the maintainers have to split platforms along feature lines into smaller packages. Debian and Redhat, for example, have to ship separate Ruby and Ruby + SSL packages. For Rubyists, this means splitting the standard library into dependencies outside the control of the Ruby platform.

Masking lets the system administrator control exactly which packages to use. Most people think of Gentoo as a rolling release, and while you can use it that way, what Gentoo actually provides is far more control over exact package versions than any of the binary distributions.

One other common complaint about Gentoo in production is the presence of a compiler. That argument is bunk, though. Most early Rails deployments end up on Ubuntu with build-essential installed, because the binary packages for Ruby are typically old and compiled with pthreads. Granted, if one sticks to the principle that production boxes should not carry a compiler (with the security issues that entails), developers-turned-system-administrators installing a build kit does not make it right; it does mean the problem is not exclusive to Gentoo.

So the biggest upside to Gentoo is that you can compile from source; you can take advantage of the homogeneous Opteron chips on Slicehost and Rackspace Cloud. The biggest downside of Gentoo, though, is that you have to compile from source. In general, I've found that it works well for what I want to do.

Now, enter the Chef.

If the first implication of using Chef is that the VM becomes disposable, does it still matter whether I use Gentoo or not? Certainly, EngineYard deploys their customers' apps on Gentoo, made easier through Chef. But if I wanted a monitoring system to automatically detect downed systems and bring up new VMs on the fly, do I want the infrastructure to recompile all the pieces from scratch? If I wanted to avoid the sunk cost fallacy by having disposable VMs built from infrastructure-as-source-code, wouldn't it make more sense to base this on a binary package distribution instead?

For now, the economics says, "Yes, throw away Gentoo."

Say, though, I wanted to keep the benefits of Gentoo and reduce the time it takes to bring up a clone server. I would have to use pre-built binaries. Gentoo offers binary packages, misleadingly saved with a .tbz2 extension. Up until this past weekend, I had the mistaken impression that these .tbz2 packages were just bzip2-compressed tarballs. In fact, there is metadata appended to the end, extractable using the xpak format (yes, it is obscure and not well known). This is similar to the RPM format, which is essentially a compressed archive with metadata attached. The Gentoo xpak metadata is essentially a dump of what is in the system's package database, and contains things such as the USE flags, build environment, original ebuild, and so on.

Gentoo binary packages are not only machine-specific, they also tend to have dependencies that vary with USE flags. Figuring out a repository scheme for these binaries is a harder problem than for Debian .debs and Redhat RPMs. You can see an example of an attempt from 2006. The author proposes:

http://gentoo.packages.example.com/gentoo/[arch]/[CHOST]/[gcc version]/[glibc version]/[package group]/[package name]/[package version]/file

We're in 2010 now. We're in the year of Rails 3. The Rails community has seen a couple of years of service-oriented architectures, and lightweight web service frameworks such as Sinatra are making a strong showing. JSON is supplanting XML for lightweight apps. Git and Mercurial have made a strong case for using SHA1 hashes. Trying to push all of the unique identifiers for a Gentoo binary package into a URL like the one above sounds laughable today, when we can easily do this:


POST http://gentoo.packages.example.com/gentoo/[arch]/[CHOST]/[package group]/[package name]/[package version]
{
  "USE": [ "threads", "ruby", "-php" ],
  "gcc": ">~4.2",
  "glibc": "~2.10.1-r1"
}


and that gets back


{
  "url": "http://gentoo.packages.example.com/binaries/[arch]/[CHOST]/[package group]/[package name]/[package version]/SHA1HASH.tbz2"
}


And if I let go of the idea that a human being is going to manage this, we can compact the query URL down to a POST against http://gentoo.packages.example.com/gentoo/query and get back the binary's hash.

The same query format can be used to request a build, perhaps by using the PUT verb to mean "build it if it isn't in the repository".
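
A minimal sketch of what that query endpoint could look like in Sinatra; INDEX here is an imaginary lookup table standing in for whatever actually catalogs the built binaries:

# hypothetical query endpoint for the binary repository
require 'rubygems'
require 'sinatra'
require 'json'

# Imaginary in-memory index mapping a canonicalized query to a SHA1 hash.
# In real life this would be populated by whatever builds the binaries.
INDEX = {}

post '/gentoo/query' do
  query = JSON.parse(request.body.read)
  sha1  = INDEX[query.sort.to_s]   # naive canonicalization, good enough for a sketch
  halt 404 if sha1.nil?
  content_type 'application/json'
  { 'url' => "http://gentoo.packages.example.com/binaries/#{sha1}.tbz2" }.to_json
end

put '/gentoo/query' do
  # PUT meaning "build it if it isn't already in the repository":
  # enqueue a build job and tell the client we accepted the request.
  status 202
end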

Once a setup is built by Chef, the next time a clone needs to be made with the same recipes, it would be able to find cached binaries built with the same compiler flags and USE flags. Set up right, you would not need a compiler on the production system.

I think this would go a long way towards having a usable Gentoo system configured by Chef. We gain the ability to specify precise versions and USE flags, plus architecture-specific machine code, without having to wait as long for each machine to compile everything itself when the binaries can be cached.