
Monday, May 17, 2010

Adventures with chef-server and Gentoo, part 9

Continued from Part 8

So my nifty Ghetto DNS for Rackspace Cloud backfired on me.

I spent several hours (!?) trying to figure out why (1) the recipe works on my dev environment, (2) the recipe works at our in-house Xen Cloud Server environment, and (3) the recipe refuses to work on the Rackspace Cloud Server.

I had narrowed it down to the fact that:
search(:node, "rackspace_private_ip:#{rackspace_hosts[:private_net]}" ) do |n|
ip = n[:rackspace][:private_ip]
hostnames = [ n[:fqdn] ]
hostnames << (n[:rackspace][:private_aliases] || []).sort
hosts[ip] = hostnames.flatten
end


... was not in fact pulling down node[:rackspace][:private_aliases]. It was not setting it from the override_attributes of the roles properly, and it was ignoring what I had set with knife node edit. The fact that node[:rackspace][:private_aliases] got overwritten every time I ran chef-client should have tipped me off. Instead, I focused on the fact that I had compiled ruby with threads enabled, and maybe, just maybe, chef-solr-indexer had corrupted solr or something. (It hadn't.) I ended up wiping the solr data directory and forcing a reindex ... and it still came out the same.

Finally, out of sheer flailing around, I saw a detail I had missed in the growing red haze of frustration. The Rackspace server had a "public_ip" attribute that I had never set. Where did that come from? Suspicion blossomed. I ran ohai on the Rackspace Cloud server, and here is the lesson learned:

ohai manages the rackspace namespace if you are on Rackspace Cloud. Don't touch it!

I have no idea whether this had always been there, though I did notice a knife rackspace option pop up since 0.8.16. The fact that ohai automagically detects the Rackspace private IP is awesome -- I don't have to use my own version, except when I'm trying to emulate Rackspace. I'll probably split that out into its own recipe, and use the :ghetto_dns namespace for the host aliases instead.
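A minimal sketch of that split, assuming a :ghetto_dns attribute namespace of my own invention (ohai still owns node[:rackspace]):

# Hypothetical rework: aliases live under node[:ghetto_dns], a namespace
# ohai will never clobber; ohai itself supplies node[:rackspace][:private_ip].
search(:node, "rackspace_private_ip:#{rackspace_hosts[:private_net]}") do |n|
  ip = n[:rackspace][:private_ip]   # populated by ohai on Rackspace Cloud
  hostnames = [ n[:fqdn] ]
  hostnames << ((n[:ghetto_dns] || {})[:private_aliases] || []).sort
  hosts[ip] = hostnames.flatten
end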

Another action item: I will be more assertive about asking Opscode for a changelog -- assuming they are not already keeping one up to date. And assuming that I ever get out of this Chef iteration I'm not supposed to be operating in...

But for those of you itching to develop your own recipes for Rackspace, now you know.

Adventures with chef-server and Gentoo, part 8

Continued from Part 7

Although in Part 7, I said that was the end of the Chef iteration for now, I wanted to finish up just one more thing. (It's always just one more thing). I attempted to install a fresh chef-server for our internal lab environment, now that I've finished up with the recipes on my dev environment. I followed through the notes I took in Part 2, and proceeded to run into a Big Surprise.

knife configure --initial asked for the location of webui.pem, which I promptly dismissed and ignored because, well, what does the webui have to do with setting up an admin user for knife? Apparently, quite a lot. I went to #chef when my attempt to create an access key failed, at which point kallistec and jtimberman pointed out what had changed in 0.8.16:

  1. chef-validator and its access key, validation.pem, are no longer admin users. The only thing they can do is create new non-admin access keys. That's good, because it is one less thing to worry about (having to be disciplined enough to delete /etc/chef/validation.pem when booting up a new node).

  2. It does mean that you have to have access to an existing admin key in order to create yourself an admin key for knife.

  3. Unfortunately, using webui.pem makes it sound as if the webui has been repurposed for creating new admins. I asked the guys: what if I don't want to install the webui? Their reply was that webui.pem is generated by chef-server-api, not chef-server-webui. That makes sense.

    What doesn't make sense is naming it webui. I thought something like api-master-key would be more descriptive.

  4. The flip side, though, is that kallistec thought there would have been even more trouble had webui.pem not been reused. That's probably true, because my dev upgrade from 0.8.10 to 0.8.16 was transparent.

  5. Shouldn't knife configure -i fall back to creating a non-admin user anyway? I thought about that and realized I would not have liked it to silently create a non-admin user and then have all of my attempts at uploading cookbooks fail. At least this way, it is obvious where the problem lies.

  6. I wasn't too happy, but at the end of the day: this is open source software, people are putting their hard work out there, and this is still bleeding-edge code with lots of things subject to change. Opscode may change it in 0.9, which makes sense considering we'd all expect lots of breaking changes in that release.

  7. Also, the only reason I even came across this was because I was (wait for it...) using a non-standard installation of knife. I wanted knife configured on my dev box to control deployment against the lab site, so I didn't see a point in configuring knife on the server. knife configure -i, when run on the chef-server with the defaults accepted, would have just worked.


So I ended up with this kludge:

$ knife configure -i
Overwrite /home/hhh/.chef/knife.rb? (Y/N) Y
Your chef server URL? [http://localhost:4000] http://chef-server.lab:4000
Select a user name for your new client: [hhh] hosh
Your existing admin client user name? [chef-webui]
The location of your existing admin key? [/etc/chef/webui.pem] /home/hhh/.chef/api-master-key.pem
Your validation client user name? [chef-validator]
The location of your validation key? [/etc/chef/validation.pem] /home/hhh/.chef/lab.validation.pem
Path to a chef repository (or leave blank)?
WARN: Creating initial API user...
INFO: Created (or updated) client[hosh]
WARN: Configuration file written to /home/hhh/.chef/knife.rb


I had more unrelated trouble, such as not configuring /etc/hosts right, and ended up having to enter this stuff three or four times. Maybe when I have time, I'll hack up something that lets you take an existing configuration and create a new admin key without having to type all of the above again. Scratch that itch. (And get off my lawn!)
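If I ever do scratch it, the core might look something like this minimal sketch, assuming the Chef 0.8-era Ruby libraries and the server API's clients endpoint; the client name hosh2 and the key paths are made up from my setup above:

# Hypothetical sketch: mint a new admin client by reusing an existing
# admin key, without rerunning knife configure -i.
require 'rubygems'
require 'chef'
require 'chef/rest'

Chef::Config.from_file(File.expand_path('~/.chef/knife.rb'))
rest = Chef::REST.new(Chef::Config[:chef_server_url],
                      'chef-webui',                                   # existing admin client
                      File.expand_path('~/.chef/api-master-key.pem')) # its key
response = rest.post_rest('clients', :name => 'hosh2', :admin => true)
File.open(File.expand_path('~/.chef/hosh2.pem'), 'w') do |f|
  f.write(response['private_key'])
end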

Friday, May 14, 2010

Adventures with chef-server and Gentoo, part 7

Continued from Part 6

Earlier this week, I integrated veszig's Portage code into my set of cookbooks. I've also merged veszig's keywords, use, etc. providers into a separate module. Along the way, I cleaned up some things (my fascination with Scheme shows through in the code) and made it default to eix if available, falling back to emerge --search if not. I monkey-patched it into the Portage provider, and may submit it as a real patch upstream to Opscode.

Some gotchas I ran into in the past few days:

  1. Chef 0.8.10 - knife does not properly dump metadata.json, resulting in a rebuild every time I upload a cookbook.

  2. Chef 0.8.14 - This release fixes the bug in 0.8.10 and introduces another one: it does not properly set the file mode. (More precisely, it looks like it creates files in read-only mode and does not set the mode as specified.)

  3. Chef 0.8.16 - This release fixes the mode bug. However, it doesn't work with json 1.4.3.


Why do I know this? I was working on the improved Ghetto DNS recipe. It figures out what the private IP is and saves that back into the server index. jtimberman had said you do this with node.save (huh...), then suggested dropping it into a ruby_block so it is saved at execution time rather than compile time. When I thought about it, though, I decided to do this at compile time instead, with guards to keep it from needlessly saving the information. This stuff touches host names, the fundamental stuff that Chef assumes you've already figured out. There is (unfortunately) a lag between the time a private IP gets reported and when it gets indexed by the server index, so we essentially have an ad hoc eventual convergence over multiple runs of chef-client. Yes. That's why it is Ghetto DNS.

When Ghetto DNS drops the new /etc/hosts file, chef-client continues on its merry way. Unless the file is set to mode 0600, of course, in which case it doesn't know where chef-server is. Oops. Chef 0.8.16 fixed that. Too bad there is no chef-overlay ebuild that pulls that version in yet, so I ended up updating with rubygems.

I am probably going to create Chef gem upgrade recipes at some point. However, I've come to the end of the iteration, so I'll need to work on a different project. Maybe I'll update more Gentoo stuff in my off hours, at least until the next Chef iteration comes up.

As it is, I've gotten a lot of monit recipes up. One gotcha, at least for 0.8.10, was that nested definitions do not properly pass params, so you have to rebind variables locally within the block (there is a sketch of this below, after the macro list). As it is, I created several monit macros:
monit

Basic monit macro.

Example:
monit 'my_nginx_site' do
  cookbook 'my_site'
  source 'nginx.monit.erb'
  variables(:listen_ips => listen)
end

monit_service

Monit a background service

monit_service 'chef-solr-indexer' do
  process :pid_file => '/var/run/chef/solr-indexer.pid',
          :timeout_before_restart => '30'
end

monit_net_service

Monit a service listening on an IP port

monit_net_service 'couchdb' do
  process :listen_ips => [%w(127.0.0.1 5984)],
          :pid_file => '/var/run/couchdb/couchdb.pid',
          :timeout_before_restart => '30'
end

monit_http_service

Monit an HTTP service, checking the HTTP protocol

monit_http_service 'chef-server-webui' do
  process :listen_ips => [[nil, '4040']],
          :pid_file => '/var/run/chef/server-webui.4040.pid',
          :timeout_before_restart => '30'
end


monit_service, monit_net_service, and monit_http_service all take the following arguments:
process

  • :start_cmd - override the start command

  • :stop_cmd - override the stop command

  • :pid_file - override the pid file

  • :timeout_before_restart - override the timeout

cookbook

Specify the cookbook to pull the source template from

source

Specify the source template to use

variables

Specify the variables to pass to the template

enable

If true, then create the monit configuration file.


For complete source code, see it here.
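And here is the nested-definition gotcha I promised above, as an illustrative sketch (the rebinding is the point; the template name is made up):

# Illustrative only: on 0.8.10, params from the outer definition are not
# visible inside a nested definition call, so rebind them to locals first.
define :monit_service, :process => {} do
  svc_name = params[:name]      # rebind before nesting
  process  = params[:process]

  monit svc_name do             # nested definition call
    source 'monit_service.conf.erb'
    variables(:process => process)  # use the local, not params[:process]
  end
end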

With these, I created chef::monit_server, chef::monit_server_webui, chef::monit_solr, chef::monit_solr_indexer, couchdb::monit_server, and added monit to gentoo::portage_rsync_server and gentoo::portage_binhost_server. Weirdly enough, the one that gave me the most trouble was rsyncd, since it does not behave very well as a daemon. I still have not figured out how to monit rabbitmq, on account of the stock rabbitmq-server not dropping a pid file. (rabbitmq-multi, however, does.)

It was loads of fun randomly killing off processes and watching them come back up.

Finally, Seth at MaxMedia told me about CloudKick at the Atlanta Ruby User's Group meetup. It is too much for my needs, but I suppose if I were doing something where I needed to show investors or enterprise customers that "We Can Scale", that would be it.

Tuesday, May 11, 2010

Idea: Cost Accounting using Chef

Since Chef describes your infrastructure as a set of resources, I think it is possible to go beyond the basic bandwidth + CPU/hours for cost accounting.

Suppose you have a set of heterogeneous web sites, each using a different classification of recipes:

  • Site A: static files

  • Site B: static files

  • Site C: PHP-FCGI (Wordpress)

  • Site D: Python (Trac)

  • Site E: Ruby on Rails 2

  • Site F: Ruby on Rails 3


In our hypothetical infrastructure, Sites A and B are served from a single Nginx server. Sites C and D are also served from that same Nginx server, via proxies.

Sites E and F each require a stand-alone web app; however, they both share the same master MySQL database (though the database schemas are separate).

Now, how do you do the cost accounting for that? If we were to use basic bandwidth + CPU/hours, the best we can do is determine what each of the hosts -- four, in this case -- costs us. This assumes you are using EC2 or Rackspace CS.

On the other hand, we know from the chef-repo describing this whole setup that Sites A, B, C, and D all live on Host 1 and split its costs. We can weight Sites C and D higher, since they are not serving static files and therefore don't take advantage of Linux's sendfile.

Sites E and F have Hosts 2 and 3 respectively. However, each would also include a share of the cost of Host 4, used as the MySQL master.

I'm not exactly sure how this would be implemented. We can query this by node, but better yet, if we had a higher-level description of each of those sites, we could calculate the cost structure around that.
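As a back-of-the-envelope illustration (all of the figures, weights, and the site-to-host mapping below are invented; in practice the mapping would come out of the chef-repo or a node search):

# Split each host's monthly cost across the sites it serves, weighting
# dynamic sites double since they can't use sendfile.
host_cost = { 'host1' => 80.0, 'host2' => 120.0,
              'host3' => 120.0, 'host4' => 150.0 }
sites_on  = { 'host1' => { 'A' => 1, 'B' => 1, 'C' => 2, 'D' => 2 },
              'host2' => { 'E' => 1 },
              'host3' => { 'F' => 1 },
              'host4' => { 'E' => 1, 'F' => 1 } }

site_costs = Hash.new(0.0)
sites_on.each do |host, weights|
  total = weights.values.inject(:+).to_f
  weights.each { |site, w| site_costs[site] += host_cost[host] * w / total }
end
site_costs.sort.each { |site, cost| puts "Site #{site}: $%.2f/mo" % cost }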

Ultimately, this doesn't matter as much if your site runs a single app exclusively. But even a large company with many different subsystems, such as Flickr or Facebook, would want to be able to track the logical cost structure of each subsystem versus how much revenue each subsystem brings in.

Idea: Rubygems for Chef Cookbooks

One day (but not today), we will need the equivalent of Rubygems for Chef Cookbooks. Or at the very least, site cookbook collections.

Friday, May 7, 2010

Adventures with chef-server and Gentoo, part 6

Continued from Part 5

Lots of updates:


  1. I further refined the portage_conf definition. You can pass :force_regen to force a regeneration of make.conf. This is necessary in the case of mirrorselect if you need to install a package.


  2. I added a bunch of recipes. Here is an index of them so far:


    portage

    Base portage

    chef_overlay

    Installs veszig's chef-overlay

    feature_buildpkg

    Enable building binary packages

    feature_getbinpkg

    Fetch binary packages from binhost

    feature_nodocs

    Do not build docs, man pages, and info pages

    cron_emerge_sync

Adds emerge --sync to cron for nightly syncs

    exclude_categories

    Exclude certain categories from sync, such as nethack ...

    portage_binhost

    Point to a site binhost

    portage_rsync

    Point to a site rsync mirror

    mirrorselect

Selects the three fastest Gentoo mirrors.

    gentoolkit

    Installs gentoolkit

    portage_binhost_server

    Sets up a binhost server with Nginx

    portage_rsync_server

    Sets up a local rsync mirror



  3. I talked to jtimberman and veszig on #chef@freenode.net and got some feedback:

    • jtimberman suggested a Resource/Provider to replace portage_conf. That is the direction I've been heading, though what I have now works. (Unless you want to notify/subscribe to it, in which case it doesn't.)

    • veszig showed me what he has in gentoo-cookbooks. He is using eix for package queries, and actually has an LWRP for emerges; I think it is called gentoo-package. It lets you set USE flags, masks, unmasks, and keywords. I've been using definitions, but again, I can't send notifications to them. I think the way he has set up the emerge is better than what I was going to do -- monkey-patch the Portage provider.

    • jtimberman mentioned that knife can do Rackspace deploys, though it is Debian/Ubuntu-centric. But they are accepting patches :-) He also mentioned more things about the data-driven database and application recipes, which I can't wait to start working on.




All in all, good times. I'm going to attempt a merge of my stuff into a fork of gentoo-cookbooks, and move anything generic back into the mainline cookbooks repo. I may still end up submitting an improved Portage provider, in case you want to remain distro-agnostic.

Addendum

It has been two months since I wrote "Intangible Assets and Sunk Cost Fallacy", and this conversation with jtimberman and veszig illustrates the most noticeable payoff of intangible assets, at least in the Rails world. The fact that I can look at veszig's code and veszig can look at mine means we can exchange best practices faster. I didn't know about eix, and portage_conf, while more complicated, is pretty useful. This would be much more difficult if we were exchanging words: neither of us could guarantee that the other was following the exact steps as specified (something that, I'm sure, eats up the vast majority of systems troubleshooting time). By looking at the code, each of us can find out exactly what the other did. "Best practice" becomes self-documenting, executable code -- code that can be conserved, exchanged, and source-controlled.

Ruby, Python, Perl, and Java each have a vast, vibrant community because of the libraries of gems, eggs, CPAN, and Java code; I'm glad to be a part of something new here -- a library of infrastructure code that is still largely invisible to the mainstream. It's my "secret" competitive advantage.

Data Driven Application Deployment with Chef - Blog - Opscode

From Opscode: Data Driven Application Deployment with Chef - Blog - Opscode.

I was wondering when they were going to get around to this. As I mentioned in a previous post, we're going to use "segments" at work, and that would be data-driven. In our case, it would be Rackspace Cloud Servers rather than EC2, and Rackspace Cloud Files rather than S3, but the principles remain the same. Data bags seem like a great place to drop the descriptions into, though I am not sure I want to requery the index every 30 minutes.

Tuesday, May 4, 2010

Adventures with chef-server and Gentoo, part 5

Continued from Part 4

Many Gentoo configurations -- excluding rsync categories in /usr/portage, installing layman to manage overlays, setting up a local rsync mirror, or having Portage pull binary packages first -- all require editing /etc/make.conf. While we could attempt to manage /etc/make.conf directly, we would end up with the same issues as with managing /etc/portage/package.use.

A hint at a better way can be found in the instructions for layman. There, the instructions say to add
source /usr/local/portage/layman

into /etc/make.conf. bash writers would instantly recognize this as the directive for sourcing another bash script. Ideally then, we would have
source /etc/portage/chef/make.conf

inside /etc/make.conf, then have Chef manage that make.conf file. That file in turn can source other files generated by the recipes, such as exclude_categories. In the case of exclude_categories, we need to append an option flag to PORTAGE_RSYNC_EXTRA_OPTS. The cascading inclusion would look like this:
/etc/make.conf
-> /etc/portage/chef/make.conf
-> /etc/portage/chef/conf.d/rsync_exclude

Now, bash scripters may notice that source is a bash builtin, and so assume that we can do something like
source `ls /etc/portage/chef/conf.d/*`

... but sadly it doesn't work that way. (Besides, do we really want to automatically include everything in there?) We need a way for recipes to register a configuration fragment, and then have /etc/portage/chef/make.conf regenerated from the registry. This actually took a bit of doing:


  1. I used a definition to create the portage_conf definition. A recipe writer should just know they can add their overrides by dropping in a definition. In the case of exclude_categories, I used:
    portage_conf :rsync_excludes do
      appends [ :PORTAGE_RSYNC_EXTRA_OPTS, "--exclude-from=#{node[:gentoo][:rsync][:exclude_rsync_file]}" ]
    end

    The recipe writer is responsible for using valid flags. No validation is done at compile time.

  2. My first attempt at registering the conf file was naïve. I attempted to declare a variable, portage_confs, inside the portage recipe and have the portage_conf definition append symbols to it. That did not work. I'm not entirely sure how everything works under the hood, but at the end of the day portage_confs was not in scope, so the DSL tried to create a new resource out of it.

    I next attempted to create a resource class so that the method_missing in the DSL would pick it up properly. However, since this was a fake resource that did not properly duck-type to a real resource, that failed miserably too.

    I ended up creating a custom container class for the sole purpose of pushing portage confs to it.


  3. While the container actually worked, it was not registering the portage conf properly. I thought it had something to do with not notifying the resources properly, so I spent some time creating an execute resource that would reset /etc/portage/chef/make.conf and then call the template to rebuild it. While that part worked, it still did not address the fact that the template was pulling in an empty array.

    This part revealed a bit of how Chef works under the hood, something hinted at by the documentation for the ruby_block resource: Chef compiles all the resources, then runs them. ruby_block lets you run a block of code at execution time along with the other resources. So of course registering the portage conf would not work the way I had it, because the template statically declared what goes into /etc/portage/chef/make.conf at compile time ... which was an empty array.

    My solution? I broke the portage recipe apart into a second piece, the make_conf recipe. The latter contains the template for /etc/portage/chef/make.conf plus an execute resource that resets the file. This works.
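That make_conf recipe boils down to something like this sketch (simplified; PortageConfs.registry stands in for the custom container class mentioned above, and the delete-then-rebuild dance is the part I complain about below):

# Reset and regenerate /etc/portage/chef/make.conf from the registered confs.
# PortageConfs.registry is an invented stand-in for the container class.
execute 'reset-portage-make-conf' do
  command 'rm -f /etc/portage/chef/make.conf'
end

template '/etc/portage/chef/make.conf' do
  source 'make.conf.erb'
  variables(:portage_confs => PortageConfs.registry)
end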



I don't think this is the ideal solution. Maybe there is something better for (2). I also don't like rebuilding the file by deleting it first; that does not take advantage of the backup mechanism the template resource has, yet there is no :recreate action in template. (Maybe I should write one?) Feel free to comment here or at GitHub if you think you have a better way of putting this together.

Update: Thanks to the guys at #chef @ irc.freenode.net, I was able to clean up the code by reopening the resource. I didn't know you could still access a resource's variables and manipulate them, but now it makes more sense.
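For the curious, the reopening trick looks roughly like this (a minimal sketch assuming the 0.8-era recipe DSL; the attribute names are the ones from the recipes above):

# Look up the already-declared template resource and append to its
# variables, instead of keeping a separate container class around.
t = resources(:template => '/etc/portage/chef/make.conf')
t.variables[:portage_confs] ||= []
t.variables[:portage_confs] << {
  :name    => :rsync_excludes,
  :appends => [ :PORTAGE_RSYNC_EXTRA_OPTS,
                "--exclude-from=#{node[:gentoo][:rsync][:exclude_rsync_file]}" ]
}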

Monday, May 3, 2010

Adventures with chef-server and Gentoo, part 4

Continued from Part 3

When I followed the instructions for chef-overlay, I noticed a feature of Portage that I never knew existed. For those not familiar with Gentoo, you can exercise fine-grained, per-package control over versions and compile-time features with four files: /etc/portage/package.use, /etc/portage/package.keywords, /etc/portage/package.unmask, and /etc/portage/package.mask.

The official documentation shows using these as files. However, the instructions from chef-overlay shows that you can make those files as directories, and put multiple files in those directories. For example, instead of:

/etc/portage/package.use


you can instead have,

/etc/portage/package.use/ruby
/etc/portage/package.use/chef
/etc/portage/package.use/java


I wish I had known about this years earlier. My old Gentoo laptop that I used for Rails development has been around since 2006, and has accrued a large set of use flags and keywords. (It is time to wipe that laptop clean and rebuild).

One cool thing about this setup is that autounmask handles this layout gracefully. For example, if I were to autounmask =net-misc/rabbitmq-server-1.7.2-r2, it creates a file /etc/portage/package.keywords/autounmask-rabbitmq-server instead of attempting to insert the change into the base /etc/portage/package.keywords ... which may have custom changes a human has made. Human-edited files would require git to manage well; the files generated by autounmask can be added and removed atomically.

I'm not entirely sure this is still the way to go with a Gentoo + Chef infrastructure. Reading the paper Why Order Matters: Turing Equivalence in Automated Systems Administration, one of the key requirements for congruent infrastructure is being able to rebuild at any time. Does having the configuration broken out like this make that easier? I think it does, but we will see.

I have pushed up a portage recipe that sets this up. I took some time to make sure that any legacy files get backed up and moved into the new directory structure. Again, I'm not sure this is entirely a great idea, since it tempts people to apply Chef, a congruence tool, to what is effectively a convergence problem. I'm leaving it there for now simply because the stem-cell images I am using already had some of those files in place (most notably, turning off the threads USE flag for the ruby package).
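The migration portion of that recipe boils down to something like this simplified sketch (not the recipe exactly as pushed; the "legacy" file name is my own invention):

# For each legacy flat file, keep a backup and move its contents into
# the new directory layout under a single "legacy" entry.
%w(package.use package.keywords package.mask package.unmask).each do |conf|
  path = "/etc/portage/#{conf}"
  ruby_block "migrate #{conf} to a directory" do
    block do
      contents = ::File.read(path)
      ::File.rename(path, "#{path}.bak")
      ::Dir.mkdir(path)
      ::File.open("#{path}/legacy", 'w') { |f| f.write(contents) }
    end
    only_if { ::File.file?(path) }   # skip if already a directory
  end
end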

Adventures with chef-server and Gentoo, part 3

Continuing from Part 2

The next step comes as a bundle of two. I am targeting the Rackspace Cloud platform. Each Rackspace slice gives you two ethernet devices: eth0 points to a public IP address and eth1 gives you a private IP address in the 172.* range. Packets going through eth0 incur a charge while packets routed through eth1 do not. Since the default, non-nanite install of Chef assumes you will be polling chef-server every so often, having lots of these slices running around gets expensive if they route through the public addresses.

So what we really want is a way to reference each of the nodes by its private IP address. There are several ways to do this. Since these are non-public addresses, you cannot use the Rackspace DNS server nor any of the public DNS services. 37signals uses djbdns; others have suggested PowerDNS. The lowest-barrier-to-entry solution I have found so far queries the server index and builds an /etc/hosts. I chose this solution, with some modifications.

First, the /etc/hosts file emitted by the recipe described in that blog post maps the FQDN to 127.0.0.1. That's not what I want. If I want 127.0.0.1, I will reference it with localhost and not the FQDN. Years ago, I remember reading that the Linux loopback device performs insanely faster than going through a regular ethernet device stack; I don't know if that is still true or where to find the performance figures for it. Regardless, what I don't want is the IP behind an FQDN changing on me. I have started a Rackspace cookbook here and dropped the ghetto DNS recipe in there along with my modifications.

Second, I do not fully agree with the Opscode design decision to use fully-qualified domain names. Philosophically, I'm adopting the Chef infrastructure as part of a deliberate strategy based on the principles of extropy. In short, stable forms arise from unstable substrates, and when the economics of the unstable substrate reaches a critical point, the substrate becomes disposable. It no longer matters to those operating at the higher abstraction. The fallout is that the unstable substrate supporting the stable abstraction becomes anonymous.

In practice, I want anonymous server VMs that can turn into anything, very much like pluripotent stem cells. I'd rather the authoritative Chef node id be some sort of random MD5 hash. If I wanted named servers, I'd layer that on top of the anonymous nodes by using tags. I don't really want to manage which node is db0 and which is db1; by keeping them anonymous, I can create parameterized infrastructure. What we have planned at Sparkfly includes an architectural building block called a "segment", which has a well-defined set of components spread across a group of two or three servers. Rather than specifying each node and each role for each node, this should all be done programmatically using a higher-level description.

On the other hand, we will always have certain servers that must always be referenced. One would be chef-server. I would need a site-wide hosts table referencing that. (But even this can eventually be built on top of anonymous nodes).

Looking through the docs, there were some extra hoops I'd have to jump through to set the node name to something other than the FQDN. For now, since I'm simply trying to get a working infrastructure up and running before I start digging in and making more sweeping changes, I wrote a kludge that creates an anonymous name and appends it to /etc/hosts. A supporting Gentoo script added to /etc/conf.d/local.start detects the presence of a file, /etc/conf.d/bootstrap, and runs the kludge script. Run it, and you get something like node-1ee5ddee46d52c269b898bf216.
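The heart of the kludge is something like this simplified sketch (the real script lives in the cookbook; the hash length here is arbitrary, and eth1 is the Rackspace private interface as described above):

#!/usr/bin/env ruby
# Generate an anonymous node name and bind it to the private IP in /etc/hosts.
require 'digest/md5'

name = "node-#{Digest::MD5.hexdigest("#{Time.now.to_f}#{rand}")[0, 26]}"
private_ip = `ip -4 addr show eth1`[/inet (\d+\.\d+\.\d+\.\d+)/, 1]
File.open('/etc/hosts', 'a') { |f| f.puts "#{private_ip}\t#{name}" }
puts name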

What's next are rake tools to help query and reference nodes based on tags. I also reserve the right to change my mind; maybe I'll get tired of copying and pasting nonsensical MD5 hashes when working with knife. After all, strategy is pragmatic and concerns what is, rather than its idealized form.

Thursday, April 29, 2010

Adventures with chef-server and Gentoo, part 2

Followup to part 1

Once I had chef-server and the webui all set up on my dev box, the next part was to create a "stem cell" image with chef loaded on it. Chef 0.8 uses SSL for its new validation mechanism, and the server creates public/private keypairs for validation. These are encapsulated in a Chef concept called a client. It took me a while to figure out that client was not a good choice of words; it might have been better named access or credentials.

To begin, I needed to set up knife in the environment I intended to do chef development in. In this case, I appropriated the VM running chef-server, but this is not necessary; in fact, I intend to have a dedicated chef-dev VM running in the near future. However, I didn't want to run the script I had set up to provision another Gentoo image from storage. I'd still have to change the hostname, run emerge --sync, get the packages up to date, clone my vim setup ... many of the things I intend for chef to handle. Ideally, I'd add a node with roles[dev] and it would eventually just give me back a VM with my dev environment set up.

Setting up knife works like this:

$ sudo knife configure --initial
Your chef server URL? http://chef-server:4000
Your client user name? hosh
Your validation client user name? chef-validator
Path to a chef repository (or leave blank)? .
WARN: *****
WARN:
WARN: You must place your client key in:
WARN: /home/hhh/.chef/hosh.pem
WARN: Before running commands with Knife!
WARN:
WARN: *****
WARN:
WARN: You must place your validation key in:
WARN: /home/hhh/.chef/chef-validator.pem
WARN: Before generating instance data with Knife!
WARN:
WARN: *****
WARN: Configuration file written to /home/hhh/.chef/knife.rb


I have chef-server added to my /etc/hosts file using the private IP address. With that, I can get a cookbook list out of knife. It was mostly empty, which is what I expected. The weird thing was having to copy validation.pem over. I mean, that's a private key. I don't want it floating around, even if this was all on my private VM cloud on my box; the process I'm working through now will eventually make its way to production. I wasn't so sure about this idea of passing around the validation.pem private key, but for now I took it on faith until I know this system better and can re-examine the assumptions.

Now, I provisioned another VM, did the emerge --sync, pulled down the chef-overlay, unmasked the chef-client packages, installed chef-client, started it up ... and nothing. It was not appearing in my node list.

The reason was that the node did not have an authentication key. I flailed around in several pieces of online documentation and came across one that says clients are how you create access, and that while they are conventionally tied to a node, this is not necessary. Since I was making a template image that would ideally phone home and register itself on first boot, I figured I could make a special access key -- call it stemcell.pem -- have the node register itself with it, then delete it later.

It turns out that the smart people at Opscode were already n steps ahead of me. This is exactly what validation.pem is for. They even provide a handy :delete_validation_key recipe to take care of the cleanup. Doh!
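The essence of that cleanup is presumably something like this sketch (not Opscode's actual code; the client.pem path is the conventional one):

# Remove the shared validation key once the node has its own client key.
file '/etc/chef/validation.pem' do
  action :delete
  only_if { ::File.exist?('/etc/chef/client.pem') }
end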

I copied validation.pem over, re-ran chef-client on the client machine, and what do you know? It works.

Tuesday, April 27, 2010

Adventures with chef-server and Gentoo, part 1

I had been circling around chef-server and Gentoo for a while. There are three ways to install Chef-server:

  1. Install it manually

  2. Use chef-solo to bootstrap chef-server

  3. Use system packages



Needless to say, (1) was not fun. And while (2) invokes the mystical recursive experience of Lisp-written-in-Lisp, or maybe the Stand Alone Complex, as someone who is more excited about using Chef, I just wanted to get on with it.

Option (3) depends on how good your favorite distro's packages are, and in Gentoo's case they have been somewhat lacking. Fortunately, someone has put together a chef-overlay and updated it for Chef 0.8.10. The instructions in the README work.

What does not work is if you, like me, decide to be clever and use autounmask instead of downloading the gists. After the tenth gem I had to "auto" unmask, I gave up and used the gist version. If you are building a new chef-server on Gentoo, you should too.

We'll see how this experience goes. As I dive deeper into the rabbit hole, I'll post back here. Stay tuned.

Monday, April 12, 2010

Chef and Gentoo Binary Repository

Ever since I watched Seth Chisamore's presentation on Chef at the Atlanta Ruby User's Group, I have had to rethink the idea of Gentoo Linux in production. The first consequence of declaring your infrastructure as source code is that the virtual machine becomes disposable. Although Chef is distribution-agnostic, you do want to standardize on a single distribution to make debugging easier. And since Chef -- not the system administrator -- is the primary agent in building your servers, the distribution and package manager that work best for disposable servers win.

Now, I like Gentoo. This feeling of "liking" Gentoo is very much sunk cost fallacy. I cut my teeth on Slackware and Redhat 3 back in the day as a hobbyist, feeding 5.25" floppies to a 486 box. Redhat was fun, but hunting down RPMs in the pre-yum days was not. When a friend showed me apt-get for Debian, it solved the problem of manually hunting down packages. Very quickly, however, I found how outdated Debian was. Ubuntu came around and I switched over to that, yet I still found myself wanting to compile from source. At least with RPM-based distributions, source RPMs were easy to work with: there were plenty of examples, and the dev and build environments for RPMs were sensible and practical. The build environment for Debian was horrific nonsense, and Ubuntu was no better. To this day, building a .deb package is deep magic that I have never once succeeded in performing. I ended up compiling from source, like the good ol' days of Slackware.

So when I seriously looked at Gentoo, it solved yet another problem for me: I wanted a package manager that builds from source. Gentoo's Portage system also has features not found in its ancestor, BSD ports, which most Rails developers would know through DarwinPorts and MacPorts. Portage understands "USE" semantics, which map to the ./configure --enable-XXX flags during the build process. MacPorts has the concept of "variants", but you cannot simply flip them on the way you can USE flags. Furthermore, Gentoo lets you selectively mask and unmask packages. You can also selectively overlay other repositories, so you do not have to wait for the package maintainers to catch up to your bleeding-edge packages. This worked wonderfully when CouchDB, RabbitMQ, and pals were not readily available anywhere.

USE flags bypass the fundamental flaw of binary-only package managers: the maintainers have to split platforms along feature lines into smaller packages. Debian and Redhat, for example, have to ship separate Ruby and Ruby + SSL packages. For Rubyists, this means the standard library gets split into dependencies outside the control of the Ruby platform.

Masking lets the system administrator control exactly which package versions to use. Most people think of Gentoo as a rolling release, and while you can use it that way, what Gentoo actually provides is far more control over exact package versions than any of the binary distributions.

One other common complaint about Gentoo in production is the presence of the compiler. This argument is bunk, though. Most early Rails deployments end up on Ubuntu with build-essentials installed, because the binary packages for Ruby are typically old and compiled with pthreads. If one sticks to the principle that production should not have a compiler (with its related security issues), then developers-turned-system-administrators installing a build kit on production boxes doesn't make it right ... but it does mean this problem is not exclusive to Gentoo.

So the biggest upside to Gentoo is that you can compile from source; you can take advantage of the homogeneous Opteron chips on Slicehost and Rackspace Cloud. The biggest downside of Gentoo, though, is that you have to compile from source. In general, I've found that it works well for what I want to do.

Now, enter the Chef.

If the first implication of using Chef is that the VM becomes disposable, does it matter whether I use Gentoo or not? Certainly, EngineYard deploys their customers' apps on Gentoo, made easier through Chef. But if I wanted a monitoring system to automatically detect downed systems and bring up new VMs on the fly, do I want the infrastructure to recompile all the pieces from scratch? If I want to avoid the sunk cost fallacy by having disposable VMs built from infrastructure-as-source-code, then wouldn't it make sense to base this on a binary package distribution instead?

For now, the economics says, "Yes, throw away Gentoo."

Say, though, I wanted to take advantage of the benefits of Gentoo and reduce the time it takes to bring up a clone server. I would have to use pre-built binaries. Gentoo offers binary packages, misleadingly saved with the .tbz2 extension. Up until this past weekend, I had the mistaken impression that these .tbz2 packages contain plain bzip2-compressed tarballs. In fact, there is metadata appended to the end, extractable using the xpak format (yes, it is obscure and not well-known). This is similar to the RPM format, which is little more than a compressed cpio archive with metadata. The Gentoo xpak metadata is essentially a dump of what is in the system's package database, and contains information on things such as the USE flags, build environment, original ebuild, etc.

Since Gentoo binary packages are not only machine-specific but also tend to have dependencies based on USE flags, figuring out a repository scheme for these binaries is a harder problem than for Debian .debs and Redhat RPMs. You can see an example of an attempt from 2006. The author proposes:

http://gentoo.packages.example.com/gentoo/[arch]/[CHOST]/[gcc version]/[glibc version]/[package group]/[package name]/[package version]/file

We're in 2010 now, the year of Rails 3. The Rails community has seen a couple of years of service-oriented architectures, lightweight web service frameworks such as Sinatra have a strong showing, JSON is supplanting XML for lightweight apps, and Git and Mercurial have made a strong case for using SHA1 hashes. Trying to push all of the unique identifiers for a Gentoo binary package into a URL like the one above sounds laughable today, when we can easily do this:


POST http://gentoo.packages.example.com/gentoo/[arch]/[CHOST]/[package group]/[package name]/[package version]
{
  "USE": [ "threads", "ruby", "-php" ],
  "gcc": ">~4.2",
  "glibc": "~2.10.1-r1"
}


and that gets back


{
  "url": "http://gentoo.packages.example.com/binaries/[arch]/[CHOST]/[package group]/[package name]/[package version]/SHA1HASH.tbz2"
}


And if I let go of the idea that a human being is going to manage this, we can compact the query URL to just a POST against http://gentoo.packages.example.com/gentoo/query and get back the binary's hash.

The same query format could be used to request a build, perhaps using the PUT verb to indicate "build this if it isn't in the repository".
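To make the shape of it concrete, here is a hypothetical Sinatra sketch of the query endpoint. The routes, storage, and hashing scheme are all invented for illustration -- no such service exists:

require 'rubygems'
require 'sinatra'
require 'json'
require 'digest/sha1'

INDEX = {}  # canonical-query-hash => binary URL; stand-in for a real store

post '/gentoo/query' do
  query = JSON.parse(request.body.read)
  key   = Digest::SHA1.hexdigest(query.sort.inspect)  # canonicalize, then hash
  if (url = INDEX[key])
    { 'url' => url }.to_json
  else
    status 404
    { 'error' => 'no matching binary; PUT to request a build' }.to_json
  end
end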

Once a setup is built by Chef, the next clone made with the same recipes would be able to find cached binaries built with the same compiler flags and USE flags. Set up right, you would not need a compiler on the production system.

I think this would go a long way towards a usable Gentoo system configured by Chef: we keep the ability to specify precise versions and USE flags, plus architecture-specific machine code, without each machine having to compile everything itself when the binaries can be cached.