chef - [chef] Re: Of Ohai plugins and chef server crashes

From: Daniel DeLeo
Subject: [chef] Re: Of Ohai plugins and chef server crashes
Date: Fri, 27 Apr 2012 08:57:17 -0700

On Friday, April 27, 2012 at 8:39 AM, Chris wrote:

> Hi Chefs,
>
> I have a bit of a mystery (at least to me) on my hands. One of my
> environments has hit the Solr maxFieldLength issue
> (http://tickets.opscode.com/browse/CHEF-2346), and following the advice
> in the ticket hasn't worked, since it just led to a server crash. So
> after ignoring the problem for the last couple of months, the pain of not
> having half of my production nodes available in search became
> unbearable. The idea this time was to attack the size of the node
> object itself. Since this environment was linked to a larger Active
> Directory domain, the 'etc' hash that Ohai creates was pretty big, so I
> decided to remove it by disabling the passwd plugin. To do this I
> added a small bit of code to the client.rb.erb template:
>
> <% if node.attribute?("ohai") &&
> node["ohai"].attribute?("disabled_plugins") -%>
>
> Ohai::Config[:disabled_plugins] = [<%=
> node["ohai"]["disabled_plugins"].join(",") %>]
> <% end -%>
>
> and this to the environment:
>
> "ohai": {
> "disabled_plugins": [
> "\"passwd\""
> ]
> },
>
>
> I added the disabled plugin to one of the environments in our lab and
> everything went pretty smoothly. I saw a slight increase in the server
> CPU load, but still within tolerance. After letting that burn in for
> 24 hours I got the go-ahead to apply this to the one production
> environment that wasn't getting indexed. My nodes are on the default
> 30 minute interval, so about an hour after making the change the CPU
> usage for the chef-server process went to 100% and it started to
> consume all the available memory and eventually stopped responding to
> clients. Restarting the process didn't help, as it would immediately hit
> 100% usage and quickly consume all the memory/swap that was regained.
>
> I rolled the changes back and spent the next couple of hours babysitting
> the server and eventually ended up restarting chef-client on all the
> nodes in that environment. My question is, why would disabling an Ohai
> plugin do this? Or did it? Disabling it in 'test' didn't have the
> same result. The test chef-server and the production chef-server share
> the same couch/rabbit/solr/expander, and all the clients pass
> through a proxy which decides which server to send them to, so I'm
> pretty certain they're configured the same. The only real difference
> is that production has about 60 more clients.
>
> Any thoughts/suggestions on what this could be? Or more likely, what I
> screwed up?

It's hard to imagine how these could be related. Did you get any more info
about what chef server was doing? Was there a stack trace when you killed it?
Did the CPU spike happen during a particular request, or during the startup
routines?

>
> Thanks
>
> -- chris
> --
> Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
> permitted by applicable law.

--
Dan DeLeo
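
For anyone following along, here is a rough sketch of what the quoted client.rb.erb snippet renders to when fed the quoted environment attribute. It is only an illustration: FakeAttrs is a hypothetical stand-in for the Chef node object (it supports just the attribute? and [] calls the template uses), and render_client_rb is a made-up helper name. It assumes Ruby 2.6 or later for the trim_mode keyword. The point it shows is that the escaped quotes in the attribute value ("\"passwd\"") are what make the bare join(",") produce a valid Ruby array literal.

```ruby
require "erb"

# Hypothetical stand-in for Chef's node object. It only supports the two
# calls used by the quoted template: attribute? and [].
class FakeAttrs
  def initialize(data)
    @data = data
  end

  def attribute?(key)
    @data.key?(key)
  end

  def [](key)
    value = @data[key]
    value.is_a?(Hash) ? FakeAttrs.new(value) : value
  end
end

# The template fragment from the quoted message, with the wrapped lines joined.
TEMPLATE = <<~'ERB_SRC'
  <% if node.attribute?("ohai") && node["ohai"].attribute?("disabled_plugins") -%>
  Ohai::Config[:disabled_plugins] = [<%= node["ohai"]["disabled_plugins"].join(",") %>]
  <% end -%>
ERB_SRC

# Render the fragment against a node-like object (the template reads the
# local variable `node` through the binding).
def render_client_rb(node)
  ERB.new(TEMPLATE, trim_mode: "-").result(binding)
end

# Attribute value as given in the quoted environment JSON: the string itself
# contains the double quotes.
with_plugin = FakeAttrs.new("ohai" => { "disabled_plugins" => ["\"passwd\""] })
without_attr = FakeAttrs.new({})

rendered = render_client_rb(with_plugin)
rendered_empty = render_client_rb(without_attr)

puts rendered
```

With the attribute set, the fragment renders a single line assigning ["passwd"] to Ohai::Config[:disabled_plugins]; without it, the fragment renders nothing, so nodes in other environments keep their default plugin set.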
