
The importance of having a baseline of your servers - part 1

posted Feb 21, 2014, 9:31 PM by Ricardo Fonseca   [ updated Feb 23, 2014, 8:18 PM ]


I decided to write about this topic after one of those incidents where people claim that performance has dropped sharply on one of our MySQL servers. To give you a bit of context, the company where I work relies heavily on the agile model, not only for developers but for sysadmins and DBAs as well. The incident occurred right after a sprint release, but was it related? How could I be sure? Had we simply reached some limit of our system, or had a fundamental change in the way the applications were used started generating a high load on the server?

From there, I had two axes to focus on in order to troubleshoot effectively:

A. first : what I call “database code awareness”

The portions of deployed code that run queries against the database are not systematically shared with the database administrators. Most of the time I have to dig to get that information. As a result, I couldn't quickly narrow the scope of my analysis to the queries developed during our last sprint. On top of that, we sometimes deploy features in a dormant state, and they go unused for quite some time. A problem at the beginning of a new sprint may therefore come from various factors, including a dormant feature that was just switched on and is suddenly used on a large scale. It was thus impossible for me to confirm or rule out that the performance impact came from recent application changes or additions.

B. second : the lack of a baseline

For years I have kept putting off what is clearly required: building and consolidating server baselines automatically and systematically. What for? Well:

  • to be able to evaluate the cost of new development against server performance and have a clear view of the impact on CPU, memory, I/O, etc.

  • to be able to plan ahead. With historical data you can build trends, and with trends you can be proactive in advising the stakeholders about upgrades that will be required, warning them ahead of time and spreading the news before it's too late.

  • to make developers aware of this process, which will probably motivate them to share more information about the query load their application code generates (point A). They will also be more careful when making those changes, knowing that their impact will be clearly visible.

  • to be more effective in the troubleshooting process. Too often I find myself searching somewhat blindly for the cause of a performance problem, taking a very basic performance snapshot (a different one almost every single time) and trying to correlate the data with what the users are complaining about.

  • to standardize baseline creation and management as a project. This project (a set of scripts) is based on widely known, long-adopted performance tools like iostat, vmstat, and free, which means that:

    • the resource consumption of the scripts is very limited (most of the data gathered is already computed)
    • evolving the scripts whenever the tools get updated will be an easy process.
  • finally, and most importantly, to cover your ass! You need to be able to provide the right answers, with concrete arguments, to the people in charge and the stakeholders.
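
To make the first and fourth points above a bit more concrete, here is a minimal sketch of how a fresh measurement could be checked against a stored baseline. It is not the script from this project: the function name `deviates`, the threshold of three standard deviations, and the idea of keeping the history as a plain list of numbers are all assumptions chosen for illustration.

```python
from statistics import mean, stdev


def deviates(history, current, threshold=3.0):
    """Return True if `current` is more than `threshold` standard
    deviations away from the mean of the baseline `history`.

    `history` is a list of past samples for one counter (hypothetical
    storage format); `threshold` is an arbitrary illustrative default.
    """
    if len(history) < 2:
        raise ValueError("need at least two baseline samples")
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any change at all is a deviation.
        return current != mean(history)
    return abs(current - mean(history)) / spread > threshold
```

With a few weeks of %user samples collected before a sprint release, a call such as `deviates(samples_before_release, sample_after_release)` would give a first, crude answer to "did something change?" instead of a guess.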

After this last incident, I told myself that enough was enough. I had to create a baseline process with a deliberately small scope and distinct milestones. The goal is to make sure that I actually take the time and effort required to get it done properly and then improve it.

From there, I decided to create a baseline project whose first release will provide counters on CPU (%user, %sys), memory (swap, free), I/O (process with the most reads, process with the most writes), and the major MySQL monitoring counters (#threads, slow queries, InnoDB queries/sec, ...). As I said, it was important to keep the scope small so that I get a first release I can deploy on all our Linux servers, knowing that the impact will be close to zero.
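
As an illustration only (the actual script comes in the next post and may look quite different), the CPU and memory part of such a collection could be sketched as below. It assumes the standard column headers printed by procps `vmstat` (`us`, `sy`, `swpd`, `free`); the names `FIELDS`, `parse_vmstat` and `collect` are hypothetical.

```python
import subprocess

# Columns kept in this sketch; the names are vmstat's own column headers.
FIELDS = ("us", "sy", "swpd", "free")


def parse_vmstat(output):
    """Return the last sample of `vmstat` output as {column: int}."""
    rows = [line.split() for line in output.strip().splitlines()]
    # The header row is the one that names the columns ("us", "sy", ...);
    # the banner row above it only contains group labels such as "procs".
    header = next(cols for cols in rows if "us" in cols and "sy" in cols)
    sample = dict(zip(header, map(int, rows[-1])))
    return {name: sample[name] for name in FIELDS}


def collect():
    """Run `vmstat 1 2` and parse it.

    The first sample vmstat prints is an average since boot, so the
    second one (the last line) is the one worth recording.
    """
    result = subprocess.run(
        ["vmstat", "1", "2"], capture_output=True, text=True, check=True
    )
    return parse_vmstat(result.stdout)
```

Calling `collect()` from cron and appending the result, with a timestamp, to a file would be one low-impact way to accumulate history, since vmstat only reads counters the kernel already maintains.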

The actual script and the "how to" setup instructions will be provided soon in another post. I didn't want to mix the "why" one needs frequent baselines with the "how" to implement them.

Thank you