Intro
I decided to write about this topic after one of those events where some people argued that performance had dropped sharply on one of our MySQL servers. To give you a little context, the company where I work relies heavily on the agile model, not only for developers but for sysadmins and DBAs as well. The event occurred right after a sprint release, but was it related? How could I be sure? Had our system simply reached some of its limits? Or had some fundamental change in the way the applications were used started generating a high load on the server?
From there, I had two areas to focus on in order to troubleshoot effectively:
A. First: what I call “database code awareness”. The portions of deployed code that run queries against the database are not systematically shared with the database administrators. Most of the time I have to dig to get that information, so I could not quickly narrow my analysis down to the queries developed during our last sprint. On top of that, we sometimes deploy features in a dormant state, and they stay unused for quite some time. A problem at the beginning of a new sprint can therefore have various causes, including a feature that was just switched on and is suddenly being used at scale. As a result, it was impossible for me to confirm or rule out that the performance impact came from recent application changes or additions.
B. Second: the lack of a baseline.
For years I kept putting off something I knew perfectly well: server baselines need to be captured and consolidated automatically and systematically. What for? Well:
After this last event, I told myself that enough was enough. I had to create a baseline process with a deliberately small scope and several milestones. The goal is to make sure that I actually take the time and put in the effort required to implement it properly and keep improving it.
So I decided to start a baseline project whose first release collects counters for CPU (%user, %sys), memory (swap, free), I/O (processes with the most reads, processes with the most writes), and the main MySQL monitoring counters (number of threads, slow queries, InnoDB queries/sec, ...). As I said, it was important to keep the scope small so that the first release could be deployed on all our Linux servers with almost no impact.
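To make a few of those first-release counters more concrete, here is a minimal Python sketch of how they could be derived. It is not the baseline script itself: every function name below is hypothetical, and it assumes a Linux host where CPU time is read from /proc/stat and memory from /proc/meminfo, plus MySQL status values already fetched (for example with SHOW GLOBAL STATUS) into plain dicts. Per-process I/O is left out.

```python
"""Minimal baseline-sampling sketch; all function names are hypothetical.

Assumptions: a Linux host (CPU from /proc/stat, memory from /proc/meminfo)
and MySQL status values already fetched, e.g. with SHOW GLOBAL STATUS,
into dicts mapping status name -> value.
"""


def cpu_ticks(stat_text):
    """Return user..steal jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Order: user nice system idle iowait irq softirq steal [guest guest_nice].
            # guest time is already included in user/nice, so keep only the first 8.
            return [int(v) for v in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line in /proc/stat text")


def cpu_user_sys_pct(before, after):
    """Share of CPU time (in %) spent in the user and system fields between two snapshots."""
    deltas = [a - b for a, b in zip(cpu_ticks(after), cpu_ticks(before))]
    total = sum(deltas)
    if total == 0:
        return 0.0, 0.0
    # Index 0 is 'user', index 2 is 'system'.
    return 100.0 * deltas[0] / total, 100.0 * deltas[2] / total


def memory_kb(meminfo_text):
    """Free memory and used swap, in kB, from /proc/meminfo text."""
    values = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            values[name.strip()] = int(parts[0])
    return {
        "mem_free_kb": values["MemFree"],
        "swap_used_kb": values["SwapTotal"] - values["SwapFree"],
    }


def mysql_rates(before, after, seconds, counters=("Questions", "Slow_queries")):
    """Per-second rates for cumulative MySQL status counters between two samples."""
    return {name: (int(after[name]) - int(before[name])) / seconds for name in counters}
```

In practice one would read /proc/stat twice a few seconds apart and pass both texts to cpu_user_sys_pct, and likewise take two SHOW GLOBAL STATUS samples for mysql_rates. Cumulative counters such as CPU jiffies, Questions or Slow_queries only mean something as differences between two samples, which is why those helpers take a before and an after snapshot; gauges such as Threads_connected can simply be recorded as-is at each sample.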
The actual script and "how to" setup instructions will be provided soon in another post. I didn't want to mix the "why" of taking frequent baselines with the "how" of implementing them.
Thank you,
Ricardo