jueves, 9 de febrero de 2012

Turbocharge Your Website With Memcached

Your latest PHP/MySQL website is finally online. And it’s awesome. But it’s not as fast as you want it to be, because of the many SQL queries running every time a page is generated. And above that, you have the feeling it will not scale well under heavy loads. And you are most likely right.

In this tutorial, we will see how you can greatly improve your website’s responsiveness, and help it scale to handle many simultaneous visitors, by implementing a cache layer between your code and your database. The good news is it is fairly easy, and can be done in a few minutes!


Introducing Memcached

Memcached is a high-performance in-memory data caching system.

Modern websites and web applications use a lot of data, and it’s not uncommon to count as many as 20 or even 30 SQL queries in a single page generation. Multiply this amount by a big number of visitors, and you often get an overloaded database, and pages that take seconds to be generated and sent to the client.

The tool we are going to use today to improve performance is called Memcached. It’s a high-performance in-memory data caching system. Or to put it another way, a very fast application that runs on your server and uses a fraction of the available memory to store an associative array of data. You can ask Memcached to do two things :

  • Store the value V with the key K
  • Retrieve the value V stored with the key K

This looks minimalist, but there’s a lot you can do thanks to these two features, as we will see very soon. In fact, Memcached can do a few more things, but they’re all tied to storing or retrieving data.

Installing Memcached on modern Linux distributions is quite simple :

  • Ubuntu : sudo apt-get install memcached
  • Gentoo : sudo emerge install memcached
  • Redhat : sudo yum install memcached

Once installed, Memcached will be automatically started every time your server boots. You can set the amount of memory reserved for Memcached, along with other options, in the configuration file (/etc/memcached.conf). 64Mb is allocated by default. The configuration file also contains the IP address and the port Memcached will be bound to. Default values (127.0.0.1 and 11211) are fine for a standard setup.


Accessing Memcached from PHP

We want to store and retrieve data from your PHP scripts. This means we are going to need a way to connect to Memcached from PHP. For that, we’re going to install the “Memcache” extension for PHP. As it is a PECL extension, it is very easy to install with the “pecl” by typing the following command :

 sudo pecl install memcache 

There are two PHP extensions related to Memcache : “Memcache” and “Memcached” (notice the “d” in the second one). Both are very similar, but the first one has a smaller footprint. In this tutorial, we will use the lighter Memcache. Once installed, this extension should be enabled and the Memcache-related functions should now be available to your PHP scripts.


How does Caching Work?

Our work here is based on the following assumptions:

  • retrieving data from the database takes resources (CPU + i/o)
  • retrieving data from the database takes time
  • we often retrieve the very same data over and over

We also want to store our data it in a way that allows us to retrieve it efficiently.

Generally speaking, we want to save our data in a persistent environment (our MySQL database for instance). But we also want to store our data it in a way that allows us to retrieve it efficiently, even if the storage is non-persistent. So in the end, we will have two copies of our data : one being stored in MySQL and the other being stored in Memcache.

Here are the steps we have to take to make this happen :

  • Every write operation (SQL INSERTs and UPDATEs) will be performed in both MySQL and Memcached
  • Every read operation (SQL SELECTs) will be performed in Memcached, and will fall back to MySQL in case of error

At this point, you probably see which parts of your code need to be modified : parts where you write data and parts where you read data. If your PHP code is well structured, you should have wrapped your data access code in functions or —even better— classes. If so, updating your site should be very fast. If not, you might have a little more work.


Connecting to our Cache Server

First of all, let’s create a connection to our Memcached server. Here is the code you should use, early in your PHP scripts :

         // Connection constants         define('MEMCACHED_HOST', '127.0.0.1');         define('MEMCACHED_PORT', '11211');          // Connection creation         $memcache = new Memcache;         $cacheAvailable = $memcache->connect(MEMCACHED_HOST, MEMCACHED_PORT); 

At this point, we have established a connection to our Memcache server. It may have failed, but we know so thanks to the $cacheAvailable variable.


Storing Data in our Cache

Let’s dive into data storage. We are going to take an example to make things clearer – an online shop. We have a script called edit_product.php whose purpose is to save a product’s data into our database. Each one of our products has the following information:

  • id
  • name
  • description
  • price

At some point in our edit_product.php code, we run an INSERT or UPDATE SQL query whose purpose is to write this product’s data to our MySQL database. It could look just like this :

         // We have validated and sanitized our data         // We have escaped every risky char with mysql_real_escape_string()         // Now we want to save it into our database         $sql = "INSERT INTO products (id, name, description, price) VALUES ($id, '$name', '$description', $price)";         $querySuccess = mysql_query($sql, $db); 

As I mentioned above, we want to store our data both in our MySQL database and Memcached server. Here is how we are going to proceed :

         // We have validated and sanitized our data         // We have escaped every risky char with mysql_real_escape_string()         // Now we want to write them to our database :         $sql = "INSERT INTO products (id, name, description, price) VALUES ($id, '$name', '$description', $price)";         $querySuccess = mysql_query($sql, $db);          // We have written our data into our database         // Now let's store the product name, description and price into our cache         // The method "set" tells our Memcached server to store the data associated to a specific key         if ($querySuccess === true)         {                 // We build a unique key that we can build again later                 // We will use the word 'product' plus our product's id (eg. "product_12")                 $key = 'product_' . $id;                  // We store an associative array containing our product data                 $product = array('id' => $id, 'name' => $name, 'description' => $description, 'price' => $price);                  // And we ask Memcached to store that data                 $memcache->set($key, $product);         } 

At this point, both our database and cache contain our product data.


Retrieving Data from our Cache

In case our cache is unavailable, we want to fall back to MySQL.

Now let’s retrieve our data. In the same example, let’s say our online shop has a script called product.php that displays a specific product. Accessing the page product.php?id=12 will display the product whose identifier is 12.

At some point in our product.php code, we run a SELECT SQL query whose purpose is to retrieve a product’s data from our MySQL database. It could look just like this :

         // We have validated and sanitized our data         // We have escaped every risky char with mysql_real_escape_string()         // Now we want to read from our database :         $sql = "SELECT id, name, description, price FROM products WHERE id = " . $id;         $queryResource = mysql_query($sql, $db);         $product = mysql_fetch_assoc($queryResource); 

As we said above, we want to retrieve our data from our Memcached server if possible, because it’s faster than getting it from MySQL. But in case our cache server can’t be reached, or if it simply doesn’t store the data we need, we want to fall back to MySQL. Here is how we are going to proceed :

         // Initialize our $product variable         $product = null;          // First we check that our cache server is available         // The $cacheAvailable variable was initialized when we connected to our cache server         if ($cacheAvailable == true)         {                 // We build the key we associated to our product data                 $key = 'product_' . $id;                  // Now we get the data from our cache server                 $product = $memcache->get($key);         }          // do we need to access MySQL ?         if (!$product)         {                 // In case we do...because our $product variable is still null                 // We have validated and sanitized our data                 // We have escaped every risky char with mysql_real_escape_string()                 // Now we want to read from our database :                 $sql = "SELECT id, name, description, price FROM products WHERE id = " . $id;                 $queryResource = mysql_query($sql, $db);                 $product = mysql_fetch_assoc($queryResource);         } 

At this point, we have retrieved the data we needed. It was most likely done from our cache, but could be from MySQL if the cache was not filled or couldn’t be accessed for some reason.


Conclusion

We have seen how Memcached can be used to speed up your website and limit your database load. Our example above was based on PHP and MySQL because these techologies are widely deployed, but this principle is universal and works just the same with many other technologies : C/C++, Java, Python, Ruby, Perl, .Net, MySQL, Postgres, Erlang, Lua, Lisp, Cold Fusion, Ocaml and io are listed along with PHP on the official Memcached wiki.

As I briefly mentioned earlier, Memcached provides more features than the simple set and get methods we’ve seen above. Two useful additional features are increment/decrement updates, and the ability to set an expiration time to a specific stored data. Both are available in PHP, along with a few others, as you can see in the Memcache documentation.

Have fun implementing this on your websites, and enjoy the —free— performances improvement. Thank you so much for reading and please let me know if you have any questions in the comments below.



No hay comentarios:

Publicar un comentario