Running a Daemon (self-restarting) Process in Ubuntu – the easy way

I recently had to think about changing my cron jobs that run every 2 minutes to something more reliable.

Basically, running a cron job every couple of minutes has the disadvantage of starting the script from the beginning even when the previous instance is still running. Say, for example, you have a script that needs to check whether there are new records in a database table. You run that script every couple of minutes to “poll” the database, edit the records, and move them to a new table.

What happens if you run the script at 2:00am and it hasn’t finished by the time cron fires it again at 2:02am? You will get two copies of the data in the destination table.

This can easily corrupt your data and lead to duplicate results. This is where daemons and queue managers come in handy.
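Before going the full daemon route, one stop-gap worth knowing about (my own sketch, not part of the article quoted below; the lock file path is arbitrary) is to make the cron-run script refuse to start while a previous run still holds a lock:

<?php
// poll.php – guarded against overlapping cron runs
$lock = fopen('/tmp/poll-db.lock', 'c');
if ($lock === false || !flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0); // a previous run is still busy; skip this cycle
}

// ... poll the database, edit records, move them to the new table ...

flock($lock, LOCK_UN);
fclose($lock);

With that in place, overlapping runs simply exit instead of duplicating data; a daemon or queue manager is still the cleaner long-term fix.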

In this scenario I decided it was time for me to take up the gauntlet and do something different with queued jobs. While I’m still implementing this, I thought it would be useful to keep a reference to the article that helped me make that decision. You’ll find an excerpt of it below; the source link is at the bottom.

 

Writing an upstart script

Turns out, writing your own upstart scripts is way easier than building init.d files based on the /etc/init.d/skeleton file.

OK, so here’s what it looks like. You should store the script in /etc/init/yourprogram.conf and create one for each Node program you write.

description "node.js server"
author      "kvz - http://kevin.vanzonneveld.net"

# used to be: start on startup
# until we found some mounts weren't ready yet while booting:
start on started mountall
stop on shutdown

# Automatically Respawn:
respawn
respawn limit 99 5

script
    # Not sure why $HOME is needed, but we found that it is:
    export HOME="/root"

    exec /usr/local/bin/node /where/yourprogram.js >> /var/log/node.log 2>&1
end script

post-start script
   # Optionally put a script here that will notify you that node has (re)started
   # /root/bin/hoptoad.sh "node.js has started!"
end script

Wow, how easy was that? Told you, upstart scripts are child’s play. In fact they’re so compact, you may find yourself changing almost every line because they contain specifics of our environment.

non-root

Node can do a lot of stuff – or break a lot of stuff if you’re not careful. So you may want to run it as a user with limited privileges. We decided to go conventional and chose www-data.

We found the easiest way was to prepend the Node executable with a sudo like this:

exec sudo -u www-data /usr/local/bin/node

Don’t forget to change your export HOME accordingly.
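For reference, here’s roughly how the script stanza might look with both changes applied (a sketch; the paths are the same example ones from above, and www-data’s home directory is assumed to be /var/www):

script
    # run as the unprivileged www-data user; HOME now points at its home directory
    export HOME="/var/www"

    exec sudo -u www-data /usr/local/bin/node /where/yourprogram.js >> /var/log/node.log 2>&1
end script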

Restarting your Node.js daemon

This is so ridiculously easy:

$ start yourprogram
$ stop yourprogram

And yes, Node will already:

  • automatically start at boottime
  • log to /var/log/node.log

…as defined inside our upstart script.

initctl

But wait, start and stop are just shortcuts. The one really behind the wheel here is initctl. You can play around with that command to see what other possibilities there are:

$ initctl help
$ initctl status yourprogram
$ initctl reload yourprogram
$ initctl start yourprogram # yes, this is the same start
# etc

Update from October 30th, 2012

The basic idea has not changed since 2009, but we did add some tricks to our upstart script. Here’s what we now use in production at transloadit.com:

# cat /etc/init/transloaditapi2.conf
# http://upstart.ubuntu.com/wiki/Stanzas

description "Transloadit.com node.js API 2"
author      "kvz"

stop on shutdown
respawn
respawn limit 20 5

# Max open files are @ 1024 by default. Bit few.
limit nofile 32768 32768

script
  set -e

  # Send all of this job's output to syslog (tagged "api2") through a named pipe.
  mkfifo /tmp/api2-log-fifo
  ( logger -t api2 </tmp/api2-log-fifo & )
  exec >/tmp/api2-log-fifo

  # The fifo's name can be removed now; the already-open file descriptors keep it alive.
  rm /tmp/api2-log-fifo

  exec sudo -u www-data MASTERKEY=`cat /transloadit/keys/masterkey` /transloadit/bin/server 2>&1
end script

post-start script
   /transloadit/bin/notify.sh 'API2 Just started'
end script

Starting & stopping your daemon:

Usually, the suggested method for starting and stopping daemon processes is to use the system’s “service” command. So here we would use:

sudo service transloaditapi2 start
sudo service transloaditapi2 stop

Check if your daemon is running:

sudo service transloaditapi2 status

via kvz.io.

Better Performance – Speeding Up Your CakePHP Website

It seems that whenever I mention CakePHP to the developer community, those who have not used it think of it as a slow framework. Indeed it isn’t the fastest according to the results of many benchmarks – out of the box that is – but what it might lack in performance it certainly makes up for in Rapid Application Development.

By applying a few simple modifications, and even some more complex enhancements, CakePHP can be sped up quite a bit. By the time you work your way through even half of these changes, the performance of your CakePHP site will be comparable to many other popular PHP frameworks, with the advantage that your development speed will never falter!

There are two types of modifications that I will be describing in the following article. The first is code changes, i.e. changes that will work for anyone, even if you are running in a shared hosting environment. The second type is geared towards users who have their own dedicated or virtual server to which they can add and remove software as required.

Do not fear, though: if you can only follow the first set, you will not be disappointed.

Upgrade CakePHP Versions

It’s important to note that people’s misconceptions around CakePHP’s speed probably date all the way back to versions 1.1 or 1.2. In version 1.3, the core development team performed a serious overhaul of CakePHP’s underlying architecture. In fact, I performed several benchmark tests comparing CakePHP versions 1.2, 1.3, and 2.0. The results were astonishing. By simply updating from 1.2 to 1.3, I saw an immediate decrease of over 70% in the average time it took to load a page.

If you’ve been delaying upgrading your version of CakePHP and you are still using version 1.2 or less, the first thing you should do is upgrade to at least version 1.3 – and upgrading to version 2.3.x would be even better.

The CakePHP Cookbook provides some very detailed migration guides for each of these upgrades.

Disable Debug Mode

Don’t laugh! This might seem obvious, but in a mad scramble to move code into production it can be very easy to forget to turn off debugging. A typical setup that I follow is to create multiple app/Config/core.php files: one for my local environment, one for dev, one for staging, and one for production. These files contain the different settings based on the target environment.

The key statement to change is Configure::write('debug', 2) to Configure::write('debug', 0).

This change hides all error messages and stops the model caches from being refreshed on every request. That is extremely important because, with debugging on, each page load regenerates all of your model schemas instead of reading them from the cache created on the first page load.
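If juggling several core.php files feels error-prone, one alternative (my suggestion, not something from the article) is to branch on the environment inside a single core.php, for example on hostname – the hostnames below are hypothetical:

<?php
$host = gethostname();
if ($host === 'prod-web-01') {
    Configure::write('debug', 0);   // production: hide errors, use caches
} elseif ($host === 'staging-web-01') {
    Configure::write('debug', 1);   // staging: show errors and warnings
} else {
    Configure::write('debug', 2);   // local/dev: full debug output
}

Either way, the point is the same: production must end up with debug set to 0.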

Disable Recursive Find Statements

When you perform a query using the find() method, recursion is set to 1 by default. This tells CakePHP to fetch all first-level related models as well. For example, if you have a user model that has many user comments, each time you perform a query on the users table it will fetch the related comments too.

The processing time on performing, returning, and creating an associative array with this data can be significant, and I’ve actually seen CakePHP sites crash in production because of this!

My preferred approach to making no recursion the default is to override the setting in app/Model/AppModel.php by adding the following code:

<?php
class AppModel extends Model {
    public $recursive = -1;
}
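Even with the default at -1, an individual query can still pull in its first-level associations when it genuinely needs them, for example (an illustrative call, not from the article):

<?php
// Fetch active users together with their first-level associations,
// overriding the AppModel default for this one call only.
$users = $this->User->find('all', array(
    'conditions' => array('User.active' => 1),
    'recursive'  => 1,
));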

Cache Query Results

This is truly my favorite optimization. I’d like to think I uncovered it myself, but I’m sure that would be debated as I’ve seen other articles discuss a similar solution.

In many web applications, there are probably a lot of queries going to the database that do not necessarily need to. By overriding the default find() function inside app/Model/AppModel.php, I’ve made it easy to cache the full associative array results of queries. This means that not only do I avoid hitting the database, I even avoid the processing time of CakePHP converting the results into an array. The code required is as follows:

<?php
class AppModel extends Model {
    public $recursive = -1;

    public function find($conditions = null, $fields = array(), $order = null, $recursive = null) {
        $doQuery = true;
        // check if we want the cache
        if (!empty($fields['cache'])) {
            $cacheConfig = null;
            // check if we have specified a custom config
            if (!empty($fields['cacheConfig'])) {
                $cacheConfig = $fields['cacheConfig'];
            }
            $cacheName = $this->name . '-' . $fields['cache'];
            // if so, check if the cache exists
            $data = Cache::read($cacheName, $cacheConfig);
            if ($data == false) {
                $data = parent::find($conditions, $fields, $order, $recursive);
                Cache::write($cacheName, $data, $cacheConfig);
            }
            $doQuery = false;
        }
        if ($doQuery) {
            $data = parent::find($conditions, $fields, $order, $recursive);
        }
        return $data;
    }
}

Subtle changes need to be made to the queries we wish to cache. A basic query which looks like this:

<?php
$this->User->find('list');

requires updating to include caching information:

<?php
$this->User->find('list',
    array('cache' => 'userList', 'cacheConfig' => 'short')
);

Two additional values are added: the cache name and the cache config that should be used.

Two final changes must be made to app/Config/core.php. Caching must be turned on and the cacheConfig value that is used must be defined. First, uncomment the following line:

<?php
Configure::write('Cache.check', true);

And then, add a new cache config as follows (updating the parameters as required for name and expiry):

<?php
Cache::config('short', array(
    'engine' => 'File',
    'duration' => '+5 minutes',
    'probability' => 100,
    'path' => CACHE,
    'prefix' => 'cache_short_'
));

All of the options above can be updated to extend the duration, change the name, or even define where the cache should be stored.

For a more detailed explanation on how to add, update, and purge the cache, continue reading about the specific caching optimization on my blog.
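As a taste of what purging can look like, here is a minimal sketch (my own illustration, not code from the article) that expires the cached 'userList' result whenever a user record is saved, reusing the <model name>-<cache name> key convention from the find() override above:

<?php
class User extends AppModel {
    public function afterSave($created, $options = array()) {
        // drop the cached find result so the next call repopulates it
        Cache::delete($this->name . '-userList', 'short');
    }
}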

Don’t consider this the end of your CakePHP caching, though. You can cache controller actions, views, and even helper functions, too. Explore the Cache Component in the CakePHP book for more information.

Install Memory Based Caching

By default, PHP sessions are stored on disk (typically in a temp folder). This means that each time you access the session, PHP needs to open the session’s file and decode the information it contains. The problem is that disk I/O can be quite expensive; accessing items from memory rather than from disk is immensely faster.

There are two nice approaches to session-related disk I/O. The first is to configure a RAM disk on your server. Once configured, the drive is mounted like any other drive, and the session.save_path value in php.ini is updated to point to the new RAM disk. The second is to install software like Memcached, an open source caching system that allows objects to be stored in memory.

If you’re wondering which approach is best for you, the way to decide is by answering the following question: will more than one server need to access this memory simultaneously? If yes, choose Memcached, since it can be installed on a separate system that other servers can reach. If, on the other hand, you are just looking to speed up a single web server, the RAM disk solution is quick and requires no additional software.

Depending on your operating system, installing Memcached can be as simple as typing sudo aptitude install memcached.
Once installed, you can configure PHP to store sessions in memory instead of on disk by updating your php.ini:

session.save_handler = memcache
session.save_path = 'tcp://127.0.0.1:11211'

If Memcached is installed on a different port or host, then modify your entries accordingly.

After you have finished installing it on your server, you will also need to install the PHP memcache module. Once again depending on your operating system, one of these commands will work for you:

pecl install memcache

or:

sudo aptitude install php5-memcache
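To sanity-check the setup before pointing sessions at it, a quick script along these lines (illustrative, assuming the default host and port) should report true twice:

<?php
// Is the memcache extension loaded, and does the daemon answer locally?
var_dump(extension_loaded('memcache'));
$mc = new Memcache();
var_dump(@$mc->connect('127.0.0.1', 11211));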

Removing Apache and Installing Nginx

Apache is still the favorite according to recent statistics, but Nginx adoption is picking up a lot of steam when it comes to the most heavily trafficked websites on the Internet today. In fact, Nginx is becoming an extremely popular replacement for Apache.

Apache is like Microsoft Word, it has a million options but you only need six. Nginx does those six things, and it does five of them 50 times faster than Apache. — Chris Lea

Nginx differs from Apache in that Apache is a process-based server, whereas Nginx is event-driven. As your web server’s load grows, Apache quickly becomes a memory hog: to handle new requests, Apache’s worker processes spin up new threads, increasing memory use and the time spent waiting for threads to be created. Nginx, meanwhile, runs asynchronously and uses just one or a few lightweight worker processes.

Nginx is also extremely fast at serving static files, so if you are not using a content delivery network you’ll definitely want to consider using Nginx for this as well.

In the end, if you are short on memory, Nginx will consume as little as 20–30MB where Apache might be consuming upwards of 200MB for the same load. Memory might be cheap for your PC, but not when it comes to paying for servers in the cloud!

For a more in-depth breakdown between Apache and Nginx visit the Apache vs Nginx WikiVS page.

HowToForge has an excellent article on configuring Nginx on your server. I suggest following its step-by-step guide to installing and configuring Nginx with php-fpm.

Configure Nginx to use Memcached

Once you’ve installed Nginx and Memcached, making them leverage each other will extend your performance even further. Even though CakePHP applications are dynamic, it’s likely that 80–90% of their output is still relatively static – meaning it only changes at specific intervals.

By making the following edits to your Nginx config file, requests whose responses have already been processed and stored in memory will be served straight from Memcached. This means that only a few requests actually invoke PHP, which significantly increases your website’s speed.

server {
    listen 80;
    server_name endyourif.com www.endyourif.com;
    access_log /var/log/nginx/endyourif-access.log;
    error_log /var/log/nginx/endyourif-error.log;
    root /www/endyourif.com/;
    index index.php index.html index.htm;

    # serve static files
    location / {
        # this serves static files that exists without
        # running other rewrite tests
        if (-f $request_filename) {
            expires 30d;
            break;
        }
        # this sends all-non-existing file or directory requests
        # to index.php
        if (!-e $request_filename) {
            rewrite ^(.+)$ /index.php?q=$1 last;
        }
    }

    location ~ \.php$ {
        set $memcached_key '$request_uri';
        memcached_pass 127.0.0.1:11211;
        default_type       text/html;
        error_page 404 405 502 = @no_cache;
    }

    location @no_cache {
        try_files $uri =404;
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include fastcgi_params;
    }
}

The above configuration specifies that static files should be served immediately. PHP requests are directed to Memcached, which serves the page if it has already been cached. If the page doesn’t exist in the cache, the @no_cache location is reached, which acts like your previous PHP setup and serves the page via php-fpm.
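One thing the article doesn’t show is how pages get into Memcached in the first place – the Nginx block above only reads from it. As a rough sketch of one way to populate the cache (my own assumption, not from the source), you could store the rendered HTML from CakePHP’s afterFilter() under the same $request_uri key that Nginx looks up:

<?php
class AppController extends Controller {
    public function afterFilter() {
        parent::afterFilter();
        // only cache plain GET requests in production
        if (Configure::read('debug') == 0 && $this->request->is('get')) {
            $memcache = new Memcache();
            if (@$memcache->connect('127.0.0.1', 11211)) {
                // key must match Nginx's $memcached_key ($request_uri); expire after 5 minutes
                $memcache->set($_SERVER['REQUEST_URI'], (string)$this->response->body(), 0, 300);
            }
        }
    }
}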

Don’t forget to update your app/Config/core.php so that CakePHP itself writes to Memcache. The following:

<?php
$engine = 'File';
if (extension_loaded('apc') && function_exists('apc_dec') && (php_sapi_name() !== 'cli' || ini_get('apc.enable_cli'))) {
    $engine = 'Apc';
}

becomes:

<?php
$engine = 'Memcache';

Our code for caching queries from earlier also requires updating:

<?php
Cache::config('short', array(
    'engine' => 'Memcache',
    'duration' => '+5 minutes',
    'probability' => 100,
    'path' => CACHE,
    'prefix' => 'cache_short_'
));

Remove MySQL and Install Percona

The final server modification I recommend is to install Percona’s build of MySQL. The Percona team has spent many years understanding and fine-tuning the database’s architecture for optimal performance. It is best to follow the installation instructions from Percona, as they describe the several different installation options.

Summary

There are quite a lot of techniques to ensure CakePHP will run lightning fast. Some changes may be more difficult to make than others, but do not become discouraged. No matter what your framework or language, optimization comes at a cost, and CakePHP’s ability to let you rapidly develop extremely complex web applications will strongly outweigh the effort involved in optimizing things once your app is complete.

via PHP Master

Switching From Apache MPM Prefork to Worker

My very first experience of setting up a live cloud server was one I had looked forward to with optimism.

In the past I was comfortable with the shared hosting and semi-dedicated solutions, which provided the basic tools for managing a website. But then I needed shell access, and hosting my applications on the dedicated solutions provided by the shared hosting companies was costing a lot more than I bargained for.

A little overcommitted?

After getting several warning emails from my shared hosting company at the time for exceeding resource usage, there was just one option left – move.

Almost everyone is used to the localhost setup of a WAMP or LAMP stack. However, these implementations are built to run on a single computer without the rigors that a constantly active webserver experiences.

In the case of a live webserver on the internet you need to tweak and tune stuff in order to ensure your server doesn’t get overrun on this highway. Crawlers, Spiders, and SPAM bots all contribute to traffic on your server and you need to plan to manage them.

Setting up your LAMP stack is somewhat straight forward (on Debian, Ubuntu type: sudo apt-get install apache2 php5 libapache2-mod-php5).

I won’t go into the details of setting that up here. What I want to share, on the other hand, is the effect of going along with the default implementation of the LAMP stack, and in particular the default configuration of the Apache server it comes bundled with.

By default, most new installations of Apache use the prefork multi-processing module (mpm-prefork), which handles each connection with a dedicated process.

It’s sort of like having multiple instances of Apache running at the same time on the server. After several cases of server thrashing (endless memory swapping), I was left with two choices (other than increasing the RAM again): either switch to Nginx, which is built to talk to PHP as a FastCGI service, or configure Apache to run PHP using FastCGI with mpm-worker.

Due to time constraints I had to make the switch quickly, and I was sure I didn’t have enough time to debug a faulty new Nginx install. So I opted for the less intensive and more familiar option of changing Apache’s MPM from prefork to worker.

Highlighted below are the simple steps I took to make this work in less than thirty minutes.

Note that the commands are for Debian distributions of Linux.

  1. Install the FPM CGI binary for PHP. apt-get install php5-fpm
  2. PHP-FPM will run as a daemon, and you need to configure it to listen on a socket instead of a local TCP port.
    • Open the file located at /etc/php5/fpm/pool.d/www.conf
    • Comment the line with listen = 127.0.0.1:9000 by adding a semicolon to the beginning of that line so it becomes ;listen = 127.0.0.1:9000
    • Now add a line after it with the following: listen = /var/run/php5-fpm.sock
      which makes PHP-FPM run as a socket
    • Restart the service by entering: service php5-fpm restart
  3. Now you need to install Apache’s worker MPM (which replaces prefork) along with the FastCGI module. So you enter apt-get install apache2-mpm-worker libapache2-mod-fastcgi
  4. Once the installation is complete, Apache will be running the worker MPM instead of the default prefork, with the FastCGI module handling PHP requests.
  5. Finally you need to configure your sites to use the new FastCGI module. It is preferable to make the changes in your main Apache configuration file, located at /etc/apache2/apache2.conf
    • Use the config below, making sure the directory path it references (/usr/lib/cgi-bin here) exists, or adjust it to a path that can be accessed on the server’s file system
    • <IfModule mod_fastcgi.c>
             AddHandler php5-fcgi .php
             Action php5-fcgi /php5-fcgi
             Alias /php5-fcgi /usr/lib/cgi-bin/php5-fcgi
             FastCgiExternalServer /usr/lib/cgi-bin/php5-fcgi -socket /var/run/php5-fpm.sock -pass-header Authorization
       </IfModule>
    • If you are using virtual hosts it is still preferable to make the changes in the main Apache config file in /etc/apache2/apache2.conf instead of adding it to each config file for each site you have enabled.
  6. Now, all that is left is for you to restart your web server typing the command: service apache2 restart

And that’s all there is to it. Your server should now work faster and suffer fewer memory outages than when it was configured to use prefork.
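If you want to double-check which MPM and handler actually ended up active, the standard Apache tooling will tell you (output wording may vary slightly by version):

apache2ctl -V | grep -i mpm      # should report: Server MPM: Worker
apache2ctl -M | grep -i fastcgi  # should list fastcgi_module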

Best of luck in your adventure towards better performance …


How to Resolve Errors in CakePHP After Changing Database Structure – Missing Fields Error

Being an avid CakePHP user, I had the idea that I could do some things and get the desired results in my application whenever I wanted.

As it turns out, I’ve learned something new again. I recently renamed some fields in one of the tables in my database, and I also made the necessary changes (refactoring) to that table’s model file in order to avert any unexpected results in my application.


So I tested the application locally to ensure there were no glitches and then committed and pushed my changes to the live repository.

But to my utmost dismay – it broke! Part of my application was no longer functioning. The next thing that came to mind was to delete the files in tmp/cache/models to be sure the table structure wasn’t being cached.

Yet, no joy. Now I was thinking of going into the CakePHP core library to find out why it was still using an old cached copy of the database table.

Looking through the Model.php file, I noticed the caching was still based on files in the tmp directory, which I thought I had already deleted.

Finally, I decided to take one more swipe at the tmp directory this time deleting the files in both the tmp/cache/models and tmp/cache/persistent directories.

Alas! All the errors were gone. Now I’ll always remember to delete the files here manually – or write an addon to do just that – after modifying my database structure.
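If you’d rather not rummage around in tmp by hand, here is a small sketch of that “addon” idea using CakePHP 2.x’s built-in cache configurations (the config names below are the ones the default core.php sets up):

<?php
// Equivalent to emptying tmp/cache/models and tmp/cache/persistent:
Cache::clear(false, '_cake_model_'); // cached table schemas
Cache::clear(false, '_cake_core_');  // persistent core caches (paths, method caches)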

Untraceable CakePHP Shell Error Solved At Last!

So after spending nearly 5 hours on Saturday trying to work out why my Cake shell was not running on the online server, I finally figured it out.

The first road block I hit was with file permissions.

My web server was configured with write access to just the cache and logs folders inside the app’s tmp folder. It turns out that the Cake shell needs to be able to write to tmp/cache/persistent, and the parent directory must be writable too. So I ended up changing the tmp folder with:

chmod 775 /srv/www/../app/tmp

Finally, the permissions error was gone – and then came the next error, which had me searching for hours without respite.

I had written my shell app and tested it locally, ensuring I listed all the models it needed at the top of the class with: public $uses = array(…);

Everything should work now. Git push and let’s go… But no! A new error message, right in my face:

Error: Database connection “Mysql” is missing, or could not be created.

The thing here was that I could not understand why I was getting a database error when my app was working online.

After so many hours searching Stack Overflow, Bing, and Lighthouse, I finally suspended my frustration and decided to look at the server’s logs.

Nothing in the logs made sense. I tried running setsebool -P httpd_can_network_connect=1 to see if that would work, but it didn’t change a thing, as SELinux was already disabled.

Then I tried adding some lines with extensions config and socket path to the my.cnf file. That didn’t make any difference even after several reboots and service reloads.

Finally I remembered that I had dynamically set my connection in the database.php file used by CakePHP, choosing either a default connection for localhost or a different one for the online server. Alas! That was it. Adding a condition to the dynamic selection in database.php fixed the problem.

All I did was add this to the __construct method:

if (gethostname() == 'Hostname') { // find your hostname by running: $ hostname at your *nix shell
    $this->default = $this->online;
}

I set debug to 2 in core.php to clear the caches, and then changed it back to 0.

And that was the end of my shell nightmare.

 

Pagination With Twitter Bootstrap and CakePHP

I recently spent almost 30 minutes trying to figure out why my pagination using Twitter Bootstrap and CakePHP would get broken onto two lines whenever the pagination list was a child of a div with a span[n] class.

 

After I changed this code to the listing below, it came back to normal. I hope this helps someone out there.

 <div>
  <ul>
   <?php
    echo ($this->Paginator->current() > 3) ? $this->Paginator->first('first ', array('tag' => 'li')) : '';
    echo ($this->Paginator->hasPrev()) ? $this->Paginator->prev(__('prev ', true), array('tag' => 'li', 'id' => 'prev' . rand(2, 9000)), null, array('escape' => false)) : '';
    echo $this->Paginator->numbers(array('modulus' => 7, 'separator' => ' ', 'tag' => 'li'));
    echo ($this->Paginator->hasNext()) ? $this->Paginator->next(__(' next', true), array('tag' => 'li'), null, array('escape' => false)) : '';
    echo ((int) $this->Paginator->counter(array('format' => '%pages%')) > 10) ? $this->Paginator->last('last', array('tag' => 'li')) : '';
    echo $this->Js->writeBuffer();
   ?>
  </ul>
 </div>

Why Do We Have Left, Right, and Inner Joins in SQL?


This is one of those blitz posts I occasionally write.

This post is meant for those who like to just skim, rather than wade through loads of documentation that often tends to be boring.

Here’s the drift straight up. You have LEFT JOIN, RIGHT JOIN, and INNER JOIN.

We use the LEFT JOIN most of the time because, by default, we want the data from the table on the left side (the table calling the join).

E.g:

SELECT U.*, P.screen_name FROM users U LEFT JOIN profiles P ON (U.id = P.user_id) WHERE U.id = 2;

The users table is the one calling the join (on the left) in the above SQL query.

 

* Use a LEFT JOIN when you want to return data from the left table whether or not there is a matching row in the right table. The above will return the user with id = 2 whether or not there is a profile record in the profiles table.

 

* Use a RIGHT JOIN when you want data from the right table whether or not there is a matching row in the left table. With a RIGHT JOIN, even if there is no user with id = 2 in the users table, as long as there is a row in the profiles table with user_id = 2 we will still get a result containing at least the profile information.
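An illustrative RIGHT JOIN version of the same query (note that the WHERE clause now filters on the profiles side):

SELECT U.*, P.screen_name FROM users U RIGHT JOIN profiles P ON (U.id = P.user_id) WHERE P.user_id = 2;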

 

The third most used join is the INNER JOIN, which is like strict mode: it’s all or nothing.

* Use an INNER JOIN when you only want records where both tables match. The INNER JOIN returns results only if a row in the users table matches a row in the profiles table; if either table lacks a corresponding row for the other side of the join, you get nothing back.
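And the INNER JOIN version, which returns a row only when both the user with id = 2 and its matching profile exist:

SELECT U.*, P.screen_name FROM users U INNER JOIN profiles P ON (U.id = P.user_id) WHERE U.id = 2;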

There are other types of JOINs, like the CROSS JOIN, and combinations of different JOINs, like the FULL OUTER JOIN…

You get the drift now.

You can see some more information about joins in this external article http://www.codinghorror.com/blog/2007/10/a-visual-explanation-of-sql-joins.html