Saturday, January 28, 2012

PHP Compiler Part Three: Building An Apache Module

Apache exposes an API that lets programmers extend its functionality. Even PHP runs as a module. Without this module, Apache would serve .php files the same way it serves .jpg and .html files: the script would never be parsed and you'd just get the source code.

The usual way to build an Apache module is to use C. Let me be honest, I never did like C. I started programming with Visual Basic 6. If you've used it, you'll know how much simpler it is than a language like C/C++. But the world loves C and so do the people at Apache. Their examples and sample source code are all in C. One look at their Hello World example and I was already looking for a simpler way to get things done.

Enter mod_python/mod_perl
These are Apache modules too, but they embed a Python/Perl interpreter inside Apache. So building a module for Apache becomes as simple as writing a Python/Perl program. I ended up using mod_python because I know Python, but I hear that Perl is fast. Really fast.

Get mod_python from here. You will have to compile the Linux version first and copy it to your modules folder (/apache/modules/), whereas the Windows version is available as an installer. Note that mod_python for Windows needs Python 2.5. After installing/copying, add the following to your httpd.conf Apache config file, along with the other LoadModule lines already present.
LoadModule python_module modules/mod_python.so
You will probably have to restart your server after this. Next, add one of the following configs to either your .htaccess file or your httpd.conf file. (Quick lesson: .htaccess holds per-directory config while httpd.conf holds server-wide config, although httpd.conf can make directory-level changes too. .htaccess files are read at request time, whenever Apache serves something from that directory, whereas httpd.conf settings are loaded at startup. As a result, you will have to restart the server when you make changes in httpd.conf.)

The Python Handler
You should use the publisher handler if you plan on using multiple .py files as separate modules.
PythonHandler mod_python.publisher
The publisher handler locates the module specified in the URL. It also allows access to functions and variables within a module through the URL. E.g., accessing the URL http://localhost/test.py/func1?var1=value would locate the module test.py, execute the function func1 in it and pass value as its var1 argument. index() is the default function that will be called if nothing is specified after test.py in the above URL.
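As a rough illustration of the URL above, a publisher-style test.py might look like the sketch below. Only the names test.py, index, func1 and var1 come from that URL; the function bodies and the strings they return are made up for the example. The publisher fills in function arguments from the request fields by name and sends the returned string back as the response.

```python
# test.py - hypothetical module served through mod_python.publisher

def index():
    # Called for http://localhost/test.py (nothing after the module name).
    return "This is the default page"

def func1(var1=""):
    # Called for http://localhost/test.py/func1?var1=value
    # The publisher passes the query-string field var1 in by name.
    return "var1 is set to " + var1
```

Since these are plain functions, you can also import test.py in a normal Python shell and call func1(var1="value") to check it before putting it behind Apache.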

Another option is to have all your code in a single .py file (e.g., test.py as below; the extension is left out of the config line).
PythonHandler test
Here, a request for any file ending in .py will be handled by test.py, which must define a handler(req) function that receives a request object, req. Apache internals can be accessed through this object to get details about the headers, method, connection, filename, etc.
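To get a feel for the request object, here is a minimal sketch of a handler that just echoes a few of its attributes. It is not the module used in this project. The try/except fallback is my own addition so the function can be tried outside Apache with a stand-in object; 0 is the value of Apache's OK status.

```python
try:
    from mod_python import apache
    OK = apache.OK
except ImportError:
    # Assumption: fallback for running outside Apache (apache.OK is 0).
    OK = 0

def handler(req):
    # Plain text so the browser shows the lines as-is.
    req.content_type = 'text/plain'
    # A few of the details Apache exposes through the request object.
    req.write("method:   %s\n" % req.method)
    req.write("uri:      %s\n" % req.uri)
    req.write("filename: %s\n" % req.filename)
    return OK
```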

The Code
This is the code I used for this project.
from mod_python import apache, util
import os

def handler(req):
 req.content_type = 'text/plain'
 file = os.path.splitext(req.parsed_uri[apache.URI_PATH][1:])[0]+".exe"
 file = os.path.split(req.filename)[0]+"/"+file
 out = os.popen4('"'+file+'"')[1].read()
 form = util.FieldStorage(req, keep_blank_values=1)
 for i in form:
  out = out + "\n" + i + ":" + form[i]
 req.write(out)
 return apache.OK
When the module is called with a request for something.py, it executes something.exe, reads its output (stdout and stderr combined, lines 6-8) and writes it back to the user. Lines 9-11 are only there to show you how to access GET arguments.
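One caveat: os.popen4 was deprecated in Python 2.6 and is gone in Python 3, so on a newer interpreter the same two steps (map the request to an .exe, capture its output) would have to go through the subprocess module. The sketch below is written for Python 3 and is not part of the module above. exe_path_for and run_and_capture are hypothetical helper names, and it assumes the script sits directly in the document root, as it does in this post.

```python
import os
import subprocess

def exe_path_for(uri_path, filename):
    # "/out.py" -> "out.exe", mirroring the first 'file = ...' line above.
    exe_name = os.path.splitext(uri_path[1:])[0] + ".exe"
    # Place it in the directory of the requested file, as the second line does.
    return os.path.split(filename)[0] + "/" + exe_name

def run_and_capture(command):
    # popen4 handed back stdout and stderr combined; stderr=STDOUT does the same.
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    return result.stdout.decode()
```

Inside a handler this would be used roughly as out = run_and_capture([exe_path_for(uri_path, req.filename)]). Passing the command as a list also avoids the manual quoting around the file name.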

And The Results
I used ApacheBench (ab) to benchmark this setup. If you are using XAMPP, you should find it at /Program Files/xampp/apache/bin/ab.exe (it ships with regular Apache distributions too). I compiled out.php (shown below) and saved it as out.exe in the server document root.
<?php
echo 'Hello World', "\n";
?>
These are the ApacheBench results (with a few unnecessary details stripped out).
>ab -n 1000 -c 50 http://localhost/out.php
Document Path:          /out.php
Document Length:        12 bytes

Concurrency Level:      50
Time taken for tests:   1.635 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      336000 bytes
HTML transferred:       12000 bytes
Requests per second:    611.59 [#/sec] (mean)
Time per request:       81.755 [ms] (mean)
Time per request:       1.635 [ms] (mean, across all concurrent requests)
Transfer rate:          200.68 [Kbytes/sec] received


>ab -n 1000 -c 50 http://localhost/out.py
Document Path:          /out.py
Document Length:        12 bytes

Concurrency Level:      50
Time taken for tests:   39.737 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      292000 bytes
HTML transferred:       12000 bytes
Requests per second:    25.17 [#/sec] (mean)
Time per request:       1986.864 [ms] (mean)
Time per request:       39.737 [ms] (mean, across all concurrent requests)
Transfer rate:          7.18 [Kbytes/sec] received
Notice lines 12 and 29: about 612 requests per second for out.php against about 25 for out.py. Congratulations if you expected this the second I mentioned mod_python in this post! Performance is piss poor because:
a) Python is an interpreted language.
b) PHP and its Apache module have been built for exactly this kind of work and are thus more optimized.
c) mod_python adds extra steps to the request handling phase, like going through the interpreter every time, and this handler also launches out.exe as a new process on every request.
d) I'm a lazy idiot who shouldn't have taken the easy way.
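To put a number on the gap, the figures ab prints can be recomputed from the raw counts. This is just arithmetic on the two result blocks above; the function names are mine.

```python
def requests_per_second(complete_requests, seconds):
    # ab's "Requests per second" is completed requests over total time.
    return complete_requests / seconds

def mean_time_per_request_ms(concurrency, complete_requests, seconds):
    # ab's first "Time per request" line: concurrency * time / requests, in ms.
    return concurrency * seconds / complete_requests * 1000

php_rps = requests_per_second(1000, 1.635)    # out.php run
py_rps = requests_per_second(1000, 39.737)    # out.py run
slowdown = php_rps / py_rps

print("out.php: %.1f req/s" % php_rps)
print("out.py:  %.1f req/s" % py_rps)
print("out.py is about %.0f times slower" % slowdown)
```

The ratio works out to roughly 24, so the mod_python route handles about one request for every 24 that the PHP module gets through.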

Bye bye mod_python. Looks like I'll have to brush up on my C skills after all. Head over to Part 4 to see how you build an Apache module the right way. Also, here's something else you could try if you absolutely insist that you're not going to code a module.