Saturday, January 28, 2012

PHP Compiler Part Three: Building An Apache Module

Apache exposes an API for programmers to extend its functionality. Even PHP is run as a module. Without this module, Apache would serve .php files the same way it serves .jpg and .html files: there'd be no parsing of the script and you'd get just the source code.

The usual way to build an Apache module is to use C. Let me be honest, I never did like C. I started programming with Visual Basic 6. If you've used it, you'll know how much simpler it is than a language like C/C++. But the world loves C and so do the people at Apache. Their examples and sample source code are all in C. One look at their Hello World example and I was already looking for a simpler way to get things done.

Enter mod_python/mod_perl
These are modules for Apache too but they implement a Python/Perl interpreter within Apache. So building a module for Apache is as simple as writing a Python/Perl program. I ended up using mod_python because I know Python, but I hear that Perl is fast. Really fast.

Get mod_python from here. You will have to compile the Linux version first and copy it to your modules folder (/apache/modules/), whereas the Windows version is available as an installer. Note that mod_python for Windows needs Python 2.5. After installing/copying, add the following to your httpd.conf Apache config file along with the other LoadModule lines already present.
LoadModule python_module modules/mod_python.so
You will probably have to restart your server after this. Next, add one of the following configs to either your .htaccess file or your httpd.conf file. (Quick lesson: .htaccess holds per-directory config, while httpd.conf holds server-wide config and can make directory-level changes too. .htaccess files are read at request time, whenever Apache serves something from that directory, whereas httpd.conf is loaded once at startup. As a result, you have to restart the server after changing httpd.conf, but not after changing .htaccess.)

The Python Handler
You should use the publisher handler if you plan on using multiple .py files as separate modules.
PythonHandler mod_python.publisher
The publisher handler locates the module specified in the URL. It also allows access to functions and variables within a module through the URL. E.g., accessing the URL http://localhost/test.py/func1?var1=value would locate the module test.py, execute the function func1 in it and set the variable var1 equal to value. index() is the default function that will be called if nothing is specified after test.py in the above URL.
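Here's a minimal sketch of what a test.py for the publisher handler could look like, matching the URL above. The function bodies and return strings are made up for illustration; the only things taken from the text are the names test.py, func1, var1 and index. The publisher passes the request object as req and fills in arguments such as var1 from the query string.

```python
# test.py - a sketch of a module served through mod_python.publisher.
# The bodies below are illustrative assumptions, not code from this project.

def index(req):
    # Called for http://localhost/test.py (nothing after the module name).
    return "index() was called"


def func1(req, var1=None):
    # Called for http://localhost/test.py/func1?var1=value
    # The publisher maps the query-string field var1 onto the argument var1.
    return "func1() was called with var1 = %s" % var1
```

The returned string is what gets sent back to the browser.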

Another option is to have all your code in a single .py file (eg, test.py as below - extension not specified).
PythonHandler test
Here, a request to any file ending with .py will be handled by test.py, which must define a handler(req) function that receives a request object, req. Apache internals may be accessed through this object to get details about headers, method, connection, filename, etc.
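To give an idea of what the req object offers, here's a small sketch that dumps a few of its fields back to the client. describe() and FakeRequest are hypothetical helpers introduced only for this example; FakeRequest is a stand-in so the formatting code can be tried outside Apache. The attributes used (method, uri, filename, headers_in) are the ones mod_python puts on the request object.

```python
def describe(req):
    """Build a plain-text summary from a mod_python-style request object."""
    lines = [
        "method:   %s" % req.method,
        "uri:      %s" % req.uri,
        "filename: %s" % req.filename,
    ]
    # headers_in behaves like a dictionary of request headers.
    for name in sorted(req.headers_in.keys()):
        lines.append("header %s: %s" % (name, req.headers_in[name]))
    return "\n".join(lines)


def handler(req):
    # Imported here because mod_python is only importable inside Apache.
    from mod_python import apache
    req.content_type = "text/plain"
    req.write(describe(req))
    return apache.OK


class FakeRequest(object):
    """Hypothetical stand-in for req, with made-up values, for local testing."""
    method = "GET"
    uri = "/test.py"
    filename = "/srv/www/test.py"
    headers_in = {"Host": "localhost"}
```

Calling describe(FakeRequest()) from a regular Python prompt shows the text that handler() would write under Apache.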

The Code
This is the code I used for this project.
from mod_python import apache, util
import os

def handler(req):
 req.content_type = 'text/plain'
 file = os.path.splitext(req.parsed_uri[apache.URI_PATH][1:])[0]+".exe"
 file = os.path.split(req.filename)[0]+"/"+file
 out = os.popen4('"'+file+'"')[1].read()
 form = util.FieldStorage(req, keep_blank_values=1)
 for i in form:
  out = out + "\n" + i + ":" + form[i]
 req.write(out)
 return apache.OK
When the handler is called with the request filename set to something.py, it executes something.exe, reads its output from stdout (lines 6-8) and writes it back to the user. Lines 9-11 are just to show you how to access GET arguments.
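os.popen4 only exists in old Python 2 releases, so here's a hedged sketch of the same two steps (work out the .exe path, then run it and capture what it prints) using the subprocess module in Python 3. The function names exe_path_for and run_and_capture are my own and not part of mod_python or of the code above.

```python
import os
import subprocess
import sys


def exe_path_for(script_path):
    """Map /docroot/something.py to /docroot/something.exe."""
    base, _ext = os.path.splitext(script_path)
    return base + ".exe"


def run_and_capture(command):
    """Run a command and return its stdout and stderr combined as text.

    Merging stderr into stdout mirrors what os.popen4 did.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Demo with the Python interpreter itself standing in for a compiled .exe.
    print(exe_path_for("/srv/www/out.py"))
    print(run_and_capture([sys.executable, "-c", "print('Hello World')"]))
```

Passing the command as a list also avoids the manual quoting that the popen4 call needed.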

And The Results
I used ApacheBench (ab) to benchmark this setup. If you are using XAMPP, you should find it under /Program Files/xampp/apache/bin/ab.exe (it also ships with regular Apache distributions). I compiled out.php (shown below) and saved it as out.exe in the server document root.
<?php
echo 'Hello World', "\n";
?>
These are the ApacheBench results (with a few unnecessary details stripped out).
>ab -n 1000 -c 50 http://localhost/out.php
Document Path:          /out.php
Document Length:        12 bytes

Concurrency Level:      50
Time taken for tests:   1.635 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      336000 bytes
HTML transferred:       12000 bytes
Requests per second:    611.59 [#/sec] (mean)
Time per request:       81.755 [ms] (mean)
Time per request:       1.635 [ms] (mean, across all concurrent requests)
Transfer rate:          200.68 [Kbytes/sec] received


>ab -n 1000 -c 50 http://localhost/out.py
Document Path:          /out.py
Document Length:        12 bytes

Concurrency Level:      50
Time taken for tests:   39.737 seconds
Complete requests:      1000
Failed requests:        0
Write errors:           0
Total transferred:      292000 bytes
HTML transferred:       12000 bytes
Requests per second:    25.17 [#/sec] (mean)
Time per request:       1986.864 [ms] (mean)
Time per request:       39.737 [ms] (mean, across all concurrent requests)
Transfer rate:          7.18 [Kbytes/sec] received
Notice lines 12 and 29. Congratulations if you expected this the second I mentioned mod_python in this post! Performance is piss poor because
a) Python is an interpreted language.
b) PHP and its Apache module have been built for exactly this kind of work and are thus more optimized.
c) There are extra steps in the request handling phase when using mod_python, like the trip through the Python interpreter and, in my handler, launching out.exe as a new process on every request.
d) I'm a lazy idiot who shouldn't have taken the easy way.
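To put a number on the gap, the figures ApacheBench reports can be reproduced from the request count, the concurrency level and the total time. This is just arithmetic on the results above; the function names are my own.

```python
def requests_per_second(requests, seconds):
    return float(requests) / seconds


def mean_time_per_request_ms(requests, seconds, concurrency):
    # ab's first "Time per request" line: concurrency * total time / requests.
    return concurrency * seconds * 1000.0 / requests


php_rps = requests_per_second(1000, 1.635)    # mod_php serving out.php
py_rps = requests_per_second(1000, 39.737)    # mod_python running out.exe
slowdown = 39.737 / 1.635

if __name__ == "__main__":
    print("PHP:        %.2f requests/sec" % php_rps)
    print("mod_python: %.2f requests/sec" % py_rps)
    print("slowdown:   %.1fx" % slowdown)
```

The small difference from ab's own 611.59 comes from ab using a more precise total time than the rounded 1.635 seconds. Either way, the mod_python setup is roughly 24 times slower.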

Bye bye mod_python. Looks like I'll have to brush up on my C coding skills after all. Head over to Part 4 to see how you build an Apache module the right way. Also here's something else you could try if you absolutely insist that you're not going to code a module.

Friday, December 16, 2011

Removing Blogger/Blogspot Branding from your Blog

When you create a blog with blogspot.com, there should be this annoying navigation bar at the top. Annoying because it's your blog and you should be able to decide whether you want to keep it there or not. Sure, it's got options to "follow" and "share" the blog, blah blah. But I would have liked it if I were given a choice to keep it or not. Then there's the "Powered by Blogger" footer. Not something I want there at the bottom. Or anywhere else.

Blogger gives you an option to insert custom CSS rules. You should find this option under Template Designer > Advanced > Add CSS. Once there, you should be able to set new styles and even override existing ones.

Removing the Navigation Bar and Footer
Add the following lines in the Add CSS space in the Template Designer.
.navbar {display:none}
#footer-3 {display:none}
#HTML1 {margin:0px}
Here, line 1 is to hide the navigation bar while lines 2 and 3 are for hiding the footer. To add your own custom footer, go to Layout and insert a new HTML/Javascript gadget with something like
<div style="text-align: center;">
© 2011 Your name.
</div>
 at the bottom near the original footer. Click on save arrangement. Your changes should now be visible on your blog.

What else...
If you look at the search box at the top of this blog, it does not have the customary "Powered by Google" branding. I have also tweaked the search results box a little to make it go better with the theme. I use the following code for this. You are welcome to use it for your own blogs as well. I have to admit, it's just a little buggy. But it's OK. This is a Blogspot blog after all. If you were actually serious, you would be using WordPress.
.gsc-branding {display:none}
#uds-searchClearResults {border-style:none}
.gsc-tabHeader .gsc-tabhActive {border-left: none;border-right: none;border-top: none;}
#uds-searchControl .gsc-tabHeader.gsc-tabhActive {border-style: none;border-width: 0;}
.gsc-resultsbox-visible {box-shadow: 0 0 20px rgba(0, 0, 0, 0.2);border-style:none}
#uds-searchControl .gsc-results {border-style: none;border-width: 0px;margin: 20px -16px;}

Tuesday, December 6, 2011

PHP Compiler Part Two: An Introduction to Apache and PHP

I use XAMPP. It relieves me of the hassle of having to download and install Apache, then install PHP, and then configure them to work with each other. Things only get more complicated if I want to install MySQL, Perl, etc. "XAMPP is an easy to install Apache distribution containing MySQL, PHP and Perl". Yes, it is. And not just that, it comes with phpMyAdmin, PEAR (the PHP Extension and Application Repository), Mercury Mail Transport System, etc. and even a bunch of modules for Apache, all of them configured and ready to use. The latest version even comes with Tomcat to run Java servlets and JSPs (JavaServer Pages).

Apache is the actual web server. It receives requests from browsers. PHP works under Apache. Apache's functionality can be extended by modules, which are loaded through directives in the Apache configuration file located at Apache/conf/httpd.conf. Apache also supports CGI. Here, Apache calls an executable to generate the web page that is sent back to the browser as the response. The CGI script receives the POST body via stdin, while Apache passes other request details (the query string, for example) in environment variables that the script can read. The output is given back to Apache via stdout.
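The CGI flow described above can be sketched in a few lines of Python. REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH are standard CGI environment variables; the respond() function and the output format are my own assumptions for the example. Taking the environment and the input stream as arguments keeps it easy to try without a web server.

```python
import os
import sys


def respond(environ, stdin):
    """Build a CGI response from environment variables and the request body."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    # The POST body arrives on stdin. CONTENT_LENGTH counts bytes, so reading
    # that many characters is only exact for single-byte text.
    body = stdin.read(length) if length > 0 else ""
    lines = [
        "method: " + method,
        "query: " + query,
        "body: " + body,
    ]
    # Headers first, then a blank line, then the page itself.
    return "Content-Type: text/plain\r\n\r\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Whatever is written to stdout goes back to Apache.
    sys.stdout.write(respond(os.environ, sys.stdin))
```

Every request starts a fresh process for the script, which is a large part of why CGI tends to be slower than an in-process module.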

PHP can be run either as an Apache module (mod_php) or as a CGI program. From what I understand, running PHP as a module is faster while running it as a CGI executable is more secure. XAMPP keeps its configuration separate from the main Apache config by using an Include directive (similar in function to C's #include) in httpd.conf to include httpd-xampp.conf.

Running PHP as an Apache module
When running PHP as a module, you should see something like php5apache2_2.dll or libphp5.so in your Apache/modules folder. .so files are shared libraries, the Unix/Linux equivalent of .dll files in Windows. You might see a combination of .dll and .so files in your modules folder. The extension really doesn't matter.
LoadModule php5_module modules/php5apache2_2.dll

<IfModule php5_module>
<FilesMatch "\.php$">
SetHandler application/x-httpd-php
</FilesMatch>
</IfModule>
A configuration like the one above tells Apache which module to load. Then, if php5_module has been loaded, Apache treats every request whose file name matches the regular expression \.php$ (that is, ends in .php) as a PHP script.
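If the regular expression looks cryptic, here's a quick way to see what \.php$ does and does not match. This uses Python's re module rather than Apache's own regex engine, so treat it as an approximation; the file names are made up.

```python
import re

# The same pattern used in the FilesMatch block: a literal dot, "php",
# and then the end of the name.
PHP_FILE = re.compile(r"\.php$")


def is_php_request(filename):
    return PHP_FILE.search(filename) is not None


if __name__ == "__main__":
    for name in ["index.php", "index.php.bak", "notes.phps", "indexphp"]:
        print(name, is_php_request(name))
```

Only index.php matches. The backslash stops the dot from meaning "any character", and the $ stops names like index.php.bak from being run as PHP.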

Running PHP as a CGI program
When running PHP in this mode, you should notice a php-cgi.exe in your php folder. It is the CGI version of the PHP interpreter. If you have installed XAMPP in your Program Files folder, your config should look something like this.
ScriptAlias /php-cgi/ "C:/Program Files/xampp/php/"

<IfModule !php5_module>
<FilesMatch "\.php$">
SetHandler application/x-httpd-php-cgi
</FilesMatch>
<IfModule actions_module>
Action application/x-httpd-php-cgi "/php-cgi/php-cgi.exe"
</IfModule>
</IfModule>
To replace something like PHP, I realized I would have to build my own module for Apache which I could then set as the handler for PHP scripts. More on that in Part 3.

Saturday, December 3, 2011

PHP Compiler Part One: Translating PHP

A PHP Compiler. Hmmm... That seems new, doesn't it? Apparently not. It seemed like a radically different, world-changing idea when it first struck me one morning. But later that day I learnt that I was beaten to it by Facebook... and a whole lot of other people! Anyway, this seemed like an interesting project.

PHP is a scripting language. Scripts are compiled into opcodes and run on a virtual machine called the Zend Engine. It's a bit like the JVM for Java, except that here scripts have to be compiled each time before execution. In Java, code is compiled once into .class files which are then run on the JVM. The basic idea behind this project is to make PHP faster by compiling it ahead of time. Machine code is supposed to execute faster than any interpreted language, right?
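The compile-every-time versus compile-once difference can be imitated in Python, which also compiles source into bytecode before running it. This is only an analogy, not how the Zend Engine or the JVM are implemented, and the function names and the tiny script are made up.

```python
SOURCE = "result = sum(range(10))"


def run_compiling_each_time(source, times):
    """Like PHP as described above: compile the script on every run."""
    namespace = {}
    for _ in range(times):
        namespace = {}
        code = compile(source, "<script>", "exec")
        exec(code, namespace)
    return namespace.get("result")


def run_compiled_once(source, times):
    """Like Java: compile once up front, then only execute."""
    code = compile(source, "<script>", "exec")
    namespace = {}
    for _ in range(times):
        namespace = {}
        exec(code, namespace)
    return namespace.get("result")
```

Both functions produce the same result; the second one simply skips the repeated compile step, which is the saving a PHP compiler is after.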

Obviously I was going to need some help with this. Google's always been a good friend of mine at times like this. What I was looking for was something to translate PHP code to C/C++ so that I could use gcc/g++ to compile it to machine code. These are some of the projects I came across.

php2cpp - "The PHP to C++ Translation tool". Downloaded the source, compiled it and translated a tiny script. php2cpp uses a set of translation rules (these) to perform statement-by-statement translation of PHP code to semantically equivalent C++ code. That's all well and good. But then what about PHP's inbuilt functions?? Surely C++ does not have equivalent functions for Every Single function in PHP. Or what about code that makes use of extensions??? I didn't bother finding out as I wasn't able to get anything to actually compile after translation with php2cpp.
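To make "statement-by-statement translation" concrete, here's a toy sketch of the idea in Python. The three rules below are invented for this example and have nothing to do with php2cpp's actual rule set; they only cover an integer assignment and two forms of echo.

```python
import re

# Hypothetical translation rules: (pattern for a PHP statement, C++ template).
RULES = [
    # $n = 42;            ->  int n = 42;
    (re.compile(r'^\$(\w+)\s*=\s*(\d+);$'), r'int \1 = \2;'),
    # echo "some text";   ->  std::cout << "some text";
    (re.compile(r'^echo\s+"([^"]*)";$'), r'std::cout << "\1";'),
    # echo $n;            ->  std::cout << n;
    (re.compile(r'^echo\s+\$(\w+);$'), r'std::cout << \1;'),
]


def translate_line(line):
    """Translate one PHP statement to C++ using the first rule that matches."""
    line = line.strip()
    for pattern, template in RULES:
        match = pattern.match(line)
        if match:
            return match.expand(template)
    raise ValueError("no translation rule for: " + line)


if __name__ == "__main__":
    for statement in ['$n = 42;', 'echo "Hello World";', 'echo $n;']:
        print(translate_line(statement))
```

Feed it something like echo strlen($s); and it raises ValueError, which is exactly the built-in function problem: every PHP function needs either a rule or a C++ equivalent to call.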

Then there's Roadsend. According to the description on Roadsend.com, you can use it to make web applications with FastCGI or even use it with PHP-GTK to make offline applications (with an embedded web-server). Hmmm, desktop application development with PHP. Weird, but if you are interested, check out Roadsend and WinBinder.

Facebook's HipHop - You should probably read this. Definitely better than anything I can write about it here.

PHC - This is what I found next. PHC has been in development since 2005. It supports the entire standard library of functions in PHP. The author claims in his blog that he was sort of a pioneer in this field (and I believe him. He's got a PhD in "compiler optimizations, static analysis and scripting languages") but back then he wasn't able to convince anyone to "give a shit", not even Facebook. And that was before they came up with HipHop. But now that Facebook's on-board, everyone's interested again in the field! Beware, the idiots on the Internet will have you believe that Facebook invented the concept. Now you know the truth.

BinaryPHP is what I eventually ended up using (mainly because it worked on the first try). I did not test it extensively, but from my experience I can tell you that it supports some, not all, of PHP's standard library functions. It even supports the MySQL functions. I'll admit, I ended up not testing anything else once I got this working. Maybe some other time. This one works after all.

So ok. I can translate PHP code to C++ code and even compile it. Now I needed it to work under Apache, my web-server. More on that in Part 2.