
Servers

From Gemin-Wiki

Web Servers

We have many web projects, most of which are too small to need a whole server to themselves. Our everyday machines are general workhorses, set up to serve Apache and some test installs of Gemin-i Plus, and to handle DNS, email and the other essential services needed to run the organisation and most of our websites.

The machines are named after the Powerpuff Girls:

buttercup.gemin-i.org

bubbles.gemin-i.org

It is expected that blossom.gemin-i.org will be needed some time in 2008/9, depending on the load on the other two machines.

Rafi.ki

At Gemin-i.org we run a very large install of Gemin-i Plus, built to scale much larger than most G+ setups are ever likely to get. While most setups will run just fine with one server running everything, we spread our G+ install over six servers to handle much larger loads. This document should be useful to anyone wishing to do something similar, and to Gemin-i staff troubleshooting our servers.

Layout

We have six machines: three run a replicated MySQL database with one master and two slaves, and three run the Gemin-i Plus server software. The MySQL databases are on a group of machines that run very little other than the MySQL server and that are not accessible from outside the server farm. They don't run Apache and they don't have Java; all they do is keep the database up to date.

Each of the front-end servers connects to one DB server for reading, and they all connect to the master for writing. They open these connections at startup and then use them for the lifetime of the server process. If a connection drops for any reason (and it does now and then, unfortunately), the G+ server reopens it.
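
As a rough illustration of the behaviour described above, here is a minimal Python sketch (the real G+ server is Java, and none of these class or method names come from its code). It assumes a connection object with an execute() method that raises ConnectionError when the link has dropped, and it treats anything starting with SELECT as a read.

```python
class ReconnectingConnection:
    """Opens a connection at startup and reopens it if a statement fails."""

    def __init__(self, connect):
        self._connect = connect   # hypothetical zero-argument connection factory
        self._conn = connect()    # opened once, then reused

    def query(self, sql):
        try:
            return self._conn.execute(sql)
        except ConnectionError:
            # The link dropped: reopen it and retry the statement once.
            self._conn = self._connect()
            return self._conn.execute(sql)


class FrontEnd:
    """One front-end: a fixed read connection plus a write connection to the master."""

    def __init__(self, connect_read, connect_write):
        self.reader = ReconnectingConnection(connect_read)
        self.writer = ReconnectingConnection(connect_write)

    def run(self, sql):
        # Assumption: reads are the statements that start with SELECT.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return (self.reader if is_read else self.writer).query(sql)
```

Retrying a write blindly is only safe if the first attempt never reached the master, so a real implementation would need to be more careful than this sketch.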

"Him" always reads through Snake, "Mojo Jojo" always connects to BigBilly for both reading and writing, and "Fuzzy Lumpkins" always reads through Grubber.

Rubbish ASCII diagram:

                          Gemin-i Plus Servers           MySQL Cluster
                            "The Beat Alls"             "Gangreen Gang"
                   .------------{Him}------<---read------{snake}
                  /                \                          |
                 /                  '-------write-->-----.    |
                /                                         \   |
 {Internet}-----+----------{Mojo-Jojo}-<--read/write-->--{big-billy}
                \                                         /   |
                 \                 .----------write-->---'    |
                  \               /                           |
                   '-------{Fuzzy-Lumpkins}--<--read------{grubber}
                                \                             |
                                 '------------------------{blossom}


Backup

Our backup server, Blossom, is added to the MySQL database cluster as a slave that constantly tracks big-billy, as the other slaves do. We also add Blossom to the upload/download scripts to try to ensure that it keeps a copy of all files uploaded to Rafi.ki for backup purposes. Blossom is kept at a different ISP, on different hardware, in a different location from the other machines.

MySQL Replicated Servers - The Gangreen Gang

The MySQL cluster machines are named after the "Gangreen Gang" in the Powerpuff Girls:

 snake.gemin-i.org
 grubber.gemin-i.org
 bigbilly.gemin-i.org

ace.gemin-i.org and littlearturo.gemin-i.org will be added later if needed.

NOTE: We have rejected MySQL Cluster because its data size is too limited: all data must be held in RAM, and our databases are far too big for that. You can read about MySQL Cluster here: http://dev.mysql.com/doc/refman/5.0/en/mysql-cluster-overview.html

We are instead using a replicated master/slave system as detailed here: http://dev.mysql.com/doc/refman/5.0/en/replication.html

MySQL was installed from Debian's packages: apt-get install mysql-server

Setting up the master basically just involves giving it a server-id and enabling the binary log in the my.cnf file, thus:

 [mysqld]
 log-bin=mysql-bin
 server-id=1

The slaves just need a server-id of their own (each one unique) and no log-bin. You'll have to run an SQL command on each slave to make it start reading the bin-logs from the right place. First get some details from the master server:

 Show Master Status;

Then change the master settings on the slave machine. Note that you'll need to change the password, log_file name and log_pos depending on the master setup. The password is the line after the 'repl' line in the master.info file in the /usr/local/mysql/data directory of a slave machine:

 CHANGE MASTER TO
   MASTER_HOST='bigbilly.gemin-i.org',
   MASTER_USER='repl',
   MASTER_PASSWORD='replication_password',
   MASTER_LOG_FILE='mysql-bin.000004',
   MASTER_LOG_POS=1011827063;

Then start the slave:

 Start slave;
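
Since the log file name and position differ every time you do this, it can help to generate the statement rather than edit it by hand. The sketch below is one way to do that in Python; the function name is made up for illustration, and it assumes you pass in the File and Position values reported by the master.

```python
def change_master_sql(host, user, password, log_file, log_pos):
    """Build the CHANGE MASTER TO statement to run on a slave.

    log_file and log_pos are the File and Position columns that
    SHOW MASTER STATUS reports on the master.
    """
    def quote(value):
        # Minimal quoting for illustration: escape backslashes and single quotes.
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        "CHANGE MASTER TO\n"
        f"  MASTER_HOST={quote(host)},\n"
        f"  MASTER_USER={quote(user)},\n"
        f"  MASTER_PASSWORD={quote(password)},\n"
        f"  MASTER_LOG_FILE={quote(log_file)},\n"
        f"  MASTER_LOG_POS={int(log_pos)};"
    )


if __name__ == "__main__":
    print(change_master_sql("bigbilly.gemin-i.org", "repl",
                            "replication_password", "mysql-bin.000004", 1011827063))
```

Paste the printed statement into the mysql client on the slave before starting it.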

Gemin-i Plus Servers - The Beat Alls

The Gemin-i Plus servers are named after the "Beat Alls", a short-lived gang made up of regular Powerpuff Girls baddies that appears in one episode.

 mojojojo.gemin-i.org
 fuzzylumpkins.gemin-i.org
 him.gemin-i.org


If and when demand requires it, princess-morebucks.gemin-i.org will be added.

Packages installed on these machines to make G+ work:

 apt-get install sun-java5-jre sun-java5-jdk ant php5-mysql php5 ffmpeg imagemagick php5-gd apache2.2-common libapache2-mod-php5

You'll also need to edit /etc/php5/apache2/php.ini to enable both mysql and gd.

The Gemin-i Plus software is installed on three machines sharing a round-robin DNS entry. So every request to www.rafi.ki or will be routed to one of these machines at random. If it's a request to /server/*, Apache running on these machines will tunnel that request to port 8080, where our Java Jetty servers await.

Most requests will simply fetch data from or send data to the Gangreen Gang and then send the results back to the Flash application on the clients' machines.

Some requests will be normal HTTP fetches from the Apache install on these machines, though. Most will be treated normally, fetching their data from the rsynced HTTP roots of those machines. Some may be 'upload' requests, which have to ensure that the other machines in the Beat Alls are also updated to contain the newly uploaded data.

How the machines are wired together (i.e. the ASCII diagram above) is all controlled through the kernel_config file. Here's an example, from mojojojo:

 <cluster>
   <clusterName>http://globalgateway.gemin-i.org/gemin-iplus </clusterName> 
   <clusterServer>87.240.129.90</clusterServer>
   <clusterServer>87.240.129.91</clusterServer>
   <clusterServer>87.240.129.92</clusterServer>
   <registrationMachine>87.240.129.34</registrationMachine>
 </cluster>
 
 <database>
   <host>bigbilly.gemin-i.org</host>
   <readHost>bigbilly.gemin-i.org</readHost>
   <username>UUUUU</username>
   <password>XXXXX</password>
   <name>gemdb</name>
   <type>mysql</type>
   <driver>org.gjt.mm.mysql.Driver</driver>
 </database>

As you can see, the database section contains the usual <host> tag which defines the Write database, and also a <readHost> which is used for DB "Select" commands. If the readHost is missing, the normal host is used for reading too.
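
To make the fallback rule concrete, here is a small Python sketch that reads a <database> section and works out which host is used for writes and which for reads. It is only an illustration of the rule as described here, not the G+ config loader; the function name is an assumption.

```python
import xml.etree.ElementTree as ET


def database_hosts(database_xml):
    """Return (write_host, read_host) for a <database> config section."""
    db = ET.fromstring(database_xml)
    write_host = (db.findtext("host") or "").strip()
    # A missing or empty <readHost> means reads go to the write host too.
    read_host = (db.findtext("readHost") or "").strip() or write_host
    return write_host, read_host


if __name__ == "__main__":
    section = """
    <database>
      <host>bigbilly.gemin-i.org</host>
      <readHost>snake.gemin-i.org</readHost>
    </database>
    """
    print(database_hosts(section))
```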

The IP addresses of every front-line machine in the cluster must be specified on every G+ machine in the cluster, so to add a new front-line server you have to edit the config files on ALL the existing servers.

Things like the File Upload process access this list and send the file on to every IP in the list unless the IP matches the machine itself.
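
The forwarding rule can be sketched in a few lines of Python. This is not the PHP that G+ actually runs: the function name and the send callback are hypothetical, and the sketch assumes a failed transfer shows up as an OSError.

```python
def forward_upload(path, cluster_servers, own_ip, send):
    """Send an uploaded file to every other server in the cluster list.

    send(ip, path) is a hypothetical callback that performs the transfer.
    Returns the IPs the file could not be delivered to.
    """
    failed = []
    for ip in cluster_servers:
        if ip == own_ip:
            continue  # never send the file back to ourselves
        try:
            send(ip, path)
        except OSError:
            failed.append(ip)
    return failed
```

Returning the failures rather than stopping at the first one means a single unreachable peer does not prevent the others from getting the file.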


DNS

The DNS for the front-line machines is set up as round-robin DNS. You can see this in the /etc/maradns/gemin-i.org. file, which gives three IP addresses for the hostname 'rafi.ki':

 Arafiki.%|300|87.240.129.90
 Arafiki.%|300|87.240.129.91
 Arafiki.%|300|87.240.129.92

The client will choose one at random to send its queries to, automatically picking another if there is no response.
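
The client-side behaviour amounts to something like the following Python sketch. Real clients (browsers, resolvers) each do this in their own way; the function name and the try_connect callback are assumptions made for the example.

```python
import random


def first_responding(addresses, try_connect, rng=random):
    """Try the addresses in random order and return the first that responds.

    try_connect(address) is a hypothetical callback returning True when the
    server answers. Returns None if nothing responds.
    """
    candidates = list(addresses)
    rng.shuffle(candidates)
    for address in candidates:
        if try_connect(address):
            return address
    return None
```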

Changes Made To The Code

New files are uploaded to whichever machine the client connects to, chosen more or less at random from the DNS list. However, when the server is part of a cluster, it is that server's responsibility to ensure that the same files get uploaded to the other servers in the cluster. It uses a function in uploadFile.php called clusterTransfer to do this; that function gets the IP addresses to send the file to from the Kernel_config. Each of the machines receives the files through clusterUpload.php, which only accepts connections from machines that are also in the cluster (including the Dev machine, which has to set avatars for new users).
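
The receiving side's check might look roughly like this in Python. It is a sketch of the idea only, not the contents of clusterUpload.php; the function name and the extra_allowed parameter (standing in for machines such as Dev) are assumptions.

```python
import ipaddress


def accepts_upload_from(remote_ip, cluster_servers, extra_allowed=()):
    """Return True if remote_ip is a cluster member or an explicitly allowed extra."""
    remote = ipaddress.ip_address(remote_ip)
    allowed = {ipaddress.ip_address(ip) for ip in (*cluster_servers, *extra_allowed)}
    return remote in allowed
```

Comparing parsed addresses rather than raw strings avoids mismatches caused by formatting differences such as stray whitespace handling upstream or differing IPv6 spellings.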

The Sessions, Messages and Subscriptions, which were tracked by a Hashtable in Java, have been altered to use the database for persistent storage instead. This was necessary for the cluster to function; otherwise this state would be lost whenever the client started sending requests to a different IP address.
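
To show why moving this state into the database matters, here is a small Python sketch using an in-memory SQLite database as a stand-in for the shared MySQL master. The table layout and class name are invented for the example and are not the G+ schema.

```python
import sqlite3


class SessionStore:
    """Keeps sessions in a shared database instead of a per-process table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS sessions "
            "(id TEXT PRIMARY KEY, username TEXT NOT NULL)"
        )

    def put(self, session_id, username):
        self.conn.execute(
            "INSERT OR REPLACE INTO sessions (id, username) VALUES (?, ?)",
            (session_id, username),
        )
        self.conn.commit()

    def get(self, session_id):
        row = self.conn.execute(
            "SELECT username FROM sessions WHERE id = ?", (session_id,)
        ).fetchone()
        return row[0] if row else None


if __name__ == "__main__":
    shared_db = sqlite3.connect(":memory:")   # stand-in for the shared database
    server_a = SessionStore(shared_db)        # the server the client logged in on
    server_b = SessionStore(shared_db)        # the server the next request lands on
    server_a.put("s1", "alice")
    print(server_b.get("s1"))
```

With an in-process Hashtable, the second server would have no record of the session at all.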


Adding New Servers

Obviously the whole point of a large cluster setup like this is that we can grow it with demand, scaling out to meet that demand however many people turn up and start using the service. So if things start to slow down, or we expect a huge influx of users, we can simply add more computers to handle that load.

Adding a front-line G+ server

Adding a database slave

 [mysqld]
 server-id=5

Upgrading a DB Slave to a Master

Retrieved from "http://dev.gemin-i.org/wiki/index.php/Servers"

This page has been accessed 1367 times. This page was last modified 16:28, 17 Mar 2008.

