Q. How long have you worked for Torn?
A. Since 2006.

Q. What do you do on Torn?
A. I manage all the infrastructure (servers, etc.). I have also written some of the code behind the site, and I coordinate many of the developers.

Q. Infrastructure... What is that?
A. Torn is actually a pretty busy site. We have quite a lot of users and also quite wide-ranging requirements that have built up over the years. My job is to make sure this is all running, ensure that it is monitored, investigate problems and then look to improve it. We are always improving things. Want to see Torn's infrastructure? Take a look here.

Q. How did you get into this?
A. I have worked with growing websites for many years, and Chedburn found me when he was having major problems scaling the site. Users from then will remember what real lag was like: multi-minute page loads at all times of day. We slowly introduced changes to scale the site. I have previously worked at large web firms including a large search engine and gaming company (operating in a very different sector to TORN).

Q. Has the game changed a lot since you started?
A. Totally. It's really difficult to remember what it was like back in 2006. From an infrastructure standpoint alone we have gone from 1 server to an entire rack of equipment.

Q. Have you met Chedburn in RL?
A. Yes. More than once. We occasionally meet up in person to discuss major changes. Chedburn is very good at generating ideas; it often becomes my job to actually deliver them.

Q. What are you working on now?
A. There are several major projects underway. One is an infrastructure refresh: we have never perfected full high availability for all our servers, and have recently bought enough servers to finish this off. Once this project is done, no single server failure will impact TORN users (currently, there are a small number of single points of failure). I am also trying to figure out a way to make a new generation chat, with a large number of bug fixes and feature requests, actually happen.

Q. What kind of tools do you have to run a game this size?
A. One of the big challenges is to get the right tool for the job. TORN is big enough to have some pretty significant requirements, but we are actually very small in terms of staff. We use some things that have been custom written, generally by me, to correct specific problems (for example, after the great crash many years ago we are pretty paranoid about backups, so we have a script that imports every backup into an Amazon EC2 instance and verifies that the import works). On the other end of the spectrum, we have moved our log management and monitoring system to a hosted service because it was taking far too much time to manage this ourselves!

Q. What's the most challenging thing you have dealt with?
A. We have had a bunch of major problems at TORN. Significant security problems 18 months ago over the New Year period were pretty tough for everyone, and of course there was the great crash many years ago when we lost a database.

Q. Any advice for budding infrastructure people?
A. Get familiar with Linux. Understand how it actually works. Don't just learn how to do something, but why. Don't be afraid to play about. Virtual machines are a wonderful thing. Learn productivity tools. tmux is my current favorite tool, but there is so much time you can save with a good understanding of basic command line utilities. Finally, read up on networking: many problems at scale involve at least some networking element, and it's impossible to troubleshoot them without understanding it.
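
The backup check mentioned in the answer about tools (import every backup, then verify that the import works) might look roughly like the sketch below. This is an illustration only, not Torn's actual script: the interview does not say which database is used or what "verifies" involves, so the sketch assumes a plain SQL dump, uses Python's built-in `sqlite3` with an in-memory database as a stand-in for the scratch EC2 instance, and treats "the import works" as "the dump loads without error and the expected tables exist". The function name `verify_backup` and the `players` table are hypothetical.

```python
from __future__ import annotations

import sqlite3


def verify_backup(dump_sql: str, expected_tables: set[str]) -> dict[str, int]:
    """Restore a SQL dump into a scratch database and check that it imported.

    Returns a mapping of table name to row count. Raises ValueError if the
    dump fails to import or an expected table is missing.
    """
    # Assumption: a throwaway in-memory database stands in for the scratch
    # instance that a real setup would restore into.
    conn = sqlite3.connect(":memory:")
    try:
        try:
            conn.executescript(dump_sql)
        except sqlite3.Error as exc:
            raise ValueError(f"backup failed to import: {exc}") from exc

        # List the tables that actually exist after the import.
        tables = {
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table'"
            )
        }
        missing = expected_tables - tables
        if missing:
            raise ValueError(f"backup is missing tables: {sorted(missing)}")

        # Record row counts so a suspiciously empty restore is easy to spot.
        counts = {}
        for table in sorted(tables):
            (count,) = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()
            counts[table] = count
        return counts
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical two-row dump used only to exercise the check.
    dump = """
    CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO players (name) VALUES ('alice'), ('bob');
    """
    print(verify_backup(dump, {"players"}))
```

The point of restoring into a separate, disposable database is that a backup file which merely exists proves little; only a successful import shows that it could be used after a crash.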