Server room
About servers and the room they live in
Summary
The server room is complete! All departmental hardware has been moved and is running happily.
Remaining tasks:
- Replace the door with a soundproof, more secure, card-locked one. (Asked Zoology to contribute.)
- Brace racks together. Angle iron? Unistrut?
- Cable tidying.
- A new KVM-over-IP switch would be nice. (Asked Zoology to buy one.)
Safety Considerations
- Electrical
  - Power can be isolated at the distribution board in the room.
  - Three-phase power is in use.
  - There is a difference of 415V between phases.
  - The large commando sockets can supply 32A and are not protected by RCDs.
  - The standard sockets are protected by an RCD.
- Structural
  - The maximum safe floor loading is 3400kg. Record equipment weights in the tables below.
  - Heavier equipment must be placed in the racks closer to the door.
  - The racks are not bolted to the floor, and there is not yet a support strut bracing them against each other.
- Space constraints
  - While the rack doors are open, they can block the entrance.
  - It is not easy to get past the rear of the racks.
- Fire escapes
  - Out of the room entrance, down the stairs, out through reception.
  - Out of the room entrance, across the lobby, up the stairs, over the roof, down the external fire escape.
  - Out of the window, across the roof, down the external fire escape.
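A rough sanity check of the electrical figures above, assuming a balanced supply: with 415V between phases, the phase-to-neutral voltage is 415/√3, about 240V, which sets the capacity of each socket type. The UPS ratings and the sockets they use are copied from the equipment tables below. The sketch ignores UPS efficiency, power factor and battery-charging current, so treat it as an optimistic estimate of headroom, not a design calculation.

```python
import math

# Stated in the safety notes: 415 V between phases.
V_LINE_TO_LINE = 415.0

# For a balanced three-phase supply, phase-to-neutral voltage is V_LL / sqrt(3).
V_PHASE = V_LINE_TO_LINE / math.sqrt(3)  # about 239.6 V


def socket_capacity_va(amps: float, volts: float = V_PHASE) -> float:
    """Apparent power a single-phase socket can deliver (VA = A x V)."""
    return amps * volts


# (UPS rating in VA, socket rating in A), from the equipment tables.
UPS_SOCKETS = {
    "rack1t": (2200, 13),
    "rack5t": (3000, 13),
    "rack1b": (6000, 32),
    "rack2b": (6000, 32),
    "rack4b": (6000, 32),
    "rack5b": (6000, 32),
}

for name, (rating_va, amps) in UPS_SOCKETS.items():
    capacity = socket_capacity_va(amps)
    print(f"{name}: {rating_va} VA UPS on a {amps} A socket "
          f"({capacity:.0f} VA available, {rating_va / capacity:.0%} used)")
```

On these numbers the 3000VA UPS in rack 5 (rack5t) would use about 96% of a 13A socket's capacity, so it leaves very little headroom.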
Racks
Layout and numbering
Equipment sizes
Due to space constraints, not all equipment will fit.
Equipment must be no more than 85cm deep.
Equipment must not protrude more than 4cm in front of the rack angles in the end racks, or more than 1cm in front of the rack angles in the centre racks.
The equipment's rails must be able to fit the 68cm spacing between the front and rear rack angles.
There is a very limited number of shelves available for non-rackable equipment.
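The size rules above can be turned into a quick pre-purchase check. This is a minimal sketch: the Equipment fields are hypothetical names for figures you would read off a spec sheet, and which racks count as "end racks" is not stated here, so END_RACKS = {1, 6} is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_DEPTH_CM = 85.0     # equipment must be no deeper than this
RAIL_SPACING_CM = 68.0  # front-to-rear rack angle spacing
END_RACKS = {1, 6}      # ASSUMPTION: racks 1 and 6 are the end racks


@dataclass
class Equipment:
    depth_cm: float             # overall chassis depth
    front_protrusion_cm: float  # how far it sticks out past the front rack angles
    rail_min_cm: float          # shortest depth the rail kit adjusts to
    rail_max_cm: float          # longest depth the rail kit adjusts to


def fit_problems(item: Equipment, rack: int) -> list[str]:
    """Return the reasons `item` will not fit in `rack` (empty list = fits)."""
    problems = []
    if item.depth_cm > MAX_DEPTH_CM:
        problems.append(f"too deep: {item.depth_cm}cm > {MAX_DEPTH_CM}cm")
    # End racks allow 4cm of protrusion, centre racks only 1cm.
    limit = 4.0 if rack in END_RACKS else 1.0
    if item.front_protrusion_cm > limit:
        problems.append(f"protrudes {item.front_protrusion_cm}cm, limit {limit}cm in rack {rack}")
    # The rail kit's adjustment range must include the 68cm spacing.
    if not item.rail_min_cm <= RAIL_SPACING_CM <= item.rail_max_cm:
        problems.append(f"rails adjust {item.rail_min_cm}-{item.rail_max_cm}cm, need {RAIL_SPACING_CM}cm")
    return problems


server = Equipment(depth_cm=80, front_protrusion_cm=3, rail_min_cm=60, rail_max_cm=90)
print(fit_problems(server, 1))  # fits an end rack
print(fit_problems(server, 3))  # protrudes too far for a centre rack
```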
Power
The three-phase supply is fed through a smart meter. We should try to keep the load balanced across the phases.
Phase 2 powers the 32A sockets for racks 1-3. Phase 1 powers the 32A sockets for racks 4-6. Phase 3 powers the standard 13A sockets.
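To see how balanced the phases are, per-device loads can be summed onto the phase that feeds them. The mapping below follows the paragraph above and the Power Source(s) column of the equipment tables; feeds such as rack3t, rack4x and rack5m are left out because the tables do not say which socket they use. The loads are hypothetical, and the balance measure (largest deviation from the mean phase load) is one common choice, not necessarily what the smart meter reports.

```python
from __future__ import annotations

# Phase feeding each UPS/device, from the equipment tables:
# 32A sockets for racks 1-3 are on phase 2, racks 4-6 on phase 1,
# and the standard 13A sockets are on phase 3.
FEED_PHASE = {
    "rack1t": 3, "rack5t": 3, "vpn": 3,  # 13A sockets
    "rack1b": 2, "rack2b": 2,            # 32A sockets, racks 1-3
    "rack4b": 1, "rack5b": 1,            # 32A sockets, racks 4-6
}


def phase_loads(load_w: dict[str, float]) -> dict[int, float]:
    """Sum per-feed loads (watts) onto the phase that supplies each feed."""
    totals = {1: 0.0, 2: 0.0, 3: 0.0}
    for feed, watts in load_w.items():
        totals[FEED_PHASE[feed]] += watts
    return totals


def max_deviation_from_mean(totals: dict[int, float]) -> float:
    """Largest deviation of any phase from the mean, as a fraction of the mean."""
    mean = sum(totals.values()) / len(totals)
    return max(abs(t - mean) for t in totals.values()) / mean


# HYPOTHETICAL loads in watts, e.g. as read from the UPS/PDU statistics.
example = {"rack1t": 800, "rack5t": 900, "vpn": 20,
           "rack1b": 2500, "rack2b": 1500, "rack4b": 2200, "rack5b": 2000}
totals = phase_loads(example)
print(totals, f"{max_deviation_from_mean(totals):.0%}")
```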
UPSes feed smart PDUs. IEC cables powering devices should be colour coded to match the UPS they draw from.
netsvc collects various statistics from the UPSes. Each device should run the NUT upsmon client so that netsvc can trigger a safe shutdown should the power or cooling fail. If netsvc doesn't know about a device, that device will not be shut down cleanly: it will simply lose power when netsvc powers the UPS down. Power usage, UPS temperature and UPS runtime can be seen here.
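The NUT client tool upsc prints one "name: value" line per UPS variable (for example battery.runtime in seconds, ups.temperature and ups.status, where OL means on line and OB means on battery). A minimal sketch of reading those statistics and flagging trouble; the sample output, the UPS name in the comment and both thresholds are hypothetical.

```python
from __future__ import annotations


def parse_upsc(text: str) -> dict[str, float | str]:
    """Parse upsc output ("name: value" per line) into a dict.

    Numeric values become floats; everything else stays a string.
    """
    stats: dict[str, float | str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        value = value.strip()
        try:
            stats[key.strip()] = float(value)
        except ValueError:
            stats[key.strip()] = value
    return stats


def ups_warnings(stats, min_runtime_s=600.0, max_temp_c=40.0) -> list[str]:
    """Flag conditions worth acting on. Thresholds are HYPOTHETICAL defaults."""
    warnings = []
    if "OB" in str(stats.get("ups.status", "")).split():
        warnings.append("on battery")
    runtime = stats.get("battery.runtime")
    if isinstance(runtime, float) and runtime < min_runtime_s:
        warnings.append(f"runtime {runtime:.0f}s below {min_runtime_s:.0f}s")
    temp = stats.get("ups.temperature")
    if isinstance(temp, float) and temp > max_temp_c:
        warnings.append(f"UPS temperature {temp}C above {max_temp_c}C")
    return warnings


# HYPOTHETICAL sample, e.g. from running "upsc rack1b@netsvc" (UPS name assumed).
SAMPLE = """battery.charge: 100
battery.runtime: 2520
ups.load: 34
ups.status: OL
ups.temperature: 28.5
"""
print(ups_warnings(parse_upsc(SAMPLE)))
```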
Cooling
The Ecocooler unit is on the roof. When the set point thermostat on the ceiling is above its set temperature and the pulse thermostat is above its set temperature, the cooler pulls in air from outside through a filter. The filter can be wetted to increase cooling capacity. The humidity sensor on the ceiling prevents excessive humidity. If the thermostat by the door reads too cold, the system should recirculate air.
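The control behaviour described above can be summarised as a small decision function. This is an illustrative sketch, not the Ecocooler's actual controller logic: the set-point values are hypothetical, and the order of precedence (door thermostat first, then the two ceiling thermostats, with the humidity sensor only deciding whether to wet the filter) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Readings:
    ceiling_c: float     # set point thermostat on the ceiling
    pulse_c: float       # pulse thermostat
    door_c: float        # thermostat by the door
    humidity_pct: float  # ceiling humidity sensor


@dataclass(frozen=True)
class SetPoints:  # HYPOTHETICAL values; the real ones are set on the units
    ceiling_c: float = 22.0
    pulse_c: float = 22.0
    door_min_c: float = 16.0
    max_humidity_pct: float = 70.0


def cooler_mode(r: Readings, s: SetPoints = SetPoints()) -> str:
    # Too cold by the door: stop drawing in outside air and recirculate.
    if r.door_c < s.door_min_c:
        return "recirculate"
    # Both thermostats over their set points: pull in outside air through the filter.
    if r.ceiling_c > s.ceiling_c and r.pulse_c > s.pulse_c:
        # Wetting the filter adds cooling but also moisture, so only wet it
        # while the humidity sensor is below its limit.
        if r.humidity_pct < s.max_humidity_pct:
            return "outside air, wet filter"
        return "outside air, dry filter"
    return "idle"


print(cooler_mode(Readings(ceiling_c=25, pulse_c=25, door_c=20, humidity_pct=50)))
```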
The cooling system will shut down in the event of a fire being detected in the attic zone. It will restart automatically.
There are two fans, so a single fan failure does not stop the system. During the winter the system should be isolated from the water main and drained down at the drainage points within the room. The maintenance contract is with Celsius Design.
Cooling Failure Behaviour
How long does it take for the room thermostat to reach 35 degrees?
1625: Turned cooling off.
1725: Got bored. Thermostat reached 27 degrees.
1736: Room thermostat back at 20 degrees
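The test stopped before reaching 35 degrees, but the log allows a rough linear extrapolation. The starting temperature at 1625 is not recorded, so this sketch assumes 20 degrees (the temperature the room returned to at 1736). Real warming is unlikely to follow a straight line, so treat the result as a rough figure only.

```python
def to_minutes(hhmm: str) -> int:
    """Convert a 24-hour "HHMM" log time to minutes after midnight."""
    return int(hhmm[:2]) * 60 + int(hhmm[2:])


def to_hhmm(minutes: float) -> str:
    """Convert minutes after midnight back to a rounded "HHMM" time."""
    m = round(minutes)
    return f"{m // 60:02d}{m % 60:02d}"


START_C = 20.0  # ASSUMPTION: room was at 20 degrees when cooling was turned off
start, stop = to_minutes("1625"), to_minutes("1725")
rate = (27.0 - START_C) / (stop - start)  # degrees per minute, about 0.117
minutes_to_35 = (35.0 - START_C) / rate   # minutes after 1625
print(f"~{minutes_to_35:.0f} min, i.e. around {to_hhmm(start + minutes_to_35)}")
```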
Networking
There are 4 OM3 fibres leading to the basement comms room. All are live, carrying LACP-aggregated 1Gb/s links to the core switch that provide 2Gb/s of capacity. There is a management VLAN linked to netsvc, the UPSes, the PDUs and various BMCs.
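The capacity arithmetic, and why no single transfer gets the full 2Gb/s: LACP keeps each conversation on one member link so frames stay in order, so one flow is capped at one link's speed. This sketch assumes each 1Gb/s link uses one fibre pair. The core switch's actual hash policy is not documented here, so hashing the addresses and ports with CRC32 is a stand-in, and the addresses in the usage line are hypothetical.

```python
import zlib

FIBRES = 4
FIBRES_PER_LINK = 2  # ASSUMPTION: each 1 Gb/s link uses one fibre pair (TX + RX)
LINK_GBPS = 1

links = FIBRES // FIBRES_PER_LINK   # 2 member links in the LACP bundle
aggregate_gbps = links * LINK_GBPS  # 2 Gb/s in total


def member_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                n_links: int = links) -> int:
    """Pick a member link by hashing the flow's addresses and ports.

    Stand-in for the switch's hash policy: every packet of one flow maps
    to the same link, so a single flow is limited to LINK_GBPS even though
    the bundle offers aggregate_gbps across many flows.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


print(links, aggregate_gbps, member_link("10.0.0.1", "10.0.0.2", 50000, 22))
```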
Lists of Equipment
+ after a height means plus a shelf
? means not yet installed
Rack 1
| Machine | Description | Power Source(s) | Year | Weight (kg) | Height (U) |
|---|---|---|---|---|---|
| xen0 | Virtual machine host | rack1b (orange) | 2012 | 5 | 1 |
| xen1 | Virtual machine host | rack1t (green) | 2011 | 5 | 1 |
| xen-jbod | Storage for VM hosts (experimental) | rack1t (orange) + rack1b (green) | 2008 | 30 | 2 |
| rack1t | MGE 2200VA UPS | 13A socket (phase 3) | 2005 | 35 | 2 |
| spare | Old database server | | 2004 | 8 | 1 |
| clara | Flybase server | rack1t (orange) + rack1b (green) | 2012 | 40 | 5 |
| space | 24 | ||||
| central | Glover group storage | rack1b + rack2b | Nov 2008 | 30 | 2 |
| larch | Main file server | rack1t (orange) + rack1b (green) | 2014 | 50 | 4 |
| larch-jbod | More disks for larch | rack1t (orange) + rack1b (green) | 2013 | 50 | 4 |
| rack1b | APC 6000VA UPS | 32A socket (phase 2) | 2012 | 50 | 3 |
293kg / 700kg, 24/42U
Rack 2
| Machine | Description | Power Source(s) | Year | Weight (kg) | Height (U) |
|---|---|---|---|---|---|
| vpn | Draytek VPN device | 13A socket (phase 3) | 2009 | 2 | 1 |
| space | 29 | ||||
| salas-disks2 | Carazo Salas RAID disks | rack2b | | 30 | 2 |
| salas-disks1 | Carazo Salas RAID disks | rack2b | | 30 | 2 |
| salas-disks0 | Carazo Salas RAID disks | rack2b | | 30 | 2 |
| salas-server | Carazo Salas imaging server | rack2b | | 8 | 1 |
| db | Database server | rack2b (red) | 2009 | 5 | 1 |
| bramley | Wiki and OpenDirectory server | rack2b (red) | 2009 | 8 | 1 |
| rack2b | 6000VA UPS | 32A socket (phase 2) | 2013 | 50 | 3 |
163kg / 500kg, 13/42U
Rack 3
| Machine | Description | Power Source(s) | Year | Weight (kg) | Height (U) |
|---|---|---|---|---|---|
| switch-sr-0 | Network switch | rack3t | 2010 | 5 | 1 |
| netsvc | Network utility server | rack3t | 2004 | 5 | 1 |
| space | 24 | ||||
| | Mail server | rack1b + rack2b | ? | 40 | 5+ |
| heidi | Flybase server | rack1b + rack2b | ? | 40 | 5+ |
| malibu | Flybase server | rack2b | ? | 20 | 4+ |
| xena | flyprot/russell lab server | rack2b (red) | 2009 | 20 | 5+ |
130kg / 300kg, 24/42U
Rack 4
| Machine | Description | Power Source(s) | Year | Weight (kg) | Height (U) |
|---|---|---|---|---|---|
| multivac | Compute server for Jiggins group | rack4b | 2011 | 10 | 1 |
| multivac-raid | Storage for multivac (offline) | rack4b + rack5b | Aug 2008 | 30 | 2 |
| multivac-jbod | Storage for multivac | rack4b + wall | Jul 2013 | 30 | 2 |
| biomart | Bioinformatics server for Glover group | rack2b* + wall | 2010 | 30 | 2 |
| amapress | Web server for Arias group | rack5b | 2012 | 5 | 1 |
| space | 22 | ||||
| heisenberg | Backup server for Proteomics (CSBC) | rack4b + rack4x | 2014 | 50 | 4 |
| dumptruck-eonstor | Storage array for dumptruck (CSBC) | rack2b* + rack2b* + rack5b | ? | 40 | 4 |
| dumptruck | Backup server for CSBC/SCI | rack2b* + rack2b* + rack2b* | ? | 30 | 3 |
| dumptruck-sumo1 | Storage array for dumptruck (CSBC) | rack4x + rack2b* | ? | 80 | 4 |
| dumptruck-sumo2 | Storage array for dumptruck (CSBC) | rack4x + rack2b | ? | 80 | 4 |
| rack4b | 6000VA UPS | 32A socket (phase 1) | 2014 | 50 | 5 |
445kg / 700kg, 30/42U
Rack 5
| Machine | Description | Power Source(s) | Year | Weight (kg) | Height (U) |
|---|---|---|---|---|---|
| space | 14 | ||||
| switch-zoo | Switch for Zoology cluster | rack5m | 2013 | 3 | |
| i21 | Zoology cluster (pc) | rack5m+rack5b | 2013 | 8 | 1 |
| i20 | Zoology cluster (pc) | rack5m+rack5b | 2013 | 8 | 1 |
| i19 | Zoology cluster (pc) | rack5m+rack5b | 2013 | 8 | 1 |
| i18 | Zoology cluster (pc) | rack5m+rack5b | 2013 | 8 | 1 |
| zoo-synology | Storage for Zoology cluster | rack5m+rack5b | 2013 | 8 | 1 |
| zoo-qnap | Storage for Zoology cluster | rack5m + rack5b | 2006 | 30 | 2 |
| rack5t | 3000VA UPS | 13A socket (phase 3) | 2005 | 40 | 2 |
| i17 | Zoology cluster (pc) | rack5m+rack5b | 2013 | 20 | 2 |
| i16 | Zoology cluster (pc) | rack5m+rack5b | 2013 | 20 | 2 |
| x15 | Zoology cluster | rack5m | 2013 | 8 | 1 |
| x14 | Zoology cluster | rack5m | 2013 | 8 | 1 |
| x13 | Zoology cluster | rack5m | 2013 | 8 | 1 |
| x12 | Zoology cluster | rack5m | 2009 | 8 | 1 |
| x11 | Zoology cluster | rack5b | 2009 | 8 | 1 |
| x10 | Zoology cluster | rack5b | 2009 | 8 | 1 |
| x9 | Zoology cluster | rack5b | 2009 | 8 | 1 |
| x8 | Zoology cluster | rack5b | 2009 | 8 | 1 |
| x7 | Zoology cluster | rack5b | 2009 | 8 | 1 |
| x6 | Zoology cluster | rack5b | 2009 | 8 | 1 |
| x4 | Zoology cluster | rack5b | 2009 | 8 | 1 |
| x2 | Zoology cluster | rack5b | 2009 | 8 | 1 |
| rack5b | 6000VA UPS | 32A socket (phase 1) | 2013 | 50 | 3 |
259kg / 500kg, 28/42U
Rack 6
| Machine | Description | Power Source(s) | Year | Weight (kg) | Height (U) |
|---|---|---|---|---|---|
| amaserv | AMA imaging server | rack5b | 2010 | 40 | 4 |
40kg / 300kg, 4/42U
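The floor loading limit from the safety notes can be checked against the rack summaries. This sketch uses the recorded load and limit from the summary line under each rack table; whether those figures include the racks' own weight is not stated, so any rack weight would need adding separately.

```python
# (recorded load kg, rack limit kg) from the summary line under each rack table.
RACKS = {1: (293, 700), 2: (163, 500), 3: (130, 300),
         4: (445, 700), 5: (259, 500), 6: (40, 300)}
FLOOR_LIMIT_KG = 3400  # maximum safe floor loading (Safety Considerations)


def loading_report(racks, floor_limit_kg=FLOOR_LIMIT_KG):
    """Return (total kg, list of problems) for the recorded rack loads."""
    problems = [f"rack {n}: {load}kg exceeds its {limit}kg limit"
                for n, (load, limit) in sorted(racks.items()) if load > limit]
    total = sum(load for load, _ in racks.values())
    if total > floor_limit_kg:
        problems.append(f"room total {total}kg exceeds {floor_limit_kg}kg")
    return total, problems


total, problems = loading_report(RACKS)
print(total, FLOOR_LIMIT_KG - total, problems)
```

The rack limits themselves add up to 3000kg, below the 3400kg floor limit. It is worth re-summing the table rows when updating the summaries: the Rack 1 rows currently add up to 303kg (including the spare server) against the recorded 293kg.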
Overheat protocol
On netsvc2, run /etc/init.d/nut poweroff to initiate a safe shutdown.
Restart protocol
- Start the UPSes.
- Start anything labelled "RAID" or "JBOD".
- Start netsvc. It is connected to the KVM, so its boot progress can be viewed. Wait for it to prompt for a login.
- Start db, xen0 and xen1.
- Wait for xen0 and xen1 to finish booting; they are on the KVM, so their boot progress can be viewed.
- Log into xen0 and xen1. Run: watch cat /proc/mdstat
- On xen0 and xen1: wait for both to have synchronised (i.e. no tasks left with a finish time). Press Ctrl-C to stop watching mdstat.
- On xen0 or xen1: Run: sudo watch drbd-overview
- On xen0 or xen1: wait for all lines to contain "Connected Secondary/Secondary UpToDate/UpToDate C r----". Press Ctrl-C to stop watching.
- On xen0:
  - Run: sudo drbdadm primary ash
  - Run: sudo drbdadm primary print-disk
  - Start netsvc (virtual machine; see Starting Virtual Machines below). Check it is running by using PuTTY on the departmental laptop to ssh into it.
  - Start ash (virtual machine; see Starting Virtual Machines below). Check it is running by using the departmental laptop to connect to \\ash
  - Start print (virtual machine; see Starting Virtual Machines below). Check it is running by using the departmental laptop to connect to \\print
  - Start keyserver (virtual machine; see Starting Virtual Machines below). Check it is running by using PuTTY on the departmental laptop to ssh into it.
- Start larch. Wait for it to finish booting (takes a while!)
- On xen1:
  - Run: sudo drbdadm primary birch-disk
  - Start birch (virtual machine; see Starting Virtual Machines below). Check it is running by using the departmental laptop to connect to \\birch
  - Start dmgweb (virtual machine; see Starting Virtual Machines below). Check it is running by using PuTTY on the departmental laptop to ssh into it.
- Start the remaining machines.
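The two "wait until" steps in the restart protocol (mdstat resync and DRBD state) can be checked programmatically rather than by eye. A minimal sketch: md_sync_in_progress treats any "finish=" in /proc/mdstat as an unfinished resync or recovery, and drbd_ready requires every drbd-overview line to show Connected, Secondary/Secondary and UpToDate/UpToDate. The function names and sample outputs are hypothetical; the resource names are taken from the protocol above.

```python
def md_sync_in_progress(mdstat: str) -> bool:
    """True while any md array still reports a finish time (resync/recovery)."""
    return "finish=" in mdstat


def drbd_ready(overview: str) -> bool:
    """True when every non-blank drbd-overview line shows the state the
    restart protocol waits for: Connected, Secondary/Secondary, UpToDate/UpToDate."""
    wanted = ("Connected", "Secondary/Secondary", "UpToDate/UpToDate")
    lines = [line.split() for line in overview.splitlines() if line.strip()]
    return bool(lines) and all(all(w in fields for w in wanted) for fields in lines)


def read_mdstat(path: str = "/proc/mdstat") -> str:
    """Read the kernel's software RAID status (only meaningful on xen0/xen1)."""
    with open(path) as f:
        return f.read()


# HYPOTHETICAL sample outputs for illustration.
MDSTAT_SYNCING = """md0 : active raid1 sdb1[1] sda1[0]
      976630336 blocks super 1.2 [2/2] [UU]
      [==>..................]  resync = 12.6% (123060000/976630336) finish=95.3min speed=149000K/sec
"""
DRBD_OK = """  0:ash/0         Connected Secondary/Secondary UpToDate/UpToDate C r----
  1:print-disk/0  Connected Secondary/Secondary UpToDate/UpToDate C r----
"""
DRBD_SYNCING = """  2:birch-disk/0  SyncSource Secondary/Secondary UpToDate/Inconsistent C r----
"""

print(md_sync_in_progress(MDSTAT_SYNCING), drbd_ready(DRBD_OK), drbd_ready(DRBD_SYNCING))
```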
Starting Virtual Machines
When logged into xen0 or xen1:
- Run: sudo xm create <machine>.cfg
- Run: sudo xentop
- Wait 10 minutes. If any virtual machine's CPU usage stays above 50% throughout, something has probably gone wrong, and starting further machines may cause a crash.
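The 10-minute check can be made explicit: a VM is suspect only if every CPU reading stays above 50%, since a brief spike while booting is normal. The readings below are hypothetical; they could be collected once a minute, for example from xentop's batch mode (such as sudo xentop -b -d 60 -i 10; check the available options on the host).

```python
from __future__ import annotations


def stuck_vms(samples: dict[str, list[float]], threshold_pct: float = 50.0) -> list[str]:
    """Return VMs whose CPU% stayed above threshold_pct in every sample."""
    return sorted(name for name, cpu in samples.items()
                  if cpu and min(cpu) > threshold_pct)


# HYPOTHETICAL CPU(%) readings, one per minute over the 10-minute wait.
readings = {
    "ash":   [85, 90, 78, 88, 92, 81, 87, 90, 84, 89],  # continuously busy
    "print": [70, 12, 5, 3, 2, 2, 1, 1, 2, 1],          # busy at boot, then settles
}
print(stuck_vms(readings))
```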