Colossus

Friday, Aug 18, 2017 · 1400 words · approx 7 mins to read

I’m in the middle of trying out Ulysses so I thought I’d write a bit more as I evaluate it. I’m going to build a new computer at some point this year, hopefully soon, which I’ve already called Colossus. I’d planned to blog about it after I’d started building it, to discuss what I’d chosen and why it was being built, but I figure why not talk about some of that while I’m still deciding on parts and waiting for the processor to become available from work (hopefully!).

I used to constantly upgrade my computer to keep pace with what was happening in the computer industry and stay on the bleeding edge. It was my job to write about that edge and inform our readers about what it was like, so I stayed current as much as I could. Sometimes I’d change the guts of that machine every month, effectively giving me a brand new PC each time.

Times changed. I stopped writing about tech for a living and eventually decided to give a Mac a try, months of research culminating in the purchase of a new MacBook in 2008. Since then I’ve had just a handful of machines: the iMac I replaced the MacBook with in 2011, and the iMac that replaced that in 2016. Those iMacs, both of which I’ve loved dearly, have seen me keep using macOS as the mainstay of my productivity and fun for almost the last decade. Unless Apple completely throw out everything I love about the platform somehow, long may that continue.

I’ve also kept a PC for the last few years, partly as something on which to experience consumer VR, and partly to run the cross-platform builds for some software I write in my spare time for my side business. I’ve never spent much personal money on it. Half of the RAM I nabbed from a partly broken PC we didn’t need at work, and in recent years it’s been modestly upgraded by my side business with a new case, big GPU and SSD, and a friend of mine (hi James!) donated the mainboard and processor. Quite a beefy 6-core Sandy Bridge-E as it happens, even if it was almost 5 years old at the time I took stewardship. Thanks, James!

Now that I deal with PC graphics again for a living, and I’m hoping to eventually release some of the GPU Tools software into the wild in 2017, I follow new PC technology again and want to build a new machine to last me 5 years as the backbone of my non-Mac computing. I’ve timed it right, too, since today’s high-end desktop platforms have gained some serious legs in recent months and are now sprinting away from the mid-range in a big way.

Ethos

The idea behind Colossus is a simple one: very high core count, large memory pool, very high speed disk I/O, plus SR-IOV to allow me to run a Unix hypervisor and Windows fully virtualised with native access to the GPUs. Why try and max everything out? Longevity, very high performance, and flexibility for the use cases.

Use case 1: Gaming

Since getting an Oculus Rift CV1 over a year ago, I’ve shifted from primarily gaming on consoles and come back to PC for most of it (although I am finally playing The Last Of Us: Remastered on PS4 just now!). While I want Colossus to primarily be a server system, being able to run Windows in a VM and have access to the installed GPUs as if it were bare metal via SR-IOV is a primary feature I need.

Use case 2: Build system

The software I write is cross-platform (Windows, Linux, Android and iOS, and I’m working on macOS support in my spare time just now) and I currently build it with TeamCity. TeamCity supports distributed build agents, so the TeamCity server runs on Windows and connects to Linux and Windows VMs running on that same Windows host to build the Linux, Android and Windows ports. iOS and macOS are built on a VM that runs here on my iMac, since you can’t virtualise macOS on non-Apple hardware.

I need to keep that build system running. The plan is to run the TeamCity server in its own VM now, rather than on Colossus’ host OS, and to keep the existing agents by copying their VMs over to the new hypervisor.

Use case 3: Test lab

I have a decent amount of stuff online these days for various things, all hosted at Hetzner on a pair of bare metal root servers. One runs FreeBSD and one runs Linux. Crucially, all of the staging infrastructure for the live public stuff also runs side-by-side with the live stuff on the same hardware. While that’s fine, especially since running on the same bare metal has advantages, the servers are pretty packed with VMs and have no room for any more. If I can move the staging VMs to Colossus, that’ll give me some much-needed free space on the servers, allowing me to expand without more capital outlay and monthly cost for yet another server at Hetzner.

Hypervisor

That leads me nicely into talking about the hypervisor and main OS setup that I have planned. If I can, I plan to use FreeBSD as the hypervisor host OS on Colossus, running everything else via virtual machine. FreeBSD supports SR-IOV, although it’s unclear if it does yet on the hardware I’m going to get, so timing might be important if I want to build on top of FreeBSD.

If I’m forced into using Linux, I haven’t chosen a distribution. Ubuntu and Arch both support ZFS-on-Linux today, so probably one of those.

Hardware

So what kind of hardware for a super high-end build that’ll do everything via VM, basically all at once if necessary?

AMD Ryzen Threadripper 1950X
ASRock X399 Taichi
64GiB ECC DDR4 (PC4-19200, 4 x 16GiB) (probably Samsung M391A2K43BB1 B-die)
1TB NVMe M.2 SSD (probably a Samsung SM961, the one with the Polaris controller)

The rest I already have, although I might change the cooler. While I have a large 140mm Noctua and the TR4 socket adaptor already, Threadripper 1950X is rated at 180W TDP, so it might be prudent to switch to liquid cooling. I’ve yet to decide.

There’s still some uncertainty in the main component list, too. Supermicro are apparently working on a TR4 mainboard for Threadripper that will hopefully blend the consumer TR4 socket with their common server-class features and reliability. That (or something similar from Tyan, if they’re thinking of doing the same) is top of my list, but it might not appear for a while. Otherwise, Gigabyte’s Aorus Gaming 7 is the best board on sale today for the features that I need.

Then depending on eventual costs it might be possible to get a Samsung 960 Pro instead of the cheaper Polaris SM961. There’s currently £150 in it though for around 10% more read performance and around 17% more write performance (both peak throughputs). It might not be worth it, especially given the performance floor of the SM961 is already amazing.

The build will go into my existing chassis (a Corsair Carbide Air 540) and use my existing GPU (AMD Radeon RX Vega 64).

Notes

Storage-wise, rather than double up on costs for the NVMe drives to get RAID1 and some redundancy, I’ll take rolling snapshots of the filesystem (it’ll be ZFS regardless of whether I can use FreeBSD or Linux) and have them back up to my ZFS-powered NAS.
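
As a rough sketch of what that rolling snapshot job could look like, here is some Python that only builds the zfs command lines rather than running them. The dataset name (tank/vms), the NAS hostname (nas), the target dataset (backup/colossus), the "rolling" prefix and the retention count are all made up for illustration, and none of this is a settled design. The only real parts are the zfs snapshot, zfs send, zfs receive and zfs destroy subcommands and their -r, -R, -i and -F flags.

```python
from datetime import datetime
from typing import List, Optional


def snapshot_name(dataset: str, when: datetime, prefix: str = "rolling") -> str:
    """Build a snapshot name whose timestamp sorts chronologically as text."""
    return f"{dataset}@{prefix}-{when.strftime('%Y%m%d-%H%M%S')}"


def snapshot_cmd(name: str) -> List[str]:
    """Recursive snapshot of the dataset and its children."""
    return ["zfs", "snapshot", "-r", name]


def send_cmd(previous: Optional[str], new: str) -> List[str]:
    """Full replication stream the first time, incremental (-i) afterwards."""
    cmd = ["zfs", "send", "-R"]
    if previous is not None:
        cmd += ["-i", previous]
    cmd.append(new)
    return cmd


def receive_cmd(nas_host: str, target: str) -> List[str]:
    """Receive on the NAS over ssh; -F rolls the target back to its most
    recent snapshot first (an assumption about how the NAS copy is managed)."""
    return ["ssh", nas_host, "zfs", "receive", "-F", target]


def snapshots_to_destroy(existing: List[str], keep: int) -> List[str]:
    """Return the oldest snapshots beyond the newest `keep`.

    Assumes every name came from snapshot_name() with the same dataset and
    prefix, so a plain text sort is also a chronological sort.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(existing)
    return ordered[:-keep]


def destroy_cmd(name: str) -> List[str]:
    """Recursively destroy one expired snapshot."""
    return ["zfs", "destroy", "-r", name]


if __name__ == "__main__":
    # Hypothetical dataset, host and target names, purely for illustration.
    new = snapshot_name("tank/vms", datetime.now())
    print(" ".join(snapshot_cmd(new)))
    print(" ".join(send_cmd(None, new)), "|", " ".join(receive_cmd("nas", "backup/colossus")))
```

In a real job the send and receive commands would be piped together (for example with subprocess), and the previous snapshot name would come from listing the pool rather than being passed in by hand. Keeping the command construction separate from the execution makes it easy to check what would run before pointing it at real data.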

I’m kind of banking on SR-IOV working well, too. It’s the backbone of the whole system and required so that I can use a non-Windows host OS, given my gaming requirement.

Outro

So that’s where I am with planning the build today. Super high end components to last at least 5 years (with a potential mid-life kicker for the processor depending on what AMD do with the TR4 socket and future models), and a software setup that relies on a modern hypervisor to get the most out of it.

Next stop is component buying in October!