Here are initial performance results for a simple write workload on a new Ceph
cluster. There are 6 nodes in the cluster with 2 OSDs per node. Each OSD has
a dedicated data drive formatted with XFS, and both OSDs share an SSD for the
journal.
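A write workload like this can be generated with rados bench; here's a minimal
sketch, where the pool name is just a placeholder:

```
# 60-second write benchmark (4 MB objects by default) against a test pool.
# The pool name "test" is hypothetical; substitute a real pool.
rados bench -p test 60 write
```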
Well, I hope the cluster can do better than that, but we are only connected
with gigabit Ethernet, which tops out around 110 MB/s in practice, so it looks
alright so far.
Um, well shit, it didn't get any better. Notice all the zeros in the cur
MB/s column. Something is wonky. Check Ceph status.
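That check is just:

```
# Summarize overall health, monitor quorum, OSD counts, and PG states.
ceph status    # equivalently: ceph -s
```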
That's concerning. Get some details.
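The one-liner for that:

```
# Expand the health summary into per-check detail: stuck PGs, down OSDs, etc.
ceph health detail
```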
Alright, well it is nice that Ceph also thinks something is wrong. Where are those OSDs?
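The CRUSH tree maps each OSD back to its host:

```
# Print the CRUSH hierarchy: every OSD, the host it lives on, and its up/down state.
ceph osd tree
```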
Oh, both osd.3 and osd.9 are on the same node. Must be a shitty node. There
doesn't really seem to be anything out of the ordinary in dmesg on
issdm-23. This is relatively old hardware, and we've had problems before. I'm
just going to decommission the node and replace it later. First, mark the OSDs
on issdm-23 out and wait for recovery to complete.
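Marking them out tells CRUSH to rebalance their data onto the remaining OSDs:

```
# Mark both OSDs out; Ceph starts migrating their placement groups elsewhere.
ceph osd out 3
ceph osd out 9

# Watch cluster events until all PGs report active+clean again.
ceph -w
```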
Ok, now we are ready to really remove them from the cluster. On issdm-23
we'll shut down the daemons.
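Assuming sysvinit-style Ceph init scripts on this node, that looks like:

```
# On issdm-23: stop the two OSD daemons.
# Assumes sysvinit packaging; systemd-based installs use systemctl instead.
sudo service ceph stop osd.3
sudo service ceph stop osd.9
```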
Then remove the OSDs from the CRUSH map, revoke their authentication keys, and
all that other good stuff.
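For each of the two OSDs that boils down to three commands:

```
# Remove osd.3 from the CRUSH map, delete its auth key, and drop it from the OSD map.
ceph osd crush remove osd.3
ceph auth del osd.3
ceph osd rm 3

# Same routine for osd.9.
ceph osd crush remove osd.9
ceph auth del osd.9
ceph osd rm 9
```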
Ok, we're back up with a clean bill of health, and two fewer OSDs.