I am going to investigate the performance of BufferedRandomAccessWriter. Specifically, calling position() triggers a flush, which appears to defeat the purpose of buffering.
My preliminary tests show cases where we flush only a handful of bytes at a time. For example, OnDiskGraphIndexWriter#write calls position() inside an assertion, so when assertions are enabled (-ea) we flush once per node instead of once per full buffer.
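To make the effect concrete, here is a minimal, self-contained sketch. It is not jvector's BufferedRandomAccessWriter. `SketchBufferedWriter`, `PositionFlushDemo`, and `flushesForWrite` are hypothetical names, and an in-memory ByteArrayOutputStream stands in for the file. The loop models a per-node position check with assertions enabled and counts how many flushes reach the sink. For comparison, it also includes one possible alternative in which position() returns flushed bytes plus buffered bytes without flushing. That alternative is only an assumption to explore, not a proposed fix.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical stand-in for a buffered writer; it models only the
// flush-on-position() behavior, not the real BufferedRandomAccessWriter.
class SketchBufferedWriter {
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for the file
    private final byte[] buffer;
    private int buffered = 0;                 // bytes waiting in the buffer
    private final boolean flushOnPosition;    // true = behavior under investigation
    int flushCount = 0;                       // number of non-empty flushes to the sink

    SketchBufferedWriter(int bufferSize, boolean flushOnPosition) {
        this.buffer = new byte[bufferSize];
        this.flushOnPosition = flushOnPosition;
    }

    void write(byte[] src) {
        for (byte b : src) {
            if (buffered == buffer.length) flush(); // flush only when the buffer is full
            buffer[buffered++] = b;
        }
    }

    void flush() {
        if (buffered == 0) return;
        sink.write(buffer, 0, buffered);
        buffered = 0;
        flushCount++;
    }

    long position() {
        if (flushOnPosition) {
            flush();              // every position() call pushes out whatever is buffered
            return sink.size();
        }
        // Assumed alternative: flushed bytes plus bytes still in the buffer (sequential writes only)
        return sink.size() + buffered;
    }
}

public class PositionFlushDemo {
    // Writes `nodes` records of `nodeSize` bytes, checking position() before each one,
    // and returns how many flushes reached the sink.
    static int flushesForWrite(int nodes, int nodeSize, int bufferSize, boolean flushOnPosition) {
        SketchBufferedWriter out = new SketchBufferedWriter(bufferSize, flushOnPosition);
        byte[] node = new byte[nodeSize];
        for (int i = 0; i < nodes; i++) {
            long expected = (long) i * nodeSize;
            // Models an enabled `assert out.position() == expected;` in the per-node write loop
            if (out.position() != expected) throw new IllegalStateException("unexpected offset");
            out.write(node);
        }
        out.flush();
        return out.flushCount;
    }

    public static void main(String[] args) {
        System.out.println(flushesForWrite(1000, 64, 8192, true));  // prints 1000
        System.out.println(flushesForWrite(1000, 64, 8192, false)); // prints 8
    }
}
```

With 1000 records of 64 bytes and an 8 KiB buffer, the flush-on-position variant flushes once per record, while the buffered-count variant flushes only when the buffer fills, plus once at the end. A real random-access writer that supports seek() would need to track the file offset at the start of the buffer rather than relying on the sink's size.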