FINALLY resolved the issue I was having with the Presto connector.
So I've found out cache locality is a thing: accessing a Vec<Vec<T>> is horrific compared to a single flat Vec<T> behind an API that pretends it's a 2D vector.
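Rough sketch of what I mean by the flat layout, not the connector's actual code. `Grid` and its method names are made up for illustration; the idea is just one contiguous, row-major `Vec<T>` with the 2D indexing done by arithmetic, so walking a row touches adjacent memory instead of chasing a separate heap allocation per row.

```rust
/// A 2D grid stored in one contiguous Vec, in row-major order.
/// `Grid` and its methods are hypothetical names for this sketch.
pub struct Grid<T> {
    data: Vec<T>,
    rows: usize,
    cols: usize,
}

impl<T: Clone> Grid<T> {
    /// Allocates rows * cols cells in a single allocation, all set to `fill`.
    pub fn new(rows: usize, cols: usize, fill: T) -> Self {
        Grid { data: vec![fill; rows * cols], rows, cols }
    }
}

impl<T> Grid<T> {
    /// Maps (row, col) to an offset in the flat Vec, or None if out of bounds.
    fn index(&self, row: usize, col: usize) -> Option<usize> {
        if row < self.rows && col < self.cols {
            Some(row * self.cols + col)
        } else {
            None
        }
    }

    /// Returns a reference to the cell, or None if out of bounds.
    pub fn get(&self, row: usize, col: usize) -> Option<&T> {
        self.index(row, col).map(|i| &self.data[i])
    }

    /// Writes the cell; returns false if (row, col) is out of bounds.
    pub fn set(&mut self, row: usize, col: usize, value: T) -> bool {
        match self.index(row, col) {
            Some(i) => {
                self.data[i] = value;
                true
            }
            None => false,
        }
    }

    /// Returns a whole row as one contiguous slice.
    pub fn row(&self, row: usize) -> Option<&[T]> {
        if row < self.rows {
            let start = row * self.cols;
            Some(&self.data[start..start + self.cols])
        } else {
            None
        }
    }
}
```

Callers still write `grid.get(r, c)` as if it were 2D, but a full pass over the data is a single linear scan.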
So that was one issue. Then I found out that when Presto says a query is done, it doesn't mean it's done. It just means it'll be done once you've fetched all the data. So I've changed how the connector checks for finished queries, and now it's writing fairly fast.
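Roughly what the "done" check looks like as a sketch. As far as I understand the Presto client protocol, each response page can carry a `nextUri`, and the query is only really finished when a page comes back without one, whatever the reported state says. Everything named here (`Page`, `drain_query`, the `fetch` callback) is made up for illustration, and the HTTP call is left as a closure so no particular client library is assumed.

```rust
/// One response page, cut down to the fields this sketch needs.
/// `Page` is a hypothetical type, not part of any Presto client crate.
pub struct Page {
    /// Reported state, e.g. "RUNNING" or "FINISHED". Informational only here.
    pub state: String,
    /// Rows carried by this page; may be empty.
    pub rows: Vec<Vec<String>>,
    /// Where to fetch the next page, if there is one.
    pub next_uri: Option<String>,
}

/// Collects rows from `first` and every following page.
/// Stops only when a page has no `next_uri`; `state` is never consulted.
/// `fetch` is a caller-supplied function that GETs a URI and parses the page.
pub fn drain_query<F>(first: Page, mut fetch: F) -> Result<Vec<Vec<String>>, String>
where
    F: FnMut(&str) -> Result<Page, String>,
{
    let mut all = Vec::new();
    let mut page = first;
    loop {
        // Keep whatever data this page carried, even if state is already "FINISHED".
        all.append(&mut page.rows);
        match page.next_uri.take() {
            // More pages remain: follow the link.
            Some(uri) => page = fetch(&uri)?,
            // No link left: this is the real end of the query.
            None => return Ok(all),
        }
    }
}
```

Stopping on `state == "FINISHED"` instead would drop any rows on pages that were still waiting to be fetched, which matches the behaviour I was seeing.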
So after this, I just need to finish the binary example, add in some error handling for other queries, and then it should be done?