A'llo.

To be honest, I think most of us start out with wav or aiff, since speech recordings are usually exported in those formats. It's just that inside Visionaire Studio we work directly with ogg/opus for audio, png/webp for images, mkv for videos, etc., because that lets us optimize as we go & see how our games will actually run while we're still working on them.
Ogg Vorbis isn't a requirement of the engine; we can technically import wav, aiff, ogg or mp3 files. Ogg is just the recommended format because it's an open container format, unlike mp3, & it's lossily compressed, so the files end up much smaller than uncompressed wav/aiff or even losslessly compressed flac.
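To put rough numbers on the size difference, here's a quick back-of-the-envelope sketch. It assumes CD-quality source audio (44.1 kHz, 16-bit, stereo) and a nominal 128 kbps Ogg Vorbis bitrate. Those are just example values, not engine requirements, and real Vorbis files are usually VBR, so actual sizes will vary. The helper names are made up for illustration.

```python
def pcm_size_bytes(seconds, sample_rate=44100, bit_depth=16, channels=2):
    """Size of raw PCM audio data (what wav/aiff store), ignoring header bytes."""
    return seconds * sample_rate * (bit_depth // 8) * channels


def bitrate_size_bytes(seconds, kbps):
    """Approximate size of a compressed stream at a given average bitrate."""
    return seconds * kbps * 1000 // 8


minute = 60
wav = pcm_size_bytes(minute)           # 10,584,000 bytes
ogg = bitrate_size_bytes(minute, 128)  # 960,000 bytes (assumed 128 kbps)
print(f"wav: {wav / 1e6:.2f} MB, ogg @128 kbps: {ogg / 1e6:.2f} MB, ratio ~{wav / ogg:.1f}x")
```

So one minute of speech is roughly 10.6 MB as wav versus about 1 MB as ogg at that bitrate, around an 11x difference, which adds up fast once a game has hundreds of voice lines.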
Regarding what you mentioned about Thimbleweed Park converting its assets at the end, during the export/compile process: we have some options like that, but unfortunately not for audio. There's only one for converting images to WebP, and its results aren't as reliable as converting them yourself with XnConvert.
@darren-beckett: I think I actually prefer the tsv version, as it's less flappy. You can edit the tsv & insert additional frames (mouth shapes) or timestamps if you want.
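If you'd rather not hand-edit it, something like this can insert extra frames for you. It's only a sketch, and it assumes the tsv is a simple two-column layout: a timestamp in seconds, a tab, then a mouth-shape letter (e.g. `0.35<TAB>B`), one cue per line. Check that against whatever your lip sync tool actually exports. The function names and the example cue values are hypothetical.

```python
import bisect


def parse_tsv(text):
    """Parse 'time<TAB>shape' lines into a list of (seconds, shape) cues sorted by time."""
    cues = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        time_str, shape = line.strip().split("\t")
        cues.append((float(time_str), shape))
    cues.sort()
    return cues


def insert_cue(cues, seconds, shape):
    """Insert an extra mouth shape at the given time, keeping the cues ordered by time."""
    bisect.insort(cues, (seconds, shape))


def to_tsv(cues):
    """Serialize cues back into the two-column tsv layout."""
    return "\n".join(f"{t:.2f}\t{s}" for t, s in cues) + "\n"


original = "0.00\tX\n0.35\tB\n0.80\tX\n"
cues = parse_tsv(original)
insert_cue(cues, 0.55, "D")  # hypothetical extra frame between B and X
print(to_tsv(cues), end="")
```

Keeping the cues sorted by time matters because each cue presumably holds until the next one starts, so an out-of-order line would make the mouth jump around.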
@machtnix: I keep forgetting that you're still using 4.2.5. Maybe you could use Sebastian's script, if he decides to share it. Or you could write a small script that returns one frame per character & the idle mouth shape for anything that isn't a letter. I dunno.
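Here's a rough idea of what that per-character approach could look like. It's written in Python just to show the logic; for Visionaire it would need porting to Lua. The frame numbers and the letter-to-shape grouping are entirely hypothetical, so map them to whatever mouth frames your character's talk animation actually has.

```python
IDLE_FRAME = 1  # hypothetical index of the closed/idle mouth frame

# Hypothetical letter-to-frame table, grouping letters by rough mouth shape.
SHAPE_FRAMES = {
    **dict.fromkeys("aei", 2),  # open mouth
    **dict.fromkeys("ou", 3),   # rounded lips
    **dict.fromkeys("mbp", 4),  # closed lips
    **dict.fromkeys("fv", 5),   # teeth on lower lip
}
DEFAULT_LETTER_FRAME = 6  # any other letter: generic consonant shape


def frames_for_text(text):
    """Return one mouth frame per character; non-letters map to the idle frame."""
    frames = []
    for ch in text.lower():
        if ch.isalpha():
            frames.append(SHAPE_FRAMES.get(ch, DEFAULT_LETTER_FRAME))
        else:
            frames.append(IDLE_FRAME)  # spaces, punctuation, digits
    return frames


print(frames_for_text("Hi, Bob!"))  # [6, 2, 1, 1, 4, 3, 4, 1]
```

It won't be as accurate as proper phoneme-based lip sync (letters aren't sounds), but spaces & punctuation falling back to the idle frame at least gives natural pauses between words.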