It might not be what you’d think, but my guess is that the time is all spent scanning the WAV files for peaks to draw the waveform display. It’s unlikely the WAVs are being loaded completely into RAM, as that would put an unnecessary limit on loop sizes; so what else could the delay be for?
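For anyone curious what a peak scan involves, here’s a rough Python sketch. It’s the generic technique, not how Aeros actually does it (I have no idea what their code looks like). It assumes a 16-bit mono WAV and reduces each block of `bucket_size` samples to a (min, max) pair, which is what a waveform display draws. It reads the file a chunk at a time rather than loading it all into RAM, but it still has to touch every sample, so the time grows with the length of the recording. `compute_peaks` and `scan_wav_peaks` are names I made up.

```python
import array
import sys
import wave

def compute_peaks(samples, bucket_size):
    """Return one (min, max) pair per bucket of `bucket_size` samples."""
    peaks = []
    for start in range(0, len(samples), bucket_size):
        bucket = samples[start:start + bucket_size]
        peaks.append((min(bucket), max(bucket)))
    return peaks

def scan_wav_peaks(path, bucket_size=1024):
    """Stream a WAV in bucket-sized chunks so the whole file never sits in RAM.

    Assumption for this sketch: 16-bit mono PCM only.
    """
    peaks = []
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("sketch only handles 16-bit mono WAV")
        while True:
            frames = wav.readframes(bucket_size)  # at most bucket_size frames
            if not frames:
                break
            samples = array.array("h", frames)
            if sys.byteorder == "big":
                samples.byteswap()  # WAV sample data is little-endian
            peaks.extend(compute_peaks(samples, bucket_size))
    return peaks
```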
I believe quite a few DAWs and audio editors have similar (if shorter) delays. They get around it by writing those peaks to a cache file for subsequent loads, so the delay only happens once.
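The DAW-style cache can be sketched like this, again as a hypothetical illustration rather than any product’s real format. The peaks go into a JSON sidecar file next to the WAV (the `.peaks.json` name is my invention), keyed on the file’s size and modification time so that a re-recorded loop gets rescanned. `build_peaks` stands in for whatever slow scan you have.

```python
import json
import os

def load_or_build_peaks(wav_path, build_peaks):
    """Return peaks for wav_path, rescanning only if the cache is missing or stale.

    Hypothetical choices: sidecar name <wav>.peaks.json and a size+mtime key.
    """
    cache_path = wav_path + ".peaks.json"
    stat = os.stat(wav_path)
    key = {"size": stat.st_size, "mtime_ns": stat.st_mtime_ns}
    try:
        with open(cache_path) as f:
            cached = json.load(f)
        if isinstance(cached, dict) and cached.get("key") == key:
            # JSON stores tuples as lists, so convert back
            return [tuple(p) for p in cached["peaks"]]
    except (OSError, ValueError, KeyError, TypeError):
        pass  # no cache, unreadable, or malformed: fall through and rebuild
    peaks = build_peaks(wav_path)  # the slow scan only happens here
    with open(cache_path, "w") as f:
        json.dump({"key": key, "peaks": peaks}, f)
    return peaks
```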
With a bit of development effort it could probably be avoided completely with clever buffering of the WAV read during normal playback, so that a screen’s worth of audio has always already been scanned. Or it could be done on demand, with a simple block shown in the UI while the peak scan runs. That wouldn’t delay using the song, only the fancy waveform display.
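A rough sketch of the on-demand idea: the scan runs on a background thread, and until it finishes the UI just gets a placeholder to draw as a plain block, so the song itself is usable straight away. `WaveformLoader` and the `"placeholder"` return value are hypothetical; a real UI would more likely redraw on a callback than poll `display()`.

```python
import threading

class WaveformLoader:
    """Hypothetical sketch: the song is usable immediately; peaks arrive later."""

    def __init__(self, build_peaks, source):
        self._peaks = None
        self._lock = threading.Lock()
        self._done = threading.Event()
        self._thread = threading.Thread(
            target=self._scan, args=(build_peaks, source), daemon=True)
        self._thread.start()

    def _scan(self, build_peaks, source):
        peaks = build_peaks(source)  # slow work runs off the UI/playback thread
        with self._lock:
            self._peaks = peaks
        self._done.set()

    def display(self):
        """What the UI should draw right now."""
        with self._lock:
            if self._peaks is None:
                return "placeholder"  # simple block until the scan finishes
            return self._peaks

    def wait(self, timeout=None):
        """Block until the scan is done (handy for tests or shutdown)."""
        return self._done.wait(timeout)
```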
I’m fairly sure Aeros is an embedded Linux system (its Wi-Fi MAC address has a BeagleBone OUI), so there’s no reason some multithreaded magic couldn’t be going on. So, as with a lot of these things, there’s no reason this wrinkle won’t go away eventually.