gldownload: Use EGL fences instead of glFinish to sync dmabufs

NXP implemented fast texture downloads using ION-backed dmabufs
and used glFinish to ensure rendering is complete before passing
the CPU-accessible buffer downstream. glFinish is disastrous for
performance, though: it stalls the GPU until all pending
operations have completed, and it blocks the single GL thread
shared between all GL elements in the pipeline.

This is particularly bad when there are two or more branches in
the pipeline (e.g. when using tee). If, say, one branch is
rendering to the screen and another is doing color space
conversion for inference, this glFinish forces each branch to
wait for the drawing operations of the other, even if they are
not operating on the same surfaces.

Since the dmabuf is imported as an EGLImage, we can instead use
the EGL_KHR_fence_sync extension to sync CPU access to the
buffer. Not stalling the GPU is the main benefit, but we also
don't have to wait in the GL thread for the rendering to
complete, so we're not blocking it either.

The end result is about 20% higher throughput in the use case
described above, as well as much less jitter in the screen
refresh rate.

Change-Id: I6de5bcb2198571ac047f985d8179099595bbe8e8
3 files changed