columbia: reduce GPU CMA carveout to 32M

Out of the ~640M of memory set aside for CMA, the GPU reserves 128M at
boot. That leaves less for other CMA users such as the VPU and dma-buf
backed GL textures.

When the GPU driver needs to allocate memory that doesn't fit in the
carveout, it falls back to generic CMA. Reducing the GPU carveout lets
us use the CMA region more flexibly: the GPU can still allocate as
before until CMA runs out, and when the GPU doesn't need the memory,
other users can get their slices of the CMA cake.
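
The trade-off above can be sketched with a toy model. This is only an
illustration, not the driver's allocator: the ToyPools class and its
methods are hypothetical names, the 640M total is the approximate figure
quoted above, and the carveout is assumed to be subtracted from that
total. The two carveout sizes are the size cells of the
"contiguous_mem" reg entry before and after this change.

```python
MIB = 1 << 20

# Size cells of the "contiguous_mem" reg entry, before and after this change.
OLD_CARVEOUT = 0x8000000  # 128 MiB
NEW_CARVEOUT = 0x2000000  # 32 MiB

# Assumption: the approximate CMA total from the commit message, with the
# GPU carveout taken out of it.
TOTAL_CMA = 640 * MIB


class ToyPools:
    """Hypothetical model: GPU tries its carveout first, then generic CMA."""

    def __init__(self, carveout, total=TOTAL_CMA):
        self.carveout_free = carveout
        self.cma_free = total - carveout

    def gpu_alloc(self, size):
        # GPU allocations that fit in the carveout are served from it.
        if size <= self.carveout_free:
            self.carveout_free -= size
            return "carveout"
        # Otherwise fall back to generic CMA.
        if size <= self.cma_free:
            self.cma_free -= size
            return "cma"
        return None

    def other_alloc(self, size):
        # Non-GPU users (VPU, dma-buf) can only draw from generic CMA.
        if size <= self.cma_free:
            self.cma_free -= size
            return "cma"
        return None
```

In this model, ToyPools(OLD_CARVEOUT) leaves 512M of generic CMA for
non-GPU users while ToyPools(NEW_CARVEOUT) leaves 608M, and a GPU
allocation larger than the remaining carveout is still served from
generic CMA in both cases.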

The GPU carveout is also never cacheable, so for performance reasons we
want all large allocations that are ever touched by the CPU to be in
generic CMA.

enterprise already has this exact change.

Change-Id: I1fdfe1d5ed0218cf613deffce0ff0a1732b1ecde
diff --git a/arch/arm64/boot/dts/freescale/fsl-imx8mm.dtsi b/arch/arm64/boot/dts/freescale/fsl-imx8mm.dtsi
index e200219..99624c9 100644
--- a/arch/arm64/boot/dts/freescale/fsl-imx8mm.dtsi
+++ b/arch/arm64/boot/dts/freescale/fsl-imx8mm.dtsi
@@ -1173,7 +1173,7 @@
 	gpu: gpu@38000000 {
 		compatible ="fsl,imx8mm-gpu", "fsl,imx6q-gpu";
 		reg = <0x0 0x38000000 0x0 0x8000>, <0x0 0x38008000 0x0 0x8000>,
-                        <0x0 0x40000000 0x0 0x80000000>, <0x0 0x0 0x0 0x8000000>;
+                        <0x0 0x40000000 0x0 0x80000000>, <0x0 0x0 0x0 0x2000000>;
 		reg-names = "iobase_3d", "iobase_2d",
                         "phys_baseaddr", "contiguous_mem";
 		interrupts = <GIC_SPI 3 IRQ_TYPE_LEVEL_HIGH>,