
Eric Crooks

Software Engineer. Developer Advocate. Open Source Contributor.

From Worst in Class to World-Class: How We Flipped MiLUMA

Feb 14, 2026

Situation


When I joined LUMA as a Senior Software Engineer to work on the MiLUMA web app (not the mobile app; that's a separate project), about 11 engineers were developing and maintaining the web app across front-end, back-end, infrastructure, and full-stack roles. I contributed to both front-end and back-end development. Two months after I onboarded, most of those engineers left for other roles, leaving just one other engineer and me to maintain the distributed system. Initially, the system seemed stable to us, but during hurricane season (my first with this system), it became clear the system couldn't handle high request volumes due to limited concurrency stemming from poor software design. The web app crashed, becoming unusable or unresponsive. After several releases that failed to improve performance, we requested dedicated time to thoroughly analyze the underlying issues and formulate a remediation plan. Mind you, this was alongside maintaining the web app.

Findings


After a couple of weeks of investigating the microservices, we found database misconfigurations (at both the DB and API level), blocking API calls, threading issues from poor pool sizing, and unnecessary roundtrips to the API gateway. What surprised us the most were the blocking API calls. Every microservice contained blocking calls despite depending on third-party libraries that support non-blocking behavior. The blocking calls were also easy to identify because they blatantly said .block(). That meant we could search the codebase and make simple changes from .block() to chained, non-blocking APIs (e.g., .someNonBlockingApi(...).map(...)).
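The post doesn't name the framework, but .block() is Project Reactor's API (common in Spring WebFlux services), so here's a minimal sketch of that kind of change, assuming a Reactor/WebFlux setup. The service, types, and endpoint below are hypothetical stand-ins for illustration, not actual MiLUMA code.

```java
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

// Hypothetical types for illustration only.
record Account(String id, String ownerName) {}
record AccountSummary(String ownerName) {
    static AccountSummary from(Account account) {
        return new AccountSummary(account.ownerName());
    }
}

public class AccountService {

    private final WebClient webClient = WebClient.create("https://api.example.internal");

    // Before: the reactive pipeline is collapsed with .block(), parking the
    // calling thread until the downstream service responds.
    public AccountSummary getSummaryBlocking(String accountId) {
        Account account = webClient.get()
                .uri("/accounts/{id}", accountId)
                .retrieve()
                .bodyToMono(Account.class)
                .block(); // ties up a thread for the entire round trip
        return AccountSummary.from(account);
    }

    // After: the same flow expressed as a chained, non-blocking pipeline.
    public Mono<AccountSummary> getSummary(String accountId) {
        return webClient.get()
                .uri("/accounts/{id}", accountId)
                .retrieve()
                .bodyToMono(Account.class)
                .map(AccountSummary::from); // no thread waits on the response
    }
}
```

The caller (or the framework, in a WebFlux controller) subscribes to the returned Mono, so the thread that would have sat in .block() is free to serve other requests.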

Although the change was simple, we spent considerable time tracing call stacks across all flows to confirm non-blocking behavior. During our testing, we discovered single blocking requests taking 30–60 seconds to complete (imagine being a customer and waiting that long for the web app to show your account dashboard). Scale that by concurrent users and it only gets worse, rendering the web app unusable. After switching to non-blocking calls and fixing our backpressure handling, the system handled hundreds of requests per second with stability and resiliency, a simple fix that was overlooked and should've been implemented from the get-go. As my mentors have told me, "Do it nice or do it twice."
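To give a feel for why this scales, here's a generic Reactor sketch of bounding concurrency so a burst of work is queued by the pipeline instead of saturating threads or the downstream service. The fetchStatus call, the delay, and the numbers are made up for illustration; this is not the MiLUMA pipeline.

```java
import java.time.Duration;
import java.util.List;
import java.util.stream.IntStream;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

public class BoundedFanOut {

    // Hypothetical downstream dependency: pretend each call takes ~200 ms.
    static Mono<String> fetchStatus(String requestId) {
        return Mono.just("OK:" + requestId).delayElement(Duration.ofMillis(200));
    }

    public static void main(String[] args) {
        List<String> requestIds = IntStream.rangeClosed(1, 200)
                .mapToObj(i -> "req-" + i)
                .toList();

        Flux.fromIterable(requestIds)
                // The concurrency argument caps in-flight calls at 16, so a burst
                // of traffic is absorbed by the pipeline instead of piling more
                // work onto an already saturated downstream service.
                .flatMap(BoundedFanOut::fetchStatus, 16)
                .doOnNext(System.out::println)
                // Blocking is acceptable only at the edge of a demo's main();
                // it is exactly what the production code removed.
                .blockLast();
    }
}
```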

Our findings didn't stop at the API level. When we were ready to deploy the updated microservices, we discovered an 8-hour (sometimes longer) release process, bogged down by variables that made deployments tedious and error-prone. By questioning how features were enabled, how changes to config files impacted user/data flows, how rollbacks worked, and how to safely halt traffic, we uncovered code smells. To name a few: misuse of feature flags, coupled microservice deployments, a lengthy rollback strategy, and no maintenance kill switch.
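As an example of the kill-switch idea, a maintenance switch can halt traffic without a redeploy. The sketch below assumes a Spring WebFlux filter and a made-up FeatureFlagClient; the actual flag store we used isn't part of this post.

```java
import org.springframework.http.HttpStatus;
import org.springframework.stereotype.Component;
import org.springframework.web.server.ServerWebExchange;
import org.springframework.web.server.WebFilter;
import org.springframework.web.server.WebFilterChain;
import reactor.core.publisher.Mono;

// Hypothetical maintenance kill switch: a WebFlux filter that short-circuits
// traffic with a 503 when an externally managed flag is enabled, so releases
// can safely halt traffic without shipping new code.
@Component
public class MaintenanceKillSwitchFilter implements WebFilter {

    private final FeatureFlagClient flags; // hypothetical external flag store

    public MaintenanceKillSwitchFilter(FeatureFlagClient flags) {
        this.flags = flags;
    }

    @Override
    public Mono<Void> filter(ServerWebExchange exchange, WebFilterChain chain) {
        return flags.isEnabled("maintenance-mode")
                .flatMap(enabled -> {
                    if (enabled) {
                        exchange.getResponse().setStatusCode(HttpStatus.SERVICE_UNAVAILABLE);
                        return exchange.getResponse().setComplete();
                    }
                    return chain.filter(exchange);
                });
    }
}

// Minimal interface so the sketch compiles; the real flag backend
// (LaunchDarkly, Azure App Configuration, etc.) is an assumption.
interface FeatureFlagClient {
    Mono<Boolean> isEnabled(String flagName);
}
```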

End State


What began as a two-week investigation became a two-month effort to migrate to a non-blocking, zero-downtime system built on enterprise-grade continuous delivery workflows. We shortened the 8-hour+ release process to about 1 hour, with most of that time spent verifying that Azure had deployed the microservices properly. We moved to non-blocking API calls, corrected circuit breaker patterns to properly handle backpressure, eliminated unnecessary microservice-to-microservice roundtrips through the API gateway, DRYed code so data flowed through shared non-blocking paths, and refactored feature flag usage to support dark releases and external kill switches. I also updated the Azure Pipelines documentation as I worked through streamlining the deployment flow (I love writing documentation).
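To make the circuit breaker point concrete, here's a sketch of a non-blocking call wrapped with a timeout and a breaker. It assumes Resilience4j's Reactor operator, which is one common choice rather than necessarily what MiLUMA uses, and the client, endpoint, and fallback are hypothetical.

```java
import java.time.Duration;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.reactor.circuitbreaker.operator.CircuitBreakerOperator;
import org.springframework.web.reactive.function.client.WebClient;
import reactor.core.publisher.Mono;

public class OutageReportClient {

    private final WebClient webClient = WebClient.create("https://reports.example.internal");

    // Default breaker config for the sketch; real thresholds would be tuned.
    private final CircuitBreaker breaker = CircuitBreaker.ofDefaults("outage-reports");

    public Mono<String> fetchReport(String reportId) {
        return webClient.get()
                .uri("/reports/{id}", reportId)
                .retrieve()
                .bodyToMono(String.class)
                .timeout(Duration.ofSeconds(3))                        // bound each call
                .transformDeferred(CircuitBreakerOperator.of(breaker)) // open the breaker on repeated failures
                .onErrorResume(ex -> Mono.just("{\"status\":\"degraded\"}")); // degrade gracefully
    }
}
```

With the breaker open, calls fail fast instead of stacking up behind a struggling dependency, which is what keeps backpressure meaningful under load.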

For customers, it meant they could reliably submit requests to the system and utility workers could respond to them in a timely manner, something that had not been possible in past hurricane seasons. For the dev teams, everything was meticulously documented, enabling new developers and stakeholders to confidently develop, maintain, deploy, and release with minimal guidance. For the web app, this effort paved the way for a future of high-performance development, establishing a robust, scalable foundation that will support rapid feature delivery, long-term maintainability, and trusted iteration over time.