From 30 Seconds to Milliseconds: How Threads Saved Our Printer Dashboard

In our previous post, we talked about choosing FastAPI and SQLite to build a snappy, efficient backend. But architecture on paper and code in the real world are two different beasts. We recently ran into a classic networking bottleneck that slowed our lightning-fast API to a crawl, and we fixed it using the magic of threads.

Here is the story of how a critical API endpoint went from taking an agonizing 30 seconds to responding in milliseconds.

The 30-Second Nightmare: The Sequential Trap

Once our backend was hooked up to our fleet of 3D printers, we built an endpoint to fetch the “Printer List”—a dashboard view showing the live status, temperatures, and current print jobs for every machine on the network.

During local testing with a single mocked printer, it was instantaneous. But when we deployed it to the staging environment with a dozen actual printers on the network, the dashboard would hang… and hang… and hang.

The request was taking up to 30 seconds to complete. Why? Because we were making sequential, blocking HTTP requests. Our code essentially looked like a checklist:

  1. Ask Printer 1 for its status. (Wait 2 seconds…)
  2. Ask Printer 2 for its status. (Wait 3 seconds…)
  3. Ask Printer 3 (which is offline and takes 10 seconds to time out).
  4. …and so on.
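
As a rough illustration of that checklist, here is a minimal sketch of the sequential pattern. It is not our production code: the printer names, the delays (scaled down so the sketch runs quickly), and the fetch_status helper are all hypothetical stand-ins, with time.sleep playing the role of a blocking HTTP request.

```python
import time

# Hypothetical per-printer response delays in seconds (scaled down for the demo).
SIMULATED_DELAYS = {"printer-1": 0.1, "printer-2": 0.1, "printer-3": 0.2}


def fetch_status(name):
    """Stand-in for a blocking HTTP call to one printer."""
    time.sleep(SIMULATED_DELAYS[name])  # the caller does nothing but wait here
    return {"name": name, "status": "ok"}


def list_printers_sequential(names):
    """Query printers one after another; total time is the sum of all waits."""
    results = []
    for name in names:
        # The next printer is not contacted until this call returns.
        results.append(fetch_status(name))
    return results


start = time.perf_counter()
statuses = list_printers_sequential(list(SIMULATED_DELAYS))
elapsed = time.perf_counter() - start
print(f"{len(statuses)} printers in {elapsed:.2f}s")
```

With these made-up delays the loop takes roughly 0.4 seconds, the sum of the three waits. Swap in real network latencies and a 10-second timeout and the total climbs quickly.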

The total response time of our API was the sum of every individual network delay and timeout. In a web UI, making a user wait 30 seconds for a dashboard to load is an eternity. It breaks the illusion of a modern, responsive web app.

The Fix: Unleashing Threads

The problem wasn’t CPU power; it was waiting. Network requests are I/O-bound tasks. While the server is waiting for Printer 1 to reply, the CPU is just twiddling its thumbs.

To fix this, we made the requests concurrent using threads. Instead of asking one printer at a time, we needed to ask all of them at once.

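Python's concurrency features cover this well: ThreadPoolExecutor manages a pool of worker threads, while asyncio.gather runs coroutines concurrently on an event loop rather than in threads. We dispatched a fleet of lightweight worker threads, one per printer.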

How it works now:

  1. The user requests the Printer List.
  2. The backend instantly spawns a thread for each printer on the network.
  3. Every thread sends its network request simultaneously.
  4. The server waits only as long as the single slowest response (usually just a few milliseconds, or a strict 1-second timeout for offline machines).
  5. The threads report back, the data is aggregated, and the JSON payload is fired off to the frontend.
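
The steps above can be sketched with ThreadPoolExecutor from the standard library. Treat this as an illustration under assumptions rather than our exact implementation: fetch_status, the printer names, and the simulated delays are hypothetical, and time.sleep stands in for the blocking network call.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-printer response delays in seconds (scaled down for the demo).
SIMULATED_DELAYS = {"printer-1": 0.1, "printer-2": 0.1, "printer-3": 0.2}


def fetch_status(name):
    """Stand-in for a blocking HTTP call to one printer."""
    time.sleep(SIMULATED_DELAYS[name])
    return {"name": name, "status": "ok"}


def list_printers_concurrent(names):
    """Query all printers at once; total time is roughly the slowest wait."""
    # One worker per printer so no request queues behind another.
    # max(1, ...) avoids a ValueError when the list is empty.
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        # map() returns results in the same order as the input names.
        return list(pool.map(fetch_status, names))


start = time.perf_counter()
statuses = list_printers_concurrent(list(SIMULATED_DELAYS))
elapsed = time.perf_counter() - start
print(f"{len(statuses)} printers in {elapsed:.2f}s")
```

With the same made-up delays as a sequential loop would face, this finishes in about 0.2 seconds, the duration of the slowest single call. Sizing the pool to the number of printers is a design choice that suits a small fleet; a very large fleet would probably want a fixed cap on workers.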

The Result: Real-Time Performance

The impact was immediate and staggering.

By switching from a sequential loop to concurrent threads, the response time plummeted from 30 seconds to around 150 milliseconds.

Quick comparison:

  • Sequential Loop: Time = T(Printer 1) + T(Printer 2) + … + T(Printer N)
  • Threaded / Concurrent: Time = MAX(T(Printer 1), T(Printer 2), …, T(Printer N))
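
To make the two formulas concrete, here is the arithmetic for the three example delays from the checklist earlier in the post (2 seconds, 3 seconds, and a 10-second timeout). It ignores thread start-up and aggregation overhead, so read the numbers as a rough model rather than a measurement.

```python
# Example delays from the checklist, in seconds.
delays = [2, 3, 10]

sequential_total = sum(delays)  # each wait starts only after the previous one ends
concurrent_total = max(delays)  # all waits overlap, so the slowest one dominates

# With a strict 1-second timeout, no single wait can exceed 1 second.
timeout = 1
capped_total = max(min(d, timeout) for d in delays)

print(sequential_total)  # 15
print(concurrent_total)  # 10
print(capped_total)      # 1
```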

Now, even if we add 50 more printers to the network, the response time is bounded by the slowest single printer (at worst, the timeout) instead of growing with the size of the fleet, as long as the pool has a worker available for each request. One offline or laggy printer no longer acts as a roadblock for the entire system; its thread simply hits a short timeout and returns an “Offline” status while the rest of the fleet reports in successfully.
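
The “hit a short timeout and return Offline” behaviour might look something like the sketch below, using only urllib from the standard library. The /status path, the JSON response shape, and the fetch_status name are assumptions for illustration; your printers' API will differ.

```python
import http.client
import json
import socket
import urllib.request


def fetch_status(url, timeout=1.0):
    """Return the printer's JSON status, or an "Offline" marker on any failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return json.load(response)
    except (OSError, http.client.HTTPException, ValueError):
        # OSError covers refused connections, timeouts, and urllib's URLError;
        # ValueError covers malformed URLs and invalid JSON bodies.
        return {"status": "Offline"}


# Demo without real hardware: pick a free local port and release it again,
# so nothing is listening there and the request fails fast.
with socket.socket() as probe:
    probe.bind(("127.0.0.1", 0))
    closed_port = probe.getsockname()[1]

result = fetch_status(f"http://127.0.0.1:{closed_port}/status")  # hypothetical path
print(result)
```

Putting the timeout on the request itself matters: a worker thread that is stuck on a socket cannot be cancelled from outside, so the request has to give up on its own.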

Takeaway

If your backend is talking to external services, hardware, or third-party APIs, and the calls don't depend on one another, never make them sequentially.

Threading and asynchronous programming can seem intimidating at first, but for I/O-bound tasks like network requests, they are the absolute best tools in a developer’s arsenal to ensure a snappy, scalable user experience.
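
For readers curious about the asynchronous route, here is a minimal sketch of the same fan-out using asyncio.gather. It runs coroutines on a single event loop instead of threads. As before, fetch_status, the printer names, and the delays are hypothetical, and asyncio.sleep stands in for a non-blocking network call.

```python
import asyncio
import time

# Hypothetical per-printer response delays in seconds (scaled down for the demo).
SIMULATED_DELAYS = {"printer-1": 0.1, "printer-2": 0.1, "printer-3": 0.2}


async def fetch_status(name):
    """Stand-in for a non-blocking HTTP call to one printer."""
    await asyncio.sleep(SIMULATED_DELAYS[name])  # yields control while waiting
    return {"name": name, "status": "ok"}


async def list_printers_async(names):
    """Start every request, then wait for all of them together."""
    # gather() returns results in the same order as the awaitables passed in.
    return await asyncio.gather(*(fetch_status(name) for name in names))


start = time.perf_counter()
statuses = asyncio.run(list_printers_async(list(SIMULATED_DELAYS)))
elapsed = time.perf_counter() - start
print(f"{len(statuses)} printers in {elapsed:.2f}s")
```

This approach only pays off when the HTTP client itself is asynchronous; calling a blocking client inside a coroutine would stall the event loop and bring back the sequential behaviour.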