Snehal Patel

I love to build things ✨

BlinkThink: Self-Hosted Camera Snapshots with FastAPI and Gemini

February 26, 2026

BlinkThink Logo

If you own a Blink camera, you have accepted a certain set of implicit terms: your footage lives on Amazon’s servers, you pay a subscription to access more than a handful of clips, and the moment their service has a bad afternoon your security feed goes dark. For casual use, this is fine. But if you want to build any kind of custom workflow around your cameras (scheduled snapshots, programmatic triggers, automated analysis), the Blink app gives you nothing. There is no API, no export, no hooks. Just a mobile interface and a cloud you do not control.

BlinkThink is my answer to that constraint: a lightweight, self-hosted Python web app that wraps the Blink camera API in a FastAPI server, stores snapshots locally, and optionally runs them through Gemini for structured image analysis. You own the stack. You own the data.


The Problem with Cloud-Dependent Cameras

The frustration with cloud cameras is not really about the cameras themselves; Blink hardware is fine. The frustration is the dependency surface. When Amazon has an S3 outage, your security footage is unavailable. When Blink changes its subscription tiers, features you relied on disappear behind a paywall. When its app gets a forced update, the interface you had memorized changes.

More fundamentally, you have no way to know how long your footage is retained, who can access it under what legal circumstances, or what happens to historical clips if you cancel your subscription. These are not paranoid concerns; they are straightforward consequences of putting your data in someone else’s system.

The Blink app is adequate for checking in on a camera from your phone. It is useless if you want to build anything on top of it. The open-source blinkpy library solves exactly this: it reverse-engineers the Blink API and exposes it as a Python client, giving you programmatic access to authentication, camera metadata, and snapshot capture. BlinkThink builds the rest of the stack on top of it.


What I Built

BlinkThink is a self-hosted FastAPI application that connects to your Blink account, captures snapshots from any of your cameras on demand, stores them locally as JPEGs, and serves a web gallery for browsing them. MFA is supported. Multiple cameras work. The gallery filters by camera. An optional Gemini integration can analyze any snapshot and return structured scene descriptions.

Starting the server is a single command:

uv run uvicorn main:app --reload

No database. No external dependencies beyond the Blink API and an optional Gemini API key. A few hundred lines of Python.


Under the Hood: The Architecture

The app has three distinct layers that are worth walking through separately: the FastAPI backend, the filesystem store, and the optional Gemini client, which gets its own section below.

Layer 1: FastAPI Backend (main.py)

The entry point uses FastAPI’s lifespan context manager to handle startup tasks: creating the local snapshot directory and attempting auto-login from persisted credentials before the server starts accepting requests. If auto-login fails, the server starts anyway and waits for a manual login through the UI.

The REST API is cleanly divided by concern:

  • /api/auth/login and /api/auth/verify-mfa handle the two-step authentication flow
  • /api/cameras returns the list of available cameras from the active Blink session
  • /api/snapshot/{camera_id} triggers a snapshot capture and writes the JPEG to disk
  • /api/analyze accepts image bytes and an optional prompt, returns structured Gemini analysis

Pydantic models (LoginRequest, MfaRequest) validate all inputs at the API boundary. CORS is restricted to configured origins, never a wildcard. The intent is that this runs on your local network, not exposed to the internet, but that is not an excuse for sloppy defaults.

The Blink client is a singleton: instantiated once at module level and shared across all requests. This matters because blinkpy session state is not cheap to recreate. Re-authenticating on every request would be slow, fragile, and likely to trigger rate limiting. The singleton pattern keeps a single live session for the application’s lifetime.

The authentication flow mirrors Blink’s two-step login handshake:

# Step 1: initiate login
await client.start_login(email, password)
# Raises BlinkTwoFARequiredError if MFA is needed

# Step 2: complete MFA
await client.verify_mfa(pin)
# Persists credentials to blink_credentials.json

After a successful verify_mfa(), credentials are written to disk. On the next server restart, the lifespan hook picks them up and auto-logs in. The token chain stays alive across restarts without requiring the user to re-authenticate each time.

Snapshot capture works by calling snap_picture() on the camera object, then reading camera._cached_image, the bytes that blinkpy holds in memory after the snap. This logic lives in get_snapshot_bytes(), which returns raw JPEG bytes that the endpoint then passes to the filesystem layer.
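The capture path is short enough to show whole. This sketch depends on _cached_image, a private blinkpy attribute, so it is coupled to blinkpy internals as the post notes:

```python
# Sketch of the capture path; relies on blinkpy's private _cached_image.
async def get_snapshot_bytes(camera) -> bytes:
    await camera.snap_picture()   # ask the camera to take a fresh frame
    image = camera._cached_image  # JPEG bytes blinkpy caches after the snap
    if not image:
        raise RuntimeError("snapshot returned no image data")
    return image
```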

Layer 2: Filesystem (utils/)

Two small utilities do the work of keeping the snapshot directory clean and navigable.

sanitize_path_segment() converts camera names into safe directory names. A camera named “Front Door” becomes Front_Door. This is a necessary step: camera names can contain spaces, slashes, and other characters that would break filesystem paths or URL routing.

snapshot_timestamp() returns the current time as YYYYMMDD_HHMMSS. Combined with the camera name, this gives you filenames like 20260226_143022.jpg that sort chronologically without any tooling.

The resulting storage layout looks like this:

snapshots/
├── Front_Door/
│   └── 20260226_143022.jpg
└── Back_Patio/
    └── 20260226_144001.jpg

Every file is human-readable and can be browsed with any file manager or copied off the machine without any export step. This is the point.

All I/O across all three layers (Blink API calls, file writes, Gemini requests) is async throughout, using asyncio and aiofiles. Nothing blocks the event loop.


Three Design Decisions Worth Calling Out

Singleton client with persistent auth state

The alternative to a singleton, re-authenticating on each request, would technically work but would be wrong in several ways. Blink’s authentication involves network calls, credential exchange, and session initialization. Running that on every snapshot request adds latency, burns rate limits, and creates a failure mode where a temporary network blip mid-request leaves you in a half-authenticated state.

The singleton avoids all of this. One auth, one session, persistent for the process lifetime. The lifespan hook handles the startup case gracefully: try auto-login from disk, succeed silently, or fall back to manual login without crashing. Credentials are refreshed and re-persisted after each successful auto-login, so the blink_credentials.json on disk stays current.

Filesystem-first snapshot storage

BlinkThink has no database. Every snapshot write is a direct aiofiles.open() call to a predictable path. The gallery endpoint reads the filesystem: no metadata index, no ORM, no query. This is a deliberate choice, not an oversight.

Databases are the right tool for structured queries across large datasets with complex relationships. A gallery of camera snapshots does not have complex relationships. The data is flat: camera name, timestamp, JPEG bytes. A directory tree encodes all of that naturally, and any file browser, rsync command, or Python os.walk() can work with it directly. The total moving-parts count stays low. There is nothing to migrate, nothing to corrupt, nothing to back up separately from the images themselves.
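With that layout, the gallery endpoint can be little more than a directory walk. A sketch, assuming the snapshots/ tree shown earlier:

```python
# Gallery listing as a plain filesystem read; no index, no database.
from pathlib import Path

def list_snapshots(root: str = "snapshots") -> dict[str, list[str]]:
    """Map each camera directory to its JPEG filenames, newest first."""
    gallery: dict[str, list[str]] = {}
    for cam_dir in sorted(Path(root).iterdir()):
        if cam_dir.is_dir():
            # Timestamped names sort chronologically, so reverse = newest first.
            gallery[cam_dir.name] = sorted(
                (p.name for p in cam_dir.glob("*.jpg")), reverse=True
            )
    return gallery
```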

Structured AI prompting, not freeform

The Gemini integration uses a constrained system prompt rather than asking the model to describe images freely. The output is structured into four specific fields: Scene, Subjects, Activity, and Flags. The instruction is explicit: facts only, no filler, no speculation beyond what is visible in the frame.

This matters in practice. Freeform image descriptions from large models tend to hedge. They add phrases like “it appears that” and “the image seems to show” and “upon closer inspection.” This hedging makes sense for uncertain inputs, but for a camera image with a clear scene it is just noise. The structured prompt forces the model to commit to concrete observations and surfaces the useful signal (an unusual vehicle in the driveway, a person at the door, an animal in the yard) without padding.
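A system prompt in that spirit might look like the following; the exact wording is an assumption, not BlinkThink’s actual prompt.

```python
# Illustrative constrained prompt pinning output to the four fields above.
ANALYSIS_PROMPT = """\
You are analyzing a single still frame from a home security camera.
Report facts only; do not speculate beyond what is visible in the frame.
Respond in exactly four fields, one per line:
Scene: <location and lighting>
Subjects: <people, vehicles, animals present, or "none">
Activity: <what is happening, or "static scene">
Flags: <anything unusual worth attention, or "none">
"""
```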


The Gemini Layer: Making Cameras Think

GeminiClient wraps the google-genai SDK with async support and rate-limit handling. The default model is gemini-2.5-flash, configurable via the GEMINI_MODEL environment variable.

Extended thinking is enabled with a 2048-token budget. For a camera snapshot this is overkill most of the time, but for edge cases where the scene is ambiguous or partially occluded, the extra reasoning budget consistently produces better structured output than greedy decoding.

Rate limiting is handled with an asyncio semaphore. When the API returns a 429, the client waits 30 seconds before retrying. This is simple and effective. If the API is unavailable for any reason, the /api/analyze endpoint returns a clean error response rather than propagating an exception to the UI. The snapshot workflow (capture, store, display) works entirely independently of Gemini.
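The semaphore-plus-retry pattern is generic enough to sketch with a stand-in request function. The 30-second backoff matches the post; the concurrency limit and attempt count are assumptions:

```python
# Semaphore-gated call with retry-on-429; limits are illustrative.
import asyncio

class RateLimited(Exception):
    """Stand-in for the SDK's 429 error."""

_semaphore = asyncio.Semaphore(2)  # cap concurrent Gemini calls (limit assumed)

async def call_with_retry(request, retry_delay: float = 30.0, attempts: int = 3):
    async with _semaphore:
        for attempt in range(attempts):
            try:
                return await request()
            except RateLimited:
                if attempt == attempts - 1:
                    raise  # out of retries: let the endpoint return an error
                await asyncio.sleep(retry_delay)  # wait out the 429 window
```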

This is worth stating directly: AI analysis is optional and opt-in. You do not need a Gemini API key to run BlinkThink. The core functionality (logging in, capturing snapshots, browsing the local gallery) works without any AI configuration. Gemini is an enhancement, not a dependency.


Closing Thoughts

The interesting engineering problem in BlinkThink turned out not to be the Gemini integration. That part was straightforward: initialize the client, pass image bytes, parse structured output. The harder problem was building a clean authentication flow around blinkpy, handling the two-step MFA handshake, deciding when and how to persist credentials, and making the whole thing work reliably across server restarts without requiring the user to re-authenticate every time.

Credential persistence without a database sounds simple and mostly is, but the edge cases (expired tokens, failed auto-login, partial auth state) each need a defined behavior. “Fall back gracefully” is easy to say and takes some care to actually implement.

The broader point: owning your data does not have to mean complexity. The whole app is a few hundred lines of Python. You get a running web server, a local gallery, multi-camera support, and optional AI analysis. No cloud subscription, no third-party retention policy, no dependency on someone else’s uptime.

If you run Blink cameras and want to do more with them than the app allows, BlinkThink is a reasonable starting point. Contributions welcome: obvious directions include motion detection triggers, scheduled snapshot captures, and support for additional camera backends.