Gemini File Search API for millions of regulatory filings

We’ve been experimenting with Google’s new File Search API to index over 2 million RNS announcements — official regulatory filings from UK-listed companies.

The goal was simple: make the entire RNS archive searchable with natural language queries and metadata filters, without building our own embeddings or vector infrastructure. We already had each document stored as LLM-friendly, stripped-down HTML on S3, with structured metadata from our Ticker API.

Google claimed File Search handled chunking, embedding, and semantic search automatically. Upload files, attach metadata, and query. What could possibly go wrong?
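Before the war stories, here is roughly what that happy path looks like with the @google/genai SDK. Treat it as a sketch: the store name, metadata key, and model are placeholders, and field names such as customMetadata and fileSearchStoreNames follow the pattern in Google's File Search docs but may differ by SDK version.

import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// 1. Create a File Search store to hold the indexed announcements.
const store = await ai.fileSearchStores.create({
  config: { displayName: "rns-announcements" },
});

// 2. Upload a document with metadata attached. This returns a long-running
//    operation; Google chunks and embeds the file once it completes.
//    (Poll op.done before querying if you need the file indexed first.)
const op = await ai.fileSearchStores.uploadToFileSearchStore({
  file: "announcement.html",
  fileSearchStoreName: store.name,
  config: {
    customMetadata: [{ key: "ticker", stringValue: "VOD" }], // placeholder metadata
  },
});

// 3. Query the store with natural language via the fileSearch tool.
const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "What did this company announce about its dividend?",
  config: {
    tools: [{ fileSearch: { fileSearchStoreNames: [store.name] } }],
  },
});
console.log(response.text);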

1. Concurrency Gone Wrong

Sequential uploads were too slow, so we added concurrency. That immediately tanked success rates.

Roughly 6% of uploads were succeeding, with the rest silently disappearing — no exceptions, no timeouts, nothing. It looked like everything was fine until you counted the files.

The problem was our concurrency control. We were manually tracking an array of promises and removing them from the pool after Promise.race(), assuming we knew which one had finished. We didn’t. The fix: each upload removes itself from the pool once it completes (sketched below). No manual cleanup, no race conditions, no missing files.
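A minimal sketch of that pattern, with a hypothetical uploadOne() standing in for our real per-document upload:

// Hypothetical per-document upload (S3 fetch + File Search import);
// error handling omitted for brevity.
async function uploadOne(doc) { /* ... */ }

async function uploadAll(docs, limit = 20) {
  const pool = new Set();

  for (const doc of docs) {
    // The task removes itself from the pool when it settles, so we never
    // have to guess which promise Promise.race() resolved.
    const task = uploadOne(doc).finally(() => pool.delete(task));
    pool.add(task);

    // Once the pool is full, wait for any task to finish before adding more.
    if (pool.size >= limit) {
      await Promise.race(pool);
    }
  }

  // Drain whatever is still in flight.
  await Promise.all(pool);
}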

After that change: 100% success rate.

2. Connection Pool Exhaustion

With concurrency working, throughput still sucked. The console was full of warnings like:

@smithy/node-http-handler:WARN – socket usage at capacity=50

AWS’s SDK defaults to 50 sockets. With 20 parallel uploads and multiple S3 requests per upload, we were maxing that instantly. Everything else just queued.

Easy fix — increase the pool and enable connection reuse:

import https from "node:https";
import { S3Client } from "@aws-sdk/client-s3";
import { NodeHttpHandler } from "@smithy/node-http-handler";

// Raise the socket cap and reuse connections instead of opening a new one per request.
const s3 = new S3Client({
  requestHandler: new NodeHttpHandler({
    httpsAgent: new https.Agent({
      maxSockets: 200, // default is 50
      keepAlive: true, // reuse TCP connections across requests
    }),
  }),
});

That single tweak quadrupled throughput. Always check the defaults.

3. The “55-Second Upload” Mystery

Once we were uploading consistently, we started timing everything. The results made no sense.

Each upload was taking 55 seconds, even though the data transfer itself took less than two seconds. The rest of the time was spent waiting for Google’s importFile call to return.

The docs said it should “return immediately as a long-running operation.” It didn’t.

We ran controlled tests with both methods:
– Direct uploadToFileSearchStore
– Two-step upload + importFile

Both finished in around 5–6 seconds per file. The 55-second versions were outliers, likely caused by cold starts or network lag.

Lesson learned: never optimise around a single benchmark. Measure several runs, in isolation, under different conditions. Most “bottlenecks” are ghosts.
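The re-test was nothing fancier than timing several isolated runs of each method; a sketch, with importOne() as a hypothetical wrapper around whichever upload path is being measured:

// Hypothetical wrapper around either path: direct uploadToFileSearchStore,
// or two-step upload + importFile.
async function importOne(doc) { /* ... */ }

async function benchmark(docs, runs = 10) {
  const timings = [];

  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    await importOne(docs[i % docs.length]);
    timings.push(performance.now() - start);
  }

  // Report the median and the worst run, not just a single measurement.
  timings.sort((a, b) => a - b);
  const median = timings[Math.floor(timings.length / 2)];
  console.log(`median ${median.toFixed(0)} ms, slowest ${timings[timings.length - 1].toFixed(0)} ms`);
}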

4. Getting Real Performance Numbers

With all fixes in place, the system now uploads 30,000 announcements in around 55 minutes using 50 concurrent workers. That’s about 9 uploads per second, end to end.

At $0.15 per million tokens, indexing a full month of announcements (≈450M tokens) is a one-time cost of about $68. Storage and query-time embeddings are free.
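The back-of-the-envelope maths, using the numbers above:

const tokensIndexed = 450_000_000;                 // ≈450M tokens for the month
const ratePerMillionTokens = 0.15;                 // USD, one-time indexing charge
const indexingCost = (tokensIndexed / 1_000_000) * ratePerMillionTokens; // ≈ $67.50

const uploads = 30_000;
const elapsedSeconds = 55 * 60;
const uploadsPerSecond = uploads / elapsedSeconds; // ≈ 9 end-to-end uploads per second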

Once a file is uploaded, Google handles chunking, embedding, and indexing automatically. No manual indexing, no post-processing jobs.

5. Lessons Worth Remembering

– Let each concurrent task remove itself from the pool when it settles; don’t guess which promise Promise.race() resolved.
– Check SDK defaults before blaming the network: raising the AWS SDK’s 50-socket cap alone quadrupled throughput.
– Never optimise around a single benchmark; measure several runs, in isolation, before calling something a bottleneck.

6. Next Steps

We’ve finished indexing January 2025 and are now processing the historical backlog from 2020–2024. The next phase is adding the natural language query interface and testing real-world analyst queries at scale.

The nice surprise is that File Search actually holds up once you treat it like production software, not a demo. It’s fast, cheap, and predictable — as long as you understand what’s happening underneath.

TL;DR

We built a production RAG system using Google’s File Search API to index two million regulatory filings. The hardest problems weren’t in the API — they were in concurrency control, connection pooling, and misleading benchmarks.

Once fixed, it went from 5 days to 55 minutes.
All for less than $70.