The situation
A global research-and-advisory firm runs a deep content library at the center of its product. Subscribers and customers search it to find the right answer; internal teams across product, editorial, data science, and engineering search it too. When the right result surfaces quickly, the content does its job. When it doesn’t, the value of the whole library erodes.
That library had been served for years by an Apache Solr platform, and years of additions had left it carrying heavy technical debt. The schema had grown into hundreds of overlapping fields layered on top of one another. A custom text analyzer sat in the indexing path that no one on staff had the source code for. Inefficient queries fired on every single search to handle edge cases that affected only a small fraction of requests. Disjoint content types had been forced through a single shared collection and a single schema, undermining how the engine scored and scaled.
The result was a platform that was slow to index, slow to query, and frightening to change. Just as damaging, no single team fully owned how search actually worked. Relevance weights had been set by people who had since left. Core logic lived in code nobody wanted to touch. When the team that owns search and the team that owns content and the team that owns the data don’t share one clear picture of the system, the system ends up mirroring those divisions — fragmented, hard to reason about, and impossible to improve quickly. Every change felt risky, so instead of fixing root causes the team kept adding workarounds. Experimentation had effectively frozen, and the inefficiency was being paid for in over-sized hardware.
The question on the table was the one every mature search team eventually faces: keep optimizing what we have, or re-platform?
Why they called us
A referral brought them to Develomentor. They were also weighing another boutique search firm, so the decision came down to depth.
What tipped it was vendor-neutral, open-source search expertise that goes all the way down to the internals — Solr, Lucene, and OpenSearch at the level of schemas, analyzer chains, and scoring — paired with the judgment to advise at the architecture and organizational level. The work needed someone who could read the actual schema and benchmark the actual bottlenecks, and also weigh Solr honestly against the alternatives without a horse in the race.
The engagement was led by Grant Ingersoll — a core Apache Lucene and Solr committer, an OpenSearch contributor, co-founder of Apache Mahout, and author of Taming Text — delivering alongside a team of senior Develomentor search practitioners. That combination is exactly what a debt-laden open-source search platform demands: hands-on engine depth to diagnose, and the seniority to make the keep-or-replace call with evidence behind it.
What we did
The health check
We started with a Search Health Check: a fast, evidence-based diagnosis of the platform, the corpus, and the teams around it.
On the technical side, that meant auditing the schema and content model. The audit turned up a bloated schema with hundreds of overlapping fields, a custom analyzer no one had the source for, sprawling synonym dictionaries, and disjoint content types commingled in one collection. We benchmarked indexing and query performance to isolate where time was actually going, rather than guessing. The diagnosis was unambiguous: the schema and the indexing path were the bottleneck, and a large share of indexing time was being spent in custom analysis components that no longer earned their keep.
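To make the benchmarking idea concrete, here is a minimal sketch (not the harness used in the engagement) that times each stage of a toy indexing path and reports each stage's share of total time. The stage names and functions (`parse`, `analyze`, `write`) and the sample document are hypothetical stand-ins.

```python
import time
from collections import defaultdict


def profile_stages(docs, stages):
    """Run every doc through the stages in order; return each stage's share of total time."""
    elapsed = defaultdict(float)
    for doc in docs:
        value = doc
        for name, fn in stages:
            start = time.perf_counter()
            value = fn(value)
            elapsed[name] += time.perf_counter() - start
    total = sum(elapsed.values()) or 1.0  # guard against a zero total
    return {name: spent / total for name, spent in elapsed.items()}


# Hypothetical stand-ins for real indexing stages.
def parse(raw):
    return raw.split()


def analyze(tokens):
    # Deliberately wasteful, to mimic an expensive custom analysis component.
    return [t.lower() for t in tokens for _ in range(50)][::50]


def write(tokens):
    return len(tokens)


docs = ["Quarterly Outlook for Cloud Infrastructure"] * 2000
shares = profile_stages(docs, [("parse", parse), ("analyze", analyze), ("write", write)])

# Print stages from most to least expensive.
for name, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {share:6.1%}")
```

Reporting shares rather than raw seconds keeps the focus on where the time goes, which is the question a keep-or-replace decision depends on.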
On the organizational side, we mapped who owned which search concepts and where that ownership had fragmented or been lost entirely — because a platform this hard to change is rarely just a technical problem.
The Health Check also carried a platform decision. We evaluated continued investment in Solr against the leading alternatives and landed on OpenSearch: open governance under the Linux Foundation, no licensing entanglements, cleaner APIs, built-in support for modern retrieval and RAG-style use cases, and a migration path that Solr developers could actually follow. The recommendation wasn’t “rebuild for its own sake” — it was a tested, prioritized path off the debt.
Re-platforming onto OpenSearch
From diagnosis we moved into delivery, building the production OpenSearch platform alongside the client’s engineers.
The centerpiece was a rebuilt indexing path. Freed from the bloated schema and the costly, undocumented analysis components, indexing got dramatically faster: a full re-index dropped from over two hours to under fifteen minutes. That turns a recovery or an experiment from an all-day event into something the team can do before lunch. The slow re-index had been one of the biggest forces freezing the old platform; removing it unfroze everything downstream.
Simpler query processing
The old query path was a tangle of workarounds stacked over years: a smoke-test relaxation query firing on every request despite only a small fraction of searches needing it, a separate disjoint retrieval pass, and additional calls for fields, facets, and re-ranking — all stitched together per request. Each layer had been added to solve a real problem, and together they had made every query slower and harder to reason about.
We collapsed that chain into streamlined query processing. Redundant, equally weighted fields were consolidated, spelling correction was folded into the main query so the per-request smoke-test call could be retired, and the disjoint multi-pass retrieval was simplified into a single coherent path. Fewer moving parts per request meant faster queries and a query flow the team could actually understand and extend.
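As an illustration of what a single-pass request can look like, the sketch below builds one OpenSearch search body that carries the main query, a term suggester for spelling, and a facet aggregation together. It is one possible reading of "folded into the main query", not the client's actual query. The field names (`title`, `body`, `content_type`), the boost, and the helper `build_search_body` are assumptions.

```python
import json


def build_search_body(text, size=10):
    """Build one request body: main query, spelling suggestions, and facets together."""
    return {
        "size": size,
        # Main retrieval over a small, consolidated set of fields (hypothetical names).
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^3", "body"],
                "type": "best_fields",
            }
        },
        # Term suggester runs in the same request, so no separate spelling call is needed.
        "suggest": {
            "spelling": {
                "text": text,
                "term": {"field": "title"},
            }
        },
        # Facet counts returned alongside the hits.
        "aggs": {
            "by_content_type": {"terms": {"field": "content_type"}}
        },
    }


body = build_search_body("clod infrastructure outlook")
print(json.dumps(body, indent=2))
```

With the `opensearch-py` client, a body like this could be sent in one call such as `client.search(index="research", body=body)`, where the index name is again hypothetical.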
Pipelines and relevance tooling
A faster platform that’s still impossible to change isn’t a win. So the last workstream was about handing control back to the team.
We implemented OpenSearch document and query pipelines, giving the client clear, configurable places to add, reorder, and extend processing stages without code surgery. The hard-coded relevance logic and the “we can’t touch that” components that had frozen the old system were replaced with stages the team can reconfigure deliberately.
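A rough sketch of what "configurable stages" means in practice: OpenSearch ingest pipelines (`PUT _ingest/pipeline/<name>`) and search pipelines (`PUT /_search/pipeline/<name>`) are defined as JSON lists of processors, so adding or reordering a stage is a configuration edit. The pipeline contents, field names, and the helper `with_stage` below are illustrative assumptions, not the client's configuration.

```python
import copy
import json

# Hypothetical ingest (document) pipeline; field names are illustrative.
ingest_pipeline = {
    "description": "Normalize documents before indexing",
    "processors": [
        {"trim": {"field": "title", "ignore_missing": True}},
        {"lowercase": {"field": "content_type", "ignore_missing": True}},
    ],
}

# Hypothetical search (query) pipeline with one request and one response stage.
search_pipeline = {
    "description": "Restrict to published content and tidy the response",
    "request_processors": [
        {"filter_query": {"tag": "published_only",
                          "query": {"term": {"status": "published"}}}}
    ],
    "response_processors": [
        {"rename_field": {"field": "body_text", "target_field": "body"}}
    ],
}


def with_stage(pipeline, key, processor, position=None):
    """Return a copy of the pipeline with a processor inserted (appended by default)."""
    updated = copy.deepcopy(pipeline)
    stages = updated.setdefault(key, [])
    stages.insert(len(stages) if position is None else position, processor)
    return updated


# Add a new first stage without touching the original definition.
extended = with_stage(
    ingest_pipeline,
    "processors",
    {"set": {"field": "schema_version", "value": 2}},
    position=0,
)
print(json.dumps(extended, indent=2))
print(json.dumps(search_pipeline, indent=2))
```

Because the definitions are plain data, they can be versioned, reviewed, and rolled back like any other configuration.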
Then we stood up OpenSearch relevance tooling so the team can measure and test relevance themselves — repeatable, evidence-based experimentation in place of the un-owned, undocumented config and inherited tuning factors they’d been afraid to disturb. The data science and engineering teams came out of the engagement able to run experiments and ship relevance improvements on their own.
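To show the kind of measurement this enables, here is a small, self-contained sketch of nDCG@k computed from graded relevance judgments. It is independent of any particular OpenSearch tool and is not the metric setup used in the engagement. The document IDs, judgment grades, and the two rankings are made up for illustration.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k gains (linear gain, log2 discount)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, judgments, k=10):
    """nDCG@k for one query; documents without a judgment count as gain 0."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(judgments.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical judgments for one query: higher grade means more relevant.
judgments = {"a": 3, "b": 2, "c": 0, "d": 1}

baseline = ["c", "a", "d", "b"]   # ranking from the current configuration
candidate = ["a", "b", "d", "c"]  # ranking from a proposed change

baseline_score = ndcg_at_k(baseline, judgments, k=4)
candidate_score = ndcg_at_k(candidate, judgments, k=4)
print(f"baseline  nDCG@4 = {baseline_score:.3f}")
print(f"candidate nDCG@4 = {candidate_score:.3f}")
```

Averaged over a fixed set of judged queries, a score like this gives a repeatable before-and-after comparison for any relevance change.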
The result
A full re-index dropped from over two hours to under fifteen minutes, a speedup of more than eightfold and the headline outcome. The single biggest source of friction on the old platform, the thing that had been blocking recovery and experimentation, was gone.
From there:
- Faster queries. Collapsing the multi-pass workaround chain into streamlined query processing cut latency and made the query flow comprehensible again.
- Configurable pipelines. Document and query pipelines let the team adapt search to change through configuration instead of risky edits to buried code.
- Team-run relevance measurement. OpenSearch relevance tooling put repeatable relevance testing in the team’s hands, replacing reliance on config nobody owned.
- Off legacy Solr, debt retired. The platform now runs on a modern, well-supported OpenSearch foundation, with the schema bloat, the undocumented analysis components, and the synonym sprawl retired in the process.
The engagement ran roughly four months — from Health Check diagnosis through delivered migration — and the teams that own search came out of it able to operate, change, and measure the platform independently.
Most importantly, relevance held at the same level as before the migration, with significantly more room for improvement on the roadmap.
What this means for you
If you run a mature search platform that works but has gotten slow and brittle — where years of accumulated debt have buried the logic, key decisions were made by people who’ve since left, and every change feels risky enough that you keep adding workarounds instead of fixing the root cause — this pattern is yours. The specific engine changes; the shape of the problem doesn’t.
The fix is rarely another workaround. It’s a clear-eyed diagnosis that names the real bottlenecks with evidence, a modern platform you can actually support, and pipelines plus measurement that hand control back to your team — so changing and improving search stops being something you’re afraid to do.
That diagnosis is exactly what a Search Health Check delivers — the fastest way to get an expert read on where your search is breaking down and whether to keep optimizing or re-platform. And our Search & AI Builders practice is where the rebuild lives.
Tell us where your search stack stands today. Book a Discovery Call.
