About the Book

Taming Text: How to Find, Organize, and Manipulate It was published by Manning in 2013, with a foreword by Dr. Elizabeth Liddy. Grant Ingersoll is the lead author, alongside Thomas S. Morton and Andrew L. Farris. The book was written for engineers shipping real text-intelligent software — not researchers chasing papers.

Across its chapters, the book walks through full-text search with Apache Solr and Lucene, named entity recognition, clustering, classification, deduplication, question answering, and text summarization. Each topic is paired with working code and grounded in the realities of production systems. For a long stretch of the 2010s, it was one of the few hands-on references that connected academic NLP to the messy work of actually searching and organizing text at scale.

Grant went on to co-found Lucidworks, where he served as CTO, and later became CTO of the Wikimedia Foundation. The problems the book addresses — helping people find, organize, and make sense of text — are still the problems he and the Develomentor team work on today.

What Has Changed Since 2013

We will be direct: most of the specific techniques and code in Taming Text are no longer how you would build these systems today. The book predates the transformer architecture, the rise of large language models, and the vector search stack that now underpins modern retrieval. If you are picking it up for its Solr configurations, OpenNLP pipelines, or clustering recipes, treat them as historical context rather than current practice.

The work has shifted. Dense vector embeddings and hybrid retrieval have largely replaced bag-of-words ranking as the default starting point. Named entity recognition, classification, and summarization that once required carefully trained models are now often handled by general-purpose LLMs or fine-tuned smaller models. Retrieval-augmented generation, reranking, evaluation frameworks, and vector databases did not exist in anything like their current form when the book was written.
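As a rough illustration of the hybrid retrieval idea described above (not an example from the book), the sketch below blends a lexical score with a dense-vector similarity. Production systems use BM25 and learned embedding models; the term-overlap scorer, the toy vectors, and the alpha blending weight here are all illustrative assumptions.

```python
import math

def lexical_score(query: str, doc: str) -> float:
    """Toy stand-in for BM25: fraction of query terms present in the document."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms)

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def hybrid_score(query: str, doc: str,
                 q_vec: list[float], d_vec: list[float],
                 alpha: float = 0.5) -> float:
    """Weighted blend: alpha * dense similarity + (1 - alpha) * lexical overlap."""
    return alpha * cosine(q_vec, d_vec) + (1 - alpha) * lexical_score(query, doc)
```

The point of the blend is that lexical matching still catches exact terms (names, part numbers) that embeddings can miss, while the dense score handles paraphrase; tuning alpha against an evaluation set is part of the discipline the book's engineering sensibility anticipates.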

What has not changed is the underlying discipline — understanding your content, defining relevance, evaluating results honestly, and designing systems that hold up under real user behavior. The book's engineering sensibility still holds. The tooling around it does not, and the gap between demo and production is now wider, not narrower.

Building Modern Text Intelligence

If you read the book and are now trying to build search, NLP, or retrieval features into a real product, that is the work Develomentor does. Semantic and hybrid search, RAG pipelines over proprietary content, entity extraction and classification, evaluation harnesses, and the architectural decisions that separate a demo from a production system — these are the engagements we take on.

Grant's depth in this space predates Taming Text and has continued through Lucidworks, Wikimedia, and the current generation of LLM-driven systems. The Develomentor team is built from senior practitioners with comparable depth — engineers and architects who have shipped search and text systems inside large enterprises, led work at open source foundations, and stood up retrieval and NLP features for early-stage products under real constraints. We help teams choose the right retrieval stack, avoid the common RAG failure modes, and build something they can actually maintain.

Whether you are modernizing an older search system, adding AI features to an existing product, or starting from a blank page, Grant and the Develomentor team can help you move from idea to working system without the detours.

Talk to Us About Your Text Problem

Tell us what you are trying to build or fix. We will give you a straight read on the approach, the tradeoffs, and whether we are the right partner.

Book a Technical Consultation