DocSeek – Manual

User Manual

Overview

DocSeek answers a very specific question: “Is what I'm looking for in this document, and where?” It reads your file, breaks it into short passages, and turns each passage into a numerical fingerprint (an embedding) using a compact AI model that runs entirely in your browser. When you type a query, it is fingerprinted the same way, and DocSeek ranks the passages by how close their meaning is to your query. Because it compares meaning rather than letters, it can surface a relevant paragraph even when that paragraph uses completely different words from yours. Everything (the file, the index, and your searches) stays on your device. No server, no cloud, no upload.
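To make the ranking step concrete, here is a minimal sketch. It assumes closeness of meaning is measured with cosine similarity, which is a common choice for this kind of model but is not confirmed by this manual. The function names and the toy 3-number vectors are illustrative only; real embeddings are much longer and come from the AI model.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs, best match first."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(passage_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy vectors standing in for real embeddings (assumption for illustration).
passages = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [1.0, 0.0, 0.0]
ranking = rank_passages(query, passages)
```

Here the first passage points in almost the same direction as the query, so it ranks first; the second shares nothing with the query and ranks last.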

Reliable by design. DocSeek always shows you the actual passages from your document and highlights the most relevant sentence. It never rewrites or invents text, so you can trust what you read.

Getting Started

1. Load a document

Open file — click the button and pick a file, or
Drag & drop — drop a file onto the dashed area.

DocSeek extracts the text and reports how many sections and passages it found.

2. Build the search index

Click ⚙️ Build search index. On the very first run, a ~25 MB AI model (all-MiniLM-L6-v2) downloads from a CDN and is cached by your browser; after that it works offline. Indexing then fingerprints every passage, and a progress bar shows how far along it is. The index is cached locally, so re-opening the same file is instant.

3. Search

Type what you're looking for and press Search. Results appear as passages, each tagged with its location (page number for PDFs, paragraph number otherwise), a match score, and the most relevant sentence highlighted.

Search Modes

Mode	What it does	When to use it
Meaning search	Ranks passages by conceptual similarity using the AI index.	When you don't know the exact wording, or want everything about a topic.
Exact / keyword	Finds literal text, with an optional `regex` toggle. No AI index required.	When you know the precise term, or want a quick literal cross-check.

Use the two modes together: if meaning search surfaces a concept, an exact search confirms whether a specific word truly appears. This is the best way to be sure something is — or isn't — in the document.
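The exact / keyword mode can be pictured with the short sketch below. It is not DocSeek's own code: the function name is hypothetical, and ignoring upper and lower case is an assumption. With the regex toggle off, the query is escaped so that characters such as `.` are matched literally.

```python
import re

def keyword_search(passages, query, use_regex=False):
    """Return (passage_index, [matched strings]) for every passage with a match."""
    pattern = query if use_regex else re.escape(query)
    compiled = re.compile(pattern, re.IGNORECASE)  # case-insensitive: an assumption
    hits = []
    for i, text in enumerate(passages):
        found = [m.group(0) for m in compiled.finditer(text)]
        if found:
            hits.append((i, found))
    return hits

passages = [
    "The warranty covers water damage.",
    "Returns are accepted within 30 days.",
    "Damage caused by misuse is excluded (see 4.2).",
]
literal = keyword_search(passages, "damage")
numbers = keyword_search(passages, r"\d+(\.\d+)?", use_regex=True)
```

The literal search finds the word in the first and third passages; the regex search finds the numbers in the second and third.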

Reading the Results

Element	Meaning
Location chip (`p. 4`, `¶ 12`)	Where the passage sits in the document.
Match score (`72% match`)	How close the passage's meaning is to your query. Higher is stronger. Scores of 50% or more are shown in green; scores from 35% up to 50% are shown in amber.
Highlight	In meaning mode, the single most relevant sentence; in keyword mode, every literal match.

A modest top score (say 30–45%) doesn't mean failure — it often means the topic is only touched on lightly. If even the best matches look unrelated, that's good evidence the document doesn't cover what you asked.
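The colour thresholds from the table above can be written out as a small sketch. The function name is hypothetical, and the "neutral" label for scores under 35% is an assumption, since this manual does not name a colour for them.

```python
def score_colour(score_percent):
    """Map a match score (0-100) to the colour band described in the table."""
    if score_percent >= 50:
        return "green"
    if score_percent >= 35:
        return "amber"
    return "neutral"  # assumed label; the manual names no colour below 35%

bands = [score_colour(s) for s in (72, 50, 42, 35, 30)]
```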

Document Overview

Open 📋 Document overview and click Generate overview to get an extractive summary: DocSeek picks the most central sentences from the document, word for word, and lists them in reading order. Nothing is paraphrased or generated, so the overview is always faithful to the source. Choose how many sentences you'd like (3–20).
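As a rough illustration of an extractive overview, the sketch below picks sentences and returns them in reading order. This manual does not say how DocSeek decides which sentences are "central"; the sketch assumes a sentence is central when its vector is close to the average of all sentence vectors. The function names and toy 2-number vectors are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def pick_overview(sentence_vecs, count):
    """Indices of the `count` most central sentences, sorted into reading order."""
    if not sentence_vecs:
        return []
    dims = len(sentence_vecs[0])
    total = len(sentence_vecs)
    # Assumed centrality measure: similarity to the average sentence vector.
    centroid = [sum(v[d] for v in sentence_vecs) / total for d in range(dims)]
    by_centrality = sorted(
        range(total),
        key=lambda i: cosine(sentence_vecs[i], centroid),
        reverse=True,
    )
    return sorted(by_centrality[:count])  # back into document order

# Toy vectors, one per sentence, in document order.
vecs = [[1.0, 0.0], [0.5, 0.9], [0.0, 1.0], [0.6, 0.8]]
overview = pick_overview(vecs, 2)
```

In this toy data the last sentence is the most central and the second comes next, yet the overview lists them as they appear in the document: second sentence first, last sentence after it.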

Settings

Setting	Effect
Passage size	How many characters each passage holds. Smaller = more precise locations; larger = more context per result.
Overlap	How much text is shared between neighbouring passages, so ideas spanning a boundary aren't missed.
Results shown	How many passages to display per search.

Changing passage size or overlap requires rebuilding the index (DocSeek will prompt you).
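To see how passage size and overlap interact, here is a minimal sketch of a sliding character window. It is a simplification under stated assumptions: the function name is hypothetical, and the real tool may also respect sentence or paragraph boundaries, which this sketch does not.

```python
def split_passages(text, size, overlap):
    """Cut text into (start_offset, passage) pairs of up to `size` characters,
    where each passage repeats the last `overlap` characters of the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    passages = []
    for start in range(0, len(text), step):
        passages.append((start, text[start:start + size]))
        if start + size >= len(text):
            break  # this passage already reaches the end of the text
    return passages

text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_passages(text, size=10, overlap=4)
```

With a size of 10 and an overlap of 4, each new passage starts 6 characters after the previous one, so the last 4 characters of one passage are the first 4 of the next. Because the passage boundaries move whenever either value changes, the old fingerprints no longer line up, which is why the index has to be rebuilt.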

Privacy

DocSeek processes everything locally in your browser. Neither your documents nor your searches are ever uploaded. The AI model and the supporting libraries are downloaded once from public CDNs and then cached; from then on the tool runs offline. Your settings live in localStorage and the search index is cached in IndexedDB, both on your own device.

* About “offline”: the very first run needs internet to download the AI model and libraries. After your browser has cached them, DocSeek works with no connection.

Tips & Limitations

Scanned PDFs with no text layer can't be read — DocSeek needs selectable text, not images.
Very large documents take longer to index; the progress bar keeps you informed.
Best results come from full-sentence queries (“how the warranty handles water damage”) rather than single keywords.
English-optimised: this version uses an English model; other languages work but less reliably.

Supported File Types

.pdf · .docx · .html · .md · .txt
