Textsplitter

Chunk documents into size-bounded pieces for retrieval-augmented generation (RAG).

What it is

When building RAG systems, you usually don’t embed an entire document at once. Instead, you split it into smaller chunks, embed each chunk, and store them for retrieval.

The textsplitter package provides utilities to split text into size-bounded chunks with an optional overlap. Overlap repeats a little text from the end of one chunk at the start of the next, which helps preserve context across chunk boundaries.
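
To make the size and overlap idea concrete, here is a minimal sketch of fixed-window chunking with overlap. It is not the textsplitter implementation: chunkWithOverlap is a hypothetical helper written only for illustration, and it measures size in runes, which is an assumption rather than a statement about how the package counts.

```go
package main

import "fmt"

// chunkWithOverlap is a hypothetical helper, not part of textsplitter.
// It cuts text into windows of at most size runes. Each window starts
// size-overlap runes after the previous one, so neighbors share overlap runes.
func chunkWithOverlap(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil // invalid settings: the window would never advance
	}
	runes := []rune(text) // work in runes so multi-byte characters are not split
	step := size - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + size
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break // the final window reached the end of the text
		}
	}
	return chunks
}

func main() {
	// With size 4 and overlap 2, each chunk repeats the last 2 runes of the previous one.
	for i, c := range chunkWithOverlap("abcdefghij", 4, 2) {
		fmt.Printf("%d: %q\n", i, c)
	}
}
```

A recursive character splitter would typically also try to break on natural separators such as paragraphs or sentences; this sketch skips that and only shows how the overlap arithmetic works.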

Key ideas

Example: splitting a file into chunks

The examples/rag-chatbot program loads a local text file and splits it before ingestion.

// From examples/rag-chatbot (edited for brevity)

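// Read the whole file into memory.
b, err := os.ReadFile(filePath)
if err != nil {
    // handle error (for example, the file is missing or unreadable)
}

// Split the text into size-bounded chunks that overlap by chunkOverlap.
splitter := textsplitter.NewRecursiveCharacterTextSplitter(chunkSize, chunkOverlap)
// compactStrings is a helper defined in the example program.
chunks := compactStrings(splitter.SplitText(string(b)))
if len(chunks) == 0 {
    // handle error: there is nothing to ingest
}

// chunks are then embedded and ingested into a vector store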

In that example, chunkSize and chunkOverlap are CLI flags so you can tune them per dataset.
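
As a rough sketch of how such flags could be wired up with the standard flag package: the flag names below follow the example's CLI, but the default values, the help strings, and the validateChunking helper are assumptions made for illustration, not code taken from the example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// validateChunking is a hypothetical sanity check: it rejects settings
// where the chunk size is not positive or the overlap is not smaller than it.
func validateChunking(size, overlap int) error {
	if size <= 0 {
		return fmt.Errorf("chunk size must be positive, got %d", size)
	}
	if overlap < 0 || overlap >= size {
		return fmt.Errorf("chunk overlap must be in [0, %d), got %d", size, overlap)
	}
	return nil
}

func main() {
	// Flag names match the example's CLI; the defaults here are assumptions.
	filePath := flag.String("file", "", "path to the text file to ingest")
	chunkSize := flag.Int("chunk-size", 1000, "maximum chunk size")
	chunkOverlap := flag.Int("chunk-overlap", 200, "overlap between neighboring chunks")
	flag.Parse()

	if err := validateChunking(*chunkSize, *chunkOverlap); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("file=%q chunk-size=%d chunk-overlap=%d\n", *filePath, *chunkSize, *chunkOverlap)
}
```

Validating the pair before splitting surfaces a bad combination as a clear CLI error instead of an empty or surprising chunk list later in the pipeline.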

Run the example

The RAG Chatbot example exercises the splitter as part of a full ingestion + retrieval flow. Provider setup depends on your chosen LLM/embedder; follow the example README.

# from repo root
go run ./examples/rag-chatbot -file /path/to/your/file.txt

# tuning knobs: -chunk-size and -chunk-overlap

Practical tuning tips

Related packages