RAG

Answer with receipts: retrieve relevant text, then respond.

What it is

RAG (Retrieval-Augmented Generation) is a pattern where you retrieve relevant text snippets (from documents or memory) and feed them into the LLM as context before answering.

In Phero, the rag package ties together an embedder and a vector store. The two most common integration points are exposing retrieval as an agent tool and using retrieval as semantic memory; both are shown in the examples below.

The RAG pipeline (practical)

  1. Load a document (or messages)
  2. Split it into chunks (textsplitter)
  3. Embed chunks (embedding)
  4. Store vectors (vectorstore)
  5. Retrieve top-k chunks per query

The end-to-end examples below show the full wiring.
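Step 2 above (chunking) is worth a closer look, since chunk size and overlap directly affect retrieval quality. The sketch below is a toy fixed-size splitter with overlap, not Phero's textsplitter: the library's RecursiveCharacterTextSplitter additionally prefers natural boundaries such as paragraphs and sentences, which this sketch ignores.

```go
package main

import "fmt"

// splitFixed splits text into chunks of at most chunkSize runes,
// with consecutive chunks sharing the last `overlap` runes. Overlap
// keeps context that straddles a chunk boundary retrievable.
func splitFixed(text string, chunkSize, overlap int) []string {
	runes := []rune(text)
	if chunkSize <= 0 || overlap >= chunkSize {
		return nil
	}
	step := chunkSize - overlap
	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + chunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break
		}
	}
	return chunks
}

func main() {
	for i, c := range splitFixed("The quick brown fox jumps over the lazy dog.", 16, 4) {
		fmt.Printf("chunk %d: %q\n", i, c)
	}
}
```

Each chunk here repeats the last 4 runes of its predecessor; in practice, overlap is usually a small fraction (10–20%) of the chunk size.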

Example: document Q&A with a RAG tool

The examples/rag-chatbot program loads a local text file, splits it into chunks, indexes it in Qdrant, and exposes a retrieval tool (search_document) to the agent.

// From examples/rag-chatbot (edited for brevity)

// 1) Load a document and split into chunks
b, _ := os.ReadFile(filePath)

splitter := textsplitter.NewRecursiveCharacterTextSplitter(chunkSize, chunkOverlap)
chunks := compactStrings(splitter.SplitText(string(b)))

// 2) Create an embedder and a vector store (provider/setup omitted here)
llmClient := /* any llm.LLM */
embedder := /* any embedding.Embedder */
store := /* a vectorstore.Store (e.g. Qdrant) */

// 3) Create RAG engine and ingest chunks
ragEngine, _ := rag.New(store, embedder, rag.WithTopK(topK))
_ = ragEngine.Ingest(ctx, chunks)

// 4) Expose retrieval as a tool
ragTool, _ := ragEngine.AsTool(
    "search_document",
    "Search the loaded document for relevant excerpts.",
)

// 5) Run an agent instructed to use the tool
sysPrompt := `You are a helpful chatbot that answers questions about a single document.

Rules:
- For any question that depends on the document, call the tool "search_document" first.
- Use retrieved excerpts as your source of truth.
- If you cannot find supporting excerpts, say you don't know based on the document.`

a, _ := agent.New(llmClient, "RAG Chatbot", sysPrompt)
_ = a.AddTool(ragTool)

out, _ := a.Run(ctx, "Question: What does this document say about X?")
fmt.Println(out)

The important bit is the prompt contract: the agent is instructed to call the retrieval tool first whenever the answer depends on the document.
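Under the hood, a retrieval call like search_document boils down to ranking stored chunk embeddings by similarity to the query embedding and returning the top-k. The following is a self-contained sketch of that ranking step with cosine similarity over toy 3-d vectors; it is illustrative only and is not Phero's vectorstore implementation, which delegates this to Qdrant.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// topK returns the indices of the k stored vectors most similar to
// the query, best match first.
func topK(query []float64, stored [][]float64, k int) []int {
	idx := make([]int, len(stored))
	for i := range idx {
		idx[i] = i
	}
	sort.Slice(idx, func(i, j int) bool {
		return cosine(query, stored[idx[i]]) > cosine(query, stored[idx[j]])
	})
	if k > len(idx) {
		k = len(idx)
	}
	return idx[:k]
}

func main() {
	// Toy 3-d "embeddings"; a real embedder produces hundreds of dimensions.
	chunks := []string{"about cats", "about dogs", "about stocks"}
	vecs := [][]float64{{1, 0, 0}, {0.9, 0.1, 0}, {0, 0, 1}}
	query := []float64{1, 0.05, 0} // "cat-like" query vector
	for _, i := range topK(query, vecs, 2) {
		fmt.Println(chunks[i])
	}
}
```

A vector database does the same ranking with approximate nearest-neighbor indexes instead of a full sort, which is what makes it scale past a few thousand chunks.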

Example: semantic memory (RAG-backed)

The examples/long-term-memory program uses RAG as memory: instead of keeping a chronological transcript, it retrieves the most relevant past snippets for each turn.

// From examples/long-term-memory (edited for brevity)

ragEngine, _ := rag.New(store, embedder, rag.WithTopK(topK))
conversationMemory := ragmemory.New(ragEngine)

a, _ := agent.New(llmClient, "Long-Term Memory Assistant", sysPrompt)

a.SetMemory(conversationMemory)

out, _ := a.Run(ctx, "Remember that my favorite color is blue.")
fmt.Println(out)
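The idea of RAG-backed memory can be reduced to: store past snippets, and on each turn recall only the most relevant ones instead of replaying the whole transcript. The sketch below illustrates that recall step with a crude word-overlap score in place of embedding similarity; it is not ragmemory's implementation, just the shape of what it does.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// overlapScore counts words shared between query and snippet --
// a crude stand-in for embedding similarity.
func overlapScore(query, snippet string) int {
	qs := map[string]bool{}
	for _, w := range strings.Fields(strings.ToLower(query)) {
		qs[w] = true
	}
	score := 0
	for _, w := range strings.Fields(strings.ToLower(snippet)) {
		if qs[w] {
			score++
		}
	}
	return score
}

// recall returns the k stored snippets most relevant to the query;
// a RAG-backed memory feeds these into the prompt each turn.
func recall(query string, memory []string, k int) []string {
	idx := make([]int, len(memory))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		return overlapScore(query, memory[idx[a]]) > overlapScore(query, memory[idx[b]])
	})
	if k > len(idx) {
		k = len(idx)
	}
	out := make([]string, k)
	for i := 0; i < k; i++ {
		out[i] = memory[idx[i]]
	}
	return out
}

func main() {
	memory := []string{
		"User's favorite color is blue.",
		"User lives in Berlin.",
		"User is allergic to peanuts.",
	}
	fmt.Println(recall("What is my favorite color?", memory, 1))
}
```

The payoff is bounded context: the prompt carries k relevant snippets regardless of how long the conversation history grows.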

Run the examples

Both examples use Qdrant (a vector database). Start it, then run the example. Provider setup depends on your chosen LLM and embedder; follow each example’s README.

# Start Qdrant (one quick way)
docker run --rm -p 6333:6333 -p 6334:6334 qdrant/qdrant

# Document Q&A (requires -file)
go run ./examples/rag-chatbot -file /path/to/your/file.txt

# Semantic long-term memory
# (see the README for flags like -topk and Qdrant settings)
go run ./examples/long-term-memory

Related packages