Phero • LLM

What it is

The llm package is the thin waist of Phero: a minimal interface for chat models plus a small tool system. Higher-level packages (like agent) build on this to orchestrate multi-turn loops and tool execution.

All message and content types are defined in this package, so using them requires no dependency on any provider SDK.

Using an LLM backend

Any backend that implements llm.LLM can power agents and tools. Phero includes an OpenAI-compatible client at llm/openai and an Anthropic Messages API client at llm/anthropic.

Choose a backend (e.g. llm/openai, llm/anthropic, or your own implementation)
Pass that client into agent.New
Optionally attach tools created via llm.NewTool

Anthropic backend

llm/anthropic implements llm.LLM using Anthropic's Messages API. The same Phero message/tool types are used at the boundary, so the rest of the framework (agents, tools, memory) doesn't need to change.

import (
    "os"

    "github.com/henomis/phero/llm/anthropic"
)

llmClient := anthropic.New(
    os.Getenv("ANTHROPIC_API_KEY"),
    anthropic.WithModel("claude-sonnet-4-6"),
    anthropic.WithMaxTokens(2048),
)

If you pass an empty API key, the underlying Anthropic SDK will fall back to its environment variable configuration.

Available options:

WithModel(m): Anthropic model name (e.g. "claude-sonnet-4-6")
WithMaxTokens(n): maximum completion tokens
WithTemperature(t): sampling temperature (0–1)
WithBaseURL(url): override the API endpoint (useful for proxies or tests)
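
The option helpers above look like Go's functional options pattern. Assuming that is how they work, the sketch below shows the mechanics with local stand-ins: config, Option, newConfig and the default values are all hypothetical and are not taken from the llm/anthropic source.

```go
package main

import "fmt"

// config holds hypothetical client settings; the real client's fields may differ.
type config struct {
	model       string
	maxTokens   int
	temperature float64
	baseURL     string
}

// Option mutates a config; each With* helper returns one.
type Option func(*config)

func WithModel(m string) Option        { return func(c *config) { c.model = m } }
func WithMaxTokens(n int) Option       { return func(c *config) { c.maxTokens = n } }
func WithTemperature(t float64) Option { return func(c *config) { c.temperature = t } }
func WithBaseURL(u string) Option      { return func(c *config) { c.baseURL = u } }

// newConfig starts from made-up defaults and applies the options in order,
// so a later option overrides an earlier one that sets the same field.
func newConfig(opts ...Option) config {
	c := config{
		model:       "default-model",
		maxTokens:   1024,
		temperature: 1.0,
		baseURL:     "https://api.example.com",
	}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newConfig(WithModel("claude-sonnet-4-6"), WithMaxTokens(2048))
	fmt.Printf("%+v\n", c)
}
```

The appeal of this pattern is that callers only name the settings they care about, and new options can be added later without changing the constructor's signature.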

OpenAI backend

llm/openai implements llm.LLM using the OpenAI Chat Completions API. It also implements llm.Transcriber and llm.SpeechSynthesizer for audio (see below).

import (
    "os"

    "github.com/henomis/phero/llm/openai"
)

llmClient := openai.New(
    os.Getenv("OPENAI_API_KEY"),
    openai.WithModel("gpt-4o"),
    openai.WithTemperature(0.7),
)

Available options:

WithModel(m): model name (e.g. "gpt-4o", "gpt-4o-mini")
WithTemperature(t): sampling temperature
WithBaseURL(url): override the API base URL (e.g. point at a local proxy)
WithOllamaBaseURL(): shortcut to point at a local Ollama server

Messages and content parts

Every message is an llm.Message containing a Role and a slice of llm.ContentPart values. A content part is either text or an image (URL or base64-encoded bytes).

// Plain text
llm.Text("Hello, world!")

// Image by URL
llm.ImageURL("https://example.com/photo.png")

// Image from a local file (MIME type is detected automatically)
part, err := llm.ImageFile("/path/to/photo.jpg")

// Image from base64-encoded data with an explicit MIME type
llm.ImageBase64("image/png", base64EncodedData)

Role constants and message constructors:

// Role constants
llm.RoleSystem    // "system"
llm.RoleUser      // "user"
llm.RoleAssistant // "assistant"
llm.RoleTool      // "tool"

// Constructors
llm.SystemMessage("You are a helpful assistant.")
llm.UserMessage(llm.Text("Hello!"))
llm.UserMessage(llm.Text("Describe this image:"), llm.ImageURL("https://..."))
llm.AssistantMessage([]llm.ContentPart{llm.Text("Hi!")})
llm.ToolResultMessage(toolCallID, llm.Text("42"))

To extract the plain text from a message or from loose parts, use TextContent:

// From a message
text := msg.TextContent()

// From loose content parts (text is already declared above, so assign rather than redeclare)
text = llm.TextContent(parts...)
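
As a rough mental model of text extraction, the sketch below walks a slice of mixed parts and keeps only the text ones. ContentPart, Text, ImageURL and textContent here are local stand-ins, and joining the text parts with no separator is an assumption; Phero's own TextContent may join them differently.

```go
package main

import (
	"fmt"
	"strings"
)

// PartKind distinguishes text parts from image parts in this sketch.
type PartKind int

const (
	KindText PartKind = iota
	KindImageURL
)

// ContentPart is a simplified stand-in for llm.ContentPart.
type ContentPart struct {
	Kind PartKind
	Text string
	URL  string
}

func Text(s string) ContentPart     { return ContentPart{Kind: KindText, Text: s} }
func ImageURL(u string) ContentPart { return ContentPart{Kind: KindImageURL, URL: u} }

// textContent concatenates the text parts in order and skips everything else.
func textContent(parts ...ContentPart) string {
	var b strings.Builder
	for _, p := range parts {
		if p.Kind == KindText {
			b.WriteString(p.Text)
		}
	}
	return b.String()
}

func main() {
	parts := []ContentPart{
		Text("Describe this image: "),
		ImageURL("https://example.com/photo.png"),
		Text("be brief."),
	}
	fmt.Println(textContent(parts...)) // Describe this image: be brief.
}
```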

Multimodal input

Pass image parts alongside text to send multimodal messages to vision-capable models. The agent's Run method accepts variadic ContentPart values:

imagePart, err := llm.ImageFile("screenshot.png")
if err != nil {
    panic(err)
}

result, err := a.Run(ctx,
    llm.Text("What does this image show?"),
    imagePart,
)
if err != nil {
    panic(err)
}
fmt.Println(result.TextContent())

See examples/multimodal for a complete working example.

LLM middleware

Just as tools support middleware, the llm package provides an LLMMiddleware type and a llm.Use function to compose decorators around any llm.LLM. This is the right place to add caching, rate limiting, logging, or automatic retries without modifying individual backends.

// A simple logging middleware
func loggingMiddleware(next llm.LLM) llm.LLM {
    return llm.LLMFunc(func(ctx context.Context, msgs []llm.Message, tools []*llm.Tool) (*llm.Result, error) {
        fmt.Printf("calling LLM with %d messages\n", len(msgs))
        result, err := next.Execute(ctx, msgs, tools)
        fmt.Printf("LLM responded; err=%v\n", err)
        return result, err
    })
}

// Wrap a base client with one or more middlewares
base := openai.New(os.Getenv("OPENAI_API_KEY"))
wrapped := llm.Use(base, loggingMiddleware)

Middlewares are applied in declaration order: llm.Use(base, m1, m2) means m1 is outermost and runs first. See examples/llm-middleware for a full example.

Audio: transcription and speech

The llm/openai client implements llm.Transcriber and llm.SpeechSynthesizer, so the client you use for chat can also transcribe audio and synthesize speech.

// Speech-to-text (Transcriber)
result, err := llmClient.Transcribe(ctx, llm.TranscriptionRequest{
    Input: llm.AudioFile("recording.mp3"),
})
if err != nil {
    panic(err)
}
fmt.Println(result.Text)

// Text-to-speech (SpeechSynthesizer)
speech, err := llmClient.SynthesizeSpeech(ctx, llm.SpeechRequest{
    Input:  "Hello from Phero!",
    Format: llm.SpeechResponseFormatMP3,
})
if err != nil {
    panic(err)
}
// speech.Data holds the raw MP3 bytes; speech.MIMEType is "audio/mpeg"

See examples/audio for a runnable example.

Function tools

The main way you integrate capabilities is via function tools. In examples/conversational-agent, a get_current_time tool is exposed to the agent.

type TimeInput struct{}

type TimeOutput struct {
    CurrentTime string `json:"current_time" jsonschema:"description=The current local time in RFC3339 format"`
}

func getCurrentTime(_ context.Context, _ *TimeInput) (*TimeOutput, error) {
    return &TimeOutput{CurrentTime: time.Now().Format(time.RFC3339)}, nil
}

tool, err := llm.NewTool(
    "get_current_time",
    "Get the current local time",
    getCurrentTime,
)
if err != nil {
    panic(err)
}

Tools are added to an agent with AddTool, and the agent will run them when the model requests a tool call.

Tool middleware

Tools support middleware via the Use method on a tool returned by llm.NewTool (not to be confused with the separate tool package). This is the place to add validation, permission checks, logging, or other cross-cutting behavior without baking it into each tool handler.

timeTool, err := llm.NewTool(
    "get_current_time",
    "Get the current local time",
    getCurrentTime,
)
if err != nil {
    panic(err)
}

timeTool.Use(func(tool *llm.Tool, next llm.ToolHandler) llm.ToolHandler {
    return func(ctx context.Context, arguments string) (any, error) {
        fmt.Printf("running %s with args %s\n", tool.Name(), arguments)
        return next(ctx, arguments)
    }
})

Middleware order is preserved: after timeTool.Use(m1, m2), m1 runs before m2. This replaces the older per-tool validation helpers and keeps approval logic at the point where tools are wired up, not inside each handler.

Tracing raw LLM calls

The trace package can wrap any llm.LLM with trace.NewLLM. This is useful when you want observability around direct Execute calls without going through an agent.

import (
    "os"

    "github.com/henomis/phero/trace"
    "github.com/henomis/phero/trace/text"
)

traced := trace.NewLLM(llmClient, text.New(os.Stderr))

result, err := traced.Execute(ctx, messages, tools)

When called inside an agent, request and response events are automatically annotated with the agent name and iteration number.

Putting it together

The minimal loop is: create an LLM client, register one or more tools, then run an agent. This is the core pattern used throughout examples/.

# from repo root
go run ./examples/simple-agent

go run ./examples/conversational-agent

Related packages

agent: orchestration loop around llm.LLM + tools
trace: observability wrapper for agents and standalone LLM calls
tool: prebuilt tools you can attach to agents/models
embedding: vector embeddings for RAG and memory
a2a: expose agents as A2A servers or call remote agents as tools