In ragnar_find_links(), the default children_only = FALSE now
returns all links on a page. If you relied on the previous default,
set children_only = TRUE (#115).
ragnar_register_tool_retrieve() now uses search_{store@name} as
the default tool name prefix (instead of
rag_retrieve_from_{store@name}), so you may need to update any
code that refers to the tool name explicitly (#123, #127).
New embed_azure_openai() supports embeddings from Azure AI Foundry
(#144).
New embed_snowflake() supports embeddings via the Snowflake Cortex
Embedding API (#148).
New mcp_serve_store() lets local MCP clients (e.g. Codex CLI or
Claude Code) search a RagnarStore (#123).
ragnar_retrieve() (and the corresponding ellmer retrieval tool)
now accepts a vector of queries (#150).
New ragnar_store_atlas() visualizes store embeddings (#124).
New ragnar_store_ingest() prepares documents in parallel with
mirai and inserts them into a store (#133).
embed_ollama() now defaults to the embeddinggemma model (#121).
embed_openai() error messages are now surfaced to the user (#112).
Embedding helpers now share a generalized request retry policy,
configurable via options(ragnar.embed.req_retry = ...) (#138).
ragnar now requires mirai >= 2.5.1 (#139).
print() on a RagnarStore now shows the store location (#116).
ragnar_retrieve_bm25() now orders results by descending score
(#122).
ragnar_retrieve() no longer returns duplicate rows when called
with multiple queries (#153).
The ellmer retrieval tool now omits score columns from its output (#130).
ragnar_store_inspect() now includes keyboard shortcuts, a
draggable divider, improved preview linkification, better metadata
display, and other UI tweaks (#120, #117, #118).
ragnar_find_links() works better with local HTML files (#115).
ragnar_store_insert() and ragnar_store_update() (v2 stores) now
handle stores that are missing store@schema metadata (#146).
read_as_markdown() once again fetches YouTube transcripts and now
supports youtube_transcript_formatter, so you can add timestamps
or links to the transcript output (#149).
read_as_markdown() gains an origin argument to customize the
@origin recorded on returned documents (#128).
read_as_markdown() now correctly reads plain-text files with
non-ASCII characters (#151).
Vignette heading levels were fixed (#129).
Added an example using sentence-transformers embeddings (#131).
ragnar_register_tool_retrieve() now registers a tool that will not
return previously returned chunks, enabling the LLM to perform
deeper searches of a ragnar store with repeated tool calls (#106).
Updates for ellmer v0.3.0 and duckdb v1.3.1 (#99).
Improved docs and error message in ragnar_store_insert()
(@mattwarkentin, #88).
ragnar_find_links() can now parse sitemap.xml files. It also
gains a validate argument, which sends a HEAD request to each
link and filters out broken links (#83).
ragnar_inspector() now renders all URLs as clickable links in the
chunk markdown viewer, even if a URL is not a formal markdown link
(#82).
Before running examples and tests, we now check whether ragnar can
load DuckDB extensions. This fixes issues in environments where the
pre-built DuckDB extension binaries are not compatible with the
installed DuckDB version (#94).
Added embed_lm_studio() to use LM Studio as an embedding provider
(#100).
Fixed a bug causing ragnar_retrieve() to fail when documents were
inserted without an origin (#102).
We now suppress a "Couldn't find ffmpeg or avconv" warning that was
emitted when read_as_markdown() imports markitdown. The warning is
only relevant for users doing audio transcription (#103).
Added embed_google_gemini() to use the Google Gemini API as an
embedding provider (#105).
ragnar_store_create() gains a new argument: version, with
default 2. Store version 2 adds support for chunk deoverlapping on
retrieval and automatic chunk augmentation with headings. To support
these features, the internal schema and ingestion requirements are
different. See markdown_chunk() and new S7 classes
MarkdownDocument and MarkdownDocumentChunks. Backwards
compatibility is maintained with version = 1. (#58, #39, #36)
ragnar_store_create() now supports Date and POSIXct classes
supplied to extra_cols.
ragnar_store_create() now supports remote MotherDuck databases
specified with md:<dbname> as the location argument (#50).
ragnar_retrieve() and friends gain a filter argument, adding
support for efficiently filtering retrieval results.
ragnar_retrieve_bm25() gains arguments b, k, and conjunctive
(#56).
ragnar_retrieve_vss() gains argument query_vector, supporting
workflows that preprocess the query string before embedding.
The set of valid method choices for ragnar_retrieve_vss() has been
narrowed to ensure that an HNSW index scan is used.
Passing a tbl(store) to ragnar_retrieve() is deprecated.
New chunker markdown_chunk() with support for chunk heading
context generation, semantic boundary selection, overlapping chunks,
document segmentation, and more. (#56)
New function ragnar_chunks_view() for quickly previewing chunks
(#42).
ragnar_register_tool_retrieve() gains optional name and title
arguments to allow for more descriptive tool registration. These
values can also be set in ragnar_store_create() (#43).
ragnar_read() and read_as_markdown() now accept paths that begin
with ~ (@topepo, #46, #48).
Changes to read_as_markdown() HTML conversion (#40, #51):
html_extract_selectors and html_zap_selectors
provide a flexible way to exclude some HTML page elements from
the converted markdown.