Conversational Search¶

Prerequisites

Working with these examples requires at least finishing the Quick Start OR the Manual Setup Quick Start first.

AI Coding Assistants

If you use Claude Code, Codex CLI, or Cursor, we recommend installing the orcheo-demos skill from agent-skills to streamline running and deploying these demos.

This guide walks you through a progressive demo suite for building conversational search applications with Orcheo. Each demo builds on the previous one, taking you from basic RAG to production-ready evaluation pipelines.

Overview¶

Demo	Description	Source	Credentials Required	External Services	Online Demo
Web Scrape & Upload	Scrape web pages, chunk text, generate embeddings, upload to MongoDB	`examples/mongodb_agent/01_web_scrape_and_upload.py`	`openai_api_key`, `mdb_connection_string`	MongoDB Atlas	—
MongoDB Indexes	Create text and vector indexes for hybrid search	`examples/mongodb_agent/02_create_index_and_hybrid_search.py`	`mdb_connection_string`	MongoDB Atlas	—
MongoDB RAG Agent	AI agent with MongoDB hybrid search tool	`examples/mongodb_agent/03_qa_agent.py`	`openai_api_key`, `mdb_connection_string`	MongoDB Atlas
Pinecone Indexes	Create dense & sparse Pinecone indexes from web pages	`examples/conversational_search/demo_1_hybrid_indexing/demo_1.py`	`openai_api_key`, `pinecone_api_key`	Pinecone	—
Basic RAG	Basic RAG pipeline (in-memory store)	`examples/conversational_search/demo_2_basic_rag/demo_2.py`	`openai_api_key`	None	—
Hybrid Search	Hybrid search + web search + rerank	`examples/conversational_search/demo_3_hybrid_search/demo_3.py`	`openai_api_key`, `pinecone_api_key`, `tavily_api_key`	Pinecone, Tavily
Conversational Search	Conversational search	`examples/conversational_search/demo_4_conversational/demo_4.py`	`openai_api_key`, `pinecone_api_key`	Pinecone
Production Pipeline	Production-ready pipeline with caching and guardrails	`examples/conversational_search/demo_5_production/demo_5.py`	`openai_api_key`, `pinecone_api_key`	Pinecone
Evaluation & Research	Evaluation & research	`examples/conversational_search/demo_6_evaluation/demo_6.py`	`openai_api_key`, `pinecone_api_key`	Pinecone

Examples have moved

The example sources now live in the colleague-candidates repository (vendored in this monorepo as the colleague-candidates/ git submodule) under examples/. Some directories were also reorganized: the numbered demos are packaged as conversational_search/conversational_search_demo_<N>/ workflow directories (workflow.py + config.json), and the MongoDB agent scripts evolved into the Knowledge Desk colleagues under colleagues/knowledge_desk/. Script paths in the table above reflect the original layout; resolve them against the new locations.

Prerequisites¶

Install Dependencies¶

uv sync --group examples

This installs the examples dependency group including orcheo-backend for credential vault access.

Set Up Credentials¶

Store credentials securely in the Orcheo vault:

# Required for all demos
orcheo credential create openai_api_key --secret sk-your-openai-key

# Required for MongoDB demos (Web Scrape & Upload, MongoDB Indexes, MongoDB RAG Agent)
orcheo credential create mdb_connection_string --secret "mongodb+srv://..."

# Required for Hybrid Search (web search)
orcheo credential create tavily_api_key --secret tvly-your-tavily-key

# Required for Pinecone-based demos (Pinecone Indexes, Hybrid Search, Conversational Search, Production Pipeline, Evaluation & Research)
orcheo credential create pinecone_api_key --secret your-pinecone-key

MongoDB Agent Demos¶

These demos show how to build AI agents with MongoDB Atlas as the vector store backend.

Web Scrape & Upload¶

Scrapes web pages, chunks the body text, generates vector embeddings, and uploads the results to a MongoDB collection.

orcheo workflow upload examples/mongodb_agent/01_web_scrape_and_upload.py --name "Web Scrape & Upload" --config-file examples/mongodb_agent/config.json
orcheo workflow run

You should see output like:

Starting workflow execution...
Execution ID: 5a64424e-a506-46f7-9de9-3a107295a471

Trace update: workflow.execution (UNSET)
• web_loader (results)
Trace update: web_loader (UNSET)
• chunking (results)
Trace update: chunking (UNSET)
• chunk_embedding (results)
Trace update: chunk_embedding (UNSET)
• mongodb_upload (results)
Trace update: mongodb_upload (UNSET)
✓ Workflow completed successfully

MongoDB Indexes¶

Creates text and vector search indexes in MongoDB Atlas for hybrid search capabilities.

orcheo workflow upload examples/mongodb_agent/02_create_index_and_hybrid_search.py --name "MongoDB Indexes" --config-file examples/mongodb_agent/config.json
orcheo workflow run <workflow-id>

You should see output like:

Starting workflow execution...
Execution ID: 11672ce8-0a07-4dc6-adbe-b64408e54b6c

Trace update: workflow.execution (UNSET)
• ensure_text_index (results)
Trace update: ensure_text_index (UNSET)
• ensure_vector_index (results)
Trace update: ensure_vector_index (UNSET)
✓ Workflow completed successfully

MongoDB RAG Agent¶

An AI agent with a MongoDB hybrid search tool. The agent can answer questions by searching the MongoDB collection using both text and vector search.

This demo is best experienced through the integrated ChatKit UI by uploading and publishing the workflow:

orcheo workflow upload examples/mongodb_agent/03_qa_agent.py --name "MongoDB RAG Agent" --config-file examples/mongodb_agent/config.json
orcheo workflow publish <workflow-id> --force

Then you can interact with the agent through the generated link.

Alternatively, you can try the online demo directly: Try the online demo →

Try asking about UvA

The knowledge base for this demo contains only University of Amsterdam (UvA) news articles. Try questions like "What's the latest research at UvA?" or "Tell me about UvA news" for the best results.

Pinecone Indexes¶

Create dense and sparse Pinecone indexes from web pages. This demo scrapes URLs, chunks the text, generates both dense (OpenAI) and sparse (BM25) embeddings, and upserts them into two separate Pinecone indexes.

What It Does¶

Scrapes web pages specified in config.json
Extracts metadata and infers document titles from the first line
Chunks text with configurable size and overlap
Generates dense embeddings (OpenAI text-embedding-3-small) and sparse embeddings (Pinecone BM25)
Upserts dense and sparse vectors into separate Pinecone indexes

Run It¶

Upload and run through Orcheo:

orcheo workflow upload examples/conversational_search/demo_1_hybrid_indexing/demo_1.py --name "Pinecone Indexes" --config-file examples/conversational_search/demo_1_hybrid_indexing/config.json
orcheo workflow run <workflow-id>

The default config.json scrapes UvA news articles and writes to the orcheo-demo-dense and orcheo-demo-sparse indexes under the hybrid_search namespace. Edit config.json to customise the URLs or index settings.

Configuration¶

{
  "configurable": {
    "urls": [
      {"url": "https://example.com/page-1"},
      {"url": "https://example.com/page-2"}
    ],
    "chunk_size": 512,
    "chunk_overlap": 64,
    "dense_embed_model": "openai:text-embedding-3-small",
    "sparse_model": "pinecone:bm25",
    "vector_store_index_dense": "orcheo-demo-dense",
    "vector_store_index_sparse": "orcheo-demo-sparse",
    "vector_store_namespace": "hybrid_search"
  }
}

Workflow Diagram¶

flowchart TD
    start([START]) --> loader[WebDocumentLoaderNode]
    loader --> metadata[MetadataExtractorNode]
    metadata --> chunking[ChunkingStrategyNode]
    chunking --> embedding[ChunkEmbeddingNode]
    embedding --> dense_upsert[VectorUpsertDense]
    dense_upsert --> sparse_upsert[VectorUpsertSparse]
    sparse_upsert --> done([END])

Basic RAG Pipeline¶

The simplest starting point. This demo works entirely locally with no external vector database.

What It Does¶

Routes queries through ingestion, search, or direct generation based on context
Uses an in-memory vector store for document embeddings
Uses OpenAI embeddings (openai:text-embedding-3-small) for retrieval and grounded generation
Produces grounded responses with inline citations

Run It¶

Upload and run through Orcheo:

orcheo workflow upload examples/conversational_search/demo_2_basic_rag/demo_2.py --name "Basic RAG" --config-file examples/conversational_search/demo_2_basic_rag/config.json
orcheo workflow run <workflow-id> --inputs '{"message": "What is this document about?"}'
orcheo workflow run <workflow-id> --inputs '{"documents":[{"storage_path":"/abs/path/document.txt","source":"document.txt","metadata":{"category":"tech"}}],"message":"What is this document about?"}'

What to expect:

If documents are provided, ingestion runs first and (when message is present) the workflow continues to retrieval + grounded generation.
If no documents are provided and the in-memory store already has chunks from prior runs, the workflow skips ingestion and performs search + generation.
If no documents are provided and the store is empty, the workflow answers directly.

Configuration¶

{
  "configurable": {
    "chunk_size": 512,
    "chunk_overlap": 64,
    "top_k": 5,
    "similarity_threshold": 0.0
  }
}

Adjust chunk_size, chunk_overlap, top_k, and similarity_threshold in config.json to tune the pipeline.

Workflow Diagram¶

flowchart TD
    start([START]) --> entry[EntryRoutingNode]
    entry -->|documents provided| loader[DocumentLoaderNode]
    entry -->|vector store has records| search[DenseSearchNode]
    entry -->|otherwise| generator[GroundedGeneratorNode]

    subgraph Ingestion
        loader --> metadata[MetadataExtractorNode] --> chunking[ChunkingStrategyNode] --> chunk_embedding[ChunkEmbeddingNode] --> vector_upsert[VectorStoreUpsertNode]
    end

    vector_upsert --> post{Inputs.message?}
    post -->|true| search
    post -->|false| end1([END])

    search --> generator --> end2([END])

Hybrid Search¶

Dense + sparse retrieval with reciprocal-rank fusion and optional web search.

Prerequisites¶

Run Pinecone Indexes first to populate the Pinecone indexes.

Run It¶

orcheo workflow upload examples/conversational_search/demo_3_hybrid_search/demo_3.py --name "Hybrid Search" --config-file examples/conversational_search/demo_3_hybrid_search/config.json
orcheo workflow publish <workflow-id> --force

Note: Publishing the workflow before running it is recommended, as published workflows benefit from optimised execution.

What to expect:

Queries fan out across dense (vector), sparse (BM25), and web search branches
Results are fused with reciprocal-rank fusion, reranked in Pinecone, and summarized before generation
Outputs a grounded answer with citations

Configuration¶

{
  "configurable": {
    "dense_top_k": 8,
    "dense_similarity_threshold": 0.0,
    "dense_embed_model": "openai:text-embedding-3-small",
    "sparse_top_k": 10,
    "sparse_score_threshold": 0.0,
    "sparse_vector_store_candidate_k": 50,
    "sparse_model": "pinecone:bm25",
    "web_search_provider": "tavily",
    "web_search_max_results": 5,
    "web_search_search_depth": "advanced",
    "fusion_rrf_k": 60,
    "fusion_top_k": 8,
    "context_max_tokens": 400,
    "context_summary_model": "openai:gpt-4o-mini",
    "vector_store_index_dense": "orcheo-demo-dense",
    "vector_store_index_sparse": "orcheo-demo-sparse",
    "vector_store_namespace": "hybrid_search",
    "reranker_model": "bge-reranker-v2-m3",
    "reranker_top_n": 10,
    "generation_model": "openai:gpt-4o-mini"
  }
}

Edit config.json to customise retrieval parameters, models, or index settings.

Alternatively, you can try the online demo directly: Try the online demo →

Try open-domain questions too

The indexed content contains UvA news articles, but thanks to Tavily web search integration, the system can answer general knowledge queries beyond the indexed documents. E.g., try "What's Orcheo?"

Conversational Search¶

Stateful, multi-turn chat with conversation memory and query rewriting.

What It Does¶

ConversationStateNode: Maintains session history and summary
QueryClassifierNode: Routes to search, clarification, or finalize branches
QueryRewriteNode: Uses an AI model to rewrite follow-up queries with context
CoreferenceResolverNode: Rewrites pronouns using recent context
TopicShiftDetectorNode: Flags topic divergence
MemorySummarizerNode: Persists compact summaries at finalization

Prerequisites¶

Run Pinecone Indexes first to populate the Pinecone indexes.

Run It¶

orcheo workflow upload examples/conversational_search/demo_4_conversational/demo_4.py --name "Conversational Search" --config-file examples/conversational_search/demo_4_conversational/config.json
orcheo workflow run <workflow-id> --inputs '{"message": "How does authentication work?"}'

What to expect:

Iterate across multiple turns to see:

Query classification and coreference resolution for follow-ups
Clarification prompts when ambiguity is detected
Topic-shift detection
Memory summarization when the conversation is finalized
Grounded responses with properly formatted citations and references

Configuration¶

{
  "configurable": {
    "conversation": {
      "max_turns": 20,
      "max_sessions": 8,
      "max_total_turns": 160
    },
    "query_processing": {
      "topic_shift": {
        "similarity_threshold": 0.4,
        "recent_turns": 3
      }
    },
    "retrieval": {
      "top_k": 3,
      "score_threshold": 0.0
    },
    "generation": {
      "citation_style": "inline"
    },
    "vector_store": {
      "type": "pinecone",
      "index_name": "orcheo-demo-dense",
      "namespace": "hybrid_search",
      "client_kwargs": {
        "api_key": "[[pinecone_api_key]]"
      }
    }
  }
}

Edit config.json to customise conversation limits, retrieval parameters, or vector store settings.

Alternatively, you can try the online demo directly: Try the online demo →

Try multi-turn conversations

Ask a question, then follow up with related questions to see coreference resolution, topic-shift detection, and memory summarization in action.

Production Pipeline¶

Production-focused scaffold with caching, guardrails, streaming, and multi-hop planning.

Features¶

Caching: Response caching for repeated queries
Guardrails: Hallucination detection and policy checks
Citation formatting: Properly formatted references with source details
Streaming: Streaming generator for fast iteration
Multi-hop planning: Plans chained search queries
Session controls: Conversation state and memory privacy hooks

Prerequisites¶

Run Pinecone Indexes first to populate the Pinecone indexes.

Run It¶

This demo is designed for the Orcheo server:

orcheo workflow upload examples/conversational_search/demo_5_production/demo_5.py --name "Production Pipeline" --config-file examples/conversational_search/demo_5_production/config.json

Execute via the Orcheo Console or API.

Configuration¶

{
  "configurable": {
    "retrieval": {
      "vector_store": {
        "type": "pinecone",
        "index_name": "orcheo-demo-dense",
        "namespace": "hybrid_search",
        "client_kwargs": {
          "api_key": "[[pinecone_api_key]]"
        }
      },
      "top_k": 4,
      "score_threshold": 0.0,
      "embed_model": "openai:text-embedding-3-small"
    },
    "session": {
      "max_sessions": 8,
      "max_turns": 20,
      "max_total_turns": 200
    },
    "caching": {
      "ttl_seconds": 3600,
      "max_entries": 128
    },
    "multi_hop": {
      "max_hops": 3
    },
    "memory_privacy": {
      "retention_count": 32
    },
    "guardrails": {
      "blocked_terms": ["password", "ssn"]
    },
    "streaming": {
      "chunk_size": 8,
      "buffer_limit": 64
    }
  }
}

Edit config.json to customise: - Retrieval: Vector store settings, top-k results, and embedding model - Session: Conversation limits and turn constraints - Caching: TTL and cache size for response caching - Multi-hop: Maximum hops for multi-step query planning - Memory Privacy: Retention count for privacy-aware memory management - Guardrails: Blocked terms for policy compliance checks - Streaming: Chunk size and buffer limits for streaming generation

Alternatively, you can try the online demo directly: Try the online demo →

Production-grade features

This demo includes caching, guardrails, multi-hop planning, and streaming. Try asking complex questions that require multiple search steps to see multi-hop planning in action.

Evaluation & Research¶

Evaluation-focused scaffold with golden datasets, retrieval A/B testing, and LLM-based judging.

For benchmark-oriented workflows (QReCC and MultiDoc2Dial), see Evaluation.

What's Included¶

Golden queries: Defaults to GitHub raw data under examples/conversational_search/data/golden/golden_dataset.json
Relevance labels: Defaults to GitHub raw data under examples/conversational_search/data/labels/relevance_labels.json
Variant definitions: Compare dense-only vs hybrid retrieval strategies

Prerequisites¶

Run Pinecone Indexes first to populate the Pinecone indexes.

Run It¶

orcheo workflow upload examples/conversational_search/demo_6_evaluation/demo_6.py --name "Evaluation & Research" --config-file examples/conversational_search/demo_6_evaluation/config.json

Execute via the Orcheo Console or API to run evaluation sweeps.

Configuration¶

{
  "recursion_limit": 250,
  "tags": ["conversational-search", "demo-6", "evaluation"],
  "configurable": {
    "dataset": {
      "golden_path": "https://raw.githubusercontent.com/AI-Colleagues/orcheo/refs/heads/main/examples/conversational_search/data/golden/golden_dataset.json",
      "queries_path": "https://raw.githubusercontent.com/AI-Colleagues/orcheo/refs/heads/main/examples/conversational_search/data/queries.json",
      "labels_path": "https://raw.githubusercontent.com/AI-Colleagues/orcheo/refs/heads/main/examples/conversational_search/data/labels/relevance_labels.json",
      "docs_path": "https://raw.githubusercontent.com/AI-Colleagues/orcheo/refs/heads/main/examples/conversational_search/data/docs/product_overview.md",
      "split": "test",
      "limit": null
    },
    "retrieval": {
      "top_k": 4,
      "embed_model": "openai:text-embedding-3-small",
      "sparse_model": "pinecone:bm25",
      "sparse_top_k": 4,
      "sparse_candidate_k": 50,
      "sparse_score_threshold": 0.0,
      "fusion": {
        "strategy": "rrf",
        "rrf_k": 30,
        "top_k": 4
      }
    },
    "vector_store": {
      "type": "pinecone",
      "pinecone": {
        "index_name": "orcheo-demo-dense",
        "namespace": "hybrid_search"
      }
    },
    "sparse_vector_store": {
      "type": "pinecone",
      "pinecone": {
        "index_name": "orcheo-demo-sparse",
        "namespace": "hybrid_search"
      }
    },
    "generation": {
      "citation_style": "inline",
      "model": "openai:gpt-4o-mini"
    },
    "ab_testing": {
      "experiment_id": "retrieval_comparison_001",
      "min_metric_threshold": 0.35
    },
    "llm_judge": {
      "min_score": 0.5,
      "model": "openai:gpt-4o-mini"
    }
  }
}

Edit config.json to customise: - Dataset: Paths to golden queries, relevance labels, and test documents - Retrieval: Top-k results, embedding models, fusion strategy - Vector store: Pinecone index names and namespaces for dense and sparse indexes - Generation: Citation style and model for answer generation - A/B testing: Experiment ID and minimum metric threshold for rollout decisions - LLM judge: Minimum score threshold and model for quality evaluation

Alternatively, you can try the online demo directly: Try the online demo →

Evaluation workflows

This demo runs golden-dataset evaluation, retrieval A/B testing, and LLM-based judging. Try submitting queries to see how different retrieval strategies are compared and scored.

Deploying Demos to Orcheo¶

All demos can be uploaded and run on the Orcheo server:

# Upload a demo
orcheo workflow upload examples/conversational_search/demo_2_basic_rag/demo_2.py

# Upload a demo with config
orcheo workflow upload examples/conversational_search/demo_6_evaluation/demo_6.py --config-file examples/conversational_search/demo_6_evaluation/config.json

# List workflows
orcheo workflow list

# Run a workflow
orcheo workflow run <workflow-id> --inputs '{"message": "What is Orcheo?"}'

The server detects the orcheo_workflow() entrypoint automatically. Configuration can be provided via --config-file.

Sample Data¶

The demos share sample data in examples/conversational_search/data/:

docs/: Sample documents (authentication, product overview, troubleshooting)
queries.json: Baseline queries for testing
golden/: Golden datasets for evaluation
labels/: Relevance labels for metrics

The Pinecone Indexes demo reads URLs from its config.json. Evaluation & Research defaults to GitHub raw URLs. You can point configs at these local files when running offline or customising the corpus.

Next Steps¶

Try the MongoDB Agent demos for MongoDB Atlas-based vector search
Start with Basic RAG to understand the basic RAG pattern
Run Pinecone Indexes before Hybrid Search, Conversational Search, Production Pipeline, and Evaluation & Research to seed the Pinecone indexes
Progress through Hybrid Search and Conversational Search to add hybrid search and conversation state
Use Production Pipeline patterns for production deployments
Set up Evaluation & Research for systematic evaluation of your search quality