These answers come from the year-long archive of the chatbot that lived on my previous site, iamnicola.ai. I’ve curated the most useful sessions—real questions from operators exploring AI workflows, experimentation, and conversion work—and lightly edited them so you get the original signal without the noise.


How does this chatbot work?


Direct Answer

Aria is a Retrieval-Augmented Generation (RAG) chatbot that answers questions by searching through more than 200 content sources on nicolalazzari.ai—including articles, guides, case studies, Q&A entries, pricing information, and consulting pages. Instead of relying on pre-written responses, it retrieves relevant content in real-time and generates answers that cite their sources, ensuring accuracy and staying up-to-date as the site evolves. For a detailed technical deep-dive into the architecture, implementation, and results, see the Aria chatbot RAG case study.

Architecture Overview

The chatbot uses a lightweight, modular architecture where each component handles one responsibility. This makes the system observable, easy to extend, and cost-effective to run.

Content Sources

Aria draws from a unified corpus that includes:

  • Markdown articles — Technical blog posts and tutorials
  • Structured guides — Experimentation playbooks, AI implementation guides, and frameworks
  • Case studies — Detailed project narratives with results and technical details
  • Pricing and consulting pages — Service descriptions, engagement models, and rate information
  • Q&A entries — Both database-backed and static fallback entries covering common questions
  • External signals — Live data from APIs like Last.fm for personal context

Content Ingestion Pipeline

Every night, automated workers crawl all content sources and prepare them for embedding:

  1. Normalization. Content is converted to clean HTML, removing formatting noise while preserving structure.
  2. Metadata enrichment. Each document gets canonical URLs, breadcrumbs, and content type tags.
  3. Fingerprinting. A SHA-256 hash is computed for each document to detect changes.
  4. Change detection. If a document's hash hasn't changed since the last run, it's skipped entirely—this keeps embedding costs predictable.
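The fingerprinting and change-detection steps can be sketched as follows. This is a minimal illustration, not the production pipeline: the `SourceDoc` shape and field names are assumptions, and the real crawler handles many more document types.

```typescript
import { createHash } from "node:crypto";

// Illustrative document shape; field names are assumptions, not the site's schema.
interface SourceDoc {
  url: string;
  html: string; // normalized content from step 1
}

// Step 3: fingerprint the normalized content with SHA-256.
function fingerprint(doc: SourceDoc): string {
  return createHash("sha256").update(doc.html).digest("hex");
}

// Step 4: keep only documents whose hash changed since the last run.
function detectChanges(
  docs: SourceDoc[],
  previousHashes: Map<string, string>
): SourceDoc[] {
  return docs.filter((doc) => previousHashes.get(doc.url) !== fingerprint(doc));
}
```

Because hashing is cheap and deterministic, the expensive embedding step downstream only ever sees the filtered list.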

Embedding Generation

Only changed or new documents are sent to OpenAI's text-embedding-3-small model, which converts text into high-dimensional vectors (embeddings). These vectors capture semantic meaning, so similar concepts cluster together in vector space. All embeddings are stored in a PostgreSQL database with the pgvector extension, creating a searchable knowledge base.

The embedding process is incremental and cost-optimized. Hash comparison prevents re-embedding unchanged content, and the unified corpus means adding a new article automatically feeds the chatbot without manual configuration.
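The incremental pass might look like the sketch below. The `embed` stub stands in for the call to OpenAI's text-embedding-3-small endpoint (deterministic here so the logic is self-contained); the row shape and function names are illustrative assumptions.

```typescript
interface CorpusDoc {
  url: string;
  text: string;
  hash: string; // SHA-256 fingerprint from the ingestion step
}

// Stub: a real implementation would call the OpenAI embeddings API.
function embed(text: string): number[] {
  return Array.from({ length: 8 }, (_, i) => text.charCodeAt(i % text.length) / 255);
}

// Embed only documents whose hash is new or changed, producing rows ready to
// upsert into a pgvector-backed table (e.g. INSERT ... ON CONFLICT DO UPDATE).
function embedChanged(
  docs: CorpusDoc[],
  storedHashes: Map<string, string>
): { url: string; hash: string; embedding: number[] }[] {
  const rows: { url: string; hash: string; embedding: number[] }[] = [];
  for (const doc of docs) {
    if (storedHashes.get(doc.url) === doc.hash) continue; // unchanged: skip
    rows.push({ url: doc.url, hash: doc.hash, embedding: embed(doc.text) });
  }
  return rows;
}
```

The key property is that cost scales with the number of changed documents, not the size of the corpus.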

Retrieval System

When you ask a question, Aria uses a hybrid retrieval approach that combines two search methods:

  • Semantic search (cosine similarity). Your question is converted to an embedding, then compared against all stored content embeddings. Documents with similar meaning score higher, even if they don't contain exact keywords.
  • Keyword search (BM25). BM25 (Best Match 25) is a ranking algorithm that boosts documents containing exact keyword matches. This ensures precise terms like "Calendly" or "experimentation" get proper relevance.

The hybrid approach ensures both semantic understanding and precise keyword matching work together. Documents below a relevance threshold are discarded, which is why hallucinations stay below 3%.
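A minimal sketch of the two scoring functions and the fusion step, under stated assumptions: the weights, the threshold, and the simplified BM25 parameters (k1 = 1.5, b = 0.75) are illustrative, not the production values, and the fusion assumes the keyword score has been normalized to [0, 1].

```typescript
// Semantic side: exact cosine similarity between two embeddings.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keyword side: a simplified BM25 over pre-tokenized documents.
function bm25(query: string[], doc: string[], corpus: string[][], k1 = 1.5, b = 0.75): number {
  const avgLen = corpus.reduce((s, d) => s + d.length, 0) / corpus.length;
  let score = 0;
  for (const term of query) {
    const tf = doc.filter((w) => w === term).length;
    if (tf === 0) continue;
    const df = corpus.filter((d) => d.includes(term)).length;
    const idf = Math.log(1 + (corpus.length - df + 0.5) / (df + 0.5));
    score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgLen));
  }
  return score;
}

// Fusion: weighted sum, then drop anything below the relevance cutoff.
// Assumes `kw` is already normalized; indices of surviving docs are returned.
function rank(results: { sem: number; kw: number }[], threshold = 0.5): number[] {
  return results
    .map((r, i) => ({ i, score: 0.7 * r.sem + 0.3 * r.kw }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.i);
}
```

The threshold in `rank` is what implements the "documents below a relevance threshold are discarded" behavior described above.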

Response Generation

Once the most relevant content is retrieved, it's packaged into a context prompt and sent to OpenAI's gpt-4o-mini model, which:

  • Generates a natural-language answer using only the retrieved context
  • Streams the response token-by-token for fast perceived performance (1.9s average time to first token)
  • Includes citations linking back to source pages
  • Adapts the tone to match the site's voice
  • Surfaces context-aware calls-to-action based on conversation topics

If there's a risk of truncation or the response quality drops, the system falls back to gpt-4 automatically.
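The grounding step amounts to packaging the retrieved chunks into a prompt that forbids answering from outside them. A sketch, assuming an illustrative chunk shape and system-message wording (the production prompt is not published):

```typescript
interface Retrieved {
  title: string;
  url: string;
  excerpt: string;
}

// Build a system + user message pair where the model may only use the
// retrieved context, and must cite sources by their [n] index.
function buildPrompt(
  question: string,
  chunks: Retrieved[]
): { system: string; user: string } {
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.title} (${c.url})\n${c.excerpt}`)
    .join("\n\n");
  return {
    system:
      "Answer using ONLY the sources below. Cite sources as [n]. " +
      "If the sources don't cover the question, say so.\n\n" + context,
    user: question,
  };
}
```

Numbering the chunks is what lets the generated answer carry citations that link back to specific source pages.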

Call-to-Action Intelligence

Aria tracks conversation context, scroll depth, and CTA performance to surface relevant next steps:

  • When someone asks about pricing, it suggests booking a strategy call
  • When discussing experimentation, it links to the experimentation playbooks
  • When exploring AI workflows, it surfaces relevant case studies
  • It integrates with Calendly to show live availability from Google Calendar

This context-aware approach has increased CTA conversion by 2.4× compared to static prompts.
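The topic-to-CTA routing described above can be sketched as a rule table. This is a deliberately simplified illustration: the patterns and CTA labels are assumptions, and the real system also weighs scroll depth and historical CTA performance, which this sketch omits.

```typescript
// Illustrative topic-to-CTA rules; first matching rule wins.
const ctaRules: { pattern: RegExp; cta: string }[] = [
  { pattern: /pric|rate|cost/i, cta: "Book a strategy call" },
  { pattern: /experiment|a\/b test/i, cta: "Read the experimentation playbooks" },
  { pattern: /ai workflow|automation/i, cta: "Explore related case studies" },
];

// Pick a CTA for a message, or null if no topic rule applies.
function pickCta(message: string): string | null {
  const rule = ctaRules.find((r) => r.pattern.test(message));
  return rule ? rule.cta : null;
}
```

Keeping the rules in data rather than branching logic makes it easy to add topics or reorder priorities as CTA performance data comes in.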

Performance & Results

By the numbers:

  • 1.9 seconds average first-token latency (68% faster than the previous version)
  • 92% grounded answers — responses cite at least one internal resource 92% of the time
  • Under 3% hallucinations — thanks to strict relevance filtering
  • 35% longer sessions — visitors who engage for three turns stay meaningfully longer
  • 2.4× CTA conversion — context-aware prompts outperform static flows
  • Zero manual upkeep — content ingestion, hashing, and embedding run automatically

Privacy & Data Handling

Conversations are stored locally in your browser (localStorage) and are not sent to external analytics unless you've given consent. The chatbot only accesses publicly available content on nicolalazzari.ai—it doesn't have access to private data or user accounts.

Continuous Improvement

The system is designed to improve automatically as content is added. When new articles, guides, or Q&A entries are published, they're automatically included in the next embedding run. The hash-based change detection ensures only new or updated content triggers re-embedding, keeping costs low while maintaining freshness.

Technical Stack

  • Frontend: Next.js with React, streaming UI with Server-Sent Events (SSE)
  • Backend: Next.js API routes, PostgreSQL with pgvector
  • Embeddings: OpenAI text-embedding-3-small
  • Generation: OpenAI gpt-4o-mini (with gpt-4 fallback)
  • Deployment: Vercel with automatic deployments

Takeaway & Related Content

Aria demonstrates how RAG can transform a basic chatbot into a precision assistant that stays accurate, up-to-date, and helpful. The architecture prioritizes cost efficiency, accuracy, and maintainability—making it a practical blueprint for production RAG systems.

Want to go deeper?

If this answer sparked ideas or you'd like to discuss how it applies to your team, let's connect for a quick strategy call.

Book a Strategy Call