System Architecture
The key priority of this architecture is developer velocity.
For hosted offerings, Vercel + Railway + Supabase + Beam has been a fantastic combo.
We also self-host much of our stack. Everything runs in Docker; Vercel is the one exception, and even then we have a Docker version of the frontend.
Full stack frontend: React + Next.js
Backend: Python Flask
Only used for Python-specific features, like advanced retrieval methods and Nomic document maps.
All other backend operations live in Next.js.
Databases
SQL: Postgres
Object storage: S3 / MinIO
Vector DB: Qdrant
Metadata: Redis - required for every page load
Required stateless services:
Document ingest queue (to handle spiky workloads without overwhelming our DBs): Python-RQ
User Auth: Keycloak (user data stored in Postgres)
Optional stateless add-ons:
LLM Serving: Ollama and vLLM
Web Crawling: Crawlee
Semantic Maps of documents and conversation history: Nomic Atlas
Optional stateful add-ons:
Tool use: N8N workflow builder
Error monitoring: Sentry
Google Analytics clone: PostHog
We use N8N to provide a user-friendly GUI for defining custom tools. This way, any user can give their chatbot custom tools that are automatically invoked when appropriate, as decided by the LLM.
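As a rough sketch of how a tool call could be wired up (the base URL, webhook path, and payload shape below are assumptions, not the actual contract; N8N workflows with a Webhook trigger node are commonly invoked this way):

```python
import requests

N8N_BASE_URL = "https://n8n.example.com"  # hypothetical self-hosted N8N instance

def invoke_n8n_tool(webhook_path: str, arguments: dict, timeout: int = 30) -> dict:
    """Trigger an N8N workflow via its Webhook node and return the JSON output.

    The LLM decides which tool to run and supplies `arguments`; the workflow's
    output is stored and injected into the final prompt (see the chat flow below).
    """
    response = requests.post(
        f"{N8N_BASE_URL}/webhook/{webhook_path}",
        json=arguments,
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()

# Example with a hypothetical user-defined "course_calendar" workflow:
# tool_output = invoke_n8n_tool("course_calendar", {"week": 3})
```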
User submits prompt
Determine if tools should be invoked; if so, execute them and store the outputs.
Embed the user prompt with an LLM embedding model
Retrieve the most relevant documents from the vector DB
Robust prompt engineering to:
add as many documents as possible to the context window,
retain as much of the conversation history as possible
include tool outputs and images
include our user-configurable prompt engineering features (tutor mode, document references)
Send the final prompt-engineered message to the LLM and stream the result (a sketch of this flow follows below).
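A minimal sketch of this flow, written in Python for readability even though the production chat path lives in the Next.js backend. The embedding model, one-collection-per-course layout, prompt template, and chat model are assumptions, not the actual configuration:

```python
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()                            # assumes OPENAI_API_KEY is set
qdrant = QdrantClient(url="http://localhost:6333")  # assumed self-hosted Qdrant

def answer(course_name: str, user_prompt: str,
           history: list[dict], tool_outputs: list[str]) -> str:
    # 1. Embed the user prompt (model choice is an assumption).
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=user_prompt
    ).data[0].embedding

    # 2. Retrieve the most relevant chunks from the vector DB
    #    (one collection per course is a hypothetical layout).
    hits = qdrant.search(collection_name=course_name, query_vector=embedding, limit=20)
    context = "\n\n".join(hit.payload["text"] for hit in hits)

    # 3. Prompt engineering: pack documents, tool outputs, and as much
    #    conversation history as fits into the context window.
    tool_block = "\n".join(tool_outputs)
    system = f"Answer using these documents:\n{context}\n\nTool outputs:\n{tool_block}"
    messages = [{"role": "system", "content": system}] + history + [
        {"role": "user", "content": user_prompt}
    ]

    # 4. Send the final prompt-engineered message to the LLM and stream the result.
    stream = openai_client.chat.completions.create(
        model="gpt-4o", messages=messages, stream=True
    )
    return "".join(chunk.choices[0].delta.content or "" for chunk in stream)
```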
User uploads a document via "Dropzone" file upload.
Client-side check for supported filetypes.
After the upload is complete, send a POST to our Beam.cloud Ingest() queue.
Each ingest function has the same interface.
Input: s3_filepath, course_name
Call self.split_and_upload() with the extracted text + metadata: two parallel lists whose indexes match, so metadata[0] corresponds to text[0], and so on. The metadata list holds dictionaries (typically one per "page"), and the text list holds the content strings.
During this time, the frontend polls the SQL database to update the website GUI with success/failed indicators.
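A minimal sketch of that shared interface, using PDF as one example filetype. Only s3_filepath, course_name, and split_and_upload() come from the description above; the class name, pypdf-based extraction, and bucket name are assumptions:

```python
import boto3
from pypdf import PdfReader

class Ingest:
    def ingest_pdf(self, s3_filepath: str, course_name: str) -> None:
        # Download the file the client uploaded to S3/MinIO.
        s3 = boto3.client("s3")
        local_path = "/tmp/" + s3_filepath.split("/")[-1]
        s3.download_file("uiuc-chat", s3_filepath, local_path)  # placeholder bucket

        # Build the two parallel lists: metadata[i] describes texts[i].
        texts, metadatas = [], []
        for page_number, page in enumerate(PdfReader(local_path).pages, start=1):
            texts.append(page.extract_text() or "")
            metadatas.append({
                "course_name": course_name,
                "s3_path": s3_filepath,
                "pagenumber": page_number,
            })

        # Chunk, embed, and write to Qdrant + SQL (implemented elsewhere).
        self.split_and_upload(texts=texts, metadatas=metadatas)

    def split_and_upload(self, texts: list[str], metadatas: list[dict]) -> None:
        ...
```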
While web crawling, we always link to the source materials, like a search engine. Our citations operate like Perplexity or ChatGPT with Search: we crawl the web and link to the original sources.
Compatible "files" are uploaded to S3, including PDFs, Word, PPT, Excel. Even that, that's just a backup - we always link to the original source, and attempt to detect when they're 404 missing and fallback to our local version.
Most web pages are not files; they're HTML, which is not uploaded to S3. Instead, the HTML is stored directly in SQL, and we link to the original source, just like a search engine.
Simplify to a single Docker-compose script.
SQL (Postgres): Main or "top level" storage; contains pointers to all other DBs and additional metadata.
MinIO: File storage (pdf/docx/mp4)
Qdrant: Vector DB for document embeddings.
During streaming, we replace LLM citations with proper links (using a state machine), e.g. [doc 1, page 3] is replaced with a link to the source document.
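A rough illustration of the idea; the [doc N, page M] citation format, the link format, and the helper names are assumptions, and the production state machine has more cases to handle. The key trick shown here is buffering the tail of the stream so a citation split across two chunks can still be matched:

```python
import re
from typing import Iterator

CITATION = re.compile(r"\[doc (\d+), page (\d+)\]")  # assumed citation format

def link_citations(chunks: Iterator[str], doc_urls: dict[int, str]) -> Iterator[str]:
    """Rewrite '[doc N, page M]' citations into markdown links while streaming.

    Holds back any trailing unclosed '[' so a citation split across chunks is
    still matched; a simplified stand-in for the production state machine.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        buffer = CITATION.sub(
            lambda m: f"[{m.group(0)}]({doc_urls[int(m.group(1))]}#page={m.group(2)})",
            buffer,
        )
        # Keep anything after the last unclosed '[' buffered; emit the rest.
        cut = buffer.rfind("[")
        if cut != -1 and "]" not in buffer[cut:]:
            yield buffer[:cut]
            buffer = buffer[cut:]
        else:
            yield buffer
            buffer = ""
    if buffer:
        yield buffer
```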
Uploads go directly from the client to S3 (bypassing our servers to save bandwidth fees).
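One standard way to implement a direct client-to-S3 upload is a presigned POST generated by the backend; this mechanism, the bucket name, and the size limit below are assumptions, since the docs only state that uploads bypass our servers:

```python
import boto3

def create_upload_url(s3_filepath: str, expires_seconds: int = 3600) -> dict:
    """Return presigned POST fields the browser's Dropzone can upload against.

    The file flows client -> S3/MinIO directly; our servers only hand out this
    short-lived authorization, which saves bandwidth (and fees).
    """
    s3 = boto3.client("s3")  # point at MinIO via endpoint_url=... when self-hosting
    return s3.generate_presigned_post(
        Bucket="uiuc-chat",  # placeholder bucket name
        Key=s3_filepath,
        Conditions=[["content-length-range", 0, 500 * 1024 * 1024]],  # cap at ~500 MB
        ExpiresIn=expires_seconds,
    )
```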
Ingest, high level: an ingest function for each filetype (e.g. PDF/Word/Excel/etc.) -> extract text + metadata -> chunk & embed -> upload to the Qdrant & SQL databases. Done. If any failure occurs, it retries a max of 9 times with exponential backoff.
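One way the retry policy could be expressed with Python-RQ (the queue named above); the queue name, Redis URL, worker function, and interval values are illustrative, since the docs only specify a max of 9 retries with exponential backoff:

```python
from redis import Redis
from rq import Queue, Retry

# Enqueue an ingest job with up to 9 retries and exponential backoff.
queue = Queue("ingest", connection=Redis.from_url("redis://localhost:6379"))

job = queue.enqueue(
    "ingest_worker.ingest_file",  # hypothetical worker function path
    kwargs={"s3_filepath": "courses/ece120/lec01.pdf", "course_name": "ece120"},
    retry=Retry(max=9, interval=[2 ** n for n in range(9)]),  # 1s, 2s, 4s, ... 256s
)
```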
Redis: User and project metadata; fast retrieval needed for page load.