System Architecture
The key priority of this architecture is developer velocity.
For hosted offerings, Vercel + Railway + Supabase + Beam has been a fantastic combo.
We also self-host much of our stack. Everything runs in Docker; Vercel is the one exception, and even then we have a Docker version of the frontend.
Full stack frontend: React + Next.js
Backend: Python Flask
Only used for Python-specific features, like advanced retrieval methods and Nomic document maps.
All other backend operations live in Next.js.
Databases
SQL: Postgres
Object storage: S3 / MinIO
Vector DB: Qdrant
Metadata: Redis - required for every page load
Required stateless services:
Document ingest queue (to handle spiky workloads without overwhelming our DBs): Python-RQ
User Auth: Keycloak (user data stored in Postgres)
Optional stateless add-ons:
LLM Serving: Ollama and vLLM
Web Crawling: Crawlee
Semantic Maps of documents and conversation history: Nomic Atlas
Optional stateful add-ons:
Tool use: N8N workflow builder
Error monitoring: Sentry
Google Analytics clone: PostHog
We use N8N to provide a user-friendly GUI for defining custom tools. This way, any user can give their chatbot custom tools that are automatically invoked when appropriate, as decided by the LLM.
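As a rough sketch of how a tool call could be wired up (the base URL, webhook path, and payload shape below are assumptions, not the actual contract; N8N workflows with a Webhook trigger node are commonly invoked this way):

```python
import requests

N8N_BASE_URL = "https://n8n.example.com"  # hypothetical self-hosted N8N instance

def invoke_n8n_tool(webhook_path: str, arguments: dict, timeout: int = 30) -> dict:
    """Trigger an N8N workflow via its Webhook node and return the JSON output.

    The LLM decides which tool to run and supplies `arguments`; the workflow's
    output is stored and injected into the final prompt (see the chat flow below).
    """
    response = requests.post(
        f"{N8N_BASE_URL}/webhook/{webhook_path}",
        json=arguments,
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()

# Example with a hypothetical user-defined "course_calendar" workflow:
# tool_output = invoke_n8n_tool("course_calendar", {"week": 3})
```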
User submits prompt
Determine if tools should be invoked; if so, execute them and store the outputs.
Embed the user prompt with an LLM embedding model
Retrieve the most relevant documents from the vector DB
Robust prompt engineering to:
add as many documents as possible to the context window,
retain as much of the conversation history as possible
include tool outputs and images
include our user-configurable prompt engineering features (tutor mode, document references)
Send the final prompt-engineered message to the LLM and stream the result (a sketch of this flow follows below).
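A minimal sketch of this flow, written in Python for readability even though the production chat path lives in the Next.js backend. The embedding model, one-collection-per-course layout, prompt template, and chat model are assumptions, not the actual configuration:

```python
from openai import OpenAI
from qdrant_client import QdrantClient

openai_client = OpenAI()                            # assumes OPENAI_API_KEY is set
qdrant = QdrantClient(url="http://localhost:6333")  # assumed self-hosted Qdrant

def answer(course_name: str, user_prompt: str,
           history: list[dict], tool_outputs: list[str]) -> str:
    # 1. Embed the user prompt (model choice is an assumption).
    embedding = openai_client.embeddings.create(
        model="text-embedding-3-small", input=user_prompt
    ).data[0].embedding

    # 2. Retrieve the most relevant chunks from the vector DB
    #    (one collection per course is a hypothetical layout).
    hits = qdrant.search(collection_name=course_name, query_vector=embedding, limit=20)
    context = "\n\n".join(hit.payload["text"] for hit in hits)

    # 3. Prompt engineering: pack documents, tool outputs, and as much
    #    conversation history as fits into the context window.
    tool_block = "\n".join(tool_outputs)
    system = f"Answer using these documents:\n{context}\n\nTool outputs:\n{tool_block}"
    messages = [{"role": "system", "content": system}] + history + [
        {"role": "user", "content": user_prompt}
    ]

    # 4. Send the final prompt-engineered message to the LLM and stream the result.
    stream = openai_client.chat.completions.create(
        model="gpt-4o", messages=messages, stream=True
    )
    return "".join(chunk.choices[0].delta.content or "" for chunk in stream)
```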
User uploads a document via "Dropzone" file upload.
Client-side check for supported filetypes.
After the upload is complete, send a POST to our Beam.cloud Ingest() queue.
Each ingest function has the same interface.
Input: s3_filepath, course_name
Call self.split_and_upload() with the extracted text + metadata: two parallel lists whose indexes match, so metadata[0] corresponds to text[0], and so on. The metadata list holds dictionaries (typically one per "page"), and the text list holds the content strings.
During this time, the frontend polls the SQL database to update the website GUI with success/failed indicators.
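A minimal sketch of that shared interface, using PDF as one example filetype. Only s3_filepath, course_name, and split_and_upload() come from the description above; the class name, pypdf-based extraction, and bucket name are assumptions:

```python
import boto3
from pypdf import PdfReader

class Ingest:
    def ingest_pdf(self, s3_filepath: str, course_name: str) -> None:
        # Download the file the client uploaded to S3/MinIO.
        s3 = boto3.client("s3")
        local_path = "/tmp/" + s3_filepath.split("/")[-1]
        s3.download_file("uiuc-chat", s3_filepath, local_path)  # placeholder bucket

        # Build the two parallel lists: metadata[i] describes texts[i].
        texts, metadatas = [], []
        for page_number, page in enumerate(PdfReader(local_path).pages, start=1):
            texts.append(page.extract_text() or "")
            metadatas.append({
                "course_name": course_name,
                "s3_path": s3_filepath,
                "pagenumber": page_number,
            })

        # Chunk, embed, and write to Qdrant + SQL (implemented elsewhere).
        self.split_and_upload(texts=texts, metadatas=metadatas)

    def split_and_upload(self, texts: list[str], metadatas: list[dict]) -> None:
        ...
```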
While web crawling, we always link to the source materials, like a search engine. Our citations operate like Perplexity or ChatGPT with Search: we crawl the web and link to the original sources.
Compatible "files" are uploaded to S3, including PDFs, Word, PPT, Excel. Even that, that's just a backup - we always link to the original source, and attempt to detect when they're 404 missing and fallback to our local version.
Most web pages are not files; they're HTML, which is not uploaded to S3. Instead, the HTML is stored directly in SQL, and we link to the original source, just like a search engine.
Simplify to a single Docker-compose script.
SQL (Postgres): Main or "top level" storage; contains pointers to all other DBs and additional metadata.
MinIO: File storage (pdf/docx/mp4)
Qdrant: Vector DB for document embeddings.
During streaming, we replace LLM citations with proper links (using a state machine), e.g. [doc 1, page 3] is replaced with a link to the source document.
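A rough illustration of the idea; the [doc N, page M] citation format, the link format, and the helper names are assumptions, and the production state machine has more cases to handle. The key trick shown here is buffering the tail of the stream so a citation split across two chunks can still be matched:

```python
import re
from typing import Iterator

CITATION = re.compile(r"\[doc (\d+), page (\d+)\]")  # assumed citation format

def link_citations(chunks: Iterator[str], doc_urls: dict[int, str]) -> Iterator[str]:
    """Rewrite '[doc N, page M]' citations into markdown links while streaming.

    Holds back any trailing unclosed '[' so a citation split across chunks is
    still matched; a simplified stand-in for the production state machine.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        buffer = CITATION.sub(
            lambda m: f"[{m.group(0)}]({doc_urls[int(m.group(1))]}#page={m.group(2)})",
            buffer,
        )
        # Keep anything after the last unclosed '[' buffered; emit the rest.
        cut = buffer.rfind("[")
        if cut != -1 and "]" not in buffer[cut:]:
            yield buffer[:cut]
            buffer = buffer[cut:]
        else:
            yield buffer
            buffer = ""
    if buffer:
        yield buffer
```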
Uploads go directly from the client to S3 (bypassing our servers to save bandwidth fees).
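One standard way to implement a direct client-to-S3 upload is a presigned POST generated by the backend; this mechanism, the bucket name, and the size limit below are assumptions, since the docs only state that uploads bypass our servers:

```python
import boto3

def create_upload_url(s3_filepath: str, expires_seconds: int = 3600) -> dict:
    """Return presigned POST fields the browser's Dropzone can upload against.

    The file flows client -> S3/MinIO directly; our servers only hand out this
    short-lived authorization, which saves bandwidth (and fees).
    """
    s3 = boto3.client("s3")  # point at MinIO via endpoint_url=... when self-hosting
    return s3.generate_presigned_post(
        Bucket="uiuc-chat",  # placeholder bucket name
        Key=s3_filepath,
        Conditions=[["content-length-range", 0, 500 * 1024 * 1024]],  # cap at ~500 MB
        ExpiresIn=expires_seconds,
    )
```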
Ingest, high level: an ingest function for each filetype (e.g. PDF/Word/Excel/etc.) -> extract text + metadata -> chunk & embed -> upload to the Qdrant & SQL databases. Done. If any failure occurs, it retries a max of 9 times with exponential backoff.
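One way the retry policy could be expressed with Python-RQ (the queue named above); the queue name, Redis URL, worker function, and interval values are illustrative, since the docs only specify a max of 9 retries with exponential backoff:

```python
from redis import Redis
from rq import Queue, Retry

# Enqueue an ingest job with up to 9 retries and exponential backoff.
queue = Queue("ingest", connection=Redis.from_url("redis://localhost:6379"))

job = queue.enqueue(
    "ingest_worker.ingest_file",  # hypothetical worker function path
    kwargs={"s3_filepath": "courses/ece120/lec01.pdf", "course_name": "ece120"},
    retry=Retry(max=9, interval=[2 ** n for n in range(9)]),  # 1s, 2s, 4s, ... 256s
)
```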
Redis: User and project metadata; fast retrieval needed for page load.