System Architecture

The key priority of this architecture is developer velocity.

  • For hosted offerings, Vercel + Railway + Supabase + Beam has been a fantastic combo.

  • We also self-host much of our stack with Docker.

Our entire stack

Everything runs in Docker. Vercel is the one exception, but we also have a Docker version.

(Diagram: Architecture as of March 2025. Every grey line item is a Docker container.)

Full stack frontend: React + Next.js

Backend: Python Flask

  • Only used for Python-specific features, like advanced retrieval methods and Nomic document maps.

  • All other backend operations live in Next.js.

Databases

  • SQL: Postgres

  • Object storage: S3 / MinIO

  • Vector DB: Qdrant

  • Metadata: Redis - required for every page load

Required stateless services:

  • Document ingest queue (to handle spiky workloads without overwhelming our DBs): Python-RQ (see the sketch after this list)

  • User Auth: Keycloak (user data stored in Postgres)

Optional stateless add-ons:

  • LLM Serving: Ollama and vLLM

  • Web Crawling: Crawlee

  • Semantic Maps of documents and conversation history: Nomic Atlas

Optional stateful add-ons:

  • Tool use: N8N workflow builder

  • Error monitoring: Sentry

  • Google Analytics clone: Posthog
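
To illustrate the ingest queue above: a minimal sketch of enqueuing an ingest job with Python-RQ. The Redis URL, queue name, and ingest_file signature are hypothetical, not the exact production code.

    # Hypothetical sketch: enqueue an ingest job with Python-RQ so spiky upload
    # workloads are absorbed by the queue instead of hammering the databases.
    from redis import Redis
    from rq import Queue

    redis_conn = Redis.from_url("redis://localhost:6379")  # assumed Redis/ValKey instance
    ingest_queue = Queue("ingest", connection=redis_conn)  # hypothetical queue name

    def ingest_file(s3_filepath: str, course_name: str) -> None:
        """Stand-in for the real per-filetype ingest function."""
        print(f"ingesting {s3_filepath} for course {course_name}")

    # The web app enqueues the job and returns immediately; a separate
    # `rq worker ingest` process picks jobs off the queue.
    job = ingest_queue.enqueue(ingest_file, "courses/demo/file.pdf", "demo-course")
    print(job.id)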

User-defined Custom Tool Use by LLM

We use N8N to provide a user-friendly GUI for defining custom tools. This way, any user can give their chatbot custom tools that will be automatically invoked when appropriate, as decided by the LLM.
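
For illustration, here is one way a custom tool could be invoked once the LLM decides to use it: POST the LLM-chosen arguments to the N8N workflow's webhook. The webhook URL, payload shape, and tool name are hypothetical, not the actual UIUC.chat implementation.

    # Hypothetical sketch: call a user-defined N8N workflow via its webhook
    # after the LLM has decided (via tool/function calling) to use this tool.
    import requests

    N8N_WEBHOOK_URL = "https://n8n.example.com/webhook/pest-detection"  # hypothetical URL

    def run_n8n_tool(tool_arguments: dict) -> dict:
        """Send the LLM-chosen arguments to the N8N workflow and return its output."""
        response = requests.post(N8N_WEBHOOK_URL, json=tool_arguments, timeout=60)
        response.raise_for_status()
        return response.json()

    # The tool output is then stored and added to the conversation context
    # (see step 1.1 of the RAG flow below).
    result = run_n8n_tool({"image_url": "https://example.com/leaf.jpg"})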

How does it work, in technical detail?

RAG chatbot, what happens when you hit send?

  1. User submits prompt

    1. Determine if tools should be invoked; if so, execute them and store the outputs.

  2. Embed user prompt with LLM embedding model

  3. Retrieve most related documents from vector DB

  4. Robust prompt engineering to:

    1. add as many documents as possible to the context window,

    2. retain as much of the conversation history as possible

    3. include tool outputs and images

    4. include our user-configurable prompt engineering features (tutor mode, document references)

  5. Send the final prompt-engineered message to the final LLM and stream the result. During streaming, a state machine replaces LLM citations with proper links, e.g. [doc 1, page 3] is replaced with https://s3.link-to-document.pdf?page=3.
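
To make steps 2, 3, and 5 concrete, here is a minimal sketch assuming an OpenAI-compatible API and the qdrant-client library. The model names, collection layout, payload fields, and prompt are simplified assumptions, not the production code.

    # Minimal sketch of the send pipeline: embed -> retrieve -> prompt -> stream.
    from openai import OpenAI
    from qdrant_client import QdrantClient

    llm = OpenAI()  # any OpenAI-compatible endpoint works here
    qdrant = QdrantClient(url="http://localhost:6333")

    def answer(course_name: str, user_prompt: str):
        # 2. Embed the user prompt with an embedding model.
        embedding = llm.embeddings.create(
            model="text-embedding-3-small", input=user_prompt
        ).data[0].embedding

        # 3. Retrieve the most related document chunks from the vector DB.
        hits = qdrant.search(collection_name=course_name, query_vector=embedding, limit=20)
        context = "\n\n".join(hit.payload["text"] for hit in hits)

        # 4./5. Prompt-engineer the final message and stream the result.
        stream = llm.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "system", "content": f"Answer using these documents:\n{context}"},
                {"role": "user", "content": user_prompt},
            ],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                yield delta  # citation rewriting to links would happen here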

Document Ingest, how does it work?

At a high level: read the filetype and forward the request to the proper ingest function (there is one ingest function per filetype, e.g. pdf/word/excel/etc) -> prevent duplicate uploads -> chunk & embed -> upload to the Qdrant & SQL databases. Done. If any failure occurs, it'll retry a max of 9 times with exponential backoff.

(Diagram: Document ingest for uploaded files. Web crawling is very similar.)

  1. User uploads a document via "Dropzone" file upload.

    1. Client-side check for supported filetypes.

    2. Generate a pre-signed S3 URL for direct Client --> S3 upload (bypass our servers to save bandwidth fees).

    3. After upload is complete, send POST to our Beam.cloud Ingest() queue.

    1. Each ingest function has the same interface.

      1. Input: s3_filepath, course_name

        1. Call self.split_and_upload() with the extracted text + metadata:

      Two parallel lists of metadata and text strings; the indexes match, so metadata[0] belongs to text[0], and so on. The metadata is a list of dictionaries (typically one per "page") and the texts are a list of strings containing the content.

          from typing import Any, Dict, List

          # One metadata dict per PDF page; indexes line up with pdf_texts below.
          metadatas: List[Dict[str, Any]] = [
              {
                  'course_name': course_name,
                  's3_path': s3_path,
                  'pagenumber': page['page_number'] + 1,
                  'timestamp': '',
                  'readable_filename': kwargs.get('readable_filename', page['readable_filename']),
                  'url': kwargs.get('url', ''),
                  'base_url': kwargs.get('base_url', ''),
              } for page in pdf_pages
          ]
          # The raw text of each page, in the same order as metadatas.
          pdf_texts: List[str] = [page['text'] for page in pdf_pages]
          
  2. During this time, the frontend polls the SQL database to update the website GUI with success/failed indicators.
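
The retry-with-exponential-backoff behavior mentioned in the overview above might look roughly like the sketch below; the backoff schedule and the placeholder upsert step are assumptions, not the actual Beam.cloud worker code.

    # Hypothetical sketch of the ingest retry behavior: up to 9 retries with
    # exponential backoff around each flaky step (embedding, DB upserts, ...).
    import time

    MAX_RETRIES = 9

    def with_retries(fn, *args, **kwargs):
        """Call fn, retrying up to MAX_RETRIES times with exponential backoff."""
        for attempt in range(MAX_RETRIES + 1):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == MAX_RETRIES:
                    raise
                time.sleep(2 ** attempt)  # 1s, 2s, 4s, ... between attempts

    # Usage: wrap each step of "chunk & embed -> upload to Qdrant & SQL" in retries.
    def flaky_upsert(chunks):
        print(f"upserting {len(chunks)} chunks")  # placeholder for the real DB call

    with_retries(flaky_upsert, ["chunk one", "chunk two"])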

Document ingest during web crawling

While web crawling, we always link to the source materials, like a search engine. Our citations operate like Perplexity or ChatGPT with Search: crawl the web and link to the original sources.

Compatible "files" are uploaded to S3, including PDFs, Word, PPT, Excel. Even that, that's just a backup - we always link to the original source, and attempt to detect when they're 404 missing and fallback to our local version.

Most web pages are not files; they're HTML, which is not uploaded to S3. Instead, the HTML is stored directly in SQL, and we link to the original source, just like a search engine.
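
A minimal sketch of the "detect a 404 and fall back to our local copy" behavior, assuming boto3 for the pre-signed S3 link and a HEAD request against the original URL; the bucket name and timeout are hypothetical.

    # Hypothetical sketch: prefer the original source URL, fall back to our
    # S3/MinIO backup if the original page is missing (404) or unreachable.
    import boto3
    import requests

    s3 = boto3.client("s3")
    BACKUP_BUCKET = "uiuc-chat-documents"  # hypothetical bucket name

    def citation_link(original_url: str, s3_key: str) -> str:
        try:
            response = requests.head(original_url, allow_redirects=True, timeout=5)
            if response.status_code < 400:
                return original_url  # original source still exists, link to it
        except requests.RequestException:
            pass  # treat network errors like a missing page
        # Fall back to a pre-signed link to our local copy.
        return s3.generate_presigned_url(
            "get_object",
            Params={"Bucket": BACKUP_BUCKET, "Key": s3_key},
            ExpiresIn=3600,
        )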

Self-hostable version (coming Q1 2025)

Simplify everything to a single Docker Compose script.

  • Postgres (SQL): Main or "top level" storage, contains pointers to all other DBs and additional metadata.

  • MinIO: File storage (pdf/docx/mp4)

  • Qdrant: Vector DB for document embeddings.

  • Redis/ValKey: User and project metadata, fast retrieval needed for page load.