Merged
1 change: 1 addition & 0 deletions .dockerignore
@@ -8,6 +8,7 @@
/venv

# Don't include LanceDB data.
/.data
/data

# Don't include default storage location.
10 changes: 6 additions & 4 deletions Dockerfile
@@ -5,7 +5,7 @@ FROM python:3.13-slim AS reqs
WORKDIR /app

RUN apt-get update -y && apt-get upgrade -y \
&& rm -rf /var/lib/apt/lists/
&& apt-get install -y postgresql-client && rm -rf /var/lib/apt/lists/

RUN python -m venv /venv
ENV PATH=/venv/bin:$PATH
@@ -17,13 +17,15 @@ COPY pyproject.toml pyproject.toml
RUN pip install --no-cache-dir -q .[dev]

FROM reqs AS app
COPY willa willa
COPY README.rst README.rst
COPY CHANGELOG.rst CHANGELOG.rst
COPY public public
COPY sql sql
COPY chainlit.md chainlit.md
COPY .chainlit .chainlit
COPY bin bin
COPY tests tests
COPY README.rst README.rst
COPY CHANGELOG.rst CHANGELOG.rst
COPY willa willa
RUN pip install --no-cache-dir -e .

CMD ["chainlit", "run", "/app/willa/web/app.py", "-h", "--host", "0.0.0.0"]
7 changes: 5 additions & 2 deletions README.rst
@@ -94,11 +94,14 @@ The chatbot service is deployed via Docker Compose. You can set up a similar
environment by running::

docker compose build --pull
docker compose run prisma migrate deploy
bin/dev

The ``bin/dev`` command sets a few environment variables for you and then runs
the appropriate ``compose`` command.
the appropriate ``compose`` command. When setting up your environment for the
first time, you will need to initialise the database *before* running
``bin/dev``::

docker compose run --rm app bin/dbinit

To run Prisma Studio::

21 changes: 21 additions & 0 deletions bin/dbinit
@@ -0,0 +1,21 @@
#!/bin/sh -e

# Initialise the configured database environment for use with Willa,
# setting up the needed tables and indexes.
#
# Copyright © 2025 The Regents of the University of California. MIT license.

# Use the credentials from the app's environment.
export PGHOST=${POSTGRES_HOST:-db}
export PGPORT=${POSTGRES_PORT:-5432}
export PGUSER=${POSTGRES_USER}
export PGPASSWORD=${POSTGRES_PASSWORD}
export PGDATABASE=${POSTGRES_DB}

# Determine if the database needs to be created or not.
if [ "$(psql -d template1 -t -A -c "SELECT COUNT(*) FROM pg_database WHERE datname='${POSTGRES_DB}';")" = '0' ]; then
createdb
fi

# -1: Ensure atomicity: if any statement fails, no changes are persisted.
psql -1 -f sql/willa.sql
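The create-if-absent check in the script above can be reduced to a small,
self-contained sketch; here ``db_exists_count`` is a stand-in for the
``COUNT(*)`` value that ``bin/dbinit`` reads back from ``psql``:

```shell
#!/bin/sh -e

# Stand-in for: psql -d template1 -t -A -c "SELECT COUNT(*) FROM pg_database ..."
# A count of '0' means the database does not exist yet.
db_exists_count='0'

if [ "$db_exists_count" = '0' ]; then
  action='createdb'   # bin/dbinit would run createdb here
else
  action='skip'       # database already present; nothing to create
fi

echo "$action"
```

Because ``psql -t -A`` emits a bare unaligned value, the string comparison
against ``'0'`` is all the script needs; no output parsing is involved.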
195 changes: 195 additions & 0 deletions sql/willa.sql
@@ -0,0 +1,195 @@
Member:

I wasn't able to get this running, and it also includes a number of statements (e.g. the grants) which go beyond the scope of just establishing the schema.

What about cat-ing together the migration files that ship with Prisma (in prisma/migrations/*/migration.sql)? That would be identical to what Prisma appears to be doing under the hood.

Member (Author):

The GRANT was needed in my testing because the database created by default in Docker doesn't allow the created role to create tables in the public schema. (It wasn't part of the original pg_dump.)

Member (Author):

I removed some of the superfluous SETs that are already defaults. What failure were you seeing?

Member:

That GRANT line is not necessary in my testing. Steps to reproduce:

  1. Hard reset to the current code version.
  2. Remove or comment out the GRANT line.
  3. Down the stack (remove orphans and volumes).
  4. Delete the data directory [1]: rm -rf .data/postgres
  5. Start the stack: docker compose up --build -d app db
  6. Run dbinit [2]: docker compose exec app bin/dbinit

[1] Should we use a named volume here instead? I found this confusing, as it means docker compose down -v doesn't actually delete the data, which has to be removed manually. (Not necessary for this PR IMO, just raising the question.)

[2] Or, equivalently, use docker compose run.

Member (Author):

Following those exact steps, I get:

psql:sql/willa.sql:10: ERROR:  no schema has been selected to create in

And this is because the public schema does not allow CREATE to willa:

willa=# \dn+
                                       List of schemas
  Name  |       Owner       |           Access privileges            |      Description       
--------+-------------------+----------------------------------------+------------------------
 public | pg_database_owner | pg_database_owner=UC/pg_database_owner+| standard public schema
        |                   | =U/pg_database_owner                   | 
(1 row)

willa=# GRANT CREATE ON SCHEMA public TO willa;
GRANT
willa=# \dn+
                                       List of schemas
  Name  |       Owner       |           Access privileges            |      Description       
--------+-------------------+----------------------------------------+------------------------
 public | pg_database_owner | pg_database_owner=UC/pg_database_owner+| standard public schema
        |                   | =U/pg_database_owner                  +| 
        |                   | willa=C/pg_database_owner              | 
(1 row)

Now the real question is why willa isn't inheriting it, because as far as I can tell, willa is the owner:

willa=# \l willa
                                               List of databases
 Name  | Owner | Encoding | Locale Provider |  Collate   |   Ctype    | Locale | ICU Rules | Access privileges 
-------+-------+----------+-----------------+------------+------------+--------+-----------+-------------------
 willa | willa | UTF8     | libc            | en_US.utf8 | en_US.utf8 |        |           | 

Member (Author):

Created AP-472 for the named volumes issue.

--
-- PostgreSQL database dump
--

SET client_encoding = 'UTF8';
SET standard_conforming_strings = on;
SELECT pg_catalog.set_config('search_path', '', false);
SET search_path TO public;


CREATE EXTENSION IF NOT EXISTS pgcrypto;
COMMENT ON EXTENSION pgcrypto IS 'cryptographic functions';


CREATE TYPE "StepType" AS ENUM (
'assistant_message',
'embedding',
'llm',
'retrieval',
'rerank',
'run',
'system_message',
'tool',
'undefined',
'user_message'
);


SET default_tablespace = '';
SET default_table_access_method = heap;


CREATE TABLE "Element" (
id text DEFAULT gen_random_uuid() NOT NULL,
"createdAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
"updatedAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
"threadId" text,
"stepId" text NOT NULL,
metadata jsonb NOT NULL,
mime text,
name text NOT NULL,
"objectKey" text,
url text,
"chainlitKey" text,
display text,
size text,
language text,
page integer,
props jsonb
);


CREATE TABLE "Feedback" (
id text DEFAULT gen_random_uuid() NOT NULL,
"createdAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
"updatedAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
"stepId" text,
name text NOT NULL,
value double precision NOT NULL,
comment text
);


CREATE TABLE "Step" (
id text DEFAULT gen_random_uuid() NOT NULL,
"createdAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
"updatedAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
"parentId" text,
"threadId" text,
input text,
metadata jsonb NOT NULL,
name text,
output text,
type "StepType" NOT NULL,
"showInput" text DEFAULT 'json'::text,
"isError" boolean DEFAULT false,
"startTime" timestamp(3) without time zone NOT NULL,
"endTime" timestamp(3) without time zone NOT NULL
);


CREATE TABLE "Thread" (
id text DEFAULT gen_random_uuid() NOT NULL,
"createdAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
"updatedAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
"deletedAt" timestamp(3) without time zone,
name text,
metadata jsonb NOT NULL,
"userId" text,
tags text[] DEFAULT ARRAY[]::text[]
);


CREATE TABLE "User" (
id text DEFAULT gen_random_uuid() NOT NULL,
"createdAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
"updatedAt" timestamp(3) without time zone DEFAULT CURRENT_TIMESTAMP NOT NULL,
metadata jsonb NOT NULL,
identifier text NOT NULL
);


CREATE TABLE _prisma_migrations (
id character varying(36) NOT NULL,
checksum character varying(64) NOT NULL,
finished_at timestamp with time zone,
migration_name character varying(255) NOT NULL,
logs text,
rolled_back_at timestamp with time zone,
started_at timestamp with time zone DEFAULT now() NOT NULL,
applied_steps_count integer DEFAULT 0 NOT NULL
);


ALTER TABLE ONLY "Element"
ADD CONSTRAINT "Element_pkey" PRIMARY KEY (id);

ALTER TABLE ONLY "Feedback"
ADD CONSTRAINT "Feedback_pkey" PRIMARY KEY (id);

ALTER TABLE ONLY "Step"
ADD CONSTRAINT "Step_pkey" PRIMARY KEY (id);

ALTER TABLE ONLY "Thread"
ADD CONSTRAINT "Thread_pkey" PRIMARY KEY (id);

ALTER TABLE ONLY "User"
ADD CONSTRAINT "User_pkey" PRIMARY KEY (id);

ALTER TABLE ONLY _prisma_migrations
ADD CONSTRAINT _prisma_migrations_pkey PRIMARY KEY (id);


CREATE INDEX "Element_stepId_idx" ON "Element" USING btree ("stepId");

CREATE INDEX "Element_threadId_idx" ON "Element" USING btree ("threadId");

CREATE INDEX "Feedback_createdAt_idx" ON "Feedback" USING btree ("createdAt");

CREATE INDEX "Feedback_name_idx" ON "Feedback" USING btree (name);

CREATE INDEX "Feedback_name_value_idx" ON "Feedback" USING btree (name, value);

CREATE INDEX "Feedback_stepId_idx" ON "Feedback" USING btree ("stepId");

CREATE INDEX "Feedback_value_idx" ON "Feedback" USING btree (value);

CREATE INDEX "Step_createdAt_idx" ON "Step" USING btree ("createdAt");

CREATE INDEX "Step_endTime_idx" ON "Step" USING btree ("endTime");

CREATE INDEX "Step_name_idx" ON "Step" USING btree (name);

CREATE INDEX "Step_parentId_idx" ON "Step" USING btree ("parentId");

CREATE INDEX "Step_startTime_idx" ON "Step" USING btree ("startTime");

CREATE INDEX "Step_threadId_idx" ON "Step" USING btree ("threadId");

CREATE INDEX "Step_threadId_startTime_endTime_idx" ON "Step" USING btree ("threadId", "startTime", "endTime");

CREATE INDEX "Step_type_idx" ON "Step" USING btree (type);

CREATE INDEX "Thread_createdAt_idx" ON "Thread" USING btree ("createdAt");

CREATE INDEX "Thread_name_idx" ON "Thread" USING btree (name);

CREATE INDEX "User_identifier_idx" ON "User" USING btree (identifier);


CREATE UNIQUE INDEX "User_identifier_key" ON "User" USING btree (identifier);


ALTER TABLE ONLY "Element"
ADD CONSTRAINT "Element_stepId_fkey" FOREIGN KEY ("stepId") REFERENCES "Step"(id) ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE ONLY "Element"
ADD CONSTRAINT "Element_threadId_fkey" FOREIGN KEY ("threadId") REFERENCES "Thread"(id) ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE ONLY "Feedback"
ADD CONSTRAINT "Feedback_stepId_fkey" FOREIGN KEY ("stepId") REFERENCES "Step"(id) ON UPDATE CASCADE ON DELETE SET NULL;

ALTER TABLE ONLY "Step"
ADD CONSTRAINT "Step_parentId_fkey" FOREIGN KEY ("parentId") REFERENCES "Step"(id) ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE ONLY "Step"
ADD CONSTRAINT "Step_threadId_fkey" FOREIGN KEY ("threadId") REFERENCES "Thread"(id) ON UPDATE CASCADE ON DELETE CASCADE;

ALTER TABLE ONLY "Thread"
ADD CONSTRAINT "Thread_userId_fkey" FOREIGN KEY ("userId") REFERENCES "User"(id) ON UPDATE CASCADE ON DELETE SET NULL;


--
-- PostgreSQL database dump complete
--
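The foreign keys above encode the schema's cleanup semantics: deleting a Step
cascades to its Elements, while deleting a User only nulls out Thread."userId".
A minimal illustration of the ON DELETE SET NULL behaviour used by
"Thread_userId_fkey" (sketched in SQLite via Python's sqlite3 for portability,
not against the Postgres schema itself):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute('CREATE TABLE "User" (id TEXT PRIMARY KEY)')
conn.execute(
    'CREATE TABLE "Thread" ('
    '  id TEXT PRIMARY KEY,'
    '  "userId" TEXT REFERENCES "User"(id) ON UPDATE CASCADE ON DELETE SET NULL)'
)
conn.execute("""INSERT INTO "User" VALUES ('u1')""")
conn.execute("""INSERT INTO "Thread" VALUES ('t1', 'u1')""")

# Deleting the user keeps the thread but nulls its userId, mirroring
# the Thread_userId_fkey constraint above.
conn.execute("""DELETE FROM "User" WHERE id = 'u1'""")
row = conn.execute('SELECT id, "userId" FROM "Thread"').fetchone()
print(row)  # ('t1', None)
```

The CASCADE constraints (Element, Step) behave analogously, except the
dependent rows are deleted outright rather than having the reference nulled.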