Marks some things as done

2026-07-01 09:44:19 +00:00 · 2026-06-12 11:38:20 -07:00
parent b2151acfd5
commit 85cd9b657b
6 changed files with 0 additions and 0 deletions
@@ -1,745 +0,0 @@
-# LanceDB Schema Migration Implementation Plan
-
-> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Add a schema versioning and migration system to the LanceDB vector store so that structural column changes can be applied in-place without re-embedding documents, avoiding token costs for users on paid embedding APIs.
-
-**Architecture:** A `schema_version.json` file is written alongside the LanceDB data directory and tracks the current applied version. A `Migration` dataclass registry in `vector_store.py` holds ordered, typed migration steps; each migration is classified as `requires_reembed=True/False`. At index update time, structural-only migrations are applied in-place via LanceDB's `add_columns`/`alter_columns`/`drop_columns` APIs; if any pending migration requires re-embedding, the existing model-mismatch rebuild path is reused.
-
-**Tech Stack:** Python 3.11, lancedb 0.33, pyarrow, pytest, pytest-mock, factory-boy
-
---
-
-## File Map
-
-| File                                          | Change                                                                                                                                |
-| --------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------- |
-| `src/paperless_ai/vector_store.py`            | Add `CURRENT_SCHEMA_VERSION`, `Migration` dataclass, version file helpers, migration methods; modify `_ensure_table` and `drop_table` |
-| `src/paperless_ai/indexing.py`                | Call migration inside `update_llm_index`'s `write_store` block                                                                        |
-| `src/paperless_ai/tests/test_vector_store.py` | New `TestSchemaVersioning` and `TestMigrations` test classes                                                                          |
-| `src/paperless_ai/tests/test_ai_indexing.py`  | Two new integration tests for migration path                                                                                          |
-
---
-
-## Task 1: Schema version file helpers
-
-**Files:**
-
- Modify: `src/paperless_ai/vector_store.py`
- Test: `src/paperless_ai/tests/test_vector_store.py`
-
- [ ] **Step 1: Write the failing tests**
-
-Add a new class at the bottom of `test_vector_store.py`:
-
-```python
-class TestSchemaVersioning:
-    @pytest.fixture
-    def uri(self, tmp_path: Path) -> str:
-        return str(tmp_path / "idx")
-
-    def test_version_file_written_on_table_creation(self, uri: str) -> None:
-        from paperless_ai.vector_store import CURRENT_SCHEMA_VERSION
-
-        store = PaperlessLanceVectorStore(uri=uri)
-        store.add([_node("1-0", "1", "text", 0.1)])
-
-        version_file = Path(uri) / "schema_version.json"
-        assert version_file.exists()
-        assert json.loads(version_file.read_text())["version"] == CURRENT_SCHEMA_VERSION
-
-    def test_stored_schema_version_returns_current_when_file_missing(
-        self, uri: str
-    ) -> None:
-        from paperless_ai.vector_store import CURRENT_SCHEMA_VERSION
-
-        store = PaperlessLanceVectorStore(uri=uri)
-        store.add([_node("1-0", "1", "text", 0.1)])
-        (Path(uri) / "schema_version.json").unlink()
-
-        reopened = PaperlessLanceVectorStore(uri=uri)
-        assert reopened.stored_schema_version() == CURRENT_SCHEMA_VERSION
-
-    def test_stored_schema_version_persists_after_reopen(self, uri: str) -> None:
-        from paperless_ai.vector_store import CURRENT_SCHEMA_VERSION
-
-        PaperlessLanceVectorStore(uri=uri).add([_node("1-0", "1", "text", 0.1)])
-
-        reopened = PaperlessLanceVectorStore(uri=uri)
-        assert reopened.stored_schema_version() == CURRENT_SCHEMA_VERSION
-
-    def test_drop_table_removes_version_file(self, uri: str) -> None:
-        store = PaperlessLanceVectorStore(uri=uri)
-        store.add([_node("1-0", "1", "text", 0.1)])
-        assert (Path(uri) / "schema_version.json").exists()
-
-        store.drop_table()
-        assert not (Path(uri) / "schema_version.json").exists()
-
-    def test_version_file_written_on_upsert_creation(self, uri: str) -> None:
-        from paperless_ai.vector_store import CURRENT_SCHEMA_VERSION
-
-        store = PaperlessLanceVectorStore(uri=uri)
-        store.upsert_document("1", [_node("1-0", "1", "text", 0.1)])
-
-        version_file = Path(uri) / "schema_version.json"
-        assert json.loads(version_file.read_text())["version"] == CURRENT_SCHEMA_VERSION
-```
-
-Add `import json` and `import pytest_mock` to the top of `test_vector_store.py`.
-
- [ ] **Step 2: Run tests to verify they fail**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py::TestSchemaVersioning -v"
-```
-
-Expected: all 5 tests fail with an `ImportError`, `AttributeError`, or `AssertionError`, because `CURRENT_SCHEMA_VERSION`, `stored_schema_version`, and the version-file write don't exist yet.
-
- [ ] **Step 3: Implement the schema version helpers in `vector_store.py`**
-
-After the existing imports and before the `DEFAULT_TABLE_NAME` constant, add:
-
-```python
-import json
-from pathlib import Path
-```
-
-After `DEFAULT_TABLE_NAME = "documents"`, add:
-
-```python
-CURRENT_SCHEMA_VERSION: int = 1
-```
-
-Nothing else goes at module level (for example, after the `ANN_PQ_SUB_VECTORS` constant); the version helpers are methods on the class.
-
-Inside `PaperlessLanceVectorStore`, add these methods after `stored_model_name`:
-
-```python
-@property
-def _schema_version_path(self) -> Path:
-    return Path(self._uri) / "schema_version.json"
-
-def stored_schema_version(self) -> int:
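-    """Return the schema version recorded on disk, or CURRENT_SCHEMA_VERSION if missing.
-
-    A missing file means the table either predates versioning or was just created and
-    the version write hasn't happened yet; treat it conservatively as already current.
-    Note that once CURRENT_SCHEMA_VERSION moves past 1, an unversioned table will
-    therefore skip every migration.
-    """
-    try:
-        return int(json.loads(self._schema_version_path.read_text())["version"])
-    except (FileNotFoundError, KeyError, TypeError, ValueError):
-        # ValueError covers malformed JSON; TypeError covers non-dict or null payloads.
-        return CURRENT_SCHEMA_VERSION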
-
-def _write_schema_version(self, version: int) -> None:
-    self._schema_version_path.parent.mkdir(parents=True, exist_ok=True)
-    self._schema_version_path.write_text(json.dumps({"version": version}))
-```
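-
-For a quick sanity check outside Django, the same read/write round trip can be sketched as free functions. This is an illustration, not code to paste into `vector_store.py`: `read_schema_version` and `write_schema_version` are hypothetical names, and the temp-file-plus-`os.replace` write is an optional hardening assumed here (it is not part of the plan) so that a crash mid-write cannot leave a truncated `schema_version.json` behind.
-
-```python
-import json
-import os
-import tempfile
-from pathlib import Path
-
-VERSION_FILE = "schema_version.json"
-
-
-def read_schema_version(index_dir: Path, default: int) -> int:
-    """Return the recorded version, or `default` if the file is missing or malformed."""
-    try:
-        return int(json.loads((index_dir / VERSION_FILE).read_text())["version"])
-    except (FileNotFoundError, KeyError, TypeError, ValueError):
-        # ValueError also covers json.JSONDecodeError; TypeError covers e.g. a JSON list.
-        return default
-
-
-def write_schema_version(index_dir: Path, version: int) -> None:
-    """Write to a temp file in the same directory, then atomically swap it in."""
-    index_dir.mkdir(parents=True, exist_ok=True)
-    fd, tmp_name = tempfile.mkstemp(dir=index_dir, suffix=".tmp")
-    with os.fdopen(fd, "w") as fh:
-        json.dump({"version": version}, fh)
-    os.replace(tmp_name, index_dir / VERSION_FILE)
-
-
-if __name__ == "__main__":
-    with tempfile.TemporaryDirectory() as tmp:
-        idx = Path(tmp) / "idx"
-        print(read_schema_version(idx, default=1))  # 1 (no file yet)
-        write_schema_version(idx, 3)
-        print(read_schema_version(idx, default=1))  # 3
-```
-
-Creating the temp file inside the index directory matters: `os.replace` is only atomic within one filesystem, and a temp file under the system temp directory could live on a different mount.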
-
-Modify `_ensure_table` to write the version after creating the table. Replace the current method body:
-
-```python
-def _ensure_table(self, rows: list[dict[str, Any]], dim: int) -> bool:
-    if self._table is not None:
-        return False
-    self._table = self._conn.create_table(
-        self._table_name,
-        rows,
-        schema=self._schema(dim, self._embed_model_name),
-    )
-    self._write_schema_version(CURRENT_SCHEMA_VERSION)
-    return True
-```
-
-Modify `drop_table` to also remove the version file:
-
-```python
-def drop_table(self) -> None:
-    if self.table_exists():
-        self._conn.drop_table(self._table_name)
-    self._table = None
-    self._schema_version_path.unlink(missing_ok=True)
-```
-
- [ ] **Step 4: Run tests to verify they pass**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py::TestSchemaVersioning -v"
-```
-
-Expected: all 5 tests pass.
-
- [ ] **Step 5: Verify no regressions**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py -v"
-```
-
-Expected: all existing tests still pass.
-
- [ ] **Step 6: Lint**
-
-```bash
-ruff check src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py
-ruff format src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py
-```
-
-Expected: no errors.
-
- [ ] **Step 7: Commit**
-
-```bash
-git add src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py
-git commit -m "feat(ai): add schema version file tracking to LanceDB vector store"
-```
-
---
-
-## Task 2: Migration dataclass and pending migration detection
-
-**Files:**
-
- Modify: `src/paperless_ai/vector_store.py`
- Test: `src/paperless_ai/tests/test_vector_store.py`
-
- [ ] **Step 1: Write the failing tests**
-
-Add a new class to `test_vector_store.py`:
-
-```python
-class TestMigrationRegistry:
-    @pytest.fixture
-    def uri(self, tmp_path: Path) -> str:
-        return str(tmp_path / "idx")
-
-    def _store_at_version(self, uri: str, version: int) -> PaperlessLanceVectorStore:
-        """Create a store with a table and then fake its on-disk version."""
-        store = PaperlessLanceVectorStore(uri=uri)
-        store.add([_node("1-0", "1", "text", 0.1)])
-        store._write_schema_version(version)
-        return PaperlessLanceVectorStore(uri=uri)  # reopen to pick up written version
-
-    def test_pending_migrations_empty_at_current_version(self, uri: str) -> None:
-        from paperless_ai.vector_store import CURRENT_SCHEMA_VERSION
-
-        store = self._store_at_version(uri, CURRENT_SCHEMA_VERSION)
-        assert store.pending_migrations() == []
-
-    def test_pending_migrations_returns_migrations_above_stored_version(
-        self, uri: str, mocker: pytest_mock.MockerFixture
-    ) -> None:
-        from paperless_ai.vector_store import Migration
-
-        m2 = Migration(version=2, description="add col", requires_reembed=False, apply=lambda t: None)
-        m3 = Migration(version=3, description="reindex", requires_reembed=True, apply=lambda t: None)
-        mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2, m3])
-
-        store = self._store_at_version(uri, 1)
-        pending = store.pending_migrations()
-        assert pending == [m2, m3]
-
-    def test_pending_migrations_excludes_already_applied(
-        self, uri: str, mocker: pytest_mock.MockerFixture
-    ) -> None:
-        from paperless_ai.vector_store import Migration
-
-        m2 = Migration(version=2, description="add col", requires_reembed=False, apply=lambda t: None)
-        m3 = Migration(version=3, description="reindex", requires_reembed=True, apply=lambda t: None)
-        mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2, m3])
-
-        store = self._store_at_version(uri, 2)
-        pending = store.pending_migrations()
-        assert pending == [m3]
-
-    def test_pending_migrations_empty_when_no_table(self, uri: str) -> None:
-        store = PaperlessLanceVectorStore(uri=uri)
-        assert store.pending_migrations() == []
-
-    def test_requires_reembed_migration_false_when_none_pending(self, uri: str) -> None:
-        store = self._store_at_version(uri, 1)
-        assert store.requires_reembed_migration() is False
-
-    def test_requires_reembed_migration_false_when_only_structural_pending(
-        self, uri: str, mocker: pytest_mock.MockerFixture
-    ) -> None:
-        from paperless_ai.vector_store import Migration
-
-        m2 = Migration(version=2, description="add col", requires_reembed=False, apply=lambda t: None)
-        mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2])
-
-        store = self._store_at_version(uri, 1)
-        assert store.requires_reembed_migration() is False
-
-    def test_requires_reembed_migration_true_when_reembed_migration_pending(
-        self, uri: str, mocker: pytest_mock.MockerFixture
-    ) -> None:
-        from paperless_ai.vector_store import Migration
-
-        m2 = Migration(version=2, description="reindex", requires_reembed=True, apply=lambda t: None)
-        mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2])
-
-        store = self._store_at_version(uri, 1)
-        assert store.requires_reembed_migration() is True
-```
-
- [ ] **Step 2: Run tests to verify they fail**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py::TestMigrationRegistry -v"
-```
-
-Expected: all 7 tests fail — `Migration`, `MIGRATIONS`, `pending_migrations`, `requires_reembed_migration` don't exist yet.
-
- [ ] **Step 3: Add `Migration` dataclass and registry to `vector_store.py`**
-
-Add near the top of the file, after the existing imports:
-
-```python
-from collections.abc import Callable
-from dataclasses import dataclass, field
-```
-
-After the `CURRENT_SCHEMA_VERSION` constant, add:
-
-```python
-@dataclass(frozen=True)
-class Migration:
-    version: int
-    description: str
-    requires_reembed: bool
-    apply: Callable[[Any], None] = field(compare=False, hash=False)
-```
-
-(`compare=False, hash=False` excludes `apply` from `__eq__` and `__hash__`, so equality and hashing depend only on `version`, `description`, and `requires_reembed`. Two instances built with different lambdas therefore compare equal, which avoids lambda-identity surprises in tests and makes the type safe for callers that construct `Migration` instances inline.)
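-
-A standalone check of those semantics (the class is copied here rather than imported, so it runs without the project installed; this snippet is illustrative and does not belong in `vector_store.py`):
-
-```python
-from collections.abc import Callable
-from dataclasses import dataclass, field
-from typing import Any
-
-
-@dataclass(frozen=True)
-class Migration:
-    version: int
-    description: str
-    requires_reembed: bool
-    # Excluded from __eq__ and __hash__: the callable's identity never matters.
-    apply: Callable[[Any], None] = field(compare=False, hash=False)
-
-
-a = Migration(2, "add col", False, apply=lambda t: None)
-b = Migration(2, "add col", False, apply=lambda t: None)  # a different lambda object
-c = Migration(2, "add column", False, apply=lambda t: None)
-
-print(a == b, hash(a) == hash(b))  # True True
-print(a == c)  # False: description still takes part in equality
-```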
-
-```python
-# Ordered list of schema migrations. Each entry upgrades the table to `version`.
-# Structural migrations (requires_reembed=False) are applied in-place via LanceDB's
-# add_columns/alter_columns/drop_columns APIs; no re-embedding is needed.
-# Migrations with requires_reembed=True cause a full rebuild on the next index update,
-# exactly like a model-name change does today.
-#
-# To add a migration:
-# 1. Increment CURRENT_SCHEMA_VERSION.
-# 2. Append a Migration entry here with the new version number.
-# 3. For structural changes, call table.add_columns/alter_columns/drop_columns in apply().
-# 4. For embedding-invalidating changes, set requires_reembed=True; apply() can be a no-op.
-MIGRATIONS: list[Migration] = []
-```
-
-Inside `PaperlessLanceVectorStore`, add these two methods after `_write_schema_version`:
-
-```python
-def pending_migrations(self) -> list[Migration]:
-    """Return migrations not yet applied to this table, in version order."""
-    if self._table is None:
-        return []
-    current = self.stored_schema_version()
-    return sorted((m for m in MIGRATIONS if m.version > current), key=lambda m: m.version)
-
-def requires_reembed_migration(self) -> bool:
-    """True when any pending migration requires a full re-embedding."""
-    return any(m.requires_reembed for m in self.pending_migrations())
-```
-
- [ ] **Step 4: Run tests to verify they pass**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py::TestMigrationRegistry -v"
-```
-
-Expected: all 7 tests pass.
-
- [ ] **Step 5: Lint**
-
-```bash
-ruff check src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py
-ruff format src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py
-```
-
- [ ] **Step 6: Commit**
-
-```bash
-git add src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py
-git commit -m "feat(ai): add Migration registry and pending migration detection"
-```
-
---
-
-## Task 3: Apply structural migrations in-place
-
-**Files:**
-
- Modify: `src/paperless_ai/vector_store.py`
- Test: `src/paperless_ai/tests/test_vector_store.py`
-
- [ ] **Step 1: Write the failing tests**
-
-Add a new class to `test_vector_store.py`:
-
-```python
-class TestApplyStructuralMigrations:
-    @pytest.fixture
-    def uri(self, tmp_path: Path) -> str:
-        return str(tmp_path / "idx")
-
-    def _store_at_version(self, uri: str, version: int) -> PaperlessLanceVectorStore:
-        store = PaperlessLanceVectorStore(uri=uri)
-        store.add([_node("1-0", "1", "text", 0.1)])
-        store._write_schema_version(version)
-        return PaperlessLanceVectorStore(uri=uri)
-
-    def test_apply_structural_adds_column_via_lancedb(
-        self, uri: str, mocker: pytest_mock.MockerFixture
-    ) -> None:
-        from paperless_ai.vector_store import Migration
-
-        def _add_extra(table: Any) -> None:
-            table.add_columns({"extra": "CAST(NULL AS VARCHAR)"})
-
-        m2 = Migration(version=2, description="add extra col", requires_reembed=False, apply=_add_extra)
-        mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2])
-
-        store = self._store_at_version(uri, 1)
-        applied = store.apply_structural_migrations()
-
-        assert len(applied) == 1
-        assert applied[0] == m2
-        # Column actually present in the table schema.
-        reopened = PaperlessLanceVectorStore(uri=uri)
-        field_names = [f.name for f in reopened._table.schema]
-        assert "extra" in field_names
-
-    def test_apply_structural_updates_version_file(
-        self, uri: str, mocker: pytest_mock.MockerFixture
-    ) -> None:
-        from paperless_ai.vector_store import Migration
-
-        m2 = Migration(version=2, description="add col", requires_reembed=False, apply=lambda t: t.add_columns({"c": "CAST(NULL AS VARCHAR)"}))
-        mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2])
-
-        store = self._store_at_version(uri, 1)
-        store.apply_structural_migrations()
-
-        assert store.stored_schema_version() == 2
-
-    def test_apply_structural_skips_reembed_migrations(
-        self, uri: str, mocker: pytest_mock.MockerFixture
-    ) -> None:
-        from paperless_ai.vector_store import Migration
-
-        applied_versions: list[int] = []
-        m2 = Migration(version=2, description="structural", requires_reembed=False, apply=lambda t: applied_versions.append(2) or t.add_columns({"c": "CAST(NULL AS VARCHAR)"}))
-        m3 = Migration(version=3, description="reembed", requires_reembed=True, apply=lambda t: applied_versions.append(3))
-        mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2, m3])
-
-        store = self._store_at_version(uri, 1)
-        applied = store.apply_structural_migrations()
-
-        assert [m.version for m in applied] == [2]
-        assert 3 not in applied_versions
-        # Version advances only to the last structural migration applied.
-        assert store.stored_schema_version() == 2
-
-    def test_apply_structural_noop_at_current_version(self, uri: str) -> None:
-        store = self._store_at_version(uri, 1)
-        applied = store.apply_structural_migrations()
-        assert applied == []
-
-    def test_apply_structural_noop_when_no_table(self, uri: str) -> None:
-        store = PaperlessLanceVectorStore(uri=uri)
-        applied = store.apply_structural_migrations()
-        assert applied == []
-
-    def test_apply_structural_refreshes_table_reference(
-        self, uri: str, mocker: pytest_mock.MockerFixture
-    ) -> None:
-        """After add_columns the in-memory table object must reflect the new schema."""
-        from paperless_ai.vector_store import Migration
-
-        m2 = Migration(version=2, description="add col", requires_reembed=False, apply=lambda t: t.add_columns({"extra": "CAST(NULL AS VARCHAR)"}))
-        mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2])
-
-        store = self._store_at_version(uri, 1)
-        store.apply_structural_migrations()
-
-        # The store's own _table reference (not a re-open) must see the new column.
-        field_names = [f.name for f in store._table.schema]
-        assert "extra" in field_names
-```
-
-Add `from typing import Any` to the test file imports if not already present.
-
- [ ] **Step 2: Run tests to verify they fail**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py::TestApplyStructuralMigrations -v"
-```
-
-Expected: all 6 tests fail — `apply_structural_migrations` doesn't exist yet.
-
- [ ] **Step 3: Implement `apply_structural_migrations` in `vector_store.py`**
-
-Add after `requires_reembed_migration` on the class:
-
-```python
-def apply_structural_migrations(self) -> list[Migration]:
-    """Apply pending structural (non-reembed) migrations in version order.
-
-    Each migration's ``apply`` callable receives the live LanceDB table object and
-    should call ``add_columns``, ``alter_columns``, or ``drop_columns`` as needed.
-    The version file is advanced after every successful step, so an interrupted run
-    never replays a migration that has already changed the table.
-
-    Application stops at the first pending migration with ``requires_reembed=True``:
-    running a later structural step would advance the stored version past the
-    re-embed step and hide it from ``requires_reembed_migration()``. The caller is
-    responsible for detecting that case and triggering a full rebuild.
-    """
-    if self._table is None:
-        return []
-    applied: list[Migration] = []
-    for migration in self.pending_migrations():
-        if migration.requires_reembed:
-            break
-        logger.info("Applying schema migration v%d: %s", migration.version, migration.description)
-        migration.apply(self._table)
-        self._write_schema_version(migration.version)
-        applied.append(migration)
-    if applied:
-        # Refresh the in-memory table so subsequent operations see the new schema.
-        self._table = self._conn.open_table(self._table_name)
-    return applied
-```
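-
-The ordering of structural versus re-embed steps is easy to get wrong, so here is a LanceDB-free sketch of it. A plain `dict` of column lists stands in for the table, `apply_structural` and `add_column` are hypothetical helpers, and the loop stops at the first pending re-embed migration: applying a structural step that comes *after* a re-embed step would advance the stored version past it and hide it from `requires_reembed_migration()`.
-
-```python
-from collections.abc import Callable
-from dataclasses import dataclass, field
-
-Table = dict[str, list]  # stand-in for a LanceDB table: column name -> values
-
-
-@dataclass(frozen=True)
-class Migration:
-    version: int
-    description: str
-    requires_reembed: bool
-    apply: Callable[[Table], None] = field(compare=False, hash=False)
-
-
-def apply_structural(
-    table: Table, stored_version: int, migrations: list[Migration]
-) -> tuple[list[Migration], int]:
-    """Apply pending structural migrations in version order.
-
-    Stops at the first pending re-embed migration. Returns the migrations
-    applied and the version that should now be recorded on disk.
-    """
-    applied: list[Migration] = []
-    version = stored_version
-    for migration in sorted(migrations, key=lambda mig: mig.version):
-        if migration.version <= version:
-            continue  # already applied
-        if migration.requires_reembed:
-            break  # this and everything after it wait for the full rebuild
-        migration.apply(table)
-        version = migration.version  # advance after each step, not only at the end
-        applied.append(migration)
-    return applied, version
-
-
-def add_column(name: str) -> Callable[[Table], None]:
-    """Return an apply() that adds an all-null column, like CAST(NULL AS ...)."""
-
-    def _apply(table: Table) -> None:
-        table[name] = [None] * len(table["id"])
-
-    return _apply
-
-
-registry = [
-    Migration(2, "add language", False, add_column("language")),
-    Migration(3, "change chunking", True, lambda table: None),
-    Migration(4, "add source", False, add_column("source")),
-]
-table: Table = {"id": ["1-0", "1-1"]}
-applied, version = apply_structural(table, stored_version=1, migrations=registry)
-print([m.version for m in applied], version, sorted(table))  # [2] 2 ['id', 'language']
-```
-
-Running it again from `stored_version=2` applies nothing, because v3 still blocks. After the full rebuild stamps the table with the current version, v4 is already covered by the freshly created schema.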
-
- [ ] **Step 4: Run tests to verify they pass**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py::TestApplyStructuralMigrations -v"
-```
-
-Expected: all 6 tests pass.
-
- [ ] **Step 5: Full test_vector_store regression check**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py -v"
-```
-
-Expected: all tests pass.
-
- [ ] **Step 6: Lint**
-
-```bash
-ruff check src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py
-ruff format src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py
-```
-
- [ ] **Step 7: Commit**
-
-```bash
-git add src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py
-git commit -m "feat(ai): implement apply_structural_migrations for in-place schema changes"
-```
-
---
-
-## Task 4: Wire migrations into `update_llm_index`
-
-**Files:**
-
- Modify: `src/paperless_ai/indexing.py`
- Test: `src/paperless_ai/tests/test_ai_indexing.py`
-
- [ ] **Step 1: Write the failing tests**
-
-Add these two tests to `test_ai_indexing.py`, after the existing `test_update_llm_index_rebuilds_on_model_name_change` test:
-
-```python
-@pytest.mark.django_db
-def test_update_llm_index_applies_structural_migration_without_rebuild(
-    temp_llm_index_dir: Path,
-    real_document: Document,
-    mock_embed_model: FakeEmbedding,
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    """Structural migrations are applied in-place; no full rebuild (drop) occurs."""
-    from paperless_ai.vector_store import Migration, PaperlessLanceVectorStore
-
-    column_added: list[bool] = []
-
-    def _add_extra(table) -> None:
-        table.add_columns({"extra": "CAST(NULL AS VARCHAR)"})
-        column_added.append(True)
-
-    # Build the initial index at version 1 (the real CURRENT_SCHEMA_VERSION; no patches needed).
-    with patch("documents.models.Document.objects.all") as mock_all:
-        mock_queryset = MagicMock()
-        mock_queryset.exists.return_value = True
-        mock_queryset.__iter__.return_value = iter([real_document])
-        mock_all.return_value = mock_queryset
-        indexing.update_llm_index(rebuild=True)
-
-    # Simulate a new v2 structural migration being introduced after the initial index was built.
-    m2 = Migration(version=2, description="add extra col", requires_reembed=False, apply=_add_extra)
-    mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2])
-    mocker.patch("paperless_ai.vector_store.CURRENT_SCHEMA_VERSION", 2)
-    drop_spy = mocker.spy(PaperlessLanceVectorStore, "drop_table")
-
-    with patch("documents.models.Document.objects.all") as mock_all:
-        mock_queryset = MagicMock()
-        mock_queryset.exists.return_value = True
-        mock_queryset.__iter__.return_value = iter([real_document])
-        mock_all.return_value = mock_queryset
-        indexing.update_llm_index(rebuild=False)
-
-    assert column_added, "Structural migration apply() was not called"
-    drop_spy.assert_not_called()
-
-
-@pytest.mark.django_db
-def test_update_llm_index_forces_rebuild_on_reembed_migration(
-    temp_llm_index_dir: Path,
-    real_document: Document,
-    mock_embed_model: FakeEmbedding,
-    mocker: pytest_mock.MockerFixture,
-) -> None:
-    """A pending reembed migration causes a full drop+rebuild on next update."""
-    from paperless_ai.vector_store import Migration, PaperlessLanceVectorStore
-
-    # Build the initial index at version 1 (the real CURRENT_SCHEMA_VERSION; no patches needed).
-    with patch("documents.models.Document.objects.all") as mock_all:
-        mock_queryset = MagicMock()
-        mock_queryset.exists.return_value = True
-        mock_queryset.__iter__.return_value = iter([real_document])
-        mock_all.return_value = mock_queryset
-        indexing.update_llm_index(rebuild=True)
-
-    # Simulate a reembed migration at v2 being introduced after the initial index was built.
-    m2 = Migration(version=2, description="requires reembed", requires_reembed=True, apply=lambda t: None)
-    mocker.patch("paperless_ai.vector_store.MIGRATIONS", [m2])
-    mocker.patch("paperless_ai.vector_store.CURRENT_SCHEMA_VERSION", 2)
-    drop_spy = mocker.spy(PaperlessLanceVectorStore, "drop_table")
-
-    with patch("documents.models.Document.objects.all") as mock_all:
-        mock_queryset = MagicMock()
-        mock_queryset.exists.return_value = True
-        mock_queryset.__iter__.return_value = iter([real_document])
-        mock_all.return_value = mock_queryset
-        indexing.update_llm_index(rebuild=False)
-
-    drop_spy.assert_called()
-```
-
- [ ] **Step 2: Run tests to verify they fail**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_ai_indexing.py::test_update_llm_index_applies_structural_migration_without_rebuild src/paperless_ai/tests/test_ai_indexing.py::test_update_llm_index_forces_rebuild_on_reembed_migration -v"
-```
-
-Expected: both tests fail because `update_llm_index` doesn't call migration methods yet.
-
- [ ] **Step 3: Add migration check inside `update_llm_index` in `indexing.py`**
-
-Inside the `with write_store(embed_model_name=model_name) as store:` block in `update_llm_index`, insert the migration check immediately before the `if rebuild or not store.table_exists():` line:
-
-```python
-        if not rebuild and store.table_exists():
-            store.apply_structural_migrations()
-            if store.requires_reembed_migration():
-                logger.warning("Schema migration requires re-embedding; forcing LLM index rebuild.")
-                rebuild = True
-```
-
-The relevant section of `update_llm_index` should now look like:
-
-```python
-    with write_store(embed_model_name=model_name) as store:
-        if not rebuild and store.table_exists():
-            store.apply_structural_migrations()
-            if store.requires_reembed_migration():
-                logger.warning("Schema migration requires re-embedding; forcing LLM index rebuild.")
-                rebuild = True
-        if rebuild or not store.table_exists():
-            (settings.LLM_INDEX_DIR / "meta.json").unlink(missing_ok=True)
-            logger.info("Rebuilding LLM index.")
-            store.drop_table()
-            ...
-```
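-
-Stripped of I/O, the wiring above is a small decision table. The sketch below is a hypothetical pure function (`decide_index_action` and `IndexAction` are illustrative names, not part of `indexing.py`) and leaves out the other rebuild triggers `update_llm_index` already has, such as an embedding-model change:
-
-```python
-from enum import Enum
-
-
-class IndexAction(Enum):
-    UPDATE = "update"  # nothing pending: incremental update only
-    MIGRATE_THEN_UPDATE = "migrate_then_update"  # structural steps in place, then update
-    REBUILD = "rebuild"  # drop the table and re-embed every document
-
-
-def decide_index_action(
-    rebuild_requested: bool,
-    table_exists: bool,
-    structural_pending: bool,
-    reembed_pending: bool,
-) -> IndexAction:
-    """Summarise the branches in update_llm_index's write_store block."""
-    if rebuild_requested or not table_exists:
-        return IndexAction.REBUILD
-    if reembed_pending:
-        # Structural steps ahead of the re-embed step may run first, but the
-        # table is dropped and rebuilt either way.
-        return IndexAction.REBUILD
-    if structural_pending:
-        return IndexAction.MIGRATE_THEN_UPDATE
-    return IndexAction.UPDATE
-
-
-# Existing index, one pending add-column migration: no tokens spent on re-embedding.
-print(decide_index_action(False, True, True, False).value)  # migrate_then_update
-```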
-
- [ ] **Step 4: Run new tests to verify they pass**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_ai_indexing.py::test_update_llm_index_applies_structural_migration_without_rebuild src/paperless_ai/tests/test_ai_indexing.py::test_update_llm_index_forces_rebuild_on_reembed_migration -v"
-```
-
-Expected: both tests pass.
-
- [ ] **Step 5: Full indexing regression check**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_ai_indexing.py -v"
-```
-
-Expected: all existing tests still pass.
-
- [ ] **Step 6: Full AI module test run**
-
-```bash
-bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/ -v"
-```
-
-Expected: all tests pass.
-
- [ ] **Step 7: Lint**
-
-```bash
-ruff check src/paperless_ai/indexing.py src/paperless_ai/tests/test_ai_indexing.py
-ruff format src/paperless_ai/indexing.py src/paperless_ai/tests/test_ai_indexing.py
-```
-
- [ ] **Step 8: Commit**
-
-```bash
-git add src/paperless_ai/indexing.py src/paperless_ai/tests/test_ai_indexing.py
-git commit -m "feat(ai): wire schema migrations into update_llm_index; structural changes avoid re-embed"
-```
-
---
-
-## How to add a migration (reference for future developers)
-
-When a future schema change is needed:
-
-1. Increment `CURRENT_SCHEMA_VERSION` in `vector_store.py`.
-2. Append a `Migration` to `MIGRATIONS` with the new version number.
-3. If the change is **structural only** (add/rename/drop a column, no embedding content changed):
-   - Set `requires_reembed=False`
-   - In `apply`, call `table.add_columns({"col": "CAST(NULL AS string)"})`, `table.drop_columns(["col"])`, or `table.alter_columns({"path": "col", "rename": "new_name"})` as appropriate.
-4. If the change affects **what text gets embedded** (new fields in `build_llm_index_text`, chunk size change baked into schema, etc.):
-   - Set `requires_reembed=True`
-   - `apply` can be a no-op (`lambda t: None`) — the framework will trigger a full rebuild.
-5. Write tests for the migration in `test_vector_store.py` following the `TestApplyStructuralMigrations` patterns.
-
-Example structural migration adding a `language` column:
-
-```python
-CURRENT_SCHEMA_VERSION: int = 2
-
-MIGRATIONS: list[Migration] = [
-    Migration(
-        version=2,
-        description="Add language column for future locale-aware filtering",
-        requires_reembed=False,
-        apply=lambda table: table.add_columns({"language": "CAST(NULL AS string)"}),
-    ),
-]
-```
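-
-Nothing in the framework stops a mis-numbered registry. Forgetting step 1 stamps new tables with the old version, so they can later receive a migration their schema already includes; forgetting step 2 leaves older tables unmigrated. A small invariant check, run as a unit test against `MIGRATIONS` and `CURRENT_SCHEMA_VERSION`, catches both. `check_registry` and `FakeMigration` are hypothetical helpers, and the check assumes version 1 is the unversioned baseline, as in this plan:
-
-```python
-from collections.abc import Sequence
-from dataclasses import dataclass
-from typing import Protocol
-
-
-class HasVersion(Protocol):
-    @property
-    def version(self) -> int: ...
-
-
-def check_registry(
-    migrations: Sequence[HasVersion], current: int, baseline: int = 1
-) -> list[str]:
-    """Return human-readable problems; an empty list means the registry is consistent."""
-    versions = [m.version for m in migrations]
-    expected = list(range(baseline + 1, current + 1))
-    if versions == expected:
-        return []
-    problems = [f"expected versions {expected}, got {versions}"]
-    if len(set(versions)) != len(versions):
-        problems.append("duplicate version numbers")
-    if versions != sorted(versions):
-        problems.append("migrations are not in ascending order")
-    if versions and versions[-1] != current:
-        problems.append(
-            f"last migration is v{versions[-1]} but CURRENT_SCHEMA_VERSION is {current}"
-        )
-    return problems
-
-
-@dataclass(frozen=True)
-class FakeMigration:  # stand-in: only `version` matters for this check
-    version: int
-
-
-print(check_registry([], current=1))  # []
-print(check_registry([FakeMigration(2)], current=2))  # []
-```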
@@ -1,446 +0,0 @@
-# Node Metadata Enrichment Implementation Plan
-
-> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.
-
-**Goal:** Move `filename`, `storage_path`, and `archive_serial_number` from the LanceDB embedding text into `node.metadata`, and register a schema migration that triggers an automatic index rebuild on upgrade.
-
-**Architecture:** Three small, independent changes to two source files, tested first. The migration is a no-op `apply` (the rebuild regenerates all nodes with correct metadata). All three tests go red first, then each implementation makes them green.
-
-**Tech Stack:** pytest, pytest-django, pytest-mock, factory_boy, llama_index `MetadataMode`.
-
-**Branch base:** `feature-lancedb-schema-migrate`
-
---
-
-### Task 1: Fail — embedding text no longer contains the three fields
-
-**Files:**
-
- Modify: `src/paperless_ai/tests/test_embedding.py`
-
- [ ] **Step 1: Update `mock_document` fixture to set an explicit `storage_path`**
-
-  The fixture currently doesn't set `storage_path`, so `doc.storage_path` is an auto-created `MagicMock`. Mocks are truthy, so the existing code path (`doc.storage_path.name if doc.storage_path else ''`) renders a `<MagicMock ...>` repr instead of a real path. Give it an explicit value so the assertions are unambiguous.
-
-  Add these two lines to the `mock_document` fixture after `doc.archive_serial_number = "12345"`:
-
-  ```python
-  doc.storage_path = MagicMock()
-  doc.storage_path.name = "Finance/Bills"
-  ```
-
- [ ] **Step 2: Update `test_build_llm_index_text` — flip and add assertions**
-
-  The existing test asserts that the filename line is present. Invert that assertion and add matching negative assertions for the storage path and archive serial number:
-
-  ```python
-  # was: assert "Filename: test_file.pdf" in result
-  assert "Filename: test_file.pdf" not in result
-  assert "Storage Path: Finance/Bills" not in result
-  assert "Archive Serial Number: 12345" not in result
-  ```
-
-  The assertions for `Notes`, `Content`, and `Custom Field` lines are unchanged — leave them as-is.
-
- [ ] **Step 3: Run the test to confirm it fails**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_embedding.py::test_build_llm_index_text -v"
-  ```
-
-  Expected: `FAILED` — `AssertionError: assert 'Filename: test_file.pdf' not in '...'`
-
---
-
-### Task 2: Pass — remove the three fields from `build_llm_index_text`
-
-**Files:**
-
- Modify: `src/paperless_ai/embedding.py`
-
- [ ] **Step 1: Remove the three lines and the TODO comment**
-
-  The current `build_llm_index_text` spans lines 114–133 of `embedding.py`. Replace the entire function with:
-
-  ```python
-  def build_llm_index_text(doc: Document) -> str:
-      lines = [
-          f"Notes: {','.join([str(c.note) for c in Note.objects.filter(document=doc)])}",
-      ]
-
-      for instance in doc.custom_fields.all():
-          lines.append(f"Custom Field - {instance.field.name}: {instance}")
-
-      lines.append("\nContent:\n")
-      lines.append(doc.content or "")
-
-      return _normalize_llm_index_text("\n".join(lines))
-  ```
-
- [ ] **Step 2: Run the test to confirm it passes**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_embedding.py::test_build_llm_index_text -v"
-  ```
-
-  Expected: `PASSED`
-
- [ ] **Step 3: Run the full embedding test module to catch regressions**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_embedding.py -v"
-  ```
-
-  Expected: all green.
-
- [ ] **Step 4: Commit**
-
-  ```bash
-  git add src/paperless_ai/embedding.py src/paperless_ai/tests/test_embedding.py
-  git commit -m "refactor(ai): remove filename/storage_path/asn from embedding text"
-  ```
-
---
-
-### Task 3: Fail — `build_document_node` exposes the three fields in metadata
-
-**Files:**
-
- Modify: `src/paperless_ai/tests/test_ai_indexing.py`
-
- [ ] **Step 1: Extend `test_build_document_node_structured_fields_in_metadata`**
-
-  This test already checks `title`, `tags`, and the other structured fields. Add the three new keys. The `real_document` fixture creates a document without a storage path, so `storage_path` will be `None`, but the key must still be present.
-
-  Replace the existing test with:
-
-  ```python
-  @pytest.mark.django_db
-  def test_build_document_node_structured_fields_in_metadata(
-      real_document: Document,
-  ) -> None:
-      """Structured fields must be in node.metadata so the LLM receives them via metadata prepend."""
-      nodes = indexing.build_document_node(real_document)
-      assert len(nodes) > 0
-      for node in nodes:
-          assert "title" in node.metadata
-          assert "tags" in node.metadata
-          assert "correspondent" in node.metadata
-          assert "document_type" in node.metadata
-          assert "created" in node.metadata
-          assert "added" in node.metadata
-          assert "modified" in node.metadata
-          assert "filename" in node.metadata
-          assert "storage_path" in node.metadata        # None is fine; key must exist
-          assert "archive_serial_number" in node.metadata
-  ```
-
- [ ] **Step 2: Add a test that storage_path carries the name when set**
-
-  Add a new test function after `test_build_document_node_structured_fields_in_metadata`:
-
-  ```python
-  @pytest.mark.django_db
-  def test_build_document_node_storage_path_name_in_metadata() -> None:
-      """storage_path metadata value is the StoragePath name, not None, when set."""
-      from documents.tests.factories import DocumentFactory, StoragePathFactory
-
-      sp = StoragePathFactory(name="Finance/Bills")
-      doc = DocumentFactory(storage_path=sp)
-
-      nodes = indexing.build_document_node(doc)
-
-      assert len(nodes) > 0
-      for node in nodes:
-          assert node.metadata["storage_path"] == "Finance/Bills"
-  ```
-
- [ ] **Step 3: Add a test that all three new fields are in `excluded_embed_metadata_keys`**
-
-  Add after the previous test:
-
-  ```python
-  @pytest.mark.django_db
-  def test_build_document_node_new_fields_excluded_from_embedding(
-      real_document: Document,
-  ) -> None:
-      """filename, storage_path, and archive_serial_number must not appear in embedding text."""
-      from llama_index.core.schema import MetadataMode
-
-      nodes = indexing.build_document_node(real_document)
-      assert len(nodes) > 0
-      for node in nodes:
-          assert "filename" in node.excluded_embed_metadata_keys
-          assert "storage_path" in node.excluded_embed_metadata_keys
-          assert "archive_serial_number" in node.excluded_embed_metadata_keys
-          embed_text = node.get_content(metadata_mode=MetadataMode.EMBED)
-          assert "filename" not in embed_text
-          assert "storage_path" not in embed_text
-          assert "archive_serial_number" not in embed_text
-  ```
-
- [ ] **Step 4: Run the new tests to confirm they fail**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_ai_indexing.py::test_build_document_node_structured_fields_in_metadata src/paperless_ai/tests/test_ai_indexing.py::test_build_document_node_storage_path_name_in_metadata src/paperless_ai/tests/test_ai_indexing.py::test_build_document_node_new_fields_excluded_from_embedding -v"
-  ```
-
-  Expected: all `FAILED` — keys not yet in `node.metadata`.
-
---
-
-### Task 4: Pass — add the three fields to `build_document_node`
-
-**Files:**
-
- Modify: `src/paperless_ai/indexing.py`
-
- [ ] **Step 1: Update the `metadata` dict in `build_document_node`**
-
-  The metadata dict currently starts at line 106 of `indexing.py`. Replace it with:
-
-  ```python
-  metadata = {
-      "document_id": str(document.id),
-      "title": document.title,
-      "filename": document.filename or "",
-      "storage_path": document.storage_path.name if document.storage_path else None,
-      "archive_serial_number": document.archive_serial_number,
-      "tags": [t.name for t in document.tags.all()],
-      "correspondent": document.correspondent.name
-      if document.correspondent
-      else None,
-      "document_type": document.document_type.name
-      if document.document_type
-      else None,
-      "created": document.created.isoformat() if document.created else None,
-      "added": document.added.isoformat() if document.added else None,
-      "modified": document.modified.isoformat(),
-  }
-  ```
-
- [ ] **Step 2: Update `excluded_embed_metadata_keys`**
-
-  The `LlamaDocument(...)` call currently has:
-
-  ```python
-  excluded_embed_metadata_keys=list(metadata.keys()),
-  ```
-
-  Because this list is built from the dict's keys, it already excludes every metadata key from the embedding text, including the three new ones, so no change is needed here. Verify that `excluded_llm_metadata_keys` still excludes only `"document_id"`:
-
-  ```python
-  excluded_llm_metadata_keys=["document_id"],
-  ```
-
-  No change needed.
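The exclusion lists work because one metadata dict is rendered two different ways. Below is a minimal, self-contained model of that split. It assumes llama_index's default `key: value` metadata lines followed by a blank line before the content. `Mode` and `render_for` are hypothetical stand-ins for `MetadataMode` and `get_content`, not the library's implementation:

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical stand-in for llama_index's MetadataMode (EMBED vs LLM)."""
    EMBED = "embed"
    LLM = "llm"


def render_for(
    mode: Mode,
    text: str,
    metadata: dict[str, object],
    excluded_embed: list[str],
    excluded_llm: list[str],
) -> str:
    """Prepend non-excluded metadata as 'key: value' lines, then the body text."""
    excluded = set(excluded_embed if mode is Mode.EMBED else excluded_llm)
    header = "\n".join(f"{k}: {v}" for k, v in metadata.items() if k not in excluded)
    # With every key excluded there is no header, so only the raw text remains.
    return f"{header}\n\n{text}" if header else text


metadata = {
    "document_id": "42",
    "title": "Electricity bill",
    "filename": "bill.pdf",
    "storage_path": "Finance/Bills",
    "archive_serial_number": 12345,
}
body = "Content:\nAmount due: 80 EUR"

# Same call shape as build_document_node: embed excludes all keys, LLM only document_id.
embed_text = render_for(Mode.EMBED, body, metadata, list(metadata), ["document_id"])
llm_text = render_for(Mode.LLM, body, metadata, list(metadata), ["document_id"])
```

In this model `embed_text` is exactly `body`, while `llm_text` starts with `title: Electricity bill` and includes `filename: bill.pdf` but has no `document_id` line. That is the behavior the Task 3 assertions check against the real nodes.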
-
- [ ] **Step 3: Run the failing tests to confirm they pass**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_ai_indexing.py::test_build_document_node_structured_fields_in_metadata src/paperless_ai/tests/test_ai_indexing.py::test_build_document_node_storage_path_name_in_metadata src/paperless_ai/tests/test_ai_indexing.py::test_build_document_node_new_fields_excluded_from_embedding -v"
-  ```
-
-  Expected: all `PASSED`.
-
- [ ] **Step 4: Run the full indexing test module**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_ai_indexing.py -v"
-  ```
-
-  Expected: all green.
-
- [ ] **Step 5: Commit**
-
-  ```bash
-  git add src/paperless_ai/indexing.py src/paperless_ai/tests/test_ai_indexing.py
-  git commit -m "feat(ai): add filename/storage_path/asn to node metadata"
-  ```
-
---
-
-### Task 5: Fail — migration v2 is registered
-
-**Files:**
-
- Modify: `src/paperless_ai/tests/test_vector_store.py`
-
-These tests use the real (non-mocked) `MIGRATIONS` list, so they go red until the migration is registered in Task 6.
-
- [ ] **Step 1: Add a `TestMetadataEnrichmentMigration` class**
-
-  Add this class near the end of `test_vector_store.py`, before the final `TestApplyStructuralMigrations`:
-
-  ```python
-  class TestMetadataEnrichmentMigration:
-      def test_current_schema_version_is_2(self) -> None:
-          from paperless_ai.vector_store import CURRENT_SCHEMA_VERSION
-          assert CURRENT_SCHEMA_VERSION == 2
-
-      def test_migration_v2_registered(self) -> None:
-          from paperless_ai.vector_store import MIGRATIONS
-          assert len(MIGRATIONS) == 1
-          assert MIGRATIONS[0].version == 2
-          assert MIGRATIONS[0].requires_reembed is True
-
-      def test_store_at_v1_requires_reembed(self, uri: str) -> None:
-          store = _store_at_version(uri, 1)
-          assert store.requires_reembed_migration() is True
-
-      def test_store_at_v2_no_pending_migrations(self, uri: str) -> None:
-          store = _store_at_version(uri, 2)
-          assert store.pending_migrations() == []
-          assert store.requires_reembed_migration() is False
-  ```
-
- [ ] **Step 2: Run the tests to confirm they fail**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py::TestMetadataEnrichmentMigration -v"
-  ```
-
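-  Expected: `test_current_schema_version_is_2`, `test_migration_v2_registered`, and `test_store_at_v1_requires_reembed` `FAILED`, because `CURRENT_SCHEMA_VERSION` is still 1 and `MIGRATIONS` is still empty. `test_store_at_v2_no_pending_migrations` may already pass, since an empty registry has no pending migrations; it guards the state after v2 is registered.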
-
---
-
-### Task 6: Pass — register migration v2 in `vector_store.py`
-
-**Files:**
-
- Modify: `src/paperless_ai/vector_store.py`
-
- [ ] **Step 1: Add the migration and bump the version constant**
-
-  On the `feature-lancedb-schema-migrate` branch, `vector_store.py` has:
-
-  ```python
-  CURRENT_SCHEMA_VERSION: Final[int] = 1
-  ...
-  MIGRATIONS: list[Migration] = []
-  ```
-
-  Change both:
-
-  ```python
-  CURRENT_SCHEMA_VERSION: Final[int] = 2
-
-  MIGRATIONS: list[Migration] = [
-      Migration(
-          version=2,
-          description="move filename/storage_path/asn from embedding text to metadata; rebuild required",
-          requires_reembed=True,
-          apply=lambda table: None,
-      ),
-  ]
-  ```
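The Task 5 assertions hold only if the pending migrations are the registered ones whose version is above the stored version, and a re-embed is required when any of them sets `requires_reembed`. Below is a minimal standalone sketch of that selection logic, assuming those semantics. `pending` and `needs_reembed` are hypothetical free functions. The real `pending_migrations()` and `requires_reembed_migration()` are methods on `PaperlessLanceVectorStore` and may be implemented differently:

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Migration:
    version: int
    description: str
    requires_reembed: bool
    apply: Callable[[Any], None]  # receives the LanceDB table; a no-op for v2


MIGRATIONS: list[Migration] = [
    Migration(
        version=2,
        description="move filename/storage_path/asn from embedding text to metadata; rebuild required",
        requires_reembed=True,
        apply=lambda table: None,
    ),
]


def pending(migrations: list[Migration], stored_version: int) -> list[Migration]:
    """Migrations newer than the stored version, in ascending version order."""
    return sorted(
        (m for m in migrations if m.version > stored_version),
        key=lambda m: m.version,
    )


def needs_reembed(migrations: list[Migration], stored_version: int) -> bool:
    """True if any pending migration cannot be applied in place."""
    return any(m.requires_reembed for m in pending(migrations, stored_version))
```

With this registry, a store at version 1 has one pending migration that requires re-embedding, and a store at version 2 has none. That mirrors `test_store_at_v1_requires_reembed` and `test_store_at_v2_no_pending_migrations`.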
-
- [ ] **Step 2: Run the migration tests to confirm they pass**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py::TestMetadataEnrichmentMigration -v"
-  ```
-
-  Expected: all `PASSED`.
-
- [ ] **Step 3: Run the full vector store test module**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_vector_store.py -v"
-  ```
-
-  Expected: all green. In particular, `TestSchemaVersioning::test_stored_schema_version_persists_after_reopen` and the `TestMigrationRegistry` tests should still pass — they use `CURRENT_SCHEMA_VERSION` as the baseline.
-
---
-
-### Task 7: Integration — `update_llm_index` rebuilds when schema version is stale
-
-**Files:**
-
- Modify: `src/paperless_ai/tests/test_ai_indexing.py`
-
- [ ] **Step 1: Write the failing integration test**
-
-  Add this test near `test_update_llm_index_rebuilds_on_model_name_change`:
-
-  ```python
-  @pytest.mark.django_db
-  def test_update_llm_index_rebuilds_on_pending_reembed_migration(
-      temp_llm_index_dir: Path,
-      real_document: Document,
-      mock_embed_model: FakeEmbedding,
-  ) -> None:
-      """A stale schema version (v1) must trigger a full rebuild on the next index run."""
-      from paperless_ai.vector_store import PaperlessLanceVectorStore
-
-      # Build an initial index and then rewind the schema version to 1 to simulate
-      # an index created before migration v2 was registered.
-      indexing.update_llm_index(rebuild=True)
-      store = indexing.get_vector_store()
-      store._write_schema_version(1)
-
-      # An incremental run (rebuild=False) must detect the stale version and rebuild.
-      with patch("documents.models.Document.objects.all") as mock_all:
-          mock_queryset = MagicMock()
-          mock_queryset.exists.return_value = True
-          mock_queryset.__iter__.return_value = iter([real_document])
-          mock_all.return_value = mock_queryset
-          indexing.update_llm_index(rebuild=False)
-
-      # After rebuild the schema version must be current.
-      reopened = PaperlessLanceVectorStore(uri=str(temp_llm_index_dir))
-      assert reopened.stored_schema_version() == 2
-  ```
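The test simulates an index built before v2 by rewinding the version marker rather than rebuilding an old index. This sketch assumes the marker is the `schema_version.json` file from the schema-migration plan and that it stores a single integer under a `version` key. The key name, the default of 1 for a missing file, and the `read_version`/`write_version` helpers are all hypothetical. Under those assumptions, rewinding only rewrites that file and leaves the stored vectors untouched:

```python
import json
import tempfile
from pathlib import Path

VERSION_FILE = "schema_version.json"  # file name taken from the migration plan


def write_version(index_dir: Path, version: int) -> None:
    """Overwrite the version marker kept next to the LanceDB data (assumed layout)."""
    (index_dir / VERSION_FILE).write_text(json.dumps({"version": version}))


def read_version(index_dir: Path, default: int = 1) -> int:
    """Read the stored version; a missing marker is treated as the baseline (assumption)."""
    path = index_dir / VERSION_FILE
    if not path.exists():
        return default
    return int(json.loads(path.read_text())["version"])


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp)
    missing = read_version(index_dir)
    write_version(index_dir, 2)  # state after a fresh rebuild
    after_rebuild = read_version(index_dir)
    write_version(index_dir, 1)  # the test's rewind to a pre-v2 index
    rewound = read_version(index_dir)
```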
-
- [ ] **Step 2: Run the test and confirm it exercises the migration path**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_ai_indexing.py::test_update_llm_index_rebuilds_on_pending_reembed_migration -v"
-  ```
-
-  Because Task 6 has already registered migration v2, this test should `PASS` on the first run. To confirm it exercises the migration path rather than the model-name path, temporarily revert the Task 6 change with `git stash push src/paperless_ai/vector_store.py`, rerun the command, and expect `FAILED` (the schema version stays at 1). Restore the change with `git stash pop`.
-
- [ ] **Step 3: Run the test again now that migration v2 is registered (Task 6)**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_ai_indexing.py::test_update_llm_index_rebuilds_on_pending_reembed_migration -v"
-  ```
-
-  Expected: `PASSED`.
-
- [ ] **Step 4: Run the full indexing test module**
-
-  ```
-  bash /c/Users/tholmes/Documents/Coding/paperless/vmtest.sh "src/paperless_ai/tests/test_ai_indexing.py -v"
-  ```
-
-  Expected: all green.
-
- [ ] **Step 5: Final commit**
-
-  ```bash
-  git add src/paperless_ai/vector_store.py src/paperless_ai/tests/test_vector_store.py src/paperless_ai/tests/test_ai_indexing.py
-  git commit -m "feat(ai): register schema migration v2; triggers rebuild for metadata enrichment"
-  ```
-
---
-
-## Self-review checklist
-
-**Spec coverage:**
-
- ✅ `build_llm_index_text` — three lines removed (Tasks 1–2)
- ✅ `build_document_node` — three fields added to metadata + excluded_embed_metadata_keys (Tasks 3–4)
- ✅ Migration v2 registered with `requires_reembed=True` and no-op apply (Tasks 5–6)
- ✅ `update_llm_index` triggers rebuild on stale schema (Task 7)
- ✅ Tests: `test_embedding.py`, `test_ai_indexing.py`, `test_vector_store.py`
-
-**Placeholder scan:** None found. Every step has exact code or exact commands.
-
-**Type consistency:**
-
- `metadata` dict key names (`"filename"`, `"storage_path"`, `"archive_serial_number"`) used consistently across Tasks 1–4.
- `CURRENT_SCHEMA_VERSION = 2` and `MIGRATIONS[0].version == 2` are consistent across Tasks 5–6.
- `_store_at_version`, the helper referenced in Task 5, is defined in the existing `test_vector_store.py` on the `feature-lancedb-schema-migrate` branch.