A modern bookmark manager focused on intelligent organization, fast retrieval, and contextual discovery. Designed to keep knowledge structured, accessible, and alive.

Bookmarks Manager

A multi-user bookmarks manager focused on fast import, async enrichment, and hybrid search foundations.

Current MVP includes:

  • Go API (go/cmd/api)
  • Go worker (go/cmd/worker)
  • PostgreSQL + pgvector + pg_trgm
  • Python ML service (python/ml_service) for embeddings
  • Raindrop CSV import with idempotent upserts
  • Admin/dev endpoints for reset, job monitoring, and SSE events

Architecture (High Level)

  • API:
    • HTTP endpoints (/health, import, admin endpoints)
    • writes bookmarks/tags/jobs in Postgres
  • Worker:
    • consumes queued jobs with FOR UPDATE SKIP LOCKED
    • generates embeddings via ML service
    • updates bookmark status (imported -> ready)
  • ML service:
    • FastAPI + uv
    • keeps embedding model loaded in memory (CPU-first)
    • exposes /health and /embed
  • PostgreSQL:
    • relational data + queue table
    • FTS (tsvector + GIN) and vector index (HNSW with pgvector)
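
The queue consumption described above can be sketched as a single claim statement. This is a minimal illustration, not the worker's actual query: the `jobs` table, its `created_at` column, and the `running` status are assumptions made for the example; only `queued` and `FOR UPDATE SKIP LOCKED` come from this README.

```python
# Hypothetical schema: jobs(id, type, status, created_at).
# The inner SELECT locks one queued row and skips rows already locked by
# other workers, so concurrent workers never claim the same job.
CLAIM_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'queued'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, type
"""


def claim_next_job(cur):
    """Claim one queued job with any DB-API 2.0 cursor.

    Returns the (id, type) row, or None when nothing is claimable.
    """
    cur.execute(CLAIM_SQL)
    return cur.fetchone()
```

Running the claim inside a transaction keeps the row locked until the worker commits, which is what lets several workers poll the same table without coordinating.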

Local Quickstart (Docker Compose)

  1. Choose env file
  • default: deploy/env/.env.dev
  • copy it if you need a variant (example: deploy/env/.env.local)
  2. Start stack
cd deploy
docker compose --env-file ./env/.env.dev up -d
  3. Common commands
# stop (keep data)
docker compose stop

# start existing containers
docker compose start

# rebuild app images/config changes
docker compose up -d --build

# stop + remove containers and network (keep named volumes)
docker compose down

# stop + remove everything including DB volume
docker compose down -v

# follow logs
docker compose logs -f api
docker compose logs -f worker
docker compose logs -f ml

Import Raindrop CSV

curl -X POST \
  -F "file=@/absolute/path/to/export.csv" \
  http://127.0.0.1:8080/integrations/raindrop/import

Typical response:

{
  "ok": true,
  "total": 377,
  "imported": 375,
  "enqueued": 375,
  "duplicated": 2,
  "skipped": 0
}
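
A script that calls the import endpoint may want to sanity-check the counters. The sketch below assumes that every CSV row ends up as imported, duplicated, or skipped, which holds for the sample response above but is not a documented guarantee; `summarize_import` is a hypothetical helper.

```python
import json


def summarize_import(body: str) -> str:
    """Turn the import response JSON into a one-line summary."""
    r = json.loads(body)
    # Assumption: total == imported + duplicated + skipped.
    accounted = r["imported"] + r["duplicated"] + r["skipped"]
    status = "consistent" if accounted == r["total"] else "mismatch"
    return f"{r['imported']}/{r['total']} imported, {r['enqueued']} enqueued ({status})"
```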

Dev Admin Endpoints

Admin endpoints are enabled only when APP_ENV=development.

  • Reset imported data:
curl -X POST http://127.0.0.1:8080/admin/reset-ddbb
  • List jobs:
curl "http://127.0.0.1:8080/admin/jobs?status=queued&type=enrich_bookmark&limit=20"
  • Jobs summary:
curl "http://127.0.0.1:8080/admin/jobs/summary"
  • Seed curated tags (supports slug, label, description, optional keywords[]):
curl -X POST http://127.0.0.1:8080/admin/tags/seed
  • Rebuild tag embeddings (required before autotagging):
curl -X POST http://127.0.0.1:8080/admin/tags/rebuild-embeddings
  • Backfill enrich jobs for bookmarks with missing/failed enrichment:
curl -X POST http://127.0.0.1:8080/admin/enrich/backfill
  • Quarantine report for bookmarks without auto-tags (grouped by enrich status and HTTP status):
curl "http://127.0.0.1:8080/admin/enrich/quarantine?limit=20"
  • Backfill auto-tag jobs for ready bookmarks without auto tags:
curl -X POST http://127.0.0.1:8080/admin/autotag/backfill
  • Clear only auto tags (keeps import/manual tags untouched):
curl -X POST http://127.0.0.1:8080/admin/autotag/clear
  • Inspect ready bookmarks still missing auto tags:
curl "http://127.0.0.1:8080/admin/autotag/missing?limit=20"
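
When scripting against these endpoints, building the query string programmatically avoids quoting mistakes. A minimal sketch using only the standard library; `admin_url` and `BASE_URL` are hypothetical names, and the base address is the local one used in the curl examples.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8080"  # local dev address from the examples above


def admin_url(path: str, **params) -> str:
    """Build an admin endpoint URL, omitting filters that are None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")
```

For example, `admin_url("/admin/jobs", status="queued", type="enrich_bookmark", limit=20)` reproduces the URL of the "List jobs" command.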

API Documentation

  • OpenAPI spec: GET /openapi.yaml
  • Swagger UI: GET /docs

Examples:

curl http://127.0.0.1:8080/openapi.yaml
open http://127.0.0.1:8080/docs

SSE Events

The current SSE endpoint is available only under the dev admin scope:

  • GET /admin/events

Example frontend subscription:

<script>
  const es = new EventSource('http://127.0.0.1:8080/admin/events');

  es.addEventListener('hello', (ev) => console.log('hello', JSON.parse(ev.data)));
  es.addEventListener('job.updated', (ev) => console.log('job', JSON.parse(ev.data)));
  es.addEventListener('bookmark.updated', (ev) => console.log('bookmark', JSON.parse(ev.data)));

  es.onerror = (err) => console.error('sse error', err);
</script>

Note: a public /events endpoint can be added later when auth/roles are in place.
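
Outside the browser, the same stream can be consumed by parsing the text/event-stream framing by hand. The sketch below handles only the `event:` and `data:` fields and assumes `\n` line endings and JSON payloads, as the frontend example does; the payload shapes are not specified in this README.

```python
import json


def parse_sse(stream_text: str):
    """Yield (event_name, payload) for each frame in an SSE body."""
    for frame in stream_text.split("\n\n"):  # a blank line ends a frame
        event, data_lines = "message", []  # "message" is the SSE default name
        for line in frame.splitlines():
            if line.startswith(":"):
                continue  # comment line, often used as a keep-alive
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
        if data_lines:
            yield event, json.loads("\n".join(data_lines))
```

A real client would feed this from a streaming HTTP response and buffer partial frames; that part is omitted here.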

Auto-Tagging Notes

  • Worker pipeline is: enrich_bookmark (URL fetch + readable extraction + bookmark embedding) -> autotag_bookmark.
  • Enrichment stores debug/quarantine fields on bookmarks:
    • enrich_status (ok|not_found|blocked|timeout|not_html|parse_failed|error)
    • enrich_http_status, enrich_error, enrich_final_url, enrich_canonical_url
    • content_text, content_text_len, enriched_at
  • If enrichment fails, bookmark embedding and auto-tagging still run, falling back to title + excerpt.
  • Auto-tag thresholds are configurable:
    • AUTOTAG_TOPK (default 30)
    • AUTOTAG_MAX_TAGS (default 8)
    • AUTOTAG_MIN_SIM (default 0.53)
    • AUTOTAG_USE_IMPORT_TAGS_IN_QUERY (default 1 / true; appends import tags to query text for autotag nearest-tag retrieval only)
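
One plausible way the three numeric thresholds interact is sketched below. This is an illustration, not the worker's code: the order of operations (take the top K by similarity, drop anything under the minimum, cap the result) and the inclusive `>=` comparison are assumptions, and `select_auto_tags` is a hypothetical name.

```python
import os


def select_auto_tags(candidates, env=os.environ):
    """Pick auto tags from (slug, similarity) pairs.

    Defaults mirror the documented values: TOPK 30, MAX_TAGS 8, MIN_SIM 0.53.
    """
    top_k = int(env.get("AUTOTAG_TOPK", "30"))
    max_tags = int(env.get("AUTOTAG_MAX_TAGS", "8"))
    min_sim = float(env.get("AUTOTAG_MIN_SIM", "0.53"))

    # Rank by similarity, highest first, and keep the K nearest tags.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)[:top_k]
    # Assumed inclusive threshold: a tag at exactly min_sim is kept.
    kept = [(slug, sim) for slug, sim in ranked if sim >= min_sim]
    return kept[:max_tags]
```

Under these assumptions, raising AUTOTAG_MIN_SIM trades coverage for precision, while AUTOTAG_MAX_TAGS only matters for bookmarks that match many tags well.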

Recommended dev flow (from clean import to quality audit):

  1. POST /admin/tags/seed
  2. POST /admin/tags/rebuild-embeddings
  3. POST /admin/enrich/backfill
  4. Wait for jobs (GET /admin/jobs/summary)
  5. POST /admin/autotag/clear (optional for controlled rerun)
  6. POST /admin/autotag/backfill
  7. GET /admin/enrich/quarantine + scripts/audit-autotag.sh
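
Step 4 of the flow above is a wait, which a script can implement as a bounded polling loop. The response shape of /admin/jobs/summary is not documented here, so the sketch takes the fetch and the "queue is idle" check as caller-supplied functions; `wait_for_jobs` and its parameters are hypothetical.

```python
import time


def wait_for_jobs(fetch_summary, is_idle, interval=2.0, max_polls=150, sleep=time.sleep):
    """Poll until is_idle(summary) is true.

    Returns the final summary, or None if max_polls is exhausted first.
    """
    for _ in range(max_polls):
        summary = fetch_summary()
        if is_idle(summary):
            return summary
        sleep(interval)
    return None
```

`fetch_summary` would typically GET /admin/jobs/summary and decode the JSON body; `is_idle` encodes whatever "no queued or running jobs" means for the actual payload.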

To re-evaluate thresholds safely in development:

  1. POST /admin/autotag/clear
  2. POST /admin/autotag/backfill
  3. Inspect coverage (/admin/jobs/summary + SQL query below)
  4. GET /admin/autotag/missing for manual spot checks

Verify auto tags in Postgres:

SELECT b.id, LEFT(COALESCE(b.title,''), 60) AS title, t.slug, bt.score
FROM bookmark_tags bt
JOIN bookmarks b ON b.id = bt.bookmark_id
JOIN tags t ON t.id = bt.tag_id
WHERE bt.source = 'auto'
ORDER BY bt.created_at DESC
LIMIT 30;

More Docs