A modern bookmark manager focused on intelligent organization, fast retrieval, and contextual discovery. Designed to keep knowledge structured, accessible, and alive.
Bookmarks Manager
A multi-user bookmarks manager focused on fast import, async enrichment, and hybrid search foundations.
Current MVP includes:
- Go API (go/cmd/api)
- Go worker (go/cmd/worker)
- PostgreSQL + pgvector + pg_trgm
- Python ML service (python/ml_service) for embeddings
- Raindrop CSV import with idempotent upserts
- Admin/dev endpoints for reset, job monitoring, and SSE events
Architecture (High Level)
- API:
  - HTTP endpoints (/health, import, admin endpoints)
  - writes bookmarks/tags/jobs in Postgres
- Worker:
  - consumes queued jobs with FOR UPDATE SKIP LOCKED
  - generates embeddings via ML service
  - updates bookmark status (imported -> ready)
- ML service:
  - FastAPI + uv
  - keeps embedding model loaded in memory (CPU-first)
  - exposes /health and /embed
- PostgreSQL:
  - relational data + queue table
  - FTS (tsvector + GIN) and vector index (HNSW with pgvector)
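The worker's queue consumption can be illustrated with a minimal sketch. The real worker is written in Go; the Python below only shows the general FOR UPDATE SKIP LOCKED claim pattern. The table name jobs, its columns, and the running status are assumptions made for illustration, not the project's actual schema.

```python
# Sketch of a SKIP LOCKED job claim.
# Table name, column names, and the 'running' status are hypothetical.
CLAIM_JOB_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'queued' AND type = %s
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, type
"""


def claim_next_job(cursor, job_type):
    """Claim one queued job of the given type, or return None if none is free.

    `cursor` is any DB-API cursor that uses %s placeholders (for example psycopg).
    Rows locked by another worker are skipped instead of waited on, so
    concurrent workers never claim the same job.
    """
    cursor.execute(CLAIM_JOB_SQL, (job_type,))
    row = cursor.fetchone()
    if row is None:
        return None
    job_id, claimed_type = row
    return {"id": job_id, "type": claimed_type}
```

Selecting and updating in one statement keeps the claim atomic: a job is either still queued or already owned by exactly one worker.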
Local Quickstart (Docker Compose)
- Choose env file
  - default: deploy/env/.env.dev
  - copy it if you need a variant (example: deploy/env/.env.local)
- Start stack
cd deploy
docker compose --env-file ./env/.env.dev up -d
- Common commands
# stop (keep data)
docker compose stop
# start existing containers
docker compose start
# rebuild app images/config changes
docker compose up -d --build
# stop + remove containers and network (keep named volumes)
docker compose down
# stop + remove everything including DB volume
docker compose down -v
# follow logs
docker compose logs -f api
docker compose logs -f worker
docker compose logs -f ml
Import Raindrop CSV
curl -X POST \
-F "file=@/absolute/path/to/export.csv" \
http://127.0.0.1:8080/integrations/raindrop/import
Typical response:
{
"ok": true,
"total": 377,
"imported": 375,
"enqueued": 375,
"duplicated": 2,
"skipped": 0
}
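A quick sanity check on the import response can catch partial imports early. The sketch below assumes that total equals imported + duplicated + skipped and that enqueued never exceeds imported. Both rules are inferred from the sample response above rather than documented guarantees, and check_import_summary is a hypothetical helper name.

```python
import json


def check_import_summary(body):
    """Parse an import response body and return a list of inconsistencies.

    The consistency rules are assumptions inferred from the sample response.
    """
    summary = json.loads(body)
    problems = []
    if not summary.get("ok"):
        problems.append("ok is not true")
    accounted = summary["imported"] + summary["duplicated"] + summary["skipped"]
    if accounted != summary["total"]:
        problems.append(
            f"total={summary['total']} but rows accounted for={accounted}"
        )
    if summary["enqueued"] > summary["imported"]:
        problems.append("more jobs enqueued than bookmarks imported")
    return problems


sample = (
    '{"ok": true, "total": 377, "imported": 375, '
    '"enqueued": 375, "duplicated": 2, "skipped": 0}'
)
print(check_import_summary(sample))  # prints []
```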
Dev Admin Endpoints
Admin endpoints are enabled only when APP_ENV=development.
- Reset imported data:
curl -X POST http://127.0.0.1:8080/admin/reset-ddbb
- List jobs:
curl "http://127.0.0.1:8080/admin/jobs?status=queued&type=enrich_bookmark&limit=20"
- Jobs summary:
curl "http://127.0.0.1:8080/admin/jobs/summary"
- Seed curated tags (supports slug, label, description, optional keywords[]):
curl -X POST http://127.0.0.1:8080/admin/tags/seed
- Rebuild tag embeddings (required before autotagging):
curl -X POST http://127.0.0.1:8080/admin/tags/rebuild-embeddings
- Backfill enrich jobs for bookmarks with missing/failed enrichment:
curl -X POST http://127.0.0.1:8080/admin/enrich/backfill
- Quarantine report for bookmarks without auto-tags (grouped by enrich status/http status):
curl "http://127.0.0.1:8080/admin/enrich/quarantine?limit=20"
- Backfill auto-tag jobs for ready bookmarks without auto tags:
curl -X POST http://127.0.0.1:8080/admin/autotag/backfill
- Clear only auto tags (keeps import/manual tags untouched):
curl -X POST http://127.0.0.1:8080/admin/autotag/clear
- Inspect ready bookmarks still missing auto tags:
curl "http://127.0.0.1:8080/admin/autotag/missing?limit=20"
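For scripted checks, the same admin GET endpoints can be called from Python with the standard library only. This is a minimal sketch: admin_url and get_json are hypothetical helper names, and treating the responses as JSON is an assumption, since the response format of these endpoints is not described here.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:8080"


def admin_url(path, **params):
    """Build an admin endpoint URL with optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def get_json(url):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Same request as the "List jobs" curl example above.
jobs_url = admin_url("/admin/jobs", status="queued", type="enrich_bookmark", limit=20)
```

With the stack running in development mode, get_json(jobs_url) would fetch the queued enrich jobs.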
API Documentation
- OpenAPI spec: GET /openapi.yaml
- Swagger UI: GET /docs
Examples:
curl http://127.0.0.1:8080/openapi.yaml
open http://127.0.0.1:8080/docs
SSE Events
Current SSE endpoint is dev-admin scoped:
GET /admin/events
Example frontend subscription:
<script>
const es = new EventSource('http://127.0.0.1:8080/admin/events');
es.addEventListener('hello', (ev) => console.log('hello', JSON.parse(ev.data)));
es.addEventListener('job.updated', (ev) => console.log('job', JSON.parse(ev.data)));
es.addEventListener('bookmark.updated', (ev) => console.log('bookmark', JSON.parse(ev.data)));
es.onerror = (err) => console.error('sse error', err);
</script>
Note: a public /events endpoint can be added later when auth/roles are in place.
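Outside the browser, the same stream can be consumed by any client that understands the text/event-stream framing. The sketch below parses that framing in Python. The event names come from the frontend example above, while the payload fields in the sample are invented for illustration and parse_sse is a hypothetical helper.

```python
import json


def _field_value(line, name):
    """Return the value of an SSE field line, dropping one leading space."""
    value = line[len(name) + 1:]
    return value[1:] if value.startswith(" ") else value


def parse_sse(stream_text):
    """Yield (event_name, decoded_data) pairs from text/event-stream content."""
    event, data_lines = "message", []
    for line in stream_text.splitlines():
        if line == "":
            # A blank line ends the current event and dispatches it.
            if data_lines:
                yield event, json.loads("\n".join(data_lines))
            event, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        elif line.startswith("event:"):
            event = _field_value(line, "event")
        elif line.startswith("data:"):
            data_lines.append(_field_value(line, "data"))


# Hypothetical payloads; the real event bodies are not documented here.
sample = (
    'event: hello\ndata: {"ok": true}\n\n'
    'event: job.updated\ndata: {"id": 1, "status": "done"}\n\n'
)
events = list(parse_sse(sample))
```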
Auto-Tagging Notes
- Worker pipeline is: enrich_bookmark (URL fetch + readable extraction + bookmark embedding) -> autotag_bookmark.
- Enrichment stores debug/quarantine fields on bookmarks:
  - enrich_status (ok|not_found|blocked|timeout|not_html|parse_failed|error)
  - enrich_http_status, enrich_error, enrich_final_url, enrich_canonical_url
  - content_text, content_text_len, enriched_at
- If enrichment fails, bookmark embedding/autotag still runs with title+excerpt fallback.
- Auto-tag thresholds are configurable:
  - AUTOTAG_TOPK (default 30)
  - AUTOTAG_MAX_TAGS (default 8)
  - AUTOTAG_MIN_SIM (default 0.53)
  - AUTOTAG_USE_IMPORT_TAGS_IN_QUERY (default 1 / true; appends import tags to query text for autotag nearest-tag retrieval only)
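One plausible reading of how the three numeric thresholds interact is sketched below: retrieve the AUTOTAG_TOPK nearest tags, drop those below AUTOTAG_MIN_SIM, then keep at most AUTOTAG_MAX_TAGS. The order of these steps, the inclusive comparison, and the function names are assumptions; the worker's actual logic may differ.

```python
import os


def load_autotag_config(env=os.environ):
    """Read the thresholds from the environment, using the documented defaults."""
    return {
        "topk": int(env.get("AUTOTAG_TOPK", "30")),
        "max_tags": int(env.get("AUTOTAG_MAX_TAGS", "8")),
        "min_sim": float(env.get("AUTOTAG_MIN_SIM", "0.53")),
    }


def select_auto_tags(candidates, topk=30, max_tags=8, min_sim=0.53):
    """Pick auto tags from (slug, similarity) pairs.

    Assumed order of operations: nearest top-K, similarity floor, then cap.
    """
    nearest = sorted(candidates, key=lambda c: c[1], reverse=True)[:topk]
    kept = [c for c in nearest if c[1] >= min_sim]
    return kept[:max_tags]


# Hypothetical tag slugs and similarity scores.
candidates = [("go", 0.81), ("python", 0.60), ("cooking", 0.40), ("postgres", 0.53)]
selected = select_auto_tags(candidates)
```

Under this reading, raising AUTOTAG_MIN_SIM trades coverage for precision, while AUTOTAG_MAX_TAGS only matters for bookmarks that match many tags strongly.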
Recommended dev flow (from clean import to quality audit):
- POST /admin/tags/seed
- POST /admin/tags/rebuild-embeddings
- POST /admin/enrich/backfill
- Wait for jobs (GET /admin/jobs/summary)
- POST /admin/autotag/clear (optional for controlled rerun)
- POST /admin/autotag/backfill
- GET /admin/enrich/quarantine + scripts/audit-autotag.sh
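The flow above could be scripted roughly as follows. This is a sketch under stated assumptions: send, run_dev_flow, and the wait_for_jobs callback are hypothetical names, and because the shape of the /admin/jobs/summary response is not documented here, deciding when the queue has drained is left to the caller-supplied callback.

```python
import urllib.request

BASE_URL = "http://127.0.0.1:8080"

# Steps before and after the "wait for jobs" pause, in the documented order.
BEFORE_WAIT = [
    ("POST", "/admin/tags/seed"),
    ("POST", "/admin/tags/rebuild-embeddings"),
    ("POST", "/admin/enrich/backfill"),
]
AFTER_WAIT = [
    ("POST", "/admin/autotag/clear"),  # optional, for a controlled rerun
    ("POST", "/admin/autotag/backfill"),
    ("GET", "/admin/enrich/quarantine"),
]


def send(method, path):
    """Issue one request against the local API and return the HTTP status."""
    req = urllib.request.Request(BASE_URL + path, method=method)
    with urllib.request.urlopen(req) as resp:
        return resp.status


def run_dev_flow(wait_for_jobs, send_fn=send, clear_first=True):
    """Run the flow; wait_for_jobs() must block until enrich jobs are done."""
    results = [(m, p, send_fn(m, p)) for m, p in BEFORE_WAIT]
    wait_for_jobs()
    for method, path in AFTER_WAIT:
        if path == "/admin/autotag/clear" and not clear_first:
            continue
        results.append((method, path, send_fn(method, path)))
    return results
```

Running scripts/audit-autotag.sh remains a separate manual step after the flow completes.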
Re-evaluate threshold safely in development:
- POST /admin/autotag/clear
- POST /admin/autotag/backfill
- Inspect coverage (/admin/jobs/summary + SQL query below)
- GET /admin/autotag/missing for manual spot checks
Verify auto tags in Postgres:
SELECT b.id, LEFT(COALESCE(b.title,''), 60) AS title, t.slug, bt.score
FROM bookmark_tags bt
JOIN bookmarks b ON b.id = bt.bookmark_id
JOIN tags t ON t.id = bt.tag_id
WHERE bt.source = 'auto'
ORDER BY bt.created_at DESC
LIMIT 30;