- Watch a local directory for document changes
  - Use a local directory as the document source
- Embed documents and code
  - Markdown documentation splitter: TreeSitter-markdown
  - Office (e.g. Word) document splitter: docx-rs
    - Reader spike: document file text extractor, docx, OOXML
  - Code splitter: TreeSitter, like Bloop
- Web scraper
  - Extract text from web pages, e.g. with scraper
- Document version control
- Vector search: InMemory
  - Search documents by semantic similarity
  - Embedding search engine using tantivy
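
The "watch a local directory" item can be illustrated with a polling approach: take a snapshot of file modification times, take another later, and compare the two. This is only a minimal sketch in Python using the standard library; the names `snapshot` and `diff` are hypothetical, and a real implementation would more likely rely on OS file-system notifications than on polling.

```python
import os


def snapshot(root: str) -> dict[str, int]:
    """Map every file under root to its last-modified time in nanoseconds."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            state[path] = os.stat(path).st_mtime_ns
    return state


def diff(old: dict[str, int], new: dict[str, int]):
    """Return (created, modified, deleted) paths between two snapshots."""
    created = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return created, modified, deleted
```

Calling `snapshot` on a timer and passing consecutive results to `diff` yields the files that need to be embedded again (created or modified) or dropped from the index (deleted).
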
CREATE: POST /api/embedding-document
{
  "name": "README.md",
  "uri": "file:///path/to/README.md",
  "type": "markdown",
  "content": "..."
}
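
A client could assemble the CREATE body above from a local file path. The sketch below is an assumption about how that might look: `build_create_payload` and the `TYPE_BY_SUFFIX` table are hypothetical, and only the `"markdown"` type value comes from the example; the other type values are placeholders.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from file extension to the "type" field.
# Only "markdown" appears in the API example; "docx" and "text" are assumed.
TYPE_BY_SUFFIX = {".md": "markdown", ".docx": "docx"}


def build_create_payload(path: str, content: str) -> dict:
    """Build the JSON body for POST /api/embedding-document from an absolute POSIX path."""
    p = PurePosixPath(path)
    return {
        "name": p.name,
        "uri": p.as_uri(),  # raises ValueError if the path is not absolute
        "type": TYPE_BY_SUFFIX.get(p.suffix.lower(), "text"),
        "content": content,
    }
```
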
READ: GET /api/embedding-document/:id
{
  "id": "xxx-xxxx-xxx",
  "uri": "http://localhost:8080/api/embedding-document/xxx-xxxx-xxx",
  "name": "README.md",
  "content": "...",
  "type": "markdown",
  "chunks": [
    {
      "id": "xxx-xxxx-xxx",
      "text": "...",
      "embedding": "..."
    },
    {
      "id": "xxx-xxxx-xxx",
      "text": "...",
      "embedding": "..."
    }
  ]
}
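
The `chunks` array in the READ response implies that the document content is split before embedding. As a rough illustration, the sketch below starts a new chunk at each Markdown ATX heading. It is a regex-based stand-in, not the TreeSitter-markdown splitter; `split_markdown` is a hypothetical name, the `embedding` field is left out, and heading-like lines inside fenced code would be split incorrectly, which is one reason to prefer a real parser.

```python
import re
import uuid

# An ATX heading: one to six '#' characters followed by whitespace.
HEADING = re.compile(r"^#{1,6}\s")


def split_markdown(content: str) -> list[dict]:
    """Split Markdown text into chunks, starting a new chunk at each heading."""
    texts, current = [], []
    for line in content.splitlines():
        if HEADING.match(line) and current:
            texts.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        texts.append("\n".join(current).strip())
    # Give each non-empty chunk a random id, mirroring the "id"/"text" fields.
    return [{"id": str(uuid.uuid4()), "text": t} for t in texts if t]
```
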
SEARCH: GET /api/embedding-document/search?q=...
{
  "results": [
    {
      "id": "xxx-xxxx-xxx",
      "name": "README.md",
      "uri": "http://localhost:8080/api/embedding-document/xxx-xxxx-xxx",
      "content": "...",
      "type": "markdown",
      "chunks": [
        {
          "id": "xxx-xxxx-xxx",
          "text": "...",
          "embedding": "..."
        },
        {
          "id": "xxx-xxxx-xxx",
          "text": "...",
          "embedding": "..."
        }
      ]
    }
  ]
}
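
One way the SEARCH endpoint could rank chunks with an in-memory vector store is by cosine similarity between the query embedding and each stored chunk embedding. The sketch below assumes embeddings are plain lists of floats produced elsewhere; `InMemoryIndex` and its methods are hypothetical names, not the project's actual store.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Brute-force vector index: stores every vector and scans all of them on search."""

    def __init__(self):
        self._items = []  # list of (chunk_id, vector)

    def add(self, chunk_id: str, vector) -> None:
        self._items.append((chunk_id, list(vector)))

    def search(self, query, top_k: int = 3):
        """Return up to top_k (chunk_id, score) pairs, best match first."""
        scored = [(cosine(query, vec), cid) for cid, vec in self._items]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [(cid, score) for score, cid in scored[:top_k]]
```

A linear scan like this is adequate for a small local document set; larger collections would usually call for an approximate nearest-neighbour index.
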
UPDATE: PUT /api/embedding-document/:id
{
  "name": "README.md",
  "uri": "file:///path/to/README.md",
  "content": "..."
}
DELETE: DELETE /api/embedding-document/:id
CREATE: POST /api/web-scrapper
{
  "url": "https://www.example.com"
}
Returns an embedding-document object.
DELETE: DELETE /api/web-scrapper/:id
REFRESH: POST /api/web-scrapper/:id/refresh
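
Before a scraped page can be stored as an embedding-document, its HTML has to be reduced to plain text. The sketch below shows one possible approach using Python's standard `html.parser`, skipping `script` and `style` content. It stands in for the `scraper` crate mentioned in the feature list and does not fetch anything over the network; `TextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text nodes, ignoring the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside skipped elements.
        if not self._skip_depth and data.strip():
            self._parts.append(data.strip())

    def text(self) -> str:
        return "\n".join(self._parts)


def extract_text(html: str) -> str:
    """Return the text content of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The resulting string could then serve as the `content` of the embedding-document returned by the CREATE and REFRESH calls.
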