Add /embeddings endpoints
This MR adds an endpoint to generate embeddings to support meta#54 (closed) and ots/mediawiki/torque!107 (closed).
This is designed so future versions can incorporate the semantic chunking @gridinoc is experimenting with, in which a set of documents can be further split into smaller documents.
Another possible future enhancement is an option to truncate and normalize embeddings for models that support it, such as the Matryoshka-enabled model we're currently using. That is not part of this MR.
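For reference, the truncation and normalization idea could look roughly like the sketch below. It is not part of this MR. The function name `truncate_and_normalize` and the `dimensions` parameter are hypothetical, and it assumes the usual Matryoshka approach of keeping the leading values and rescaling to unit length.

```python
import math


def truncate_and_normalize(embedding, dimensions):
    """Keep the first `dimensions` values and rescale them to unit L2 norm.

    Hypothetical helper: neither the name nor the parameter exists in this MR.
    """
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    truncated = list(embedding[:dimensions])
    norm = math.sqrt(sum(x * x for x in truncated))
    if norm == 0.0:
        # A zero vector cannot be normalized, so return it unchanged.
        return truncated
    return [x / norm for x in truncated]


# Toy 3-dimensional vector shortened to 2 dimensions.
shortened = truncate_and_normalize([3.0, 4.0, 12.0], 2)
```

Renormalizing after truncation keeps cosine similarity and dot product equivalent for the shortened vectors.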
The /embeddings endpoint is designed to be similar to the OpenAI endpoint of the same name, though it currently has fewer options and response fields. It also adds the ability to specify the type of embedding, which is used as the prompt name by the SentenceTransformers model that generates the embedding.
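A minimal sketch of how the type could flow through to the prompt name, not the actual handler in this MR. `build_embeddings_response` and `FakeModel` are hypothetical names. The only library call assumed is `encode(sentences, prompt_name=...)`, which SentenceTransformers models provide. Accepting either a string or a list for the input is an assumption borrowed from the OpenAI endpoint, and the response includes only the fields visible in the expected result.

```python
def build_embeddings_response(model, inputs, embedding_type):
    """Encode `inputs` and wrap the vectors in an OpenAI-like response body."""
    # Assumption: a single string is treated as a one-item batch.
    texts = [inputs] if isinstance(inputs, str) else list(inputs)
    # The request's "type" is passed straight through as the prompt name.
    vectors = model.encode(texts, prompt_name=embedding_type)
    return {"data": [{"embedding": [float(x) for x in vec]} for vec in vectors]}


class FakeModel:
    """Stand-in for a SentenceTransformer so the sketch runs without a model download."""

    def encode(self, sentences, prompt_name=None):
        # Returns toy 2-dimensional vectors instead of real embeddings.
        return [[float(len(s)), float(len(prompt_name or ""))] for s in sentences]


response = build_embeddings_response(FakeModel(), "Hello, world!", "query")
```

With a real model, the prompt name has to match one of the prompts configured for that model, so valid type values depend on the model in use.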
Steps to test
export API_KEY=<API KEY from .env.local>
make run
curl -X 'POST' \
'http://localhost:8889/embeddings' \
-H 'accept: application/json' \
-H "Authorization: Bearer $API_KEY" \
-H 'Content-Type: application/json' \
-d '{
"input": "Hello, world!",
"type": "query"
}'
Expected result:
{"data":[{"embedding":[-0.005389418452978134,0.027321476489305496, ...