# LLM API
This is the API gateway to OTS's philanthropic LLM services. It sits between frontend interfaces (like Torque or Hypha) and gives those frontends access to LLMs. The API performs its work either by talking directly to LLM services or by using further gateways (e.g. our llm-infrastructure repository).
OpenAI's API has become the de facto way to talk to LLMs, so we are, for the moment, prototyping using OpenAI as our backend. We do this knowing we will be able to adapt the results for use with other LLMs as we progress.
Documentation for this repository is in this README and throughout the code and the files that make up the project. The best starting point is the mkdocs documentation, which you can see by running `mkdocs serve`. Please refer to `./requirements.txt` for the necessary mkdocs packages to install.
If you do `make run`, you will get a live instance that has two documentation endpoints:

- http://localhost:8889/mkdocs, full text documentation
- http://localhost:8889/docs, interactive, API-specific documentation

If you do not use `make run`, you might get an error about the `site` directory not existing. You can create that directory and its contents with `make docs`.
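For example, a minimal local documentation workflow, assuming the mkdocs packages from `./requirements.txt` and the `make docs` target described above, looks roughly like this:

```sh
# Install the mkdocs tooling (ideally inside the project's .venv),
# build the static site/ directory, and preview the docs locally.
pip install -r requirements.txt
make docs        # creates the site/ directory served at the /mkdocs endpoint
mkdocs serve     # live-preview; mkdocs listens on http://127.0.0.1:8000 by default
```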
## Run Dev mode

Requires `make` and python3.11+.
Our API is built around a FastAPI server. There are multiple ways to run LLM backends for FastAPI. Configuration is done via environment variables, which can be set in `.env.local` or specified in the usual way at the command line.
- The easiest way to run is to just use OpenAI, which requires an account and API key at OpenAI.com. You can set this API key in `.env.local`, which is pulled in by our `Makefile` when you `make run`. It is the field called `OPENAI_API_KEY`.
- Alternatively, you can use a different LLM, run on your own infrastructure (e.g. your laptop). To do that, you still need to set `OPENAI_API_KEY`, but you can set it to any random value. You will want to set `OPENAI_API_BASE` to the base URL of the server listening for OpenAI API calls. Ex: `OPENAI_API_BASE=http://localhost:8080/v1` (see the sketch after this list).
- Soon, you will be able to run the stack in our llm-infrastructure repo as a third way to provide an LLM backend to our LLM API.
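For instance, a minimal `.env.local` for the hosted OpenAI option could look like the sketch below; the key is a placeholder, and `.env.example` has the full list of supported variables:

```sh
# .env.local -- hosted OpenAI backend (placeholder key; substitute your own)
OPENAI_API_KEY=sk-your-key-here
# Only needed when pointing at a self-hosted, OpenAI-compatible server:
# OPENAI_API_BASE=http://localhost:8080/v1
```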
For more information about all the environment variables, please see `.env.example`. Once you have set `OPENAI_API_BASE` and `OPENAI_API_KEY`, you can run FastAPI:
- create an `.env.local` based on `.env.example`
- run `make run` -- this will run `api/server.py` after creating `.venv` and installing required modules
- use a browser to see the generated API docs at http://0.0.0.0:8889/docs
- test the `/filterset` function call (see the `curl` sketch below)
Alternatively, you can run entirely from the command line:

```sh
OPENAI_API_KEY="sk-bob...lob...law" .venv/bin/python ./api/filterset.py
```
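If you would rather poke at the running server over HTTP, the rough smoke test below is one way to do it; it assumes `/filterset` accepts POSTed JSON, and the placeholder body should be replaced with the actual fields documented at `/docs`:

```sh
# Assumes the server started by `make run` is listening on port 8889.
curl http://localhost:8889/openapi.json      # inspect the generated schema
curl -X POST http://localhost:8889/filterset \
     -H "Content-Type: application/json" \
     -d '{}'                                 # placeholder body; see /docs for the real schema
```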
## Hermes 2
As noted above, you can use Hermes 2 instead of OpenAI:

- Make sure Hermes is running. For simplicity, we'll use Docker:

  ```sh
  docker run -ti -p 8080:8080 localai/localai:v2.11.0-ffmpeg-core hermes-2-pro-mistral
  ```

- Set `OPENAI_MODEL` in `.env.local` or in your environment to "hermes-2-pro-mistral"
- Do `make run` and visit http://localhost:8889/docs in your browser, as above.
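Putting those steps together, a `.env.local` for the local Hermes 2 setup might look like this sketch (the key value is arbitrary, as noted earlier):

```sh
# .env.local -- local Hermes 2 served by LocalAI on port 8080
OPENAI_API_KEY=anything-goes-here          # required, but unused by the local server
OPENAI_API_BASE=http://localhost:8080/v1
OPENAI_MODEL=hermes-2-pro-mistral
```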
## Deployment

Requires `make` and `ansible`, and an Ubuntu instance in the `inventory`. This was tested with Ubuntu LTS on Digital Ocean.

On Digital Ocean, set DNS names for the API and Traefik endpoints, and provide them in the `.env.*` files too.

Run the Ansible playbook with `make run-playbook`. Note that this requires a `.env.production` file (see `.env.example`); it will deploy to the `IP_ADDRESS` server.
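A rough end-to-end deployment sequence, assuming the Ansible inventory and DNS records are already in place, might look like this; the exact variable names for the DNS entries come from `.env.example`:

```sh
# Sketch of a deployment run against the server at IP_ADDRESS.
cp .env.example .env.production
"$EDITOR" .env.production    # set IP_ADDRESS, the API/Traefik DNS names, API keys, etc.
make run-playbook            # runs the Ansible playbook using .env.production
```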
## Gitlab CI deployment

Requires `DEPLOY_TARGET_IP_ADDRESS` and `SSH_PRIVATE_KEY` for the destination server, set as secrets in GitLab (follow these SSH steps), along with any other environment settings from `.env.production`, since that file is not available in CI.
## Tracing LLM requests

You can enable Langchain's Langsmith and/or Langfuse tracing by setting the corresponding keys from `.env.example`. (If you don't define `LANGFUSE_SECRET_KEY`, then Langfuse tracing is disabled; a wrong `LANGFUSE_SECRET_KEY` will give you errors.)

Similarly, Sentry can be used to monitor FastAPI and LLM calls; enable it by setting `SENTRY_DSN`.
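As an illustration, the observability-related entries in your `.env.*` file might look like the sketch below; only the two variables named here are shown, and `.env.example` remains the authoritative list (including any Langsmith keys):

```sh
# Optional tracing/monitoring settings -- omit a line to leave that integration off.
LANGFUSE_SECRET_KEY=sk-lf-...    # placeholder; an invalid key causes errors rather than disabling tracing
SENTRY_DSN=https://examplePublicKey@o0.ingest.sentry.io/0    # illustrative DSN format
```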