# LLM API

This is the API gateway to OTS's philanthropic LLM services.  It sits between
frontend interfaces (like Torque or Hypha) and the LLMs, giving those front
ends access to them.  The API performs its work either by talking directly to
LLM services or by using further gateways (e.g. our llm-infrastructure
repository).

OpenAI's API has become the de facto way to talk to LLMs, so we are, for the
moment, prototyping using OpenAI as our backend.  We do this knowing we will be
able to adapt the results for use with other LLMs as we progress.

Documentation for this repository is in this README and throughout the code and
the files that make up the project.  The best starting point is the `mkdocs`
documentation, which you can see by running `mkdocs serve`.  The `mkdocs`
packages you need to install are listed in `./requirements.txt`.

If you do `make run`, you will get a live instance that has two documentation
endpoints:

 * `http://localhost:8889/mkdocs`, full text documentation
 * `http://localhost:8889/docs`, interactive, API-specific documentation

If you do not use `make run`, you might get an error about the `site` directory
not existing.  You can create that directory and its contents with `make docs`.
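
For reference, here is a minimal sketch of building and previewing the
documentation locally, assuming the packages from `./requirements.txt` are
installed into your Python environment:

```
# Install the documentation dependencies
pip install -r requirements.txt

# Preview the full-text docs with live reload (mkdocs defaults to port 8000)
mkdocs serve

# Or build the static `site` directory used by the live /mkdocs endpoint
make docs
```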

## Run Dev mode

Requires `make` and `python3.11+`

Our API is built around a FastAPI server.  There are multiple ways to run
LLM backends for FastAPI.  Configuration is done via environment variables,
which can be set in `.env.local` or specified in the usual way at the command
line.
 * The easiest way to run is just to use OpenAI, which requires an account and
   API key at `OpenAI.com`.  You can set this API key in `.env.local`, which is
   pulled in by our `Makefile` when you `make run`.  It is the field called
   `OPENAI_API_KEY` (see the sketch after this list).

 * Alternatively, you can use a different LLM, run on your own infrastructure
   (e.g. your laptop).  To do that, you still need to set `OPENAI_API_KEY`, but
   you can set it to any random value.  You will want to set `OPENAI_API_BASE`
   to the base URL of the server listening for OpenAI API calls, e.g.
   `OPENAI_API_BASE=http://localhost:8080/v1`

 * Soon, you will be able to run the stack in our `llm-infrastructure` repo as a
   third way to provide an LLM backend to our LLM API.
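
As a sketch, a minimal `.env.local` might look like the following (placeholder
values; `.env.example` is the authoritative reference):

```
# .env.local -- placeholder values
OPENAI_API_KEY=sk-your-key-here

# Only needed when pointing at a self-hosted backend instead of OpenAI.com;
# in that case the key above can be any random string.
#OPENAI_API_BASE=http://localhost:8080/v1
```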

For more information about all the environment variables, please see
`.env.example`.  Once you have set `OPENAI_API_BASE` and `OPENAI_API_KEY`, you
can run FastAPI:
1. create an `.env.local` based on `.env.example`
2. run `make run` -- this will run `api/server.py` after creating `.venv` and installing required modules
3. use a browser to see the generated API docs at [http://0.0.0.0:8889/docs](http://0.0.0.0:8889/docs)
4. test a `/filterset` call (a basic liveness check is sketched below)
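
Once the server is up, a quick smoke test (assuming FastAPI's default schema
path) is:

```
# The interactive docs are built from this schema; a response here means the
# server is running.  The exact /filterset request shape is shown at /docs.
curl -s http://localhost:8889/openapi.json | head
```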

Alternatively, you can run entirely from the command line:
```
OPENAI_API_KEY="sk-bob...lob...law" .venv/bin/python ./api/filterset.py
```
### Hermes 2

As noted above, you can use Hermes 2 instead of OpenAI:

 * Make sure Hermes is running.  For simplicity, we'll use Docker:
   `docker run -ti -p 8080:8080 localai/localai:v2.11.0-ffmpeg-core hermes-2-pro-mistral`
 * Set `OPENAI_MODEL` in `.env.local` or in your environment to
   `hermes-2-pro-mistral`

 * Do `make run` and visit [http://localhost:8889/docs](http://localhost:8889/docs) in your browser, as above.
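
Putting those pieces together, a `.env.local` for this local setup might look
like this (illustrative values; see `.env.example` for the full variable list):

```
OPENAI_API_BASE=http://localhost:8080/v1   # the LocalAI container started above
OPENAI_API_KEY=anything                    # ignored by the local backend, but must be set
OPENAI_MODEL=hermes-2-pro-mistral
```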
## Deployment

Requires `make` and `ansible`, plus an Ubuntu instance in the `inventory`.  This was tested with Ubuntu LTS on Digital Ocean.
On Digital Ocean, set DNS names for the API and Traefik endpoints, and provide them in the `.env.*` files as well.

Run the Ansible playbook with `make run-playbook`.  Note that this requires a `.env.production` file (see `.env.example`); it will deploy to the server at `IP_ADDRESS`.
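
In short, assuming `.env.production` and the `inventory` are already in place,
a deployment is just:

```
# Reads .env.production and deploys to the server at IP_ADDRESS
make run-playbook
```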

## GitLab CI deployment

Requires `DEPLOY_TARGET_IP_ADDRESS` and `SSH_PRIVATE_KEY` for the destination server to be set as secrets in GitLab ([follow these SSH steps](https://docs.gitlab.com/ee/ci/ssh_keys/)), along with any other env settings from `.env.production`, since that file is not available in CI.
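
As a sketch of the key setup (the `ssh-keygen` step follows the GitLab docs
linked above; the variable names are the ones this project expects):

```
# Generate a dedicated deploy key pair with no passphrase
ssh-keygen -t ed25519 -f deploy_key -N ""

# In GitLab (Settings > CI/CD > Variables), set:
#   SSH_PRIVATE_KEY          = contents of deploy_key
#   DEPLOY_TARGET_IP_ADDRESS = IP of the destination server
# and add deploy_key.pub to ~/.ssh/authorized_keys on that server.
```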

## Tracing LLM requests

You can enable [Langchain's Langsmith](https://www.langchain.com/langsmith) and/or [Langfuse](https://langfuse.com/) tracing by setting the corresponding keys from `.env.example`.  (If you don't define `LANGFUSE_SECRET_KEY`, Langfuse tracing is disabled; a wrong `LANGFUSE_SECRET_KEY` will give you errors.)

Similarly, Sentry can be used to monitor FastAPI and LLM calls; enable it by setting `SENTRY_DSN`.
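
For illustration, the relevant settings might look like this (placeholder
values; the exact variable names are in `.env.example`):

```
LANGFUSE_SECRET_KEY=sk-lf-...   # omit entirely to disable Langfuse tracing
SENTRY_DSN=https://examplePublicKey@o0.ingest.sentry.io/0
```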