This is the API gateway to OTS's philanthropic LLM services. It sits between
frontend interfaces (like Torque or Hypha) and the LLMs themselves, giving
those front ends access to LLM capabilities. The API performs its work either
by talking directly to LLM services or by going through further gateways (e.g.
our llm-infrastructure repository).
OpenAI's API has become the de facto way to talk to LLMs, so we are, for the
moment, prototyping using OpenAI as our backend. We do this knowing we will be
able to adapt the results for use with other LLMs as we progress.
Documentation for this repository lives in this README and throughout the code
and other project files. The best starting point is the `mkdocs`
documentation, which you can view by running `mkdocs serve`.
Refer to `./requirements.txt` for the `mkdocs` packages you need to install.
If you do `make run`, you will get a live instance that has two documentation
endpoints:
* `http://localhost:8889/mkdocs`, full text documentation
* `http://localhost:8889/docs`, interactive, API-specific documentation
If you do not use `make run`, you might get an error about the `site` directory
not existing. You can create that directory and its contents with `make docs`.
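As a quick illustration (assuming GNU make and the targets described above), the typical flow is:

```
make docs    # builds the mkdocs `site` directory
make run     # starts the API, with docs served at http://localhost:8889/mkdocs
```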
Our API is built around a FastAPI server. There are multiple ways to run
LLM backends for it. Configuration is done via environment variables,
which can be set in `.env.local` or specified in the usual way at the command
line.
* The easiest way to run is to just use OpenAI, which requires an account and
API key at `OpenAI.com`. You can set this API key in `.env.local`, which is
pulled in by our `Makefile` when you `make run`. It is the field called
`OPENAI_API_KEY` (see the example after this list).
* Alternatively, you can use a different LLM, run on your own infrastructure
(e.g. your laptop). To do that, you still need to set `OPENAI_API_KEY`, but
you can set it to any random value. You will want to set `OPENAI_API_BASE` to
the base URL of the server listening for OpenAI API calls, e.g.
`OPENAI_API_BASE=http://localhost:8080/v1`
* Soon, you will be able to run the stack in our `llm-infrastructure` repo as a
third way to provide an LLM backend to our LLM API.
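As a hedged sketch (check `.env.example` for the authoritative variable names), an `.env.local` for either of the first two setups might look like this:

```
# Option 1: hosted OpenAI
OPENAI_API_KEY="sk-..."                      # your key from OpenAI.com

# Option 2: a local OpenAI-compatible server (e.g. on your laptop)
OPENAI_API_KEY="anything"                    # unused, but must be set
OPENAI_API_BASE="http://localhost:8080/v1"   # base URL of the local server
```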
For more information about all the environment variables, please see `.env.example`. Once you have set `OPENAI_API_BASE` and `OPENAI_API_KEY`, you can run FastAPI.
1. create an `.env.local` based on `.env.example`
2. run `make run` -- this will run `api/server.py` after creating `.venv` and installing required modules
3. use a browser to see the generated API docs at [http://0.0.0.0:8889/docs](http://0.0.0.0:8889/docs)
Alternatively, you can run entirely from the command line:
```
OPENAI_API_KEY="sk-bob...lob...law" .venv/bin/python ./api/filterset.py
```
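Similarly, and assuming `api/server.py` can be invoked directly (as the `make run` target does), you could start the server itself with the variables given on the command line:

```
OPENAI_API_KEY="sk-..." OPENAI_API_BASE="http://localhost:8080/v1" .venv/bin/python ./api/server.py
```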
As noted above, you can use Hermes 2 instead of OpenAI:
* Make sure Hermes is running. For simplicity, we'll use Docker:

  ```
  docker run -ti -p 8080:8080 localai/localai:v2.11.0-ffmpeg-core hermes-2-pro-mistral
  ```
* Set `OPENAI_MODEL` in `.env.local` or in your environment to `hermes-2-pro-mistral`
* Do `make run` and visit [http://localhost:8889/docs](http://localhost:8889/docs) in your browser, as above.
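Putting those pieces together, a hedged example of the relevant `.env.local` entries for this setup (using only the variable names mentioned above) would be:

```
OPENAI_API_KEY="not-used-but-required"
OPENAI_API_BASE="http://localhost:8080/v1"
OPENAI_MODEL="hermes-2-pro-mistral"
```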
## Ansible deployment

Deploying requires `make` and `ansible`, plus an Ubuntu instance listed in the `inventory`. This was tested with Ubuntu LTS on Digital Ocean.

On Digital Ocean, set DNS names for the API and Traefik endpoints, and provide them in the `.env.*` files as well.

Run the Ansible playbook with `make run-playbook`. Note that this requires a `.env.production` file (see `.env.example`); it will deploy to the `IP_ADDRESS` server.
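A minimal sketch of that flow, assuming the Make targets and files named above:

```
cp .env.example .env.production   # fill in IP_ADDRESS, DNS names, API keys, etc.
make run-playbook                 # runs the Ansible playbook against IP_ADDRESS
```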
## GitLab CI deployment
This requires `DEPLOY_TARGET_IP_ADDRESS` and `SSH_PRIVATE_KEY` for the destination server, set as secrets in GitLab ([follow these SSH steps](https://docs.gitlab.com/ee/ci/ssh_keys/)), along with any other settings from `.env.production`, since that file is not available in CI.
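These variables can be set in the GitLab UI under Settings > CI/CD > Variables. As a hedged illustration only, assuming your version of the `glab` CLI provides `glab variable set`, the same can be done from the command line:

```
glab variable set DEPLOY_TARGET_IP_ADDRESS "203.0.113.10"        # example address
glab variable set SSH_PRIVATE_KEY "$(cat ~/.ssh/deploy_key)"     # hypothetical key path
```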
## Tracing LLM requests
You can enable [Langchain's Langsmith](https://www.langchain.com/langsmith) and/or [Langfuse](https://langfuse.com/) tracing by setting the corresponding keys from `.env.example`. (If you don't define `LANGFUSE_SECRET_KEY`, Langfuse tracing is disabled; a wrong `LANGFUSE_SECRET_KEY` will produce errors.)
Similarly, Sentry can be used to monitor FastAPI and LLM calls; enable it by setting `SENTRY_DSN`.
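A hedged sketch of the relevant entries (only `LANGFUSE_SECRET_KEY` and `SENTRY_DSN` are named above; check `.env.example` for the full list, including the Langsmith keys):

```
LANGFUSE_SECRET_KEY="sk-lf-..."                         # omit to disable Langfuse tracing
SENTRY_DSN="https://<public-key>@<host>/<project-id>"   # enables Sentry monitoring
```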