# LLM API
This is the API gateway to OTS's philanthropic LLM services. It sits between frontend interfaces (like Torque or Hypha) and gives those frontends access to LLMs. The API performs its work either by talking directly to LLM services or by using further gateways (e.g. our llm-infrastructure repository).
OpenAI's API has become the de facto way to talk to LLMs, so we are, for the moment, prototyping using OpenAI as our backend. We do this knowing we will be able to adapt the results for use with other LLMs as we progress.
Documentation for this repository is in this README and throughout the code and the files that make up the project. The best starting point is the mkdocs documentation, which you can see by running `mkdocs serve`. Please refer to `./requirements.txt` for the necessary mkdocs packages to install.
If you do `make run`, you will get a live instance that has two documentation endpoints:

- http://localhost:8889/mkdocs, full text documentation
- http://localhost:8889/docs, interactive, API-specific documentation

If you do not use `make run`, you might get an error about the `site` directory not existing. You can create that directory and its contents with `make docs`.
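For example, a minimal local documentation workflow, assuming the mkdocs packages from `./requirements.txt` and the `make docs` target described above, looks roughly like this:

```sh
# Install the mkdocs tooling (ideally inside the project's .venv),
# build the static site/ directory, and preview the docs locally.
pip install -r requirements.txt
make docs        # creates the site/ directory served at the /mkdocs endpoint
mkdocs serve     # live-preview; mkdocs listens on http://127.0.0.1:8000 by default
```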
## Run Dev mode

Requires `make` and python3.11+.
Our API is built around a FastAPI server. There are multiple ways to run LLM backends for FastAPI. Configuration is done via environment variables, which can be set in `.env.local` or specified in the usual way at the command line.
- The easiest way to run is to just use OpenAI, which requires an account and API key at OpenAI.com. You can set this API key in `.env.local`, which is pulled in by our `Makefile` when you `make run`. It is the field called `OPENAI_API_KEY`.
- Alternatively, you can use a different LLM, run on your own infrastructure (e.g. your laptop). To do that, you still need to set `OPENAI_API_KEY`, but you can set it to any random value. You will want to set `OPENAI_API_BASE` to the base URL of the server listening for OpenAI API calls. Ex: `OPENAI_API_BASE=http://localhost:8080/v1` (see the sketch after this list).
- Soon, you will be able to run the stack in our llm-infrastructure repo as a third way to provide an LLM backend to our LLM API.
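For instance, a minimal `.env.local` for the hosted OpenAI option could look like the sketch below; the key is a placeholder, and `.env.example` has the full list of supported variables:

```sh
# .env.local -- hosted OpenAI backend (placeholder key; substitute your own)
OPENAI_API_KEY=sk-your-key-here
# Only needed when pointing at a self-hosted, OpenAI-compatible server:
# OPENAI_API_BASE=http://localhost:8080/v1
```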
For more information about all the environment variables, please see `.env.example`. Once you have set `OPENAI_API_BASE` and `OPENAI_API_KEY`, you can run FastAPI:
- create an `.env.local` based on `.env.example`
- run `make run` -- this will run `api/server.py` after creating `.venv` and installing required modules
- use a browser to see the generated API docs at http://0.0.0.0:8889/docs
- test the `/filterset` function call (see the `curl` sketch below)
Alternatively, you can run entirely from the command line:

```sh
OPENAI_API_KEY="sk-bob...lob...law" .venv/bin/python ./api/filterset.py
```
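If you would rather poke at the running server over HTTP, the rough smoke test below is one way to do it; it assumes `/filterset` accepts POSTed JSON, and the placeholder body should be replaced with the actual fields documented at `/docs`:

```sh
# Assumes the server started by `make run` is listening on port 8889.
curl http://localhost:8889/openapi.json      # inspect the generated schema
curl -X POST http://localhost:8889/filterset \
     -H "Content-Type: application/json" \
     -d '{}'                                 # placeholder body; see /docs for the real schema
```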
## Hermes 2
As noted above, you can use Hermes 2 instead of OpenAI:

- Make sure Hermes is running. For simplicity, we'll use Docker:

  ```sh
  docker run -ti -p 8080:8080 localai/localai:v2.11.0-ffmpeg-core hermes-2-pro-mistral
  ```

- Set `OPENAI_MODEL` in `.env.local` or in your environment to "hermes-2-pro-mistral"
- Do `make run` and visit http://localhost:8889/docs in your browser, as above.
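Putting those steps together, a `.env.local` for the local Hermes 2 setup might look like this sketch (the key value is arbitrary, as noted earlier):

```sh
# .env.local -- local Hermes 2 served by LocalAI on port 8080
OPENAI_API_KEY=anything-goes-here          # required, but unused by the local server
OPENAI_API_BASE=http://localhost:8080/v1
OPENAI_MODEL=hermes-2-pro-mistral
```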
## Deployment

Requires `make` and `ansible`, and an Ubuntu instance in the `inventory`. This was tested with Ubuntu LTS on Digital Ocean.

On Digital Ocean, set DNS names for the API and Traefik endpoints, and provide them in the `.env.*` files too.

Run the Ansible playbook with `make run-playbook`. Note that this requires a `.env.production` file (see `.env.example`); it will deploy to the `IP_ADDRESS` server.
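A rough end-to-end deployment sequence, assuming the Ansible inventory and DNS records are already in place, might look like this; the exact variable names for the DNS entries come from `.env.example`:

```sh
# Sketch of a deployment run against the server at IP_ADDRESS.
cp .env.example .env.production
"$EDITOR" .env.production    # set IP_ADDRESS, the API/Traefik DNS names, API keys, etc.
make run-playbook            # runs the Ansible playbook using .env.production
```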
## Gitlab CI deployment

Requires `DEPLOY_TARGET_IP_ADDRESS` and `SSH_PRIVATE_KEY` for the destination server, set as secrets in GitLab (follow these SSH steps), along with any other environment settings from `.env.production`, since that file is not available in CI.
## Tracing LLM requests

You can enable Langchain's Langsmith and/or Langfuse tracing by setting the corresponding keys from `.env.example`. (If you don't define `LANGFUSE_SECRET_KEY`, then Langfuse tracing is disabled; a wrong `LANGFUSE_SECRET_KEY` will give you errors.)

Similarly, Sentry can be used to monitor FastAPI and LLM calls; enable it by setting `SENTRY_DSN`.
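As an illustration, the observability-related entries in your `.env.*` file might look like the sketch below; only the two variables named here are shown, and `.env.example` remains the authoritative list (including any Langsmith keys):

```sh
# Optional tracing/monitoring settings -- omit a line to leave that integration off.
LANGFUSE_SECRET_KEY=sk-lf-...    # placeholder; an invalid key causes errors rather than disabling tracing
SENTRY_DSN=https://examplePublicKey@o0.ingest.sentry.io/0    # illustrative DSN format
```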