This guide details how to use an iOS Shortcut to interact with large language models (LLMs) hosted on a local desktop computer through a FastAPI endpoint. The setup lets you send prompts from your iOS device and receive responses from the LLM, enabling on-the-go access to advanced language models. There are two main components to this setup:
- FastAPI Application: A FastAPI application that serves as an endpoint for the iOS Shortcut to send requests to.
- iOS Shortcut: A shortcut that sends prompts to the FastAPI application and displays the response.
- Less battery drain - sending web requests is far less battery-intensive than running inference on-device
- Faster inference with remote models
- More flexibility - run any model accessible via the LLM library
- AnyScale compatible
Current AnyScale model names include (as of 1/3/24):
- meta-llama/Llama-2-7b-chat-hf
- meta-llama/Llama-2-13b-chat-hf
- meta-llama/Llama-2-70b-chat-hf
- codellama/CodeLlama-34b-Instruct-hf
- mistralai/Mistral-7B-Instruct-v0.1
- mistralai/Mixtral-8x7B-Instruct-v0.1
- Open-Orca/Mistral-7B-OpenOrca
- HuggingFaceH4/zephyr-7b-beta
All GGUF models are available for local inference via llama.cpp
- TheBloke regularly publishes many GGUF models
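The README does not show how the server decides between AnyScale and local llama.cpp inference. One plausible rule, assuming a `.gguf` file extension marks a local model, looks like this (an illustration, not the repo's actual logic):

```python
# Assumed routing rule: a MODEL value ending in ".gguf" is treated as a
# local llama.cpp model file; anything else as an AnyScale model name.
def is_local_gguf(model_name: str) -> bool:
    return model_name.lower().endswith(".gguf")
```

For example, `is_local_gguf("llama-2-7b-chat.Q6_K.gguf")` is true, while `is_local_gguf("mistralai/Mistral-7B-Instruct-v0.1")` is false.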
Prerequisites:
- ngrok AuthToken
- AnyScale API key
- Docker
- iOS / Shortcuts app
Git Clone:
- Clone this repository to your local machine:
git clone https://github.com/00brad/LLM-Shortcut.git
Build Docker Image:
- Create a .env file in the project root with your ngrok AuthToken and AnyScale API key (MODEL can be set to an AnyScale model name or a local GGUF model filename):
NGROK_AUTHTOKEN=your_ngrok_authtoken
ANYSCALE_ENDPOINTS_KEY=your_anyscale_key
MODEL=your_model_name
MAX_TOKENS=your_max_tokens
- Build the Docker image:
docker build -t llm_shortcut .
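For illustration, key=value settings like the ones above can be parsed with a few lines of stdlib Python; in practice the project presumably relies on Docker's --env-file handling or a library such as python-dotenv:

```python
# Minimal .env parser (illustrative only): keeps KEY=VALUE lines, skips
# blank lines and comments.
def parse_env(text: str) -> dict:
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env
```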
If using a local GGUF model (optional), download it into the project root:
- Example:
wget https://huggingface.co/TheBloke/Llama-2-7b-Chat-GGUF/resolve/main/llama-2-7b-chat.Q6_K.gguf
- Set the model name in the .env file to the filename of the downloaded model.
- Example:
MODEL=llama-2-7b-chat.Q6_K.gguf
Run the FastAPI Application:
- Start the FastAPI server using Uvicorn:
docker run llm_shortcut
- Upon startup, ngrok generates a public URL that tunnels to your local desktop computer.
- The generated ngrok URL, which is your endpoint, will be displayed in the console after running.
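Besides watching the console, the ngrok agent exposes a local inspection API at http://127.0.0.1:4040/api/tunnels whose JSON response lists the active tunnels. A small helper can pull the public HTTPS URL out of that response (fetching the JSON itself is left to the caller):

```python
import json
from typing import Optional

def public_url(tunnels_json: str) -> Optional[str]:
    """Return the first https public_url from an ngrok /api/tunnels response."""
    data = json.loads(tunnels_json)
    for tunnel in data.get("tunnels", []):
        if tunnel.get("public_url", "").startswith("https://"):
            return tunnel["public_url"]
    return None
```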
Shortcut Setup:
- Click the following link to open the LLM Shortcut in the Shortcuts app.
- Click the '+ Add to Shortcut' button to add the shortcut to your library.
- Click the '...' button to edit the shortcut.
- Change the URL in the first line to the ngrok URL generated in the previous section.
- Add the API key.
- Click the 'Done' button to save the changes.
- Click the 'i' button to view the shortcut details and click 'Add to Home Screen' to add the shortcut to your home screen.
Using the Shortcut:
- Click the shortcut and speak your prompt or say "Hey Siri, Ask AI" to activate the shortcut.
- The prompt is sent to the FastAPI server, processed by the LLM, and a response is returned to your device.
- The ngrok URL will change each time the FastAPI server restarts.
- You can run in detached mode by adding the -d flag to the docker run command. (Use docker logs to view the ngrok URL.)
- Ensure the .env file is correctly set up with your ngrok AuthToken, AnyScale API key, and model name.
- Currently, the server is configured to use AnyScale models.
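Under the hood the shortcut is just an HTTP POST to your ngrok endpoint, so the same request can be reproduced from any client. In this stdlib sketch, the /prompt path and the "prompt" field name are assumptions about the server's API, not confirmed by the repository:

```python
import json
import urllib.request

# Illustrative reconstruction of the request the shortcut sends. The
# /prompt path and the "prompt" field name are assumed.
def build_request(base_url: str, prompt: str) -> urllib.request.Request:
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask(base_url: str, prompt: str) -> str:
    """POST the prompt to the server and return the raw response body."""
    with urllib.request.urlopen(build_request(base_url, prompt)) as resp:
        return resp.read().decode("utf-8")
```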
This setup allows you to leverage the power of language models directly from your iOS device, making it fast and convenient to use advanced AI capabilities wherever you go.