Technical Deep Dive

Master Streaming Chatbot Development with AWS Lambda in 3 Simple Steps

AWS Lambda is a popular way to develop and deploy services given its low cost and simple serverless model. However, one of the main hurdles when building a chatbot on Lambda has been the lack of clear instructions for making the service stream its responses. A streaming chatbot is essential for a good user experience and has been the standard since ChatGPT was introduced.

In this post, we’ll walk through deploying a simple backend service that wraps OpenAI’s API and can easily be extended to support functionality like RAG. The same service can also be pointed at other LLM providers or open-source models.

We’ll follow these steps –

  1. Create a Docker container with all the requirements.
  2. Build a FastAPI streaming app.
  3. Deploy the application to AWS Lambda with the appropriate settings.

Prerequisites

  1. Basic understanding of AWS Lambda, Docker, and ECR.
  2. Comfort with Python 3.
  3. AWS CLI or console access.

Creating the Docker Container

For this Lambda function, we’ll use Docker so we can make use of the AWS Lambda Web Adapter. The adapter lets you build web apps with familiar frameworks like FastAPI and run them on AWS Lambda. Here’s a Dockerfile you can use to copy the adapter directly into your image –

FROM public.ecr.aws/docker/library/python:3.12.0-slim-bullseye
# Copy the Lambda Web Adapter extension into the image
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.1 /lambda-adapter /opt/extensions/lambda-adapter

# Tell the adapter to stream responses instead of buffering them
ENV AWS_LWA_INVOKE_MODE=RESPONSE_STREAM

WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt

COPY __init__.py __init__.py
COPY main.py main.py

CMD ["python", "main.py"]
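
The requirements.txt copied into the image isn’t shown above. A minimal version that covers the imports used in the app below would look something like this (versions are illustrative, pin whatever your project needs):

# requirements.txt – minimal set of packages for the app below
fastapi
uvicorn
openai
pydantic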

FastAPI Streaming Chatbot App

We chose FastAPI for its simplicity and readability, but the Lambda Web Adapter should work with any HTTP-based backend server. Here’s a backend app you can use for this –

import os
import uvicorn
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from openai import OpenAI


# Replace with your API key, or load it from an environment variable / secrets manager
openai_client = OpenAI(api_key="<API_KEY>")


async def openai_stream(system_prompt, query):
    response = openai_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {
                "role": "system",
                "content": system_prompt
            },
            {
                "role": "user",
                "content": query
            }
        ],
        temperature=0,
        stream=True
    )
    final_response = ""  # accumulates the full response in case you want to log or cache it
    for chunk in response:
        chunk_response = chunk.choices[0].delta.content
        if chunk_response:
            final_response += chunk_response
            yield chunk_response


app = FastAPI()


class QueryRequest(BaseModel):
    query: str


@app.post("/api/completion_stream")
async def api_completion_stream(request_body: QueryRequest):
    query = request_body.query
    if not query:
        return StreamingResponse("", media_type="text/plain")
    system_prompt = "You are a helpful maths teacher. Answer questions only about maths and be very thoughtful."
    stream_generator = openai_stream(system_prompt, query)
    return StreamingResponse(stream_generator, media_type="text/plain")


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))
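
Before deploying, you can sanity-check the app locally. With your API key filled in, running python main.py starts the server on port 8080, and you can hit the streaming endpoint with curl (the -N flag disables buffering so you can watch chunks arrive) –

curl -N --location 'http://localhost:8080/api/completion_stream' \
--header 'Content-Type: application/json' \
--data '{"query": "What is Pythagorean theorem"}'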

Deploy the application to AWS Lambda

AWS Lambda has long supported container images as a runtime. It also supports streaming output responses, which is what we will use here. The process breaks down into pushing the image, creating the Lambda function, and then applying the appropriate settings.

First, you need a repository in AWS ECR (Elastic Container Registry) to hold the image; note the repository URI and region once it’s created. You can create the repository from the console or with the following command –

aws ecr create-repository --repository-name <repo_name>

Next, we can build and push our Docker image to the ECR repository using the following commands –

## Login to ECR
aws ecr get-login-password --region <ECR_REGION> | docker login --username AWS --password-stdin <ECR_ENDPOINT_ID>.dkr.ecr.<ECR_REGION>.amazonaws.com
## Build Docker image
docker buildx build --platform=linux/amd64 -t <repo_name> .
## Tag the image for ECR
docker tag <repo_name>:latest <ECR_ENDPOINT_ID>.dkr.ecr.<ECR_REGION>.amazonaws.com/<repo_name>:latest
## Push to ECR
docker push <ECR_ENDPOINT_ID>.dkr.ecr.<ECR_REGION>.amazonaws.com/<repo_name>:latest
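
If you want to confirm the push worked, you can list the images in the repository. This is a standard ECR CLI call, shown here just as a sanity check –

aws ecr describe-images --repository-name <repo_name>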

Once the image is pushed, we can move on to setting up our AWS Lambda function. When creating the function, choose the container image option and pick the image from the ECR repository we created in the last step. Use the latest tag to make sure it picks up the latest changes. If you are using the AWS CLI, here’s a sample command –

aws lambda create-function \
   --function-name my-lambda-function \
   --package-type Image \
   --code ImageUri=<ECR_ENDPOINT_ID>.dkr.ecr.<ECR_REGION>.amazonaws.com/chatbot-api:latest \
   --role arn:aws:iam::ACCOUNT_ID:role/service-role/my-lambda-role \
   --timeout 900 \
   --memory-size 1024
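
The role ARN above is a placeholder; the function’s execution role only needs basic Lambda execution permissions. If you are creating the role yourself, attaching the AWS-managed AWSLambdaBasicExecutionRole policy is typically enough (role name below matches the placeholder above) –

aws lambda --version  ## verify the CLI is configured before running IAM commands
aws iam attach-role-policy \
   --role-name my-lambda-role \
   --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole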

For easier integration, set the function URL “auth-type” to NONE (keep in mind this makes the endpoint publicly accessible, so add your own auth if needed) and the function timeout to 900 seconds.

Once the function is created, head over to “Configuration -> Function URL” to set up an HTTPS endpoint for your service. This is the endpoint you will eventually call from your web app. While creating the endpoint, make sure you set the invoke mode to RESPONSE_STREAM; this is what enables streaming output for your function.
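
If you prefer the CLI, the same thing can be done with create-function-url-config, plus a resource policy that permits unauthenticated invocation when using auth type NONE –

## Create a streaming function URL with no auth
aws lambda create-function-url-config \
   --function-name my-lambda-function \
   --auth-type NONE \
   --invoke-mode RESPONSE_STREAM
## Allow public invocation of the function URL
aws lambda add-permission \
   --function-name my-lambda-function \
   --statement-id FunctionURLAllowPublicAccess \
   --action lambda:InvokeFunctionUrl \
   --principal "*" \
   --function-url-auth-type NONE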

And you’re done! Note the function URL, and your service is ready to be called. Here’s a sample curl request to verify –

curl -N --location '<function_url>/api/completion_stream' \
--header 'Content-Type: application/json' \
--data '{"query": "What is Pythagorean theorem"}'

Most open-source wrappers and API providers for other LLMs also support the OpenAI API format, making it easy to swap out providers. This lets you deploy your service in a cheap yet scalable manner, while keeping the speed benefits that come with AWS Lambda.
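
As a sketch of what that swap looks like: the openai Python client accepts a base_url, so pointing the existing app at any OpenAI-compatible endpoint is usually a one-line change (the URL below is just a placeholder) –

# Point the existing client at an OpenAI-compatible provider (placeholder URL and key)
openai_client = OpenAI(
    base_url="https://your-llm-provider.example.com/v1",
    api_key="<PROVIDER_API_KEY>"
)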

Let Me Know How It Went

I hope you give this method a shot, and I’m always happy to hear from you. If you have any issues, feel free to comment below and I will take a look. If you tried it successfully, I’d love to hear from you as well!