AWS Lambda is a popular way to develop and deploy services given its low cost and simple serverless model. However, one of the main issues when building a chatbot on Lambda is the lack of clear instructions on making it a streaming service. A streaming chatbot is essential for a good user experience, and has been the standard since ChatGPT was introduced.
In this post, we’ll show you how to deploy a simple backend service that wraps OpenAI’s API and can be easily extended to support functionality like RAG. The same service can also be swapped over to other LLM providers or open source models with little effort.
We’ll follow these steps –
- Build a Docker image that bundles the AWS Lambda Web Adapter
- Write a FastAPI backend that streams completions from OpenAI
- Push the image to AWS ECR
- Create a Lambda function from the image and enable response streaming through a Function URL
- Test the endpoint
For this Lambda function, we’ll use Docker so we can make use of the AWS Lambda Web Adapter. The AWS Lambda Web Adapter allows you to build web apps with familiar frameworks like FastAPI and run them on AWS Lambda. Here’s a Dockerfile you can use to import the adapter directly into your image –
FROM public.ecr.aws/docker/library/python:3.12.0-slim-bullseye

# Pull the Lambda Web Adapter into the image as a Lambda extension
COPY --from=public.ecr.aws/awsguru/aws-lambda-adapter:0.8.1 /lambda-adapter /opt/extensions/lambda-adapter

# Tell the adapter to stream responses back to the caller
ENV AWS_LWA_INVOKE_MODE=RESPONSE_STREAM

WORKDIR /app
COPY requirements.txt requirements.txt
RUN pip install -r requirements.txt
COPY __init__.py __init__.py
COPY main.py main.py

CMD ["python", "main.py"]
We chose FastAPI given its simplicity and ease of understanding; the Lambda adapter should work with any HTTP-based backend server. Here’s a backend app you can use for this –
import os
import uvicorn
from fastapi import FastAPI, Depends, HTTPException, status, Header
from fastapi.responses import StreamingResponse, JSONResponse
from pydantic import BaseModel
from openai import OpenAI

# Replace with your key, or read it from an environment variable
openai_client = OpenAI(api_key="<API_KEY>")


# Stream chunks from the OpenAI chat completions API as they arrive
async def openai_stream(system_prompt, query):
    response = openai_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
        temperature=0,
        stream=True,
    )
    final_response = ""
    for chunk in response:
        chunk_response = chunk.choices[0].delta.content
        if chunk_response:
            final_response += chunk_response
            yield chunk_response


app = FastAPI()


class QueryRequest(BaseModel):
    query: str


@app.post("/api/completion_stream")
async def api_completion_stream(request_body: QueryRequest):
    query = request_body.query
    if not query:
        return StreamingResponse("", media_type="text/plain")
    system_prompt = "You are a helpful maths teacher. Answer questions only about maths and be very thoughtful."
    stream_generator = openai_stream(system_prompt, query)
    return StreamingResponse(stream_generator, media_type="text/plain")


if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", "8080")))
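Before involving AWS at all, you can sanity-check the app locally. This is just a quick sketch; it assumes the file is saved as main.py and that you’ve filled in your OpenAI API key –
## Run the app locally (defaults to port 8080)
python main.py
## In another terminal, hit the streaming endpoint (-N disables curl's buffering)
curl -N --location 'http://localhost:8080/api/completion_stream' \
--header 'Content-Type: application/json' \
--data '{"query": "What is the Pythagorean theorem"}'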
AWS Lambda has long supported container images as a runtime, and it has since added the ability to stream response payloads, which is what we will use here. We can break this process down into first setting up the Lambda function, and then applying the appropriate settings.
First, you need to set up AWS ECR (Elastic Container Registry); the AWS getting-started guide walks through this. Note your repository endpoint and region once the repository is created.
Once you have ECR set up, you can create a repository with the following command –
aws ecr create-repository --repository-name <repo_name>
Next, we can push our Docker image to the ECR repository using the following commands. The <ECR_ENDPOINT_ID> below is your AWS account ID, which together with the region forms the registry hostname –
## Login to ECR
aws ecr get-login-password --region <ECR_REGION> | docker login --username AWS --password-stdin <ECR_ENDPOINT_ID>.dkr.ecr.<ECR_REGION>.amazonaws.com
## Build Docker image
docker buildx build --platform=linux/amd64 -t <repo_name> .
## Apply latest tag
docker tag <repo_name>:latest <ECR_ENDPOINT_ID>.dkr.ecr.<ECR_REGION>.amazonaws.com/<repo_name>:latest
## Push to ECR
docker push <ECR_ENDPOINT_ID>.dkr.ecr.<ECR_REGION>.amazonaws.com/<repo_name>:latest
Once the image is pushed, we can move on to setting up our AWS Lambda function. When creating the function, choose the container image runtime and pick the Docker image from the ECR repo we created in the last step. Use the latest tag so the function picks up your most recent push when it is created or updated. If you are using the AWS CLI, here’s a sample command –
aws lambda create-function \
--function-name my-lambda-function \
--package-type Image \
--code ImageUri=<ECR_ENDPOINT_ID>.dkr.ecr.<ECR_REGION>.amazonaws.com/<repo_name>:latest \
--role arn:aws:iam::ACCOUNT_ID:role/service-role/my-lambda-role \
--timeout 900 \
--memory-size 1024
For easier integration, set the Function URL “auth-type” to NONE, and the function timeout to 900 seconds (the maximum).
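One thing to keep in mind: Lambda resolves the image tag to a digest when the function is created or updated, so pushing a new latest image to ECR does not redeploy the function by itself. Here’s a sketch of the redeploy step, assuming the same function name as above –
aws lambda update-function-code \
--function-name my-lambda-function \
--image-uri <ECR_ENDPOINT_ID>.dkr.ecr.<ECR_REGION>.amazonaws.com/<repo_name>:latest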
Once the function is created, you can head over to “Configuration -> Function URL” to set up an HTTPS endpoint for your service. This is the endpoint you will eventually call from your web app. While creating the endpoint, make sure you set the Invoke mode to RESPONSE_STREAM; this is what enables streaming output for your function.
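If you’d rather script this step than click through the console, here’s a rough CLI sketch, assuming the function name my-lambda-function from earlier –
## Create the Function URL with streaming enabled and no IAM auth
aws lambda create-function-url-config \
--function-name my-lambda-function \
--auth-type NONE \
--invoke-mode RESPONSE_STREAM
## Allow public invocations of the Function URL
aws lambda add-permission \
--function-name my-lambda-function \
--action lambda:InvokeFunctionUrl \
--principal "*" \
--function-url-auth-type NONE \
--statement-id FunctionURLAllowPublicAccess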
And you’re done! Note the Function URL, and your service is ready to go. Here’s a sample curl request to verify; the -N flag disables curl’s buffering so you can watch the chunks arrive –
curl -N --location '<function_url>/api/completion_stream' \
--header 'Content-Type: application/json' \
--data '{"query": "What is the Pythagorean theorem"}'
Most open source wrappers and API providers for other LLMs also support the OpenAI API format, making it easy to swap out providers. This lets you deploy your service in a cheap yet scalable manner, while keeping all the speed benefits that come with AWS Lambda functions.
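For example, many providers expose OpenAI-compatible endpoints, so often the only change needed in the code above is how the client is constructed. The base_url and API key below are placeholders rather than any specific provider’s values –
from openai import OpenAI

# Point the same client at an OpenAI-compatible endpoint (placeholder values)
openai_client = OpenAI(
    api_key="<PROVIDER_API_KEY>",
    base_url="https://<your-provider>/v1",
)
# Then pass the provider's model name in the chat.completions.create call.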
I hope you give this method a shot, and I’m always happy to hear from you. If you have any issues, feel free to comment below and I will take a look. If you tried it successfully, I’d love to hear from you as well!