OpenAI’s Assistants API lets app developers integrate AI without any background in how AI works. It is a great on-ramp for beginners, but it falls short in practice.
I recently took some time to research this topic, so I’ll save you the effort and explain why the Assistants API is so appealing to AI beginners, and why, at the same time, it has some major shortcomings that make it better suited to prototyping than to production use cases, at least for now. It is still in beta, so many of these shortcomings are likely to be addressed by OpenAI in the future.
For developers who are eager to get involved with AI by building apps or projects, and who are looking for a way to get started quickly, OpenAI’s Assistants API can be a great solution. All you need to get started is a basic understanding of a programming language like Python or Java. In fact, if you have no coding experience at all, you can still build an Assistant manually in the Assistants Playground.
The Assistants API is a new API that OpenAI unveiled at its November 2023 Dev Day. It extends the existing Chat Completions API and makes it easier for developers to craft powerful AI applications, giving you access to some very powerful technologies built by OpenAI.
The benefit of the Assistants API is that it abstracts away a lot of the complexities of working with large language models and allows developers to focus on building the application.
You can gain a visual understanding of the Assistants API through the Assistants Playground, where you can play with the three types of tools currently available:
Function calling is a very cool tool that lets you extend the Assistant’s capabilities by writing your own custom functions. For example, you can write a function to fetch news on a given topic, look up current stock prices, or anything else you can imagine. The AI can use the results of these functions as part of its toolkit to accomplish tasks.
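To make this concrete, here is a minimal sketch of how a custom function is declared when creating an Assistant with the official Python SDK. The function name, model, and schema below are placeholders of my own, not anything prescribed by the API:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The Assistant only ever sees this JSON schema; our own code is
# responsible for actually executing "get_stock_price" (a made-up
# function for this example) and returning its result.
assistant = client.beta.assistants.create(
    name="Stock Helper",
    instructions="You help users look up current stock prices.",
    model="gpt-4-1106-preview",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_stock_price",
            "description": "Fetch the current price for a stock ticker",
            "parameters": {
                "type": "object",
                "properties": {
                    "ticker": {"type": "string", "description": "e.g. AAPL"},
                },
                "required": ["ticker"],
            },
        },
    }],
)
```

When a run pauses with the status `requires_action`, you execute the function yourself and hand the result back with `client.beta.threads.runs.submit_tool_outputs(...)`, and the Assistant folds it into its answer.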
Code Interpreter allows the AI to write and execute code. For example, if you would like the AI to solve a math problem, you can instruct it to first write out the Python code for the problem and then execute it, so that it can do the math properly. The Code Interpreter can also process a variety of data formats, analyze data, and even generate graphs!
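Enabling Code Interpreter is just a matter of listing it as a tool. A minimal sketch, again with a placeholder name and instructions:

```python
from openai import OpenAI

client = OpenAI()

# With code_interpreter enabled, the model can write Python and run it
# in a sandbox instead of doing arithmetic "in its head".
math_tutor = client.beta.assistants.create(
    name="Math Tutor",
    instructions="Write and run Python code to answer math questions.",
    model="gpt-4-1106-preview",
    tools=[{"type": "code_interpreter"}],
)
```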
Knowledge retrieval means you can upload your own files (text files, PDFs, and so on) to add to the Assistant’s knowledge base. For example, you can upload a research paper and then ask the AI questions about it.
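In the beta SDK this is roughly a two-step process: upload the file with the `assistants` purpose, then attach it to an Assistant that has the retrieval tool enabled. The file name and instructions below are placeholders:

```python
from openai import OpenAI

client = OpenAI()

# Step 1: upload the document (any supported format, e.g. a PDF).
paper = client.files.create(
    file=open("research_paper.pdf", "rb"),
    purpose="assistants",
)

# Step 2: create an Assistant with retrieval enabled and the file attached.
researcher = client.beta.assistants.create(
    name="Paper Q&A",
    instructions="Answer questions using the attached research paper.",
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],
    file_ids=[paper.id],
)
```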
These three tools are extremely powerful in the hands of a developer and can enable countless use cases. The Assistants Playground is pretty self-explanatory; I invite you to try it out and build your own Assistant.
You can, of course, also build Assistants with code. For a great overview, you can refer to this wonderful class by freeCodeCamp.
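If you’d rather see the shape of the code up front, here is a minimal end-to-end sketch of the beta flow in Python (create an assistant, start a thread, add a message, run, poll), with placeholder names and prompts:

```python
import time
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(
    name="Helper",
    instructions="You are a helpful assistant.",
    model="gpt-4-1106-preview",
)

# A thread holds the conversation; messages accumulate on it.
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What is 12 * 37?",
)

# A run asks the assistant to process the thread. Runs are asynchronous,
# so we poll until the status settles.
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)

# Newest messages come first in the listing.
for message in client.beta.threads.messages.list(thread_id=thread.id):
    print(message.role, ":", message.content[0].text.value)
```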
While the Assistants API has captured the interest of many developers, there are still some major shortcomings to keep in mind if you are considering production use cases.
Many people who have tried the Assistants API found that their token usage scaled very quickly with use. OpenAI charges for its service based on the amount of text (measured in tokens) processed by the model during a request.
Now, the amount of text processed can go far beyond the user’s input text. The Assistant keeps track of a conversation using threads so that it remembers the context of the entire conversation, which means that with each subsequent chat message it sends the entire thread back to the model. You are therefore paying for the cumulative token count with every message.
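A quick back-of-the-envelope sketch shows how fast this compounds. The 500-token figure below is an assumption for illustration, not a measurement:

```python
# Assume each user/assistant exchange adds ~500 tokens to the thread
# (a made-up average for illustration).
TOKENS_PER_EXCHANGE = 500

billed = 0
for n in range(1, 11):  # ten exchanges
    billed += n * TOKENS_PER_EXCHANGE  # the whole thread is resent each turn

print(billed)                    # 27500 tokens billed over ten messages...
print(10 * TOKENS_PER_EXCHANGE)  # ...versus 5000 tokens of actual new text
```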
Additionally, if you upload files for knowledge retrieval, the Assistants API feeds the entire document to the model to answer questions, and therefore charges you for the token count of the entire document.
As you can imagine, the token usage, and therefore pricing, can get out of control very quickly.
A distinctive part of the ChatGPT user experience is the way it streams its response back to the user, as if typing it out word by word. Not only does this look cool, it also shortens the wait for impatient users: since the model generates text token by token, it can stream results as they are produced, so the user isn’t left waiting for minutes until a large chunk of text appears. Unfortunately, streaming is currently unavailable on the Assistants API endpoint, although it might be coming soon.
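For contrast, here is roughly what streaming looks like on the Chat Completions endpoint, which is the experience the Assistants API doesn’t yet offer (the prompt and model are placeholders):

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain tokens in one paragraph."}],
    stream=True,  # deltas arrive as the model generates them
)

# Print each token delta as it arrives instead of waiting for the full reply.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```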
Under the existing Chat Completions API, you can set hyperparameters like temperature if you would like to tune the randomness and creativity of the generated text. These hyperparameters are currently unavailable in the Assistants API.
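On the Chat Completions side, tuning this is a one-liner; there is currently no equivalent knob when creating an Assistant or a run:

```python
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Write a tagline for a bakery."}],
    temperature=1.2,  # higher = more random/creative, lower = more deterministic
)
print(response.choices[0].message.content)
```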
While the shortcomings are a bit disappointing, developer interest in the Assistants API remains very high. It is a quick and easy way to access very powerful technologies and to start thinking about what is possible with AI. AI is an evolving technology, which is very exciting, and the tools are evolving with it.