AI as a Player in Multiplayer

Recently, we've seen more people building AI apps on Jamsocket, our platform for deploying session backends. While session backends are a common architecture for collaborative apps like Figma, we increasingly see overlap between the technical needs of multiplayer and AI apps.

We built a demo using OpenAI's Assistant API to identify specific ways session backends can make an AI tech stack simpler and more reliable.

Note: we sped up this video for demonstration purposes.

Our demo uses the Assistant to create and edit shapes on a collaborative canvas. This shares challenges that many AI apps face: using an AI model to call functions that operate on shared document state, and pushing updates to the client. Session backends are a good candidate for solving these problems. Source code for the demo is available on GitHub.

What is the OpenAI Assistant API?

Last month, OpenAI released their Assistant API. The Assistant API lets you build custom AI agents that use tools to incorporate user data, interact with the rest of your app, or execute code.

Threads and Statefulness

Unlike OpenAI's text completions API, the Assistant API has built-in support for long-running user sessions by way of a new feature called threads.

Within a thread, you can invoke runs that perform tasks with a relevant model or tool. This allows for richer user interactions – see the OpenAI keynote demo, where the assistant added pins on a map during a natural language conversation with a user.

Applications use the Assistant API in three steps:

  1. Create an assistant and specify which tools you’d like to use. For function calling, that means providing function signatures that the AI can call.
  2. Create a thread, which represents a conversation between an assistant and user.
  3. Add incoming messages to a thread and process them by instantiating a run, which may trigger the use of a tool.
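
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the demo's code: `AssistantClient`, its method names, and `startConversation` are hypothetical stand-ins for whatever client you use, and the `createShape` parameters are assumed for the example. The tool definition follows the usual shape of a function tool, with its parameters described as JSON Schema.

```typescript
// Hypothetical, simplified client interface. It mirrors the three steps in the
// text, not the exact surface of any SDK.
interface FunctionTool {
  type: "function";
  function: { name: string; description: string; parameters: Record<string, unknown> };
}

interface AssistantClient {
  createAssistant(opts: { instructions: string; tools: FunctionTool[] }): Promise<{ id: string }>;
  createThread(): Promise<{ id: string }>;
  addMessage(threadId: string, content: string): Promise<void>;
  createRun(threadId: string, assistantId: string): Promise<{ id: string; status: string }>;
}

// A function signature the model is allowed to call (parameters are assumed).
const createShapeTool: FunctionTool = {
  type: "function",
  function: {
    name: "createShape",
    description: "Create a rectangle on the shared canvas",
    parameters: {
      type: "object",
      properties: {
        x: { type: "number" },
        y: { type: "number" },
        w: { type: "number" },
        h: { type: "number" },
        color: { type: "string" },
      },
      required: ["x", "y", "w", "h", "color"],
    },
  },
};

async function startConversation(client: AssistantClient, userMessage: string) {
  // Step 1: create an assistant and declare its tools.
  const assistant = await client.createAssistant({
    instructions: "You create and edit shapes on a shared canvas.",
    tools: [createShapeTool],
  });
  // Step 2: create a thread for this conversation.
  const thread = await client.createThread();
  // Step 3: add the incoming message, then start a run to process it.
  await client.addMessage(thread.id, userMessage);
  const run = await client.createRun(thread.id, assistant.id);
  return { assistantId: assistant.id, threadId: thread.id, runId: run.id, status: run.status };
}
```

In a real app the assistant would typically be created once and reused, while a new thread is created per conversation.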

The challenges with traditional backend architectures

The stateful nature of the assistant model presents some challenges to traditional backend architectures.

For example, when you start a run for the assistant, the run enters a queued state by default. You need to poll the run periodically to check its status and act on it if necessary. When the AI wants to call a function, the run enters a requires_action state. In our demo, the assistant can call functions in our app to create and edit shapes.

We needed a backend architecture that made it easy to:

  • repeatedly poll the API
  • respond to action requests from the assistant to call a specific function
  • relay state updates to the client via a WebSocket connection
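
To make these requirements concrete, here is a rough sketch of a polling loop that a long-lived backend process could run. It is written against a hypothetical `RunApi` interface rather than a real SDK, and the names `pollRun`, `ToolHandler`, and `onChange` are assumptions for illustration. The status values follow the run states described above, plus the usual terminal states.

```typescript
type RunStatus =
  | "queued" | "in_progress" | "requires_action"
  | "completed" | "failed" | "cancelled" | "expired";

interface ToolCall { id: string; name: string; arguments: string } // arguments is a JSON string
interface Run { status: RunStatus; toolCalls?: ToolCall[] }
interface ToolOutput { toolCallId: string; output: string }

// Hypothetical wrapper around "fetch the run" and "send function results back".
interface RunApi {
  getRun(): Promise<Run>;
  submitToolOutputs(outputs: ToolOutput[]): Promise<void>;
}

type ToolHandler = (args: unknown) => unknown;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function pollRun(
  api: RunApi,
  handlers: Map<string, ToolHandler>,
  onChange: () => void, // e.g. push the new document state to connected clients
  intervalMs = 500,
  maxPolls = 120,
): Promise<RunStatus> {
  for (let i = 0; i < maxPolls; i++) {
    const run = await api.getRun();
    if (run.status === "requires_action") {
      const outputs: ToolOutput[] = [];
      for (const call of run.toolCalls ?? []) {
        try {
          const handler = handlers.get(call.name);
          if (!handler) throw new Error(`unknown function: ${call.name}`);
          const result = handler(JSON.parse(call.arguments));
          outputs.push({ toolCallId: call.id, output: JSON.stringify(result ?? null) });
        } catch (err) {
          // Report the failure back so the assistant can correct itself.
          outputs.push({ toolCallId: call.id, output: JSON.stringify({ error: String(err) }) });
        }
      }
      await api.submitToolOutputs(outputs);
      onChange();
    } else if (run.status !== "queued" && run.status !== "in_progress") {
      return run.status; // completed, failed, cancelled, or expired
    }
    await sleep(intervalMs);
  }
  throw new Error("run did not finish within the polling budget");
}
```

The loop needs a process that stays alive for the whole run and has access to both the shared state and the client connections.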

In a traditional backend architecture, setting up a stateful WebSocket stream that lasts an entire user session is challenging. If the client connection is dropped, the client cannot easily reconnect to the same backend and pick up where it left off. And when it comes to the Assistant API, we also need a place to operate on shared state on behalf of the assistant, and push those updates to connected clients.

While you could patch together a solution with worker queues, sticky sessions, pubsub brokers, and/or realtime databases, there is a simpler alternative: session backends.

Session backend architectures let you maintain a stateful connection with a user or group of users. This is achieved by giving each client a dedicated lightweight backend process, which they can connect (and reconnect) to via a unique URL.

When the assistant reaches a requires_action state, we can call the functions that create or edit shapes, request a rerun if the assistant made errors, update the shared document state, and notify clients of document changes. All of this can happen independently of the client. And if a client connection is dropped, reconnecting to the same session backend is easy.

Who is the source of truth when there are multiple connected clients?

The simplicity of a session backend comes from the fact that the session backend itself is the source of truth for application state.

In our demo, we don't need to store thread or run IDs in a database, because that state only needs to last for a single user session. The same is true for the application state that the assistant operates on. The application state in our demo is simply an array of shapes, which we keep as a variable in the session backend's global scope.
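
As a rough illustration of what "an array of shapes in global scope" can look like, here is a minimal sketch. The `Shape` fields and the `createShape` and `editShape` helpers are assumptions for this example, not the demo's actual definitions.

```typescript
// Assumed shape fields, for illustration only.
interface Shape { id: number; x: number; y: number; w: number; h: number; color: string }

// Module-level state: it lives exactly as long as the session backend process.
const shapes: Shape[] = [];
let nextId = 1;

// Functions like these can be exposed to the assistant and to users alike.
function createShape(attrs: Omit<Shape, "id">): Shape {
  const shape: Shape = { id: nextId++, ...attrs };
  shapes.push(shape);
  return shape;
}

function editShape(id: number, changes: Partial<Omit<Shape, "id">>): Shape {
  const shape = shapes.find((s) => s.id === id);
  // Throwing gives the caller an error it can report back to the assistant.
  if (!shape) throw new Error(`no shape with id ${id}`);
  Object.assign(shape, changes);
  return shape;
}
```

Because the process is dedicated to one session, there is no need to key this state by session or user ID.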

Using the session backend as the source of truth comes with some distinct advantages over using a database. In a multiplayer use case, multiple clients can send in potentially conflicting changes, which can't be resolved in the client or database. The stateful nature of the session backend makes it a better place to handle updates from multiple clients and resolve potential conflicts.
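
One possible way to resolve conflicts in a single stateful process is a version check on each update. The sketch below is only one policy among many and is not taken from the demo: `VersionedShape`, `ShapeUpdate`, and `applyUpdate` are hypothetical names. Each update states which version it was based on, and the backend rejects updates based on a stale version.

```typescript
interface VersionedShape { id: string; color: string; version: number }
interface ShapeUpdate { shapeId: string; baseVersion: number; color: string }

type UpdateResult =
  | { accepted: true; shape: VersionedShape }
  | { accepted: false; current: VersionedShape | undefined };

// Single in-memory copy of the document, owned by the session backend.
const canvas = new Map<string, VersionedShape>();

function applyUpdate(update: ShapeUpdate): UpdateResult {
  const current = canvas.get(update.shapeId);
  // Reject updates to unknown shapes or updates based on an outdated version.
  if (!current || current.version !== update.baseVersion) {
    return { accepted: false, current };
  }
  const next: VersionedShape = { ...current, color: update.color, version: current.version + 1 };
  canvas.set(next.id, next);
  return { accepted: true, shape: next };
}
```

Since updates from every client pass through the same process one at a time, the first of two conflicting updates wins and the second sender receives the current state to retry against.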

AI as part of a new suite of collaborative apps

By treating the assistant as a special kind of collaborative user, the same architecture we use for multiplayer can also be used to integrate an AI assistant.

In our demo, changes made by the assistant are handled in the same way as changes made by a user. Shapes generated by the assistant and shapes generated by other users are all broadcast through the same WebSocket server, along with presence information like live cursors and avatars.
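
The idea of treating the assistant as one more participant can be sketched with a small in-memory hub. This is an illustration under assumptions, not the demo's code: `Session`, `ShapeChange`, and `Author` are hypothetical names, and the listeners stand in for connected WebSocket clients.

```typescript
// The assistant is just another kind of author.
type Author = { kind: "user"; id: string } | { kind: "assistant" };

interface ShapeChange { author: Author; shapeId: string; color: string }
type Listener = (change: ShapeChange) => void;

class Session {
  private listeners = new Set<Listener>();
  private history: ShapeChange[] = [];

  // Each connected client registers a listener; the return value disconnects it.
  connect(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => { this.listeners.delete(listener); };
  }

  // One code path for every change, whoever made it.
  applyChange(change: ShapeChange): void {
    this.history.push(change);
    this.listeners.forEach((listener) => listener(change));
  }

  changeCount(): number {
    return this.history.length;
  }
}
```

A change from a user arrives over a socket, while a change from the assistant arrives from a function call handled on the backend, but both end up in `applyChange` and reach every client the same way.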

Our AI demo was actually built on top of our existing NextJS SocketIO whiteboard tutorial, which had all of the WebSocket infrastructure set in place for a multiplayer use case.

We're super excited about how this new generation of AI apps is not only changing the way we think about multiplayer and collaboration, but also reaffirming the need for new backend architectures for complex web apps.

Check out our demo, which can be a template for building with the Assistant API. If you have any questions about Jamsocket's session backends, reach out to us via Discord or email!
