@parker.crist switched our backend to GPT-4 for responding to chats, but found that it increased latency by roughly nine to ten times. To compensate, he reworked Storytell’s back-end to leverage OpenAI’s chat streaming functionality, so Storytell chat now works via a new “chat streaming” feature.
The new update streams data from the backend to the client: data is sent in small, continuous chunks rather than as a single, large response. This makes the transfer feel faster and more responsive, which is especially important in chat applications.
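Here’s a minimal sketch of what OpenAI’s chat streaming looks like on the backend, using the official Node.js SDK. The model name and prompt are illustrative, not Storytell’s actual configuration:

```ts
// Minimal sketch of OpenAI chat streaming (Node.js / TypeScript).
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function streamChat(prompt: string): Promise<void> {
  const stream = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{ role: "user", content: prompt }],
    stream: true, // ask the API to send tokens as they are generated
  });

  // Each chunk carries a small delta of the reply instead of the whole answer.
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    process.stdout.write(token);
  }
}

streamChat("Explain chat streaming in one sentence.");
```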
Previously, when users typed a request, they had to wait for the entire response to be generated before seeing anything. Now, as ChatGPT produces tokens, we forward them to the front-end, streaming the response so it appears to type out in real time. GPT-4 is slower overall, but users get constant updates instead of waiting the entire time.
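One way to relay those tokens to the browser is server-sent events. Storytell’s actual transport and endpoint aren’t specified here, so the Express route and path below are assumptions that just illustrate the “forward tokens as they arrive” idea:

```ts
// Hedged sketch: relay OpenAI tokens to the client via server-sent events (SSE).
// The Express app and the /chat endpoint are illustrative assumptions.
import express from "express";
import OpenAI from "openai";

const app = express();
const openai = new OpenAI();

app.get("/chat", async (req, res) => {
  // SSE headers keep the HTTP connection open for incremental writes.
  res.setHeader("Content-Type", "text/event-stream");
  res.setHeader("Cache-Control", "no-cache");
  res.setHeader("Connection", "keep-alive");

  const stream = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{ role: "user", content: String(req.query.q ?? "") }],
    stream: true,
  });

  // Forward each token as soon as it arrives, so the UI can render the reply
  // incrementally instead of waiting for the full response.
  for await (const chunk of stream) {
    const token = chunk.choices[0]?.delta?.content ?? "";
    if (token) res.write(`data: ${JSON.stringify(token)}\n\n`);
  }

  res.write("data: [DONE]\n\n");
  res.end();
});

app.listen(3000);
```

On the front-end, an `EventSource` (or fetch with a streamed body) can append each token to the chat bubble as it arrives, which is what produces the typing effect.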