Model accuracy is top of mind for us right now, and we wanted to share how we use feedback from you all to identify areas for improvement and rapidly deploy upgrades. You may have noticed that hovering over a chat response will show you both a "thumbs up" and a "thumbs down" emoji. Well... what happens when you click these?
When you click either of these buttons, information about the question, answer, source content, etc. is sent to a database that we use to track model performance. These records are then used to build "test cases", which are essentially question/answer pairs that we use to evaluate how well the model is performing. These test cases look something like this:
tag_id: Connor
Question: "What is the difference between supervised and unsupervised learning in machine learning?"
Answer: "Supervised learning involves the use of labeled data to train a machine learning model. In this approach, the model is provided with example inputs and their desired outputs, given by a "teacher". The goal is to learn a general rule that maps inputs to outputs. On the other hand, unsupervised learning does not involve the use of labeled data. The learning algorithm is left on its own to find structure in the input data. It can be a goal in itself, such as discovering hidden patterns in data, or a means towards an end, like feature learning."
In short, this just says that when running this question against the tag "Connor", you should get a response that is at least 80% similar to the given answer (post on similarity calculations coming soon). Whenever we make changes to Storytell, we evaluate the model against ALL of our previous test cases to make sure that we are not introducing bugs or errors.
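To make the idea concrete, here is a minimal sketch of what such a regression check could look like. It is not our actual implementation: the names `FeedbackTestCase`, `similarity`, `run_regression`, and `ask_model` are made up for illustration, and the similarity function uses Python's built-in `difflib` as a simple stand-in, since the real similarity calculation is a topic for a later post.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List


@dataclass
class FeedbackTestCase:
    """Hypothetical shape of a test case built from user feedback."""
    tag_id: str
    question: str
    answer: str
    min_similarity: float = 0.8  # the "at least 80% similar" bar


def similarity(a: str, b: str) -> float:
    """Stand-in metric (assumption): character-level ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def run_regression(
    cases: List[FeedbackTestCase],
    ask_model: Callable[[str, str], str],
) -> List[FeedbackTestCase]:
    """Run every stored test case and return the ones that fall below the bar.

    `ask_model(tag_id, question)` is a placeholder for whatever actually
    produces a chat response for a question against a tag.
    """
    failures = []
    for case in cases:
        response = ask_model(case.tag_id, case.question)
        if similarity(response, case.answer) < case.min_similarity:
            failures.append(case)
    return failures
```

An empty list back from `run_regression` would mean every previous test case still passes after a change; anything in the list is a question whose answer has drifted too far from the expected one.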
What happens if I vote thumbs down? What is used as the "ground truth" for the test case above?
In this case, we do some manual investigation to find the answer in the relevant report, reach out to you, the user, or both. Bonus points for any users who provide feedback when they give a downvote! We will be adding a feature soon that allows you to write in the correct answer, which will drastically speed up the evaluation process.
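As a rough sketch of how the two kinds of votes could be handled when picking a "ground truth", consider the following. Everything here is hypothetical: the `FeedbackRecord` fields and the `ground_truth` helper are invented for illustration, and it assumes that a thumbs-up answer is accepted as the expected answer as-is.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackRecord:
    """Hypothetical record stored when a thumbs button is clicked."""
    tag_id: str
    question: str
    answer: str
    thumbs_up: bool
    corrected_answer: Optional[str] = None  # written in by a user or reviewer


def ground_truth(record: FeedbackRecord) -> Optional[str]:
    """Pick the expected answer for a test case, if one is available yet.

    Assumption: an upvoted answer is trusted as-is. A downvoted answer is
    never used; it needs a correction first.
    """
    if record.thumbs_up:
        return record.answer
    # None means the record still needs manual investigation or user follow-up.
    return record.corrected_answer
```

A downvote without a correction comes back as `None`, which is exactly the case where someone has to dig through the report or ask the user.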
Thanks for reading and feel free to reach out with questions/comments!