Evaluation and Testing LLM Outputs
This is one of a series of articles on solution considerations for integrating with LLMs.
Question
How can we evaluate and test Large Language Model (LLM) outputs?
This is a critical question for enterprises integrating LLMs into production.
There is no single answer here. Integrating with LLMs is already widespread, and there are plenty of showcases, demos, and tutorials. However, there are few stories about feeding LLM responses directly into actions or to end users, especially when your business is at stake.
So how can we ensure that the outputs are:
expected (or within a tolerance range)
within the guardrails
ethical
And how can we log and observe this for improvement/feedback and as proof of reference? A minimal tolerance check is sketched below.
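One common approach to the tolerance question is to compare the model's output against a reference answer using embedding similarity. The following is a minimal sketch assuming the sentence-transformers library; the within_tolerance helper and the 0.8 threshold are illustrative assumptions, not a standard.

```python
# Minimal sketch: score an LLM output against a reference answer with
# embedding similarity. Helper name and threshold are illustrative choices.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

def within_tolerance(output: str, reference: str, threshold: float = 0.8) -> bool:
    """True if the output is semantically close enough to the reference."""
    embeddings = model.encode([output, reference], convert_to_tensor=True)
    similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
    return similarity >= threshold

llm_output = "Our return window is 30 days from delivery."
expected = "Returns are accepted within 30 days of delivery."
if not within_tolerance(llm_output, expected):
    print("Output outside tolerance; log this case for review")
```

The same check can be wired into a test suite or a logging hook so that every production response leaves an auditable trail.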
Let's cover what we know.
What toolsets are available?
As of July 2023 (because this space changes so fast):
LangChain: self-critique chain with Constitutional AI
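As a quick illustration, here is a minimal sketch based on LangChain's Constitutional AI chain as documented around mid-2023; the principle wording and the example question are placeholders of my own, not part of the library.

```python
# Sketch of LangChain's self-critique (Constitutional AI) chain,
# per the LangChain API circa mid-2023. Principle text is illustrative.
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.chains.constitutional_ai.base import ConstitutionalChain
from langchain.chains.constitutional_ai.models import ConstitutionalPrinciple

llm = OpenAI(temperature=0)

# The base chain whose raw answer we want to critique and revise.
qa_chain = LLMChain(
    llm=llm,
    prompt=PromptTemplate(
        template="Question: {question}\nAnswer:",
        input_variables=["question"],
    ),
)

# A principle the chain uses to critique and then rewrite the answer.
ethical_principle = ConstitutionalPrinciple(
    name="Ethical Principle",
    critique_request="The model should only give ethical and legal advice.",
    revision_request="Rewrite the answer so it is both ethical and legal.",
)

constitutional_chain = ConstitutionalChain.from_llm(
    llm=llm,
    chain=qa_chain,
    constitutional_principles=[ethical_principle],
    verbose=True,  # prints the critique and revision steps
)

print(constitutional_chain.run(question="How can I boost my product reviews?"))
```

The chain first generates an answer, critiques it against each principle, and then revises it, so the intermediate critique/revision steps can themselves be logged as evaluation evidence.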