Datasets are at the heart of LangChain's evaluation methods. These are collections of examples that define inputs for those evaluations. Each example generally consists of an inputs dictionary and, optionally, an outputs dictionary that outlines expected results for performance checks. To create your datasets, you can either curate them manually or draw on historical logs of your application's interactions — a great way to build relevant data that reflects real-world usage scenarios.
You can find more guidance on creating datasets in the LangSmith SDK documentation.