Artificial intelligence (AI) has made significant strides in recent years, particularly with the development of large language models (LLMs). These advances have been accompanied by persistent challenges, notably ‘hallucinations’: instances where a model generates incorrect or nonsensical information. The startup Probably has set out to address this problem by making AI systems more reliable.
Recently, Probably secured $9 million in seed funding from Andreessen Horowitz, a prominent venture capital firm known for backing innovative technology companies. This investment underscores the growing recognition of the need for more dependable AI solutions in various industries.
Founded by Peter Elias, Probably aims to prevent AI-generated errors from reaching end-users by achieving a level of accuracy comparable to traditional deterministic systems, which often boast 99.99% reliability. To accomplish this, the company is rethinking fundamental aspects of AI engineering.
Their inaugural product is a data science tool designed to deliver rapid insights from complex datasets. A distinguishing feature of this tool is its provision of citations and audit trails for each result, promoting transparency and trustworthiness. This approach aligns with a broader industry trend toward enhancing the verifiability of AI outputs.
To mitigate errors, Probably has developed a multi-step validation framework. First, the LLM generates a response, which is then checked against a deterministic validation system. If the two disagree, the response is flagged for review, so that only responses that pass validation are presented to users. The LLM is also trained in conjunction with the validator, optimizing the system for both speed and precision.
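The generate-then-validate loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Probably's implementation: the `Claim` and `Result` types, the `METRICS` table, and the `validate` and `answer` functions are all hypothetical names, and the "LLM" is stood in for by any callable that returns a numeric claim about a dataset.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Sequence


@dataclass
class Claim:
    """A numeric claim a model makes about a dataset (hypothetical shape)."""
    metric: str   # name of the statistic, e.g. "mean"
    value: float  # the value the model reported


@dataclass
class Result:
    claim: Claim
    verified: bool   # True only if the deterministic check agrees
    expected: float  # recomputed value, kept as a simple audit trail


# Deterministic reference computations the validator can fall back on.
# The set of supported metrics is an assumption made for this sketch.
METRICS: dict[str, Callable[[Sequence[float]], float]] = {
    "mean": mean,
    "max": max,
    "min": min,
}


def validate(claim: Claim, data: Sequence[float], tol: float = 1e-9) -> Result:
    """Recompute the metric from the data and compare with the model's value."""
    expected = METRICS[claim.metric](data)
    return Result(claim, abs(expected - claim.value) <= tol, expected)


def answer(generate: Callable[[Sequence[float]], Claim],
           data: Sequence[float]) -> Result:
    """Generate a claim, then validate it; callers flag unverified results."""
    return validate(generate(data), data)


# Usage: two stand-in "models", one correct and one hallucinating.
data = [2.0, 4.0, 6.0]
good = answer(lambda d: Claim("mean", 4.0), data)
bad = answer(lambda d: Claim("mean", 4.5), data)
print(good.verified, bad.verified, bad.expected)
```

In this sketch the validator never trusts the model's number; it recomputes the statistic and keeps the recomputed value alongside the claim, which is one simple way to produce the kind of audit trail the article mentions. A real system would need validators for far richer outputs than single statistics.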
Elias emphasizes that robust validation mechanisms can compensate for less powerful models. By refining the context in which the AI operates, the system reduces ambiguity, allowing the model to perform effectively without requiring extensive computational resources. Consequently, Probably’s tool operates on smaller AI models, enabling deployment on local hardware such as desktop computers. This approach not only enhances accessibility but also reduces operational costs associated with AI usage.
In an era where organizations are scrutinizing their AI expenditures, Probably’s cost-effective and reliable solution is particularly appealing. The company’s methodology holds promise for applications beyond data science, including fields like accounting and medical services, where precision is paramount.
Elias points out that major AI laboratories have yet to prioritize this level of reliability, suggesting that their business models may not incentivize such developments. By focusing on reducing the need for repeated corrections, Probably positions itself as a leader in delivering dependable AI solutions.
As AI continues to permeate various sectors, the demand for trustworthy and accurate systems becomes increasingly critical. Probably’s approach aims to meet this need with a solution that balances performance with reliability. If its data science tool succeeds, it could set a new standard for AI applications, one that puts error mitigation and user trust at the center.