Earlier this month, Nvidia released Chat with RTX, a free-to-download, generative AI-powered chatbot that users can interact with and customize, provided they have a recent consumer RTX GPU in their desktop.
Users can point the chatbot at content stored locally in .txt, .pdf, .doc, .docx, and .xml files, connect it as a dataset to open-source language models such as Llama 2 and Mistral, and then query it in natural language.
What’s notable about this approach is that it brings a virtual assistant directly onto the user’s local device without reaching out to online server farms.
In announcing the tool, Nvidia said: “Since Chat with RTX runs locally on Windows RTX PCs and workstations, the provided results are fast — and the user’s data stays on the device.
“Rather than relying on cloud-based LLM services, Chat with RTX lets users process sensitive data on a local PC without the need to share it with a third party or have an internet connection.”
Key Takeaways
- Nvidia’s latest release, Chat with RTX, brings a generative AI-powered chatbot directly onto local devices — no internet connection required.
- By using retrieval-augmented generation (RAG), users can process data locally and customize virtual assistants on their Windows PCs.
- Together with Google’s Gemma, such releases push the momentum toward AI on the desktop rather than on the web, and toward decentralized AI trained on specific datasets.
LLMs as an Operating System
For Nvidia, Chat with RTX is a step toward reimagining how LLMs are accessed and used.
Giving users the ability to customize a chatbot locally on a Windows PC, using their own documents and data, opens the door for developers to create their own personalized virtual assistants.
This is made possible for owners of RTX 30 and 40 Series GPUs through a combination of retrieval-augmented generation (RAG), NVIDIA TensorRT-LLM, and NVIDIA RTX acceleration.
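For a rough sense of what the RAG part of that pipeline involves, the sketch below embeds local text files, retrieves the passage most relevant to a question, and assembles a prompt for a locally hosted model. This is a generic illustration rather than Nvidia's TensorRT-LLM-accelerated implementation, and the embedding model, folder layout, and sample question are assumptions.

```python
# Minimal RAG sketch: embed local text files, retrieve the most relevant one
# for a question, and build a prompt for a locally hosted LLM.
# Library choices and model names are illustrative, not Nvidia's stack.
from pathlib import Path

import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model

# 1. Load and embed local documents (here: plain .txt files in ./docs).
docs = [p.read_text(encoding="utf-8") for p in Path("docs").glob("*.txt")]
doc_vectors = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 1) -> list[str]:
    """Return the k documents whose embeddings are closest to the question."""
    q_vec = embedder.encode([question], normalize_embeddings=True)
    scores = doc_vectors @ q_vec[0]            # cosine similarity (vectors are normalized)
    top = np.argsort(scores)[::-1][:k]
    return [docs[i] for i in top]

question = "What does the project plan say about the Q3 milestones?"  # hypothetical query
context = "\n\n".join(retrieve(question))

# 2. Hand the retrieved context plus the question to any locally running model.
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)  # pass `prompt` to a local Llama 2 or Mistral instance
```

Because retrieval happens over embeddings computed on the machine itself, nothing in this flow needs to leave the device, which is the property Nvidia is emphasizing.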
Dr Jim Fan, senior research scientist and lead of AI agents at Nvidia, wrote in a post on X:
NVIDIA released Chat with RTX. Why is it significant? It's NVIDIA's first step towards the vision of "LLM as Operating System" – a locally running, heavily optimized AI assistant that deeply integrates with your file systems, features retrieval as the first-class citizen,…
— Jim Fan (@DrJimFan) February 14, 2024
As Fan says, “NVIDIA is going local before OpenAI,” releasing an offline-friendly virtual assistant rather than a web-hosted chatbot like ChatGPT.
More broadly, Chat with RTX demonstrates how more and more vendors are looking to make the power of generative AI accessible to end users.
For instance, Google’s release of Gemma, a series of open models lightweight enough to run directly on a laptop or desktop computer, gives users the option to run and fine-tune models on smaller devices.
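As a rough sketch of what running such a model locally looks like, the example below loads a small open checkpoint with Hugging Face's transformers library and generates text on the user's own hardware. The checkpoint name, prompt, and generation settings here are assumptions for illustration.

```python
# Illustrative local inference with a small open model via Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # example small open checkpoint; access terms may apply
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,   # half precision keeps memory use laptop-friendly
    device_map="auto",           # place layers on the GPU if available, else the CPU
)

inputs = tokenizer("Summarise the key points of my meeting notes:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```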
A Step Toward a Decentralized Future
The decentralization of AI has been brewing for quite some time – as researchers look to build their own custom models with specific datasets.
After all, while tools like ChatGPT and Claude can be useful for tasks like text summarization or even content creation, they aren’t a fit for every use case.
Giving developers the option to run open-source models offline with Chat with RTX, grounded in their own local files, provides more control over model development and an opportunity to build around a highly curated dataset.
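For researchers who want to go a step further and actually adapt an open model to a curated dataset, a lightweight fine-tuning pass is one common route. The sketch below uses LoRA adapters via the peft and transformers libraries; the model name, file paths, and hyperparameters are assumptions, not part of Chat with RTX itself.

```python
# Illustrative offline LoRA fine-tune of an open model on local text files.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "mistralai/Mistral-7B-v0.1"          # any open checkpoint you can run locally
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Wrap the base model with small trainable LoRA adapters instead of updating all weights.
model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, task_type="CAUSAL_LM"))

# Curated local dataset: one training example per line in ./data/corpus.txt (hypothetical path).
dataset = load_dataset("text", data_files={"train": "data/corpus.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=512))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", per_device_train_batch_size=1,
                           num_train_epochs=1, logging_steps=10),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()          # the data and the resulting adapter weights stay on the local machine
```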
This local-first approach is a key point of differentiation from other solution providers like Google and Microsoft, which have built their LLM-driven tools around web-based and cloud-hosted data.
In each case, these two tech giants have attempted to allow users to process data stored across their cloud environments and product ecosystems.
For instance, Microsoft Copilot enables users to query data stored in documents, emails, calendars, chats, meetings, and contacts. This means users can ask Copilot questions about documents, pull data from emails, or use the assistant inside Microsoft 365 apps like Word, Excel, PowerPoint, Outlook, and Teams.
Likewise, Google Gemini can integrate with Google Workspace products like Gmail, Docs, and Sheets so that users can search for content stored within these tools or even generate content directly with Gemini when using them.
Chat with RTX provides an alternative to this approach by giving users the option to search through content stored on their local device rather than in the cloud.
The Bottom Line
Chat with RTX highlights that the concept of the virtual assistant is evolving to embrace personalization. Local files and data can be as valuable for generating insights as web-based data, and they are much easier to protect from unauthorized third parties.
The future of generative AI is moving toward personalized models, trained on select datasets and optimized for specific use cases rather than more generic consumer-grade models.