RAG without a GPU: How to Build a Financial Analysis Model with Qdrant, LangChain, and GPT4All x Mistral-7B, All on CPU!
Introduction
In today's fast-paced, innovation-rich AI landscape, we have access to some excellent open-source large language models such as Llama-2, Mistral, GPT-J, Falcon, and Vicuna. The unspoken catch to running these models properly is that you need serious computing power: GPUs! But since most people don't have access to big, bulky GPUs, there is active work on making these models run on commonly available, low-resource hardware (CPUs).
What if I told you there's a way to run a RAG pipeline entirely on your local laptop or PC, using just CPUs?
A typical RAG, or "Retrieval-Augmented Generation", pipeline consists of these steps: loading documents, splitting them into chunks, indexing the chunks in a vector store, retrieving the most relevant chunks for a query, and generating an answer with an LLM.
Join me as I cover these in detail in this blog:
- Documents: I will be working with a PDF document “Microsoft’s Annual Report 2023”, which contains their annual revenue and business report.
- Data Load and Ingestion Using LangChain: You will see how to use LangChain and its document parsers to ingest this PDF document.
- Indexing Using Qdrant: Qdrant is a vector database that supports vector indexing and search. If you don’t know about Qdrant, don’t worry…
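Before we get to the LangChain and Qdrant specifics, here is a minimal, dependency-free sketch of what the chunking part of ingestion boils down to: splitting a long document into overlapping pieces. This is a simplified stand-in, not the LangChain API itself; the function name, chunk size, and overlap below are illustrative.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap by `overlap` characters.

    The overlap keeps context that straddles a chunk boundary retrievable
    from at least one chunk. Sizes here are illustrative defaults.
    """
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += step
    return chunks

# Toy stand-in for a page of the annual report:
report = "Revenue grew across all segments. " * 100
chunks = chunk_text(report)
```

LangChain's text splitters (e.g. `RecursiveCharacterTextSplitter`) do essentially this, while also trying to break on natural boundaries like paragraphs and sentences instead of at arbitrary character offsets.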