The next most powerful feature for taking context-based conversation to the next level is RAG: Retrieval Augmented Generation. Many tools let you experience RAG in a user-friendly way, and there are a couple of different approaches I am making use of.

i) Claude's 'Projects' feature was initially the most attractive reason to switch from ChatGPT (we can save an LLM debate for another day). Projects offer two distinct capabilities and one missing feature that might surprise you. Like GPTs, you can give all chat sessions in the same project a parent system instruction that defines how they respond. This means you don't need to start every chat within the project with 'You are a helpful ...'. If you give the system prompt enough project context and instruction, you can launch into any chat session and the model will retain the understanding of what you are trying to achieve.

Let's take a simple holiday planning project as an example: you can explain common details like the number of travellers, who they are, dates, locations etc. and then launch different chats for flights, accommodation and so on. This approach gets around context limit and usage limit issues. You could run such an activity as one long chat session, but context and usage limits start to become strained as the conversation is fed back to the model every time to retain context. Things also become messy if you want to run variable simulations or different approaches and keep context clean between chats (but under the same project instruction).

This exposes the missing feature (potentially on purpose, for that reason): there is no explicit memory retained between chat sessions within a project. The second chat doesn't recall the context learned in the first chat. This might seem counterintuitive, but it comes good for exploring hypotheticals and testing ideas - dead-end conversations aren't automatically committed to the core corpus. Instead, Anthropic has added a manual workaround which works hand in hand with the second key feature: "Project Knowledge." Every project allows 20 files of up to 30MB to be uploaded as context and referenced in all chats. This is a powerful step towards the contextual memory that is vital for personal assistance. If you want just a thin layer of personalisation in your own AI, this capability might be all you need. In more recent experiments I have seen just how little context might be required for LLMs to infer your intent and create the illusion of more intimate understanding.

The icing on the cake for this mode is the graceful combination of Claude's Artifacts mode and a manual 'add to project context' button - this allows you to store any generated work in the project knowledge repository and have it referenced in all other chats thereafter. That solves the cross-chat memory issue, since you can ask for any relevant knowledge to be packaged up at the end of a session and stored into 'memory.' Furthermore, Claude keeps version control on these docs, so if the output of one session was a project charter file you worked on, you could reference it in another chat and even iterate on and update it in future sessions.

This approach delivers a high level of assistance if you keep your domain and activity to a narrow field - i.e. a project. I've found this mode to be incredibly useful even for longer-term objectives and 'themes' - fantastic for iteration and building up ideas over time. The context and saved documentation mean you can keep coming back and refining your project over and over again, having it grow with you.
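To make concrete what a project-level instruction plus project knowledge roughly amounts to, here is a minimal sketch of wiring the same idea up directly against the API. It assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY in the environment; the folder name, file names, prompt text and model ID are placeholders of my own, not anything Claude's Projects feature actually does under the hood.

```python
# Minimal sketch: a 'parent' instruction plus shared documents, injected on
# every call. Illustrative only - not how Anthropic implements Projects.
from pathlib import Path
import anthropic

PROJECT_INSTRUCTION = (
    "You are helping plan a family holiday for two adults and two children, "
    "travelling from London, 10-24 August. Keep all suggestions consistent "
    "with the documents provided as project knowledge."
)

# 'Project knowledge': a handful of documents passed along as shared context.
knowledge_dir = Path("holiday-project")  # hypothetical folder of notes
knowledge = "\n\n".join(
    f"<document name='{p.name}'>\n{p.read_text()}\n</document>"
    for p in sorted(knowledge_dir.glob("*.md"))
)

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # substitute whichever model you use
    max_tokens=1024,
    system=f"{PROJECT_INSTRUCTION}\n\nProject knowledge:\n{knowledge}",
    messages=[{"role": "user", "content": "Shortlist three flight options."}],
)
print(response.content[0].text)
```

The point is simply that the parent instruction and the uploaded documents ride along with every chat, which is why a well-written project prompt saves you from re-establishing context at the start of each session.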
I've deployed this technique in synthesising my own personal digital systems and processes - blending and refining philosophy, method and technical development to build my Cathedral System in Obsidian - documenting the processes, writing scripts and capturing my own brand of digital gardening and second brain.

ii) 'Knowledge Stacks' in the MSTY app are a more explicit and simpler use of RAG, but offer a wider and more flexible scope of potential context and application. In this approach, the information a chat session might require is pre-defined and set up as a curated set of knowledge, or 'Knowledge Stack', using selectable and customisable embedding models to chunk sources from uploaded documents, video links and, attractively for me, Obsidian Vaults. These stacks are invoked in any chat session (within a project or standalone) just as you would attach a single doc or activate web search. This gives the LLM the ability to search the embeddings for relevant matches and use them in its responses. This approach divorces knowledge from specific projects, allowing it to be connected to different scenarios, all the while maintaining a level of data privacy, since only the relevant chunks are brought into the chat, not entire documents.

However, in practice this presents a couple of issues for the advancement of a personal AI. Firstly, you are trading off the ability to update and evolve knowledge in concert with the AI for the option of a larger and broader memory that is in 'read-only' mode. Secondly, this more rudimentary RAG set-up is less accurate when you need to cite specific parts of the text in your knowledge. Claude's approach to RAG adds additional layers of 'Contextual Retrieval' and re-ranking steps to the response to create a more performant outcome. I'd certainly agree that, in my experience, Claude's 'Projects' feel like the LLM is a real partner working on the project with me, whereas adding knowledge stacks manually when a bit of context is required is like messaging a librarian on WhatsApp in the middle of a whiteboarding session and asking them to quickly see if there's anything relevant on the shelves ... it frequently underperforms. The MSTY app does allow for some customisation of the embedding and chunking approaches, as well as the option to include a re-ranking service via API - so with some experimentation and configuration, as well as carefully constructed folder/system prompts, it is possible to improve performance considerably. In doing so we learn another key element of the architecture required for personal AI.
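To ground what that retrieval layer is doing, here is a minimal sketch of the pattern a Knowledge Stack follows: chunk the sources, embed the chunks, and at question time pull back only the best-matching chunks to hand to the model. It illustrates the general RAG pattern under assumed choices (sentence-transformers for embeddings, naive fixed-size chunking, a made-up vault path); it is not MSTY's actual implementation.

```python
# Minimal RAG retrieval sketch: embed vault chunks, then fetch the top
# matches for a query. Only these chunks - not whole documents - would be
# passed into the chat as context.
from pathlib import Path
from sentence_transformers import SentenceTransformer, util

def chunk(text: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; real tools split on headings/structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Gather chunks from an Obsidian vault (hypothetical path).
vault = Path("~/Obsidian/CathedralSystem").expanduser()
chunks = [c for note in vault.rglob("*.md") for c in chunk(note.read_text())]

model = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
chunk_vecs = model.encode(chunks, convert_to_tensor=True)

query = "What is my weekly review process?"
query_vec = model.encode(query, convert_to_tensor=True)

# Cosine-similarity search, keeping the five closest chunks.
hits = util.semantic_search(query_vec, chunk_vecs, top_k=5)[0]
for hit in hits:
    print(f"{hit['score']:.3f}  {chunks[hit['corpus_id']][:80]}...")
```

A contextual-retrieval or re-ranking stage would then re-score those candidates (typically with a cross-encoder, or by prepending surrounding document context to each chunk before embedding) before they reach the model - roughly the extra layer that Claude's approach, and MSTY's optional re-ranking API, add on top of this basic pattern.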