We have developed a system that acts like an external memory for humans. By combining Augmented Reality (AR) glasses with advanced AI, this work explores how everyday experiences captured in first-person video can be encoded into language, stored in databases, and later retrieved when we need them most. This “Encode-Store-Retrieve” framework points toward a future where AR devices serve as practical memory augmentation assistants.
The Memory Challenge
Human memory is fallible. We forget where we placed our belongings, overlook details of past events, or struggle to recall information during critical moments. Lifelogging with AR devices offers a way to record everything we see, but raw video data is massive and impractical to search. Traditional video storage consumes terabytes per year, and existing retrieval methods are inefficient. The challenge is clear: How can we capture, store, and recall experiences in a way that is both lightweight and useful?
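To make the scale of the problem concrete, here is a back-of-envelope comparison of raw video versus text storage. The wear time, bitrate, and caption size below are illustrative assumptions, not figures from the paper:

```python
# Back-of-envelope comparison of raw video vs. text storage.
# The daily wear time, video bitrate, and caption size are assumed values
# chosen only to illustrate the orders of magnitude involved.

HOURS_PER_DAY = 8                      # assumed daily wear time of AR glasses
VIDEO_MBPS = 5                         # assumed egocentric video bitrate (~1080p)
CAPTION_BYTES_PER_MIN = 1_000          # assumed ~1 KB of text per minute of video

seconds_per_year = HOURS_PER_DAY * 3600 * 365
video_tb_per_year = VIDEO_MBPS * 1e6 / 8 * seconds_per_year / 1e12
text_gb_per_year = CAPTION_BYTES_PER_MIN * (seconds_per_year / 60) / 1e9

print(f"Raw video:     ~{video_tb_per_year:.1f} TB/year")   # ~6.6 TB/year
print(f"Text captions: ~{text_gb_per_year:.2f} GB/year")    # ~0.18 GB/year
```

Under these assumptions, a year of continuous capture lands in the multi-terabyte range, while language encodings of the same footage fit in well under a gigabyte.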
The Encode-Store-Retrieve Approach
We designed a memory augmentation agent inspired by the three stages of human memory: encoding, storage, and retrieval. The workflow unfolds as follows (a minimal code sketch follows the list):
- Encode: Egocentric videos are transformed into detailed text descriptions using Ego-LLaVA, a fine-tuned vision-language model.
- Store: These text encodings are converted into vector embeddings and stored in a Chroma database for efficient search.
- Retrieve: When the user asks a question like “Where did I leave my keys?”, the system retrieves relevant memory chunks and uses GPT-4 to generate an answer.
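The following is a minimal sketch of this pipeline, assuming Python with the `chromadb` and `openai` client libraries; `ego_llava_caption` is a placeholder standing in for the fine-tuned Ego-LLaVA model, and all names here are illustrative rather than the authors' code:

```python
# Minimal Encode-Store-Retrieve sketch (illustrative only).
# Assumes chromadb and openai are installed; ego_llava_caption() stands in
# for the fine-tuned Ego-LLaVA vision-language model.

import chromadb
from openai import OpenAI

client = OpenAI()                       # GPT-4 access for the Retrieve step
db = chromadb.Client()                  # in-memory Chroma instance
memory = db.create_collection("egocentric_memory")

def ego_llava_caption(video_clip_path: str) -> str:
    """Placeholder: return a detailed text description of an egocentric clip."""
    raise NotImplementedError("Swap in the fine-tuned Ego-LLaVA model here.")

# --- Encode + Store -------------------------------------------------------
def store_clip(clip_id: str, video_clip_path: str) -> None:
    caption = ego_llava_caption(video_clip_path)   # Encode: video -> language
    # Store: Chroma embeds the caption with its default embedding function
    memory.add(ids=[clip_id], documents=[caption],
               metadatas=[{"source": video_clip_path}])

# --- Retrieve -------------------------------------------------------------
def answer(question: str, k: int = 5) -> str:
    # Retrieve the k most relevant memory chunks for the question
    hits = memory.query(query_texts=[question], n_results=k)
    context = "\n".join(hits["documents"][0])
    # Ask GPT-4 to answer using only the retrieved memories
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Answer using only the retrieved first-person memories."},
            {"role": "user",
             "content": f"Memories:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content

# Example usage (after clips have been stored):
# print(answer("Where did I leave my keys?"))
```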

Results That Outperform Human Memory
The Encode-Store-Retrieve agent was evaluated on a public benchmark and in a user study:
- Benchmark Performance: On the QA-Ego4D dataset, the system achieved a BLEU score of 8.3, outperforming traditional models that scored between 3.4 and 5.8 (a scoring sketch follows this list).
- User Study: Using HoloLens 2, participants compared their own recollection with the agent's answers. The AI significantly outperformed humans on episodic memory questions (average score 4.1/5 vs 2.5/5 for humans).
- User Feedback: Participants valued the system's accuracy and detail, while noting concerns about privacy, constant camera use, and social acceptance.
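As a rough illustration of how such BLEU numbers can be computed, here is a sketch using the sacrebleu library. The answer strings are made up; in the paper the references come from QA-Ego4D and the hypotheses from the agent:

```python
# Sketch of BLEU scoring for generated answers against reference answers.
# The example strings below are invented for illustration only.
import sacrebleu

hypotheses = ["You left the keys on the kitchen counter next to the kettle."]
references = ["The keys are on the kitchen counter, beside the kettle."]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.1f}")   # higher is better; the paper reports 8.3 on QA-Ego4D
```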


Applications
This memory augmentation system opens doors to practical use cases:
- Finding misplaced objects at home.
- Supporting students or professionals during learning and meetings.
- Helping researchers recall details in experiments or fieldwork.
- Offering memory assistance for individuals with cognitive challenges.
Key Takeaways
- Language encoding reduces storage demands dramatically compared to raw video.
- Ego-LLaVA fine-tuning improves memory recall performance significantly.
- Combining AR and AI makes practical external memory assistants possible.
- Privacy and social comfort are key challenges that must be addressed for adoption.
Conclusion
Encode-Store-Retrieve demonstrates how AR and AI can combine to extend human memory. By transforming lifelogging videos into searchable language representations, this approach makes memory augmentation practical, efficient, and user-friendly. While limitations remain, the work represents a step toward a future where technology can act as a reliable external memory for our daily lives.