Evolution of Agent
The Platonic Representation Hypothesis
· 11 min read
This year, one of the most impactful papers delves into the fascinating idea that AI models are converging toward a shared understanding of reality, much like Plato's concept of an ideal world.
RAG
· 9 min read
Retrieval Augmented Generation (RAG) - Unveiling the Simplicity and Magic of an LLM Innovation In today’s landscape, large language models (LLMs) are catalyzing transformative applications across various industries.
LLM
· 11 min read
Introduction to Large Language Model This post briefly summarizes the key components of LLMs.
Quantum Information
· 20 min read
Learning Notes for Quantum Computing - Part 1.
Non-uniformity in First Order Non-convex Optimization
· 7 min read
Leveraging Non-uniformity in First-order Non-convex Optimization This is joint work with my collaborators Jincheng, Bo, Dale, and Csaba.
Desired characteristics for real-world RL agents
· 1 min read
Desired characteristics for real-world RL agents - My research internship project at BorealisAI This post describes my project during my research internship (with my collaborators Pablo and Kry) at BorealisAI, which also forms part of my MSc thesis.
An Operator View of PG Methods
· 8 min read
Introduction As I wrote in my recent post, since the update direction in PG algorithms is not actually the gradient, it's not quite clear what an update actually does and why it eventually converges to a promising policy.
Private Approximations of a Convex Hull in Low Dimensions
· 1 min read
Here’s my work with Or Sheffet at the intersection of differential privacy and computational geometry.
A Talk: Is the Policy Gradient a Gradient?
· 1 min read
Here are my slides for a talk about the paper "Is the Policy Gradient a Gradient?
Is the Policy Gradient a Gradient?
· 5 min read
Introduction The policy gradient theorem is the cornerstone of policy gradient methods, and in the last post, I presented the proof of the policy gradient theorem, which describes the gradient of the discounted objective w.
GradientThm
· 3 min read
The Proof of Policy Gradient Theorem Introduction Recall that policy gradient methods aim to directly optimize parameterized policies $\pi_{\theta}(a|s)$.
MPO Extension
· 4 min read
MPO Extension -- A more intuitive interpretation of MPO Introduction As stated in the last post, MPO is motivated by the perspective of "RL as inference".
MPO
· 8 min read
Maximum a Posteriori Policy Optimization Background Policy gradient algorithms like TRPO or PPO in practice require carefully tuned entropy regularization to prevent policy collapse; moreover, there is work showing that the strong performance of PPO comes from code-level optimizations.
TRPO and PPO
· 8 min read
TRPO and PPO -- A Reading Summary Introduction Generally speaking, the goal of reinforcement learning is to find an optimal behaviour strategy that maximizes rewards.
Welcome to my blog
· 1 min read
Welcome to my blog 😊
I’ll share my reading summaries and maybe also some random research thoughts and feelings, hopefully every two weeks.