This year, one of the most impactful papers delves into the fascinating idea that AI models are converging toward a shared understanding of reality, much like Plato’s concept of an ideal world.
Retrieval Augmented Generation (RAG) - Unveiling the Simplicity and Magic of an LLM Innovation In today’s landscape, large language models (LLMs) are catalyzing transformative applications across various industries.
Desired characteristics for real-world RL agents - My research internship project at BorealisAI This post describes my project from my research internship (with my collaborators Pablo and Kry) at BorealisAI, which also forms part of my MSc thesis.
Introduction As I wrote in my recent post, since the update direction in PG algorithms is not actually the gradient, it is not quite clear what an update actually does and why it eventually converges to a good policy.
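To pin down the concern, here is a minimal sketch in standard notation (assumed here, not quoted from the post): the update direction that implementations typically follow is
$$g(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}}\!\left[\sum_{t=0}^{\infty} \nabla_{\theta} \log \pi_{\theta}(a_t|s_t)\, G_t\right], \qquad G_t = \sum_{k \ge t} \gamma^{k-t} r_k,$$
which omits the $\gamma^{t}$ weighting that the gradient of the discounted objective carries, so $g(\theta)$ is in general not that gradient.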
Introduction The policy gradient theorem is the cornerstone of policy gradient methods; in the last post, I presented its proof, which describes the gradient of the discounted objective w.r.t. the policy parameters.
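For reference, a standard statement of the theorem (in commonly used notation, which may differ slightly from the post's) is
$$\nabla_{\theta} J(\theta) = \mathbb{E}_{s \sim d^{\pi_{\theta}},\, a \sim \pi_{\theta}(\cdot|s)}\!\left[\nabla_{\theta} \log \pi_{\theta}(a|s)\, Q^{\pi_{\theta}}(s, a)\right],$$
where $d^{\pi_{\theta}}$ denotes the discounted state distribution induced by $\pi_{\theta}$ and $Q^{\pi_{\theta}}$ its action-value function.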
The Proof of Policy Gradient Theorem Introduction Recall that policy gradient methods aim to directly optimize parameterized policies $\pi_{\theta}(a|s)$.
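As a reminder (again in standard notation that I am assuming here), the discounted objective being optimized over the policy parameters $\theta$ is
$$J(\theta) = \mathbb{E}_{\tau \sim \pi_{\theta}}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right],$$
and policy gradient methods ascend an estimate of $\nabla_{\theta} J(\theta)$.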
MPO Extension -- A more intuitive interpretation of MPO Introduction As stated in the last post, MPO is motivated by the "RL as inference" perspective.
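In the standard "RL as inference" setup (sketched here in commonly used notation, which may not match the post exactly), one introduces an optimality variable $O$ with $p(O=1 \mid \tau) \propto \exp\!\big(\sum_t r_t / \alpha\big)$ and maximizes the evidence lower bound
$$\log p_{\pi}(O=1) \;\ge\; \mathbb{E}_{\tau \sim q}\!\left[\sum_t r_t/\alpha\right] - D_{\mathrm{KL}}\!\big(q(\tau)\,\|\,p_{\pi}(\tau)\big),$$
alternating between optimizing the variational distribution $q$ and the policy, which is where MPO's E-step and M-step come from.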
Maximum a Posteriori Policy Optimization Background In practice, policy gradient algorithms like TRPO or PPO require carefully tuned entropy regularization to prevent policy collapse; moreover, there is work showing that much of PPO's performance comes from code-level optimizations.
TRPO and PPO -- A Reading Summary Introduction Generally speaking, the goal of reinforcement learning is to find an optimal behaviour strategy that maximizes rewards.
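For orientation, the constrained surrogate that TRPO optimizes (in its standard form; notation here is assumed rather than taken from the post) is
$$\max_{\theta}\; \mathbb{E}_{s,a \sim \pi_{\theta_{\text{old}}}}\!\left[\frac{\pi_{\theta}(a|s)}{\pi_{\theta_{\text{old}}}(a|s)}\, A^{\pi_{\theta_{\text{old}}}}(s,a)\right] \quad \text{s.t.}\quad \mathbb{E}_{s}\!\left[D_{\mathrm{KL}}\!\big(\pi_{\theta_{\text{old}}}(\cdot|s)\,\|\,\pi_{\theta}(\cdot|s)\big)\right] \le \delta,$$
while PPO replaces the hard KL constraint with a clipped ratio objective that can be optimized with ordinary stochastic gradient methods.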