Multi-armed Bandits

Posted on 2019-07-10 | Category: blogs

Multi-armed Bandits

1.1 Problem Description

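The biggest difference between reinforcement learning (RL) and supervised learning (SL) is that, for a given action, RL provides evaluation, whereas SL provides a judgment, or instruction. In other words, RL tells you through a value function how good an action is, but not whether it is the best or the worst one; SL, by contrast, tells you which action is correct. There are also cases where evaluation and instruction can be combined to train a model, but here we first use the multi-armed bandit to illustrate how RL "gives evaluation", and to introduce some of the most basic RL methods.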
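To make evaluative feedback concrete, here is a minimal sketch of a k-armed bandit learner. The epsilon-greedy rule, the Gaussian rewards, and every name and parameter (`run_bandit`, `true_means`, `epsilon`, `steps`) are assumptions chosen for illustration; they do not come from the post itself. At each step the learner only sees a noisy reward for the arm it pulled and is never told which arm was the correct one.

```python
import random

def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a Gaussian k-armed bandit (illustrative settings only)."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # estimated value of each arm
    n = [0] * k     # number of times each arm was pulled
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)                   # explore: pick a random arm
        else:
            a = max(range(k), key=lambda i: q[i])  # exploit: pick the best estimate
        # Evaluative feedback: a noisy reward for the chosen arm only.
        r = rng.gauss(true_means[a], 1.0)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                  # incremental sample mean
    return q, n

q, n = run_bandit([0.2, 0.8, 0.5])
print("estimated values:", q)
print("pull counts:", n)
```

The estimates in `q` play the role of the value function: they say how good each arm has looked so far, and the learner has to infer the best arm from them.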


Linear Classifiers 2

Posted on 2019-07-08 | Category: blogs

**1. Support Vector Machine (SVM), Logistic Regression and Multi-class Classification**


Linear Classifiers 1

Posted on 2019-07-04 | Category: blogs

Linearity = homogeneity + additivity. That is, $f(x)$ is called a linear function if, for every scalar $a$ and all inputs $x$, $y$:

$f(ax)=af(x)$
$f(x+y)=f(x)+f(y)$
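
The two conditions can be spot-checked numerically. The sketch below is only an illustration on a handful of sample points, not a proof; `is_linear`, the sample values, and the tolerance are hypothetical choices. It shows that $f(x)=3x$ passes both conditions, while the affine function $f(x)=3x+1$ fails them.

```python
def is_linear(f, xs, scalars, tol=1e-9):
    """Spot-check homogeneity and additivity of a scalar function f on sample points."""
    homogeneous = all(abs(f(a * x) - a * f(x)) <= tol for a in scalars for x in xs)
    additive = all(abs(f(x + y) - (f(x) + f(y))) <= tol for x in xs for y in xs)
    return homogeneous and additive

xs = [-2.0, 0.5, 3.0]
scalars = [-1.0, 0.0, 2.5]

print(is_linear(lambda x: 3.0 * x, xs, scalars))        # True
print(is_linear(lambda x: 3.0 * x + 1.0, xs, scalars))  # False: the bias term breaks both conditions
```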


Dual Learning

Posted on 2019-07-04 | Category: blogs

link to the paper


Nonparametric Methods for Probability Density Estimation

Posted on 2019-07-01 | Category: blogs

1. Problem definition:

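Given observed samples $x_1, x_2, \dots, x_N$, assumed to be independent and identically distributed (iid) with an unknown distribution, how can we estimate the probability density function (pdf) from the samples?

Nonparametric estimation makes no assumption about the form of the pdf and estimates the whole function directly from the samples. In a sense, it can also be viewed as estimation with infinitely many parameters: assuming a form for the pdf places a strong constraint on the distribution, and the number of parameters generally determines model complexity, whereas nonparametric estimation imposes only very weak constraints, so its model complexity can be regarded as infinite.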
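
As one concrete instance of estimating a pdf directly from samples, here is a minimal kernel density estimation sketch. The Gaussian kernel, the bandwidth `h=0.3`, and the synthetic standard-normal samples are all assumptions made for illustration; the excerpt above does not name a specific method. The estimate is $\hat{p}(x)=\frac{1}{Nh}\sum_{n=1}^{N}K\left(\frac{x-x_n}{h}\right)$ with $K$ the standard normal density.

```python
import math
import random

def gaussian_kde(samples, h):
    """Return p_hat(x) = 1/(N*h) * sum_n K((x - x_n)/h), with K the standard normal pdf."""
    n = len(samples)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def pdf(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in samples)

    return pdf

# Synthetic iid samples; the estimator itself never uses the fact that they are Gaussian.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(500)]

p_hat = gaussian_kde(samples, h=0.3)
print("estimated density at x = 0:", p_hat(0.0))
print("estimated density at x = 3:", p_hat(3.0))
```

Every sample contributes one kernel to the estimate, so the number of terms grows with $N$. This is one way to read the "infinitely many parameters" remark above.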


Word2Vec

Posted on 2019-06-29 | Category: blogs

(NNLM paper) Y. Bengio, R. Ducharme, P. Vincent. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137-1155, 2003.

(word2vec paper) Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.


Word Vector Models

Posted on 2019-06-15 | Category: blogs

This family of methods, including LSA, PLSA, LDA and GloVe, is based on corpus statistics; LSA, PLSA and LDA are commonly called topic models. The topic models are not specifically designed for word representation; however, LSA, PLSA and GloVe do obtain word vectors during training.


Linux Notes

Posted on 2018-05-09 | Category: computer

Linux installation, common commands, configuring common software, and remote connections


Gao Haoxiang

little steps
