Long short-term memory

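Long short-term memory (LSTM; Japanese: 長・短期記憶) is an artificial recurrent neural network (RNN) architecture used in the field of deep learning^[1]. Unlike standard feedforward neural networks, LSTM has feedback connections, which make it a "general-purpose computer" (that is, it can compute anything that a Turing machine can)^[2]. LSTM can process not only single data points (such as images) but also entire sequences of data (such as speech or video). For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition^[3] and speech recognition^[4]^[5]. Bloomberg Businessweek wrote: "These powers make LSTM arguably the most commercial AI achievement, used for everything from predicting diseases to composing music."^[6]

A common LSTM unit is composed of a memory cell, an input gate, an output gate, and a forget gate. The memory cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell.

LSTM networks are well suited to classifying, processing, and making predictions based on time series data, because they can bridge lags of unknown duration between important events in a time series. LSTM was developed to deal with the exploding and vanishing gradient problems that can be encountered when training traditional RNNs. Relative insensitivity to gap length is the advantage of LSTM over RNNs, hidden Markov models, and other sequence learning methods in numerous applications^[citation needed].

LSTM remained a mainstream architecture through the 2010s, but from 2017 onward it was gradually superseded by the higher-performing Transformer architecture.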

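History

LSTM was proposed in 1997 by Sepp Hochreiter and Jürgen Schmidhuber^[1]. By introducing Constant Error Carousel (CEC) units, LSTM attempts to address the exploding and vanishing gradient problems. The initial version of the LSTM block included cells, input gates, and output gates^[7].

In 1999, Felix Gers, his advisor Jürgen Schmidhuber, and Fred Cummins introduced the forget gate (also called the "keep gate") into the LSTM architecture^[8], enabling the LSTM to reset its own state^[7]. In 2000, Gers, Schmidhuber, and Cummins added peephole connections (connections from the cell to the gates) to the architecture^[9]. In addition, the output activation function was omitted^[7].

In 2014, Kyunghyun Cho (조 경현) et al. proposed a simplified variant called the gated recurrent unit (GRU)^[10].

Among other successes, LSTM achieved record results in natural language text compression^[11] and in unsegmented connected handwriting recognition^[12], and won the ICDAR handwriting recognition competition (2009). In 2013, LSTM networks were a major component of a network that achieved a record 17.7% phoneme error rate on the classic TIMIT natural speech dataset^[13].

As of 2016, major technology companies including Google, Apple, and Microsoft were using LSTM as a fundamental component in new products^[14]. For example, Google used LSTM for speech recognition on smartphones^[15]^[16], for the smart assistant Allo^[17], and for Google Translate^[18]^[19]. Apple used LSTM for the "Quicktype" function on the iPhone^[20]^[21] and for Siri^[22]. Amazon used LSTM for Amazon Alexa^[23].

In 2017, Facebook performed 4.5 billion automatic translations every day using LSTM networks^[24].

In 2017, researchers from Michigan State University, IBM Research, and Cornell University presented a study at the Knowledge Discovery and Data Mining (KDD) conference^[25]^[26]^[27]. Their study describes a novel neural network that performs better on certain data sets than the widely used LSTM neural network.

Also in 2017, Microsoft reported reaching 95.1% recognition accuracy on the Switchboard corpus, which incorporates a vocabulary of 165,000 words. The approach used "dialog session-based long short-term memory"^[28].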

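Idea

In theory, classic (or "vanilla") RNNs can keep track of arbitrary long-term dependencies in the input sequences. The problem with vanilla RNNs is computational (or practical) in nature: when a vanilla RNN is trained using backpropagation, the backpropagated gradients can "vanish" (that is, tend to zero) or "explode" (that is, tend to infinity), because the computations involved in the process use finite-precision numbers. RNNs using LSTM units partially solve the vanishing gradient problem, because LSTM units also allow gradients to flow unchanged. However, LSTM networks can still suffer from the exploding gradient problem^[29].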
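The effect of finite precision can be illustrated with a toy sketch. It is not an RNN: it only assumes that a gradient is scaled by the same constant factor at every time step. The function name `repeated_product` and the value of `STEPS` are hypothetical choices made for this illustration.

```python
def repeated_product(factor: float, steps: int) -> float:
    """Multiply 1.0 by `factor` once per step, like a gradient scaled at each time step."""
    value = 1.0
    for _ in range(steps):
        value *= factor
    return value


STEPS = 2000  # hypothetical sequence length, chosen to exceed the float64 exponent range

vanished = repeated_product(0.5, STEPS)   # factor below 1: underflows to 0.0
exploded = repeated_product(2.0, STEPS)   # factor above 1: overflows to inf
preserved = repeated_product(1.0, STEPS)  # factor of exactly 1: passes through unchanged

print(vanished, exploded, preserved)
```

The third case corresponds to the situation in which a gradient flows through the unit unchanged.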

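Architecture

There are several architectures of LSTM units. A common architecture is composed of a cell (the memory part of the LSTM unit) and three "regulators", usually called gates, of the flow of information inside the LSTM unit: an input gate, an output gate, and a forget gate. Some variations of the LSTM unit lack one or more of these gates or have other gates. For example, gated recurrent units (GRUs) do not have an output gate.

Intuitively, the cell is responsible for keeping track of the dependencies between the elements in the input sequence. The input gate controls the extent to which a new value flows into the cell, the forget gate controls the extent to which a value remains in the cell, and the output gate controls the extent to which the value in the cell is used to compute the output activation of the LSTM unit. The activation function of the LSTM gates is often the logistic function.

There are connections into and out of the LSTM gates, a few of which are recurrent. The weights of these connections, which need to be learned during training, determine how the gates operate.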
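The roles of the three gates can be made concrete with a scalar sketch. It assumes the cell update of the forget-gate variant described in this article, with gate values supplied by hand instead of being computed from learned weights. The helper names `update_cell` and `unit_output` are hypothetical.

```python
import math


def update_cell(c_prev, candidate, forget_gate, input_gate):
    """New cell value: keep part of the old value and admit part of the candidate."""
    return forget_gate * c_prev + input_gate * candidate


def unit_output(cell, output_gate):
    """Output activation: the output gate scales the squashed cell value."""
    return output_gate * math.tanh(cell)


# Forget gate open, input gate closed: the cell keeps its old value.
kept = update_cell(c_prev=2.0, candidate=5.0, forget_gate=1.0, input_gate=0.0)
# Forget gate closed, input gate open: the cell is overwritten by the candidate.
overwritten = update_cell(c_prev=2.0, candidate=5.0, forget_gate=0.0, input_gate=1.0)
# Both gates half open: the result blends old and new values.
blended = update_cell(c_prev=2.0, candidate=5.0, forget_gate=0.5, input_gate=0.5)
# Output gate closed: the cell still holds its value but nothing is emitted.
hidden = unit_output(kept, output_gate=0.0)

print(kept, overwritten, blended, hidden)
```

In a trained network the gate values lie strictly between 0 and 1, so the behaviour is a soft mixture of these extreme cases.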

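Variants

In the equations below, the lowercase variables represent vectors. The matrices $W_{q}$ and $U_{q}$ contain the weights of the input and recurrent connections, respectively, where the subscript $_{q}$ can be the input gate $i$, the output gate $o$, the forget gate $f$, or the memory cell $c$, depending on the activation being calculated. This section therefore uses a "vector notation": for example, $c_{t}\in \mathbb {R} ^{h}$ is not just one cell of one LSTM unit, but contains the cells of $h$ LSTM units. The operator $\circ$ denotes the Hadamard product (element-wise product).

LSTM with a forget gate

The compact forms of the equations for the forward pass of an LSTM unit with a forget gate are^[1]^[9]:

${\begin{aligned}f_{t}&=\sigma _{g}(W_{f}x_{t}+U_{f}h_{t-1}+b_{f})\\i_{t}&=\sigma _{g}(W_{i}x_{t}+U_{i}h_{t-1}+b_{i})\\o_{t}&=\sigma _{g}(W_{o}x_{t}+U_{o}h_{t-1}+b_{o})\\c_{t}&=f_{t}\circ c_{t-1}+i_{t}\circ \sigma _{c}(W_{c}x_{t}+U_{c}h_{t-1}+b_{c})\\h_{t}&=o_{t}\circ \sigma _{h}(c_{t})\end{aligned}}$

In these equations, the initial values are $c_{0}=0$ and $h_{0}=0$. The subscript $t$ indexes the time step.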

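Variables

$x_{t}\in \mathbb {R} ^{d}$ : input vector to the LSTM unit
$f_{t}\in \mathbb {R} ^{h}$ : activation vector of the forget gate
$i_{t}\in \mathbb {R} ^{h}$ : activation vector of the input gate
$o_{t}\in \mathbb {R} ^{h}$ : activation vector of the output gate
$h_{t}\in \mathbb {R} ^{h}$ : hidden state vector, also known as the output vector of the LSTM unit
$c_{t}\in \mathbb {R} ^{h}$ : cell state vector
$W\in \mathbb {R} ^{h\times d}$ , $U\in \mathbb {R} ^{h\times h}$ , $b\in \mathbb {R} ^{h}$ : weight matrices and bias vector, the parameters that need to be learned during training

The superscripts $d$ and $h$ denote the number of input features and the number of hidden units, respectively.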

Activation functions

$\sigma _{g}$ : sigmoid function
$\sigma _{c}$ : hyperbolic tangent function
$\sigma _{h}$ : hyperbolic tangent function or, as the peephole LSTM papers^[30]^[31] suggest, $\sigma _{h}(x)=x$
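The forward-pass equations of the LSTM with a forget gate can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`lstm_step`, `init_params`), the dictionary layout of the parameters, and the small random initialisation are all hypothetical, and $\sigma _{h}$ is taken to be the hyperbolic tangent.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, used as the gate activation sigma_g."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One forward step of an LSTM with a forget gate."""
    W, U, b = params["W"], params["U"], params["b"]
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])  # forget gate
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])  # input gate
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])  # output gate
    candidate = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])  # sigma_c term
    c_t = f_t * c_prev + i_t * candidate  # element-wise (Hadamard) products
    h_t = o_t * np.tanh(c_t)              # sigma_h assumed to be tanh
    return h_t, c_t


def init_params(input_size, hidden_size, seed=0):
    """Hypothetical initialisation: small Gaussian weights, zero biases."""
    rng = np.random.default_rng(seed)
    gates = "fioc"
    return {
        "W": {q: rng.normal(0.0, 0.1, size=(hidden_size, input_size)) for q in gates},
        "U": {q: rng.normal(0.0, 0.1, size=(hidden_size, hidden_size)) for q in gates},
        "b": {q: np.zeros(hidden_size) for q in gates},
    }


input_size, hidden_size = 3, 4
params = init_params(input_size, hidden_size)
h_t = np.zeros(hidden_size)  # h_0 = 0
c_t = np.zeros(hidden_size)  # c_0 = 0
inputs = np.random.default_rng(1).normal(size=(5, input_size))  # five time steps
for x_t in inputs:
    h_t, c_t = lstm_step(x_t, h_t, c_t, params)

print(h_t.shape, c_t.shape)
```

Each gate is computed from the current input and the previous hidden state, while the previous cell state enters only through the element-wise product with the forget gate.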

Aggregating the weight matrices and the activation functions gives

${\begin{aligned}(f_{t}^{T},i_{t}^{T},o_{t}^{T},ci_{t}^{T})^{T}&=\sigma (Wx_{t}+Uh_{t-1}+b)\\c_{t}&=f_{t}\circ c_{t-1}+i_{t}\circ ci_{t}\\h_{t}&=o_{t}\circ \sigma _{h}(c_{t})\end{aligned}}$

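Here $W$, $U$, and $b$ stack the per-gate parameters, and $\sigma$ stands for the block-wise activation ($\sigma _{g}$ for $f_{t}$, $i_{t}$, and $o_{t}$, and $\sigma _{c}$ for $ci_{t}$). This form shows that $c_{t-1}$ recurs directly, while $h_{t-1}$ recurs through the gates and the cell. It also shows that the product of the inputs and the weights can be computed across time steps without any recurrence (it can be computed in a single batch as $WX=W{\bigl (}{\begin{smallmatrix}x_{0}&x_{1}&...&x_{n}\end{smallmatrix}}{\bigr )}$ ).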
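The batched input projection can be sketched as follows. The code assumes that the stacked parameters have row blocks ordered $f$, $i$, $o$, $ci$, that the sigmoid is applied to the three gate blocks and the hyperbolic tangent to the $ci$ block, and that $\sigma _{h}$ is the hyperbolic tangent. The function name `lstm_forward_stacked` and the random test data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_forward_stacked(X, W, U, b):
    """Run an LSTM over the columns of X using stacked parameters.

    X: (d, n) inputs, one column per time step.
    W: (4h, d), U: (4h, h), b: (4h,), with row blocks ordered f, i, o, ci.
    Returns H of shape (h, n) holding h_t for every time step.
    """
    hidden = U.shape[1]
    steps = X.shape[1]
    WX = W @ X  # input projections for all time steps in one matrix product
    h_t = np.zeros(hidden)
    c_t = np.zeros(hidden)
    H = np.zeros((hidden, steps))
    for t in range(steps):
        z = WX[:, t] + U @ h_t + b  # only the U term depends on the previous step
        f_t = sigmoid(z[0 * hidden:1 * hidden])
        i_t = sigmoid(z[1 * hidden:2 * hidden])
        o_t = sigmoid(z[2 * hidden:3 * hidden])
        ci_t = np.tanh(z[3 * hidden:4 * hidden])
        c_t = f_t * c_t + i_t * ci_t
        h_t = o_t * np.tanh(c_t)
        H[:, t] = h_t
    return H


rng = np.random.default_rng(0)
d, h, n = 3, 4, 6
W = rng.normal(0.0, 0.1, size=(4 * h, d))
U = rng.normal(0.0, 0.1, size=(4 * h, h))
b = np.zeros(4 * h)
X = rng.normal(size=(d, n))
H = lstm_forward_stacked(X, W, U, b)
print(H.shape)
```

Only the `U @ h_t` term has to be evaluated inside the loop; the `W @ X` product is computed once before the loop starts.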

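Peephole LSTM

Figure: A peephole LSTM unit with input ( $i$ ), output ( $o$ ), and forget ( $f$ ) gates. Each of these gates can be thought of as a "standard" neuron in a feedforward (or multi-layer) neural network: that is, it computes an activation (using an activation function) of a weighted sum. $i_{t}$ , $o_{t}$ , and $f_{t}$ represent the activations of the input, output, and forget gates, respectively, at time step $t$ . The three arrows leaving the memory cell $c$ towards the three gates $i$ , $o$ , and $f$ represent the peephole connections. These peephole connections actually denote the contribution of the activation of the memory cell $c$ at time step $t-1$ (that is, the contribution of $c_{t-1}$ , and not of $c_{t}$ as the figure may suggest). In other words, the gates $i$ , $o$ , and $f$ calculate their activations at time step $t$ (that is, $i_{t}$ , $o_{t}$ , and $f_{t}$ ) also considering the activation of the memory cell $c$ at time step $t-1$ (that is, $c_{t-1}$ ). The single left-to-right arrow leaving the memory cell is not a peephole connection and denotes $c_{t}$ . The small circles containing a $\times$ symbol represent an element-wise multiplication of their inputs. The large circles containing an S-like curve represent the application of a differentiable function (such as the sigmoid function) to a weighted sum. There are many other kinds of LSTM as well^[7].

The figure is a graphical representation of an LSTM unit with peephole connections (that is, a peephole LSTM)^[30]^[31]. Peephole connections allow the gates to access the constant error carousel (CEC), whose activation is the cell state^[32]. $h_{t-1}$ is not used; $c_{t-1}$ is used instead in most places.

${\begin{aligned}f_{t}&=\sigma _{g}(W_{f}x_{t}+U_{f}c_{t-1}+b_{f})\\i_{t}&=\sigma _{g}(W_{i}x_{t}+U_{i}c_{t-1}+b_{i})\\o_{t}&=\sigma _{g}(W_{o}x_{t}+U_{o}c_{t-1}+b_{o})\\c_{t}&=f_{t}\circ c_{t-1}+i_{t}\circ \sigma _{c}(W_{c}x_{t}+U_{c}c_{t-1}+b_{c})\\h_{t}&=o_{t}\circ \sigma _{h}(c_{t})\end{aligned}}$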
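A sketch of one peephole step, following the equations exactly as printed above, makes the difference from the forget-gate variant visible: the previous cell state `c_prev` takes the place of the previous hidden state in every recurrent term. The function name, the parameter layout, and the choice of the hyperbolic tangent for $\sigma _{h}$ (the peephole papers also suggest the identity) are assumptions of this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def peephole_lstm_step(x_t, c_prev, W, U, b):
    """One forward step of a peephole LSTM; h_{t-1} is not an input."""
    f_t = sigmoid(W["f"] @ x_t + U["f"] @ c_prev + b["f"])
    i_t = sigmoid(W["i"] @ x_t + U["i"] @ c_prev + b["i"])
    o_t = sigmoid(W["o"] @ x_t + U["o"] @ c_prev + b["o"])
    candidate = np.tanh(W["c"] @ x_t + U["c"] @ c_prev + b["c"])
    c_t = f_t * c_prev + i_t * candidate
    h_t = o_t * np.tanh(c_t)  # sigma_h assumed to be tanh here
    return h_t, c_t


rng = np.random.default_rng(0)
d, h = 3, 4  # hypothetical input and hidden sizes
W = {q: rng.normal(0.0, 0.1, size=(h, d)) for q in "fioc"}
U = {q: rng.normal(0.0, 0.1, size=(h, h)) for q in "fioc"}
b = {q: np.zeros(h) for q in "fioc"}

c_t = np.zeros(h)
for x_t in rng.normal(size=(5, d)):
    h_t, c_t = peephole_lstm_step(x_t, c_t, W, U, b)

print(h_t.shape, c_t.shape)
```

Because the gates read the cell state directly, only `c_t` has to be carried from one step to the next.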

Peephole convolutional LSTM

The equations of the peephole convolutional LSTM^[33] are given below, where $*$ denotes the convolution operator.

${\begin{aligned}f_{t}&=\sigma _{g}(W_{f}*x_{t}+U_{f}*h_{t-1}+V_{f}\circ c_{t-1}+b_{f})\\i_{t}&=\sigma _{g}(W_{i}*x_{t}+U_{i}*h_{t-1}+V_{i}\circ c_{t-1}+b_{i})\\o_{t}&=\sigma _{g}(W_{o}*x_{t}+U_{o}*h_{t-1}+V_{o}\circ c_{t}+b_{o})\\c_{t}&=f_{t}\circ c_{t-1}+i_{t}\circ \sigma _{c}(W_{c}*x_{t}+U_{c}*h_{t-1}+b_{c})\\h_{t}&=o_{t}\circ \sigma _{h}(c_{t})\end{aligned}}$
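The order of evaluation implied by these equations can be sketched with a deliberately simplified setting: single-channel, one-dimensional signals convolved with `np.convolve(..., mode="same")`. The cited work applies the idea to multi-channel spatial data, so the shapes, the helper names, and the initialisation here are assumptions made only for illustration. Note that the output gate peeks at the new cell state $c_{t}$ , so it has to be computed after the cell update.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv(signal, kernel):
    """1-D convolution that keeps the signal length (zero padding at the borders)."""
    return np.convolve(signal, kernel, mode="same")


def conv_peephole_lstm_step(x_t, h_prev, c_prev, p):
    """One step of a single-channel 1-D peephole convolutional LSTM."""
    f_t = sigmoid(conv(x_t, p["W_f"]) + conv(h_prev, p["U_f"]) + p["V_f"] * c_prev + p["b_f"])
    i_t = sigmoid(conv(x_t, p["W_i"]) + conv(h_prev, p["U_i"]) + p["V_i"] * c_prev + p["b_i"])
    c_t = f_t * c_prev + i_t * np.tanh(conv(x_t, p["W_c"]) + conv(h_prev, p["U_c"]) + p["b_c"])
    # The output gate uses the freshly updated cell state c_t, not c_prev.
    o_t = sigmoid(conv(x_t, p["W_o"]) + conv(h_prev, p["U_o"]) + p["V_o"] * c_t + p["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t


def init_params(length, kernel_size=3, seed=0):
    """Hypothetical initialisation: small kernels and peephole weights, zero biases."""
    rng = np.random.default_rng(seed)
    p = {}
    for q in "fioc":
        p[f"W_{q}"] = rng.normal(0.0, 0.1, size=kernel_size)
        p[f"U_{q}"] = rng.normal(0.0, 0.1, size=kernel_size)
        p[f"b_{q}"] = 0.0
    for q in "fio":
        p[f"V_{q}"] = rng.normal(0.0, 0.1, size=length)  # element-wise peephole weights
    return p


length = 8
params = init_params(length)
h_t = np.zeros(length)
c_t = np.zeros(length)
for x_t in np.random.default_rng(1).normal(size=(5, length)):
    h_t, c_t = conv_peephole_lstm_step(x_t, h_t, c_t, params)

print(h_t.shape, c_t.shape)
```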

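Training

An RNN using LSTM units can be trained in a supervised fashion on a set of training sequences. Training uses an optimization algorithm such as gradient descent, combined with backpropagation through time (BPTT) to compute the gradients needed during the optimization process, in order to change each weight of the LSTM network in proportion to the derivative of the error (at the output layer of the LSTM network) with respect to that weight.

A problem with using gradient descent for standard RNNs is that error gradients vanish exponentially quickly with the size of the time lag between important events. This is because $\lim _{n\to \infty }W^{n}=0$ if the spectral radius of $W$ is smaller than 1^[34]^[35].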
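The limit can be checked numerically. The sketch below uses a small, hypothetical matrix `W` chosen only because its spectral radius is below 1; it is not taken from any trained network.

```python
import numpy as np


def spectral_radius(W):
    """Largest absolute value among the eigenvalues of W."""
    return float(np.max(np.abs(np.linalg.eigvals(W))))


# Hypothetical recurrent weight matrix with spectral radius below 1.
W = np.array([[0.5, 0.4],
              [0.1, 0.6]])

rho = spectral_radius(W)
# Frobenius norm of W^n for increasing n; it shrinks roughly like rho**n.
norms = {n: float(np.linalg.norm(np.linalg.matrix_power(W, n))) for n in (1, 10, 50)}

print(rho, norms)
```

A factor that shrinks geometrically with `n` is what makes the error signal from distant time steps negligible.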

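However, with LSTM units, when error values are back-propagated from the output layer, the error remains in the LSTM unit's cell. This "error carousel" continuously feeds error back to each of the LSTM unit's gates until they learn to cut off the value.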
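One way to make the error carousel concrete is to differentiate the cell update of the forget-gate variant with respect to the previous cell state. The following is a simplified sketch under the assumption that the gate activations are treated as constants, so that the indirect dependence through $h_{t-1}$ is ignored:

${\begin{aligned}{\frac {\partial c_{t}}{\partial c_{t-1}}}&=\operatorname {diag} (f_{t})\\{\frac {\partial c_{t}}{\partial c_{t-k}}}&=\operatorname {diag} (f_{t}\circ f_{t-1}\circ \cdots \circ f_{t-k+1})\end{aligned}}$

When the forget gate activations are close to 1 (or when there is no forget gate, as in the initial version of the LSTM block, which corresponds to $f_{t}=1$ ), this factor stays close to the identity. Under this simplification the error is passed back through the cell nearly unchanged, instead of being multiplied repeatedly by a weight matrix as in the $W^{n}$ argument for standard RNNs.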

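CTC score function

Many applications use stacks of LSTM RNNs^[36] and train them by connectionist temporal classification (CTC)^[37] to find a weight matrix that maximizes the probability of the label sequences in a training set. CTC achieves both alignment and recognition.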
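The link between alignment and recognition can be illustrated with the collapsing rule that CTC uses to map a frame-by-frame path to a label sequence: repeated symbols are merged first, and blank symbols are removed afterwards. The sketch below covers only this mapping, not the CTC training objective; the blank symbol `"-"` and the function name `ctc_collapse` are assumptions of this illustration.

```python
from itertools import groupby

BLANK = "-"  # hypothetical choice of blank symbol


def ctc_collapse(path, blank=BLANK):
    """Map a per-frame path to labels: merge repeats, then drop blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return [symbol for symbol in merged if symbol != blank]


# A blank between the two "l" runs keeps them as two separate labels.
decoded = "".join(ctc_collapse("hh-e-ll-lo--"))
print(decoded)
```

Many different paths collapse to the same label sequence, which is why no frame-level segmentation of the training data is needed.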

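Alternatives

Sometimes it can be advantageous to train (parts of) an LSTM by neuroevolution^[38] or by policy gradient methods, especially when there is no "teacher" (that is, no training labels).

Successes

There have been several successful cases of training RNNs with LSTM units without a supervising teacher.

In 2018, Bill Gates called it a "huge milestone in advancing artificial intelligence" when bots developed by OpenAI were able to beat humans in the game Dota 2^[39]. OpenAI Five consists of five independent but coordinated neural networks. Each network is trained by a policy gradient method without a supervising teacher and contains a single-layer, 1024-unit LSTM that sees the current game state and emits actions through several possible action heads^[39].

In 2018, OpenAI also trained a similar LSTM by policy gradients to control a human-like robot hand that manipulates physical objects with unprecedented dexterity^[40].

In 2019, DeepMind's program AlphaStar used a deep LSTM core to excel at the complex video game StarCraft^[41]. This was viewed as significant progress towards artificial general intelligence^[41].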

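Properties

Memory capacity

Compared with a simple RNN, an LSTM can retain information over longer sequences. On the other hand, it is known to fail to learn the Copying task, which evaluates long-term memory, at sequence lengths of 200 or more^[42].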

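Applications

Applications of LSTM include:

Robot control^[43]
Time series prediction^[38]
Speech recognition^[44]^[45]^[46]
Rhythm learning^[31]
Music composition^[47]
Grammar learning^[48]^[30]^[49]
Handwriting recognition^[50]^[51]
Human action recognition^[52]
Sign language recognition^[53]
Protein homology detection^[54]
Predicting subcellular localization of proteins^[55]
Time series anomaly detection^[56]
Several prediction tasks in the area of business process management^[57]
Prediction in medical care pathways^[58]
Semantic parsing^[59]
Object co-segmentation^[60]^[61]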

References

[1] Sepp Hochreiter; Jürgen Schmidhuber (1997). “Long short-term memory”. Neural Computation 9 (8): 1735–1780. doi:10.1162/neco.1997.9.8.1735. PMID 9377276.
[2] Siegelmann, Hava T.; Sontag, Eduardo D. (1992). On the Computational Power of Neural Nets. COLT '92. 440–449. doi:10.1145/130385.130432. ISBN 978-0897914970.
[3] Graves, A.; Liwicki, M.; Fernandez, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (2009). “A Novel Connectionist System for Improved Unconstrained Handwriting Recognition”. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5): 855–868. doi:10.1109/tpami.2008.137. PMID 19299860.
[4] Sak, Hasim (2014). “Long Short-Term Memory recurrent neural network architectures for large scale acoustic modeling”. Retrieved 3 April 2019.
[5] Li, Xiangang; Wu, Xihong (15 October 2014). "Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition". arXiv:1410.4281 [cs.CL].
[6] Vance, Ashlee (May 15, 2018). “Quote: These powers make LSTM arguably the most commercial AI achievement, used for everything from predicting diseases to composing music.”. Bloomberg Business Week. Retrieved 16 January 2019.
[7] Klaus Greff; Rupesh Kumar Srivastava; Jan Koutník; Bas R. Steunebrink; Jürgen Schmidhuber (2015). “LSTM: A Search Space Odyssey”. IEEE Transactions on Neural Networks and Learning Systems 28 (10): 2222–2232. arXiv:1503.04069. doi:10.1109/TNNLS.2016.2582924. PMID 27411231.
[8] Felix Gers; Jürgen Schmidhuber; Fred Cummins (1999). “Learning to Forget: Continual Prediction with LSTM”. Proc. ICANN'99, IEE, London: 850–855.
[9] Felix A. Gers; Jürgen Schmidhuber; Fred Cummins (2000). “Learning to Forget: Continual Prediction with LSTM”. Neural Computation 12 (10): 2451–2471. doi:10.1162/089976600300015015.
[10] Cho, Kyunghyun; van Merrienboer, Bart; Gulcehre, Caglar; Bahdanau, Dzmitry; Bougares, Fethi; Schwenk, Holger; Bengio, Yoshua (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". arXiv:1406.1078 [cs.CL].
[11] “The Large Text Compression Benchmark”. Retrieved 13 January 2017.
[12] Graves, A.; Liwicki, M.; Fernández, S.; Bertolami, R.; Bunke, H.; Schmidhuber, J. (May 2009). “A Novel Connectionist System for Unconstrained Handwriting Recognition”. IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (5): 855–868. doi:10.1109/tpami.2008.137. ISSN 0162-8828. PMID 19299860.
[13] Graves, Alex; Mohamed, Abdel-rahman; Hinton, Geoffrey (22 March 2013). "Speech Recognition with Deep Recurrent Neural Networks". arXiv:1303.5778 [cs.NE].
[14] “With QuickType, Apple wants to do more than guess your next text. It wants to give you an AI.”. WIRED. (2016-06-14). Retrieved 16 June 2016.
[15] Beaufays, Françoise (August 11, 2015). “The neural networks behind Google Voice transcription”. Research Blog. Retrieved 27 June 2017.
[16] Sak, Haşim; Senior, Andrew; Rao, Kanishka; Beaufays, Françoise; Schalkwyk, Johan (September 24, 2015). “Google voice search: faster and more accurate”. Research Blog. Retrieved 27 June 2017.
[17] Khaitan, Pranav (May 18, 2016). “Chat Smarter with Allo”. Research Blog. Retrieved 27 June 2017.
[18] Wu, Yonghui; Schuster, Mike; Chen, Zhifeng; Le, Quoc V.; Norouzi, Mohammad; Macherey, Wolfgang; Krikun, Maxim; Cao, Yuan; Gao, Qin (26 September 2016). "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation". arXiv:1609.08144 [cs.CL].
[19] Metz, Cade (September 27, 2016). “An Infusion of AI Makes Google Translate More Powerful Than Ever | WIRED”. Wired. Retrieved 27 June 2017.
[20] Efrati, Amir (June 13, 2016). “Apple's Machines Can Learn Too”. The Information. Retrieved 27 June 2017.
[21] Ranger, Steve (June 14, 2016). “iPhone, AI and big data: Here's how Apple plans to protect your privacy | ZDNet”. ZDNet. Retrieved 27 June 2017.
[22] Smith, Chris (13 June 2016). “iOS 10: Siri now works in third-party apps, comes with extra AI features”. BGR. Retrieved 27 June 2017.
[23] Vogels, Werner (30 November 2016). “Bringing the Magic of Amazon AI and Alexa to Apps on AWS. - All Things Distributed”. www.allthingsdistributed.com. Retrieved 27 June 2017.
[24] Ong, Thuy (4 August 2017). “Facebook's translations are now powered completely by AI”. www.allthingsdistributed.com. Retrieved 15 February 2019.
[25] “Patient Subtyping via Time-Aware LSTM Networks”. msu.edu. Retrieved 21 November 2018.
[26] “Patient Subtyping via Time-Aware LSTM Networks”. Kdd.org. Retrieved 24 May 2018.
[27] “SIGKDD”. Kdd.org. Retrieved 24 May 2018.
[28] Haridy, Rich (August 21, 2017). “Microsoft's speech recognition system is now as good as a human”. newatlas.com. Retrieved 27 August 2017.
[29] “Why can RNNs with LSTM units also suffer from "exploding gradients"?”. Cross Validated. Retrieved 25 December 2018.
[30] Gers, F. A.; Schmidhuber, J. (2001). “LSTM Recurrent Networks Learn Simple Context Free and Context Sensitive Languages”. IEEE Transactions on Neural Networks 12 (6): 1333–1340. doi:10.1109/72.963769. PMID 18249962.
[31] Gers, F.; Schraudolph, N.; Schmidhuber, J. (2002). “Learning precise timing with LSTM recurrent networks”. Journal of Machine Learning Research 3: 115–143.
[32] Gers, F. A.; Schmidhuber, E. (November 2001). “LSTM recurrent networks learn simple context-free and context-sensitive languages”. IEEE Transactions on Neural Networks 12 (6): 1333–1340. doi:10.1109/72.963769. ISSN 1045-9227. PMID 18249962.
[33] Xingjian Shi; Zhourong Chen; Hao Wang; Dit-Yan Yeung; Wai-kin Wong; Wang-chun Woo (2015). “Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting”. Proceedings of the 28th International Conference on Neural Information Processing Systems: 802–810. arXiv:1506.04214. Bibcode: 2015arXiv150604214S.
[34] S. Hochreiter. Untersuchungen zu dynamischen neuronalen Netzen. Diploma thesis, Institut f. Informatik, Technische Univ. Munich, 1991.
[35] Hochreiter, S.; Bengio, Y.; Frasconi, P.; Schmidhuber, J. (2001). “Gradient Flow in Recurrent Nets: the Difficulty of Learning Long-Term Dependencies”. A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.
[36] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). “Sequence labelling in structured domains with hierarchical recurrent neural networks”. Proc. 20th Int. Joint Conf. On Artificial Intelligence, IJCAI 2007: 774–779.
[37] Graves, Alex; Fernández, Santiago; Gomez, Faustino (2006). “Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks”. In Proceedings of the International Conference on Machine Learning, ICML 2006: 369–376.
[38] Wierstra, Daan; Schmidhuber, J.; Gomez, F. J. (2005). “Evolino: Hybrid Neuroevolution/Optimal Linear Search for Sequence Learning”. Proceedings of the 19th International Joint Conference on Artificial Intelligence (IJCAI), Edinburgh: 853–858. https://www.academia.edu/5830256.
[39] Rodriguez, Jesus (July 2, 2018). “The Science Behind OpenAI Five that just Produced One of the Greatest Breakthrough in the History of AI”. Towards Data Science. Retrieved 15 January 2019.
[40] “Learning Dexterity”. OpenAI Blog. (July 30, 2018). Retrieved 15 January 2019.
[41] Stanford, Stacy (January 25, 2019). “DeepMind’s AI, AlphaStar Showcases Significant Progress Towards AGI”. Medium ML Memoirs. Retrieved 15 January 2019.
[42] "The LSTM is able to beat the baseline only for 100 times steps." Arjovsky, et al. (2015). Unitary Evolution Recurrent Neural Networks.
[43] Mayer, H.; Gomez, F.; Wierstra, D.; Nagy, I.; Knoll, A.; Schmidhuber, J. (October 2006). A System for Robotic Heart Surgery that Learns to Tie Knots Using Recurrent Neural Networks. 543–548. doi:10.1109/IROS.2006.282190. ISBN 978-1-4244-0258-8.
[44] Graves, A.; Schmidhuber, J. (2005). “Framewise phoneme classification with bidirectional LSTM and other neural network architectures”. Neural Networks 18 (5–6): 602–610. doi:10.1016/j.neunet.2005.06.042. PMID 16112549.
[45] Fernández, Santiago; Graves, Alex; Schmidhuber, Jürgen (2007). An Application of Recurrent Neural Networks to Discriminative Keyword Spotting. ICANN'07. Berlin, Heidelberg: Springer-Verlag. 220–229. ISBN 978-3540746935.
[46] Graves, Alex; Mohamed, Abdel-rahman; Hinton, Geoffrey (2013). “Speech Recognition with Deep Recurrent Neural Networks”. Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on: 6645–6649.
[47] Eck, Douglas; Schmidhuber, Jürgen (2002-08-28). Learning the Long-Term Structure of the Blues. Lecture Notes in Computer Science. 2415. Springer, Berlin, Heidelberg. 284–289. doi:10.1007/3-540-46084-5_47. ISBN 978-3540460848.
[48] Schmidhuber, J.; Gers, F.; Eck, D. (2002). “Learning nonregular languages: A comparison of simple recurrent networks and LSTM”. Neural Computation 14 (9): 2039–2041. doi:10.1162/089976602320263980. PMID 12184841.
[49] Perez-Ortiz, J. A.; Gers, F. A.; Eck, D.; Schmidhuber, J. (2003). “Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets”. Neural Networks 16 (2): 241–250. doi:10.1016/s0893-6080(02)00219-8. PMID 12628609.
[50] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Advances in Neural Information Processing Systems 22, NIPS'22, pp 545–552, Vancouver, MIT Press, 2009.
[51] Graves, Alex; Fernández, Santiago; Liwicki, Marcus; Bunke, Horst; Schmidhuber, Jürgen (2007). Unconstrained Online Handwriting Recognition with Recurrent Neural Networks. NIPS'07. USA: Curran Associates Inc. 577–584. ISBN 9781605603520.
[52] M. Baccouche, F. Mamalet, C Wolf, C. Garcia, A. Baskurt. Sequential Deep Learning for Human Action Recognition. 2nd International Workshop on Human Behavior Understanding (HBU), A.A. Salah, B. Lepri ed. Amsterdam, Netherlands. pp. 29–39. Lecture Notes in Computer Science 7065. Springer. 2011.
[53] Huang, Jie; Zhou, Wengang; Zhang, Qilin; Li, Houqiang; Li, Weiping (30 January 2018). "Video-based Sign Language Recognition without Temporal Segmentation". arXiv:1801.10111 [cs.CV].
[54] Hochreiter, S.; Heusel, M.; Obermayer, K. (2007). “Fast model-based protein homology detection without alignment”. Bioinformatics 23 (14): 1728–1736. doi:10.1093/bioinformatics/btm247. PMID 17488755.
[55] Thireou, T.; Reczko, M. (2007). “Bidirectional Long Short-Term Memory Networks for predicting the subcellular localization of eukaryotic proteins”. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB) 4 (3): 441–446. doi:10.1109/tcbb.2007.1015. PMID 17666763.
[56] Malhotra, Pankaj; Vig, Lovekesh; Shroff, Gautam; Agarwal, Puneet (April 2015). “Long Short Term Memory Networks for Anomaly Detection in Time Series”. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2015.
[57] Tax, N.; Verenich, I.; La Rosa, M.; Dumas, M. (2017). Predictive Business Process Monitoring with LSTM neural networks. Lecture Notes in Computer Science. 10253. 477–492. arXiv:1612.02130. doi:10.1007/978-3-319-59536-8_30. ISBN 978-3-319-59535-1.
[58] Choi, E.; Bahadori, M.T.; Schuetz, E.; Stewart, W.; Sun, J. (2016). “Doctor AI: Predicting Clinical Events via Recurrent Neural Networks”. Proceedings of the 1st Machine Learning for Healthcare Conference: 301–318. arXiv:1511.05942. Bibcode: 2015arXiv151105942C.
[59] Jia, Robin; Liang, Percy (2016-06-11). "Data Recombination for Neural Semantic Parsing". arXiv:1606.03622 [cs].
[60] Wang, Le; Duan, Xuhuan; Zhang, Qilin; Niu, Zhenxing; Hua, Gang; Zheng, Nanning (2018-05-22). “Segment-Tube: Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation”. Sensors 18 (5): 1657. doi:10.3390/s18051657. ISSN 1424-8220. PMC 5982167. PMID 29789447.
[61] Duan, Xuhuan; Wang, Le; Zhai, Changbo; Zheng, Nanning; Zhang, Qilin; Niu, Zhenxing; Hua, Gang (2018). Joint Spatio-Temporal Action Localization in Untrimmed Videos with Per-Frame Segmentation. 25th IEEE International Conference on Image Processing (ICIP). doi:10.1109/icip.2018.8451692. ISBN 978-1-4799-7061-2.

External links

Recurrent Neural Networks with over 30 LSTM papers by Jürgen Schmidhuber's group at IDSIA
Gers, Felix (2001). “Long Short-Term Memory in Recurrent Neural Networks”. PhD thesis. Retrieved 3 April 2019.
Gers, Felix A.; Schraudolph, Nicol N.; Schmidhuber, Jürgen (Aug 2002). “Learning precise timing with LSTM recurrent networks”. Journal of Machine Learning Research 3: 115–143. Retrieved 3 April 2019.
Abidogun, Olusola Adeniyi (2005). “Data Mining, Fraud Detection and Mobile Telecommunications: Call Pattern Analysis with Unsupervised Neural Networks”. Master's Thesis. hdl:11394/249. Archived from the original on May 22, 2012. Retrieved 3 April 2019. Original with two chapters devoted to explaining recurrent neural networks, especially LSTM.
“A generalized LSTM-like training algorithm for second-order recurrent neural networks” (2010). Retrieved 3 April 2019. “High-performing extension of LSTM that has been simplified to a single node type and can train arbitrary architectures”
Herta, Christian. “How to implement LSTM in Python with Theano”. Tutorial. Retrieved 3 April 2019.
Chevalier, Guillaume. Tutorial: How to use LSTMs with TensorFlow in Python on cellphone sensor data - GitHub
