Curriculum Learning

One-Line Summary

A machine learning methodology based on the idea that, just as people learn easy things first, a model learns more efficiently when it is trained on data ordered from easy to hard.

Plain Explanation

Think back to how you learned math. You did not start with calculus; you learned addition → subtraction → multiplication → division, in that order. That is the core idea behind Curriculum Learning.

The method was proposed in 2009 by Yoshua Bengio, a well-known AI researcher, and his colleagues. They found that a neural network (an AI model) can also learn faster when its training data is presented in order of difficulty.

For example, suppose we are building an AI that tells dogs from cats:

  • Without a curriculum: shuffle all the photos randomly and train on them
  • Curriculum Learning: train on sharp frontal photos first → side-view photos → blurry photos → photos where only part of the animal is visible

Just as a new driver practices in a parking lot before heading onto the highway, an AI that learns in stages tends to train more stably.
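The easy-to-hard ordering in the photo example can be sketched in a few lines. This is a minimal illustration, not the method from the paper: the `curriculum_stages` helper, the photo names, and the difficulty scores are all hypothetical, and the sketch assumes each sample already has a numeric difficulty score where lower means easier.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def curriculum_stages(
    samples: Sequence[T],
    difficulty: Callable[[T], float],
    num_stages: int,
) -> Iterator[List[T]]:
    """Yield growing training pools: easiest samples first, everything last."""
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(samples, key=difficulty)  # easy -> hard
    for stage in range(1, num_stages + 1):
        # Pool size grows linearly; the final stage covers the full dataset.
        cutoff = max(1, round(len(ordered) * stage / num_stages))
        yield ordered[:cutoff]


# Hypothetical photo records: (name, difficulty score); lower means easier.
photos = [
    ("partial_view", 0.9),
    ("clear_front", 0.1),
    ("blurry", 0.7),
    ("side_view", 0.4),
]

stages = list(curriculum_stages(photos, difficulty=lambda p: p[1], num_stages=2))
for i, pool in enumerate(stages, start=1):
    print(f"stage {i}:", [name for name, _ in pool])
```

Each stage keeps the earlier, easier samples in the pool, so the model is not asked to forget them when harder ones arrive. Whether to keep or drop earlier samples is a design choice this sketch simply assumes.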

Key Points

  • Gradual learning: difficulty is raised step by step, from easy examples to hard ones
  • Faster convergence: with a well-chosen training order, the model reaches its target performance sooner (roughly 20-40% faster; the gain depends on the task and on how difficulty is measured)
  • Better stability: presenting hard problems from the very start can make training unstable or cause it to fail
  • Avoiding local minima: easy examples give the model a good starting point, which helps it stay out of bad traps (poor local minima)

Related Concepts

  • ZPD (Zone of Proximal Development) - the inspiration drawn from educational theory
  • Krashen's i+1 hypothesis - a similar principle from language acquisition
  • Fine-tuning - the training stage where Curriculum Learning is mainly applied
  • Perplexity (PPL) - a metric used to measure difficulty
  • Ablation Study - the method used to verify the effect of Curriculum Learning
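To make the perplexity entry above concrete: perplexity is the exponential of the negative mean log-probability a language model assigns to each token, so text the model finds surprising scores higher and can be treated as harder. The sketch below assumes natural-log probabilities are already available; the `perplexity` helper and the sample numbers are hypothetical, not taken from any particular model.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity = exp of the negative mean natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Hypothetical per-token log-probabilities from some language model.
easy_text = [math.log(0.5)] * 4  # model is fairly confident about each token
hard_text = [math.log(0.1)] * 4  # model is often surprised

print(perplexity(easy_text))  # lower value: treated as easier
print(perplexity(hard_text))  # higher value: treated as harder
```

Sorting samples by this score gives one possible easy-to-hard ordering, though the score always reflects the specific model used to compute it.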

Role in the R4 Research

Curriculum Learning is the core theoretical foundation of the R4 research. The research builds on Bengio's Curriculum Learning and adds insights from ZPD (Zone of Proximal Development), a theory from education, to create an "Adaptive" Curriculum Learning.

In conventional Curriculum Learning the difficulty order is fixed in advance. The distinguishing feature of the R4 research is that it adjusts difficulty in real time while the model is training, much like a good teacher who changes the difficulty of the problems while watching the student's level.
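One way to picture real-time difficulty adjustment is a small controller that raises the allowed difficulty when the model is doing well and lowers it when the model struggles. This is only an illustrative sketch and not the R4 method, whose details are not given here: the `AdaptiveDifficulty` class, the accuracy thresholds, and the step size are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveDifficulty:
    """Hypothetical controller for the maximum allowed sample difficulty."""

    level: float = 0.2         # current maximum difficulty, in [0, 1]
    step: float = 0.1          # how far to move on each update
    target_low: float = 0.6    # accuracy below this: too hard, ease off
    target_high: float = 0.85  # accuracy above this: too easy, push harder

    def update(self, recent_accuracy: float) -> float:
        """Adjust the difficulty level from the model's recent accuracy."""
        if recent_accuracy > self.target_high:
            self.level = min(1.0, self.level + self.step)
        elif recent_accuracy < self.target_low:
            self.level = max(0.0, self.level - self.step)
        return self.level


controller = AdaptiveDifficulty()
# Two strong evaluations, one weak one, then one inside the target band.
history = [controller.update(acc) for acc in (0.9, 0.9, 0.5, 0.7)]
print(history)
```

The band between the two thresholds plays a role loosely analogous to the ZPD: the controller leaves the difficulty alone while the model is challenged but still succeeding.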

Further Reading

  • Bengio, Y., Louradour, J., Collobert, R., & Weston, J. (2009). Curriculum learning. Proceedings of the 26th International Conference on Machine Learning, 41-48.
  • The original paper gives a mathematical formulation, written here as W*(T) = argmin_W Σ L(f(x;W), y) × Q_T(x,y), where Q_T is a weighting over examples that shifts over time from easy examples toward hard ones
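The weighted objective above can be illustrated numerically. The sketch below assumes one particular form for Q_T, which is not taken from the paper: each example has a difficulty d in [0, 1], training progress t runs from 0 to 1, and the raw weight is 1 - (1 - t) * d, normalized to sum to 1. The helper names `curriculum_weights` and `weighted_loss` and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def curriculum_weights(difficulty: Sequence[float], t: float) -> List[float]:
    """Normalized weights Q_T over examples at training progress t in [0, 1].

    Assumed scheme: raw weight 1 - (1 - t) * d, so hard examples are
    down-weighted early on and all examples are weighted equally at t = 1.
    """
    if not difficulty:
        raise ValueError("need at least one example")
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    if any(not 0.0 <= d <= 1.0 for d in difficulty):
        raise ValueError("difficulty scores must be in [0, 1]")
    raw = [1.0 - (1.0 - t) * d for d in difficulty]
    total = sum(raw)
    if total == 0.0:  # every example has difficulty 1 at t = 0
        return [1.0 / len(raw)] * len(raw)
    return [w / total for w in raw]


def weighted_loss(losses: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of per-example losses L, each scaled by its weight Q_T."""
    return sum(loss * w for loss, w in zip(losses, weights))


difficulty = [0.0, 0.5, 1.0]  # easy, medium, hard (hypothetical scores)
losses = [0.2, 0.8, 2.0]      # hypothetical per-example losses

early = curriculum_weights(difficulty, t=0.0)
late = curriculum_weights(difficulty, t=1.0)
print(early, weighted_loss(losses, early))
print(late, weighted_loss(losses, late))
```

Early in training the hard example contributes nothing to the objective; by the end, the objective is the ordinary unweighted average over the whole dataset.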