[Short Review] Deep contextualized word representations (ELMo)

zzangsky 2026. 4. 28. 15:50

Deep contextualized word representations

We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are

arxiv.org

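1. Background and Problems

Static embeddings such as Word2Vec and GloVe assign each word a single fixed vector.
The same vector is used even when the meaning changes with context.
Example: "bank" can mean a financial institution or a riverbank, but a single vector cannot tell the two apart.
It is hard to capture both syntax and semantics.
Polysemy (words with multiple senses) is handled poorly.

-> Contextual embeddings that change with the surrounding context are needed.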
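
The limitation above can be seen in a few lines. This is only a toy sketch: the lookup table, its vector values, and the helper `embed_static` are made up for illustration and are not taken from Word2Vec or GloVe.

```python
# Toy static embedding table: one fixed vector per word type (values are made up).
STATIC_EMBEDDINGS = {
    "bank": [0.12, -0.40, 0.88],
    "river": [0.05, 0.31, -0.22],
    "money": [-0.63, 0.10, 0.47],
}


def embed_static(tokens):
    """Look up each token on its own; the surrounding words are never consulted."""
    return [STATIC_EMBEDDINGS.get(tok) for tok in tokens]


financial = "she deposited money at the bank".split()
riverside = "they sat on the river bank".split()

bank_financial = embed_static(financial)[financial.index("bank")]
bank_riverside = embed_static(riverside)[riverside.index("bank")]

# Both senses of "bank" collapse to the same vector.
print(bank_financial == bank_riverside)  # True
```

A contextual model would instead compute the vector for "bank" from the whole sentence, so the two occurrences could differ.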

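2. Architecture Features

Context-based word embedding (contextual embedding)
- Built on a biLM (bidirectional language model): a forward LM plus a backward LM
- Uses LSTMs
- Reflects both the left and the right context of each word in the sentence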
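
For reference, the forward LM + backward LM combination can be written as a single training objective. The notation below follows the ELMo paper as I understand it: $t_1, \dots, t_N$ are the tokens, $\Theta_x$ and $\Theta_s$ are the token-representation and softmax parameters shared by both directions, and each direction has its own LSTM parameters.

```latex
\sum_{k=1}^{N} \Big(
    \log p\big(t_k \mid t_1, \dots, t_{k-1};\ \Theta_x, \overrightarrow{\Theta}_{\mathrm{LSTM}}, \Theta_s\big)
  + \log p\big(t_k \mid t_{k+1}, \dots, t_N;\ \Theta_x, \overleftarrow{\Theta}_{\mathrm{LSTM}}, \Theta_s\big)
\Big)
```

The first term predicts each token from its left context and the second from its right context, which is how both directions end up in the representation.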
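Key features
- A word's vector changes with the whole sentence (not a static embedding)
- Uses representations from multiple layers: lower layers capture more syntax, higher layers capture more semantics
- Uses a weighted sum of all layers rather than only the top layer
- Uses a pretrained model: the biLM is trained on a large corpus and then attached to downstream tasks
- Easy to add to existing models: combined by concatenation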
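
The weighted sum and the concatenation step can be sketched as follows. This is a minimal pure-Python illustration, not the reference implementation: the layer vectors, the scores, `static_vec`, and the function names are all hypothetical. It assumes the paper's form of the mix, where softmax-normalized per-layer weights and a scalar `gamma` are learned per task.

```python
import math


def softmax(scores):
    """Normalize raw layer scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_layers(layer_vectors, layer_scores, gamma=1.0):
    """Weighted sum over layers for one token: gamma * sum_j s_j * h_j."""
    weights = softmax(layer_scores)
    dim = len(layer_vectors[0])
    return [
        gamma * sum(w * layer[d] for w, layer in zip(weights, layer_vectors))
        for d in range(dim)
    ]


# Hypothetical biLM outputs for a single token: token layer + 2 biLSTM layers.
layers = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
scores = [0.0, 0.0, 0.0]  # learned per task in practice; equal here for simplicity

elmo_vec = mix_layers(layers, scores)

# Concatenate with an existing (hypothetical) context-independent embedding
# before feeding the downstream model.
static_vec = [0.5, -0.5]
task_input = static_vec + elmo_vec
print(len(task_input))  # 6
```

Because the scores are trainable, each downstream task can decide how much to rely on the lower versus the higher layers.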

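3. Significance

One of the first practical word embeddings to reflect context
Improved performance across a range of NLP tasks
Up to 20% relative error reduction
Helped establish the pretrain + downstream structure -> later carried on by BERT and GPT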

4. Limitations

LSTM-based -> hard to parallelize
High computational cost (biLM plus multiple layers)
Less efficient than Transformers

-> Later superseded by BERT and other Transformer-based models
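
The parallelization problem comes from the recurrence itself. The sketch below uses a toy `tanh` update as a stand-in for an LSTM cell (it is not a real LSTM, and the weights `w_h` and `w_x` are arbitrary), only to show that each state depends on the previous one.

```python
import math


def rnn_step(h_prev, x, w_h=0.5, w_x=1.0):
    """One toy recurrent update (a stand-in for an LSTM cell, not a real one)."""
    return math.tanh(w_h * h_prev + w_x * x)


def run_recurrent(inputs, h0=0.0):
    """Compute states one step at a time; step t needs the state from step t-1."""
    states = []
    h = h0
    for x in inputs:
        h = rnn_step(h, x)
        states.append(h)
    return states


inputs = [0.1, -0.3, 0.7, 0.2]

# A biLM runs one pass left-to-right and another right-to-left.
forward_states = run_recurrent(inputs)
backward_states = run_recurrent(inputs[::-1])[::-1]
print(len(forward_states), len(backward_states))  # 4 4
```

Within a single sequence the loop over time steps has to run in order, whereas self-attention in a Transformer can process all positions of a layer at once.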

Summary: a contextual word embedding that combines information from every layer of a biLSTM-based language model,
so that a word's representation changes with its context.
