All Posts 15

[Paper Review] An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)

https://arxiv.org/abs/2010.11929 (arxiv.org) An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited. In vision, attention is either applied in conjunction with convolutional networks, or used to rep..
Abstract: The Transformer architecture ..

Uncategorized 2026.05.20

[Short Review] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)

https://arxiv.org/abs/1910.10683 (arxiv.org) Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Transfer learning, where a model is first pre-trained on a data-rich task before being fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP). The effectiveness of transfer learning has given rise to a div..
1. Background and Problem: Recently, in NLP ..

Uncategorized 2026.05.20

[Short Review] Generative Adversarial Networks (GAN)

https://arxiv.org/abs/1406.2661 (arxiv.org) Generative Adversarial Networks. We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that..
1. Background and Problem: In existing deep learning, discriminative models that classify complex data such as images, speech, and natural language well ..

Uncategorized 2026.05.13

[Paper Review] You Only Look Once: Unified, Real-Time Object Detection (YOLO)

https://arxiv.org/abs/1506.02640 (arxiv.org) You Only Look Once: Unified, Real-Time Object Detection. We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabili..
Abstract: YOLO, a new object detection ..

Uncategorized 2026.05.09

[Short Review] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

https://arxiv.org/abs/1506.02640 (arxiv.org) You Only Look Once: Unified, Real-Time Object Detection. We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabili..
1. Background and Problem: The existing Fast R-CNN made detection faster, but ..

Uncategorized 2026.05.09

[Paper Review] Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition (SPPNet)

https://arxiv.org/abs/1406.4729 (arxiv.org) Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g., 224x224) input image. This requirement is "artificial" and may reduce the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip th..
Abstract: For existing CNNs, a fixed-size ..

Uncategorized 2026.05.09

[Short Review] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

https://arxiv.org/abs/1810.04805 (arxiv.org) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. We introduce a new language representation model called BERT, which stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unla..
1. Background and Problem: Existing Word2V..

Uncategorized 2026.05.06

[Paper Review] Improving Language Understanding by Generative Pre-Training (GPT-1)

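https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
Abstract: Natural language understanding comprises a wide range of tasks, including textual entailment, question answering, semantic similarity assessment, and document classification. Large unlabeled text corpora are abundant, but labeled data for learning these tasks is scarce. As a result, models based on supervised learning struggle to reach adequate performance. We show that pre-training a language model on diverse unlabeled text and then performing supervised fine-tuning on each task yields large performance gains. Unlike previous approaches, during fine-tuning we use task-aware input ..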

Uncategorized 2026.05.05

[Paper Review] Attention Is All You Need (Transformer)

https://arxiv.org/abs/1706.03762 (arxiv.org) Attention Is All You Need. The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new..
Abstract: Sequence transduction models are complex RNN or CNN architectures that include an encoder and a decoder. In addition, ..

Uncategorized 2026.05.03

[Short Review] Deep contextualized word representations (ELMo)

https://arxiv.org/abs/1802.05365 (arxiv.org) Deep contextualized word representations. We introduce a new type of deep contextualized word representation that models both (1) complex characteristics of word use (e.g., syntax and semantics), and (2) how these uses vary across linguistic contexts (i.e., to model polysemy). Our word vectors are..
1. Background and Problem: Existing embeddings such as Word2Vec and GloVe use only a single fixed vector per word. ..

Uncategorized 2026.04.28