Definition
Neural network architecture that uses self-attention mechanisms to process sequential data.
Detailed Explanation
Transformers revolutionized NLP by introducing self-attention mechanisms that process all tokens in a sequence in parallel, capturing long-range dependencies better than recurrent architectures such as RNNs and LSTMs, which process tokens one at a time. They form the basis for models like BERT, GPT, and T5.
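The core idea can be sketched as scaled dot-product self-attention: each token's query is compared against every token's key, and the resulting weights mix the value vectors. A minimal NumPy sketch (function and variable names here are illustrative, not from any specific library):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one sequence.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_k) learned projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (seq_len, seq_len) similarity
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v                        # (seq_len, d_k) mixed values

# Toy example: 3 tokens, d_model = d_k = 4
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
w_q, w_k, w_v = (rng.normal(size=(4, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (3, 4)
```

Because every pair of positions is compared directly, distant tokens interact in a single step, which is what lets the model capture long-range dependencies that recurrent models struggle with.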
Use Cases
Language translation, text generation, question answering, document summarization