Definition
A neural network architecture that uses self-attention mechanisms to process sequential data.
Detailed Explanation
Transformers use self-attention to weigh the importance of every other position in the input sequence when computing the representation of each position. Because attention relates all positions to one another directly, rather than stepping through the sequence one token at a time as recurrent networks do, transformers process entire sequences in parallel and capture long-range dependencies effectively.
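As a rough illustration, here is a minimal NumPy sketch of scaled dot-product attention, the core operation described above. It is a simplification, not a full transformer layer: the learned linear projections that produce queries, keys, and values from the input are omitted, and the input matrix is reused directly for all three roles.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # pairwise similarity between positions
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V  # each output is a weighted sum of value vectors

# Hypothetical example: a sequence of 4 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
# In a real transformer, Q, K, and V come from learned projections of X;
# here X is reused directly to keep the sketch self-contained.
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (4, 8): one contextualized vector per token
```

Note that every output row depends on all input positions at once, which is what allows the whole sequence to be processed in parallel.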
Use Cases
Language modeling, machine translation, and text generation; the architecture underlies models such as BERT and GPT.