
WordPiece

[wɜːd piːs]
AI Infrastructure
Last updated: December 9, 2024

Definition

A subword tokenization algorithm that builds its vocabulary iteratively, at each step adding the character sequence that most increases the likelihood of the training data. This likelihood-based criterion determines which subword units enter the vocabulary.

Detailed Explanation

WordPiece starts with a base vocabulary of individual characters and iteratively adds new tokens by merging the pair of existing tokens that most increases the likelihood of the training data. It selects merges with a probability-based score rather than raw pair counts, which is what distinguishes it from BPE's frequency-based approach. The algorithm was originally developed at Google for Japanese and Korean voice search and was later adopted in Google's neural machine translation system and in BERT.
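The difference from BPE can be seen in a single merge step. The following is a minimal sketch, not a reference implementation: it assumes the scoring rule commonly given in tutorials, score(a, b) = freq(ab) / (freq(a) * freq(b)), and the "##" continuation prefix used by BERT-style vocabularies. The toy corpus and the helper names (split_word, pair_scores, merge_pair) are hypothetical and exist only for illustration.

```python
from collections import Counter


def split_word(word):
    # Base vocabulary: single characters. Non-initial characters carry
    # the "##" continuation prefix (a BERT-style convention, assumed here).
    return [word[0]] + ["##" + ch for ch in word[1:]]


def pair_scores(word_freqs, splits):
    # Count how often each token and each adjacent token pair occurs,
    # weighted by word frequency.
    token_freq = Counter()
    pair_freq = Counter()
    for word, freq in word_freqs.items():
        tokens = splits[word]
        for tok in tokens:
            token_freq[tok] += freq
        for a, b in zip(tokens, tokens[1:]):
            pair_freq[(a, b)] += freq
    # Assumed WordPiece-style score: pair frequency normalized by the
    # frequencies of its two parts.
    scores = {
        pair: f / (token_freq[pair[0]] * token_freq[pair[1]])
        for pair, f in pair_freq.items()
    }
    return pair_freq, scores


def merge_pair(a, b, splits):
    # Join the two tokens, dropping the "##" prefix of the second part.
    merged = a + (b[2:] if b.startswith("##") else b)
    new_splits = {}
    for word, tokens in splits.items():
        out, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                out.append(merged)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        new_splits[word] = out
    return merged, new_splits


# Hypothetical toy corpus: word -> count.
word_freqs = {"hug": 10, "pug": 5, "pun": 12, "bun": 4, "hugs": 5}
splits = {w: split_word(w) for w in word_freqs}

pair_freq, scores = pair_scores(word_freqs, splits)
bpe_choice = max(pair_freq, key=pair_freq.get)   # most frequent pair
wordpiece_choice = max(scores, key=scores.get)   # highest normalized score
merged, splits = merge_pair(*wordpiece_choice, splits)

print("BPE would merge:", bpe_choice)
print("WordPiece would merge:", wordpiece_choice, "->", merged)
```

On this corpus a frequency-based rule picks the most common pair, ("##u", "##g"), while the normalized score prefers ("##g", "##s"), because "##s" almost never appears without "##g" before it. Repeating the score-and-merge step until a target vocabulary size is reached would give the full training loop under the same assumptions.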

Use Cases

BERT models
Machine translation
Text preprocessing
Multilingual models

Related Terms