Definition
A subword tokenization algorithm that builds a vocabulary by starting from individual characters and iteratively merging the token pairs that most increase the likelihood of the training data. This likelihood-based scoring determines which subword units enter the vocabulary.
Detailed Explanation
WordPiece starts with a base vocabulary of individual characters and iteratively adds new tokens by merging adjacent token pairs that maximize the likelihood of the training data. Its probability-based scoring, roughly a pair's frequency divided by the product of its parts' frequencies, differs from BPE's purely frequency-based merge rule. The algorithm was developed at Google, originally for voice-search segmentation, and was later used in Google's neural machine translation system and in BERT.
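As an illustration, here is a minimal sketch of that training-time merge selection on a toy word-count corpus. The word counts, helper names, and the "##" continuation-marker convention are assumptions made for the example, not the API of any particular library.

```python
from collections import Counter

# Toy corpus as word -> count; a real trainer would derive this from text.
word_counts = Counter({"low": 5, "lower": 2, "newest": 6, "widest": 3})

# Initial split: single characters, with "##" marking non-initial pieces
# (the convention used by BERT-style WordPiece vocabularies).
def split_word(word):
    return [word[0]] + ["##" + ch for ch in word[1:]]

splits = {w: split_word(w) for w in word_counts}

def best_merge(splits, word_counts):
    token_freq, pair_freq = Counter(), Counter()
    for word, count in word_counts.items():
        pieces = splits[word]
        for piece in pieces:
            token_freq[piece] += count
        for a, b in zip(pieces, pieces[1:]):
            pair_freq[(a, b)] += count

    # WordPiece score: pair frequency divided by the product of the part
    # frequencies (a likelihood-gain criterion), unlike BPE, which simply
    # picks the most frequent pair.
    def score(pair):
        a, b = pair
        return pair_freq[pair] / (token_freq[a] * token_freq[b])

    return max(pair_freq, key=score)

def apply_merge(pair, splits):
    a, b = pair
    merged = a + (b[2:] if b.startswith("##") else b)
    for word, pieces in splits.items():
        out, i = [], 0
        while i < len(pieces):
            if i + 1 < len(pieces) and (pieces[i], pieces[i + 1]) == (a, b):
                out.append(merged)
                i += 2
            else:
                out.append(pieces[i])
                i += 1
        splits[word] = out
    return merged

# Run a few merge steps and print the vocabulary entries they add.
for _ in range(5):
    pair = best_merge(splits, word_counts)
    print("added token:", apply_merge(pair, splits))
```

Each iteration adds one merged token to the vocabulary; training stops when the vocabulary reaches its target size.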
Use Cases
BERT models
Machine translation
Text preprocessing
Multilingual models