Keynote: K-BERT - Enabling Language Representation with Knowledge Graph

Knowledge-enhanced pre-training aims to leverage structured knowledge from knowledge graphs to strengthen pre-trained language models. This lets the models learn both general semantic knowledge from free text and factual knowledge about the real-world entities behind that text, so they can better handle knowledge-driven downstream tasks. Existing knowledge-enhancement methods for pre-trained language models generally fall into three categories: augmenting input features with knowledge, improving the model architecture with knowledge, and constraining the training tasks with knowledge. These approaches inject knowledge at the input layer, the encoding layer, and the pre-training-task layer, respectively.

K-BERT belongs to the first category: knowledge from a knowledge graph is incorporated into the model's input via entity linking. K-BERT first identifies entities in the input text and then expands the sentence into a tree structure through knowledge query (K-Query) and injection (K-Inject) operations. The pre-order traversal of this sentence tree is flattened into the model's input sequence. Because the inserted triples can introduce noise and cause the sentence to drift from its original meaning, K-BERT mitigates this at the input stage with soft-position embeddings and a visible matrix that restricts which tokens can attend to each other.
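
To make the sentence-tree idea concrete, here is a minimal Python sketch of how the flattened input, soft positions, and visible matrix could be assembled, assuming a toy one-triple knowledge graph and whitespace tokenization; the function name `build_kbert_input`, the toy `KG`, and the example sentence are illustrative, not taken from the paper's reference implementation.

```python
import numpy as np

# Hypothetical one-triple knowledge graph, used only for illustration.
KG = {"Beijing": [("capital_of", "China")]}

def build_kbert_input(tokens):
    """Flatten a sentence tree into K-BERT-style model inputs.

    Returns the flattened token list (pre-order traversal), the soft-position
    indices, and the visible matrix. Injected triple tokens continue the soft
    position of the entity they hang off, while later sentence tokens keep
    their original positions; the visible matrix lets a branch attend only to
    its own entity and to the rest of its branch.
    """
    flat, soft_pos, branch_of = [], [], []
    pos = 0
    for tok in tokens:
        flat.append(tok)
        soft_pos.append(pos)
        branch_of.append(None)                 # trunk (original sentence) token
        anchor = len(flat) - 1
        for rel, tail in KG.get(tok, []):
            for depth, t in enumerate((rel, tail), start=1):
                flat.append(t)
                soft_pos.append(pos + depth)   # soft position continues from the entity
                branch_of.append(anchor)       # branch (injected triple) token
        pos += 1

    n = len(flat)
    visible = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in range(n):
            if branch_of[i] is None and branch_of[j] is None:
                visible[i, j] = 1              # sentence tokens all see each other
            elif branch_of[i] is not None and branch_of[j] is not None:
                visible[i, j] = int(branch_of[i] == branch_of[j])  # same branch only
            else:
                a = branch_of[i] if branch_of[i] is not None else branch_of[j]
                trunk = j if branch_of[i] is not None else i
                visible[i, j] = int(trunk == a)  # a branch sees only its own entity
    return flat, soft_pos, visible

tokens = "Tim Cook is visiting Beijing now".split()
flat, soft_pos, visible = build_kbert_input(tokens)
print(flat)      # ['Tim', 'Cook', 'is', 'visiting', 'Beijing', 'capital_of', 'China', 'now']
print(soft_pos)  # [0, 1, 2, 3, 4, 5, 6, 5] -- 'now' keeps its original soft position
```

In the Mask-Transformer encoder, this visible matrix becomes an additive attention mask (0 where `visible` is 1, a large negative value elsewhere) that is added to the attention scores before the softmax, so invisible tokens receive effectively zero attention weight.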

Fig.1 Cover Slide
Fig.2 K-BERT: Model Structure
Fig.3 K-BERT: Knowledge Layer
Fig.4 K-BERT: Knowledge Layer: K-Query & K-Inject
Fig.5 K-BERT: Model Structure
Fig.6 K-BERT: Embedding Layer: Segment Embedding
Fig.7 K-BERT: Embedding Layer: Token Embedding
Fig.8 K-BERT: Model Structure
Fig.9 K-BERT: Embedding Layer: Soft-Position Embedding
Fig.10 K-BERT: Seeing Layer
Fig.11 K-BERT: Seeing Layer (cont.)
Fig.12 K-BERT: Seeing Layer (cont.)
Fig.13 Model Structure
Fig.14 K-BERT: Mask-Transformer Encoder
Fig.15 K-BERT: Attention Is All You Need!
Fig.16 K-BERT: Mask-Self-Attention
Fig.17 Experiments
Fig.18 Experiments (cont.)