Code for generating and training sentence embeddings with semantic features. Two main goals:
- increase interpretability of sentence embeddings and explain similarity
- effective aspectual clustering and semantic search
For more information, background, and a demonstration, please check our AACL paper.
Please make sure to have at least the following packages installed:
Package | Version tested |
---|---|
torch | 1.11.0 |
transformers | 4.16.1 |
sentence-transformers | 2.1.0 |
numpy | 1.21.2 |
scipy | 1.7.3 |
huggingface-hub | 0.10.0 |
python | 3.8.12 |
Command for installing all needed PyPI packages:
pip install \
torch==1.11.0+cu113 \
transformers==4.16.1 \
sentence-transformers==2.1.0 \
numpy==1.21.2 \
scipy==1.7.3 \
huggingface-hub==0.10.0 \
--extra-index-url https://download.pytorch.org/whl/cu113
The Dockerfile can be built by executing docker build -t s3bert . in the project's root directory. This will build a Docker container based on Ubuntu 20.04 with CUDA version 11.4.3, including all necessary Python packages and the default training data. If you do not want the training data included in your container, comment out the last three lines of the Dockerfile by adding a # at the beginning of each line.
To work with the locally built container, run docker run -it --gpus all s3bert. Attention: this will allocate all available GPUs to the container. If you want to allocate only one device, replace "all" with, e.g., "device=0".
The script src/check_cuda.py can be used to check GPU capabilities after starting the container.
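If you want to verify GPU visibility by hand, the following is a minimal sketch of such a check (illustrative only; this is an assumption about what a GPU check looks like, not the actual contents of src/check_cuda.py):

```python
# Minimal GPU check (illustrative sketch; not the actual src/check_cuda.py).
import torch

if torch.cuda.is_available():
    print("CUDA available, version:", torch.version.cuda)
    for i in range(torch.cuda.device_count()):
        print(f"device {i}:", torch.cuda.get_device_name(i))
else:
    print("No GPU visible to PyTorch.")
```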
The basic idea is simple:
- Define/apply metrics that measure similarity with regard to aspects or topics that you're interested in.
- Assign a specific sub-embedding to each metric.
- During training, the model learns to route information into the assigned sub-embeddings so that they reflect your metrics of interest. The power of the overall embedding is preserved with a consistency control.
- At inference time, you can see how the aspects have modulated the overall text similarity decision (see the sketch below).
Note that the (possibly costly) computation of the metrics from the first step is not needed at inference time.
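To make the interpretability claim concrete: if the full embedding is the concatenation of the aspect sub-embeddings plus a residual, the dot product of two full embeddings splits exactly into per-aspect contributions. Below is a minimal sketch of this decomposition; the dimension layout and names are assumptions for illustration (the real layout is defined in config.py):

```python
import numpy as np

def aspect_contributions(emb_a, emb_b, feadim, n):
    """Split the full cosine similarity into per-aspect contributions.

    Assumes the first n * feadim dimensions hold the n aspect sub-embeddings
    and the remaining dimensions hold the residual (hypothetical layout).
    """
    norm = np.linalg.norm(emb_a) * np.linalg.norm(emb_b)
    contributions = []
    for k in range(n):
        a_k = emb_a[k * feadim:(k + 1) * feadim]
        b_k = emb_b[k * feadim:(k + 1) * feadim]
        contributions.append(float(a_k @ b_k) / norm)
    residual = float(emb_a[n * feadim:] @ emb_b[n * feadim:]) / norm
    # The aspect contributions plus the residual sum to the full cosine similarity.
    return contributions, residual
```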
Rule of thumb for size of feature dimensions: From experience with different models that use 15 similarity aspect metrics, about 1/3 of the embedding may be reserved for the residual.
- edim: size of the sentence embedding
- n: number of custom metrics
- feadim: size of a sentence feature (sentence sub-embedding)
Then feadim can be set approximately to (edim - edim / 3) / n.
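A worked example with hypothetical numbers (a 768-dimensional embedding and 15 metrics):

```python
# Hypothetical numbers: a 768-dim sentence embedding and 15 aspect metrics.
edim, n = 768, 15
residual_dim = edim // 3             # ~1/3 reserved for the residual -> 256
feadim = (edim - residual_dim) // n  # (768 - 256) / 15 -> 34
print(feadim)                        # 34 dimensions per sub-embedding
```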
In our paper, we define metrics between abstract meaning representations (AMRs) such that we can measure, e.g., coreference or quantification similarity of sentences and see how these sub-similarities modulate the overall similarity.
The data contains the sentences and AMRs with AMR metric scores (note: we only need the metric scores and sentences; the AMR graphs are attached only for potential further experimentation).
Download and extract data:
wget https://cl.uni-heidelberg.de/~opitz/data/amr_data_set.tar.gz
tar -xvzf amr_data_set.tar.gz
To see what the format of the training data should look like, run:
cd src/
python data_helpers.py
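Conceptually, each training example pairs two sentences with their precomputed metric scores. The snippet below is a hypothetical illustration only (the field names are made up); the authoritative format is what data_helpers.py prints:

```python
# Hypothetical illustration only -- run data_helpers.py to see the real format.
example = {
    "sentence_a": "A man is playing a guitar.",
    "sentence_b": "Someone plays an instrument.",
    # one precomputed score per aspect metric (here: AMR metric scores)
    "scores": {"concepts": 0.71, "frames": 0.66, "negation": 1.0},  # ...
}
```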
Simply run
cd src/
python s3bert_train.py
Some settings can be adjusted in config.py. For other settings, the source code must be consulted.
We have prepared an example script:
cd src/
python s3bert_infer.py
Check out its content for info on how to obtain and use the embeddings.
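As a rough orientation only (the script itself is the authoritative reference), obtaining embeddings and per-aspect similarities conceptually looks like the sketch below; the loading route via sentence-transformers, the model path, and the slicing constants are assumptions, not the actual code of src/s3bert_infer.py:

```python
# Illustrative sketch only -- see src/s3bert_infer.py for the actual usage.
import numpy as np
from sentence_transformers import SentenceTransformer  # assumed loading route

model = SentenceTransformer("src/s3bert_all-MiniLM-L12-v2")  # hypothetical path
emb = model.encode(["A man plays a guitar.", "Someone plays an instrument."])

def cosine(x, y):
    return float(x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Full-embedding similarity (what a standard SBERT model would report).
print("overall:", cosine(emb[0], emb[1]))

# Per-aspect similarities from the dedicated sub-embeddings; FEADIM and the
# aspect ordering must match the config.py shipped with the model.
FEADIM = 16  # hypothetical value
for k in range(15):
    a = emb[0][k * FEADIM:(k + 1) * FEADIM]
    b = emb[1][k * FEADIM:(k + 1) * FEADIM]
    print(f"aspect {k}:", cosine(a, b))
```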
We provide pre-trained models here:
Model name | model link | s3bert config |
---|---|---|
s3bert_all-mpnet-base-v2 | model | config |
s3bert_all-MiniLM-L12-v2 | model | config |
Downloaded S3BERT models can be unpacked into src/:
tar -xvzf s3bert_all-MiniLM-L12-v2 -C src/
To use a pre-trained model, see above (S3BERT embeddings: inference). Use the model-specific config.py (see table above); it is needed so that we know which feature dimensions are assigned to which metric.
All numbers are Spearman's r.
Model | STSB | SICKR | UKPASPECT | Concepts | Frames | Named Ent. | Negations | Coreference | SRL | Smatch | Unlabeled | max_indegree_sim | max_outdegree_sim | max_degree_sim | root_sim | quant_sim | score_wlk | score_wwlk |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
s3bert_all-mpnet-base-v2 | 83.5 | 81.1 | 57.9 | 79.8 | 73.0 | 54.5 | 34.9 | 54.9 | 69.8 | 74.7 | 72.0 | 36.2 | 49.6 | 35.3 | 52.3 | 75.3 | 80.8 | 80.3 |
all-mpnet-base-v2 | 83.4 | 80.5 | 56.2 | 74.3 | 41.5 | -12.7 | -0.3 | 9.0 | 42.8 | 57.6 | 52.1 | 23.6 | 21.1 | 17.7 | 22.9 | 10.8 | 68.3 | 66.6 |
s3bert_all-MiniLM-L12-v2 | 83.7 | 78.9 | 56.6 | 74.3 | 66.3 | 51.0 | 33.4 | 44.1 | 61.4 | 67.5 | 65.1 | 31.9 | 42.4 | 29.5 | 43.6 | 73.6 | 74.6 | 74.2 |
all-MiniLM-L12-v2 | 83.1 | 78.9 | 54.2 | 76.7 | 37.3 | -12.8 | -3.8 | 7.7 | 42.1 | 56.3 | 51.5 | 23.8 | 19.0 | 19.0 | 20.1 | 9.4 | 66.3 | 63.5 |
For both SBERT and S3BERT, the similarity for every pair is calculated on the full embeddings (cosine similarity).
- STSB: results on the human sentence similarity benchmark STS Benchmark
- SICKR: results on the human relatedness benchmark SICK
- UKPASPECT: results on the human argument similarity benchmark UKP ASPECT
For non-S3BERT models, the aspect similarity is calculated via the full embedding (i.e., they give the same similarity for every aspect). For S3BERT models, the aspect similarities are calculated from the dedicated sub-embeddings.
- Concepts: Similarity w.r.t. the concepts in the sentences
- Frames: Similarity w.r.t. the predicates in the sentences
- Named Ent.: Similarity w.r.t. the named entities in the sentences
- Negations: Similarity w.r.t. the negation structure of the sentences
- Coreference: Similarity w.r.t. the coreference structure of the sentences
- SRL: Similarity w.r.t. the semantic role structure of the sentences
- Smatch: Similarity w.r.t. the overall similarity of the sentences' semantic meaning structures
- Unlabeled: Similarity w.r.t. the overall similarity of the sentences' semantic meaning structures, ignoring relation labels
- max_indegree_sim / max_outdegree_sim / max_degree_sim / root_sim: Similarity w.r.t. the most connected nodes in the meaning space ("Focus")
- quant_sim: Similarity w.r.t. the quantificational structure of the sentences (three vs. four, a vs. all, etc.)
- score_wlk: See Smatch, but measured with a contextual Weisfeiler-Leman kernel instead of Smatch
- score_wwlk: See Smatch, but measured with a Wasserstein Weisfeiler-Leman kernel instead of Smatch
If you find the work interesting, consider citing:
@article{opitz2022sbert,
title={SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features},
author={Opitz, Juri and Frank, Anette},
journal={arXiv preprint arXiv:2206.07023},
year={2022}
}