This repository was archived by the owner on Apr 23, 2025. It is now read-only.

[WIP] Added support for BERT. #231

Merged 29 commits on Feb 14, 2020

Conversation

@eaplatanios (Contributor) commented Nov 26, 2019

This PR adds support for BERT. It is not ready to merge yet.

I believe that the following features should be moved to swift-apis if people agree:

  • A new Embedding layer that supports using matrix multiplications instead of gather ops for faster execution on TPUs (this is currently commented out due to an AutoDiff bug). It can replace the existing layer in swift-apis and should be backwards compatible.
  • A MultiHeadAttention layer that can be used for other models too (other than BERT or Transformers).
  • A Regularizable protocol that I don't really like, but it's a temporary solution for supporting weight decay in BERT. We probably shouldn't move this to swift-apis until we have thought out a nicer solution.
  • A new Optimizer protocol that supports learning rate schedules and a WeightDecayedAdam implementation.
  • A ScheduledParameter that corresponds to swift-apis#431 ("Added initial support for learning rate schedules").
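The learning-rate-schedule idea behind `ScheduledParameter` can be sketched in plain Swift. This is an illustrative stand-in, not the actual swift-apis#431 API; the type and member names here are hypothetical.

```swift
// Hypothetical sketch of a learning rate schedule: linear warmup to a peak
// rate, then linear decay to zero. Names are illustrative only.
struct LinearWarmupSchedule {
    let baseRate: Float   // peak learning rate reached at the end of warmup
    let warmupSteps: Int  // steps over which the rate ramps up linearly
    let totalSteps: Int   // step at which the rate has decayed to zero

    // Returns the learning rate to use at a given training step.
    func callAsFunction(step: Int) -> Float {
        if step < warmupSteps {
            return baseRate * Float(step + 1) / Float(warmupSteps)
        }
        let remaining = Float(totalSteps - step) / Float(totalSteps - warmupSteps)
        return baseRate * max(0, remaining)
    }
}

let schedule = LinearWarmupSchedule(baseRate: 0.01, warmupSteps: 10, totalSteps: 100)
print(schedule(step: 4))   // mid-warmup: 0.01 * 5/10
print(schedule(step: 9))   // end of warmup: the peak rate
print(schedule(step: 100)) // fully decayed: 0.0
```

An optimizer conforming to the proposed protocol would query such a schedule once per step instead of holding a fixed scalar learning rate.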

The following features are also added:

  • Multiple text tokenization approaches, including a byte-pair encoding (BPE) tokenizer as well as a WordPiece tokenizer.
  • A Transformer implementation.
  • A BERT implementation that supports multiple variants (i.e., BERT, RoBERTa, and ALBERT).
  • Support for automatically downloading and loading pre-trained models for all these variants.
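For a flavor of what the WordPiece tokenizer does, here is a minimal greedy longest-match-first sketch. It is a toy version for a single lowercase word, not the PR's actual implementation (which also handles word-length caps, casing, and punctuation); the `##` continuation convention follows the original BERT tokenizer.

```swift
// Greedy longest-match-first WordPiece tokenization (toy version).
// Pieces that continue a word are prefixed with "##", as in BERT.
func wordPiece(_ word: String, vocab: Set<String>, unknown: String = "[UNK]") -> [String] {
    var pieces: [String] = []
    var start = word.startIndex
    while start < word.endIndex {
        var end = word.endIndex
        var match: String? = nil
        // Try the longest remaining substring first, shrinking until a
        // vocabulary entry matches.
        while start < end {
            var candidate = String(word[start..<end])
            if start != word.startIndex { candidate = "##" + candidate }
            if vocab.contains(candidate) { match = candidate; break }
            end = word.index(before: end)
        }
        guard let piece = match else { return [unknown] }
        pieces.append(piece)
        start = end
    }
    return pieces
}

let vocab: Set<String> = ["un", "##aff", "##able", "aff"]
print(wordPiece("unaffable", vocab: vocab)) // ["un", "##aff", "##able"]
```

The byte-pair-encoding tokenizer used for the RoBERTa variant follows a different merge-based algorithm, but the greedy subword decomposition above is the core of the WordPiece path.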

Note that for RoBERTa I had to convert the published PyTorch checkpoints to TF checkpoints compatible with this model. I have uploaded these to my Dropbox account, but they take up too much space, so I would really appreciate it if we could move them to Google Cloud Storage.

This is not ready to merge yet; rather, it is aimed at getting feedback to improve the API and making sure the code style is compatible with this repository. A big open question is how we want to test this. I know it's working fine, since I'm using it for my own research projects, but I don't know what kinds of tests would be appropriate.

Sorry if I missed something in this list of changes. I'll keep updating it as we refine the PR.

NOTE: The new layers are not generic over the Scalar type due to a compiler bug (TF-427).

NOTE: Compilation is currently failing because the synthesized TangentVector initializers are internal and I cannot declare conformances to Regularizable without using those initializers.
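The `Regularizable` shape being discussed can be illustrated in plain Swift. Everything below is a toy stand-in (not the real swift-apis `Dense` or its synthesized `TangentVector`); a hand-written public tangent type sidesteps the internal-initializer problem described above.

```swift
// Illustrative sketch: a layer exposes the portion of its tangent vector that
// weight decay should apply to (typically weights, but not biases).
protocol Regularizable {
    associatedtype TangentVector
    var regularizationValue: TangentVector { get }
}

struct ToyDense {
    var weight: [Float]
    var bias: [Float]

    // A hand-written tangent type with an accessible memberwise initializer,
    // unlike the synthesized `TangentVector` whose initializer is internal.
    struct TangentVector {
        var weight: [Float]
        var bias: [Float]
    }
}

extension ToyDense: Regularizable {
    var regularizationValue: TangentVector {
        // Decay the weights, but leave the bias untouched (zeros).
        TangentVector(weight: weight, bias: Array(repeating: 0, count: bias.count))
    }
}

let layer = ToyDense(weight: [0.5, -0.25], bias: [0.1])
print(layer.regularizationValue.weight)
print(layer.regularizationValue.bias)
```

A weight-decayed optimizer can then subtract `learningRate * decay * regularizationValue` from the parameters without decaying the biases.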

cc @saeta @dan-zheng @BradLarson

@eaplatanios eaplatanios changed the title Added support for BERT. [WIP] Added support for BERT. Nov 26, 2019
@Shashi456 (Contributor)

This is absolutely huge @eaplatanios. I've been working on BERT for the past few months and can't wait to check this code out :).

@Shashi456 (Contributor)

@eaplatanios What do you think about restructuring the encoding techniques under a preprocessing directory, either in this repo or in swift-apis? These techniques are used everywhere.
I'm also suggesting a preprocessing directory because, down the line, if we add image preprocessing techniques (including augmentation), this structure would benefit those as well.
cc: @BradLarson @saeta

@Shashi456 (Contributor)

I think we could take some inspiration on how to test these models from pytorch-transformers. As @saeta mentioned in the design meeting, they just load the model and check that the tensor shapes are right. You can take a look at those tests here.

@BradLarson (Contributor)

I'd like to see what can be done to help drive this to completion. I forget where we left things after the community meeting, but is the biggest blocker right now the compilation errors around TangentVector and the activation functions in Attention.swift?

If there are areas you'd like me to look into, I'd be glad to do so.

@eaplatanios (Contributor, Author)

Thanks @BradLarson! The only remaining blocker is the TangentVector initializer. The attention-related issues have to do with supporting generic BERT layers (currently they're Float-only).

@eaplatanios (Contributor, Author)

Please also let me know what you guys think should be pushed to swift-apis and what kind of restructuring and testing would be useful. :)


extension Dense: Regularizable {
    public var regularizationValue: TangentVector {
        // TODO: This initializer is currently internal.
@dan-zheng (Member) commented:

I filed TF-1077 to track the non-public TangentVector memberwise initializer issue.

@eaplatanios (Contributor, Author) replied:

That's great! Thanks a lot, Dan! Also, what do you think of this solution for regularization? I don't really like it, but I couldn't think of another easy way to support it. :/

/// The URL where this pre-trained model can be downloaded from.
public var url: URL {
    let bertPrefix = "https://storage.googleapis.com/bert_models/2018_"
    let robertaPrefix = "https://www.dropbox.com/s"
@BradLarson (Contributor) commented:

So that you don't have to store these large weights in Dropbox, I've re-uploaded them to a hosted GCS bucket we've created for weights / datasets:

https://storage.googleapis.com/s4tf-hosted-binaries/checkpoints/Text/RoBERTa/base.zip
https://storage.googleapis.com/s4tf-hosted-binaries/checkpoints/Text/RoBERTa/large.zip

If there are others you'd like me to place there, just let me know and I'll add them.

@Shashi456 (Contributor) commented:

@BradLarson would this bucket also be useful for adding support for pre-trained models and their weights?

@BradLarson (Contributor) commented Jan 6, 2020:

@Shashi456 - Yes, that's one of the goals, in addition to being a reliable backup for sometimes-flaky dataset download locations. I'm working on a quick addition to the checkpoint loaders to make it easy to download from here, so that we simplify the process of working with models that need pretrained checkpoints (the existing Transformer and MiniGo models, as well as BERT and family) and start CI testing inference accuracy using real pretrained models.

@BradLarson (Contributor)

Now that the TangentVector visibility issues have been resolved, is there anything else blocking the core model? If so, I'd love to see what we could do to resolve that.

Beyond the core model, do you have an example of BERT in action or a unit test of it that we can use to verify correct operation? If those are too tied up in any internal infrastructure you have, we could potentially pull this in and I could add a demo and / or tests as a follow-on, but if you have simple demo code for this that would really ease the process of pulling this in.

We can merge together the existing Transformer demo and the new Transformer model and utility functions you have here to migrate our text generation demo over to this new structure, but I didn't know whether you had a similar demo available for BERT.

eaplatanios and others added 11 commits January 28, 2020 14:05
Change `@differentiable` function default arguments from closures to function references.

Related issues:
- https://bugs.swift.org/browse/TF-690
- https://bugs.swift.org/browse/TF-1030
Fix non-differentiability error:
```
swift-models/Models/Text/BERT.swift:292:6: error: function is not differentiable
    @differentiable(wrt: self)
    ~^~~~~~~~~~~~~~~~~~~~~~~~~
swift-models/Models/Text/BERT.swift:293:17: note: when differentiating this function definition
    public func callAsFunction(_ input: TextBatch) -> Tensor<Scalar> {
                ^
swift-models/Models/Text/BERT.swift:299:58: note: cannot differentiate through 'inout' arguments
        let positionPaddingIndex = withoutDerivative(at: { () -> Int in
                                                         ^
```

By using `withoutDerivative(at:)` at the correct location.
Add code and data utilities for the CoLA task.

Code shared by eaplatanios@.
Original sources are listed in comments at the top of each file.

This is progress towards end-to-end BERT training.
Todo: implement a main function with data loading and training loop.
The BERT for CoLA training loop compiles:
https://i.imgur.com/5KyewAg.png

Todo:
- Fine-tune training so that loss decreases.
- Generalize dataset utilities to work with CoLA remote URL.
dan-zheng and others added 12 commits January 29, 2020 21:08
Loss still does not steadily decrease:
```
[Epoch: 0]	Loss: 0.50369537
[Epoch: 1]	Loss: 0.7813513
[Epoch: 2]	Loss: 1.0023696
[Epoch: 3]	Loss: 0.8235911
[Epoch: 4]	Loss: 0.621686
[Epoch: 5]	Loss: 0.93954027
[Epoch: 6]	Loss: 0.76672614
[Epoch: 7]	Loss: 0.45236698
[Epoch: 8]	Loss: 0.6538984
[Epoch: 9]	Loss: 0.7307098
[Epoch: 10]	Loss: 0.90539706
[Epoch: 11]	Loss: 0.6684798
[Epoch: 12]	Loss: 0.5408703
[Epoch: 13]	Loss: 1.113673
```
The training loop operates over minibatches, not full passes over the dataset.
Thus, "step" is the correct term, not "epoch".
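The step/epoch distinction in arithmetic form: one step processes one minibatch, and one epoch is a full pass over the dataset. The sizes below are illustrative, not the actual CoLA configuration.

```swift
// Number of optimizer steps in one epoch, rounding up so that a final
// partial batch still counts as a step.
func stepsPerEpoch(datasetSize: Int, batchSize: Int) -> Int {
    (datasetSize + batchSize - 1) / batchSize
}

print(stepsPerEpoch(datasetSize: 1000, batchSize: 32)) // 32 steps per epoch
```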
Evaluation reveals the model is not actually learning:
```
True positives: 0
True negatives: 322
False positives: 0
False negatives: 322
▿ 1 key/value pair
  ▿ (2 elements)
    - key: "matthewsCorrelationCoefficient"
    - value: 0.0
```

We ought to debug the loss function and BERT classifier class count.
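The metric printed above can be computed directly from the confusion-matrix counts. This helper is a hypothetical sketch, not the repository's actual metric code; note that when a factor of the denominator is zero (here, no positive predictions at all: TP = 0 and FP = 0), MCC is conventionally defined as 0, which matches the degenerate result shown.

```swift
// Matthews correlation coefficient from confusion-matrix counts.
// Ranges from -1 (total disagreement) to +1 (perfect prediction);
// 0 means no better than chance.
func matthewsCorrelationCoefficient(tp: Int, tn: Int, fp: Int, fn: Int) -> Double {
    let numerator = Double(tp * tn - fp * fn)
    let denominator = (Double(tp + fp) * Double(tp + fn)
                     * Double(tn + fp) * Double(tn + fn)).squareRoot()
    // By convention, MCC is 0 when any marginal count is zero.
    return denominator == 0 ? 0 : numerator / denominator
}

print(matthewsCorrelationCoefficient(tp: 0, tn: 322, fp: 0, fn: 322)) // 0.0
print(matthewsCorrelationCoefficient(tp: 50, tn: 50, fp: 0, fn: 0))   // 1.0
```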
Change class count to 2 and use softmax cross entropy.

The evaluation metric now improves but sometimes decreases back to zero.
The model isn't very stable; perhaps there's more room for improvement.

After 80 steps:
```
True positives: 567
True negatives: 170
False positives: 152
False negatives: 170
▿ 1 key/value pair
  ▿ (2 elements)
    - key: "matthewsCorrelationCoefficient"
    - value: 0.3192948
```

After 130 steps:
```
True positives: 717
True negatives: 0
False positives: 322
False negatives: 0
▿ 1 key/value pair
  ▿ (2 elements)
    - key: "matthewsCorrelationCoefficient"
    - value: 0.0
```
Improvement todo: make training loop print epochs.
Fix various issues:

1.  The sigmoid cross entropy loss was applied on logits of shape `[B, 1]` and
    labels of shape `[B]`. This forced a silent broadcast of logits to shape
    `[B, B]`, which resulted in the loss not being informative for training.

2.  The batch size was too small. I added a comment in the main script code
    explaining how batching works in my data pipelines.

3.  This is a minor one, but there was a bug with how I was copying the
    prefetching iterator. As a temporary solution, I disabled prefetching for
    the dev and test sets so that the dev set is the same across runs.

This is not currently tuned but it's working. After ~20 steps MCC should be at
about 0.28 and after ~200 steps it should be getting close to 0.50.
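The silent-broadcast bug in point 1 above, reduced to shape arithmetic: NumPy/TensorFlow-style broadcasting aligns shapes from the right and stretches size-1 dimensions, so logits of shape `[B, 1]` against labels of shape `[B]` quietly become `[B, B]` instead of failing loudly. A minimal sketch of the rule (illustrative, not the library's actual shape machinery):

```swift
// Computes the broadcast shape of two tensor shapes, or nil if they are
// incompatible. Dimensions are compared right-to-left; a missing or size-1
// dimension is stretched to match the other.
func broadcastShape(_ a: [Int], _ b: [Int]) -> [Int]? {
    let rank = max(a.count, b.count)
    var result: [Int] = []
    for i in 0..<rank {
        let da = i < a.count ? a[a.count - 1 - i] : 1
        let db = i < b.count ? b[b.count - 1 - i] : 1
        guard da == db || da == 1 || db == 1 else { return nil }
        result.append(Swift.max(da, db))
    }
    return Array(result.reversed())
}

let batch = 4
print(broadcastShape([batch, 1], [batch]) ?? []) // [4, 4]: the silent blowup
print(broadcastShape([batch], [batch]) ?? [])    // [4]: the intended shape
```

Squeezing the logits to shape `[B]` (or expanding the labels to `[B, 1]`) before applying the loss avoids the mismatch.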
- Remove extraneous comment.
- Remove trailing whitespace.
- Change `dump` to `print`.
@dan-zheng (Member)

I merged the tensorflow:bert-wip branch into eaplatanios:bert. This includes the following changes:

  • Some merge conflicts were fixed.
  • Some differentiation-related compilation errors were fixed.
  • A training loop for BERT was added in Models/Text/main.swift.

There are a bunch of todos:

  • Rewrite utilities for downloading/extracting data in this PR using unified ModelSupport APIs.
  • Improve code organization.
    • Currently, most BERT code lives at the top-level in Models/Text. Models/Text could be better organized, like Models/ImageClassification.
  • Verify that BERT training converges, and that results match a reference implementation.
  • Verify that other BERT variants work.
    • BERT variants like RoBERTa and ALBERT were added in this PR but are untested.

swift run TextModels works with the latest Swift for TensorFlow toolchains:

$ swift run TextModels
...
[Step: 41]	Loss: 0.3939771
[Step: 42]	Loss: 0.7205323
[Step: 43]	Loss: 0.4848549
[Step: 44]	Loss: 0.2727359
[Step: 45]	Loss: 0.3460037
[Step: 46]	Loss: 0.9574833
[Step: 47]	Loss: 0.7397041
[Step: 48]	Loss: 0.35861257
[Step: 49]	Loss: 0.4990799
[Step: 50]	Loss: 0.92075384
Evaluate BERT for the CoLA task:
Total predictions: 1043
True positives: 719
True negatives: 34
False positives: 288
False negatives: 34
["matthewsCorrelationCoefficient": 0.26019016]

If there are no objections, now seems like a good point to merge this PR and continue incremental improvements. Thanks @eaplatanios for driving this work!

@BradLarson (Contributor) left a comment:

In order to enable further work on BERT and other text models, I'm going to merge this in and we'll continue our work on this with the model inside the repository. I've created a tracking issue for the remaining to-dos that Dan has identified above, and we'll work to knock those down in the near term.

Regarding the failing CI test, we're trying out some CMake-related build options and those are currently failing. I've built this locally, so I'm going to bring this in even without a green Kokoro build.

Once again, this is fantastic work and everyone really appreciates the time and effort you put into building this, as well as your guidance on how to use and improve this. This is going to be tremendously useful to us and to the broader community.

@BradLarson BradLarson merged commit dd3e547 into tensorflow:master Feb 14, 2020
@eaplatanios (Contributor, Author)

Thanks @dan-zheng and @BradLarson for getting this in! I'm sorry I haven't had much time lately to address some of the mentioned todos.

Shashi456 pushed a commit to Shashi456/swift-models that referenced this pull request Feb 20, 2020
* Added initial support for BERT.

* Renamed 'LayerNormalization' to 'LayerNorm'.

* Added a 'TextModels' SwiftPM target.

* Fixed some of the compilation errors.

* Added 'Optimizer' protocol.

* Removed 'truncatedNormalInitializer'.

* Minor cleanup.

* Change `@differentiable` function default arguments from closures to functions.

Change `@differentiable` function default arguments from closures to function
references.

Related issues:
- https://bugs.swift.org/browse/TF-690
- https://bugs.swift.org/browse/TF-1030

* Fix non-differentiability error using `withoutDerivative(at:)`.

Fix non-differentiability error:
```
swift-models/Models/Text/BERT.swift:292:6: error: function is not differentiable
    @differentiable(wrt: self)
    ~^~~~~~~~~~~~~~~~~~~~~~~~~
swift-models/Models/Text/BERT.swift:293:17: note: when differentiating this function definition
    public func callAsFunction(_ input: TextBatch) -> Tensor<Scalar> {
                ^
swift-models/Models/Text/BERT.swift:299:58: note: cannot differentiate through 'inout' arguments
        let positionPaddingIndex = withoutDerivative(at: { () -> Int in
                                                         ^
```

By using `withoutDerivative(at:)` at the correct location.

* Add code for CoLA task.

Add code and data utilities for the CoLA task.

Code shared by eaplatanios@.
Original sources are listed in comments at the top of each file.

This is progress towards end-to-end BERT training.
Todo: implement a main function with data loading and training loop.

* Add working main function.

The BERT for CoLA training loop compiles:
https://i.imgur.com/5KyewAg.png

Todo:
- Fine-tune training so that loss decreases.
- Generalize dataset utilities to work with CoLA remote URL.

* Tune learning rate schedule, add gradient clipping.

Loss still does not steadily decrease:
```
[Epoch: 0]	Loss: 0.50369537
[Epoch: 1]	Loss: 0.7813513
[Epoch: 2]	Loss: 1.0023696
[Epoch: 3]	Loss: 0.8235911
[Epoch: 4]	Loss: 0.621686
[Epoch: 5]	Loss: 0.93954027
[Epoch: 6]	Loss: 0.76672614
[Epoch: 7]	Loss: 0.45236698
[Epoch: 8]	Loss: 0.6538984
[Epoch: 9]	Loss: 0.7307098
[Epoch: 10]	Loss: 0.90539706
[Epoch: 11]	Loss: 0.6684798
[Epoch: 12]	Loss: 0.5408703
[Epoch: 13]	Loss: 1.113673
```

* Made some minor edits to get the BERT classifier training to work for CoLA. (tensorflow#293)

* Rename "epoch" to "step" in training loop.

The training loop operates over minibatches, not full passes over the dataset.
Thus, "step" is the correct term, not "epoch".

* Add CoLA evaluation.

Evaluation reveals the model is not actually learning:
```
True positives: 0
True negatives: 322
False positives: 0
False negatives: 322
▿ 1 key/value pair
  ▿ (2 elements)
    - key: "matthewsCorrelationCoefficient"
    - value: 0.0
```

We ought to debug the loss function and BERT classifier class count.

* Fix BERT training.

Change class count to 2 and use softmax cross entropy.

The evaluation metric now improves but sometimes decreases back to zero.
The model isn't very stable; perhaps there's more room for improvement.

After 80 steps:
```
True positives: 567
True negatives: 170
False positives: 152
False negatives: 170
▿ 1 key/value pair
  ▿ (2 elements)
    - key: "matthewsCorrelationCoefficient"
    - value: 0.3192948
```

After 130 steps:
```
True positives: 717
True negatives: 0
False positives: 322
False negatives: 0
▿ 1 key/value pair
  ▿ (2 elements)
    - key: "matthewsCorrelationCoefficient"
    - value: 0.0
```

* Make training loop an infinite loop.

Improvement todo: make training loop print epochs.

* Fixed BERT. (tensorflow#294)

Fix various issues:

1.  The sigmoid cross entropy loss was applied on logits of shape `[B, 1]` and
    labels of shape `[B]`. This forced a silent broadcast of logits to shape
    `[B, B]`, which resulted in the loss not being informative for training.

2.  The batch size was too small. I added a comment in the main script code
    explaining how batching works in my data pipelines.

3.  This is a minor one, but there was a bug with how I was copying the
    prefetching iterator. As a temporary solution, I disabled prefetching for
    the dev and test sets so that the dev set is the same across runs.

This is not currently tuned but it's working. After ~20 steps MCC should be at
about 0.28 and after ~200 steps it should be getting close to 0.50.

* Minor edits.

- Remove extraneous comment.
- Remove trailing whitespace.
- Change `dump` to `print`.

* Temporarily disabled bucketing.

* Delete extraneous file.

Co-authored-by: Dan Zheng <[email protected]>
Shashi456 pushed a commit to Shashi456/swift-models that referenced this pull request Feb 20, 2020