* refactor inconsistent/improper namings of write model strategies
* add UpdateOneTimestamps strategy with basic test case
* add documentation and fix variable naming
this solves #44
Note the use of the **"." character** as navigational operator in both examples. It's used to refer to nested fields in subdocuments of the record structure. The prefix at the very beginning is a simple convention to distinguish between the _key_ and _value_ structure of a document.
### Custom Write Models
The default behaviour of the connector whenever documents are written to MongoDB collections is to use a [ReplaceOneModel](http://mongodb.github.io/mongo-java-driver/3.6/javadoc/com/mongodb/client/model/ReplaceOneModel.html) with [upsert mode](http://mongodb.github.io/mongo-java-driver/3.6/javadoc/com/mongodb/client/model/UpdateOptions.html) and to **create the filter document based on the _id field** which results from applying the configured DocumentIdAdder to the value structure of the sink document.
However, there are other use cases which need different approaches, and the **customization option for generating custom write models** supports these. The configuration entry (_mongodb.writemodel.strategy_) allows for such customizations. Currently, the following strategies are implemented:
* **business key** (-> see [use case 1](https://github.com/hpgrahsl/kafka-connect-mongodb#use-case-1-employing-business-keys)): at.grahsl.kafka.connect.mongodb.writemodel.strategy.**ReplaceOneBusinessKeyStrategy**
* **delete on null values**: at.grahsl.kafka.connect.mongodb.writemodel.strategy.**DeleteOneDefaultStrategy**, implicitly used when the config option _mongodb.delete.on.null.values=true_ is set for [convention-based deletion](https://github.com/hpgrahsl/kafka-connect-mongodb#convention-based-deletion-on-null-values)
* **add inserted/modified timestamps** (-> see [use case 2](https://github.com/hpgrahsl/kafka-connect-mongodb#use-case-2-add-inserted-and-modified-timestamps)): at.grahsl.kafka.connect.mongodb.writemodel.strategy.**UpdateOneTimestampsStrategy**
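Selecting one of these strategies is done by setting its fully-qualified class name in the sink connector config, e.g. to opt for the business key behaviour:

```properties
mongodb.writemodel.strategy=at.grahsl.kafka.connect.mongodb.writemodel.strategy.ReplaceOneBusinessKeyStrategy
```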
_NOTE:_ Future versions will allow arbitrary, custom strategies to be registered and then selected via the _mongodb.writemodel.strategy_ configuration setting.
##### Use Case 1: Employing Business Keys
Let's say you want to re-use a unique business key found in your sink records while at the same time having _BSON ObjectIds_ created for the resulting MongoDB documents.
To achieve this a few simple configuration steps are necessary:
1) make sure to **create a unique key constraint** for the business key of your target MongoDB collection
2) use the **PartialValueStrategy** as the DocumentIdAdder's strategy in order to let the connector know which fields belong to the business key
3) use the **ReplaceOneBusinessKeyStrategy** instead of the default behaviour
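Step 1 can be performed in the mongo shell, for instance as follows (the collection and field names are illustrative and must match your actual target collection and business key):

```
db.targetCollection.createIndex({ "fieldA": 1, "fieldB": 1 }, { unique: true })
```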
These configuration settings make it possible to have **filter documents based on the original business key while still having _BSON ObjectIds_ created for the _id field** during the first upsert into your target MongoDB collection. Find below how such a setup might look:
Given the following fictional Kafka record
together with the sink connector config below
will eventually result in a MongoDB document looking like:
```json
{
  ...
}
```
All upsert operations are done based on the unique business key, which for this example is a compound one consisting of the two fields _(fieldA, fieldB)_.
##### Use Case 2: Add Inserted and Modified Timestamps
Let's say you want to attach timestamps to the resulting MongoDB documents such that you can store the point in time of the document insertion and at the same time maintain a second timestamp reflecting when a document was modified.
All that needs to be done is to use the **UpdateOneTimestampsStrategy** instead of the default behaviour. The resulting custom write model takes care of attaching two timestamps to MongoDB documents:
1) **_insertedTS**: will only be set once, in case the upsert operation results in a new MongoDB document being inserted into the corresponding collection
2) **_modifiedTS**: will be set each time the upsert operation results in an existing MongoDB document being updated in the corresponding collection
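Conceptually, the upsert issued by this strategy can be thought of as an update document of roughly the following shape (a sketch for illustration, not the connector's literal implementation). The _$setOnInsert_ part only takes effect when a new document is inserted, while the _$set_ part takes effect on every upsert:

```json
{
  "$setOnInsert": { "_insertedTS": "<current timestamp>" },
  "$set": { "_modifiedTS": "<current timestamp>", "fieldA": "...", "fieldB": "..." }
}
```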
Given the following fictional Kafka record
```json
{
  "_id": "ABCD-1234",
  "fieldA": "Anonymous",
  "fieldB": 42,
  "active": true,
  "values": [12.34, 23.45, 34.56, 45.67]
}
```
together with the sink connector config below
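The setting in the sink connector config that activates this behaviour is the write model strategy entry (other settings are unaffected by this use case):

```properties
mongodb.writemodel.strategy=at.grahsl.kafka.connect.mongodb.writemodel.strategy.UpdateOneTimestampsStrategy
```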
will result in a new MongoDB document looking like:
```json
{
  "_id": "ABCD-1234",
  "_insertedTS": ISODate("2018-07-22T09:19:00.000Z"),
  "_modifiedTS": ISODate("2018-07-22T09:19:00.000Z"),
  "fieldA": "Anonymous",
  "fieldB": 42,
  "active": true,
  "values": [12.34, 23.45, 34.56, 45.67]
}
```
If at some later point in time there is a Kafka record referring to the same _id but containing updated data
```json
{
  "_id": "ABCD-1234",
  "fieldA": "anonymous",
  "fieldB": -23,
  "active": false,
  "values": [12.34, 23.45]
}
```
then the existing MongoDB document will be updated and receive a fresh timestamp for the **_modifiedTS** value:
```json
{
  "_id": "ABCD-1234",
  "_insertedTS": ISODate("2018-07-22T09:19:00.000Z"),
  "_modifiedTS": ISODate("2018-07-31T19:09:00.000Z"),
  "fieldA": "anonymous",
  "fieldB": -23,
  "active": false,
  "values": [12.34, 23.45]
}
```
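The timestamp semantics can also be illustrated with a small self-contained sketch (plain Python, no MongoDB involved; the function and names are purely illustrative, not the connector's actual code):

```python
import time

def upsert_with_timestamps(collection, doc_id, fields):
    """Mimics the UpdateOneTimestampsStrategy semantics:
    _insertedTS is written only on insert, _modifiedTS on every upsert."""
    now = time.time()
    existing = collection.get(doc_id)
    if existing is None:
        # insert path: both timestamps get the same initial value
        collection[doc_id] = {"_id": doc_id, "_insertedTS": now, "_modifiedTS": now, **fields}
    else:
        # update path: keep _insertedTS, refresh _modifiedTS, replace the data fields
        collection[doc_id] = {"_id": doc_id, "_insertedTS": existing["_insertedTS"],
                              "_modifiedTS": now, **fields}
    return collection[doc_id]

store = {}
first = upsert_with_timestamps(store, "ABCD-1234", {"fieldA": "Anonymous", "fieldB": 42})
second = upsert_with_timestamps(store, "ABCD-1234", {"fieldA": "anonymous", "fieldB": -23})

assert second["_insertedTS"] == first["_insertedTS"]  # insertion time preserved
assert second["_modifiedTS"] >= first["_modifiedTS"]  # modification time refreshed
```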
### Change Data Capture Mode
The sink connector can also be used in a different operation mode in order to handle change data capture (CDC) events. Currently, the following CDC events from [Debezium](http://debezium.io/) can be processed:
* **Oracle** _coming soon!_ ([early preview at Debezium Project](http://debezium.io/docs/connectors/oracle/))
* **SQL Server** ([not yet finished at Debezium Project](http://debezium.io/docs/connectors/sqlserver/))
This effectively allows replicating all state changes within the source databases into MongoDB collections. Debezium produces very similar CDC events for MySQL and PostgreSQL. The use cases addressed so far worked fine with the same code, which is why there is currently only one _RdbmsHandler_ implementation to support them both. The Debezium Oracle CDC format will be integrated in a future release.
##### Convention-based deletion on null values
There are scenarios in which there is no CDC-enabled source connector in place. However, it might still be necessary to handle record deletions. For these cases the sink connector can be configured to delete records in MongoDB whenever it encounters sink records which exhibit _null_ values. This is a simple convention that can be activated by setting the following configuration option:
```properties
mongodb.delete.on.null.values=true
```
| mongodb.key.projection.list | comma separated list of field names for key projection | string | "" || low |
| mongodb.key.projection.type | whether or not and which key projection to use | string | none | [none, blacklist, whitelist] | low |
| mongodb.post.processor.chain | comma separated list of post processor classes to build the chain with | string | at.grahsl.kafka.connect.mongodb.processor.DocumentIdAdder || low |
| mongodb.writemodel.strategy | how to build the write models for the resulting MongoDB documents | string | at.grahsl.kafka.connect.mongodb.writemodel.strategy.ReplaceOneDefaultStrategy || low |
| mongodb.value.projection.list | comma separated list of field names for value projection | string | "" || low |
| mongodb.value.projection.type | whether or not and which value projection to use | string | none | [none, blacklist, whitelist] | low |