Commit 46c9dee

Adding tests using First & Last filter operators for deduplicate plugin.
1 parent 94ec33f commit 46c9dee

4 files changed: 116 additions, 0 deletions

core-plugins/src/e2e-test/features/deduplicate/FileToDeduplicate.feature

Lines changed: 104 additions & 0 deletions
@@ -156,3 +156,107 @@ Feature: Deduplicate - Verification of Deduplicate pipeline with File as source
     Then Close the pipeline logs
     Then Validate OUT record count of deduplicate is equal to IN record count of sink
     Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "deduplicateTest3OutputFile"
+
+  @GCS_DEDUPLICATE_TEST @FILE_SINK_TEST
+  Scenario: To verify complete flow of data extract and transfer from File source to File sink using Deduplicate Plugin with Last filter option
+    Given Open Datafusion Project to configure pipeline
+    When Select plugin: "File" from the plugins list as: "Source"
+    When Expand Plugin group in the LHS plugins list: "Analytics"
+    When Select plugin: "Deduplicate" from the plugins list as: "Analytics"
+    Then Connect plugins: "File" and "Deduplicate" to establish connection
+    When Expand Plugin group in the LHS plugins list: "Sink"
+    When Select plugin: "File" from the plugins list as: "Sink"
+    Then Connect plugins: "Deduplicate" and "File2" to establish connection
+    Then Navigate to the properties page of plugin: "File"
+    Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
+    Then Enter input plugin property: "path" with value: "gcsDeduplicateTest"
+    Then Select dropdown plugin property: "format" with option value: "csv"
+    Then Click plugin property: "skipHeader"
+    Then Click on the Get Schema button
+    Then Verify the Output Schema matches the Expected Schema: "deduplicateOutputSchema"
+    Then Validate "File" plugin properties
+    Then Close the Plugin Properties page
+    Then Navigate to the properties page of plugin: "Deduplicate"
+    Then Enter Deduplicate plugin property: filterOperation field name with value: "deduplicateFieldName"
+    Then Select Deduplicate plugin property: filterOperation field function with value: "deduplicateFilterFunctionLast"
+    Then Select dropdown plugin property: "uniqueFields" with option value: "fname"
+    Then Press ESC key to close the unique fields dropdown
+    Then Select dropdown plugin property: "uniqueFields" with option value: "lname"
+    Then Press ESC key to close the unique fields dropdown
+    Then Enter input plugin property: "deduplicateNumPartitions" with value: "deduplicateNumberOfPartitions"
+    Then Validate "Deduplicate" plugin properties
+    Then Close the Plugin Properties page
+    Then Navigate to the properties page of plugin: "File2"
+    Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
+    Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
+    Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
+    Then Select dropdown plugin property: "format" with option value: "csv"
+    Then Validate "File" plugin properties
+    Then Close the Plugin Properties page
+    Then Save the pipeline
+    Then Preview and run the pipeline
+    Then Wait till pipeline preview is in running state
+    Then Open and capture pipeline preview logs
+    Then Verify the preview run status of pipeline in the logs is "succeeded"
+    Then Close the pipeline logs
+    Then Close the preview
+    Then Deploy the pipeline
+    Then Run the Pipeline in Runtime
+    Then Wait till pipeline is in running state
+    Then Open and capture logs
+    Then Verify the pipeline status is "Succeeded"
+    Then Close the pipeline logs
+    Then Validate OUT record count of deduplicate is equal to IN record count of sink
+    Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "deduplicateTest5OutputFile"
+
+  @GCS_DEDUPLICATE_TEST @FILE_SINK_TEST
+  Scenario: To verify complete flow of data extract and transfer from File source to File sink using Deduplicate Plugin with First filter option
+    Given Open Datafusion Project to configure pipeline
+    When Select plugin: "File" from the plugins list as: "Source"
+    When Expand Plugin group in the LHS plugins list: "Analytics"
+    When Select plugin: "Deduplicate" from the plugins list as: "Analytics"
+    Then Connect plugins: "File" and "Deduplicate" to establish connection
+    When Expand Plugin group in the LHS plugins list: "Sink"
+    When Select plugin: "File" from the plugins list as: "Sink"
+    Then Connect plugins: "Deduplicate" and "File2" to establish connection
+    Then Navigate to the properties page of plugin: "File"
+    Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
+    Then Enter input plugin property: "path" with value: "gcsDeduplicateTest"
+    Then Select dropdown plugin property: "format" with option value: "csv"
+    Then Click plugin property: "skipHeader"
+    Then Click on the Get Schema button
+    Then Verify the Output Schema matches the Expected Schema: "deduplicateOutputSchema"
+    Then Validate "File" plugin properties
+    Then Close the Plugin Properties page
+    Then Navigate to the properties page of plugin: "Deduplicate"
+    Then Enter Deduplicate plugin property: filterOperation field name with value: "deduplicateFieldName"
+    Then Select Deduplicate plugin property: filterOperation field function with value: "deduplicateFilterFunctionFirst"
+    Then Select dropdown plugin property: "uniqueFields" with option value: "fname"
+    Then Press ESC key to close the unique fields dropdown
+    Then Select dropdown plugin property: "uniqueFields" with option value: "lname"
+    Then Press ESC key to close the unique fields dropdown
+    Then Enter input plugin property: "deduplicateNumPartitions" with value: "deduplicateNumberOfPartitions"
+    Then Validate "Deduplicate" plugin properties
+    Then Close the Plugin Properties page
+    Then Navigate to the properties page of plugin: "File2"
+    Then Enter input plugin property: "referenceName" with value: "FileReferenceName"
+    Then Enter input plugin property: "path" with value: "fileSinkTargetBucket"
+    Then Replace input plugin property: "pathSuffix" with value: "yyyy-MM-dd-HH-mm-ss"
+    Then Select dropdown plugin property: "format" with option value: "csv"
+    Then Validate "File" plugin properties
+    Then Close the Plugin Properties page
+    Then Save the pipeline
+    Then Preview and run the pipeline
+    Then Wait till pipeline preview is in running state
+    Then Open and capture pipeline preview logs
+    Then Verify the preview run status of pipeline in the logs is "succeeded"
+    Then Close the pipeline logs
+    Then Close the preview
+    Then Deploy the pipeline
+    Then Run the Pipeline in Runtime
+    Then Wait till pipeline is in running state
+    Then Open and capture logs
+    Then Verify the pipeline status is "Succeeded"
+    Then Close the pipeline logs
+    Then Validate OUT record count of deduplicate is equal to IN record count of sink
+    Then Validate output file generated by file sink plugin "fileSinkTargetBucket" is equal to expected output file "deduplicateTest6OutputFile"
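
For reviewers who want a mental model of what the two new scenarios exercise, here is a minimal Python sketch of First and Last deduplication over the unique fields `fname` and `lname`. It assumes that First keeps the earliest record seen for each unique key and Last keeps the latest. The function name `deduplicate` and the `keep` parameter are hypothetical and not part of the plugin, and the plugin's real behavior with more than one partition may differ.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, str]


def deduplicate(rows: Iterable[Row], unique_fields: Sequence[str], keep: str) -> List[Row]:
    """Keep one row per unique key: the first or the last one seen.

    Hypothetical helper for illustration only; it is not the plugin's code.
    """
    if keep not in ("First", "Last"):
        raise ValueError(f"unsupported filter function: {keep}")

    chosen: Dict[Tuple[str, ...], Row] = {}
    for row in rows:
        key = tuple(row[field] for field in unique_fields)
        # "Last" always overwrites; "First" only stores a key once.
        if keep == "Last" or key not in chosen:
            chosen[key] = row
    return list(chosen.values())
```

Calling `deduplicate(rows, ["fname", "lname"], "Last")` mirrors the configuration in the first new scenario, and passing `"First"` mirrors the second. The order of rows written by a real File sink is not something this sketch models.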

core-plugins/src/e2e-test/resources/pluginParameters.properties

Lines changed: 4 additions & 0 deletions
@@ -174,6 +174,8 @@ deduplicateFileCsvFile=testdata/file/CSV_DEDUP_TEST.csv
 deduplicateFilterFunctionMax=Max
 deduplicateFilterFunctionMin=Min
 deduplicateFilterFunctionAny=Any
+deduplicateFilterFunctionLast=Last
+deduplicateFilterFunctionFirst=First
 deduplicateFieldName=fname
 deduplicateFilterOperation=cost:Max
 deduplicateNumberOfPartitions=2
@@ -185,6 +187,8 @@ deduplicateTest1OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST1_Outp
 deduplicateTest2OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST2_Output.csv
 deduplicateTest3OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST3_Output.csv
 deduplicateMacroOutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST4_Output.csv
+deduplicateTest5OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST5_Output.csv
+deduplicateTest6OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST6_Output.csv
 ## Deduplicate-PLUGIN-PROPERTIES-END
 
 ## GROUPBY-PLUGIN-PROPERTIES-START
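
The feature steps pass names such as "deduplicateFilterFunctionLast" rather than literal values, and the entries added here supply the values behind those names. The Python sketch below shows one way such a lookup could work. It assumes that a name with no matching key (for example "csv" or "fname") is used literally; that fallback, and the helper names `parse_properties` and `resolve`, are assumptions for illustration and not the test framework's API.

```python
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if separator:
            props[key.strip()] = value.strip()
    return props


def resolve(props: Dict[str, str], name: str) -> str:
    """Return the configured value for name, or name itself if no key matches (assumed fallback)."""
    return props.get(name, name)


# The four entries added by this commit, copied from the diff above.
ADDED_PROPERTIES = """
deduplicateFilterFunctionLast=Last
deduplicateFilterFunctionFirst=First
deduplicateTest5OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST5_Output.csv
deduplicateTest6OutputFile=e2e-tests/expected_outputs/CSV_DEDUPLICATE_TEST6_Output.csv
"""
```

With these entries, `resolve(parse_properties(ADDED_PROPERTIES), "deduplicateFilterFunctionLast")` yields `"Last"`, which is the option the Last scenario selects in the filter function dropdown.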
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+alice,smith,30.21,56789
+bob,jones,30.64,23456
+alice,jones,500.93,67890
+bob,smith,0.50,45678
Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+bob,jones,30.64,23456
+alice,smith,1.50,34567
+bob,smith,50.23,12345
+alice,jones,500.93,67890
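
A quick sanity check on the two expected-output files above is that each contains exactly one row per unique key. The Python sketch below does that check, assuming the first two CSV columns are `fname` and `lname` (the schema is not shown in this commit, so the column positions are an assumption). The names `duplicate_keys`, `EXPECTED_A`, and `EXPECTED_B` are hypothetical; the page does not show which file is TEST5 and which is TEST6.

```python
import csv
import io
from typing import List, Sequence, Tuple

# Contents of the two new expected-output files, copied from the diff.
EXPECTED_A = """alice,smith,30.21,56789
bob,jones,30.64,23456
alice,jones,500.93,67890
bob,smith,0.50,45678
"""

EXPECTED_B = """bob,jones,30.64,23456
alice,smith,1.50,34567
bob,smith,50.23,12345
alice,jones,500.93,67890
"""


def duplicate_keys(csv_text: str, key_columns: Sequence[int] = (0, 1)) -> List[Tuple[str, ...]]:
    """Return every key that appears more than once, in order of repetition."""
    seen = set()
    duplicates: List[Tuple[str, ...]] = []
    for record in csv.reader(io.StringIO(csv_text)):
        if not record:
            continue  # ignore blank lines
        key = tuple(record[i] for i in key_columns)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
    return duplicates
```

Both files return an empty list from `duplicate_keys`, which is consistent with deduplication on the two leading columns. They cover the same four name pairs but differ in the remaining columns for two of them, as expected when different filter functions select different records.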
