Friday 24 October 2014

Quality Stage Interview Questions

10)   How many outputs can the Investigate stage support?
Ans: That depends on the options you select in the Investigate stage. Word investigation supports at most 2 outputs; the rest support 1.
11)   How many types of investigation can we perform on the data by using the Investigate stage?
Ans: Character concatenate investigation, character discrete investigation and word investigation (Token and Pattern)
12)   What is Word Investigation and how does it help the business?
Ans: Using word investigation we can create two kinds of reports. The Token report gives you information about the tokens (single words) coming from the source. The Pattern report reveals the kinds of patterns we are getting from the source (a pattern is a combination of user-defined and default classification codes).
13)   What type of reports we can generate in Word Investigation?
Ans: Pattern Report and Token Report
14)   What is Character Discrete Investigation and how does it help the business?
Ans: By using character discrete investigation we can identify the type of data we are getting. If we want, we can also mask the data by selecting a mask. It also gives a sample for each pattern.
15)   What is Character Concatenate Investigation and how does it help the business?
Ans: Similar to character discrete investigation, except that it gives results on a combination of columns.
16)   What are masks and their importance?
Ans: Masks are used as part of the Investigate stage to identify the kind of data we are getting. We can also exclude part of the data from consideration.
17)   What is the type C mask?
Ans: It displays the actual character value coming from the source column.
18)   What is the type T mask?
Ans: It gives the data type of the incoming data: for alphabetic characters it gives a, for numerics it gives n, and special characters it displays as-is.
19)   What is the type X mask?
Ans: By using this mask we can hide data from consideration.
20)   What is the result for the below string if you use mask C?
02116 -> ccccc
Ans: 02116
21)   What is the result for the below string if you use mask CX?
1234 -> cxcc
Ans: 134
22)   What is the result for the below string if you use mask T?
013-345 -> all Ts
Ans: nnn-nnn
23)   What is the result for the below string if you use mask T?
abc123 -> all Ts
Ans: aaannn
24)   What is the result for the below string if you use mask C?
(123)123-123 -> CCCCCCCCCCCC
Ans: (123)123-123
25)   What is the result for the below string if you use mask CX?
(123)123 -> XCCCXCCC
Ans: 123123
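The behaviour of the C, T and X masks in the examples above can be sketched in Python (a toy illustration of the masking rules, not QualityStage's actual implementation):

```python
def apply_masks(value, masks):
    """Apply one Investigate mask character (C, T or X) per input character."""
    out = []
    for ch, mask in zip(value, masks.upper()):
        if mask == "C":          # C: show the actual character
            out.append(ch)
        elif mask == "T":        # T: show the type - a for alpha, n for numeric
            if ch.isalpha():
                out.append("a")
            elif ch.isdigit():
                out.append("n")
            else:                # special characters are shown as-is
                out.append(ch)
        # X: the character is hidden from consideration

    return "".join(out)

print(apply_masks("02116", "CCCCC"))        # 02116
print(apply_masks("1234", "CXCC"))          # 134
print(apply_masks("013-345", "TTTTTTT"))    # nnn-nnn
print(apply_masks("(123)123", "XCCCXCCC"))  # 123123
```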
26)   If I want to send only the patterns that have more than 5 matched records, what option do I have to set in the Investigate stage, and how? What is the default?
Ans: In the Investigate stage advanced options tab you have to enter 5 for the frequency cutoff field. The default is 1.
27)   If I want to display more than one sample record for each pattern, what option do I need to set, and where? What is the default?
Ans: In the Investigate stage advanced options tab you have to set the number of samples field to a value greater than 1. The default is 1.
28)   What are the output field names in the Investigate stage (character discrete and character concatenate)? Importance?
Ans: qsInvColumnName: Identifies the name of the column that is investigated.
qsInvPattern: Displays the character and includes the character in the frequency count and pattern analysis.
qsInvSample: Shows one or more samples of the content of this column. The number to be displayed is configurable.
qsInvCount: Shows the actual number of occurrences of the value in the qsInvPattern column.
qsInvPercent: Shows the percentage occurrences of the value in the qsInvPattern column to the total number of records on this file.

29)   Which column in the Investigate output gives you the frequency values?
Ans: qsInvCount
30)   How many columns do we have in the Token report? Importance?
Ans: qsInvCount: Indicates the number of times this token was encountered across the entire input data.
qsInvWord: Identifies the individual token or word value that is found inside the selected input columns.
qsInvClassCode: Identifies the classification of the token that is based on the selected rule set classification table. Unclassified tokens, if selected, get a question mark “?” for alpha or a carat “^” for numeric.
31)   What are the default classification codes? Are these common across QS?
Ans: A, +, ?, <, >, @, ~, # etc., and yes, these are common across QS.
32)   What are +, ?, <, > and @? In what situations will we get these in the report?
Ans: These are default classification codes. When input tokens are not identified by the classification table, the default classification codes are applied to those tokens.
33)   What is the Pattern report?
Ans: It shows what kinds of patterns we are getting from the source.
34)   What mechanism does QS use to split the source data into tokens?
Ans: SEPLIST and STRIPLIST
35)   What is the SEPLIST?
Ans: The characters you mention in the SEPLIST act as token separators and also as tokens themselves.
36)   What is the STRIPLIST?
Ans: The characters you mention in the STRIPLIST are removed from token consideration.
37)   SEPLIST/STRIPLIST: which one executes first?
Ans: SEPLIST
38)   What will be the output?
Input string: 300 St.David's Street,USA
SEPLIST: space, period, comma
STRIPLIST: space, period, comma
How many tokens will we get and what are they?
Ans: 5 tokens: 300, St, David's, Street and USA
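The SEPLIST/STRIPLIST behaviour in the example above can be sketched in Python (a simplified model of the parsing step, not the actual QS parser):

```python
def tokenize(text, seplist, striplist):
    """Split text into tokens: SEPLIST characters separate tokens and are
    themselves kept as tokens; STRIPLIST characters are then removed."""
    tokens, current = [], ""
    for ch in text:                 # SEPLIST executes first
        if ch in seplist:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)       # separators also act as tokens
        else:
            current += ch
    if current:
        tokens.append(current)
    # STRIPLIST removes its characters from token consideration
    return [t for t in tokens if t not in striplist]

print(tokenize("300 St.David's Street,USA", " .,", " .,"))
# ['300', 'St', "David's", 'Street', 'USA']
```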

39)   What is the mask key for numeric values?
Ans: n
40)   What is the mask key for alpha values?
Ans: a
41)   What is the mask key for special characters?
Ans: They are displayed as-is.
42)   What is the pattern report for the string 120 main streets apt 6c if you are using US Rule set?
Ans: ^?TU>
43)   Character discrete investigate examines a single domain? T/F
Ans: True
44)   Word investigation examines a single domain?(T/F)
Ans: False
45)   What is the use of the Standardize stage?
Ans: By using the Standardize stage we can correct spellings and convert the data into a standard, consistent format, and it validates the source data. By using this we can address most data quality issues.
46)   What are the components in a prebuilt rule set, or in every rule set?
Ans: Every rule set has 5 components: classification file, dictionary file, pattern action file, overrides and reference tables.
47)   Which stage gives you fully cleansed information? How?
Ans: By using the Standardize stage we can get cleansed information. We have many existing rule sets, and by using them we can get cleansed data.
48)   What is the default key value for single numeric?
Ans:  ^
49)   What is the default key value for one or more unknown alphas?
Ans: ?
50)   What is the default key value for single unclassified alpha (word)?
Ans: +
51)   What is the Complex mixed token?
Ans:  @ ex:C3PO
52)   What is the Leading numeric token?
Ans: > ex:6A
53)   What is the trailing numeric token?
Ans: < ex:A6
54)   What is the default key value for street type in the Standardize stage?
Ans: T
55)   What is the default key value for unit in the Standardize stage?
Ans: U
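The classification codes from the questions above can be illustrated with a toy pattern generator in Python (the classification table entries here are made-up examples, not the real US rule set):

```python
# hypothetical mini classification table (real ones live in the .cls file)
TABLE = {"STREET": "T", "STREETS": "T", "ST": "T",
         "APT": "U", "APARTMENT": "U"}

def classify(token):
    """Return the classification code for a single token."""
    t = token.upper()
    if t in TABLE:
        return TABLE[t]
    if t.isdigit():
        return "^"       # single numeric
    if t.isalpha():
        return "?"       # unknown alpha
    if t[0].isdigit():
        return ">"       # leading numeric, e.g. 6A
    if t[-1].isdigit():
        return "<"       # trailing numeric, e.g. A6
    return "@"           # complex mixed, e.g. C3PO

def pattern(tokens):
    return "".join(classify(t) for t in tokens)

print(pattern(["120", "main", "streets", "apt", "6c"]))  # ^?TU>
print(classify("C3PO"))                                  # @
```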
56)   What is the classification file and its importance?
Ans: It contains the tokens, the corresponding standard values for those tokens, threshold weights and comments. This file is used by the PAT file to identify the tokens coming from the source.

57)   What is the syntax to create the classification file?
Ans: token /standard value/class /[threshold-weights]/ [; comments]
58)   What is the dictionary file? Its importance?
Ans: It contains the metadata of the columns generated by the Standardize stage.
59)   Syntax in creating the Dictionary File?
Ans: field-identifier/ field-type/ field-length/missing value/description
60)   What is Pattern action file and its importance?
Ans: It contains the set of action statements to handle the patterns generated by classification table and SEPLIST/STRIPLIST.
61)   What are override and lookup tables and their importance?
Ans: Any updates done by the user from the GUI are handled by the override tables. Reference tables are just like lookup tables in DS, used to check the incoming data against a given list of values.
62)   What are the types of rule sets available in Standardize stage?
Ans: Country Identifier
Domain Preprocessor
Domain Specific
Validation
63)   What is the country rule set and its importance?
Ans: If we are getting multi-nation data in a single source, we need to split the data country-wise before passing it to the actual Standardize stage, because the Standardize stage has country-specific rule sets. To categorize the data country-wise we use the country rule set.
64)   If the country rule set is not able to identify the country from the data, what country code will it assign?
Ans: The default country code, whatever you mention in the job.
65)   What are the output columns generated by the COUNTRY rule set? Their use?
Ans: It generates two extra columns along with the actual columns: a country identifier flag and an ISO country code. If the country is identified by the rule set, you get Y for the flag and the respective ISO country code in the country code column. If it is not identified, you get N for the flag and the default country code in the country code column.
66)   What are the domain-specific rule sets? When do we apply these rule sets to the data?
Ans: Once the data is categorized country-wise, or we are receiving a specific country's data, we go for domain-specific rule sets.
67)   What are the domain pre-processor rule sets? When do we apply these rule sets to the data?
Ans: If we are receiving multiple domains' data in a single column, we apply the pre-processor rule sets to split the data into the proper domains; the output of the pre-processor rule sets is then passed to the domain-specific rule sets.
68)   What are the NYSIIS and Soundex algorithms, and where do these algorithms execute?
Ans: These are two phonetic algorithms used to identify matched records. They execute as part of the PAT file to identify matched records.
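Of the two phonetic algorithms mentioned, Soundex is simple enough to sketch in Python (the classic algorithm, not IBM's exact implementation):

```python
def soundex(name):
    """Classic Soundex: first letter plus three digits from consonant groups,
    so similar-sounding names get the same code."""
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:   # vowels give no code; repeats collapse
            result += code
        if ch not in "HW":          # H and W do not break a consonant run
            prev = code
    return (result + "000")[:4]     # pad with zeros to 4 characters

print(soundex("Smith"))   # S530
print(soundex("Smyth"))   # S530 - same code as Smith
print(soundex("Robert"))  # R163
```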
69)   What are the output columns generated by the Standardize stage, and the importance of a few of them?
Ans: The Standardize stage generates around 20 to 25 columns along with the input columns. Of these, a few important ones are input pattern, unhandled pattern and unhandled text. Depending on these columns we separate the valid data from the invalid data.
70)   What are input pattern, input text, unhandled pattern and unhandled text in the Standardize stage?
Ans: Input pattern: gives the pattern of the input text.
Input text: gives the actual value from the source.
Unhandled pattern: gives the pattern of the unhandled data.
Unhandled text: gives the actual values that were not processed by the Standardize stage.
71)   What is the threshold value and its importance?
Ans: The threshold value is used to identify matched records. This value should be between 0 and 900: 900 means an exact match, 850 means a one-character mismatch, and so on.
72)   What is the importance of custom rule sets?
Ans: If the existing rule sets are not sufficient to handle the source data, we build custom rule sets with our own logic.
73)   How will you build custom rule sets?
Ans: Right-click on any folder ---> New ---> Data Quality ---> Rule Set
74)   Can we modify the existing rule sets?
Ans: We cannot modify the existing rule sets directly because they are read-only. If needed, you can take a copy of an existing rule set and make your changes to the copy.
75)   What is WAVES and its importance?
Ans: WAVES is one of the stages provided by QS to validate address fields.
76)   What is MNS and its importance?
Ans: MNS is also a stage provided by QS to validate address fields.
        Note: Both WAVES and MNS are older stages provided by QS. From 8.7 onwards the AVI stage replaces these 2 stages; it can do the same work that both of them can do.
77)   What is the SQA stage and its importance?
Ans: SQA stands for Standardization Quality Assessment. SQA creates reports in graphical format for the output generated by the Standardize stage.
78)   What is the execution order of the Investigate, Standardize, Match and Survive Stages?
Ans: Investigate -- > Standardize Stage --> Match - - > Survive
79)   How can I get a substring of a string in the pattern action language?
Ans: +[{}(1:3)="IBM"] takes the first 3 characters of the unclassified token and tests that they equal "IBM".
80)   What is the importance of the COPY_S statement?
Ans: We can preserve the spaces between words by using this statement.
Ex:
^ | ? | T
COPY [1] {HouseNumber}
COPY_S [2] {StreetName}
COPY_A [3] {StreetSuffixType}
81)   What is the importance of copy statement?
Ans:  Dictionary columns can be copied to the other dictionary columns or to user variables.
Ex:
COPY {HouseNumber} {HC}
COPY {HouseNumber} temp

82)   What is bucketing?
Ans: The process of moving the data into dictionary field columns after the execution of the PAT statements is called bucketing the data.
83)   What is the country code identifier for United States?
Ans:  ZQUSZQ
84)   How does the pattern action language define the below string (10 Hollow Oak Road)?
Ans: Pattern (^ ? T), standard form (HN SN ST)
85)   What is the use of a lookup table?
Ans: Just to check the incoming data against a given list of values. This list of values is put in a reference table.
86)   Please explain how the Standardize stage processes the below string (10 MAPPLE STREET APARTMENT 222)
Ans:
1)      First the stage parses the data, dividing it into tokens:
        10 | MAPPLE | STREET | APARTMENT | 222
2)      Using the classification table it assigns the default key codes to the tokens:
        10 -> ^ | MAPPLE -> ? | STREET -> T | APARTMENT -> U | 222 -> ^
3)      Using the dictionary file it defines the output columns:
        10 -> HOUSENUMBER | MAPPLE -> STREET NAME | STREET -> STREET TYPE | APARTMENT -> UNIT TYPE | 222 -> UNIT
4)      Finally, using the pattern action file, the stage produces the standardized data:
        10 | MAPPLE | ST | APT | 222
87)   What are the Standardize results? Business intelligence fields, matching fields, reporting fields
Ans: These 3 types of fields are the categories of dictionary file columns.
88)   What are the business intelligence fields?
Ans: Parsed from the original data, they may be used in matching and are generally moved to the target system. Ex: First Name, Generational, Unit types, Box types, Zip5
89)   What are the matching fields?
Ans: Generally these fields are created to help during the match process and are dropped after successful matching. Ex: Phonetic coding (NYSIIS), hash keys (first 2 characters of the first five words), packed keys (data concatenated)
90)   What are the reporting fields?
Ans: Specifically created to help review the results of Standardize and to recognize handled and unhandled data. Ex: Unhandled pattern (the pattern for tokens not processed by the selected rule set), Unhandled data (the remaining tokens not processed by the selected rule set), Input pattern (the pattern generated for the stream of input tokens based on the parsing rules and token classifications).
91)   In which situations can you modify a rule set?
Ans: When the existing PAT file logic or the tokens mentioned in the classification file are not sufficient to handle the data, you can modify the rule sets accordingly.
92)   What are the extensions for classification file, Dictionary file and Pattern action file?
Ans: .cls for classification and .dct for dictionary and .pat for pattern action file.
93)   Suppose you find that "SMITH" is unhandled data; how can you override the classification?
Ans: 
1)      Open override table
2)      Enter the ‘SMITH’  value in the Token value
3)      Enter the ‘SMITH’ value in the Standard Value
4)      Enter “F” (First Name) in the Class field
94)   "JOHN SMITH" is a valid name per the USNAME.CLS table; what is the value pattern?
Ans: F L (First Name, Last Name)
95)   What is the use of User overrides?
Ans: It provides the user with the ability to modify rule sets
96)   Which types of rule sets can be modified using user overrides?
Ans:  Domain Pre-processor rule sets and Domain rule sets
97)   What information is required before creating an override?
Ans: The dictionary field name to move the token to, the original value or standard value of the token, and leading space or no leading space for multiple tokens moved to the same dictionary field.
98)   What are 'Input Text overrides'?
Ans: They allow the user to specify overrides based on an entire text string.
99)   WAVES can standardize name fields (T/F)?
Ans:  False
100)   Rule sets are used in standardization processing (T/F)?
Ans: True
101)   When you create a copy of an existing rule set, what components are copied?
Ans: Classification file, dictionary file, pattern action file and overrides.
102)   ^ | D | ? | T: how many operands does this pattern have?
Ans: 4
103)   What pattern action command denotes the universal class?
Ans: **
104)   When would you use the 'At Least One' survive technique?
Ans: When you want to ensure that a record from each match group survives.
106)   Which set of processes describes the proper order of steps applied during the Standardize process?
Ans: Parse, Tokenize, Bucket.

107) When defining the match variable Last Name to be used in the match, what would be the
appropriate match type to use in the processing logic?
Ans. UNCERT

108) In a reference match, which match option would allow a single reference source record to match many data source records?
Ans: Many to One

109) Which of the following would you make an entry in to remove all occurrences of a character from the input data?
Ans: Striplist

110) What is the pattern for the below string?  1ST & MAIN ST 20
Ans: >|\&|?|T
111) What is the pattern for the below string? '115 1/2 South Oak St'
Ans: ^|^|/|^|D|?|T
112) What is the Match stage and its importance?
Ans: It identifies duplicates in the standardized data.
113) Which files do we pass as input to the Match stage?
Ans: The Standardize stage output file and the match frequency file.
114) What is the Match Frequency stage and its importance?
Ans: It generates the frequency distribution of the data in a form that the Match stage can understand.
115 ) What is the default frequency value?
Ans: 100
116) What are the output columns generated by the Match Frequency Stage?
Ans: qsFreqValue, qsFreqCounts,qsFreqColumnID,qsFreqHeaderFlag
117) What are blocking columns in a match specification?
Ans: Blocking provides a method of limiting the number of pairs to examine. In other words, it groups the data so that only likely matches are compared against each other.
118) What is a match specification?
Ans: A match specification is a QS Match component that contains the mechanism to identify the duplicate records, clerical records, master records and residual records. We call this match specification as part of the Match stages.
119) What are match commands in a match specification?
Ans: They define the process of assigning weights to the input records depending on the columns you mention in the match commands.
120) What are the cutoff values?
Ans: The cutoff values are used to classify the weighted records into master, duplicate, clerical and residual records.
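The role of the two cutoffs can be sketched in Python (the cutoff numbers below are made-up examples, not QS defaults):

```python
def classify_pair(weight, match_cutoff, clerical_cutoff):
    """Classify a weighted record pair using the two cutoff values."""
    if weight >= match_cutoff:
        return "duplicate (match)"
    if weight >= clerical_cutoff:
        return "clerical review"
    return "residual (non-match)"

# hypothetical cutoffs: match = 10, clerical = 5
print(classify_pair(12.5, 10, 5))  # duplicate (match)
print(classify_pair(7.0, 10, 5))   # clerical review
print(classify_pair(2.0, 10, 5))   # residual (non-match)
```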
121) What is m-probability?
Ans: It is defined as the probability of the variable agreeing in a matched pair.
122) What is u-probability?
Ans: The u-probability can be approximated as the probability that a field agrees at random.
123) What is a master record?
Ans: The maximum-weighted record in a match set is treated as the master record.
124) What is a duplicate record? What is the strategy to identify duplicate records?
Ans: If a record gets a weight above the match cutoff, it is treated as a duplicate record.
125) What are clerical records? What is the strategy to identify them?
Ans: The potential duplicates whose weights fall in the clerical range, between the clerical cutoff and the match cutoff; they require manual review.
126) What are agreement and disagreement weights?
Ans: If the compared fields match, the agreement weight is assigned; it is always positive.
If the compared fields do not match, the disagreement weight is assigned; it is a negative value.
127) What are the different kinds of match comparison types available?
Ans: A match comparison contains the logic to determine which columns match and which do not. There are around 25 to 30 comparison types; based on your requirement you select the necessary comparison type.
128) Explain about CHAR and CNT_DIFF Comparison types?
Ans: CHAR: Compares data values on a character-by-character basis. This comparison is often used to catch spelling mistakes or inverted letters
CNT_DIFF: Compares two strings of numbers and assigns agreement or disagreement weights based on the number of differences between the numbers in the strings. Weights are prorated according to the magnitude of the disagreement.
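The idea behind CNT_DIFF can be sketched in Python (a rough positional-difference count, not IBM's prorating algorithm):

```python
def cnt_diff(a, b):
    """Count the differences between two number strings, position by
    position, plus any difference in length."""
    diffs = sum(x != y for x, y in zip(a, b))
    return diffs + abs(len(a) - len(b))

print(cnt_diff("123456", "123465"))  # 2 - two digits differ
print(cnt_diff("2014", "2015"))      # 1
print(cnt_diff("555", "555"))        # 0 - exact agreement
```

A real comparison would then prorate the agreement/disagreement weight by this count, as the answer above describes.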
129) What are the types of Match Stages?
Ans: Unduplicate Match stage and Reference Match Stage
130) What is Unduplicate Match Stage?
Ans: Unduplicate match locates and groups all similar records within a single input data source. This process identifies potential duplicate records, which might then be removed.
131) What is Reference Match Stage?
Ans: Reference Match identifies relationships among records in two data sources.
132) What are the different kinds of match types we can perform by using the Unduplicate Match stage?
Ans: Unduplicate dependent, unduplicate transitive and unduplicate independent
133) What are the different kinds of match types we can perform by using the Reference Match stage?
Ans: Reference one-to-one, one-to-many multiple, one-to-many duplicate and many-to-one.
134) Dependent, independent and transitive match types?
Ans: Dependent: records identified as duplicates in one pass are removed from consideration in subsequent passes. Independent: all records, including identified duplicates, are carried into subsequent passes. Transitive: records are grouped through chains of matches, so if A matches B and B matches C, all three fall into one group even if A and C do not match directly.

135) one to one, Many to one duplicate, Many to one multiple and Many to one match types?
Ans:

136) What is a residual record?
Ans: Residual records are non-duplicate records. The weight for these records is less than the clerical cutoff value.
137) What are the output columns generated by the Match stage?
Ans: See the answer to the next question.
138) What are the qsMatchSetID, qsMatchDataID and qsMatchType columns and their importance?
Ans:
1.       qsMatchSetID: assigns the same set ID to similar records. Within the same set ID you can see the master record, duplicate records and clerical records.
2.       qsMatchType: gives the codes to identify master, duplicate and clerical records: MP - master record, DA - duplicate record, CA - clerical record. These records are identified by the cutoff values mentioned in the specification.
3.       qsMatchPassNumber: if there are multiple passes in the match specification, gives the pass number in which the record was identified as duplicate, master or clerical.
4.       qsMatchWeight: gives the weight generated by the match specification. The weight is a consolidated weight of both agreement and disagreement weights, calculated from the m-probability and u-probability specified in the specification.
5.       qsMatchDataID: gives the source row number.
6.       qsMatchPattern: contains one character for each of the first 16 match comparisons in the match pass where the pair matched. Each character indicates the result of the match comparison that it corresponds to. The character can be one of the following values:
0 - The values are not missing and the values for the column disagree, or no match comparison occurred (when the match pass contains fewer than 16 match comparisons). For example, if the qsMatchPattern column is set to 3321000000000000, the match pass might contain only four match comparisons.
1 - Both values are missing.
2 - One of the values is missing.
3 - The values agree: both values are present and the weight for the comparison is above the missing weight for the column. By default, the missing weight is set to 0.
7.       qsMatchLRFlag: indicates the house number interval that a particular address matches. Possible values are L, which indicates that an address matches the first interval in the comparison, and R, which indicates that it matches the second interval.

139) What is the qsMatchPattern column in the Match output columns?
Ans: See the answer to the previous question.
140) How does the Match stage calculate the composite weight for the input data? (Formulas using the m-probability and u-probability)
Ans: Agreement weight, computed when the comparison between a pair of columns agrees:
log2(m probability / u probability)
Disagreement weight, computed when the comparison between the pair of columns disagrees:
log2((1 - m probability) / (1 - u probability))
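Those two formulas can be evaluated directly in Python; the m and u values below are made-up examples:

```python
from math import log2

def agreement_weight(m, u):
    """Weight added when a pair of columns agrees: log2(m/u)."""
    return log2(m / u)

def disagreement_weight(m, u):
    """Weight added (negative) when the columns disagree: log2((1-m)/(1-u))."""
    return log2((1 - m) / (1 - u))

# e.g. m = 0.9 (column agrees in 90% of true matches),
#      u = 0.01 (column agrees at random in 1% of non-matches)
print(round(agreement_weight(0.9, 0.01), 2))     # 6.49
print(round(disagreement_weight(0.9, 0.01), 2))  # -3.31
```

The composite weight for a record pair is the sum of these per-column weights, which the cutoff values then classify.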

141) What is survive stage and its importance?
Ans: The Survive stage consolidates duplicate records, which creates a best-of-breed representation of the matched data. Survive consolidates duplicate records, creating the best representation of the match data so companies can use it to load a master data record, cross-populate all data sources, or both.
142) What is the input for the Survive stage?
Ans: The match stage output will act as an input to the survive stage
143) What are the different kinds of survive techniques available?
Ans: Techniques such as Shortest Field, Longest Field, Most Frequent, Most Frequent (Non-blank), Equals, Not Equals, Greater Than, Less Than and At Least One.
144) What is the complex survive expression and its importance?
Ans: By using this we can implement complex survive logic to identify the best-of-breed record.
145) What are b.colname and c.colname?
Ans: b refers to the best record and c to the current record.
146) What will happen if we mention the "Longest" technique for the target column?
Ans: It compares the records and passes the longest value of the comparison column to the output.
147) What will happen if we select the "Most Frequent (Non-blank)" technique for the target column?
Ans: Comparing multiple records, it sends to the output the non-blank value that occurs most often.
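The "Longest" and "Most Frequent (Non-blank)" techniques can be sketched in Python over one match group (the column and record values are made-up examples):

```python
from collections import Counter

def survive_longest(group, col):
    """Return the longest value of col across the records in a match group."""
    return max((r[col] or "" for r in group), key=len)

def survive_most_frequent_nonblank(group, col):
    """Return the non-blank value of col that occurs most often in the group."""
    values = [r[col] for r in group if r[col]]
    return Counter(values).most_common(1)[0][0]

group = [{"name": "JOHN SMITH"}, {"name": "J SMITH"},
         {"name": "JOHN SMITH"}, {"name": ""}]
print(survive_longest(group, "name"))                 # JOHN SMITH
print(survive_most_frequent_nonblank(group, "name"))  # JOHN SMITH
```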
148) What is the importance of the AVI stage?
Ans: It validates the address fields coming from the source and gives us accurate information by correcting the source data if needed. This stage was newly added in DS 8.7.
149) What is the use of the reference data available for AVI?
Ans: Reference files are just like reference tables. AVI takes the source value and checks it in the reference files. If any change is needed it corrects the data as specified in the files; otherwise it sends the data through as-is.
150) What is the difference between AVI and Standardize stage?
Ans: AVI stage will work on the actual data but standardize stage will work on patterns.
151) can we do name validation by using AVI?
Ans: no
152) What are the fast paths in AVI and their importance?
Ans: In AVI we have 4 fast paths. By using these we can set the required AVI properties quickly.
153) What kinds of reports can we generate in AVI?
Ans: Suggestion report and correction report.
154) What is the suggestion report and its importance?
Ans: As part of the suggestion report, AVI gives us suggestions for the data. It checks each source record against the reference file, and if it finds any suggestions it includes those records in the output file as suggestions.
155) What is the correction report and its importance?
Ans: It takes the source data and checks it against the reference files, and if it finds any correction for the incoming data it directly corrects that record while sending it to the output.
156) If we are getting wrong data as part of the source data, can AVI validate and correct it?
Ans: Yes, it corrects it.
157) What are the status codes generated by AVI and each code's importance?
Ans:
0 - Field is not applicable to return field status
1 - Verified using reference data; no changes needed
2 - Verified using reference data; an alias change was made
3 - Verified using reference data; a small spelling change was made
4 - Verified using reference data; a large spelling change was made
5 - This field was added from the reference data
6 - Identified using lexicon data; no changes needed
7 - Identified using lexicon data; an alias change was made
8 - Field was identified using context rules
9 - Field is empty
10 - Field was unrecognized

158) What is the error output file in AVI and its importance?
Ans: It generates a separate output containing error codes and error messages.
159) What are the different kinds of error codes and messages we have in AVI?
Ans: Error codes and messages for unprocessed records:
204 - Country or region is not recognized
205 - Country or region postal validation reference file not found
206 - Country or region postal validation reference file in the wrong format or the data is corrupted
207 - Country or region postal validation reference file access denied
AD_STAT_CRR_NOT_INITIALISED = -9 - Triggered if the Validate or Parse method of the AddressDoctor object is called without calling the Initialize method first. Ensure that the AddressDoctor object always calls Initialize before Validate or Parse is called for the first time.
AD_STAT_CERR_ILLEGAL_ACCESS_CODE = -151 - The unlock code supplied in the Initialize call is incorrect. Check the code. When using the C interface, make sure to escape all occurrences of the \ character to ensure proper interpretation of the code.
AD_STAT_WR_PRELOADING_FAILED = 158 - Postal validation reference files that were added to the preloading object could not be preloaded. The most likely reason is insufficient memory to preload the file; insufficient access rights to the reference files can trigger this error as well.
AD_STAT_ERR_NOT_POSSIBLE_FOR_SELECTED_ELEMENT = 201 - Triggered if a method or property cannot be called for the selected CurrentView element of the AddressObject. In most cases this is caused by setting conflicts, for example setting LINE_1 after setting LINES_ALL, or setting STREET after setting DELIVERY_ADDRESSLINE.
AD_STAT_ERR_COUNTRY_DB_NOT_FOUND = 205 - The postal validation reference file could not be found. Ensure that the required postal validation reference file is available in the path supplied in the Initialize call.

160) What is validation summary report and its importance
Ans: It will gives the information about the total number of records passed to AVI, how many are identified and validated and how many are not identified ec.
161) What is verification level in AVI and what are different types of verification levels we have in AVI?
Ans: It will gives the information about to which level your input data is matched.
Table 5. Address levels used in statuses and descriptions
Level 5: Delivery point, post office box, or subbuilding.
Level 4: Premise or building.
Level 3: Thoroughfare. An example might be a suburb, a neighborhood, or a section of a town or city.
Level 2: Locality. An example might be an area that is outside of a city or town but within a specific radius.
Level 1: Administrative area. An example might be a province, district, or county.
Level 0: No validation processing performed. Addresses that cannot be validated or corrected are left unchanged in the output.
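The table above can be expressed as a simple lookup, which is handy when post-processing AVI output. The descriptions are taken directly from the table; the function name is our own for illustration.

```python
# Verification levels as listed in Table 5 above.
VERIFICATION_LEVELS = {
    5: "Delivery point, post office box, or subbuilding",
    4: "Premise or building",
    3: "Thoroughfare",
    2: "Locality",
    1: "Administrative area",
    0: "No validation processing performed",
}

def describe_level(level):
    """Return the human-readable description for a verification level."""
    return VERIFICATION_LEVELS.get(level, "Unknown level")
```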

162) What are the columns added by AVI to the output?
Ans: Whatever columns you pass to AVI are output unchanged, and AVI also generates additional columns with names ending in ‘_QSAV’. A few important ones are the Accuracy Code and Status Code columns.
163) What is the Accuracy Code and its importance?
Ans: It gives you information about the level to which your input data matched, and to what extent the data was changed and validated.
Ex: V44-I44-P8-100 (for more information, see the QS 8.7 documentation)
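An accuracy code such as V44-I44-P8-100 is a hyphen-separated string, so splitting it isolates the individual fields. The meaning of each field is defined in the QS 8.7 documentation, so the sketch below only tokenizes the code without interpreting it:

```python
def split_accuracy_code(code):
    """Split an AVI accuracy code like 'V44-I44-P8-100' into its fields.

    Field semantics are defined in the QualityStage documentation;
    this helper only tokenizes the string for downstream processing.
    """
    return code.split("-")
```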

164) What is the Data Rules stage and its importance?
Ans: The Data Rules stage is newly added in QS 8.7. It lets us do Information Analyzer (IA) work from QS itself without going to IA: we can build rules the same way we would in IA and run those rules against incoming data.
165) What is binding in a data rule?
Ans: Binding is mapping an input column to a variable in the rule definition.
166) What are the different kinds of functions we have in the Data Rules stage?
Ans: The Data Rules stage provides the same set of rule functions that are available in IA.
167) What is rule logic?
Ans: Rule logic is a combination of functions, variables, and operators that defines the check applied to the data.
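As an illustration of what rule logic expresses (functions, variables, and operators combined into a check), the sketch below implements a simple format check in Python. The convention that 'A' matches a letter and '9' matches a digit is an assumption chosen for the example; the actual rules are written in the IA/Data Rules stage rule language, not Python.

```python
def matches_format(value, fmt):
    """Check a value against a format string where 'A' means any
    letter, '9' means any digit, and any other character must
    match literally. This convention is assumed for illustration.
    """
    if len(value) != len(fmt):
        return False
    for ch, f in zip(value, fmt):
        if f == "A" and not ch.isalpha():
            return False
        if f == "9" and not ch.isdigit():
            return False
        if f not in ("A", "9") and ch != f:
            return False
    return True
```

For example, `matches_format("AB-1234", "AA-9999")` holds, while a value with a digit in the letter positions fails, which is the kind of pass/fail decision a bound rule produces for each input row.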
168) How can we create a new rule?
Ans: Select the ‘create new’ option in the Data Rules window and enter the actual logic of the rule in the rule logic block. The rule logic must be validated before mapping columns to the rule.
169) What are the different components of the Data Rules stage?
Ans: It has several components, such as input links, rule variables, selected rule definitions, input, etc.
170) How can we publish a rule?
Ans: After creating the rule logic, if you want to make it global, publish it by selecting the ‘Publish’ option available in the rules window.
171) What are the columns generated by the Data Rules stage?
Ans: The Data Rules stage does not generate any stage-generated columns.
172) How can we select the invalid data?
Ans: Depending on the type of rule you have created, the stage generates the output file with that data.


Some More questions

1. What is Data Quality? Why do we need data quality in today’s business?
Ans: Data quality means data you can trust. We need it to get accurate, consistent, standardized data into the DWH.
2. What types of data quality problems have you faced?
Ans: Inconsistent standards, spelling mistakes, default or invalid values, buried information, data anomalies, and data surprises, etc.
3. What data quality tools are available? What are the differences between IA and QS?
Ans: IBM provides two tools, IA and QS. QS requires some technical skills, and with QS we can modify/standardize the data. IA does not require much technical skill, but with IA we cannot standardize the data; we can only see what kinds of patterns and types of data we are getting.
4. What are the stages in the WebSphere QualityStage process?
Ans: Investigate, Standardize, Match Frequency, Unduplicate Match, Reference Match, MNS, WAVES, AVI (added in 8.7), and the Data Rules stage (added in 8.7).
5. Which stage gives you complete visibility into the actual condition of the data? How?
Ans: The Investigate stage gives complete visibility into the data by creating pattern and token reports, as well as character concatenate and character discrete reports.
6. In which stage can you build the best available view of related information?
Ans: The Standardize stage.
7. What is the use of the Investigate stage?
Ans: To investigate source data in order to understand the nature, scope, and detail of data quality challenges.
8. How many input links does the Investigate stage support?
Ans: Only one.
9. Can we give DataStage output as input to the Investigate stage?
Ans: Yes, we can.
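The character investigation reports mentioned above can be sketched in miniature. Using the type T mask convention described earlier (alphabetic characters become ‘a’, numeric characters become ‘n’, special characters are kept as-is), a pattern frequency report reduces to masking each value and counting the distinct patterns. The function names are our own; the real reports are produced by the Investigate stage itself.

```python
from collections import Counter

def t_mask(value):
    """Type T mask: 'a' for alphabetic, 'n' for numeric,
    special characters left as-is."""
    return "".join(
        "a" if ch.isalpha() else "n" if ch.isdigit() else ch
        for ch in value
    )

def pattern_report(values):
    """Frequency of each character pattern, a simplified sketch
    of an Investigate stage character report."""
    return Counter(t_mask(v) for v in values)
```

For example, `t_mask("AB12-x")` yields `"aann-a"`, and feeding a column of values to `pattern_report` shows which formats dominate the data.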
