QS Interview Questions
10)
How many outputs can the Investigate stage support?
Ans: That depends on the options you select in the Investigate stage. Word Investigation can support at most 2 outputs (one for the token report and one for the pattern report); the rest support 1.
11)
How many types of investigation can we perform on the data by using the Investigate stage?
Ans: Character Concatenate Investigation, Character Discrete Investigation, and Word Investigation (token and pattern).
12)
What is Word Investigation and how does it help the business?
Ans: Using Word Investigation we can create two kinds of reports. The token report gives you information about the tokens (single words) coming from the source. The pattern report reveals the kinds of patterns we are getting from the source (a pattern is a combination of user-defined and default classification codes).
13)
What types of reports can we generate in Word Investigation?
Ans: Pattern report and token report.
14)
What is Character Discrete Investigation and how does it help the business?
Ans: By using Character Discrete Investigation we can identify the type of data we are getting. If you want, you can also mask parts of the data by selecting a mask. It also gives you a sample value for each pattern.
15)
What is Character Concatenate Investigation and how does it help the business?
Ans: Similar to Character Discrete, except that it gives you the results on a combination (concatenation) of columns.
16)
What are masks and their importance?
Ans: Masks are used as part of the Investigate stage to identify the kind of data we are getting. We can also exclude part of the data from consideration.
17)
What is the type C mask?
Ans: It displays the actual character values coming from the source column.
18)
What is the type T mask?
Ans: It gives the type of the incoming data: for alphabetic characters it gives a, for numeric characters it gives n, and special characters it shows as they are.
19)
What is the type X mask?
Ans: By using this mask we can hide (exclude) data from consideration.
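The worked examples in questions 20 through 25 follow directly from these rules. As a rough illustration in plain Python (a sketch of the mask semantics only, not QualityStage code):

def apply_mask(value: str, mask: str) -> str:
    out = []
    for ch, m in zip(value, mask):
        key = m.upper()
        if key == "C":            # C: show the actual character
            out.append(ch)
        elif key == "T":          # T: show the character type
            if ch.isalpha():
                out.append("a")
            elif ch.isdigit():
                out.append("n")
            else:
                out.append(ch)    # special characters pass through as-is
        elif key == "X":          # X: exclude the character
            continue
    return "".join(out)

print(apply_mask("02116", "CCCCC"))      # -> 02116
print(apply_mask("1234", "CXCC"))        # -> 134
print(apply_mask("013-345", "TTTTTTT"))  # -> nnn-nnn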
20)
What is the result for the below string if you use mask C?
02116 -> CCCCC
Ans: 02116
21)
What is the result for the below string if you use mask CX?
1234 -> CXCC
Ans: 134
22)
What is the result for the below string if you use mask T?
013-345 -> all Ts
Ans: nnn-nnn
23)
What is the result for the below string if you use mask T?
abc123 -> all Ts
Ans: aaannn
24)
What is the result for the below string if you use mask C?
(123)123-123 -> CCCCCCCCCCCC
Ans: (123)123-123
25)
What is the result for the below string if you use mask CX?
(123)123 -> XCCCXCCC
Ans: 123123
26)
If I want to output only the patterns that have more than 5 matched records, what option do I have to set in the Investigate stage, and how? What is the default?
Ans: On the Investigate stage's Advanced Options tab, set the Frequency Cutoff field to 5. The default is 1.
27)
If I want to display more than one sample record for each pattern, what option do I need to set, and where? What is the default?
Ans: On the Investigate stage's Advanced Options tab, set the Number of Samples field to a value greater than 1. The default is 1.
28)
What are the output field names in the Investigate stage (Character Discrete and Character Concatenate), and their importance?
Ans: qsInvColumnName: identifies the name of the column that was investigated.
qsInvPattern: displays the characters of the pattern and includes them in the frequency count and pattern analysis.
qsInvSample: shows one or more samples of the content of this column. The number displayed is configurable.
qsInvCount: shows the actual number of occurrences of the value in the qsInvPattern column.
qsInvPercent: shows the percentage of occurrences of the value in the qsInvPattern column against the total number of records in the file.
29)
Which column in the Investigate output columns gives you the frequency values?
Ans: qsInvCount
30)
How many columns do we have in the token report, and what is their importance?
Ans: qsInvCount: indicates the number of times this token was encountered across the entire input data.
qsInvWord: identifies the individual token or word value found inside the selected input columns.
qsInvClassCode: identifies the classification of the token, based on the selected rule set's classification table. Unclassified tokens, if selected, get a question mark (?) for alpha or a caret (^) for numeric.
31)
What are the default classification codes? Are these common across QS?
Ans: ^, +, ?, <, >, @, ~, # etc., and yes, these are common across QS.
32)
What are +, ?, <, > and @? In what situation will we get these in the report?
Ans: These are default classification codes. When the input tokens are not identified by the classification table, the default classification codes are applied to the tokens.
33)
What is a pattern report?
Ans: It gives the kinds of patterns we are getting from the source.
34)
What mechanism is used by QS to split the source data into tokens?
Ans: SEPLIST and STRIPLIST
35)
What is the SEPLIST?
Ans: The characters you mention in the SEPLIST act as token separators and also act as tokens themselves.
36)
What is the STRIPLIST?
Ans: Whatever characters you mention in the STRIPLIST are removed from token consideration.
37)
Which executes first, the SEPLIST or the STRIPLIST?
Ans: SEPLIST
38)
What will be the output?
Input string: 300 St.David's Street,USA
SEPLIST: space, period (.) and comma (,)
STRIPLIST: space, period (.) and comma (,)
How many tokens will we get and what are they?
Ans: 5 tokens: 300, St, David's, Street and USA
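A rough Python illustration of this SEPLIST/STRIPLIST behavior (a sketch of the idea only, not the QualityStage implementation):

def tokenize(text: str, seplist: str, striplist: str) -> list[str]:
    # SEPLIST runs first: its characters split the input and are
    # themselves kept as tokens.
    tokens, current = [], ""
    for ch in text:
        if ch in seplist:
            if current:
                tokens.append(current)
                current = ""
            tokens.append(ch)
        else:
            current += ch
    if current:
        tokens.append(current)
    # STRIPLIST then removes its characters from token consideration.
    return [t for t in tokens if t not in striplist]

print(tokenize("300 St.David's Street,USA", " .,", " .,"))
# -> ['300', 'St', "David's", 'Street', 'USA'] (the 5 tokens above)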
39)
What is the mask key for
numeric values?
Ans: n
40)
What is the mask key for alpha
values?
Ans: a
41)
What is the mask key for special characters?
Ans: They are displayed as they are.
42)
What is the pattern report for the string '120 main streets apt 6c' if you are using the US rule set?
Ans: ^?TU>
43)
Character discrete investigate
examines a single domain? T/F
Ans: True
44)
Word investigation examines a
single domain?(T/F)
Ans: False
45)
What is the use of the Standardize stage?
Ans: By using the Standardize stage we can correct spelling, convert the data into a standard or consistent format, and validate the source data. By using this we can overcome most of the quality issues.
46)
What are the components in a prebuilt rule set, or in every rule set?
Ans: Every rule set has 5 components: a classification file, a dictionary file, a pattern-action file, overrides, and reference tables.
47)
Which stage gives you fully cleansed information? How?
Ans: The Standardize stage. We have many existing rule sets, and by using them we can get cleansed data.
48)
What is the default key value
for single numeric?
Ans: ^
49)
What is the default key value
for one or more unknown alphas?
Ans: ?
50)
What is the default key value
for single unclassified alpha (word)?
Ans: +
51)
What is the complex mixed token?
Ans: @ (for example, C3PO)
52)
What is the leading numeric token?
Ans: > (for example, 6A)
53)
What is the trailing numeric token?
Ans: < (for example, A6)
54)
What is the default key value for street type in the Standardize stage?
Ans: T
55)
What is the default key value for unit in the Standardize stage?
Ans: U
56)
What is the classification file and its importance?
Ans: It contains the tokens, the corresponding standard values for the tokens, threshold weights, and comments. The file is used by the PAT file to identify the tokens coming from the source.
57)
What is the syntax to create the classification file?
Ans: token / standard-value / class / [threshold-weights] / [; comments]
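For illustration only, an entry following that layout might look like this (the token, weight, and comment are hypothetical, not taken from a shipped rule set):
APARTMENT APT U 800 ;unit-type token, standardized to APT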
58)
What is the dictionary file? Its importance?
Ans: It contains the metadata of the columns generated by the Standardize stage.
59)
What is the syntax for creating the dictionary file?
Ans: field-identifier / field-type / field-length / missing-value / description
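A hypothetical entry following that layout (every field here, including the missing-value indicator, is illustrative only):
HN C 10 S HouseNumber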
60)
What is the pattern-action file and its importance?
Ans: It contains the set of action statements that handle the patterns generated from the classification table and the SEPLIST/STRIPLIST.
61)
What are override and lookup tables and their importance?
Ans: Any updates done by the user from the GUI are handled through the override tables. Reference tables are just like lookup tables in DataStage, used to check the incoming data against a given list of values.
62)
What are the types of rule sets
available in Standardize stage?
Ans: Country Identifier
Domain Preprocessor
Domain Specific
Validation
63)
What is the country rule set and its importance?
Ans: If we are getting multiple nations' data in a single source, we need to split the data country-wise before passing it to the actual Standardize stage, because the Standardize rule sets are country-specific. To categorize the data country-wise we use the country rule set.
64)
If the country rule set is not able to identify the country from the data, what country code will it assign?
Ans: The default country code, whichever one you mention in the job.
65)
What are the output columns generated by the COUNTRY rule set? What is their use?
Ans: It generates two extra columns along with the actual columns: a country identifier flag and an ISO country code. If the country is identified by the rule set, you get Y for the flag and the respective ISO country code in the country code column. If it is not identified, you get N for the flag and the default country code in the country code column.
66)
What are the domain-specific rule sets? When do we apply these rule sets to the data?
Ans: Once the data is categorized country-wise, or we are receiving a specific country's data, we go for the domain-specific rule sets.
67)
What are the domain pre-processor rule sets? When do we apply these rule sets to the data?
Ans: If we are receiving multiple domains' data in a single column, we apply the pre-processor rule sets to split the data into the proper domains; the output of the pre-processor rule sets is then passed to the domain-specific rule sets.
68)
What are the NYSIIS and Soundex algorithms, and where do they execute?
Ans: These are two phonetic algorithms used to help identify matching records. They execute as part of the PAT file to identify the matched records.
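As a quick illustration of the phonetic idea (this is the standard Soundex algorithm in Python, not the QualityStage internals):

def soundex(name: str) -> str:
    codes = {**dict.fromkeys("BFPV", "1"), **dict.fromkeys("CGJKQSXZ", "2"),
             **dict.fromkeys("DT", "3"), "L": "4",
             **dict.fromkeys("MN", "5"), "R": "6"}
    name = name.upper()
    result = name[0]
    prev = codes.get(name[0], "")
    for ch in name[1:]:
        code = codes.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":   # H and W do not reset the previous code
            prev = code
    return (result + "000")[:4]   # pad or truncate to 4 characters

print(soundex("SMITH"), soundex("SMYTHE"))   # both give S530

Because SMITH and SMYTHE reduce to the same code, a phonetic key lets spelling variants of a name become candidates for the same match group.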
69)
What are the output columns generated by the Standardize stage, and what is the importance of a few key columns?
Ans: The Standardize stage generates around 20 to 25 columns along with the input columns. A few important ones are input pattern, unhandled pattern, and unhandled text. Based on these columns we separate the valid data from the invalid data.
70)
What are input pattern, input text, unhandled pattern, and unhandled text in the Standardize stage?
Ans: Input pattern: gives the pattern of the input text.
Input text: gives the actual value from the source.
Unhandled pattern: gives the pattern for the unhandled data.
Unhandled data: gives the actual values that were not processed by the Standardize stage.
71)
What is the threshold value and its importance?
Ans: The threshold value is used to identify matched records. This value should be between 0 and 900: 900 means an exact match, 850 means roughly a one-character mismatch, and so on.
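A toy rendering of that 0-to-900 scale using a generic string-similarity ratio (illustrative only; the real comparison logic in the tool is more sophisticated):

from difflib import SequenceMatcher

def similarity_score(a: str, b: str) -> int:
    # Scale a 0.0-1.0 similarity ratio onto the 0-900 range.
    return round(900 * SequenceMatcher(None, a, b).ratio())

print(similarity_score("SMITH", "SMITH"))   # 900: exact match
print(similarity_score("SMITH", "SMYTH"))   # 720: one character differs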
72)
What is the importance of custom rule sets?
Ans: If the existing rule sets are not sufficient to handle the source data, we build custom rule sets with our own logic.
73)
How will you build custom rule sets?
Ans: Right-click on any folder --> New --> Data Quality --> Rule Set
74)
Can we modify the existing rule sets?
Ans: We cannot modify the existing rule sets directly because they are read-only. If you want, you can take a copy of an existing rule set and make your changes on the copy.
75)
What is WAVES and its importance?
Ans: WAVES is one of the stages provided by QS to validate address fields.
76)
What is MNS and its importance?
Ans: MNS is also provided by QS to validate address fields.
Note: Both WAVES and MNS are older stages provided by QS. From 8.7 onwards, the AVI stage was added in their place; it can do the same work that both of these stages do.
77)
What is the SQA stage and its importance?
Ans: SQA stands for Standardization Quality Assessment. SQA can create reports in graphical format for the output generated by the Standardize stage.
78)
What is the execution order of the Investigate, Standardize, Match, and Survive stages?
Ans: Investigate --> Standardize --> Match --> Survive
79)
How can I get the substring of a string in Pattern-Action Language?
Ans: +[{}(1:3)="IBM"] takes the first 3 characters of the unclassified token, which should be equal to IBM.
80)
What is the importance of the COPY_S statement?
Ans: We can preserve the spaces between words by using this statement.
Ex:
^ | ? | T
COPY [1] {HouseNumber}
COPY_S [2] {StreetName}
COPY_A [3] {StreetSuffixType}
81)
What is the importance of the COPY statement?
Ans: Dictionary columns can be copied to other dictionary columns or to user variables.
Ex:
COPY {HouseNumber} {HC}
COPY {HouseNumber} temp
82)
What is bucketing?
Ans: The process of moving the data into dictionary field columns after the execution of the PAT statements is called bucketing the data.
83)
What is the country code
identifier for United States?
Ans: ZQUSZQ
84)
How does the Pattern-Action Language define the string (10 Hollow Oak Road)?
Ans: Pattern (^ ? T), standard form (HN SN ST)
85)
What is the use of a lookup table?
Ans: Just to check the incoming data against a given list of values. This list of values is kept in a reference table.
86)
Please explain how the Standardize stage processes the string (10 MAPPLE STREET APARTMENT 222).
Ans:
1) First, the stage parses the data, dividing it into tokens:
PARSING: 10 | MAPPLE | STREET | APARTMENT | 222
2) Using the classification table, it assigns the classification codes to the tokens:
CLASSIFICATION: ^ | ? | T | U | ^
3) Using the dictionary file, it defines the output columns:
DICTIONARY FILE: HOUSENUMBER | STREET NAME | STREET TYPE | UNIT TYPE | UNIT
4) Finally, using the pattern-action file, the stage produces the standardized data:
PATTERN FILE: 10 | MAPPLE | ST | APT | 222
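Seen end to end, the flow above is essentially tokenize, classify, and bucket. A compressed, purely illustrative Python sketch of the same example (the classification dictionary here is hypothetical, standing in for the rule set):

from typing import Tuple

CLASSIFIED = {"STREET": ("T", "ST"), "APARTMENT": ("U", "APT")}

def classify(token: str) -> Tuple[str, str]:
    if token in CLASSIFIED:
        return CLASSIFIED[token]   # classified token -> standard form
    if token.isdigit():
        return ("^", token)        # single numeric
    return ("?", token)            # unknown alpha word

tokens = "10 MAPPLE STREET APARTMENT 222".split()
pattern = "".join(classify(t)[0] for t in tokens)   # "^?TU^"
buckets = dict(zip(["HouseNumber", "StreetName", "StreetType",
                    "UnitType", "Unit"],
                   [classify(t)[1] for t in tokens]))
print(pattern, buckets)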
87)
What are the Standardize results? Business intelligence fields, matching fields, and reporting fields.
Ans: These 3 types of fields are the categories of the dictionary file columns.
88)
What are the business intelligence fields?
Ans: Parsed from the original data, they may be used in matching, and generally they are moved to the target system. Ex: First Name, Generational, Unit types, Box types, Zip5.
89)
What are the matching fields?
Ans: Generally these fields are created to help during the match process and are dropped after successful matching. Ex: phonetic coding (NYSIIS), hash keys (the first 2 characters of the first five words), packed keys (data concatenated).
90)
What are the reporting fields?
Ans: Fields specifically created to help review the results of Standardize and to recognize handled and unhandled data. Ex: unhandled pattern (the pattern for tokens not processed by the selected rule set), unhandled data (the remaining tokens not processed by the selected rule set), and input pattern (the pattern generated for the stream of input tokens, based on the parsing rules and token classifications).
91)
In which situations can you modify the rule set?
Ans: When the existing PAT file logic, or the tokens mentioned in the classification file, etc., are not sufficient to handle the data, you can modify the rule sets accordingly.
92)
What are the extensions for the classification file, dictionary file, and pattern-action file?
Ans: .cls for the classification file, .dct for the dictionary file, and .pat for the pattern-action file.
93)
Suppose you find that "SMITH" is unhandled data; how can you override the classification?
Ans:
1) Open the override table.
2) Enter 'SMITH' as the token value.
3) Enter 'SMITH' as the standard value.
4) Enter 'F' (First Name) in the class field.
94)
"JOHN SMITH" is a valid name in the USNAME.CLS table; what is the value pattern?
Ans: F L (First Name, Last Name)
95)
What is the use of user overrides?
Ans: They provide the user with the ability to modify rule sets.
96)
Which types of rule sets can be modified using user overrides?
Ans: Domain pre-processor rule sets and domain-specific rule sets.
97)
What information is required before creating an override?
Ans: The dictionary field name to move the token to, the original or standard value of the token, and leading space or no leading space for multiple tokens moved to the same dictionary field.
98)
What are 'Input Text overrides'?
Ans: They allow the user to specify overrides based on an entire text string.
99)
WAVES can standardize name
fields (T/F)?
Ans: False
100)
Rule sets are used in
standardization processing (T/F)?
Ans: True
101)
When you create a copy of an existing rule set, what components are copied?
Ans: The classification file, dictionary file, pattern-action file, and overrides.
102)
How many operands does the pattern ^ | D | ? | T have?
Ans: 4
103)
What pattern action command
denotes the universal class?
Ans: **
104)
When would you use the 'At Least One' survive technique?
Ans: When you want to ensure that a record from each match group survives.
106)
Which set of processes describes the proper order of steps applied during the Standardize process?
Ans: Parse, Tokenize, Bucket.
107)
When defining the match variable Last Name to be used in the match, what would be the appropriate match type to use in the processing logic?
Ans: UNCERT
108)
In a reference match, which match option would allow a single reference source record to match many data source records?
Ans: Many-to-one
109)
Which list would you make an entry into to remove all occurrences of a character from the input data?
Ans: STRIPLIST
110)
What is the pattern for the below string? 1ST & MAIN ST
Ans: >|\&|?|T
111)
What is the pattern for the below string? '115 1/2 South Oak St'
Ans: ^|^|/|^|D|?|T
112)
What is the Match stage and its importance?
Ans: It identifies the duplicates in the standardized data.
113)
Which files do we pass as input to the Match stage?
Ans: The Standardize stage output file and the match frequency file.
114)
What is the Match Frequency stage and its importance?
Ans: It generates the frequency distribution of the data in a form that can be understood by the Match stage.
115)
What is the default frequency value?
Ans: 100
116)
What are the output columns generated by the Match Frequency stage?
Ans: qsFreqValue, qsFreqCounts, qsFreqColumnID, qsFreqHeaderFlag
117)
What are blocking columns in a match specification?
Ans: Blocking provides a method of limiting the number of record pairs to examine. In other words, it lets us process the data in groups of likely-matching records.
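A rough illustration of the idea (the block key and fields here are hypothetical):

from collections import defaultdict
from itertools import combinations

records = [
    {"id": 1, "last": "SMITH", "zip": "02116"},
    {"id": 2, "last": "SMYTH", "zip": "02116"},
    {"id": 3, "last": "JONES", "zip": "10001"},
]

# Only records sharing a block key (first letter of the last name plus
# the zip code) are compared pairwise.
blocks = defaultdict(list)
for r in records:
    blocks[(r["last"][0], r["zip"])].append(r)

pairs = [p for group in blocks.values() for p in combinations(group, 2)]
print([(a["id"], b["id"]) for a, b in pairs])   # only (1, 2) is compared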
118)
What is a match specification?
Ans: A match specification is the QS match component that contains the mechanism to identify duplicate records, clerical records, master records, and residual records. The match specification is called as part of the Match stages.
119)
What are match commands in a match specification?
Ans: They define the process of computing the weights for the input records, based on the columns you mention in the match commands.
120)
What are the cutoff values?
Ans: The cutoff values are used to classify the weighted records into master, duplicate, clerical, and residual records.
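As a small illustration of how the two cutoffs partition the weighted records (the cutoff values below are hypothetical; real ones come from the match specification):

MATCH_CUTOFF, CLERICAL_CUTOFF = 20.0, 12.0

def classify_weight(weight: float) -> str:
    if weight >= MATCH_CUTOFF:
        return "duplicate"   # at or above the match cutoff
    if weight >= CLERICAL_CUTOFF:
        return "clerical"    # between the cutoffs: needs manual review
    return "residual"        # below the clerical cutoff: a non-match

for w in (25.3, 15.0, 3.2):
    print(w, classify_weight(w))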
121)
What is the m-probability?
Ans: It is defined as the probability of the variable agreeing in a matched pair.
122)
What is the u-probability?
Ans: The u-probability can be approximated as the probability that a field agrees at random.
123)
What is a master record?
Ans: The record with the maximum weight in a match set is treated as the master record (the best record among the duplicates).
124)
What is a duplicate record? What is the strategy to identify duplicate records?
Ans: If a record gets a weight above the match cutoff, it is treated as a duplicate record.
125)
What are clerical records? What is the strategy to identify clerical records?
Ans: The potential duplicates whose weights fall in the clerical range (at or above the clerical cutoff but below the match cutoff); these need manual review.
126)
What are agreement and disagreement weights?
Ans: If the compared fields match, the agreement weight is assigned; it is always positive. If the compared fields do not match, the disagreement weight is assigned; it is a negative value.
127)
What are the different kinds of match comparison types available?
Ans: A match comparison contains the logic that determines which columns match and which do not. There are around 25 to 30 comparison types; based on your requirement you select the necessary comparison type.
128)
Explain the CHAR and CNT_DIFF comparison types.
Ans: CHAR: compares data values on a character-by-character basis. This comparison is often used to catch spelling mistakes or inverted letters.
CNT_DIFF: compares two strings of numbers and assigns agreement or disagreement weights based on the number of differences between the numbers in the strings. Weights are prorated according to the magnitude of the disagreement.
129)
What are the types of Match stages?
Ans: The Unduplicate Match stage and the Reference Match stage.
130)
What is Unduplicate Match Stage?
Ans:
Unduplicate match locates and groups all similar records within a single input
data source. This process identifies potential duplicate records, which might
then be removed.
131)
What is Reference Match Stage?
Ans:
Reference Match identifies relationships among records in two data sources.
132)
What are the different kinds of match types we can perform by using the Unduplicate Match stage?
Ans: Unduplicate dependent, unduplicate independent, and unduplicate transitive.
133)
What are the different kinds of match types we can perform by using the Reference Match stage?
Ans: Reference one-to-one, many-to-one, many-to-one multiple, and many-to-one duplicate.
134)
What are the dependent, independent, and transitive match types?
Ans: In a dependent match, the passes run in sequence and records identified as duplicates in one pass are removed from consideration in later passes. In an independent match, each pass works against the full data and the resulting match groups are merged afterwards. A transitive match groups records that are linked through a chain of matching pairs (if A matches B and B matches C, all three are grouped together).
135)
What are the one-to-one, many-to-one, many-to-one multiple, and many-to-one duplicate match types?
Ans: In a one-to-one match, a reference record can match at most one data record. In a many-to-one match, a single reference record can match many data records (see question 108). The many-to-one multiple and many-to-one duplicate options differ in how additional reference records that also match a data record are reported and flagged.
136)
What is a residual record?
Ans: Residual records are the non-duplicate records; their weight is below the clerical cutoff value.
137)
What are the output columns generated by the Match stage?
Ans: See the answer to the next question.
138)
What are the qsMatchSetID, qsMatchDataID, and qsMatchType columns and their importance?
Ans:
1. qsMatchSetID: assigns the same set ID to similar records. Within one set ID you can see the master record, the duplicate records, and the clerical records.
2. qsMatchType: gives the codes that identify master, duplicate, and clerical records: MP for master records, DA for duplicate records, and CA for clerical records. These records are identified by the cutoff values mentioned in the specification.
3. qsMatchPassNumber: if the match specification has multiple passes, it gives the pass number in which the record was identified as master, duplicate, or clerical.
4. qsMatchWeight: gives the weight generated by the match specification. The weight is the consolidated agreement and disagreement weight, calculated from the m-probability and u-probability specified in the specification.
5. qsMatchDataID: gives the source row number.
6. qsMatchPattern: contains one character for each of the first 16 match comparisons in the match pass where the pair matched. Each character indicates the result of the match comparison it corresponds to and can be one of the following values:
0 - The values are not missing and the values for the column disagree; or, if the match pass contains fewer than 16 match comparisons, no match comparison occurred in that position. For example, if the qsMatchPattern column is set to 3321000000000000, the match pass might contain only four match comparisons.
1 - Both values are missing.
2 - One of the values is missing.
3 - The values agree: both values are present and the weight for the comparison is above the missing weight for the column. By default, the missing weight is set to 0.
7. qsMatchLRFlag: indicates the house number interval that a particular address matches. Possible values are L, which indicates that an address matches the first interval in the comparison, and R, which indicates that an address matches the second interval.
139)
What is the qsMatchPattern column in the Match output columns?
Ans: See the answer to the previous question.
140)
How does the Match stage calculate the composite weight for the input data? (Formulas for the m-probability and u-probability.)
Ans: An agreement weight is computed when the comparison between a pair of columns agrees:
log2(m-probability / u-probability)
A disagreement weight is computed when the comparison between the pair of columns disagrees:
log2((1 - m-probability) / (1 - u-probability))
The composite weight is the sum of these per-column weights.
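A rough Python illustration of these formulas (the m and u values below are made-up examples, not defaults):

from math import log2

# Agreement/disagreement weights for one column, per the formulas above.
def weights(m: float, u: float) -> tuple[float, float]:
    agreement = log2(m / u)                  # applied when the columns agree
    disagreement = log2((1 - m) / (1 - u))   # applied when they disagree
    return agreement, disagreement

# e.g. a column that agrees 90% of the time in true matches and 2% at random:
agree_w, disagree_w = weights(m=0.9, u=0.02)
print(round(agree_w, 2), round(disagree_w, 2))   # about 5.49 and -3.29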
141)
What is the Survive stage and its importance?
Ans: The Survive stage consolidates duplicate records, creating a best-of-breed representation of the matched data so that companies can use it to load a master data record, cross-populate all data sources, or both.
142)
What is the input for the Survive stage?
Ans: The Match stage output acts as the input to the Survive stage.
143)
What are the different kinds of survive techniques available?
Ans: Several techniques are available, among them Longest, Most Frequent (Non blank), and At Least One (see questions 104, 146, and 147), as well as complex survive expressions.
144)
What is a complex survive expression and its importance?
Ans: By using a complex survive expression we can implement complex survive logic to identify the best-of-breed record.
145)
What are b.colname and c.colname?
Ans: b refers to the best record (so far) and c refers to the current record.
146)
What will happen if we mention the "Longest" technique for the target column?
Ans: It compares the records and passes the longest value in the compared column to the output.
147)
What will happen if we select the "Most Frequent (Non blank)" technique for the target column?
Ans: It sends to the output the value that occurs most often across the compared records, ignoring blanks.
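A small illustration of these two techniques on one column of a match group (the column values are made up):

from collections import Counter

group = ["123 MAIN ST", "123 MAIN STREET", "", "123 MAIN STREET"]

def longest(values):
    return max(values, key=len)

def most_frequent_non_blank(values):
    counts = Counter(v for v in values if v.strip())
    return counts.most_common(1)[0][0]

print(longest(group))                   # 123 MAIN STREET
print(most_frequent_non_blank(group))   # 123 MAIN STREET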
148)
What is the importance of the AVI stage?
Ans: It validates the address fields coming from the source and gives us accurate information by correcting the source data if needed. This stage was newly added in DS 8.7.
149)
What is the use of the reference data available for AVI?
Ans: Reference files are just like reference tables. AVI takes the source value and checks it in the reference files. If any change is needed it corrects the data as specified in the files; otherwise it sends the data through as-is.
150)
What is the difference between the AVI and Standardize stages?
Ans: The AVI stage works on the actual data, but the Standardize stage works on patterns.
151)
Can we do name validation by using AVI?
Ans: No.
152)
What are the fast paths in AVI and their importance?
Ans: AVI has 4 fast paths. By using them we can set the required AVI properties quickly.
153)
What kinds of reports can we generate in AVI?
Ans: A suggestion report and a correction report.
154)
What is the suggestion report and its importance?
Ans: The suggestion report gives us suggestions for the data. AVI checks each source record against the reference files, and if it finds any suggestions it includes those records in the output file as suggestions.
155)
What is the correction report and its importance?
Ans: It takes the source data and checks it against the reference files; if it finds any correction for the incoming data, it directly corrects that record while sending it to the output.
156)
If we are getting wrong data as part of the source data, can AVI validate and correct it?
Ans: Yes, it corrects it.
157)
What are the status codes generated by AVI, and the importance of each code?
Ans:
Value - Description
0 - Field is not applicable to return field status
1 - Verified using reference data; no changes needed
2 - Verified using reference data; an alias change was made
3 - Verified using reference data; a small spelling change was made
4 - Verified using reference data; a large spelling change was made
5 - This field was added from the reference data
6 - Identified using lexicon data; no changes needed
7 - Identified using lexicon data; an alias change was made
8 - Field was identified using context rules
9 - Field is empty
10 - Field was unrecognized
158)
What is the error output file in AVI and its importance?
Ans: It generates separate output for error codes and error messages.
159)
What are the different kinds of error codes and messages we have in AVI?
Ans: Table 1. Error codes and messages for unprocessed records:
Value - Description
204 - Country or region is not recognized
205 - Country or region postal validation reference file not found
206 - Country or region postal validation reference file in the wrong format, or the data is corrupted
207 - Country or region postal validation reference file access denied
AD_STAT_CRR_NOT_INITIALISED = -9 - Triggered if the Validate or Parse method of the AddressDoctor object is called without calling the Initialize method first. Ensure that the AddressDoctor object always calls the Initialize method before Validate or Parse is called for the first time.
AD_STAT_CERR_ILLEGAL_ACCESS_CODE = -151 - The unlock code supplied in the Initialize call is incorrect. Check the code. When using the C interface, make sure to escape all occurrences of the \ character to ensure proper interpretation of the code.
AD_STAT_WR_PRELOADING_FAILED = 158 - Postal validation reference files that were added to the preloading object could not be preloaded. The most likely reason is insufficient memory to preload the file; however, insufficient access rights to the reference files can trigger this error as well.
AD_STAT_ERR_NOT_POSSIBLE_FOR_SELECTED_ELEMENT = 201 - Triggered if a method or property cannot be called for the selected CurrentView element of the AddressObject. In most cases this is caused by setting conflicts, for example setting LINE_1 after setting LINES_ALL, or setting STREET after setting DELIVERY_ADDRESSLINE.
AD_STAT_ERR_COUNTRY_DB_NOT_FOUND = 205 - The postal validation reference file could not be found. Ensure that the required postal validation reference file is available in the path supplied in the Initialize call.
160)
What is the validation summary report and its importance?
Ans: It gives information about the total number of records passed to AVI, how many were identified and validated, and how many were not identified, etc.
161)
What is the verification level in AVI, and what are the different types of verification levels we have in AVI?
Ans: It gives information about the level to which your input data matched. Table 5. Address levels used in statuses and descriptions:
Level - Description
5 - Delivery point, post office box, or subbuilding
4 - Premise or building
3 - Thoroughfare. An example might be a suburb, a neighborhood, or a section of a town or city
2 - Locality. An example might be an area that is outside of a city or town but within a specific radius
1 - Administrative area. An example might be a province, district, or county
0 - No validation processing performed. Addresses that cannot be validated or corrected are left unchanged in the output
162)
What are the columns added by AVI to the output?
Ans: Whatever columns you pass to AVI, it gives all of them back as-is, and it also generates a few additional columns ending with '_QSAV'. A few important ones are the accuracy code and the status code.
163)
What is the accuracy code and its importance?
Ans: It gives you information about the level to which your input data matched, and the level to which your data has been changed and validated.
Ex: V44-I44-P8-100 (for more information, see the QS 8.7 documentation)
164)
What is the Data Rules stage and its importance?
Ans: The Data Rules stage was newly added in QS 8.7. By using it we can do IA work from QS itself without going to IA: we can build rules the same way we create them in IA and run those rules on the incoming data.
165)
What is binding in a data rule?
Ans: Mapping the input columns to the rule definition.
166)
What are the different kinds of functions we have in the Data Rules stage?
Ans: We have several functions in the rules stage, like the ones we have in IA.
167)
What is rule logic?
Ans: It is a combination of functions, variables, and operators.
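For example, a hypothetical rule logic written with IA-style checks (the column name and format string are invented for illustration) might be:
zipcode exists AND zipcode matches_format '99999'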
168)
How can we create a new rule?
Ans: By selecting the 'create new' option available in the data rules window and entering the actual logic of the rule in the rule logic block. The rule logic needs to be validated before mapping the columns to the rule.
169)
What are the different kinds of components we have as part of the Data Rules stage?
Ans: We have several components, such as input links, rule variables, selected rule definitions, input, etc.
170)
How can we publish the rules?
Ans: After creating the rule logic, if you want to make it global you have to publish it by selecting the 'Publish' option available in the rules window.
171)
What are the columns generated by the rules stage?
Ans: The Data Rules stage does not generate any stage-generated columns.
172)
How can we select the invalid data?
Ans: Depending on the type of rule you have created, it generates the output file with the data.
Some More Questions
1)
What is data quality? Why do we need data quality in today's business?
Ans: Data quality means data you can trust. It is needed to get accurate and consistent (standard) data into the DWH.
2)
What types of data quality problems have you faced?
Ans: Inconsistent standards, spelling mistakes, default or invalid values, buried information, data anomalies, and data surprises, etc.
3)
What different quality tools are available? What are the differences between IA and QS?
Ans: From IBM, two tools are available: IA and QS. QS requires some technical skills, and by using QS we can modify and standardize the data. IA does not need much technical skill, but using IA we cannot standardize the data; it only shows what kinds of patterns and types of data we are getting.
4)
What are the stages in the WebSphere QualityStage process?
Ans: Investigate, Standardize, Match Frequency, Unduplicate Match, Reference Match, MNS, WAVES, AVI (added in 8.7), and the Data Rules stage (added in 8.7).
5)
Which stage gives you complete visibility into the actual condition of the data? How?
Ans: The Investigate stage, by creating pattern and token reports as well as character concatenate and character discrete reports.
6)
In which stage can you build the best available view of related information?
Ans: The Standardize stage.
7)
What is the use of the Investigate stage?
Ans: To investigate the source data in order to understand the nature, scope, and detail of the data quality challenges.
8)
How many input links does the Investigate stage support?
Ans: Only one.
9)
Can we give DataStage output as input to the Investigate stage?
Ans: Yes, we can.