Reference Guide#
Introduction#
Overview#
MatchUp Object Global is an extremely fast and powerful programmer’s tool that can be integrated into custom applications to eliminate duplicate records. Because merge/purge and data quality initiatives go hand in hand, the powerful features of this tool fulfill the needs of many companies. Reducing printing costs, increasing response rates, maintaining an efficient database, and achieving better quality data are just some of the many benefits of the merge/purge process.
MatchUp Object Global allows developers to customize exactly how to merge and purge data to suit their business needs. This gives people the flexibility to integrate MatchUp at different points of their processes, from point of entry to batch processing on the back end.
Component Matching
MatchUp Object Global can find matches in any combination of over 35 different components, from common ones like address, city, state, ZIP™, name, and phone to less common elements such as email address, company, gender, and social security number. Developers can even specify their own custom components. Each set of rules for matching is referred to as a matchcode, and a matchcode can apply up to 16 rules at a time. These rules are specified as combinations of components. A commonly used combination would be {Last Name + Street # + Street Name + ZIP Code™}, while another combination in the same matchcode would substitute PO Box™ for Street # and Street Name. With these options, the number of potential matching rules is limitless.
Matching Algorithms
MatchUp Object Global is a very sophisticated tool. If records 1 and 3 match with combination #1 in a matchcode, and records 2 and 3 match using combination #2, MatchUp Object Global will use inferred matching and put records 1, 2, and 3 into the same group. It can split address, city/state/ZIP and name fields on the fly, as well as recognize phonemes like “ph” and “sh,” nicknames (Liz, Beth, Betty, Elizabeth), and alternate spellings of names (Gene, Jean, Jeanne).
MatchUp Object Global can also handle nearly-exact strings of characters, such as “Lewis” vs. “Ewis,” and “Palacino” vs. “Al Pacino” as well as initials such as “John Smith” to “J Smith.” These are just a few examples of the powerful matching algorithms at your disposal: Exact Match; Phonetic; Soundex; Containment; Frequency; Frequency Near; Fast Near; Accurate Near; Vowels Only; Consonants Only; Alphas Only; and Numerics Only.
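The inferred matching described above behaves like transitive grouping: records that are linked through any chain of matches end up in one group. The following is a minimal sketch of that idea using a union-find structure. It is an illustration only, not MatchUp's actual implementation, and `group_records` is a hypothetical helper name.

```python
def group_records(record_ids, matched_pairs):
    """Group records so that any chain of pairwise matches shares one group."""
    parent = {r: r for r in record_ids}

    def find(r):
        # Follow parent links to the group's representative (with path halving).
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for r in record_ids:
        groups.setdefault(find(r), []).append(r)
    return sorted(sorted(g) for g in groups.values())


# Records 1 and 3 matched on one combination, records 2 and 3 on another.
print(group_records([1, 2, 3, 4], [(1, 3), (2, 3)]))  # [[1, 2, 3], [4]]
```

Records 1 and 2 never matched each other directly, yet they share a group because both matched record 3.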
Speed
Speed is also an important feature of MatchUp Object Global. It can process an average of 10 to 50 million records per hour. MatchUp Object Global includes a 64-bit version to take advantage of newer processors and operating systems. The COM and .NET versions of MatchUp Object Global ease integration with Microsoft languages.
Key Concepts#
The following concepts are essential to understanding the logic behind MatchUp Object’s methods and to integrating the product successfully into applications.
Match Keys#
Match Keys are string tokens that represent a database record. They contain only the information necessary to determine a record’s unique or duplicate status.
Because they only contain a reduced portion of the data in the actual record, MatchUp Object is able to use these keys more efficiently than if it had to compare the complete record against every other record in the database.
Clustering#
Once a matchcode key is generated for a given record, it can be compared to the keys of other records. Ideally, every record’s key would be compared to every other record’s key. This, however, is practical only in very trivial applications because the number of comparisons grows quadratically with the number of records processed. For example, a record set of 100 records requires 4,950 comparisons (99 + 98 +…). A larger set of 10,000 records requires 49,995,000 comparisons (9,999 + 9,998 +…). Large record sets would take prohibitive amounts of time to process.
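The comparison counts quoted above follow from the formula n × (n − 1) / 2 for the number of unordered pairs among n records. A short sketch to reproduce the figures (the function name is illustrative):

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unordered record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


for n in (100, 10_000):
    print(f"{n:,} records -> {pairwise_comparisons(n):,} comparisons")
# 100 records -> 4,950 comparisons
# 10,000 records -> 49,995,000 comparisons
```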
So we made the assumption that for two matchcode keys to be considered matching, something in the keys must match exactly. In many cases, this will be all or part of the ZIP/Postal Code. MatchUp Object therefore only compares records that are (in this example) in the same ZIP or Postal Code. In the US, using 5-digit ZIP Codes, this cuts the average number of comparisons per record by a factor of thousands.
This concept is known as “break grouping,” “clustering,” “partitioning,” or “neighborhood sorting.” It is very likely that most, if not all, other deduping programs use some form of clustering.
Here is an example set of matchcode keys using ZIP/Postal Code (5 characters), Last Name(4), First Name(2), Street Number(3), Street Name(5):
02346BERNMA49 GARD
02346BERNMA49 GARD
02357STARBR18 DAME
02357MILLLI123MAIN
03212STARMA18 DAME
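As a rough illustration of how clustering reduces work, the sketch below groups the five example keys by their first five characters (the ZIP/Postal Code portion) and counts the comparisons needed inside each cluster. This is a simplified model of the concept, not MatchUp's internal algorithm, and the helper names are hypothetical.

```python
from collections import defaultdict

KEYS = [
    "02346BERNMA49 GARD",
    "02346BERNMA49 GARD",
    "02357STARBR18 DAME",
    "02357MILLLI123MAIN",
    "03212STARMA18 DAME",
]


def cluster_by_prefix(keys, prefix_len=5):
    """Group keys that share the same leading characters (here, the ZIP)."""
    clusters = defaultdict(list)
    for key in keys:
        clusters[key[:prefix_len]].append(key)
    return dict(clusters)


def comparisons_within(clusters):
    """Count pairwise comparisons when only keys in the same cluster are compared."""
    return sum(len(m) * (len(m) - 1) // 2 for m in clusters.values())


clusters = cluster_by_prefix(KEYS)
print(len(clusters), "clusters,", comparisons_within(clusters), "comparisons")
# 3 clusters, 2 comparisons
```

Comparing all five keys against each other would take 10 comparisons; restricting comparisons to keys in the same ZIP Code leaves only 2.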
Licensing#
Entering Your MatchUp Object License Key
The License Key is a software key that unlocks the functionality of the component. Without this key, the object does not function. You set the License Key using an environment variable called MD_LICENSE. If you are just trying out MatchUp Object and have a demo License Key, you can use the environment variable MD_LICENSE_DEMO for this purpose. This avoids conflicts or confusion if you already have active subscriptions to other Melissa Data object products.
In earlier versions of MatchUp Object, you would set this value with a call to the SetLicenseString function. Using an environment variable makes it much easier to update the License Key without having to edit and re-compile the application.
It used to be necessary, even when employing an environment variable, to call the SetLicenseString function without passing the License Key value. This is no longer true. MatchUp Object will still recognize the SetLicenseString function, but you should eventually remove any reference to it from your code.
Windows
Windows users can set environment variables by doing the following:
1. Select Start > Settings, and then click Control Panel.
2. Double-click System, and then click the Advanced tab.
3. Click Environment Variables, and then select either System Variables or Variables for the user X.
4. Click New.
5. Enter “MD_LICENSE” in the Variable Name box.
6. Enter the License Key in the Variable Value box, and then click OK.
Please remember that these settings take effect only upon start of the program. It may be necessary to quit and restart the application to incorporate the changes.
Linux
Unix-based OS users can set the License Key with the following command (substituting the actual License Key):
export MD_LICENSE=A1B2C3D4E5
If this setting is placed in the profile, remember to restart the shell.
MatchUp Object also used to employ its own environment variable, mdMatchUp_LICENSE. The MD_LICENSE variable is shared across the entire Melissa Data product line of programming tools. MatchUp Object will still use the old License Key variable for the time being, but you should transition to using MD_LICENSE as soon as possible.
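Because the License Key is read from the environment when the program starts, an application may want to confirm that one of these variables is set before initializing MatchUp Object. The sketch below is a hypothetical pre-flight check. The order in which it looks at the three variable names is an assumption of this example, not documented MatchUp behavior.

```python
import os


def resolve_license(env=None):
    """Return (variable_name, value) for the first license variable that is set.

    The lookup order below is an assumption made for this sketch.
    """
    env = os.environ if env is None else env
    for name in ("MD_LICENSE", "MD_LICENSE_DEMO", "mdMatchUp_LICENSE"):
        value = env.get(name, "").strip()
        if value:
            return name, value
    return None


found = resolve_license()
if found is None:
    print("No License Key variable is set; set MD_LICENSE and restart the application.")
else:
    print("License Key found in", found[0])
```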
MatchUp Object Global Features & Benefits#
Fast processing, about 10-50 million records per hour
Extremely flexible and customizable
22 powerful matching algorithms
Split name, address, and city/state/ZIP fields on the fly
Easy to learn and use
Sample Code provided in C#, VB.NET, C++, FoxPro, Java, SQL Server
Free tech support
Setup and List of Files#
To see how to set up the MatchUp Object Global, the list of files that will be downloaded and used, and the system requirements to run the Object, please visit the GitHub Sample Code.
Interfaces#
This API provides the developer with different interfaces allowing for maximum flexibility in selecting the best method for the application.
Read-Write Interface#
Overview#
The Read/Write Interface is usually used for processing entire lists. It works in a manner similar to the way that the MatchUp software product does. A calling program passes an entire list to the Read/Write deduping engine one record at a time. When the entire list has been passed, the calling program tells the API to process the records. Then, the calling program retrieves each record, along with additional deduplication information, from the Read/Write interface.
Read/Write deduping consists of the following steps:
1. One by one, the program sends a series of record data (ZIP/PC, Name, Address, etc.) to the MatchUp API.
2. When step 1 is complete, the program sends a “process” command to the API.
3. The program retrieves the results for each record with deduplication information.
Order of Output Records
The program will send records in a particular sequence, either in record (raw) order, or maybe in a more sophisticated manner (by ZIP/PC, record type, and so on). MatchUp Object will not return the records in the same order. By default, records are output in cluster order. This order will be loosely based on the matchcode. For example, if the matchcode has Zip5 as its first component, output records will be more or less sorted by ZIP Code (but the developer should not count on this). If the application called the SetGroupSorting function, records in the same dupe group will be adjacent. Otherwise, duplicate records may or may not be adjacent (though they usually are near each other).
If a certain sequence is important (for example, records ordered in the same sequence they were input), sort the results after MatchUp Object has processed the data.
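One way to restore the input sequence is to pass a running record number as the UserInfo value and sort on it afterwards. The sketch below assumes exactly that convention. The dictionaries stand in for values the application would collect from the interface after processing, and the field names are hypothetical.

```python
def restore_input_order(results):
    """Sort processed records by the sequence number stored in UserInfo."""
    return sorted(results, key=lambda rec: int(rec["user_info"]))


# Hypothetical results as they might come back in cluster order.
processed = [
    {"user_info": "3", "dupe_group": 1},
    {"user_info": "1", "dupe_group": 1},
    {"user_info": "2", "dupe_group": 2},
]

for rec in restore_input_order(processed):
    print(rec["user_info"], rec["dupe_group"])
```

Converting UserInfo to an integer before sorting avoids the string ordering problem where "10" would sort before "2".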
Data Lifetime
A Read/Write deduping session is relatively short-lived. Although the actual action of reading and writing records may take time (hours or days), the process is strictly divided into three distinct steps. The key file does not persist beyond this point. Because of this, Read/Write deduping is not usually the choice for ongoing or online processes.
Record Identity
Because MatchUp Object does not read or write directly to the database, some mechanism must be provided so that the application can match each record back to the original data source. The SetUserInfo function allows the application to pass a unique identifier for each record.
Order of Operations#
Basic Steps#
These are the basic steps of a typical implementation of the Read-Write Interface.
1. Initialize the Read-Write Interface.
   After creating an instance of the Read-Write Interface, point the object toward its supporting data file, select a matchcode and key file to use, and initialize these files.
2. Create field mappings.
   In order to build a key to be written to the key file, the Read-Write Interface needs to know which types of data the application will be passing to the Interface and in what order.
3. Read the records from the database.
   Loop through the master database and get the data fields needed to build a key, according to the mappings defined in step 2.
4. Build a match key for each record.
   This consists of passing the actual data to the Interface in the same order used when creating the field mapping. After passing the necessary fields (usually a small subset of the fields from each record) via the AddField function, the Interface uses this information to generate a match key.
5. Write each match key to the key file.
   The WriteRecord function stores each match key in a temporary key file.
6. Process the keys.
   After building the keys, calling the Process function loops through the keys and compares them to each other.
7. Loop through the records and read the deduping data for each one.
   The ReadRecord function loops through the entire set of deduped records and allows the application to read information on the record’s duplicate/unique status, the number of duplicates for each record and the record dupe group.
Pseudocode Implementation#
This is a common implementation of the Read-Write Interface using pseudocode for maximum clarity. Working sample programs in several programming languages can be found on the MatchUp Object install disc.
1. Initialize the Read-Write Interface
After creating an instance of the Read-Write Interface, point the object toward its supporting data file, select a matchcode and key file to use, and initialize these files.
First, create a new instance of the Read-Write Interface.
SET mu = NEW mdMUReadWrite
In order to successfully initialize this new instance, the application must point it toward its data files and supply a valid License Key.
CALL mu.SetLicenseString with LicenseString
CALL mu.SetPathToMatchUpFiles with PathToMatchUpFiles
Before initialization, the application must specify which matchcode and key file will be used for the current deduping operation.
CALL mu.SetMatchcodeName with MatchCodeName
CALL mu.SetKeyFile with PathToKeyFile
If all of the above have been set correctly, calling the InitializeDataFiles function should return a value of ErrorNone. If it does not, call the GetInitializeErrorString function to determine the reason for the failure to initialize.
CALL mu.InitializeDataFiles RETURNING ProgramStatusResult
IF ProgramStatusResult is not 0 THEN
PRINT "Initialization Error: " + mu.GetInitializeErrorString
EXIT ROUTINE
END IF
If the initialization was successful, the application can call the following methods to display version and expiration information about the instance of MatchUp Object currently in use on the local computer.
PRINT "Confirming Initialization: " + mu.GetInitializeErrorString
PRINT "Build Number: " + mu.GetBuildNumber
PRINT "Database Date: " + mu.GetDatabaseDate
PRINT "Database Expiration Date: " + mu.GetDatabaseExpirationDate
PRINT "License Expiration Date: " + mu.GetLicenseExpirationDate
2. Create field mappings
Field mappings define which types of data the Read-Write Interface is expecting. In this case, the selected matchcode looks for a five-digit ZIP Code, a first name, a last name and a street address.
CALL mu.ClearMappings
After clearing any mappings from a previous use of the Read-Write Interface, call the AddMapping function once for each field being considered.
CALL mu.AddMapping with mu.Zip5 RETURNING mapOK
CALL mu.AddMapping with mu.First RETURNING mapOK
CALL mu.AddMapping with mu.Last RETURNING mapOK
CALL mu.AddMapping with mu.Address RETURNING mapOK
3. Loop through database records and build keys
The Read-Write Interface builds a temporary key file out of the data from the database. To do this, the application loops through each record and pulls the data from the fields that match the mappings made above.
FOR EACH Record in database
Read Zip5, FirstName, LastName, StreetAddress, userInfo fields from database
After pulling the data from the database, pass it to the Read-Write Interface with the AddField function. The application must do this in the same order that it mapped the data types in the step above.
Even if the fields in a database do not exactly match the components required by the matchcode, MatchUp Object is able to extract only the information it needs. For example, if the database only contained a full name field, that field could be passed twice and MatchUp Object would recognize the first names and last names and only use the parts it needed.
After passing each set of fields, call the BuildKey function to create the match key according to the mappings and the current matchcode.
CALL mu.ClearFields
CALL mu.AddField with Zip5
CALL mu.AddField with FirstName
CALL mu.AddField with LastName
CALL mu.AddField with StreetAddress
CALL mu.BuildKey
The UserInfo is a unique identifier for each record. The application will need this to later match the deduping information to the original records.
CALL mu.SetUserInfo with userInfo
The WriteRecord function adds the current key and UserInfo to the key file.
CALL mu.WriteRecord
Repeat for every record in the current data set.
NEXT Record
4. Begin processing the records
The Process function switches the Read-Write Interface from writing new keys to comparing the stored keys to each other.
CALL mu.Process
5. Examine the processed records
At this point, loop through the processed records, and get information on each record’s unique/duplicate status and how many duplicates of each record exist in the data set.
WHILE mu.ReadRecord does not return 0
Each call to the ReadRecord function advances the Interface to the next record and populates the fields returned by the methods below.
PRINT "Record: " + mu.GetUserInfo
PRINT "Key: " + mu.GetKey
PRINT "Dupe Group: " + mu.GetDupeGroup
PRINT mu.GetCount + " records in this dupe group."
PRINT "This is record #" + mu.GetEntry + " in this dupe group."
The Results property indicates whether the record is unique, a record with duplicates, or a duplicate of another record.
CASE mu.GetResults Contains
MS03 :PRINT "This record is a duplicate."
MS02 :PRINT "This record has duplicates."
MS01 :PRINT "This record is unique."
ENDCASE
END WHILE
The result codes returned by the GetResults function also indicate which combination or combinations defined by the matchcode produced the hit. In addition to the status code, other potential result codes correspond to a specific combination number, so the application needs to use logical AND operation for each bit to actually make use of this information.
CALL mu.GetResults Contains
MS06 Match: Rule 1 Matched another record by matchcode combination 1
MS07 Match: Rule 2 Matched another record by matchcode combination 2
MS08 Match: Rule 3 Matched another record by matchcode combination 3
MS09 Match: Rule 4 Matched another record by matchcode combination 4
MS10 Match: Rule 5 Matched another record by matchcode combination 5
MS11 Match: Rule 6 Matched another record by matchcode combination 6
MS12 Match: Rule 7 Matched another record by matchcode combination 7
MS13 Match: Rule 8 Matched another record by matchcode combination 8
MS14 Match: Rule 9 Matched another record by matchcode combination 9
MS15 Match: Rule 10 Matched another record by matchcode combination 10
MS16 Match: Rule 11 Matched another record by matchcode combination 11
MS17 Match: Rule 12 Matched another record by matchcode combination 12
MS18 Match: Rule 13 Matched another record by matchcode combination 13
MS19 Match: Rule 14 Matched another record by matchcode combination 14
MS20 Match: Rule 15 Matched another record by matchcode combination 15
MS21 Match: Rule 16 Matched another record by matchcode combination 16
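The codes listed above can be turned into a status and a list of combination numbers with a small helper. The sketch below assumes that GetResults returns the codes as a single comma-delimited string such as "MS03,MS06,MS08". That format is an assumption based on the `Contains` checks in the pseudocode, and `parse_results` is a hypothetical helper, not part of the MatchUp API.

```python
STATUS = {
    "MS01": "unique",
    "MS02": "has duplicates",
    "MS03": "duplicate",
}


def parse_results(results):
    """Split a result-code string into (status, matching combination numbers)."""
    codes = [c.strip() for c in results.split(",") if c.strip()]
    status = next((STATUS[c] for c in codes if c in STATUS), None)
    combinations = []
    for code in codes:
        if code.startswith("MS") and code[2:].isdigit():
            number = int(code[2:])
            if 6 <= number <= 21:  # MS06..MS21 map to combinations 1..16
                combinations.append(number - 5)
    return status, sorted(combinations)


print(parse_results("MS03,MS06,MS08"))  # ('duplicate', [1, 3])
```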
Methods#
Initialization - Set#
The following methods prepare the Read/Write interface for use and link it to its supporting data files.
InitializeDataFiles#
- Syntax:
mdMUReadWriteInitializeDataFiles();
- Returns:
Initialize Status
- Return Type:
ProgramStatus
This function opens the needed data files and prepares the MatchUp Object Global for use. Before calling this function, the application must have successfully called the SetLicenseString and SetPathToMatchUpFiles methods.
Check the return value of the GetInitializeErrorString function to retrieve the result of the initialization call. Any result other than “No Error” means the initialization failed for some reason.
This returns a value of the enumerated type ProgramStatus.
Value | Enumeration | Description |
---|---|---|
0 | ErrorNone | No error - Initialization was successful. |
1 | ErrorConfigFile | Could not find mdMatchUp.dat. |
2 | ErrorLicenseExpired | The License Key has expired. |
3 | ErrorDatabaseExpired | The database has expired. |
4 | ErrorMatchcodeNotSpecified | No matchcode was specified. |
5 | ErrorMatchcodeNotFound | Specified Matchcode does not exist. |
6 | ErrorInvalidMatchcode | The specified matchcode is not valid. |
7 | ErrorKeyFile | The specified key file was not found. |
If any value other than “ErrorNone” is returned, check the GetInitializeErrorString function to see the reason for the error.
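For applications that log or branch on the numeric return value, the ProgramStatus values above can be mirrored in application code. The following sketch copies the values and descriptions from the table into a Python `IntEnum`. It is a convenience mapping written for this example, not a class shipped with MatchUp Object.

```python
from enum import IntEnum


class ProgramStatus(IntEnum):
    """Mirror of the ProgramStatus values documented above."""
    ErrorNone = 0
    ErrorConfigFile = 1
    ErrorLicenseExpired = 2
    ErrorDatabaseExpired = 3
    ErrorMatchcodeNotSpecified = 4
    ErrorMatchcodeNotFound = 5
    ErrorInvalidMatchcode = 6
    ErrorKeyFile = 7


DESCRIPTIONS = {
    ProgramStatus.ErrorNone: "No error - Initialization was successful.",
    ProgramStatus.ErrorConfigFile: "Could not find mdMatchUp.dat.",
    ProgramStatus.ErrorLicenseExpired: "The License Key has expired.",
    ProgramStatus.ErrorDatabaseExpired: "The database has expired.",
    ProgramStatus.ErrorMatchcodeNotSpecified: "No matchcode was specified.",
    ProgramStatus.ErrorMatchcodeNotFound: "Specified Matchcode does not exist.",
    ProgramStatus.ErrorInvalidMatchcode: "The specified matchcode is not valid.",
    ProgramStatus.ErrorKeyFile: "The specified key file was not found.",
}


def describe(code):
    """Translate a numeric status into 'Name: description' for logging."""
    status = ProgramStatus(code)
    return f"{status.name}: {DESCRIPTIONS[status]}"


print(describe(5))  # ErrorMatchcodeNotFound: Specified Matchcode does not exist.
```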
SetEncoding#
- Syntax:
mdMUReadWriteSetEncoding(String EncodingFormat);
- Return Type:
Void
Optional. This function defines the encoding of the input file and the resulting format of the built keys. Input data is converted into the UTF-8 format and processed as such. When GetKey is called, the data is converted back from UTF-8 to the specified input format.
The return value is 1 if the desired encoding can be set or 0 if the encoding is not recognized or cannot be set.
It must be called before calling the InitializeDataFiles function.
The default encoding type is “ISO-8859-1”, if this function is not used.
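The conversion to UTF-8 and back described above can be pictured with Python's standard codecs. This sketch only illustrates the round trip for ISO-8859-1 input and says nothing about how MatchUp Object performs the conversion internally. The function names are hypothetical.

```python
def to_utf8(raw: bytes, input_encoding: str = "ISO-8859-1") -> bytes:
    """Convert bytes in the input encoding to UTF-8 for processing."""
    return raw.decode(input_encoding).encode("utf-8")


def from_utf8(utf8: bytes, output_encoding: str = "ISO-8859-1") -> bytes:
    """Convert processed UTF-8 bytes back to the original encoding."""
    return utf8.decode("utf-8").encode(output_encoding)


raw = "M\u00fcller".encode("ISO-8859-1")  # "Müller": 6 bytes in ISO-8859-1
as_utf8 = to_utf8(raw)                    # 7 bytes, because "ü" takes 2 bytes in UTF-8
print(len(raw), len(as_utf8), from_utf8(as_utf8) == raw)  # 6 7 True
```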
SetGroupSorting#
- Syntax:
mdMUReadWriteSetGroupSorting();
- Return Type:
Void
This function sets the Read-Write interface to return processed records in dupe group order. By default, the Read-Write interface returns records sorted by their match key. This should return records in the same dupe group together or close to each other.
Passing a boolean “True” value to this function will cause the Read-Write interface to return the processed records sorted into dupe groups.
The additional processing can increase the time needed to dedupe a large list and it is often possible to use the information returned by the Read-Write interface to sort records into this order programmatically.
SetKeyFile#
- Syntax:
mdMUReadWriteSetKeyFile(String KeyFile);
- Return Type:
Void
This function selects the name and file path for the key file that will be used for the current Read-Write deduping operation.
Every instance of the Read-Write interface creates a new key file for each session. Any existing key file with the same name is overwritten. If more than one instance of the Read-Write interface is running on either the same computer or the same network, make certain that they do not point to the same key file. If one instance overwrites the key file being used by another instance, it can cause the second instance to fail.
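One way to keep concurrent instances from sharing a key file is to generate a distinct path for each one and pass it to SetKeyFile. The sketch below builds such a path from the process ID and a random UUID. The `matchup_` prefix and `.key` extension are arbitrary choices for this example, not requirements of MatchUp Object.

```python
import os
import tempfile
import uuid


def unique_key_file(directory=None, prefix="matchup_"):
    """Build a key file path that is very unlikely to collide with another instance's."""
    directory = directory or tempfile.gettempdir()
    name = f"{prefix}{os.getpid()}_{uuid.uuid4().hex}.key"
    return os.path.join(directory, name)


print(unique_key_file())
```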
SetLicenseString#
- Syntax:
mdMUReadWriteSetLicenseString(String License);
- Returns:
Status
- Return Type:
Integer
This function passes the License Key required for MatchUp Object Global to function. A value of “1” is returned if the License Key is valid, a value of “0” is returned for an invalid or empty string.
This function is only required if the environment variable method is not used.
Each customer is issued a License Key when purchasing MatchUp Object Global or renewing a subscription. This string must be passed to this function to unlock the functionality of MatchUp Object Global.
The License Key is normally set using an environment variable, either MD_LICENSE or MD_LICENSE_DEMO. Calling this function is an alternative method for setting the License Key, but applications developed for a production environment should only use the environment variable.
When using an environment variable, it is not necessary to call this function.
For more information on setting the environment variable, see Entering Your MatchUp Object Global License Key.
SetMatchcodeName#
- Syntax:
mdMUReadWriteSetMatchcodeName(String MatchcodeName);
- Return Type:
Void
This function selects the matchcode to use for the current Read-Write deduping operation. The SetMatchcodeName function accepts a string value that must match the name of an existing matchcode in the current matchcode file.
SetMatchcodeObject#
- Syntax:
mdMUReadWriteSetMatchcodeObject(mdMUMatchcode);
- Return Type:
Void
This method selects the matchcode to use for the current Read-Write deduping operation. It largely duplicates the purpose of the SetMatchcodeName function, but instead of accepting a character value containing the name of a matchcode in the current matchcode file, this function accepts a Matchcode object created using the Matchcode Editing interface.
Because this function requires the use of a separate interface to create the Matchcode object variable, it is usually simpler to use the SetMatchcodeName function.
It is possible to use this function to build a new matchcode on the fly using the Matchcode Editing interface. Unless a specific application demands such flexibility, it is usually much simpler to add a new matchcode to the matchcode file and call it using the SetMatchcodeName function.
SetMaximumCharacterSize#
- Syntax:
mdMUReadWriteSetMaximumCharacterSize(int Size);
- Return Type:
Void
Optional. This function will accommodate UTF-8 input data. Since a single UTF-8 character can be up to 4 bytes long, the storage of the matchcode keys may need to be altered to accommodate this maximum size. However, if you are working in a part of the world where you are sure that a byte sequence will always be shorter, you can use this function to override the storage overhead.
It must be called before calling the InitializeDataFiles function.
Valid input values are 1, 2, 3 or 4. Invalid values will be ignored, and the default size (i.e. if this function is not called) is 1.
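To judge which size a data set needs, it can help to measure how many bytes its characters occupy in UTF-8. The sketch below does this with Python's built-in encoder. The function name is hypothetical, and sampling real data this way is a suggestion for this example rather than a documented MatchUp procedure.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest number of UTF-8 bytes used by any character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=1)


print(max_utf8_bytes("Smith"))          # 1 (ASCII only)
print(max_utf8_bytes("M\u00fcller"))    # 2 ("ü", Latin-1 Supplement)
print(max_utf8_bytes("\u6771\u4eac"))   # 3 (CJK characters)
print(max_utf8_bytes("\U0001F600"))     # 4 (character outside the Basic Multilingual Plane)
```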
SetPathToMatchUpFiles#
- Syntax:
mdMUReadWriteSetPathToMatchUpFiles(String FilePath);
- Return Type:
Void
This function accepts a string value containing the path to the folder containing the MatchUp Read-Write data files. It must be called before calling the InitializeDataFiles function.
To provide maximum compatibility with Windows, three files are installed in your ‘Common App Data’ directory:
- Windows Vista and newer
C:\ProgramData\MelissaDATA\MatchUp
- Windows XP
C:\Documents and Settings\All Users\Application Data\Melissa DATA\MatchUp
Users can change the location of this directory. Keep this in mind, as it is often the source of issues when running the samples and demos.
SetReserved#
- Syntax:
mdMUReadWriteSetReserved(String PropertyType, String PropertyValue);
- Return Type:
Void
Optional. This function accepts two parameters: A string parameter containing the property to be set; and a string parameter representing the value for the property being set.
It must be called before calling the InitializeDataFiles function.
The unique identifier set by SetUserInfo and attached to built match keys is 1024 bytes by default. This lets you pass an advanced custom identifier or even source data to the key file. This can have data handling advantages but will slow down the process because the key file and temporary sort files will grow much larger than needed for most jobs. This function lets you override the default UserInfo size with a more efficient one.
Initialization - Get#
The following methods get values related to the initialization process.
GetBuildNumber#
- Syntax:
mdMUReadWriteGetBuildNumber();
- Returns:
Build Number
- Return Type:
String
This function returns the current development release build number of MatchUp Object Global.
GetDatabaseDate#
- Syntax:
mdMUReadWriteGetDatabaseDate();
- Returns:
Database Date
- Return Type:
String
This function returns a string value that represents the revision date of the MatchUp Object Global data files.
GetDatabaseExpirationDate#
- Syntax:
mdMUReadWriteGetDatabaseExpirationDate();
- Returns:
Database Expiration Date
- Return Type:
String
This function returns a string value containing the expiration date of the current database file (mdMatchUp.dat).
GetInitializeErrorString#
- Syntax:
mdMUReadWriteGetInitializeErrorString();
- Returns:
Initialize Status
- Return Type:
String
This function returns a string to describe the error caused when the InitializeDataFiles function cannot be successfully called.
The possible strings returned by this function are:
“No error”
“Could not find mdMatchUp.dat.”
“The License Key has expired.”
“The database has expired.”
“No matchcode was specified.”
“Specified Matchcode does not exist.”
“The specified matchcode is not valid.”
“The specified key file was not found.”
GetLicenseExpirationDate#
- Syntax:
mdMUReadWriteGetLicenseExpirationDate();
- Returns:
License Expiration Date
- Return Type:
String
This function returns a string value containing the expiration date of the current License Key. After this date, MatchUp Object will no longer function.
Mapping#
Before generating match keys for the database records, the application must supply the Read-Write interface with information about what sort of data it will be handling.
AddMapping#
- Syntax:
mdMUReadWriteAddMapping(mdMU.MatchcodeMapping);
- Returns:
Mapping Allowed
- Return Type:
Integer
This function selects the types of fields that will be used to build the match key and the order in which they will be added using the AddField function.
The function accepts an enumerated value of the type MatchcodeMapping. It tells the Read-Write interface which data types will be used for this deduping operation and in what order they will be passed to the deduper when passing data using the AddField function.
The data types used must contain the data expected by the matchcode being used, but it does not have to be an exact match. For example, if the matchcode requires a five-digit ZIP Code but the data in the list uses a single “City/State/ZIP” field, simply add the CityStZip mapping and pass the full string to the AddField function later. MatchUp Object Global is smart enough to use only the information it needs.
In another example, a matchcode calls for both last name and first name but the database contains only full names. The application would simply apply the FullName mapping twice and pass the full name data twice to the AddField function.
To demonstrate the above:
mdMU->AddMapping(mdMU.CityStZip) // uses only ZIP Code
mdMU->AddMapping(mdMU.FullName) // uses last name only
mdMU->AddMapping(mdMU.FullName) // uses first name only
mdMU->AddMapping(mdMU.Address)
For a list of these enumerations, see Matchcode Mapping Enumerations.
The function returns a non-zero value if the mapping is allowed by the selected matchcode, or zero if the mapping caused an error.
ClearMappings#
- Syntax:
mdMUReadWriteClearMappings();
- Return Type:
Void
This function clears any existing field mappings.
It is a good idea to call this function before beginning to map fields, especially if the application may be required to perform multiple deduping operations in a single session.
Match Key#
These methods take the real data being compared and construct a match key according to the mappings defined with the Mapping methods and the matchcode defined when the Read/Write deduper was initialized.
AddField#
- Syntax:
AddField(String Field);
- Return Type:
Void
This function passes the contents of a field from a database to the deduper prior to calling the BuildKey function.
Fields must be passed to this function in the same order that the corresponding data types were mapped using the AddMapping function.
The following example expands on the previous AddMapping example. The matchcode uses five-digit ZIP codes, last and first names, in that order, and the street addresses. The list includes only a single “City/ST/ZIP” and a single full name field:
mdMU->AddField("Rancho Santa Margarita, CA 92688")
mdMU->AddField("Raymond F. Melissa")
mdMU->AddField("Raymond F. Melissa")
mdMU->AddField("22382 Avenida Empresa")
The deduper would use only the ZIP Code from the first field, the last name from the second AddField and first name from the third AddField.
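To make the idea of using only part of a field concrete, the sketch below pulls a five-digit ZIP Code out of a combined city/state/ZIP string and splits a full name into first and last names. These are deliberately naive stand-ins written for illustration. MatchUp Object's own parsing is not documented here and is certainly more sophisticated.

```python
import re


def zip5_from_city_st_zip(text):
    """Return the trailing five-digit ZIP Code (ignoring any +4), or '' if none."""
    match = re.search(r"(\d{5})(?:-\d{4})?\s*$", text)
    return match.group(1) if match else ""


def first_and_last(full_name):
    """Naively treat the first word as the first name and the last word as the last name."""
    parts = full_name.split()
    if not parts:
        return "", ""
    return parts[0], parts[-1]


print(zip5_from_city_st_zip("Rancho Santa Margarita, CA 92688"))  # 92688
print(first_and_last("Raymond F. Melissa"))                       # ('Raymond', 'Melissa')
```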
BuildKey#
- Syntax:
mdMUReadWriteBuildKey();
- Return Type:
Void
This function builds a match key using information passed via the AddField function.
The match key is built from:
- Information passed via calls to the AddField function.
- Mapping defined by the AddMapping function.
- The pattern defined by the matchcode being used.

A match key is a character string built according to a pattern defined by the current matchcode, consisting only of enough information to determine if the current record is unique or has a duplicate within the key file.
For example, let’s assume the matchcode called for a five-digit ZIP Code, first ten characters of a last name, first ten of a first name, a street number and the first ten characters of a street name. The current record is for Raymond F. Melissa at 22382 Avenida Empresa in the 92688 ZIP Code. The match key would be:
92688MELISSA RAYMOND 22382EMPRESA
Because “Empresa” is only seven characters, the key would be padded with three spaces at the end.
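The fixed-width layout of this example key can be reproduced with simple padding and truncation. The sketch below assumes the components have already been split out of the record, and it assumes a width of 5 for the street number, which the text above does not state. It illustrates the key layout only and is not the BuildKey implementation.

```python
def fixed_width(value: str, width: int) -> str:
    """Upper-case, truncate to width, and pad with trailing spaces."""
    return value.upper()[:width].ljust(width)


def build_key(zip5, last, first, street_number, street_name):
    """Concatenate fixed-width components; the street number width of 5 is an assumption."""
    return (
        fixed_width(zip5, 5)
        + fixed_width(last, 10)
        + fixed_width(first, 10)
        + fixed_width(street_number, 5)
        + fixed_width(street_name, 10)
    )


key = build_key("92688", "Melissa", "Raymond", "22382", "Empresa")
print(repr(key))  # '92688MELISSA   RAYMOND   22382EMPRESA   '
```

Each seven-character name is followed by three spaces of padding, which is why two records with the same data always produce keys that line up column by column.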
ClearFields#
- Syntax:
mdMUReadWriteClearFields();
- Return Type:
Void
This function clears all values from previous calls to the AddField or ReadRecord function. The application should call this function after calling the WriteRecord function, before the first call to the AddField function, or before each call to the ReadRecord function.
SetKey#
- Syntax:
mdMUReadWriteSetKey(String MatchKey);
- Return Type:
Void
This function accepts a match key before calling the WriteRecord function.
The BuildKey function creates a key from input data. If, however, the match keys are already stored in the source database, use this function to pass the keys to the deduper before calling WriteRecord.
SetUserInfo#
- Syntax:
mdMUReadWriteSetUserInfo(String UserInfo);
- Return Type:
Void
This function accepts a character value that uniquely identifies each record in a set of data. The character value passed to this function must be unique for every record. This enables the application to associate the match key in the key file to the corresponding record in the list.
WriteRecord#
- Syntax:
mdMUReadWriteWriteRecord();
- Return Type:
Void
This function creates a record from the current key and user info and writes it as a new record to the current key file. Before calling it, the application must have called either the BuildKey or SetKey function, plus the SetUserInfo function.
The application cannot call this function after the Process function has been called.
Processing#
Process#
- Syntax:
mdMUReadWriteProcess();
- Return Type:
Void
This function switches the Read-Write interface from write mode to read mode. After calling this function, the application can no longer add more records to the key file with the WriteRecord function.
This function takes every record passed via the WriteRecord function and passes them through the Read-Write deduping logic to determine which are unique, which have duplicates, and which are duplicates.
Retrieval#
The methods in this section cycle through each processed record and return its unique/duplicate information.
ReadRecord#
- Syntax:
mdMUReadWriteReadRecord();
- Returns:
Was Record Found
- Return Type:
Integer
This function reads the next record from the key file, if there is another record, and populates the fields used by the following methods:
GetResults
GetCount
GetDupeGroup
GetEntry
GetKey
GetUserInfo
This function cannot be called until after the application has called the Process function.
If there are no more records, this returns an integer value of zero.
GetCount#
- Syntax:
mdMUReadWriteGetCount();
- Returns:
Total Matching Records
- Return Type:
Integer
If there were matches detected during processing, this function will return an integer value indicating the total number of matching records in this dupe group.
GetDupeGroup#
- Syntax:
mdMUReadWriteGetDupeGroup();
- Returns:
Dupe Group Number
- Return Type:
Long
Every unique record (one with no duplicates) has its own “Dupe Group” number. Records that are duplicates of one another all share the same number. This function returns the Dupe Group number (a long integer value) of a matching record in the key file.
GetEntry#
- Syntax:
mdMUReadWriteGetEntry();
- Returns:
Current Record Duplicate Entry
- Return Type:
Integer
If the ReadRecord function detected at least one duplicate, this function will return an integer value that indicates where the current record falls within its dupe group.
For example, if this is the sixth matching record found, this function will return a 6.
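How count, dupe group, and entry relate to each other can be shown with a small Python sketch. It is a simplification under stated assumptions, not the library's algorithm: two records are treated as duplicates only when their keys are exactly equal, group and entry numbers start at 1, and `assign_dupe_groups` is a hypothetical helper.

```python
from collections import defaultdict


def assign_dupe_groups(records):
    """records: list of (user_info, match_key) pairs.

    Returns one dict per record with its dupe group, its entry
    (position inside the group), and the group's total count.
    """
    group_of_key = {}            # match key -> dupe group number
    size = defaultdict(int)      # dupe group number -> number of records
    for _, key in records:
        group = group_of_key.setdefault(key, len(group_of_key) + 1)
        size[group] += 1

    seen = defaultdict(int)      # dupe group number -> records emitted so far
    result = []
    for user_info, key in records:
        group = group_of_key[key]
        seen[group] += 1
        result.append({
            "user_info": user_info,
            "dupe_group": group,
            "entry": seen[group],
            "count": size[group],
        })
    return result


rows = assign_dupe_groups([("A", "K1"), ("B", "K2"), ("C", "K1")])
for row in rows:
    print(row)
```

In this sketch records A and C share dupe group 1 with a count of 2, while record B is alone in group 2.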
GetKey#
- Syntax:
mdMUReadWriteGetKey();
- Returns:
Match Key
- Return Type:
String
This function returns the match key created by the last call to the BuildKey function and used by the last call to the ReadRecord function.
GetResults#
- Syntax:
mdMUReadWriteGetResults();
- Returns:
Result Codes
- Return Type:
String
This function returns a comma-delimited string of four-character codes that detail the output disposition of the last call to the ReadRecord function. It will also contain the result code of any matchcode combination which contributed to the present record matching other records in its dupe group.
This function is intended to replace the GetStatusCode (Deprecated) and GetCombinations (Deprecated) methods, providing a single source of information about the last Process function call. It also eliminates the need to perform bitwise operations on the GetCombinations (Deprecated) return value to determine which matchcode combinations contributed to the record matching other records in its Dupe Group.
The function returns one or more of the MatchUp Object Result Codes in a comma-delimited list.
GetUserInfo#
- Syntax:
mdMUReadWriteGetUserInfo();
- Returns:
User Info
- Return Type:
String
This function returns a character value containing the value passed to the SetUserInfo function. It returns the unique identifier associated with the record being checked by the Read-Write interface.
The application will need this information if the application has to match the current matchkey back to an original data source.
Deprecated Methods#
The following methods are recorded here for compatibility and archiving purposes. Alternative methods should be used in practice.
GetCombinations (Deprecated)#
- Syntax:
mdMUReadWriteGetCombinations();
- Returns:
Matched Combination
- Return Type:
Long
This function is deprecated. You should use the GetResults function instead.
Each matchcode may contain as many as 16 different combinations of data types that may be used to detect a match. A matching record may match more than one combination. This function returns a long integer value that can be used to determine which combination produced the match, if the ReadRecord function detected a matching key.
GetStatusCode (Deprecated)#
This function is deprecated. You should use the GetResults function instead.
Incremental Interface#
Overview#
The Incremental interface is usually used for real-time data entry validation, for example in a call center data-entry system where an operator wants to determine whether the caller is an existing customer. At any time, a calling program can pass the Incremental interface the contents of a record; the interface then reports whether this record is a dupe and, if so, which record or records it matches.
Incremental deduping consists of the following steps:
The program processes a record and sends the specific information (ZIP/PC, Name, Address, etc.) to MatchUp Object.
Based on previous records sent to the API, MatchUp Object reports whether or not the record from the first step matches any of these previous records.
Optionally, the application can tell MatchUp Object to add this record to its database for consideration in future comparisons.
The Historical Database
The Incremental interface relies heavily on a historical database that it maintains.
The lifetime of this database is as long as necessary (seconds, days, even years). This database is constructed and maintained by MatchUp Object, so it can determine whether or not an incoming record matches other records fairly quickly.
Multi-User/Multi-Thread Considerations
The Incremental interface is unique in that multiple users or multiple processes can access the same historical database simultaneously. The API maintains a locking system to ensure that competing processes don’t collide. In order for two processes to work in this fashion, the initialization function for each process must specify the same historical database (a.k.a. “key file”).
Transaction-Based Processing
The Incremental interface of MatchUp Object features the option of using transaction-based operations on the historical database. This enables an application to process multiple calls to the AddRecord function as one, speeding up processing of large lists.
Order of Operations#
Basic Steps#
These are the basic steps of a typical implementation of the Incremental interface.
Initialize the Incremental interface.
After creating an instance of the Incremental interface, point the object toward its supporting data file, select a matchcode and key file to use, and initialize these files.
Create field mappings.
In order to build a key to compare to the key file, the Incremental interface needs to know which types of data the program will be passing to the interface and in what order.
Read the record from the data source.
This can be a new address passed from a website or a single record from a newly acquired list or data source, to be compared against the master list.
Build a match key for the incoming record.
This consists of passing the actual data to the interface in the same order used when creating a field mapping. After passing the necessary fields (usually a small subset of the fields from each record) via the AddField function, the Incremental interface uses this information to generate a match key.
Compare the match key to the key file.
The MatchRecord function searches the key file for any keys that match the new record. If it finds a match, it provides information on the duplicate records in the key file.
Write new records to the key file.
The new key, whether or not it is unique, can then be written to the key file, so it can be used for future deduping operations. The program code must also write the new address record to the database separately.
Pseudocode Implementation#
This is a common implementation of the Incremental interface using pseudocode for maximum clarity. Working sample programs in several programming languages can be found on the MatchUp Object install disc.
1. Initialize the Incremental interface
After creating an instance of the Incremental interface, point the object toward its supporting data file, select a matchcode and key file to use, and initialize these files.
First, create a new instance of the Incremental interface.
SET mu = NEW mdMUIncremental
In order to successfully initialize this new instance, point it toward its data files and supply a valid License Key.
CALL mu.SetLicenseString with LicenseString
CALL mu.SetPathToMatchUpFiles with PathToMatchUpFiles
Before initialization, specify which matchcode and key file will be used for the current deduping operation.
CALL mu.SetMatchcodeName with MatchCodeName
CALL mu.SetKeyFile with PathToKeyFile
If all of the above have been set correctly, calling the InitializeDataFiles function should return a value of ErrorNone. If it does not, call the GetInitializeErrorString function to determine the reason for the failure to initialize.
CALL mu.InitializeDataFiles RETURNING initResult
IF initResult is not ErrorNone THEN
PRINT "Initialization Error: " + mu.GetInitializeErrorString
EXIT PROGRAM
END IF
If the initialization was successful, call the following functions to display version and expiration information about the instance of MatchUp Object currently in use on the local computer.
PRINT "Confirming Initialization: " + mu.GetInitializeErrorString
PRINT "Build Number: " + mu.GetBuildNumber
PRINT "Database Date: " + mu.GetDatabaseDate
PRINT "Database Expiration Date: " + mu.GetDatabaseExpirationDate
PRINT "License Expiration Date: " + mu.GetLicenseExpirationDate
2. Create field mappings
Field mappings define a valid incoming type of data for each matchcode component. For example, a typical matchcode may include a five-digit ZIP Code, a last name, and a street address. The incoming data, however, may contain the city, state, and ZIP as a single character field and the person’s full name as a single field as well.
Even if the fields in a database do not exactly match the components required by the matchcode, MatchUp Object is able to extract only the information it needs.
CALL mu.ClearMappings
After clearing any mappings from a previous use of the Incremental interface, call the AddMapping function once for each field being considered.
CALL mu.AddMapping with mu.CityStZip RETURNING mapOK
CALL mu.AddMapping with mu.FullName RETURNING mapOK
CALL mu.AddMapping with mu.Address RETURNING mapOK
3. Get the record from the data source
Regardless of the source, the object only needs to read the fields containing the data that the Incremental interface needs for comparison.
READ Record FROM database RETURNING userInfo, CityStateZip, TheFullName, StreetAddress
The userInfo field is any identifying character string that is unique to the current record.
Note that the Incremental interface does not handle database input and output. This must be done programmatically using whatever database interface is being used.
4. Build the match key for the current record
Now we use the data from the previous step to construct a match key to use in the next step.
CALL mu.ClearFields
After clearing any data from a previous use of the Incremental interface, call the AddField function once for each field being considered.
CALL mu.AddField with CityStateZip
CALL mu.AddField with TheFullName
CALL mu.AddField with StreetAddress
Pass the userInfo to the interface using the SetUserInfo function.
CALL mu.SetUserInfo with userInfo
The BuildKey function constructs the match key out of the information passed via the AddField function calls.
CALL mu.BuildKey
5. Compare the match key to the key file
The MatchRecord function compares the match key to the key file and determines if the key already exists in the key file.
CALL mu.MatchRecord
Check the GetResults function to determine if the MatchRecord call produced a match, meaning that the current record is a duplicate to another one in the key file.
CALL mu.GetResults RETURNING ResultsCodes
If the record is a duplicate, this code retrieves information about the other duplicate records in the database.
IF ResultsCodes contains "MS03" THEN // Record is a duplicate
PRINT "This record is in the database."
PRINT "There are " + mu.GetCount + " records in dupe group #" + mu.GetDupeGroup
PRINT "It matches these records:"
WHILE mu.NextMatchingRecord
PRINT "#" + mu.GetEntry + " is Record: " + mu.GetUserInfo
ENDWHILE
“Dupe Group” is a number that gets assigned to each unique record. Duplicate records are assigned the same number.
6. Add the new key to the key file
In this example, duplicate records are being rejected while unique records are being added to the database. Depending on the end user needs, the program may handle duplicate records differently.
ELSE
CALL mu.AddRecord
Add New Record to master database
ENDIF
The AddRecord function only adds the new key to the key file. To add the data to the database, the developer would need to implement that in code, according to whatever database engine is in use.
Using the Transaction Methods#
The Transaction methods allow MatchUp Object to delay writing changes to the historical database until all records have been compared and all duplicates detected. After a call to the BeginTransaction function, the Incremental interface caches all calls to the AddRecord function until a call to the CommitTransaction function, which writes all changes to the key file in a single operation, significantly speeding up processing.
If any errors are detected, the RollbackTransaction function flushes all AddRecord function calls since the call to the BeginTransaction function and no changes will be written to the historical database.
Below is a simplified example of how transactions work with the Incremental interface.
CALL BeginTransaction
FOR EACH Record in DatabaseTable
READ Record
Build Match Key
CALL MatchRecord
IF Record Is Unique THEN
CALL AddRecord
ENDIF
NEXT Record
IF ERROR
CALL RollbackTransaction
Return
ENDIF
CALL CommitTransaction
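The pseudocode above can be modelled with a toy in-memory key store written in Python. Everything here is hypothetical and only mirrors the behaviour described in this section: the `KeyStore` class, its method names, and the exact-equality matching are assumptions for illustration, not the MatchUp Object API.

```python
class KeyStore:
    """Toy in-memory stand-in for a key file with transaction support."""

    def __init__(self):
        self.committed = []   # keys that have been written permanently
        self.pending = None   # keys cached since begin_transaction, or None

    def begin_transaction(self):
        self.pending = []

    def match_record(self, key):
        # Keys cached in an open transaction still take part in matching.
        return key in self.committed or key in (self.pending or [])

    def add_record(self, key):
        if self.pending is not None:
            self.pending.append(key)    # cached until commit
        else:
            self.committed.append(key)  # written immediately

    def commit_transaction(self):
        self.committed.extend(self.pending or [])
        self.pending = None

    def rollback_transaction(self):
        self.pending = None             # discard every cached add


def load(store, keys):
    """Add each unique key inside one transaction; roll back on any error."""
    store.begin_transaction()
    try:
        for key in keys:
            if not store.match_record(key):
                store.add_record(key)
    except Exception:
        store.rollback_transaction()
        raise
    store.commit_transaction()


store = KeyStore()
load(store, ["A", "B", "A"])
print(store.committed)
```

The second "A" is rejected even though the first one has not been committed yet, which corresponds to the statement that cached records are still matched properly within the same process.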
Methods#
Initialization - Set#
InitializeDataFiles#
- Syntax:
mdMUIncrementalInitializeDataFiles(String EncodingFormat)
- Returns:
Initialize Status
- Return Type:
ProgramStatus
This function opens the needed data files and prepares the MatchUp Object for use.
Before calling this function, the code must have successfully called the SetLicenseString, SetMatchcodeName (or SetMatchcodeObject) and SetPathToMatchUpFiles functions.
Check the return value of the GetInitializeErrorString function to retrieve the result of the initialization call. Any result other than “No Error” means the initialization failed for some reason.
This returns a value of the enumerated type ProgramStatus.
| Value | Enumeration | Description |
|---|---|---|
| 0 | ErrorNone | No error - Initialization was successful. |
| 1 | ErrorConfigFile | Could not find mdMatchUp.dat. |
| 2 | ErrorLicenseExpired | The License Key has expired. |
| 3 | ErrorDatabaseExpired | The database has expired. |
| 4 | ErrorMatchcodeNotSpecified | No matchcode was specified. |
| 5 | ErrorMatchcodeNotFound | Specified Matchcode does not exist. |
| 6 | ErrorInvalidMatchcode | The specified matchcode is not valid. |
| 7 | ErrorKeyFile | The specified key file was not found. |
If any value other than “ErrorNone” is returned, check the GetInitializeErrorString function to see the reason for the error.
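For callers that receive the status as a plain integer, the values listed above can be mirrored in application code. The Python sketch below is an assumption-laden illustration: the `ProgramStatus` class and `describe` helper are defined locally for the example and are not the library's own bindings; only the numeric values, names, and descriptions are taken from this section.

```python
from enum import IntEnum


class ProgramStatus(IntEnum):
    """Local mirror of the status values listed in this section."""
    ErrorNone = 0
    ErrorConfigFile = 1
    ErrorLicenseExpired = 2
    ErrorDatabaseExpired = 3
    ErrorMatchcodeNotSpecified = 4
    ErrorMatchcodeNotFound = 5
    ErrorInvalidMatchcode = 6
    ErrorKeyFile = 7


DESCRIPTIONS = {
    ProgramStatus.ErrorNone: "No error - Initialization was successful.",
    ProgramStatus.ErrorConfigFile: "Could not find mdMatchUp.dat.",
    ProgramStatus.ErrorLicenseExpired: "The License Key has expired.",
    ProgramStatus.ErrorDatabaseExpired: "The database has expired.",
    ProgramStatus.ErrorMatchcodeNotSpecified: "No matchcode was specified.",
    ProgramStatus.ErrorMatchcodeNotFound: "Specified Matchcode does not exist.",
    ProgramStatus.ErrorInvalidMatchcode: "The specified matchcode is not valid.",
    ProgramStatus.ErrorKeyFile: "The specified key file was not found.",
}


def describe(status_value):
    """Turn an integer status into 'Name: description'."""
    status = ProgramStatus(status_value)  # raises ValueError if unknown
    return f"{status.name}: {DESCRIPTIONS[status]}"


print(describe(0))
print(describe(7))
```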
SetEncoding#
- Syntax:
mdMUIncrementalSetEncoding(String EncodingFormat);
- Return Type:
Void
Optional. This function defines the encoding format of the input data and the resulting format of the built keys. Input data is converted into the UTF-8 format and processed as such. When GetKey is called, the data is converted back from UTF-8 to the specified input format.
The return value is 1 if the desired encoding can be set or 0 if the encoding is not recognized or cannot be set.
It must be called before calling the InitializeDataFiles function.
If this function is not called, the default encoding is “ISO-8859-1”.
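The conversion described above (input encoding to UTF-8 for processing, then back to the input encoding on output) can be illustrated with Python's built-in codecs. This is a sketch of the general round trip only; the helper names `to_utf8` and `from_utf8` are made up for the example, and how MatchUp Object performs the conversion internally is not shown here.

```python
def to_utf8(raw, encoding="ISO-8859-1"):
    """Decode bytes from the input encoding and re-encode them as UTF-8."""
    return raw.decode(encoding).encode("utf-8")


def from_utf8(utf8_bytes, encoding="ISO-8859-1"):
    """Convert UTF-8 bytes back to the original input encoding."""
    return utf8_bytes.decode("utf-8").encode(encoding)


raw = "José".encode("ISO-8859-1")   # one byte per character
utf8 = to_utf8(raw)                 # the accented letter needs two bytes

print(len(raw), len(utf8))
print(from_utf8(utf8) == raw)
```

The byte length changes during the conversion, while the round trip restores the original bytes.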
SetKeyFile#
- Syntax:
mdMUIncrementalSetKeyFile(String KeyFile);
- Return Type:
Void
This function selects the name and file path for the key file that will be used for the current Incremental deduping operation.
If the SetMustExist function has been set to True, the string value passed to this function must contain a valid path to an existing key file.
If the SetMustExist function has been set to False, MatchUp Object will create an empty key file if none is found during initialization.
SetLicenseString#
- Syntax:
mdMUIncrementalSetLicenseString(String License);
- Returns:
Status
- Return Type:
Integer
This function passes the License Key required for MatchUp Object Global to function. A value of “1” is returned if the License Key is valid, a value of “0” is returned for an invalid or empty string.
This function is only required if the environment variable method is not used.
Each customer is issued a License Key when purchasing MatchUp Object Global or renewing a subscription. This string must be passed to this function to unlock the functionality of MatchUp Object Global.
The License Key is normally set using an environment variable, either MD_LICENSE or MD_LICENSE_DEMO. Calling this function is an alternative method for setting the License Key, but applications developed for a production environment should only use the environment variable.
When using an environment variable, it is not necessary to call this function.
For more information on setting the environment variable, see Entering Your MatchUp Object Global License Key.
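An application can check whether one of the environment variables named above is present before deciding whether to call SetLicenseString. The Python sketch below is only an illustration: `find_license` is a hypothetical helper, and checking MD_LICENSE before MD_LICENSE_DEMO is an assumption of this example, not documented precedence.

```python
import os


def find_license(environ=os.environ):
    """Return (variable name, value) for the first license variable set."""
    # The lookup order is an assumption made for this sketch.
    for name in ("MD_LICENSE", "MD_LICENSE_DEMO"):
        value = environ.get(name, "").strip()
        if value:
            return name, value
    return None


found = find_license()
if found is None:
    print("No license environment variable is set.")
else:
    print(f"License taken from {found[0]}")
```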
SetMatchcodeName#
- Syntax:
mdMUIncrementalSetMatchcodeName(String MatchcodeName);
- Return Type:
Void
This function selects the matchcode to use for the current Incremental deduping operation. The SetMatchcodeName function accepts a string value that must match the name of an existing matchcode in the current matchcode file.
SetMatchcodeObject#
- Syntax:
mdMUIncrementalSetMatchcodeObject(mdMUMatchcode);
- Return Type:
Void
This function selects the matchcode to use for the current Incremental deduping operation. It largely duplicates the purpose of the SetMatchcodeName function, but instead of accepting a character value containing the name of a matchcode in the current matchcode file, this function accepts a Matchcode object created using the Matchcode Editing interface.
Because this function requires the use of a separate interface to create the Matchcode object variable, it is usually simpler to use the SetMatchcodeName function.
It is possible to use this function to build a new matchcode on the fly using the Matchcode Editing interface. Unless a specific application demands such flexibility, it is usually much simpler to add a new matchcode to the matchcode file and call it using the SetMatchcodeName function.
SetMaximumCharacterSize#
- Syntax:
mdMUIncrementalSetMaximumCharacterSize(int Size);
- Return Type:
Void
Optional. This function will accommodate UTF-8 input data. Since a single UTF-8 character can be up to 4 bytes long, the storage of the matchcode keys may need to be altered to accommodate this maximum size. However, if you are working in a part of the world where you are sure that a byte sequence will always be shorter, you can use this function to override the storage overhead.
It must be called before calling the InitializeDataFiles function.
Valid input values are 1, 2, 3 or 4. Invalid values are ignored. The default size (i.e. if this function is not called) is 1.
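To judge which size a data set needs, it can help to measure how many bytes its characters occupy in UTF-8. The Python sketch below does only that measurement; `max_utf8_bytes_per_char` is a hypothetical helper, and how MatchUp Object uses the configured size internally is not shown here.

```python
def max_utf8_bytes_per_char(text):
    """Return the largest number of UTF-8 bytes used by any character."""
    return max((len(ch.encode("utf-8")) for ch in text), default=1)


samples = ["Smith", "Müller", "東京", "😀"]
for sample in samples:
    print(sample, max_utf8_bytes_per_char(sample))
```

Plain ASCII text needs one byte per character, accented Latin letters need two, most CJK characters need three, and characters outside the Basic Multilingual Plane (such as emoji) need four.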
SetMustExist#
- Syntax:
mdMUIncrementalSetMustExist(Bool Option);
- Return Type:
Void
This function determines whether or not the path specified by the SetKeyFile function must point to an existing key file.
If this option is set to true, initialization of MatchUp Object will fail if the path specified in the SetKeyFile function does not point to an existing key file.
If this option is false, and the path specified in the SetKeyFile function does not point to an existing key file, a new empty key file will be created.
SetPathToMatchUpFiles#
- Syntax:
mdMUIncrementalSetPathToMatchUpFiles(String FilePath);
- Return Type:
Void
This function accepts a string value containing the path to the folder containing the MatchUp data files. It must be called before calling the InitializeDataFiles function.
To provide maximum compatibility with Windows, three files are installed in your ‘Common App Data’ directory:
- Windows Vista and newer
C:\ProgramData\MelissaDATA\MatchUp
- Windows XP
C:\Documents and Settings\All Users\Application Data\Melissa DATA\MatchUp
Users can change the location of this directory. Keep this in mind, because a changed location is a common source of problems when running the samples and demos.
Initialization - Get#
GetBuildNumber#
- Syntax:
mdMUIncrementalGetBuildNumber();
- Returns:
Build Number
- Return Type:
String
This function returns the current development release build number of MatchUp Object.
GetDatabaseDate#
- Syntax:
mdMUIncrementalGetDatabaseDate();
- Returns:
Database Date
- Return Type:
String
This function returns a string value that represents the revision date of the MatchUp Object data files.
GetDatabaseExpirationDate#
- Syntax:
mdMUIncrementalGetDatabaseExpirationDate();
- Returns:
Database Expiration Date
- Return Type:
String
This function returns a string value containing the expiration date of the current database file (mdMatchUp.dat).
GetInitializeErrorString#
- Syntax:
mdMUIncrementalGetInitializeErrorString();
- Returns:
Initialize Status
- Return Type:
String
This function returns a string to describe the error caused when the InitializeDataFiles function cannot be successfully called.
The possible strings returned by this function are:
“No error”
“Could not find mdMatchUp.dat.”
“The License Key has expired.”
“The database has expired.”
“No matchcode was specified.”
“Specified Matchcode does not exist.”
“The specified matchcode is not valid.”
“The specified key file was not found.”
GetLicenseExpirationDate#
- Syntax:
mdMUIncrementalGetLicenseExpirationDate();
- Returns:
License Expiration Date
- Return Type:
String
This function returns a string value containing the expiration date of the current License Key. After this date, MatchUp Object will no longer function.
Mapping#
Before generating match keys for the records in the database, the code must supply the Incremental interface with information about what sort of data it will be handling.
AddMapping#
- Syntax:
mdMUIncrementalAddMapping(mdMU.mdMatchUpMatchmodeMapping);
- Returns:
Mapping Value
- Return Type:
Integer
For a matchcode that uses 5-digit ZIP codes, last and first names, and street addresses, in that order, use the following mappings:
mapOK = mdMU->AddMapping(mdMU.CityStZip) // uses only ZIP Code
mapOK = mdMU->AddMapping(mdMU.FullName) // uses last name only
mapOK = mdMU->AddMapping(mdMU.FullName) // uses first name only
mapOK = mdMU->AddMapping(mdMU.Address)
ClearMappings#
- Syntax:
mdMUIncrementalClearMappings();
- Return Type:
Void
This function clears any existing field mappings.
It is a good idea to call this function before beginning to map fields, especially if the application may be required to perform multiple deduping operations in a single session.
Match Key#
The following functions take the real data being compared and construct a match key according to the mappings defined with the above functions and the matchcode specified when the Incremental deduper was initialized.
AddField#
- Syntax:
mdMUIncrementalAddField(String Field);
- Return Type:
Void
The following example expands on the AddMapping example. The matchcode uses five-digit ZIP codes, last and first names, and the street addresses, in that order. The database contains a single “City/ST/ZIP” and a single full name field:
mdMU->AddField("Rancho Santa Margarita, CA 92688")
mdMU->AddField("Raymond F. Melissa")
mdMU->AddField("Raymond F. Melissa")
mdMU->AddField("22382 Avenida Empresa")
The deduper would use only the ZIP Code from the first AddField mapping, the last name from the second mapping, the first name from the third, etc.
BuildKey#
- Syntax:
mdMUIncrementalBuildKey();
- Return Type:
Void
For example, let’s assume the matchcode called for a five-digit ZIP Code, first ten characters of a last name, first ten of a first name, a street number and the first ten characters of a street name. The current record is for Raymond F. Melissa at 22382 Avenida Empresa in the 92688 ZIP Code. The match key would be:
92688MELISSA RAYMOND 22382EMPRESA
Because “Empresa” is only seven characters, the key would be padded with three spaces at the end.
ClearFields#
- Syntax:
mdMUIncrementalClearFields();
- Return Type:
Void
This function clears all values from previous calls to the AddField function.
To ensure that no extraneous information is carried over from one record to the next, call this function after calling the BuildKey function or before the first call to the AddField function.
SetKey#
- Syntax:
mdMUIncrementalSetKey(String MatchKey);
- Return Type:
Void
This function accepts a match key before calling the MatchRecord function.
The BuildKey function creates a key from input data. If, however, the match keys are already stored in the source database, use this function to pass the keys to the deduper before calling MatchRecord.
SetUserInfo#
- Syntax:
mdMUIncrementalSetUserInfo(String UserInfo);
- Return Type:
Void
This function accepts a character value that uniquely identifies each record in a set of data. The character value passed to this function must be unique for every record. This enables you to associate the match key in the key file to the corresponding record in the database.
Comparison#
These functions compare the new key with the existing key file and, if a duplicate is found, return information about the duplicate records in the file.
GetCount#
- Syntax:
mdMUIncrementalGetCount();
- Returns:
Total Matching Records
- Return Type:
Integer
This function returns an integer value indicating the number of records in the key file that matched the current key.
If matches were detected during a call to the MatchRecord function, the value equals the number of duplicate keys found.
GetDupeGroup#
- Syntax:
mdMUIncrementalGetDupeGroup();
- Returns:
Dupe Group Number
- Return Type:
Long
This function returns a long integer value indicating the group of duplicate records that the current key matches.
Every unique record (one with no duplicates) has its own “Dupe Group” number.
Records that are duplicates of one another all share the same number. This function returns the Dupe Group number of a matching record in the key file.
GetEntry#
- Syntax:
mdMUIncrementalGetEntry();
- Returns:
Current Record Duplicate Entry
- Return Type:
Integer
This function returns an integer value indicating where the current record falls within the order of its dupe group.
For example, if this is the sixth matching record found, this function will return a “6”.
GetKey#
- Syntax:
mdMUIncrementalGetKey();
- Returns:
Match Key
- Return Type:
String
This function returns the match key created by the last call to the BuildKey function and used by the last call to the MatchRecord function.
GetResults#
- Syntax:
mdMUIncrementalGetResults();
- Returns:
Result Codes
- Return Type:
String
This function returns a comma-delimited string of four-character codes that detail the output disposition of the last call to the MatchRecord function. It will also contain the result code of any matchcode combination which contributed to the present record matching other records in its dupe group.
This function is intended to replace the GetStatusCode and GetCombinations functions, providing a single source of information about the last MatchRecord function call. It also eliminates the need to perform bitwise operations on the GetCombinations return value to determine which matchcode combinations contributed to the record matching other records in its Dupe Group.
The function returns one or more of the MatchUp Object Result Codes in a comma-delimited list.
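Because the return value is a comma-delimited string, a caller usually splits it before testing for a particular code. The Python sketch below shows one way to do that. Both helpers are hypothetical; "MS03" is the duplicate code tested in the pseudocode earlier in this guide, and "XX01" is a made-up placeholder, not a real result code.

```python
def parse_result_codes(results):
    """Split a comma-delimited result string into a set of codes."""
    return {code.strip() for code in results.split(",") if code.strip()}


def is_duplicate(results):
    # "MS03" is the code the pseudocode in this guide tests for a duplicate.
    return "MS03" in parse_result_codes(results)


print(is_duplicate("MS03,XX01"))  # "XX01" is a made-up placeholder code
print(is_duplicate(""))
```

Testing membership in a set of whole codes avoids accidental partial matches that a plain substring search could produce.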
GetUserInfo#
- Syntax:
mdMUIncrementalGetUserInfo();
- Returns:
User Info
- Return Type:
String
This function returns the unique identifier associated with the record being checked by the Incremental interface. This is the same value passed to the SetUserInfo function.
The application will need this information to match the current match key back to an original data source.
MatchRecord#
- Syntax:
mdMUIncrementalMatchRecord();
- Return Type:
Void
This function compares the current match key to the keys in the key file and determines if this key matches a record that is already in the file.
If it is not a duplicate, a typical program would call the AddRecord function to add this key to the current key file.
If it is a duplicate, you can use the GetKey, GetCount, GetDupeGroup, GetResults, and GetEntry functions to gather information about the existing duplicate records in the file.
NextMatchingRecord#
- Syntax:
mdMUIncrementalNextMatchingRecord();
- Returns:
1 if there is another match; 0 if no match
- Return Type:
Integer
This function recalls the match data about the next record in the key file that matches the current search key.
After the MatchRecord function has detected a match between the input record and the key file, use this function to loop through all of the matching records in the key file, returning the match data for each record.
This function returns 1 (true) if there is another matching record and 0 (false) if there are no more matching records. This gives you the option of using a WHILE loop to repeatedly call the GetEntry, GetResults, GetDupeGroup, and GetKey functions for each matching record.
Key File#
AddRecord#
- Syntax:
mdMUIncrementalAddRecord();
- Return Type:
Void
This function compares the key generated by the most recent call to the BuildKey function to ones previously submitted to the incremental deduper and reports on whether or not any matches were found. It appends the key to the current key file. A typical application would use this function to add a new unique record to the key file if no duplicate was found by the last call to the MatchRecord function. It is also used as a common way to add transactions (adding the data relative to the key) to a master database, while flagging and linking it to existing matching records. This is also known as record linking without deduping.
Note: This function does not add the current record to the database. It only appends a new key to the key file.
Transaction#
These functions enable the Incremental interface to use transactions, processing multiple calls to the AddRecord function before committing the changes to the key file.
BeginTransaction#
- Syntax:
mdMUIncrementalBeginTransaction();
- Returns:
true if transaction initialization succeeds
- Return Type:
Boolean
This function tells MatchUp Object to wrap subsequent multiple calls to the AddRecord function with a transaction block.
This will greatly speed up processing when adding large numbers of records in the Incremental processor because records will not be physically written to the Incremental database until the CommitTransaction function is called. The transaction functions are used in the same way that BEGIN, COMMIT and ROLLBACK are used in SQL.
Even though the keys have not been permanently added, records will still be matched properly. However, other running processes that may be matching against the same database WILL NOT see these new records until after a call to the CommitTransaction function. Thus, transaction processing should not be used if multiple threads, processes, users, or machines are accessing the same Incremental database.
This function returns true when transaction processing is successfully initialized.
CommitTransaction#
- Syntax:
mdMUIncrementalCommitTransaction();
- Returns:
true if transaction completes and exits successfully
- Return Type:
Boolean
This function tells MatchUp Object to commit, or add, the previous calls to the AddRecord function to the key file since the BeginTransaction function was called.
This will greatly speed up processing when adding large numbers of records in the Incremental processor because records will not be physically written to the Incremental database until this function is called. The transaction functions are used in the same way that BEGIN, COMMIT and ROLLBACK are used in SQL.
Even though the keys have not been permanently added, records will still be matched properly. However, other running processes that may be matching against the same database WILL NOT see these new records until after the call to this function. Thus, transaction processing should not be used if multiple threads, processes, users, or machines are accessing the same Incremental database.
This function returns true when an existing transaction has successfully completed.
RollbackTransaction#
- Syntax:
mdMUIncrementalRollbackTransaction();
- Returns:
true if a successful rollback is completed
- Return Type:
Boolean
This function enables you to roll back, or erase, the calls made to the AddRecord function since the last call to the BeginTransaction function. A Boolean value of true is returned if the rollback completes successfully.
Although records are still matched properly while a transaction is open, keys added inside the transaction are not permanently written, so other running processes that may be matching against the same database will not see these new records. Thus, transaction processing SHOULD NOT be used if multiple threads, processes, users, or machines are accessing the same Incremental database.
This function may also be used when an error is raised with an input or master database, allowing you to gracefully roll back the key file to the point before the problematic data was processed.
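The visibility rules described for BeginTransaction, CommitTransaction, and RollbackTransaction can be pictured with a small in-memory model. This is only a sketch and not the MatchUp API: the class `KeyFileModel` and all of its method names are hypothetical, and exact string equality stands in for matchcode-based matching.

```python
class KeyFileModel:
    """Hypothetical stand-in for an Incremental key file (not the MatchUp API)."""

    def __init__(self):
        self.committed = []   # keys that other processes can see
        self.pending = None   # keys added inside an open transaction

    def begin_transaction(self):
        if self.pending is not None:
            return False      # a transaction is already open
        self.pending = []
        return True

    def add_record(self, key):
        # Inside a transaction, buffer the key; otherwise write it at once.
        target = self.committed if self.pending is None else self.pending
        target.append(key)

    def commit_transaction(self):
        if self.pending is None:
            return False
        self.committed.extend(self.pending)
        self.pending = None
        return True

    def rollback_transaction(self):
        if self.pending is None:
            return False
        self.pending = None   # discard everything added since begin
        return True

    def matches_locally(self, key):
        # The process that owns the transaction also sees pending keys.
        return key in self.committed or key in (self.pending or [])

    def matches_for_other_process(self, key):
        # Other processes see committed keys only.
        return key in self.committed


kf = KeyFileModel()
kf.begin_transaction()
kf.add_record("92688MELISSA")
print(kf.matches_locally("92688MELISSA"))            # True
print(kf.matches_for_other_process("92688MELISSA"))  # False
kf.rollback_transaction()
print(kf.matches_locally("92688MELISSA"))            # False
```

The gap between `matches_locally` and `matches_for_other_process` is the reason the text advises against transactions when several threads, processes, users, or machines share one Incremental database.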
Deprecated Methods#
The following methods are recorded here for compatibility and archiving purposes. Alternative methods should be used in practice.
GetCombinations (Deprecated)#
This function is deprecated. You should use the GetResults function instead.
GetStatusCode (Deprecated)#
This function is deprecated. You should use the GetResults function instead.
SetReserved (Deprecated)#
- Syntax:
mdMUIncrementalSetReserved(String Property, String Value);
- Returns:
true if a successful rollback is completed
- Return Type:
Boolean
The unique identifier set by SetUserInfo and attached to built match keys is 1024 bytes by default. Overriding this size with SetReserved("UserInfosize", "size") is deprecated for the Incremental interface, and a call to this method will be ignored. Newer Incremental key storage mechanisms have made this parameter obsolete.
Optional. This function accepts two parameters: A string parameter containing the property to be set; and a string parameter representing the value for the property being set.
It must be called before calling the InitializeDataFiles function.
Hybrid Interface#
Overview#
The Hybrid interface differs from the Incremental and Read-Write interfaces in that it does not maintain a key file of its own. It is up to the developer to maintain a list of match keys to use for deduping operations. This increases the flexibility of the Hybrid interface but at the expense of programming complexity.
The main advantage of Hybrid deduping is that it allows the developer to build smaller lists of match keys on the fly and quickly compare records to a small subset of the database.
Clustering
The concept of Clustering, outlined in the key concepts section, is essential to the Hybrid interface. Unlike the other interfaces, where the clustering is taking place behind the scenes, the Hybrid interface allows the developer to use clustering to compare a record against only a small portion of a list.
The Hybrid interface uses the concept of a cluster size, which is the maximum number of characters at the beginning of a key that can be used to group a number of keys into smaller groups that can be compared against each other. For example, a cluster size of 5 means that the first five characters of a match key are used to create the clusters.
In other words, only the records where the first five characters of the match key for one record are identical to the first five characters of the match key for another record are considered when performing a Hybrid deduping operation.
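As a rough illustration of the idea, the following sketch groups a handful of made-up match keys by their first five characters. The keys, the constant `CLUSTER_SIZE`, and the helper `group_into_clusters` are all hypothetical. A real application would obtain the cluster size from the Hybrid interface instead of hard-coding it.

```python
from collections import defaultdict

CLUSTER_SIZE = 5  # matches the example above; assumed here, not read from MatchUp


def group_into_clusters(keys, cluster_size=CLUSTER_SIZE):
    """Group match keys by their first `cluster_size` characters."""
    clusters = defaultdict(list)
    for key in keys:
        clusters[key[:cluster_size]].append(key)
    return dict(clusters)


# Made-up keys for illustration only.
keys = ["92688MELISSA", "92688SMITH", "10001JONES", "92688MELISA"]
clusters = group_into_clusters(keys)
print(clusters["92688"])  # ['92688MELISSA', '92688SMITH', '92688MELISA']
print(clusters["10001"])  # ['10001JONES']
```

Only keys inside the same group would ever be compared with each other, which is what keeps a Hybrid comparison limited to a small part of the list.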
Key Maintenance
Unlike the other interfaces, the Hybrid interface does not automatically handle the read/write operations to a key file. While this forces the developer to do more work, it allows a great deal of flexibility in how match keys are stored and handled.
In the previous example, with a cluster size of 5, if the match keys are stored in a field within a SQL database, a cluster could be built quickly by performing a SELECT query where the first five characters of the match key field match the first five characters of the match key for the new record.
While this gives the developer far more flexibility, it also requires a great deal more coding and a greater understanding of certain MatchUp concepts.
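The SELECT-based approach could look something like the sketch below, which uses Python's built-in `sqlite3` module and an in-memory database. The table name `records`, the column name `match_key`, and the helper `select_cluster` are assumptions made for this example, and the keys are invented.

```python
import sqlite3


def select_cluster(conn, new_key, cluster_size):
    """Return stored match keys sharing the first `cluster_size` characters of new_key."""
    prefix = new_key[:cluster_size]
    rows = conn.execute(
        "SELECT match_key FROM records "
        "WHERE substr(match_key, 1, ?) = ? ORDER BY id",
        (cluster_size, prefix),
    )
    return [row[0] for row in rows]


# Hypothetical table holding one match key per master record.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, match_key TEXT)")
conn.executemany(
    "INSERT INTO records (match_key) VALUES (?)",
    [("92688MELISSA",), ("10001JONES",), ("92688SMITH",)],
)

print(select_cluster(conn, "92688MELISA", 5))  # ['92688MELISSA', '92688SMITH']
```

On a large table, storing the cluster prefix in its own indexed column may be faster than calling `substr` on every row, but that is a design choice for the application and not something MatchUp requires.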
Order of Operations#
Using the Hybrid interface allows for greater flexibility than the other interfaces, as it gives you more control to handle storage and management of match keys.
Basic Steps#
These are the basic steps of a typical implementation of the Hybrid interface.
Initialize the Hybrid interface.
After creating an instance of the Hybrid interface, point the object toward its supporting data file, select a matchcode to use, and initialize these files.
Create field mappings.
In order to build keys to compare, the Hybrid interface needs to know which types of data the program will be passing to the interface and in what order.
Build a master list of keys.
Each record must have a match key so the Hybrid interface can select a cluster of records or check for duplicates. This consists of passing the data used in record comparison from each record to the interface in the same order used when creating a field mapping. After passing the necessary fields (usually a small subset of the fields from each record) via the AddField function, the Hybrid interface uses this information to generate a match key.
Build a match key for the new address record.
Repeat the step above to create a match key for the record to be compared against the cluster.
Build the cluster list.
Cycle through the master key list and extract only those records where the first part of the match key equals the first part of the match key for the new record.
Compare the match key to the cluster list.
Loop through the cluster list and compare each key to the key for the new record. The CompareKeys function indicates whether the two keys match.
Pseudocode Implementation#
This is a common implementation of the Hybrid interface using pseudocode for maximum clarity. Working sample programs in several programming languages can be found on the MatchUp Object install disc.
1. Initialize the Hybrid interface
After creating an instance of the Hybrid interface, point the object toward its supporting data file, select a matchcode to use, and initialize these files.
First, create a new instance of the Hybrid interface.
SET mu = NEW mdMUHybrid
In order to successfully initialize this new instance, point it toward its data files and supply a valid License Key. Also, select a matchcode, by name, before initializing.
CALL mu.SetLicenseString with LicenseString
CALL mu.SetPathToMatchUpFiles with DataPath
CALL mu.SetMatchcodeName with MatchcodeName
If all of the above have been set correctly, calling the InitializeDataFiles function should return a ProgramStatus value of ErrorNone. If it does not, call the GetInitializeErrorString function to determine the reason for the failure to initialize.
CALL mu.InitializeDataFiles RETURNING ProgramStatus
IF ProgramStatus is not ErrorNone THEN
CALL mu.GetInitializeErrorString RETURNING ErrorMsg
Display ErrorMsg
Exit Routine
END IF
If the initialization was successful, call the following functions to display version and expiration information about the instance of MatchUp Object currently in use on the local computer.
PRINT "Confirming Initialization: " + mu.GetInitializeErrorString
PRINT "Build Number: " + mu.GetBuildNumber
PRINT "Database Date: " + mu.GetDatabaseDate
PRINT "Database Expiration Date: " + mu.GetDatabaseExpirationDate
PRINT "License Expiration Date: " + mu.GetLicenseExpirationDate
2. Create field mappings
Field mappings define which types of data the Hybrid interface is expecting. For example, a typical matchcode may include a five-digit ZIP Code, a last name, and a street address. The data coming in, however, may contain the city, state, and ZIP as a single character field and the person’s full name as a single field as well.
As long as MatchUp Object knows what kind of data is being passed to it, the object is smart enough to pull what it needs from the data supplied to it.
CALL mu.ClearMappings
After clearing any mappings from a previous use of the Hybrid interface, call the AddMapping function once for each field being considered.
CALL mu.AddMapping with mu.Zip9
CALL mu.AddMapping with mu.First
CALL mu.AddMapping with mu.Last
CALL mu.AddMapping with mu.Address
3. Create Master Key File
Unlike the Incremental and Read-Write interfaces, the Hybrid interface requires the developer to maintain a list of keys for the deduping operation. In this example, the keys are stored in a text file generated on the fly.
Open KeyFile as text file for writing
Each record is read from the database, converted to a match key and written to the text file.
FOR EACH Record in Database
Read Zip9, FirstName, LastName, StreetAddress fields from database
CALL mu.ClearFields
CALL mu.AddField with Zip9
CALL mu.AddField with FirstName
CALL mu.AddField with LastName
CALL mu.AddField with StreetAddress
CALL mu.BuildKey
CALL mu.GetKey RETURNING Key
Write Key to KeyFile
NEXT
Close KeyFile
4. Create the Match Key for the Input Data
The next step is to take the record that is to be checked and create a match key for it.
GET Zip9, FirstName, LastName, StreetAddress from data source
CALL mu.ClearFields
CALL mu.AddField with Zip9
CALL mu.AddField with FirstName
CALL mu.AddField with LastName
CALL mu.AddField with StreetAddress
CALL mu.BuildKey
CALL mu.GetKey RETURNING Key
5. Create the Cluster List
Use the key generated in the last step to select only those records where the first part of the match key matches the same part of the match key for the record to be checked. The size of the portion of the match key to be checked is determined by the GetClusterSize function.
CALL mu.GetKeySize RETURNING KeySize
CALL mu.GetClusterSize RETURNING ClusterSize
SET ClusterKey = Left part of Key, size = ClusterSize
ClusterKey is a string, with a length equal to ClusterSize, used to match the first part of the match key from the input record. Cycle through the key list and write only those records that match the cluster key to a new text file, which becomes the cluster.
Open KeyFile for reading
FOR EACH Record in KeyFile
Read MasterKey
IF First ClusterSize characters of MasterKey = ClusterKey THEN
ADD Record to Cluster
END IF
NEXT
Close KeyFile
6. Check Input Record Against Cluster List
With the cluster list built, check the whole key for the input record against each line of the cluster list, using the CompareKeys function to determine if there was a match.
FOR EACH Record in Cluster
Read MasterKey
CALL mu.CompareKeys with Key, MasterKey RETURNING MatchResult
IF MatchResult is not 0 THEN
PRINT Key + " matches " + MasterKey
END IF
NEXT
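Steps 5 and 6 can be sketched together in a few lines. The function `find_matches` and the placeholder `exact_compare` are hypothetical. In a real program the comparison would be done by the CompareKeys function, which applies the matchcode rules instead of the exact equality used here, and the cluster size would come from GetClusterSize.

```python
def find_matches(input_key, master_keys, cluster_size, compare_keys):
    """Return the master keys in the input key's cluster that compare as matches."""
    cluster_key = input_key[:cluster_size]
    # Step 5: keep only keys whose leading characters equal the cluster key.
    cluster = [k for k in master_keys if k[:cluster_size] == cluster_key]
    # Step 6: compare the whole input key against each key in the cluster.
    return [k for k in cluster if compare_keys(input_key, k) != 0]


def exact_compare(key_a, key_b):
    """Placeholder for CompareKeys: 1 on exact equality, 0 otherwise."""
    return 1 if key_a == key_b else 0


# Made-up master key list for illustration only.
master = ["92688MELISSA", "92688SMITH", "10001JONES"]
print(find_matches("92688MELISSA", master, 5, exact_compare))  # ['92688MELISSA']
print(find_matches("10001SMITH", master, 5, exact_compare))    # []
```

Passing the comparison in as a parameter keeps the clustering logic separate from the matching logic, so the placeholder can be swapped for a call into the Hybrid interface without touching the loop.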
Methods#
Initialization - Set#
The following functions prepare the Hybrid interface for use and link it to its supporting data files.
InitializeDataFiles#
- Syntax:
mdMUHybridInitializeDataFiles();
- Return Type:
ProgramStatus
This function opens the needed data files and prepares the MatchUp Object for use.
Before calling this function, the application must have successfully called the SetLicenseString and SetPathToMatchUpFiles functions.
Check the return value of the GetInitializeErrorString function to retrieve the result of the initialization call. Any result other than “No error” means the initialization failed.
This returns a value of the enumerated type ProgramStatus.
Value | Enumeration | Description |
---|---|---|
0 | ErrorNone | No error - Initialization was successful. |
1 | ErrorConfigFile | Could not find mdMatchUp.dat. |
2 | ErrorLicenseExpired | The License Key has expired. |
3 | ErrorDatabaseExpired | The database has expired. |
4 | ErrorMatchcodeNotSpecified | No matchcode was specified. |
5 | ErrorMatchcodeNotFound | Specified Matchcode does not exist. |
6 | ErrorInvalidMatchcode | The specified matchcode is not valid. |
If any value other than “ErrorNone” is returned, check the GetInitializeErrorString function to see the reason for the error.
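When logging initialization results, it can help to mirror the status values in application code. The sketch below copies the values and names from the table above into a Python `IntEnum`. The enum is a local copy for illustration, not a type exported by MatchUp Object, and the helper `describe_status` is hypothetical.

```python
from enum import IntEnum


class ProgramStatus(IntEnum):
    """Local copy of the values listed in the table above."""
    ErrorNone = 0
    ErrorConfigFile = 1
    ErrorLicenseExpired = 2
    ErrorDatabaseExpired = 3
    ErrorMatchcodeNotSpecified = 4
    ErrorMatchcodeNotFound = 5
    ErrorInvalidMatchcode = 6


def describe_status(raw_value):
    """Turn a raw integer status into a short log line (hypothetical helper)."""
    try:
        status = ProgramStatus(raw_value)
    except ValueError:
        return f"unknown status {raw_value}"
    if status is ProgramStatus.ErrorNone:
        return "initialized"
    return f"initialization failed: {status.name}"


print(describe_status(0))  # initialized
print(describe_status(5))  # initialization failed: ErrorMatchcodeNotFound
print(describe_status(9))  # unknown status 9
```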
SetEncoding#
- Syntax:
mdMUHybridSetEncoding(String EncodingFormat);
- Return Type:
Void
Optional. This function defines the encoding of the input data and the resulting format of the built keys. Input data is converted into the UTF-8 format and processed as such. When GetKey is called, the data is converted back from UTF-8 to the specified input format.
The return value is 1 if the desired encoding can be set or 0 if the encoding is not recognized or cannot be set.
It must be called before calling the InitializeDataFiles function.
If this function is not called, the default encoding type is “ISO-8859-1”.
SetLicenseString#
- Syntax:
mdMUHybridSetLicenseString(String License);
- Returns:
Status
- Return Type:
Integer
This function passes the License Key required for MatchUp Object to function. A value of “1” is returned if the License Key is valid; a value of “0” is returned for an invalid or empty string.
This function is only required if the environment variable method is not used.
Each customer is issued a License Key when purchasing MatchUp Object or renewing a subscription. This string must be passed to this function to unlock the functionality of MatchUp Object.
The License Key is normally set using an environment variable, either MD_LICENSE or MD_LICENSE_DEMO. Calling this function is an alternative method for setting the License Key, but applications developed for a production environment should only use the environment variable.
When using an environment variable, it is not necessary to call this function.
For more information on setting the environment variable, see Entering Your MatchUp Object Global License Key.
SetMatchcodeName#
- Syntax:
mdMUHybridSetMatchcodeName(String MatchcodeName);
- Return Type:
Void
This function accepts a string value that must match the name of an existing matchcode in the current matchcode file.
SetMatchcodeObject#
- Syntax:
mdMUHybridSetMatchcodeObject(mdMUMatchcode);
- Return Type:
Void
This function selects the matchcode to use for the current Hybrid deduping operation.
This function largely duplicates the purpose of the SetMatchcodeName function, but instead of accepting a character value containing the name of a matchcode in the current matchcode file, this function accepts a Matchcode object created using the Matchcode Editor interface.
Because this function requires the use of a separate interface to create the Matchcode object variable, it is normally simpler to use the SetMatchcodeName function.
It is possible, however, to use this function to build a new matchcode on the fly using the Matchcode Editor interface. Unless a specific application demands such flexibility, it is usually much simpler to add a new matchcode to the matchcode file and call it using the SetMatchcodeName function.
SetMaximumCharacterSize#
- Syntax:
mdMUHybridSetMaximumCharacterSize(int Size);
- Return Type:
Void
Optional. This function will accommodate UTF-8 input data. Since a single UTF-8 character can be up to 4 bytes long, the storage of the matchcode keys may need to be altered to accommodate this maximum size. However, if you are working in a part of the world where you are sure that a byte sequence will always be shorter, you can use this function to override the storage overhead.
It must be called before calling the InitializeDataFiles function.
Valid input values are 1, 2, 3, or 4. Invalid values will be ignored. The default size (i.e., if this function is not called) is 1.
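One way to estimate a suitable value is to measure how many bytes the widest character in a representative sample of the data occupies in UTF-8. The helper below is a hypothetical sketch that relies only on the Python standard library. Whether a sample is representative enough to size key storage is a judgment call for the application.

```python
def max_utf8_bytes_per_char(text):
    """Return the largest number of bytes any single character needs in UTF-8."""
    return max((len(ch.encode("utf-8")) for ch in text), default=1)


print(max_utf8_bytes_per_char("Melissa"))  # 1 (ASCII only)
print(max_utf8_bytes_per_char("Müller"))   # 2 (ü takes two bytes)
print(max_utf8_bytes_per_char("東京"))      # 3 (these characters take three bytes each)
print(max_utf8_bytes_per_char("😀"))        # 4 (outside the Basic Multilingual Plane)
```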
SetPathToMatchUpFiles#
- Syntax:
mdMUHybridSetPathToMatchUpFiles(String FilePath);
- Return Type:
Void
This function accepts a string value containing the path to the folder containing the MatchUp Hybrid data files.
This function must be called before calling the InitializeDataFiles function.
To provide maximum compatibility with Windows, three files are installed in your ‘Common App Data’ directory.
- Windows Vista and newer
C:\ProgramData\MelissaDATA\MatchUp
- Windows XP
C:\Documents and Settings\All Users\Application Data\Melissa DATA\MatchUp
Users can change the location of this directory. Keep this in mind, because a changed location is often the source of issues when running the samples and demos.
SetReserved#
- Syntax:
mdMUHybridSetReserved(String Property, String Value);
- Return Type:
Void
Optional. This function accepts two parameters: A string parameter containing the property to be set; and a string parameter representing the value for the property being set.
It must be called before calling the InitializeDataFiles function.
Initialization - Get#
GetBuildNumber#
- Syntax:
mdMUHybridGetBuildNumber();
- Returns:
Build Number
- Return Type:
String
This function returns the current development release build number of MatchUp Object.
GetDatabaseDate#
- Syntax:
mdMUHybridGetDatabaseDate();
- Returns:
Database Date
- Return Type:
String
This function returns a string value that represents the revision date of the MatchUp Object data files.
GetDatabaseExpirationDate#
- Syntax:
mdMUHybridGetDatabaseExpirationDate();
- Returns:
Database Expiration Date
- Return Type:
String
This function returns a string value containing the expiration date of the current database file (mdMatchUp.dat).
GetInitializeErrorString#
- Syntax:
mdMUHybridGetInitializeErrorString();
- Returns:
Initialize Status
- Return Type:
String
This function returns a string to describe the error caused when the InitializeDataFiles function cannot be successfully called.
The possible strings returned by this function are:
“No error”
“Could not find mdMatchUp.dat.”
“The License Key has expired.”
“The database has expired.”
“No matchcode was specified.”
“Specified Matchcode does not exist.”
“The specified matchcode is not valid.”
“The specified key file was not found.”
GetLicenseExpirationDate#
- Syntax:
mdMUHybridGetLicenseExpirationDate();
- Returns:
License Expiration Date
- Return Type:
String
This function returns a string value containing the expiration date of the current License Key. After this date, MatchUp Object will no longer function.
Mapping#
Before generating match keys for the records in the database, the code must supply the Hybrid interface with information about what sort of data it will be handling.
AddMapping#
- Syntax:
mdMUHybridAddMapping(mdMU.mdMatchUpMatchmodeMapping);
- Returns:
Mapping Value
- Return Type:
Integer
This function selects the types of fields that will be used to build the match key and the order in which they will be added using the AddField function.
The function accepts an enumerated value of the type MatchcodeMapping. It tells the Hybrid interface which data types will be used for this deduping operation and in what order they will be passed to the deduper when passing data using the AddField function.
The data types used must contain the data expected by the matchcode being used, but it does not have to be an exact match. For example, if the matchcode requires a five-digit ZIP Code but the database contains a single “City/State/ZIP” field, simply add the CityStZip mapping and pass the full string to the AddField function later. MatchUp Object is smart enough to use only the information it needs.
In another example, a matchcode calls for both last name and first name but the database contains only full names. The application would simply apply the FullName mapping twice and pass the full name data twice to the AddField function.
Let’s apply the two examples above to a matchcode that uses five-digit ZIP Codes, last names, first names, and street addresses, in that order:
mdMU->AddMapping(mdMU.CityStZip) // uses only ZIP Code
mdMU->AddMapping(mdMU.FullName) // uses last name only
mdMU->AddMapping(mdMU.FullName) // uses first name only
mdMU->AddMapping(mdMU.Address)
For a list of these enumerations, see Matchcode Mapping Enumerations.
The function returns a non-zero value if the mapping is allowed by the selected matchcode, or zero if the mapping caused an error.
ClearMappings#
- Syntax:
mdMUHybridClearMappings();
- Return Type:
Void
This function clears any existing field mappings.
It is a good idea to call this function before beginning to map fields, especially if the application is required to perform multiple deduping operations in a single session.
Match Key#
The following functions gather the input data and use it to generate match keys according to the mapping and the selected matchcode.
AddField#
- Syntax:
mdMUHybridAddField(String Field);
- Return Type:
Void
This function passes a component of data to the deduper prior to calling the BuildKey function.
Fields must be passed to this function in the same order that the corresponding data types were mapped using the AddMapping function.
For example, suppose the matchcode uses five-digit ZIP Codes, last and first names, and street addresses, in that order, but the file includes only a single “City/ST/ZIP” field and a single full name field:
mdMU->AddField("Rancho Santa Margarita, CA 92688")
mdMU->AddField("Raymond F. Melissa")
mdMU->AddField("Raymond F. Melissa")
mdMU->AddField("22382 Avenida Empresa")
The interface would use only the ZIP Code from the first field, the last name from the second and first name from the third.
BuildKey#
- Syntax:
mdMUHybridBuildKey();
- Return Type:
Void
This function takes the information passed via calls to the AddField function and, using the mapping defined by the AddMapping function and the pattern defined by the matchcode being used, builds a match key.
A match key is a character string built according to a pattern defined by the current matchcode, consisting only of enough information to determine if the current record is unique or has a duplicate within the key file.
For example, assume a matchcode calls for a five-digit ZIP Code, the first ten characters of a last name, a street number, and the first ten characters of a street name. The current record is for Raymond F. Melissa at 22382 Avenida Empresa in the 92688 ZIP Code. The match key would be:
92688MELISSA RAYMOND 22382EMPRESA
Because “Empresa” is only seven characters, the key would be padded with three spaces at the end.
ClearFields#
- Syntax:
mdMUHybridClearFields();
- Return Type:
Void
This function clears the fields passed by previous calls to the AddField function. Use it before the first call to the AddField function for each record or after calling the BuildKey function.
GetKey#
- Syntax:
mdMUHybridGetKey();
- Return Type:
String
This function returns a string value containing the match key generated by the most recent call to the BuildKey function.
Use this function to recall the most recently generated match key before writing it to a key file or passing it to the CompareKeys function.
Comparison#
Use the following functions to determine how much of each match key will be used to select records for the cluster and compare the input data to the keys in the cluster.
CompareKeys#
- Syntax:
mdMUHybridCompareKeys();
- Returns:
1 if match; 0 if no match
- Return Type:
Long
This function compares two match keys and returns a long integer value indicating whether they match.
If this function does not find a match, the return value will be zero. If there was a match, this function returns an unsigned long integer value indicating which combination or combinations within the matchcode produced the match.
Each bit of the integer value matches a specific combination. Use a logical AND operation to determine if a particular combination produced the match. For example:
IF (mdMU->GetCombination AND 0x8000)
THEN Print "Combo 16 Matched"
Although this function does not return an enumerated value, it does use the same values as the MatchcodeCombination enumeration used by the Matchcode Editing interface.
For a list of these enumerations, see Matchcode Combinations Enumerations.
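The bit test shown above generalizes to a small helper that lists every combination encoded in a return value. The sketch assumes that combination n corresponds to bit n-1, which is consistent with 0x8000 standing for combination 16 but should be checked against the MatchcodeCombination enumeration. The function name `matched_combinations` is hypothetical.

```python
def matched_combinations(result, max_combos=16):
    """List the 1-based combination numbers whose bits are set in `result`."""
    return [n for n in range(1, max_combos + 1) if result & (1 << (n - 1))]


print(matched_combinations(0x8000))  # [16]
print(matched_combinations(0x0003))  # [1, 2]
print(matched_combinations(0))       # []
```

An empty list corresponds to a return value of zero, meaning no combination produced a match.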
This function does not return any information about dupe groups or the number of duplicate records. This is because the Hybrid interface does not keep track of matching records (it is up to you to do this).
GetClusterSize#
- Syntax:
mdMUHybridGetClusterSize();
- Returns:
Cluster Size
- Return Type:
Integer
This function returns an integer value indicating the maximum size of the portion of the match key that can be used for clustering.
Use this function to determine how much of the key to use for comparison when building the cluster file, a subset of keys from your master database.
GetKeySize#
- Syntax:
mdMUHybridGetKeySize();
- Returns:
Key Size
- Return Type:
Integer
This function returns an integer value indicating the number of characters in a key generated using the matchcode selected using the SetMatchcodeName function.
This function can be useful for determining field sizes or how much memory will need to be allocated, if the programming language requires the developer to handle memory management.
GetResults#
- Syntax:
mdMatchUpGetResults();
- Returns:
Result Codes
- Return Type:
String
This function returns a comma-delimited string of four-character codes that detail the output disposition of the last call to the CompareKeys function. It will also contain the result code of any matchcode combination which contributed to the present record matching other records in its dupe group.
This function is intended to replace the Status Code and matchcode Combinations information previously obtained by evaluating the CompareKeys return value. It provides a single source of information about the last CompareKeys function call and eliminates the need to perform bitwise operations on the return value to determine which matchcode combinations contributed to the record matching the other match key.
The function returns one or more of the MatchUp Object Result Codes in a comma-delimited list.
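Because the result is a plain comma-delimited string, it can be split into a list before testing for individual codes. In the sketch below, `parse_result_codes` is a hypothetical helper, and `XX01` and `XX02` are placeholders and not real MatchUp Object Result Codes.

```python
def parse_result_codes(results):
    """Split a comma-delimited result string into a list of codes."""
    return [code.strip() for code in results.split(",") if code.strip()]


# "XX01" and "XX02" are placeholders, not real MatchUp result codes.
codes = parse_result_codes("XX01,XX02")
print(codes)                   # ['XX01', 'XX02']
print("XX02" in codes)         # True
print(parse_result_codes(""))  # []
```

Testing membership in the parsed list avoids accidental substring matches that a direct search of the raw string could produce.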
Matchcode Interface#
Overview#
The Matchcode Editor for Windows will handle the task of creating and modifying matchcodes for many situations. The editor application will also run on a Linux system under WINE.
However, because MatchUp Object works across multiple platforms and not all users will have access to a Windows emulator, MatchUp Object also includes an interface for creating, modifying and viewing matchcodes programmatically on any system.
The Matchcode Interface also enables applications to create and use matchcodes on the fly, if this becomes advantageous to do.
Creating matchcodes programmatically is more complicated and advanced than using the Windows GUI editor. The Windows Matchcode Editor handles error checking and enforces many of the rules described in the chapter on matchcodes, while the Matchcode Interface returns the necessary error codes to detect such problems but requires that the developer implement the necessary error handling.
Using the Matchcode Interface requires a more thorough understanding of matchcodes and how they are used by MatchUp Object. We recommend carefully reading the section on Matchcodes before attempting to use the features of this interface.
Order of Operations#
Creating Matchcodes Basic Steps#
Using the Matchcode interface is not overly complicated but there are many options that must be considered for matchcodes and matchcode components.
These are the basic steps of a typical implementation of the Matchcode interface.
Initialize the Matchcode interface and set the data path.
The Matchcode interface does not require a License Key so this step is much simpler than with the other interfaces.
Create a new matchcode.
The CreateNewMatchcode function creates a new, blank matchcode for editing.
Create new matchcode components.
Matchcode components are created as class variables. Create an instance of the MatchcodeComponent for each component.
Set the options for each matchcode component.
Use the functions of the matchcode component class to select the options for the component type, size, matching strategy, swap pair, and to which combinations the component belongs.
Add the components to the new matchcode.
Use the AddMatchcodeItem function to add the component to the new matchcode. At this point, the Matchcode interface checks the component for errors.
Save the changes to the matchcode file.
The Matchcode interface can either save the changes to the original matchcode file or to a new copy of the file.
Creating Matchcodes Pseudocode Implementation#
This is a simplified code sample written in pseudocode showing the creation of a basic matchcode using the Matchcode interface.
1. Initialize the Matchcode interface
Initialization consists of creating a new instance of the Matchcode class and connecting that instance to the data files. The Matchcode interface does not require a License Key, so many of the functions found in the other interfaces are not present here.
SET mc = NEW mdMUMatchcode
CALL mc.SetPathToMatchUpFiles with PathToMatchUpFiles
IF mc.ErrorNone == mc.InitializeDataFiles THEN
PRINT "Initializing DataFiles..."
PRINT "Confirm Init Non-Error: " + mc.GetInitializeErrorString
ELSE
PRINT "Init Error: " + mc.GetInitializeErrorString
ENDIF
2. Create a new matchcode
A new matchcode requires a name. This program forces the users to keep entering a name until a valid name is entered.
REPEAT
PRINT "Enter New Matchcode Name: "
GET INPUT NewMatchCodeName
CALL mc.CreateNewMatchcode with NewMatchCodeName RETURNING Created
IF Created = 0 THEN PRINT "Could Not Create Matchcode!"
UNTIL Created <> 0
3. Create new matchcode components
Matchcode components are created as instances of the MatchcodeComponent class. Another approach would be to create an array or simply create and add each component as part of a loop.
SET mcComp, mcComp2, mcComp3, mcComp4, mcComp5 = NEW mdMUMatchcodeComponent
4. Set the options for each matchcode component
Use the functions of the matchcode component class to select the options for the component type, size, matching strategy, swap pair, and to which combinations the component belongs.
CALL mcComp.SetComponentType WITH MatchCodeComponentType.Zip5
CALL mcComp.SetStart WITH MatchcodeStart.Left
CALL mcComp.SetFuzzy WITH MatchcodeFuzzy.Exact
CALL mcComp.SetSwap WITH MatchcodeSwap.NoSwap
CALL mcComp.SetFieldMatch WITH MatchcodeFieldMatch.NoFieldMatch
CALL mcComp.SetSize WITH 5
CALL mcComp.SetCombination WITH (MatchcodeCombination.Combo1 OR MatchcodeCombination.Combo2)
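The OR in the last call merges two flag values into a single value. The following is a minimal Python sketch of that idea and does not call the MatchUp library. The `Combo` type is a hypothetical stand-in for MatchcodeCombination, and its values are assumptions taken from the 0x0001, 0x0002, and 0x0004 masks used in the reading example later in this guide.

```python
from enum import IntFlag


class Combo(IntFlag):
    """Hypothetical stand-in for MatchcodeCombination (values are assumptions)."""
    COMBO1 = 0x0001
    COMBO2 = 0x0002
    COMBO3 = 0x0004


# Bitwise OR merges the flags into one value that can be passed to a setter.
selected = Combo.COMBO1 | Combo.COMBO2

# Bitwise AND tests whether a single flag is present in the merged value.
uses_combo1 = bool(selected & Combo.COMBO1)
uses_combo3 = bool(selected & Combo.COMBO3)
```

Here `selected` carries both combinations at once, which is why one component can belong to several combinations.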
5. Add the components to the new matchcode
Use the AddMatchcodeItem function to add the component to the new matchcode. At this point, the Matchcode interface checks the component for errors.
CALL mc.AddMatchcodeItem WITH mcComp RETURNING mcCompAdded
IF mcCompAdded <> 0 THEN PRINT "Component Added."
6. Repeat for each component
Repeat steps 4 & 5 for each component.
CALL mcComp2.SetComponentType WITH MatchcodeComponentType.Last
CALL mcComp2.SetStart WITH MatchcodeStart.Left
CALL mcComp2.SetFuzzy WITH MatchcodeFuzzy.AccurateNear
CALL mcComp2.SetNear WITH 1
CALL mcComp2.SetSwap WITH MatchcodeSwap.NoSwap
CALL mcComp2.SetFieldMatch WITH MatchcodeFieldMatch.BothBlankMatch
CALL mcComp2.SetSize WITH 7
CALL mcComp2.SetCombination WITH (MatchcodeCombination.Combo1 OR MatchcodeCombination.Combo2)
CALL mc.AddMatchcodeItem WITH mcComp2 RETURNING mcCompAdded
IF mcCompAdded <> 0 THEN PRINT "Component Added"
Repeat until all five components have been created and added to the current matchcode.
7. Save the changes to the matchcode file
The Matchcode interface can either save the changes to the original matchcode file or to a new copy of the file.
CALL mc.Save
Reading Matchcodes Basic Steps#
Reading the matchcode file is simpler in that it requires no thought about the rules for matchcodes, but it does require some programming to translate the values returned into meaningful information.
Initialize the Matchcode interface and set the data path.
The Matchcode interface does not require a License Key so this step is much simpler than with the other interfaces.
Retrieve the matchcode.
The FindMatchcode function loads the specified matchcode into memory.
Begin cycling through every component in the matchcode.
The Matchcode interface returns the number of components in the current matchcode. Use this number to loop through all of the components, assigning each component to a MatchcodeComponent class variable.
Retrieve the component settings.
Call the MatchcodeComponent functions that return the settings for each component and, if necessary, translate them into meaningful information.
Reading Matchcodes Pseudocode Implementation#
These functions would normally be used in conjunction with those for creating and modifying matchcodes but, for simplicity and clarity, this section will concentrate solely on showing how to retrieve the matchcode and component information.
1. Initialize the Matchcode interface
Initialization consists of creating a new instance of the Matchcode class and connecting that instance to the data files. The Matchcode interface does not require a License Key so many of the functions found in the other interfaces are not present here.
SET mc = NEW mdMUMatchcode
CALL mc.SetPathToMatchUpFiles with PathToMatchUpFiles
IF mc.ErrorNone == mc.InitializeDataFiles THEN
PRINT "Initializing DataFiles..."
PRINT "Confirm Init Non-Error: " + mc.GetInitializeErrorString
ELSE
PRINT "Init Error: " + mc.GetInitializeErrorString
ENDIF
2. Retrieve a matchcode
The FindMatchcode function requires the name of an existing matchcode in the current matchcode file.
PRINT "Enter Existing Matchcode to Look Up: "
INPUT MatchcodeName
CALL mc.FindMatchcode WITH MatchcodeName RETURNING errorCode
IF errorCode IS NOT 1 THEN
PRINT "Matchcode can not be OPENED: " + errorCode
END ROUTINE
END IF
3. Begin cycling through every component in the matchcode
Use the GetMatchcodeItemCount function to determine how many components are present in the current matchcode.
FOR MatchcodeItem = 1 TO mc.GetMatchcodeItemCount
CALL mc.GetMatchcodeItem with MatchcodeItem RETURNING mcComp
4. Retrieve the component settings
Begin calling the MatchcodeComponent functions to return the settings for each component.
CALL mcComp.GetLabel RETURNING Label
IF Label IS NOT EMPTY THEN PRINT Label
CALL mcComp->GetComponentType RETURNING ComponentTypeName
CASE ComponentTypeName OF
1: Type = "Prefix"
2: Type = "First"
3: Type = "Middle"
4: Type = "Last"
5: Type = "Suffix"
6: Type = "Gender"
7: Type = "FirstNickname"
8: Type = "MiddleNickname"
9: Type = "Title"
10: Type = "Company"
11: Type = "CompanyAcronym"
12: Type = "StreetNumber"
13: Type = "StreetPreDir"
14: Type = "StreetName"
15: Type = "StreetSuffix"
16: Type = "StreetPostDir"
17: Type = "POBox"
This is an incomplete list, but with a separate case for each component type, the program determines the component type and displays the name.
Other: Type = "UNDETERMINED"
ENDCASE
PRINT "Type:" + Type
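In a language with dictionaries, the CASE statement above reduces to a table lookup. The following minimal Python sketch uses only the codes listed above (the list is incomplete) and does not call the MatchUp library. `component_type_name` is a hypothetical helper name.

```python
# Codes and names copied from the (incomplete) CASE list above.
COMPONENT_TYPE_NAMES = {
    1: "Prefix", 2: "First", 3: "Middle", 4: "Last", 5: "Suffix",
    6: "Gender", 7: "FirstNickname", 8: "MiddleNickname", 9: "Title",
    10: "Company", 11: "CompanyAcronym", 12: "StreetNumber",
    13: "StreetPreDir", 14: "StreetName", 15: "StreetSuffix",
    16: "StreetPostDir", 17: "POBox",
}


def component_type_name(code: int) -> str:
    """Translate a component type code into a display name."""
    # Codes missing from the table fall through to the same default
    # as the "Other" branch of the CASE statement.
    return COMPONENT_TYPE_NAMES.get(code, "UNDETERMINED")
```

Adding the remaining component types then only requires extending the table, not the lookup logic.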
Retrieve and display the size of the current component.
CALL mcComp->GetSize RETURNING Size : Print "Size: " + Size
Repeat the same basic procedure for the Component starting position.
CALL mcComp->GetStart RETURNING ComponentStart
CASE OF ComponentStart
0x08: Start = "Left"
0x10: Start = "Right"
0x20: Start = "Pos:"
0x40: Start = "Word:"
Other: Start = "UNKNOWN"
ENDCASE
PRINT "Start: " + Start
Repeat again for the fuzzy matching rule for the current component. The following is an incomplete list.
CALL mcComp->GetFuzzy RETURNING ComponentFuzzy
CASE OF ComponentFuzzy
0x0000 : Fuzzy = "Exact"
0x0001 : Fuzzy = "SoundEx"
0x0002 : Fuzzy = "Phonetex"
0x0004 : Fuzzy = "Containment"
0x0008 : Fuzzy = "Frequency"
0x0010 : Fuzzy = "FastNear"
0x0020 : Fuzzy = "AccrNear"
0x0040 : Fuzzy = "Vowels"
0x0080 : Fuzzy = "Consonants"
0x0100 : Fuzzy = "Alphas"
0x0200 : Fuzzy = "Numerics"
0x0400 : Fuzzy = "FreqNear"
Other: Fuzzy = "UNKNOWN"
ENDCASE
PRINT "Fuzzy Matching: " + Fuzzy
Repeat again for the blank field matching rules.
Call mcComp->GetFieldMatch RETURNING ComponentField
CASE OF ComponentField
0: FieldMatch = "NO Blank"
0x0100: FieldMatch = "BothBlank"
0x0200: FieldMatch = "OneBlank"
0x0400: FieldMatch = "Initial"
0x0300: FieldMatch = "Both/One"
0x0500: FieldMatch = "Both/Init"
0x0600: FieldMatch = "Init/One"
0x0700: FieldMatch = "Both/One/Init"
Other: FieldMatch = "UNKNOWN"
ENDCASE
PRINT "Blank Field Matching: " + FieldMatch
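The combined values in the list above are bitwise ORs of three base flags (for example, 0x0300 is 0x0100 together with 0x0200), so the same result can be obtained by testing each flag in turn. This is a minimal Python sketch under that reading. `describe_field_match` is a hypothetical helper, its output spells the names in full instead of using the shortened labels above, and it ignores any bits outside the three listed flags.

```python
# Base flags copied from the CASE list above.
FIELD_MATCH_FLAGS = (
    (0x0100, "BothBlank"),
    (0x0200, "OneBlank"),
    (0x0400, "Initial"),
)


def describe_field_match(value: int) -> str:
    """Build a label from whichever base flags are set in value."""
    names = [name for mask, name in FIELD_MATCH_FLAGS if value & mask]
    # No flags set corresponds to the 0 case in the list above.
    return "/".join(names) if names else "NO Blank"
```

For instance, `describe_field_match(0x0500)` yields `"BothBlank/Initial"`, which corresponds to the "Both/Init" entry.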
The process for determining which combinations the component belongs to is somewhat more complicated. It involves using a bitwise AND operation to compare the value returned by the GetCombination function to each of the possible values shown in the Enumerations.
CALL mcComp->GetCombination RETURNING ComponentCombos
SET CombinationsList to empty string
IF ComponentCombos AND 0x0001 THEN
Concatenate "1" TO CombinationsList
ELSE
Concatenate "." TO CombinationsList
END IF
IF ComponentCombos AND 0x0002 THEN
Concatenate "2" TO CombinationsList
ELSE
Concatenate "." TO CombinationsList
END IF
IF ComponentCombos AND 0x0004 THEN
Concatenate "3" TO CombinationsList
ELSE
Concatenate "." TO CombinationsList
END IF
This snippet of code adds a digit for each combination that uses the component. Otherwise, it adds a period to represent a blank. Repeat the above structure for each possible value from the Enumerations.
IF ComponentCombos AND 0x8000 THEN
Concatenate "F" TO CombinationsList
ELSE
Concatenate "." TO CombinationsList
END IF
PRINT "This component is used for these combinations: " + CombinationsList
Repeat for each component in the matchcode.
ENDFOR
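The repeated IF blocks used for the combinations can be collapsed into a loop over bit positions. The following minimal Python sketch assumes up to 16 combinations, with combination N stored in bit N-1 (consistent with the 0x0001 through 0x8000 masks above). The helper names are hypothetical, and the sketch reports combination numbers rather than reproducing the single-character labels of the pseudocode.

```python
from typing import List


def combinations_used(mask: int, max_combinations: int = 16) -> List[int]:
    """Return the 1-based numbers of the combinations whose bit is set."""
    return [n for n in range(1, max_combinations + 1) if mask & (1 << (n - 1))]


def dotted_list(mask: int, max_combinations: int = 16) -> str:
    """Render one slot per combination: its number if used, '.' if not."""
    used = set(combinations_used(mask, max_combinations))
    return " ".join(
        str(n) if n in used else "." for n in range(1, max_combinations + 1)
    )
```

For example, `dotted_list(0x0005, 4)` produces `"1 . 3 ."`, showing that the component belongs to combinations 1 and 3.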
Mapping Information#
In addition to information about the components used by a matchcode, the Matchcode interface can also return information about the required mapping for each matchcode. These can be different from the Component mapping types because the component type tells you the data type which will be used to match records, while the matchcode mapping tells the API the format of the incoming data.
One use for this would be for an application to retrieve the information from a matchcode and dynamically create the mappings based on that information.
Use the SetMatchcodeObject (Read-Write | Incremental | Hybrid) function instead of the SetMatchcodeName (Read-Write | Incremental | Hybrid) function to set the matchcode used by the interface.
For example, here are a few of the many possible matchcode mappings required by their respective component types:
CASE MatchCode.ComponentType.PrefixType: CALL mdMU.AddMapping WITH mm.FullName
CASE MatchCode.ComponentType.FirstType: CALL mdMU.AddMapping WITH mm.FullName
CASE MatchCode.ComponentType.LastType: CALL mdMU.AddMapping WITH mm.FullName
CASE MatchCode.ComponentType.SuffixType: CALL mdMU.AddMapping WITH mm.FullName
CASE MatchCode.ComponentType.FirstNicknameType: CALL mdMU.AddMapping WITH mm.FullName
Keep in mind that these are not all the matchcode mapping targets. For a full list of these targets, see MatchcodeMappingTarget. If a matchcode contains component types that cannot be extracted from the database you want to process, do not use that matchcode for that process.
Methods#
Initializing the Matchcode interface is simpler than the other interfaces, since no License Key is required.
Initialization - Set#
InitializeDataFiles#
- Syntax:
mdMUMatchcodeInitializeDataFiles();
- Returns:
ProgramStatus value
- Return Type:
Enumerated type ProgramStatus
This function opens the needed data files and prepares the MatchUp Object for use.
Before calling this function, you must have successfully called SetPathToMatchUpFiles function.
Check the return value of the GetInitializeErrorString function to retrieve the result of the initialization call. Any result other than “No Error” means the initialization failed for some reason.
This returns a value of the enumerated type ProgramStatus.
Value | Enumeration | Description |
---|---|---|
0 | ErrorNone | No error - Initialization was successful. |
5 | ErrorMatchcodeNotFound | Specified Matchcode does not exist. |
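A caller typically compares the returned status against ErrorNone before continuing. This minimal Python sketch models only the two values documented above. The `ProgramStatus` class here is a local stand-in rather than the library's own type, other status values may exist, and `initialization_succeeded` is a hypothetical helper.

```python
from enum import IntEnum


class ProgramStatus(IntEnum):
    """Stand-in holding only the two values listed in the table above."""
    ErrorNone = 0
    ErrorMatchcodeNotFound = 5


def initialization_succeeded(status: int) -> bool:
    """Treat any status other than ErrorNone as a failed initialization."""
    return status == ProgramStatus.ErrorNone
```

On failure, the text returned by GetInitializeErrorString is the place to look for the reason.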
SetPathToMatchUpFiles#
- Syntax:
mdMUMatchcodeSetPathToMatchUpFiles(String FilePath);
- Return Type:
Void
This function accepts a string value indicating the file path to the folder containing the MatchUp Object files.
This function must be called before calling the InitializeDataFiles function.
To provide maximum compatibility with Windows, three files are installed in your Common App Data directory.
Windows Vista and newer
C:\ProgramData\MelissaDATA\MatchUp.
Windows XP
C:\Documents and Settings\All Users\Application Data\Melissa DATA\MatchUp
Users can change the location of this directory. Keep this in mind, because a relocated directory is a common source of problems when running the samples and demos.
Initialization - Get#
GetInitializeErrorString#
- Syntax:
mdMUMatchcodeGetInitializeErrorString();
- Returns:
Initialize Status
- Return Type:
String
This function returns a string describing the error that occurred when the InitializeDataFiles function fails.
The possible strings returned by this method are:
“No Error”
“Could not open mdName.dat”
“Matchcode not found”
Creation#
CreateNewMatchcode#
- Syntax:
mdMUMatchcodeCreateNewMatchcode(String MatchcodeName);
- Returns:
0 if error
- Return Type:
Integer
This function creates a new, blank matchcode, represented by the current instance of the Matchcode class.
This function accepts a single character string, the name for the newly created matchcode. This name must be unique. It cannot be used for another matchcode in the same matchcode file.
If the function successfully creates a new matchcode, it returns a non-zero integer value. A zero value means that there was an error, most likely because the matchcode name was already in use.
Retrieval#
These functions are used to retrieve a specific matchcode from the matchcode file.
GetMatchcodeName#
- Syntax:
mdMUMatchcodeGetMatchcodeName();
- Returns:
Name of the current matchcode
- Return Type:
String
This function returns a string value containing the name of the current matchcode, assuming one has been loaded via the FindMatchcode function or created with the CreateNewMatchcode function.
FindMatchcode#
- Syntax:
mdMUMatchcodeFindMatchcode(String MatchcodeName)
- Returns:
1 if valid
- Return Type:
Integer
This function populates the current instance of the Matchcode object with the settings of the matchcode specified in the character string passed to the function. It accepts a single character string as its input parameter. This must be the name of an existing matchcode in the current matchcode file.
If the matchcode name is valid (represents an existing matchcode), this function returns an integer value of 1.
Properties - Set#
These functions help with defining various Matchcode properties.
SetDescription#
- Syntax:
mdMUMatchcodeSetDescription(String Description)
- Return Type:
Void
This function allows you to assign a description to the matchcode.
For example, it may describe what the matchcode evaluates or the type of process the matchcode is used in. When viewing the matchcode in the matchcode editor, the description will be present along with the actual properties of the matchcode.
SetNGram#
- Syntax:
mdMUMatchcodeSetNGram(int NGramValue);
- Return Type:
Void
Sets a matchcode’s N-gram setting.
Since a matchcode may contain multiple components each using a different fuzzy algorithm, many of which require an N-gram setting, the N-gram setting is applied to all relevant components. In other words, the N-gram is set at the matchcode level, not the component level.
Properties - Get#
GetDescription#
- Syntax:
mdMUMatchcodeGetDescription();
- Returns:
Matchcode description
- Return Type:
String
This function retrieves the user-specified description associated with this matchcode.
GetNGram#
- Syntax:
mdMUMatchcodeGetNGram();
- Returns:
N-gram setting value
- Return Type:
Integer
This function retrieves a matchcode’s N-gram setting.
Since a matchcode may contain multiple components each using a different fuzzy algorithm, many of which require an N-gram setting, the N-gram setting is applied to all relevant components. In other words, the N-gram is set at the matchcode level, not the component level.
Component Information#
The functions in this section retrieve the number of components in a given matchcode and retrieve the contents of a specific matchcode component.
GetMatchcodeItem#
- Syntax:
mdMUMatchcodeGetMatchcodeItem(int Value)
- Returns:
MatchcodeComponent object
- Return Type:
MatchcodeComponent
This function returns the MatchcodeComponent object located at the position indicated by the integer value passed to this function.
GetMatchcodeItemCount#
- Syntax:
mdMUMatchcodeGetMatchcodeItemCount();
- Returns:
Number of MatchcodeComponent objects
- Return Type:
Integer
This function returns an integer indicating the number of MatchcodeComponent objects in the current Matchcode object.
Mapping#
Mapping information is different from component information, revealing the order and mapping types that should be used when creating the mappings in any one of the deduper interfaces.
GetMappingItemCount#
- Syntax:
mdMUMatchcodeGetMappingItemCount();
- Returns:
Number of mappings
- Return Type:
Integer
This function returns an integer value showing the number of mappings required for the current matchcode.
Mapping items differ from matchcode components, mostly in how street address lines are represented. Components include the individual address components that are used to construct the match key, such as street number and street name.
The same components are represented by the mapping items for address lines (address1 through address8). No matter what order the components appear in the matchcode, the address lines mapping items appear at the end of the list of mapping items.
GetMappingItemLabel#
- Syntax:
mdMUMatchcodeGetMappingItemLabel (int Value)
- Returns:
Label of mapping item
- Return Type:
String
This function returns a character string containing the label, if any, of the mapping item specified by an integer value.
If the specified mapping item does not have a label, this function returns the name of the mapping item type.
GetMappingItemType#
- Syntax:
mdMUMatchcodeGetMappingItemType (int Value)
- Returns:
MatchcodeMappingTarget value
- Return Type:
MatchcodeMappingTarget Enumeration
This function returns the specific type of a mapping item specified by an integer value.
The return value is a variable of type MatchcodeMappingTarget that indicates the type of mapping item found at the position indicated by the integer value passed to this function.
For a list of these enumerations, see MatchcodeMappingTarget Enumerations.
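The three mapping functions are usually called together in a loop. The following Python sketch shows that pattern against a hypothetical stand-in class, since it cannot call the MatchUp library itself. It assumes positions are 1-based (as they are for components in the pseudocode earlier in this guide), and the type names "FullName" and "Address1" are used purely for illustration.

```python
class FakeMatchcode:
    """Hypothetical stand-in; the real object comes from the MatchUp library."""

    def __init__(self, items):
        self._items = list(items)  # (type_name, label) pairs

    def GetMappingItemCount(self):
        return len(self._items)

    def GetMappingItemType(self, position):
        return self._items[position - 1][0]  # assumed 1-based position

    def GetMappingItemLabel(self, position):
        type_name, label = self._items[position - 1]
        # An unlabeled item reports the name of its mapping item type.
        return label or type_name


def list_mappings(mc):
    """Collect (type, label) for every mapping item, in order."""
    return [
        (mc.GetMappingItemType(i), mc.GetMappingItemLabel(i))
        for i in range(1, mc.GetMappingItemCount() + 1)
    ]
```

An application could use a list like this to create its mappings in the order the matchcode expects.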
Change Settings#
The methods in this section set the values for the various settings of a matchcode component object. They can be used to construct new matchcode components when adding them to a matchcode, or to change the settings of an existing component. Every function in this section requires a variable based on the mdMUMatchcodeComponent class.
SetCombination#
- Syntax:
mdMUMatchcodeComponentSetCombination(MatchcodeCombination Combination);
- Return Type:
Void
SetComponentType#
- Syntax:
mdMUMatchcodeComponentSetComponentType(MatchcodeComponentType ComponentType);
- Return Type:
Void
SetFieldMatch#
- Syntax:
mdMUMatchcodeComponentSetFieldMatch(MatchcodeFieldMatch FieldMatch);
- Return Type:
Void
SetFuzzy#
- Syntax:
mdMUMatchcodeComponentSetFuzzy(MatchcodeFuzzy Fuzzy);
- Return Type:
Void
The function selects the matching algorithm used when comparing this MatchcodeComponent. It accepts an enumerated value of the type MatchcodeFuzzy. For a list of these enumerations, see Matchcode Fuzzy Enumerations. See Matchcode Component Properties for more information on the various matching strategies.
SetLabel#
- Syntax:
mdMUMatchcodeComponentSetLabel(String Label);
- Return Type:
Void
SetNear#
- Syntax:
mdMUMatchcodeComponentSetNear(int Nearness);
- Return Type:
Void
SetNearDbl#
- Syntax:
mdMUMatchcodeComponentSetNearDbl(double Percentage);
- Return Type:
Void
This function sets the minimum percentage of similarity which will return a match between two strings when the SetFuzzy function is set to any of the following:
Proximity
N-Gram
Jaro
Jaro Winkler
LCS
Needleman
MDKeyboard
Smith Waterman
Dice’s Coefficient
Jaccard
Overlap Coefficient
DoubleMetaphone algorithm.
The double value from 100 to 0 sets the minimum threshold percent similarity between two keys which will be considered a match when one of the NearDbl matching strategies is selected with the SetFuzzy function.
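The threshold works as a cutoff on a similarity score. The following minimal Python sketch illustrates only the cutoff logic: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, which is not one of the MatchUp algorithms listed above, and the helper names are hypothetical.

```python
from difflib import SequenceMatcher


def percent_similarity(a: str, b: str) -> float:
    """Stand-in similarity score on a 0 to 100 scale (not a MatchUp algorithm)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def is_match(a: str, b: str, min_percent: float) -> bool:
    """Two keys match when their similarity reaches the minimum percentage."""
    return percent_similarity(a, b) >= min_percent
```

Raising the threshold makes matching stricter: a pair that matches at a low minimum percentage can stop matching at a higher one.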
SetSize#
- Syntax:
mdMUMatchcodeComponentSetSize(int Size);
- Return Type:
Void
SetStart#
- Syntax:
mdMUMatchcodeComponentSetStart(MatchcodeStart Start);
- Return Type:
Void
SetStartPos#
- Syntax:
mdMUMatchcodeComponentSetStartPos(int Position);
- Return Type:
Void
SetSwap#
- Syntax:
mdMUMatchcodeComponentSetSwap(MatchcodeSwap Swap);
- Return Type:
Void
Example:
mcComp2->SetSwap(mdMUMatchcodeComponent::MatchcodeSwap::SwapA);
mcComp3->SetSwap(mdMUMatchcodeComponent::MatchcodeSwap::SwapA);
mcComp4->SetSwap((mdMUMatchcodeComponent::MatchcodeSwap) (mdMUMatchcodeComponent::SwapB | mdMUMatchcodeComponent::BothB));
mcComp5->SetSwap((mdMUMatchcodeComponent::MatchcodeSwap) (mdMUMatchcodeComponent::SwapB | mdMUMatchcodeComponent::BothB));
joe@work,JoeB@home to JoeB@home,MaryB@home.
John,Smith to Smith,John - but not John,Smith to Smith,Mary.
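One reading of the record pairs above is that a swap pair compares fields crosswise, and that the Both flag requires both crosswise comparisons to succeed. The Python sketch below models only that reading. It is an interpretation and not the library's documented algorithm, `swap_match` is a hypothetical helper, and ordinary same-field comparison is left out.

```python
def swap_match(rec_a, rec_b, both: bool) -> bool:
    """Compare two 2-field records crosswise (field 1 against field 2)."""
    first_cross = rec_a[0] == rec_b[1]
    second_cross = rec_a[1] == rec_b[0]
    if both:
        # Both crosswise comparisons must succeed.
        return first_cross and second_cross
    # Otherwise a single crosswise hit is enough.
    return first_cross or second_cross
```

Under this reading, the email pair matches on a single crosswise hit, while the name pair needs both fields to line up after swapping.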
SetTrim#
- Syntax:
mdMUMatchcodeComponentSetTrim(MatchcodeTrim Trim);
- Return Type:
Void
SetWordCount#
- Syntax:
mdMUMatchcodeComponentSetWordCount(int Count);
- Return Type:
Void
Read Settings#
GetCombination#
- Syntax:
mdMUMatchcodeComponentGetCombination();
- Returns:
MatchcodeCombination value
- Return Type:
MatchcodeCombination
This function shows which combinations in the current matchcode will use this component. It returns an enumerated value of the type MatchcodeCombination.
For a list of these enumerations, see MatchcodeCombination Enumerations.
These selections are not mutually exclusive. To determine which settings are in use, apply a bitwise AND operation to the return value and each of the enumeration values in turn.
Some languages, such as C++, do not easily handle bitwise operations on enumerations. In these cases, it may be necessary to cast the return value to an integer before using the AND operation to check the values.
GetComponentType#
- Syntax:
mdMUMatchcodeComponentGetComponentType();
- Returns:
Type of MatchcodeComponent object
- Return Type:
MatchcodeComponentType
This function returns the component type of the current MatchcodeComponent object.
The return value for this function is an enumerated value of the type MatchcodeComponentType.
For a list of these enumerations, see Matchcode Component Type Enumerations.
GetFieldMatch#
- Syntax:
mdMUMatchcodeComponentGetFieldMatch();
- Returns:
MatchcodeFieldMatch value
- Return Type:
MatchcodeFieldMatch
This function returns an enumerated value of the type MatchcodeFieldMatch, which determines how MatchUp Object handles blank or partial fields when applying a matchcode.
For a list of these enumerations, see MatchcodeFieldMatch Enumerations.
These selections are not mutually exclusive. To determine which settings are in use, apply a bitwise AND operation to the return value and each of the enumeration values in turn.
Some languages, such as C++, do not easily handle bitwise operations on enumerations. In these cases, it may be necessary to cast the return value to an integer before using the AND operation to check the values.
For more information, see Blank Field Matching.
GetFuzzy#
- Syntax:
mdMUMatchcodeComponentGetFuzzy();
- Returns:
MatchcodeFuzzy value
- Return Type:
MatchcodeFuzzy
This function returns an enumerated value of the type MatchcodeFuzzy used when comparing this MatchcodeComponent.
For a list of these enumerations, see MatchcodeFuzzy Enumerations.
For more information, see Matchcode Component Properties.
GetLabel#
- Syntax:
mdMUMatchcodeComponentGetLabel();
- Returns:
Label of MatchcodeComponent object
- Return Type:
String
This function returns the label, if any, of the current MatchcodeComponent object.
Not all components accept a label. For example, none of the street address components (Street number, street name, and so on) use a label because they are not used for mapping.
Components that are not assigned a label will return the name of their component type.
GetNear#
- Syntax:
mdMUMatchcodeComponentGetNear();
- Returns:
Degree of precision, value from 1 to 4
- Return Type:
Integer
This function returns the degree of precision used when the SetFuzzy function is set to Fast Near, Accurate Near, or Frequency Near.
The integer value from 1 to 4 shows how many differences are allowed before two keys are no longer considered a match when one of the Near matching strategies is selected with the SetFuzzy function.
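The nearness value acts as a tolerance on the number of differences between two keys. The Python sketch below illustrates that idea with plain edit distance as a stand-in measure. The actual Near algorithms are not described in this section, so both the measure and the assumption that up to N differences still count as a match are illustrative, and the helper names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Count single-character insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def near_match(a: str, b: str, nearness: int) -> bool:
    """Assumed rule: keys match when they differ by at most nearness edits."""
    if not 1 <= nearness <= 4:
        raise ValueError("nearness must be from 1 to 4")
    return edit_distance(a, b) <= nearness
```

A lower nearness value is stricter, so a pair of keys that differ in three places matches at 3 but not at 2.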
GetNearDbl#
- Syntax:
mdMUMatchcodeComponentGetNearDbl();
- Returns:
Minimum percentage of similarity, value from 100 to 0
- Return Type:
Double
This function returns the minimum percentage of similarity which will return a match between two strings when the SetFuzzy function is set to Proximity, N-Gram, Jaro, Jaro Winkler, LCS, Needleman, MDKeyboard, Smith Waterman, Dice’s Coefficient, Jaccard, Overlap Coefficient, or DoubleMetaphone algorithm.
The double value from 100 to 0 shows the minimum threshold percent similarity between two keys which will be considered a match when one of the NearDbl matching strategies is selected with the SetFuzzy function.
GetSize#
- Syntax:
mdMUMatchcodeComponentGetSize();
- Returns:
The number of characters used from the related field
- Return Type:
Integer
This function returns how many characters from the source data will be used by the current MatchcodeComponent.
This integer value shows the number of characters that this component will use from the related field from each record. If the field is longer than this value, the data will be truncated. If the field is shorter, it will be padded with spaces.
Size is only applied to a piece of data after all other component properties have been considered.
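The truncate-or-pad rule can be illustrated in a couple of lines. This minimal Python sketch assumes the component starts at the left of the field and that every other property has already been applied. `apply_size` is a hypothetical helper, not a library function.

```python
def apply_size(value: str, size: int) -> str:
    """Truncate value to size characters, or pad it with spaces up to size."""
    return value[:size].ljust(size)
```

With a size of 5, "JOHNSON" becomes "JOHNS" and "LEE" becomes "LEE" followed by two spaces, so every key segment has the same width.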
GetStart#
- Syntax:
mdMUMatchcodeComponentGetStart();
- Returns:
MatchcodeStart value
- Return Type:
MatchcodeStart
This function returns an enumerated value of the type MatchcodeStart that shows where MatchUp Object starts counting when applying the component size.
For a list of these enumerations, see MatchcodeStart Enumerations.
If the selected value is either StartAtPos or StartAtWord, the application will need to call the GetStartPos function to discover what starting word or character position is being used.
GetStartPos#
- Syntax:
mdMUMatchcodeComponentGetStartPos();
- Returns:
Character position or word used as the starting point
- Return Type:
Integer
This function returns the specific character position or word used as the starting point, when the SetStart function is set to Position or Word.
It will return an integer value when the SetStart function has been set to either StartAtPos or StartAtWord.
It returns either the character position or the word where MatchUp Object starts counting when adding a field to a match key.
GetSwap#
- Syntax:
mdMUMatchcodeComponentGetMatchcodeSwap();
- Returns:
MatchcodeSwap value
- Return Type:
MatchcodeSwap
This function shows which swap pairs in the current matchcode will use this component. It returns an enumerated value of the type MatchcodeSwap.
For a list of these enumerations, see MatchcodeSwap Enumerations.
These selections are not mutually exclusive. To determine which settings are in use, apply a bitwise AND operation to the return value and each of the enumeration values in turn.
Some languages, such as C++, do not easily handle bitwise operations on enumerations. In these cases, it may be necessary to cast the return value to an integer before using the AND operation to check the values.
GetTrim#
- Syntax:
mdMUMatchcodeComponentGetTrim();
- Returns:
MatchcodeTrim value
- Return Type:
MatchcodeTrim
This function returns an enumerated value of the type MatchcodeTrim, showing whether the current matchcode will trim beginning or ending spaces from the data before performing other operations upon it.
For a list of these enumerations, see MatchcodeTrim Enumerations.
For most applications, this function will return the value AllTrim, which trims excess blank spaces from both the start and end of a field before adding to a match key.
GetWordCount#
- Syntax:
mdMUMatchcodeComponentGetWordCount();
- Returns:
Maximum number of words
- Return Type:
Integer
This function returns the maximum number of words used by the current MatchcodeComponent object.
The maximum number of words offers further control over the amount of data used by each component. If this value is 1, then MatchUp Object will take every character up to, but not including, the first space.
If the first word is shorter than the value passed to the SetSize function, then the data will still be truncated at that character, regardless of the setting returned by this function.
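The word limit can be pictured as a filter applied before the size limit. This minimal Python sketch assumes words are separated by spaces. `limit_words` is a hypothetical helper, not a library function.

```python
def limit_words(value: str, word_count: int) -> str:
    """Keep at most word_count space-separated words from value."""
    return " ".join(value.split()[:word_count])
```

With a word count of 1, "VAN DER BERG" contributes only "VAN", even when the component size would allow more characters.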
Modification#
These functions add, insert, update, or delete matchcode components from the current Matchcode object.
AddMatchcodeItem#
- Syntax:
mdMUMatchcodeAddMatchcodeItem(MatchcodeComponent Component);
- Returns:
MatchcodeStatus value
- Return Type:
MatchcodeStatus
This function accepts a MatchcodeComponent object as its only argument and adds this component as the last component for this matchcode.
To add a component at any position other than the last component, use the InsertMatchcodeItem function instead. To modify an existing component, use the ChangeMatchcodeItem function.
This function returns an enumerated value of the type MatchcodeStatus that indicates if the component was successfully added to the matchcode and, if not, the reason for the error.
For a list of these enumerations, see MatchcodeStatus Enumerations.
ChangeMatchcodeItem#
- Syntax:
mdMUMatchcodeChangeMatchcodeItem(MatchcodeComponent Component, int Position)
- Returns:
MatchcodeStatus value
- Return Type:
MatchcodeStatus
This function replaces an existing MatchcodeComponent object at a specific position in the component order of the current Matchcode object with a modified or new component.
This function accepts two arguments: the MatchcodeComponent object to be added; and an integer value indicating the position where the component is to be replaced. The integer value can be from one to the number of components currently stored in the current Matchcode object.
This function returns an enumerated value of the type MatchcodeStatus that indicates if the component was successfully replaced in the matchcode and, if not, the reason for the error.
For a list of these enumerations, see MatchcodeStatus Enumerations.
DeleteMatchcodeItem#
- Syntax:
mdMUMatchcodeDeleteMatchcodeItem(int Position)
- Returns:
MatchcodeStatus value
- Return Type:
MatchcodeStatus
This function removes a specific MatchcodeComponent object from a Matchcode object.
This function accepts a single argument: the integer value indicating the position where the component is to be deleted. The integer value can be from one to the number of components currently stored in the current Matchcode object.
This function returns an enumerated value of the type MatchcodeStatus that indicates if the component was successfully deleted from the matchcode and, if not, the reason for the error.
For a list of these enumerations, see MatchcodeStatus Enumerations.
InsertMatchcodeItem#
- Syntax:
mdMUMatchcodeInsertMatchcodeItem(MatchcodeComponent Component, int Position)
- Returns:
MatchcodeStatus value
- Return Type:
MatchcodeStatus
This function adds a new MatchcodeComponent object to the current matchcode in any position other than the very last.
This function accepts two arguments: the MatchcodeComponent object to be added; and an integer value indicating the position where the component is to be inserted. The integer value can be from one to the number of components currently stored in the current Matchcode object.
This function returns an enumerated value of the type MatchcodeStatus that indicates if the component was successfully added to the matchcode and, if not, then the reason for the error.
For a list of these enumerations, see MatchcodeStatus Enumerations.
Saving#
These functions save changes to the current matchcode file, either back to the original default file or to a new file.
DeleteMatchcode#
- Syntax:
mdMUMatchcodeDeleteMatchcode();
- Returns:
0 if there is an error
- Return Type:
Integer
This function deletes a matchcode.
Calling this function will permanently remove the matchcode from the MatchUp mdMatchUp.mc matchcode database.
RenameMatchcode#
- Syntax:
mdMUMatchcodeRenameMatchcode(String MatchcodeName)
- Returns:
0 if there is an error
- Return Type:
Integer
This function changes a matchcode’s name.
This function can be used when you want to edit an existing matchcode and have that new functionality reflected in the matchcode name.
Save#
- Syntax:
mdMUMatchcodeSave();
- Return Type:
Void
This function saves the current matchcode to the default matchcode file used by MatchUp Object. If an existing matchcode was edited, then that matchcode will be overwritten with the current Matchcode object. If the current Matchcode object was newly created using the CreateNewMatchcode function, then this matchcode will be added to the current file.
SaveToFile#
- Syntax:
mdMUMatchcodeSaveToFile(String FilePath)
- Return Type:
Void
This function saves the current matchcode to a new copy of the current matchcode file in a location specified by a character string that contains a valid path to an existing directory and valid filename.
Matchcode Editor#
The Matchcode Editor is a Windows-based application that creates and edits the matchcode file used by MatchUp Object. This program allows you to customize copies of the original matchcodes that ship with MatchUp Object or create new matchcodes from scratch.
If you have ever used Melissa Data’s MatchUp software for Windows, you will already be familiar with the functionality of the Matchcode Editor.
Starting the Matchcode Editor#
The default installation location for the MatchUpEditor executable is:
C:\Program Files\Melissa DATA\DQT\MatchUp\
To run, specify the location of the mdMatchUp.mc matchcode file as a command-line parameter (i.e., in the program’s shortcut), as in:
MatchUpEditor.exe "C:\programdata\Melissa Data\MatchUp"
Or you can alter the shortcut’s “Start in” location so that it starts in the same location as the mdMatchUp.mc file. You can also run it directly from the Start menu, if you chose that as an installation option.
Interface#
The Matchcode Editor screen is divided into three distinct sections: a list of available matchcodes in the matchcode database; the properties of the selected Matchcode; and a description of the Matching Rules for the selected matchcode.
Matchcode Name#
The top portion of the screen contains a drop-down menu of all the matchcodes found in the current matchcode file.
Below this is a Description: section that contains the description for the currently selected matchcode.
To the right are the Create Matchcode, Remove Matchcode, Copy Matchcode, and Rename Matchcode buttons with which you can create and modify matchcodes. Copying a current matchcode is often the best starting point for creating new matchcodes.
Create Matchcode
Click the Create Matchcode button.
Type a name for the new matchcode in the Matchcode Name dialog box and click OK.
The Matchcode Editor presents a blank matchcode screen with no components.
Begin adding components. Once a Data Type is selected, click anywhere in the window or press the Enter key. This enters the data type and adds another editable row below it.
Remove Matchcode
Select the matchcode to be deleted in the Matchcode Name: drop-down menu.
Click the Remove Matchcode button.
Click Yes in the Remove Matchcode dialog box to confirm the deletion.
Copy Matchcode
Select the matchcode to be copied in the Matchcode Name: drop-down menu.
Click the Copy Matchcode button.
Type a name for the new matchcode in the Matchcode Name dialog box and click OK.
Rename Matchcode
Select the matchcode to be renamed in the Matchcode Name: drop-down menu.
Click the Rename Matchcode button.
Type a new name for the matchcode in the Matchcode Name dialog box and click OK.
Matchcode List#
Below the matchcode name is the Matchcode List section, a list of components used by the currently selected matchcode.
This list shows the basic settings for each component.
Field | Description |
---|---|
Data Type | The type of data used by this component. See Matchcode Components for a list of all available types. |
Label | (Optional) A description of the data found in this component. Not all component types use this field. The maximum length of the description is 20 characters. |
Size | The maximum number of characters from this component to be used by this matchcode. If the data has fewer characters, it will be padded with spaces. |
Start | Sets where the current matchcode starts counting when selecting characters to use: the left (beginning); the right (end); a specific character position; or a specific word. |
Fuzzy | The type of matching to be used on the selected data type. |
Distance | Context sensitive. Sets a range for specific data types or fuzzy matching. |
Short/Empty | These settings control matching between incomplete or empty fields. |
Swap | Swap matching is the ability to compare one component to another component. |
For more information on these settings, see Component Properties.
Following these fields, on the right side of the list, is a grid of editable check boxes that shows the combinations in which each component is used.
Add Component
Click the down arrow to open the drop-down menu named Select Data Type. (There will always be a Select Data Type below the last defined matchcode component.)
Select the desired data type from the drop-down menu.
The new component is added as the last component in the matchcode.
Select the settings for the new component by clicking the field you want to change. See the sections below for more information on the controls within this dialog.
Remove Component
Click the down arrow to open the drop-down menu of the component to be deleted.
Select Remove Component from the top of the list in the drop-down menu.
Once selected, click anywhere in the window, or press the Enter key. This will confirm the removal, and remove the component from the matchcode list.
Change Component Order
Click and drag the name of the component.
Drag the component to the new position.
For more information on how combinations of components are used, see Component Combinations.
Matching Strategies#
This setting controls what criteria the matchcode will use to determine how to compare this component of one match key to another match key.
Fuzzy Matching Strategies
Phonetex
Soundex
Containment
Frequency
Fast Near
Accurate Near
Frequency Near
Vowels Only
Consonants Only
Alphas Only
Numerics Only
Jaro
Jaro-Winkler
n-Gram
Needleman-Wunsch
Smith-Waterman-Gotoh
Dice’s Coefficient
Jaccard Similarity Coefficient
Overlap Coefficient
Longest Common Substring
Double MetaPhone
Short-Empty Settings#
This setting controls whether blank or incomplete fields are considered matches to populated fields or other blank fields. These settings are not exclusive, so two or all three may be selected at one time.
Match if both fields are blank
If two records have the same empty component, that component will be counted as matching.
Match if one field is blank
Allows matching missing data with the full data. For example, “Smith” matches “John Smith.” However, two records with the same component missing will not match.
Match initial to full field
Allows matching abbreviated data with the full data. For example, “J Smith” matches “John Smith.”
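The three settings above can be sketched as a small decision function. This is an illustrative Python sketch, not the library's implementation: the flag values are taken from the MatchcodeFieldMatch enumeration, the function name `component_matches` is hypothetical, and exact case-insensitive equality stands in for whatever matching strategy is configured.

```python
from enum import IntFlag


class FieldMatch(IntFlag):
    # Values copied from the MatchcodeFieldMatch enumeration table.
    NO_FIELD_MATCH = 0x0000
    BOTH_BLANK_MATCH = 0x0100
    ONE_BLANK_MATCH = 0x0200
    INITIAL_MATCH = 0x0400


def component_matches(a: str, b: str, flags: FieldMatch) -> bool:
    """Compare two component values under the Short/Empty settings.

    Exact, case-insensitive equality is an assumption standing in
    for the real matching strategy.
    """
    a, b = a.strip().lower(), b.strip().lower()
    if not a and not b:
        return bool(flags & FieldMatch.BOTH_BLANK_MATCH)
    if not a or not b:
        return bool(flags & FieldMatch.ONE_BLANK_MATCH)
    if a == b:
        return True
    if flags & FieldMatch.INITIAL_MATCH:
        short, full = sorted((a, b), key=len)
        # A one-character value matches a longer value with the same initial.
        return len(short) == 1 and full.startswith(short)
    return False
```

Because the settings are not exclusive, they can be combined, for example `FieldMatch.BOTH_BLANK_MATCH | FieldMatch.INITIAL_MATCH`.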
Swap Match Pairs#
The Swap Match section selects which components belong to which swap pairs.
Swap Matching allows matching “John Smith” with “Smith John.”
The components must be the same size and should use the same set of matching options (for example, one can’t use Phonetex while the other uses Soundex). Up to eight pairs, A through H, can be defined.
For more information on using swap pairs, see Swap Matching Uses.
We recommend using the Matchcode Interface to configure swap pairs.
If coding matchcodes, see the Matchcode Editor interface section for swap pair syntax.
Swap Pair Configuration
Click the Swapping… button.
The Matchcode Swap Pairs dialog will open.
First, select the pair tab you want to edit. Pair A is selected by default.
Select the two components that will be used for this swap pair by selecting them in their respective drop-down menus.
Then select the swapping rule:
Both components must match
The contents of both components must match according to the fuzzy matching strategy in use for both components. “John Smith” matches “Smith John” but not “Smith <blank>.”
Either component can match
At least one of the components must match. “John Smith” matches both “Smith John” and “Smith <blank>.”
Click OK.
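The two swapping rules can be illustrated with a short Python sketch. It is a simplified model rather than the library's algorithm: the function name `swap_pair_matches` is hypothetical, exact case-insensitive equality stands in for the configured fuzzy strategy, and a blank component is assumed never to match.

```python
def swap_pair_matches(a: tuple[str, str], b: tuple[str, str], both_required: bool) -> bool:
    """Compare the two components of a swap pair in swapped order.

    a and b each hold one record's values for the pair, for example
    (first name, last name). Equality is an assumed stand-in for the
    fuzzy matching strategy.
    """
    def same(x: str, y: str) -> bool:
        # Blank components never count as matching in this sketch.
        return bool(x) and x.lower() == y.lower()

    first_to_second = same(a[0], b[1])
    second_to_first = same(a[1], b[0])
    if both_required:
        # "Both components must match"
        return first_to_second and second_to_first
    # "Either component can match"
    return first_to_second or second_to_first
```

With `("John", "Smith")` as the first record, `("Smith", "John")` matches under either rule, while `("Smith", "")` matches only when `both_required` is False.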
Combinations#
Use these check boxes to select which of the 16 possible combinations will use this component.
It is easier to visualize the effects of these boxes if you look at the list of matchcode components as well:
Matching Rules#
This section details the matching rules, depending on your selections under the Matchcode List.
Enumerations#
MatchcodeCombination#
These Matchcode Combination enumerations are used by the Matchcode SetCombination function.
Name | Value |
---|---|
Combo1 | 0x0001 |
Combo2 | 0x0002 |
Combo3 | 0x0004 |
Combo4 | 0x0008 |
Combo5 | 0x0010 |
Combo6 | 0x0020 |
Combo7 | 0x0040 |
Combo8 | 0x0080 |
Combo9 | 0x0100 |
Combo10 | 0x0200 |
Combo11 | 0x0400 |
Combo12 | 0x0800 |
Combo13 | 0x1000 |
Combo14 | 0x2000 |
Combo15 | 0x4000 |
Combo16 | 0x8000 |
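Each combination value is a distinct power of two, which suggests the values are meant to be combined as bit flags. The Python sketch below models that arithmetic only. Whether SetCombination accepts OR-ed values is an assumption here, and the names `Combination` and `combinations_used` are hypothetical.

```python
from enum import IntFlag

# Combo1 .. Combo16 with the values from the table: 0x0001, 0x0002, ... 0x8000.
Combination = IntFlag("Combination", {f"Combo{i + 1}": 1 << i for i in range(16)})


def combinations_used(mask: int) -> list[int]:
    """Return the 1-based combination numbers whose bits are set in mask."""
    return [i + 1 for i in range(16) if mask & (1 << i)]


# A component that takes part in combinations 1 and 3 (assumed to be OR-able).
mask = Combination.Combo1 | Combination.Combo3
```

Here `int(mask)` is `0x0005`, and `combinations_used(mask)` returns `[1, 3]`, which corresponds to the checked boxes for that component in the Matchcode Editor grid.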
MatchcodeComponentType#
These Matchcode Component Type enumerations are used by the GetComponentType and SetComponentType methods.
Name | Value |
---|---|
Prefix | 1 |
First | 2 |
Middle | 3 |
Last | 4 |
Suffix | 5 |
Gender | 6 |
FirstNickname | 7 |
MiddleNickname | 8 |
Title | 9 |
Company | 10 |
CompanyAcronym | 11 |
StreetNumber | 12 |
StreetPreDir | 13 |
StreetName | 14 |
StreetSuffix | 15 |
StreetPostDir | 16 |
POBox | 17 |
Secondary | 18 |
Address | 19 |
City | 20 |
State | 21 |
Zip9 | 22 |
Zip5 | 23 |
Zip4 | 24 |
Country | 28 |
CanadianPC | 29 |
UKCity | 30 |
UKCounty | 31 |
UKPC | 32 |
Phone | 33 |
EMail | 34 |
CreditCard | 35 |
General | 36 |
GeoDistance | 38 |
Date | 39 |
Numeric | 40 |
MatchcodeFieldMatch#
These Matchcode Field Match enumerations are used by the Matchcode SetFieldMatch function.
Name | Value |
---|---|
NoFieldMatch | 0x0000 |
BothBlankMatch | 0x0100 |
OneBlankMatch | 0x0200 |
InitialMatch | 0x0400 |
MatchcodeFuzzy#
These Matchcode Fuzzy enumerations are used by the Matchcode SetFuzzy function.
Name | Value |
---|---|
NoFieldMatch | 0x0000 |
BothBlankMatch | 0x0100 |
OneBlankMatch | 0x0200 |
InitialMatch | 0x0400 |
MatchcodeMapping#
These Matchcode Mapping enumerations are used by the Read-Write interface AddMapping, Incremental interface AddMapping, and Hybrid interface AddMapping methods.
Name | Value |
---|---|
Prefix | 1 |
Gender | 2 |
First | 3 |
MixedFirst | 4 |
Middle | 5 |
Last | 6 |
MixedLast | 7 |
Suffix | 8 |
FullName | 9 |
InverseName | 10 |
GovernmentInverseName | 11 |
Title | 12 |
Company | 13 |
Address | 14 |
City | 15 |
State | 16 |
Zip9 | 17 |
Zip5 | 18 |
Zip4 | 19 |
CityStZip | 20 |
Country | 21 |
CanadianPostalCode | 22 |
UKCity | 23 |
UKCounty | 24 |
UKPostcode | 25 |
UKCityCountyPC | 26 |
Phone | 27 |
EMail | 28 |
CreditCard | 29 |
General | 30 |
Latitude | 40 |
Longitude | 41 |
Date | 42 |
Numeric | 43 |
Address1 | 250 |
Address2 | 251 |
Address3 | 252 |
MatchcodeMappingTarget#
These Matchcode Mapping Target enumerations are used by the GetMappingItemType function.
Name | Value |
---|---|
PrefixType | 1 |
FirstType | 2 |
MiddleType | 3 |
LastType | 4 |
SuffixType | 5 |
GenderType | 6 |
FirstNicknameType | 7 |
MiddleNicknameType | 8 |
TitleType | 9 |
CompanyType | 10 |
CompanyAcronymType | 11 |
AddressType | 12 |
CityType | 13 |
StateType | 14 |
Zip9Type | 15 |
Zip5Type | 16 |
Zip4Type | 17 |
CountryType | 18 |
CanadianPCType | 19 |
UKCityType | 20 |
UKCountyType | 21 |
UKPCType | 22 |
PhoneType | 23 |
EMailType | 24 |
CreditCardType | 25 |
GeneralType | 26 |
Address1Type | 28 |
Address2Type | 29 |
Address3Type | 30 |
LatitudeType | 34 |
LongitudeType | 35 |
DateType | 36 |
NumericType | 37 |
MatchcodeStart#
These Matchcode Start enumerations are used by the Matchcode SetStart function.
Name | Value | Description |
---|---|---|
Left | 0x08 | The default. MatchUp Object Global starts counting from the beginning of the field. |
Right | 0x10 | MatchUp Object Global starts counting backwards from the end of the field. |
StartAtPos | 0x20 | MatchUp Object Global starts counting from the character position indicated by the SetStartPos function. |
StartAtWord | 0x40 | MatchUp Object Global starts counting from the word indicated by the SetStartPos function. |
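To make the four start modes concrete, here is a rough Python sketch of how a component's Size and Start settings could select characters from a field. It is an approximation built on assumptions: positions and word numbers are treated as 1-based, Right is read as "the last Size characters", and the names `Start` and `extract_component` are hypothetical.

```python
from enum import IntEnum


class Start(IntEnum):
    # Values copied from the MatchcodeStart table.
    LEFT = 0x08
    RIGHT = 0x10
    START_AT_POS = 0x20
    START_AT_WORD = 0x40


def extract_component(value: str, size: int, start: Start = Start.LEFT, start_pos: int = 1) -> str:
    """Pick up to `size` characters from `value`, padded with spaces.

    `start_pos` is treated as 1-based here, which is an assumption.
    """
    if start == Start.LEFT:
        picked = value[:size]
    elif start == Start.RIGHT:
        # Assumed to mean the last `size` characters of the field.
        picked = value[-size:] if size > 0 else ""
    elif start == Start.START_AT_POS:
        picked = value[start_pos - 1:start_pos - 1 + size]
    elif start == Start.START_AT_WORD:
        # Skip the first `start_pos - 1` words, then count characters.
        picked = " ".join(value.split()[start_pos - 1:])[:size]
    else:
        raise ValueError(f"unknown start mode: {start!r}")
    # Data shorter than the component size is padded with spaces.
    return picked.ljust(size)
```

For example, `extract_component("North Main Street", 4, Start.START_AT_WORD, 2)` yields `"Main"`, and a three-character value with a size of 5 comes back padded to five characters.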
MatchcodeStatus#
These Matchcode Status enumerations are used by the AddMatchcodeItem, InsertMatchcodeItem, ChangeMatchcodeItem, and DeleteMatchcodeItem methods.
Name | Value |
---|---|
MCNoError | 0 |
MCFirstComponentFuzzyOptions | 1 |
MCFirstComponentNoSwapPair | 2 |
MCDataTypeNoFuzzy | 3 |
MCComponentFuzzyIncorrectSize | 4 |
MCDataTypeNoMaximumNumberWords | 5 |
MCDataTypeNoStartRightOrWordOrPos | 6 |
MCIncorrectMaximumNumberWords | 7 |
MCNearOutOfRange | 8 |
MCFirstComponentNotUsedInEveryCondition | 9 |
MCCannotChangeFirstComponent | 10 |
MCInvalidSwapPair | 11 |
MCInvalidMatchcodeComponentType | 13 |
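Since the matchcode item functions report problems through these status values rather than through exceptions, calling code typically needs to check the returned value. A minimal Python sketch of such a check follows. The enum mirrors the table, while `check_status` is a hypothetical helper and the integer passed to it stands in for a value returned by the library.

```python
from enum import IntEnum


class MatchcodeStatus(IntEnum):
    # Values copied from the table above; 12 is not listed there.
    MCNoError = 0
    MCFirstComponentFuzzyOptions = 1
    MCFirstComponentNoSwapPair = 2
    MCDataTypeNoFuzzy = 3
    MCComponentFuzzyIncorrectSize = 4
    MCDataTypeNoMaximumNumberWords = 5
    MCDataTypeNoStartRightOrWordOrPos = 6
    MCIncorrectMaximumNumberWords = 7
    MCNearOutOfRange = 8
    MCFirstComponentNotUsedInEveryCondition = 9
    MCCannotChangeFirstComponent = 10
    MCInvalidSwapPair = 11
    MCInvalidMatchcodeComponentType = 13


def check_status(code: int) -> None:
    """Raise ValueError naming the status unless the code is MCNoError."""
    status = MatchcodeStatus(code)
    if status is not MatchcodeStatus.MCNoError:
        raise ValueError(f"matchcode edit failed: {status.name} ({status.value})")
```

Passing `0` returns silently. Any other listed value raises an error that names the status, which is easier to diagnose than a bare integer.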
MatchcodeSwap#
These Matchcode Swap enumerations are used by the Matchcode SetSwap function.
Name | Value |
---|---|
NoSwap | 0x00 |
SwapA | 0x01 |
SwapB | 0x02 |
SwapC | 0x04 |
SwapD | 0x08 |
SwapE | 0x10 |
SwapF | 0x20 |
SwapG | 0x40 |
SwapH | 0x80 |
BothA | 0x0100 |
BothB | 0x0200 |
BothC | 0x0400 |
BothD | 0x0800 |
BothE | 0x1000 |
BothF | 0x2000 |
BothG | 0x4000 |
BothH | 0x8000 |
MatchcodeTrim#
These Matchcode Trim enumerations are used by the Matchcode SetTrim function.
Name | Value |
---|---|
LeftTrim | 0x02 |
RightTrim | 0x04 |
AllTrim | 0x06 |
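Note that AllTrim (0x06) is the bitwise OR of LeftTrim (0x02) and RightTrim (0x04). The Python sketch below shows that relationship and one plausible reading of the flags. Treating "trim" as whitespace removal from the left or right of a value is an assumption, and the names `Trim` and `apply_trim` are hypothetical.

```python
from enum import IntFlag


class Trim(IntFlag):
    # Values copied from the MatchcodeTrim table.
    LEFT_TRIM = 0x02
    RIGHT_TRIM = 0x04
    ALL_TRIM = 0x06  # equals LEFT_TRIM | RIGHT_TRIM


def apply_trim(value: str, trim: Trim) -> str:
    """Strip whitespace from the sides selected by the trim flags.

    Whitespace stripping is an assumed interpretation of "trim".
    """
    if trim & Trim.LEFT_TRIM:
        value = value.lstrip()
    if trim & Trim.RIGHT_TRIM:
        value = value.rstrip()
    return value
```

Under this reading, `Trim.ALL_TRIM` behaves the same as applying both the left and right trims.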
Result Codes#
For details of all result codes please visit here
MS - Match Status#
| Code | Short Description | Long Description |
|---|---|---|
| | Unique Record | The record did not match any other records. |
| | Has Duplicates | The record matched other records and was tagged as the output record. |
| | Is Duplicate | The record matched other records and was tagged as a duplicate. |
| | Record Suppressed | ETL Only. The source record was suppressed. |
| | Record Not Intersected | ETL Only. The source record was not intersected. |
| | Match: Rule 1 | Records were matched by matchcode combination 1. |
| | Match: Rule 2 | Records were matched by matchcode combination 2. |
| | Match: Rule 3 | Records were matched by matchcode combination 3. |
| | Match: Rule 4 | Records were matched by matchcode combination 4. |
| | Match: Rule 5 | Records were matched by matchcode combination 5. |
| | Match: Rule 6 | Records were matched by matchcode combination 6. |
| | Match: Rule 7 | Records were matched by matchcode combination 7. |
| | Match: Rule 8 | Records were matched by matchcode combination 8. |
| | Match: Rule 9 | Records were matched by matchcode combination 9. |
| | Match: Rule 10 | Records were matched by matchcode combination 10. |
| | Match: Rule 11 | Records were matched by matchcode combination 11. |
| | Match: Rule 12 | Records were matched by matchcode combination 12. |
| | Match: Rule 13 | Records were matched by matchcode combination 13. |
| | Match: Rule 14 | Records were matched by matchcode combination 14. |
| | Match: Rule 15 | Records were matched by matchcode combination 15. |
| | Match: Rule 16 | Records were matched by matchcode combination 16. |
| | Suppressor Record | ETL Only. The lookup record suppressed a source record. |
| | Intersector Record | ETL Only. The lookup record intersected a source record. |