Quickstart
Introduction
The Profiler Object® is a data quality tool that statistically analyzes your data and assesses it for consistency, uniqueness, and correctness.
Sample Code
Profiler Object supports multiple programming languages on Windows and Linux. The table below links to the sample code for each language, hosted in GitHub repositories.
Language | System | Repository |
---|---|---|
C# .NET | Windows | |
C# .NET | Linux | |
C++ | Windows | |
C++ | Linux | |
Java | Windows | |
Java | Linux | |
Python3 | Windows | |
Python3 | Linux | |
How To Get Data
Use the Melissa Updater to download the data files using the manifest named profiler_data.
Getting Started
Basic Flow of Actions
To use the Profiler Object, give it a license, the path to its data files, and the input data. It then analyzes the data and stores statistics about it. It can also return result codes that describe the results for the input.
A typical implementation follows the steps below: initialize the object, set a license, initialize the data files, configure the object, add the columns, set the data to be processed, run the profiling, and then retrieve the statistics and result codes. Each example is shown in C#, C++, Java, and Python, in that order.
The Reference Guide describes every method available in the Profiler Object, along with all result codes and their descriptions.
1. Initialize Profiler Object
Start by creating an instance of the Melissa Profiler Object.
Example Implementation:
// Create instance of Melissa Profiler Object
public mdProfiler mdProfilerObj = new mdProfiler();
// Create instance of Melissa Profiler Object
mdProfiler* mdProfilerObj = new mdProfiler;
// Create instance of Melissa Profiler Object
mdProfiler mdProfilerObj = new mdProfiler();
# Create instance of Melissa Profiler Object
md_profiler_obj = mdProfiler_pythoncode.mdProfiler()
2. Set a License
To set a license, either configure the environment variable for the license or use the method SetLicenseString.
Example Implementation:
// Set license string
mdProfilerObj.SetLicenseString(MELISSA_LICENSE_STRING);
// Set license string
mdProfilerObj->SetLicenseString(MELISSA_LICENSE_STRING.c_str());
// Set license string
mdProfilerObj.SetLicenseString(MELISSA_LICENSE_STRING);
# Set license string
md_profiler_obj.SetLicenseString(MELISSA_LICENSE_STRING)
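If you support both license sources, a small helper can choose between them in one place. This is a sketch under assumptions: this page does not name the license environment variable, so `MD_LICENSE` below is an assumed name to confirm in the Reference Guide, and `resolve_license` is a hypothetical helper, not part of the Profiler Object API.

```python
import os

# Assumption: name of the license environment variable. This page does not
# name it; confirm the actual name in the Reference Guide.
LICENSE_ENV_VAR = "MD_LICENSE"


def resolve_license(explicit=None, env=None):
    """Return the license string to pass to SetLicenseString.

    An explicitly supplied license wins; otherwise the value of the assumed
    environment variable is used. Raises ValueError if neither is available.
    """
    env = os.environ if env is None else env
    license_string = explicit or env.get(LICENSE_ENV_VAR, "")
    if not license_string:
        raise ValueError(f"No license given and {LICENSE_ENV_VAR} is not set")
    return license_string
```

Usage: `md_profiler_obj.SetLicenseString(resolve_license())`. The optional `env` argument lets you test the helper without touching the real environment.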
To see when the license will expire, use the method GetLicenseExpirationDate.
Example Implementation:
Console.WriteLine($"Expiration Date: {mdProfilerObj.GetLicenseExpirationDate()}");
cout << "Expiration Date: " + string(mdProfilerObj->GetLicenseExpirationDate()) << endl;
System.out.println("Expiration Date: " + mdProfilerObj.GetLicenseExpirationDate());
print(f"Expiration Date: {md_profiler_obj.GetLicenseExpirationDate()}")
3. Initialize Data Files
Use the method SetPathToProfilerDataFiles to set the path to the data files, then call InitializeDataFiles to set them up.
InitializeDataFiles returns a ProgramStatus value. Compare it with ErrorNone to confirm that initialization succeeded.
Example Implementation:
// Set path to data files (.dat, etc)
mdProfilerObj.SetPathToProfilerDataFiles(PATH_TO_DATA_FILES);
mdProfiler.ProgramStatus pStatus = mdProfilerObj.InitializeDataFiles();
if (pStatus != mdProfiler.ProgramStatus.ErrorNone)
{
// Problem during initialization
Console.WriteLine("Failed to Initialize Object.");
Console.WriteLine(pStatus);
return;
}
// Set path to data files (.dat, etc)
mdProfilerObj->SetPathToProfilerDataFiles(PATH_TO_DATA_FILES.c_str());
mdProfiler::ProgramStatus pStatus = mdProfilerObj->InitializeDataFiles();
if (pStatus != mdProfiler::ProgramStatus::ErrorNone)
{
// Problem during initialization
cout << "Failed to Initialize Object." << endl;
cout << pStatus << endl;
return;
}
// Set path to data files (.dat, etc)
mdProfilerObj.SetPathToProfilerDataFiles(PATH_TO_DATA_FILES);
mdProfiler.ProgramStatus pStatus = mdProfilerObj.InitializeDataFiles();
if (pStatus != mdProfiler.ProgramStatus.ErrorNone) {
// Problem during initialization
System.out.println("Failed to Initialize Object.");
System.out.println(pStatus);
return;
}
# Set path to data files (.dat, etc)
md_profiler_obj.SetPathToProfilerDataFiles(PATH_TO_DATA_FILES)
p_status = md_profiler_obj.InitializeDataFiles()
if p_status != mdProfiler_pythoncode.ProgramStatus.ErrorNone:
    # Problem during initialization
    print("Failed to Initialize Object.")
    print(p_status)
    return
To check when the database was last updated, use the method GetDatabaseDate.
The method GetBuildNumber returns the development build number of the Profiler Object.
Example Implementation:
// If you see a different date than expected, check your license string and either download
// the new data files or use the Melissa Updater program to update your data files.
Console.WriteLine($"DataBase Date: {mdProfilerObj.GetDatabaseDate()}");
// This number should match with file properties of the Melissa Object binary file.
// If TEST appears with the build number, there may be a license key issue.
Console.WriteLine($"Object Version: {mdProfilerObj.GetBuildNumber()}\n");
// If you see a different date than expected, check your license string and either
// download the new data files or use the Melissa Updater program to update your data files.
cout << "DataBase Date: " + string(mdProfilerObj->GetDatabaseDate()) << endl;
// This number should match with file properties of the Melissa Object binary file.
// If TEST appears with the build number, there may be a license key issue.
cout << "Object Version: " + string(mdProfilerObj->GetBuildNumber()) << endl;
// If you see a different date than expected, check your license string and either
// download the new data files or use the Melissa Updater program to update your data files.
System.out.println("DataBase Date: " + mdProfilerObj.GetDatabaseDate());
// This number should match with file properties of the Melissa Object binary file.
// If TEST appears with the build number, there may be a license key issue.
System.out.println("Object Version: " + mdProfilerObj.GetBuildNumber());
# If you see a different date than expected, check your license string and either download
# the new data files or use the Melissa Updater program to update your data files.
print(f"DataBase Date: {md_profiler_obj.GetDatabaseDate()}")
# This number should match with file properties of the Melissa Object binary file.
# If TEST appears with the build number, there may be a license key issue.
print(f"Object Version: {md_profiler_obj.GetBuildNumber()}\n")
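The comments above say a build number containing TEST may point to a license key issue. A minimal sketch of turning that rule into an explicit check follows; `license_warning` is a hypothetical helper, and it only implements the substring test described on this page, not any other validation.

```python
def license_warning(build_number):
    """Return a warning if the build number contains TEST, otherwise None.

    Per this page, TEST appearing with the build number may indicate a
    license key issue. str() is applied in case the binding returns a
    non-str value.
    """
    text = str(build_number)
    if "TEST" in text:
        return f"Build number {text!r} contains TEST; check the license key."
    return None
```

Usage: `warning = license_warning(md_profiler_obj.GetBuildNumber())`, then print or log `warning` when it is not `None`.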
The method GetInitializeErrorString returns a description of any error that occurred during initialization, or "No error." if initialization succeeded. Use it to decide whether the program should continue running.
Example Implementation:
bool shouldContinueRunning = true;
if (mdProfilerObj.GetInitializeErrorString() != "No error.")
{
shouldContinueRunning = false;
}
bool shouldContinueRunning = true;
if (string(mdProfilerObj->GetInitializeErrorString()) != "No error.")
{
shouldContinueRunning = false;
}
boolean shouldContinueRunning = true;
if (!mdProfilerObj.GetInitializeErrorString().equals("No error.")) {
    shouldContinueRunning = false;
}
should_continue_running = True
if md_profiler_obj.GetInitializeErrorString() != "No error.":
    should_continue_running = False
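In a Python script, a bare `return` only works inside a function, so raising an exception is often a more convenient way to stop on a failed initialization. The sketch below combines the two checks from this step; `initialize_or_raise` and `ProfilerInitError` are hypothetical names, and the success value is passed in as `error_none` so the helper does not need to import the Python binding.

```python
class ProfilerInitError(RuntimeError):
    """Raised when the Profiler Object fails to initialize its data files."""


def initialize_or_raise(profiler, data_path, error_none):
    """Set the data file path, initialize the data files, and raise on failure.

    `profiler` is an mdProfiler instance (any object with the same methods
    works); `error_none` is the success status, for example
    mdProfiler_pythoncode.ProgramStatus.ErrorNone in the examples above.
    """
    profiler.SetPathToProfilerDataFiles(data_path)
    status = profiler.InitializeDataFiles()
    if status != error_none:
        # Include the object's own description of what went wrong.
        message = profiler.GetInitializeErrorString()
        raise ProfilerInitError(f"Failed to initialize ({status}): {message}")
    return status
```

Usage: `initialize_or_raise(md_profiler_obj, PATH_TO_DATA_FILES, mdProfiler_pythoncode.ProgramStatus.ErrorNone)`.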
4. Configure Profiler Object
The following methods configure the output file and which analyses Profiler Object performs:
SetFileName: Sets the name of the profiling output file
SetAppendMode: Sets how the output file is handled if it already exists (for example, Overwrite)
SetSortAnalysis: Enables sort analysis
SetMatchUpAnalysis: Enables duplicate record detection
SetRightFielderAnalysis: Enables inferred data type analysis
SetDataAggregationAnalysis: Enables all forms of aggregation and value gathering
Example Implementation:
// These are the configurable settings of the Profiler Object.
mdProfilerObj.SetFileName("testFile.prf");
mdProfilerObj.SetAppendMode(mdProfiler.AppendMode.Overwrite);
mdProfilerObj.SetSortAnalysis(1);
mdProfilerObj.SetMatchupAnalysis(1);
mdProfilerObj.SetRightFielderAnalysis(1);
mdProfilerObj.SetDataAggregationAnalysis(1);
// These are the configurable settings of the Profiler Object.
mdProfilerObj->SetFileName("testFile.prf");
mdProfilerObj->SetAppendMode(mdProfiler::AppendMode::Overwrite);
mdProfilerObj->SetSortAnalysis(1);
mdProfilerObj->SetMatchupAnalysis(1);
mdProfilerObj->SetRightFielderAnalysis(1);
mdProfilerObj->SetDataAggregationAnalysis(1);
// These are the configurable settings of the Profiler Object.
mdProfilerObj.SetFileName("testFile.prf");
mdProfilerObj.SetAppendMode(mdProfiler.AppendMode.Overwrite);
mdProfilerObj.SetSortAnalysis(1);
mdProfilerObj.SetMatchupAnalysis(1);
mdProfilerObj.SetRightFielderAnalysis(1);
mdProfilerObj.SetDataAggregationAnalysis(1);
# These are the configurable settings of the Profiler Object.
md_profiler_obj.SetFileName("testFile.prf")
md_profiler_obj.SetAppendMode(mdProfiler_pythoncode.AppendMode.Overwrite)
md_profiler_obj.SetSortAnalysis(1)
md_profiler_obj.SetMatchupAnalysis(1)
md_profiler_obj.SetRightFielderAnalysis(1)
md_profiler_obj.SetDataAggregationAnalysis(1)
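If you want to switch analyses on or off without editing code, you can keep the settings in a table and apply them in a loop. This is an illustrative sketch: `ANALYSIS_SETTINGS` and `apply_settings` are hypothetical names, the setter names are copied from the Python example above, and each value is passed through unchanged (the examples above pass 1 to enable an analysis).

```python
# Hypothetical settings table: keys are setter names from the example above,
# values are the argument each setter receives. This could also be loaded
# from a configuration file.
ANALYSIS_SETTINGS = {
    "SetSortAnalysis": 1,
    "SetMatchupAnalysis": 1,
    "SetRightFielderAnalysis": 1,
    "SetDataAggregationAnalysis": 1,
}


def apply_settings(profiler, settings):
    """Call each named setter on `profiler` with its value, in dict order."""
    for method_name, value in settings.items():
        # getattr raises AttributeError for a name the object does not define,
        # so a misspelled setter fails loudly instead of being ignored.
        getattr(profiler, method_name)(value)
```

Usage: `apply_settings(md_profiler_obj, ANALYSIS_SETTINGS)` after calling SetFileName and SetAppendMode as shown above.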
5. Add the Profiler Columns
AddColumn defines a column to be profiled from its name, its column type, and the type of data it contains. Add at least one column before profiling the data.
Example Implementation:
mdProfilerObj.AddColumn("first", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeFirstName);
mdProfilerObj.AddColumn("last", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeLastName);
mdProfilerObj.AddColumn("address", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeAddress);
mdProfilerObj.AddColumn("city", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeCity);
mdProfilerObj.AddColumn("state", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeStateOrProvince);
mdProfilerObj.AddColumn("zip", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeZipOrPostalCode);
mdProfilerObj->AddColumn("first", mdProfiler::ProfilerColumnType::ColumnTypeVariableUnicodeString, mdProfiler::ProfilerDataType::DataTypeFirstName);
mdProfilerObj->AddColumn("last", mdProfiler::ProfilerColumnType::ColumnTypeVariableUnicodeString, mdProfiler::ProfilerDataType::DataTypeLastName);
mdProfilerObj->AddColumn("address", mdProfiler::ProfilerColumnType::ColumnTypeVariableUnicodeString, mdProfiler::ProfilerDataType::DataTypeAddress);
mdProfilerObj->AddColumn("city", mdProfiler::ProfilerColumnType::ColumnTypeVariableUnicodeString, mdProfiler::ProfilerDataType::DataTypeCity);
mdProfilerObj->AddColumn("state", mdProfiler::ProfilerColumnType::ColumnTypeVariableUnicodeString, mdProfiler::ProfilerDataType::DataTypeStateOrProvince);
mdProfilerObj->AddColumn("zip", mdProfiler::ProfilerColumnType::ColumnTypeVariableUnicodeString, mdProfiler::ProfilerDataType::DataTypeZipOrPostalCode);
mdProfilerObj.AddColumn("first", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeFirstName);
mdProfilerObj.AddColumn("last", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeLastName);
mdProfilerObj.AddColumn("address", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeAddress);
mdProfilerObj.AddColumn("city", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeCity);
mdProfilerObj.AddColumn("state", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeStateOrProvince);
mdProfilerObj.AddColumn("zip", mdProfiler.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler.ProfilerDataType.DataTypeZipOrPostalCode);
md_profiler_obj.AddColumn("first", mdProfiler_pythoncode.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler_pythoncode.ProfilerDataType.DataTypeFirstName)
md_profiler_obj.AddColumn("last", mdProfiler_pythoncode.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler_pythoncode.ProfilerDataType.DataTypeLastName)
md_profiler_obj.AddColumn("address", mdProfiler_pythoncode.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler_pythoncode.ProfilerDataType.DataTypeAddress)
md_profiler_obj.AddColumn("city", mdProfiler_pythoncode.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler_pythoncode.ProfilerDataType.DataTypeCity)
md_profiler_obj.AddColumn("state", mdProfiler_pythoncode.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler_pythoncode.ProfilerDataType.DataTypeStateOrProvince)
md_profiler_obj.AddColumn("zip", mdProfiler_pythoncode.ProfilerColumnType.ColumnTypeVariableUnicodeString, mdProfiler_pythoncode.ProfilerDataType.DataTypeZipOrPostalCode)
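When the column list grows, describing each column once as data keeps AddColumn and the later SetColumn calls in sync. The sketch below assumes nothing beyond the three AddColumn arguments shown above; `ColumnSpec` and `add_columns` are hypothetical names, and the enum values are whatever your binding provides.

```python
from typing import Any, NamedTuple


class ColumnSpec(NamedTuple):
    """One column to profile: its name plus the two enum values AddColumn takes."""
    name: str
    column_type: Any  # e.g. ProfilerColumnType.ColumnTypeVariableUnicodeString
    data_type: Any    # e.g. ProfilerDataType.DataTypeFirstName


def add_columns(profiler, specs):
    """Register every column with AddColumn and return the names in order.

    The returned names can be reused when calling SetColumn for each record.
    """
    names = []
    for spec in specs:
        profiler.AddColumn(spec.name, spec.column_type, spec.data_type)
        names.append(spec.name)
    return names
```

For example, with `ct = mdProfiler_pythoncode.ProfilerColumnType` and `dt = mdProfiler_pythoncode.ProfilerDataType`, the first column above becomes `ColumnSpec("first", ct.ColumnTypeVariableUnicodeString, dt.DataTypeFirstName)`.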
6. Set Data To Be Processed
Call StartProfiling before you set any column or row content. Then, for each record, set each column's value with SetColumn and call AddRecord to add the record.
Example Implementation:
mdProfilerObj.StartProfiling();
mdProfilerObj.SetColumn("first", COLUMN_VALUE);
mdProfilerObj.SetColumn("last", COLUMN_VALUE);
mdProfilerObj.SetColumn("address", COLUMN_VALUE);
mdProfilerObj.SetColumn("city", COLUMN_VALUE);
mdProfilerObj.SetColumn("state", COLUMN_VALUE);
mdProfilerObj.SetColumn("zip", COLUMN_VALUE);
mdProfilerObj.AddRecord();
mdProfilerObj->StartProfiling();
mdProfilerObj->SetColumn("first", COLUMN_VALUE.c_str());
mdProfilerObj->SetColumn("last", COLUMN_VALUE.c_str());
mdProfilerObj->SetColumn("address", COLUMN_VALUE.c_str());
mdProfilerObj->SetColumn("city", COLUMN_VALUE.c_str());
mdProfilerObj->SetColumn("state", COLUMN_VALUE.c_str());
mdProfilerObj->SetColumn("zip", COLUMN_VALUE.c_str());
mdProfilerObj->AddRecord();
mdProfilerObj.StartProfiling();
mdProfilerObj.SetColumn("first", COLUMN_VALUE);
mdProfilerObj.SetColumn("last", COLUMN_VALUE);
mdProfilerObj.SetColumn("address", COLUMN_VALUE);
mdProfilerObj.SetColumn("city", COLUMN_VALUE);
mdProfilerObj.SetColumn("state", COLUMN_VALUE);
mdProfilerObj.SetColumn("zip", COLUMN_VALUE);
mdProfilerObj.AddRecord();
md_profiler_obj.StartProfiling()
md_profiler_obj.SetColumn("first", COLUMN_VALUE)
md_profiler_obj.SetColumn("last", COLUMN_VALUE)
md_profiler_obj.SetColumn("address", COLUMN_VALUE)
md_profiler_obj.SetColumn("city", COLUMN_VALUE)
md_profiler_obj.SetColumn("state", COLUMN_VALUE)
md_profiler_obj.SetColumn("zip", COLUMN_VALUE)
md_profiler_obj.AddRecord()
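In practice the values usually come from a file with many rows. A minimal sketch of the per-record loop follows, assuming each row is a dict keyed by column name (for example, from `csv.DictReader`). `feed_records` is a hypothetical helper, and sending missing values as empty strings is an assumption about how you want gaps represented, not documented Profiler Object behavior.

```python
def feed_records(profiler, rows, column_names):
    """Start profiling and submit each row as one record.

    `rows` is any iterable of dicts keyed by column name, such as a
    csv.DictReader. Missing or None values are sent as empty strings
    (an assumption). Returns the number of records added.
    """
    profiler.StartProfiling()
    count = 0
    for row in rows:
        for name in column_names:
            value = row.get(name)
            profiler.SetColumn(name, "" if value is None else str(value))
        profiler.AddRecord()
        count += 1
    return count
```

Usage, with a hypothetical input file: `with open("input.csv", newline="") as f: feed_records(md_profiler_obj, csv.DictReader(f), ["first", "last", "address", "city", "state", "zip"])` (after `import csv`). Call ProfileData once all records have been added.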
7. Initiate Profiling
Use the method ProfileData to initiate profiling.
Example Implementation:
// Profile data
mdProfilerObj.ProfileData();
// Profile data
mdProfilerObj->ProfileData();
// Profile data
mdProfilerObj.ProfileData();
# Profile data
md_profiler_obj.ProfileData()
8. Get Profiler Object Information
After profiling, you can retrieve table-based and column-based statistics from the processed data.
The following methods return some of these statistics; the Reference Guide lists the rest:
Example Implementation:
mdProfilerObj.GetTableRecordCount();
mdProfilerObj.GetColumnCount();
mdProfilerObj.GetTableExactMatchDistinctCount();
mdProfilerObj.GetTableExactMatchDupesCount();
mdProfilerObj.GetTableExactMatchLargestGroup();
mdProfilerObj->GetTableRecordCount();
mdProfilerObj->GetColumnCount();
mdProfilerObj->GetTableExactMatchDistinctCount();
mdProfilerObj->GetTableExactMatchDupesCount();
mdProfilerObj->GetTableExactMatchLargestGroup();
mdProfilerObj.GetTableRecordCount();
mdProfilerObj.GetColumnCount();
mdProfilerObj.GetTableExactMatchDistinctCount();
mdProfilerObj.GetTableExactMatchDupesCount();
mdProfilerObj.GetTableExactMatchLargestGroup();
md_profiler_obj.GetTableRecordCount()
md_profiler_obj.GetColumnCount()
md_profiler_obj.GetTableExactMatchDistinctCount()
md_profiler_obj.GetTableExactMatchDupesCount()
md_profiler_obj.GetTableExactMatchLargestGroup()
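The getters above take no arguments, so a report can gather them in one pass. This sketch assumes only the method names listed on this page; `TABLE_STATS` and `collect_stats` are hypothetical names, and you can extend the tuple with other no-argument getters from the Reference Guide.

```python
# Getters shown above; add any other no-argument getter from the Reference Guide.
TABLE_STATS = (
    "GetTableRecordCount",
    "GetColumnCount",
    "GetTableExactMatchDistinctCount",
    "GetTableExactMatchDupesCount",
    "GetTableExactMatchLargestGroup",
)


def collect_stats(profiler, method_names=TABLE_STATS):
    """Call each no-argument getter after ProfileData and return {name: value}."""
    return {name: getattr(profiler, name)() for name in method_names}
```

Usage: `for name, value in collect_stats(md_profiler_obj).items(): print(f"{name}: {value}")`.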
9. Get the Melissa Result Codes
To get the result codes, use the method GetResults. It returns all of the result codes in a single string, separated by commas.
Example Implementation:
// Get Result Codes
string resultCodes = mdProfilerObj.GetResults();
// Get Result Codes
string resultCodes = mdProfilerObj->GetResults();
// Get Result Codes
String resultCodes = mdProfilerObj.GetResults();
# Get Result Codes
result_codes = md_profiler_obj.GetResults()
The following example shows one way to interpret the results, using the method GetResultCodeDescription to get the long description of a result code:
// Result codes describe any issues Profiler Object found with the input.
mdProfilerObj.GetResultCodeDescription(RESULT_CODE_STRING, mdProfiler.ResultCdDescOpt.ResultCodeDescriptionLong);
// Result codes describe any issues Profiler Object found with the input.
mdProfilerObj->GetResultCodeDescription(RESULT_CODE_STRING.c_str(), mdProfilerObj->ResultCodeDescriptionLong);
// Result codes describe any issues Profiler Object found with the input.
mdProfilerObj.GetResultCodeDescription(RESULT_CODE_STRING, mdProfiler.ResultCdDescOpt.ResultCodeDescriptionLong);
# Result codes describe any issues Profiler Object found with the input.
md_profiler_obj.GetResultCodeDescription(RESULT_CODE_STRING, mdProfiler_pythoncode.ResultCdDescOpt.ResultCodeDescriptionLong)
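Because GetResults returns a single comma-separated string, you typically split it before looking up each code. The sketch below assumes GetResultCodeDescription accepts one code at a time, as the `RESULT_CODE_STRING` placeholder suggests; `split_result_codes` and `describe_results` are hypothetical helpers, not part of the Profiler Object API.

```python
def split_result_codes(results):
    """Split the comma-delimited GetResults string into individual codes.

    Whitespace around each code is stripped and empty entries are dropped,
    so an empty results string yields an empty list.
    """
    return [code.strip() for code in results.split(",") if code.strip()]


def describe_results(profiler, results, description_option):
    """Return {code: description} for every code in a GetResults string.

    Assumes GetResultCodeDescription takes one code at a time plus a
    description option such as ResultCdDescOpt.ResultCodeDescriptionLong.
    """
    return {
        code: profiler.GetResultCodeDescription(code, description_option)
        for code in split_result_codes(results)
    }
```

Usage: `describe_results(md_profiler_obj, md_profiler_obj.GetResults(), mdProfiler_pythoncode.ResultCdDescOpt.ResultCodeDescriptionLong)`. A code that appears more than once is described only once, since the result is a dictionary.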