A correspondence table serves as a translation between two statistical classifications. When a correspondence table between two classifications does not yet exist, but both are linked to one or more intermediate classifications through existing correspondence tables, a new correspondence table can be generated automatically.
For the general case, where classifications \(A\) and \(B\) are indirectly linked via one or more intermediate classifications \(C_1, \dots, C_k\), the newCorrespondenceTable() function can automatically generate a new correspondence table.
A special case occurs when a classification \(A\) is updated to a new version \(A^*\) (with the correspondence table \(A:A^*\) assumed to have been created as part of this update), and a correspondence table \(A:B\) between the old version of \(A\) and another classification of interest \(B\) already exists.
Here, the updateCorrespondenceTable() function can be used to automatically generate the new correspondence table \(A^*:B\). (The newCorrespondenceTable() function could also be applied to achieve this, but updateCorrespondenceTable() takes into account that \(A\) and \(A^*\) are two versions of the same classification, and is therefore recommended for this updating scenario.)
In the case of newCorrespondenceTable(), the number of
intermediate classifications is variable.
For this reason, the function accepts a flexible, matrix-like input
structure that represents the relationships between classifications and
their correspondence tables.
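The input can be provided in either of two forms, one of which is a CSV file that specifies this layout, each non-empty cell referring to the CSV file of a classification table or a correspondence table.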
In both cases, the diagonal elements of the structure correspond to classification tables (e.g. \(A\), \(B\), \(C\)), while the off-diagonal elements represent the correspondence tables linking consecutive classifications (e.g. \(A:B\), \(B:C\)).
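The layout described above can be pictured as a square character matrix. The base R sketch below builds one for a chain with a single intermediate classification and writes it to CSV without row or column names, as the fullPath() helper defined later does. The file names and the helper buildLayout() are hypothetical, and it is an assumption that the package's layout files follow exactly this arrangement; the sample files shipped with correspondenceTables remain the reference.

```r
# Hypothetical file names, used only for illustration.
classifications <- c("A.csv", "C1.csv", "B.csv")
correspondences <- c("A_C1.csv", "C1_B.csv")

# Hypothetical helper: classifications on the diagonal, the table linking
# classification i to classification i + 1 in cell [i, i + 1], "" elsewhere.
buildLayout <- function(classifications, correspondences) {
  k <- length(classifications)
  stopifnot(length(correspondences) == k - 1)
  layoutMatrix <- matrix("", nrow = k, ncol = k)
  diag(layoutMatrix) <- classifications
  for (i in seq_len(k - 1)) {
    layoutMatrix[i, i + 1] <- correspondences[i]
  }
  layoutMatrix
}

layoutMatrix <- buildLayout(classifications, correspondences)
print(layoutMatrix)

# Write the layout as a headerless CSV file in the session's temp directory.
layoutFile <- file.path(tempdir(), "layout_sketch.csv")
write.table(layoutMatrix, file = layoutFile, row.names = FALSE,
            col.names = FALSE, sep = ",")
```

Adding an intermediate classification only lengthens the two input vectors; the matrix grows by one row and one column.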
To generate a correspondence table between classifications \(A\) and \(C\) from the correspondence tables \(A:B\) and \(B:C\), the function requires a matrix-like input structure with classifications on the diagonal and correspondence tables on the off-diagonal. Schematically, this structure can be represented as follows:
\[ \begin{bmatrix} A & A\!:\!B & \\ & B & B\!:\!C \\ & & C \end{bmatrix} \]
This representation naturally extends to cases with multiple intermediate classifications.
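Conceptually, chaining \(A:B\) and \(B:C\) amounts to a relational join on the codes of the shared classification \(B\): every pair of an \(A\) code and a \(C\) code connected through some \(B\) code becomes a candidate row. The base R sketch below illustrates this with toy tables (all codes, column names and the helper chainTables() are hypothetical) and folds the join over a longer chain with Reduce(). It is not the implementation of newCorrespondenceTable(), which additionally computes the flags discussed below and handles options such as MismatchTolerance.

```r
# Toy correspondence tables with hypothetical codes; one column per classification.
toyAB <- data.frame(A = c("a1", "a1", "a2"), B = c("b1", "b2", "b2"),
                    stringsAsFactors = FALSE)
toyBC <- data.frame(B = c("b1", "b2", "b2"), C = c("c1", "c1", "c2"),
                    stringsAsFactors = FALSE)

# Join on the shared classification B: each A-C pair connected through
# some B code becomes a candidate row.
toyAC <- merge(toyAB, toyBC, by = "B")
toyAC <- toyAC[order(toyAC$A, toyAC$B, toyAC$C), c("A", "B", "C")]
rownames(toyAC) <- NULL
print(toyAC)

# Longer chains: fold the join over the list of tables, each pair of
# consecutive tables sharing exactly one classification column.
chainTables <- function(tables) {
  Reduce(function(left, right) {
    shared <- intersect(names(left), names(right))
    stopifnot(length(shared) == 1)
    merge(left, right, by = shared)
  }, tables)
}

toyCD <- data.frame(C = c("c1", "c2"), D = c("d1", "d1"),
                    stringsAsFactors = FALSE)
toyAD <- chainTables(list(toyAB, toyBC, toyCD))
```

In toyAC the pair (a1, c1) is reached twice, once via b1 and once via b2; this is the kind of duplication that the Redundancy flag reports.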
The input for updateCorrespondenceTable() simply requires the classifications (\(A\), \(A^*\) and \(B\)) and the correspondence tables (\(A:B\) and \(A:A^*\)) as data frames.
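The update scenario can be pictured as a join of \(A:A^*\) and \(A:B\) on the old codes of \(A\), which yields candidate \(A^*:B\) pairs. The toy tables and column names below are hypothetical, and the sketch ignores everything updateCorrespondenceTable() adds on top of the join (flags, mismatch tolerances, label comparison).

```r
# Hypothetical toy tables: A is the old version, AStar the new one.
toyAAStar <- data.frame(A = c("x1", "x2", "x3"), AStar = c("y1", "y1", "y2"),
                        stringsAsFactors = FALSE)
toyAB <- data.frame(A = c("x1", "x2", "x3"), B = c("b1", "b2", "b2"),
                    stringsAsFactors = FALSE)

# Join on the old codes of A, then keep the distinct AStar-B pairs.
toyPaths <- merge(toyAAStar, toyAB, by = "A")
toyAStarB <- unique(toyPaths[, c("AStar", "B")])
toyAStarB <- toyAStarB[order(toyAStarB$AStar, toyAStarB$B), ]
rownames(toyAStarB) <- NULL
print(toyAStarB)
```

Here the new code y1 absorbs the old codes x1 and x2, so it inherits both of their links and ends up associated with b1 and b2.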
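As output, both newCorrespondenceTable() and updateCorrespondenceTable() return a list whose first element is the new correspondence table and whose second element is a data frame with the names of the classifications involved.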
When newCorrespondenceTable() is used with a CSV-based
input structure, the CSV file that specifies the input layout must
contain full file paths to the referenced CSV files, rather than file
names alone. Accordingly, in the sample input, the file names appearing
in the CSV table cells must be prefixed with their full path.
To streamline this task, the utility function fullPath,
defined below, is used in all the following examples.
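```r
tmp_dir <- tempdir()

# Read a layout CSV shipped with correspondenceTables, replace each non-empty
# cell (a file name) with the full path of that file, and write the result to
# tmp_dir under the name given by CSVappended.
fullPath <- function(CSVraw, CSVappended) {
  NamesCsv <- system.file("extdata/test", CSVraw, package = "correspondenceTables")
  # Read every cell as character so that empty cells stay "" instead of NA.
  A <- read.csv(NamesCsv, header = FALSE, sep = ",", colClasses = "character")
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(A))) {
      if (!is.na(A[i, j]) && A[i, j] != "") {
        A[i, j] <- system.file("extdata/test", A[i, j], package = "correspondenceTables")
      }
    }
  }
  write.table(x = A, file = file.path(tmp_dir, CSVappended),
              row.names = FALSE, col.names = FALSE, sep = ",")
  return(A)
}
```

newCorrespondenceTable()

Execute the following code to apply the function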
newCorrespondenceTable() and generate the correspondence
table linking ISIC Rev. 4 (classification A) to CPA 2.1 (classification
B) through the intermediate classification CPC 2.1. When no trimming is
executed (Redundancy_trim = FALSE), redundant records are
shown, together with the redundancy flag.
```r
NCT <- newCorrespondenceTable(
  Tables = file.path(tmp_dir, "names.csv"),
  Reference = "A",
  MismatchTolerance = 0.5,
  Redundancy_trim = FALSE,
  Progress = FALSE
)
knitr::kable(
  NCT[[1]][3748:3753, 1:9],
  caption = "ISIC Rev. 4 to CPA Ver. 2.1 (via CPC Ver. 2.1): Subsample of the new Correspondence Table",
  align = "c"
)
```

| | ISIC Rev. 4 | CPC 2.1 | CPA 2.1 | Review | Redundancy | Redundancy_keep | Unmatched | NoMatchFromA | NoMatchFromB |
|---|---|---|---|---|---|---|---|---|---|
| 3748 | 1030 | 21495 | 10.39.23 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3749 | 1030 | 21496 | 10.39.24 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3750 | 1030 | 21429 | 10.39.25 | 0 | 1 | 1 | 0 | 0 | 0 |
| 3751 | 1030 | 21421 | 10.39.25 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3752 | 1030 | 21424 | 10.39.25 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3753 | 1030 | 21422 | 10.39.25 | 0 | 1 | 0 | 0 | 0 | 0 |
The table above represents a subset of the correspondence table generated in this example. Each row represents a candidate correspondence between an ISIC code and a CPA code, possibly mediated by one or more intermediate classifications.
Here, the ISIC code 1030 is linked to several CPA codes:

- The rows linking 1030 to 10.39.23 and 10.39.24 are unique and unambiguous. These rows have Redundancy = 0, Unmatched = 0, and no review or mismatch flags set.
- The CPA code 10.39.25 appears multiple times in combination with the same ISIC code 1030, via different CPC codes. These rows are therefore flagged with Redundancy = 1.
- When Redundancy_trim = FALSE, all redundant rows are retained and an additional column, Redundancy_keep, is included:
  - Redundancy_keep = 1 identifies the record that would be kept if redundancy trimming were applied.
  - Among the redundant rows, Redundancy_keep = 0 marks the alternatives that trimming would remove.
- All rows in this example have Unmatched = 0, indicating that each ISIC code is matched to at least one CPA code and vice versa.
- Similarly, NoMatchFromA = 0 and NoMatchFromB = 0 show that no codes from the original classification tables are missing from the correspondence tables involved in the construction.
- Finally, the Review flag is 0 for all rows, indicating that, given the selected reference classification, no hierarchical inconsistencies are detected.
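The logic of the Redundancy and Redundancy_keep columns can be mimicked on a toy candidate table. In this base R sketch (hypothetical codes), a row is redundant when its A-B pair occurs more than once, and the first occurrence of each redundant pair is marked as the one to keep; that choice rule is an assumption of the sketch, and the rule used by newCorrespondenceTable() may differ. The last step shows what trimming would leave.

```r
# Hypothetical candidate rows from chaining A:C1 and C1:B.
toyCand <- data.frame(
  A  = c("a1", "a1", "a1", "a1"),
  C1 = c("c1", "c2", "c3", "c4"),
  B  = c("b1", "b2", "b3", "b3"),
  stringsAsFactors = FALSE
)

# A row is redundant when its A-B pair occurs more than once.
pairKey <- paste(toyCand$A, toyCand$B, sep = "|")
toyCand$Redundancy <- as.integer(duplicated(pairKey) |
                                 duplicated(pairKey, fromLast = TRUE))

# Mark one record per redundant pair as the one to keep; taking the first
# occurrence is an assumption of this sketch.
toyCand$Redundancy_keep <- as.integer(toyCand$Redundancy == 1 & !duplicated(pairKey))

# Trimming keeps the non-redundant rows plus the marked representatives.
toyTrimmed <- toyCand[toyCand$Redundancy == 0 | toyCand$Redundancy_keep == 1, ]
print(toyCand)
```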
```r
knitr::kable(
  head(NCT[[2]]),
  caption = "ISIC Rev. 4 to CPA Ver. 2.1 (via CPC Ver. 2.1): Names of the classifications involved",
  align = "c"
)
```

| Classification: Name |
|---|
| A: ISIC Rev. 4 |
| C1: CPC 2.1 |
| B: CPA 2.1 |
The table above is the second element returned by newCorrespondenceTable(), which is simply a data frame containing the names of all classifications involved.
Execute the following code to apply the function newCorrespondenceTable() and generate the correspondence table linking NACE Rev. 2 (classification A) to SITC 4 (classification B) through the intermediate classifications CPA Ver. 2.1 and CN 2022. With Redundancy_trim = TRUE, redundant records are removed so that exactly one record is kept for each unique combination.
```r
NCT <- newCorrespondenceTable(
  Tables = file.path(tmp_dir, "names.csv"),
  Reference = "none",
  MismatchTolerance = 0.96,
  Redundancy_trim = TRUE,
  Progress = FALSE
)
knitr::kable(
  head(NCT[[1]][5442:5450, 1:8]),
  caption = "NACE Rev. 2 : SITC 4 (via CPA Ver. 2.1 and CN 2022): Subsample of the new Correspondence Table",
  align = "c"
)
```

| | NACE Rev. 2 | CPA 2.1 | CN 2022 | SITC4 | Redundancy | Unmatched | NoMatchFromA | NoMatchFromB |
|---|---|---|---|---|---|---|---|---|
| 5442 | 28.41 | 28.41.24 | 84623210 | 73314 | 0 | 0 | 0 | 0 |
| 5443 | 28.41 | 28.41.24 | 84623290 | 73315 | 0 | 0 | 0 | 0 |
| 5444 | 28.41 | 28.41.32 | 84624200 | 73316 | 0 | 0 | 0 | 0 |
| 5445 | 28.41 | 28.41.32 | 84624900 | 73317 | 0 | 0 | 0 | 0 |
| 5446 | 28.41 | 28.41.33 | Multiple | 73318 | 1 | 0 | 0 | 0 |
| 5447 | 28.41 | Multiple | Multiple | 73399 | 1 | 0 | 0 | 0 |
Also in this case, the table above represents a subset of the correspondence table generated in this example. Each row corresponds to a correspondence between a NACE code and a SITC code, possibly mediated by multiple intermediate classifications.
In this example, the NACE code 28.41 is mapped to several SITC codes:

- The first four rows represent unique and unambiguous correspondences, where specific CPA and CN codes are associated with specific SITC codes. These rows have Redundancy = 0 and Unmatched = 0, indicating that each of these NACE–SITC pairs is obtained through a single chain of intermediate codes.
- The last two rows are flagged with Redundancy = 1. In these cases, multiple intermediate codes (in CPA and/or CN) contribute to the same NACE–SITC mapping; as a result, the corresponding intermediate classification values are reported as "Multiple".
- All rows have Unmatched = 0, indicating that each correspondence links a valid NACE code to a valid SITC code.
- Additionally, NoMatchFromA = 0 and NoMatchFromB = 0 for all rows confirm that no classification codes are missing from the correspondence tables used to construct the result.
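The "Multiple" convention can be illustrated with a small base R sketch. It groups hypothetical candidate paths by their A-B pair, keeps one row per pair, and reports an intermediate code only when all paths for that pair share it, writing "Multiple" otherwise. The data and the helper collapseOne() are made up; the sketch mirrors the pattern of the two redundant rows in the table above but is not the package's code.

```r
# Hypothetical candidate paths from A to B via C1 and C2.
toyPaths <- data.frame(
  A  = c("a1", "a1", "a1", "a1", "a1"),
  C1 = c("p1", "p2", "p2", "p3", "p4"),
  C2 = c("q1", "q2", "q3", "q4", "q5"),
  B  = c("s1", "s2", "s2", "s3", "s3"),
  stringsAsFactors = FALSE
)

# Report the code if every path agrees on it, "Multiple" otherwise.
collapseOne <- function(x) if (length(unique(x)) == 1L) x[1L] else "Multiple"

# One group per A-B pair, in order of first appearance.
pairKey <- paste(toyPaths$A, toyPaths$B, sep = "|")
groups <- split(toyPaths, factor(pairKey, levels = unique(pairKey)))

trimmed <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    A = g$A[1L],
    C1 = collapseOne(g$C1),
    C2 = collapseOne(g$C2),
    B = g$B[1L],
    Redundancy = as.integer(nrow(g) > 1L),
    stringsAsFactors = FALSE
  )
}))
rownames(trimmed) <- NULL
print(trimmed)
```

The pair (a1, s2) keeps its single C1 code but gets "Multiple" for C2, while (a1, s3) gets "Multiple" in both intermediate columns.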
```r
knitr::kable(
  head(NCT[[2]]),
  caption = "NACE Rev. 2 : SITC 4 (via CPA Ver. 2.1 and CN 2022): Names of the classifications involved",
  align = "c"
)
```

| Classification: Name |
|---|
| A: NACE Rev. 2 |
| C1: CPA 2.1 |
| C2: CN 2022 |
| B: SITC4 |
The table above corresponds to the second element returned by newCorrespondenceTable() and is a data frame containing the names of all the classifications involved in the process.

updateCorrespondenceTable()

Execute the following code to read the required input files into data frames.
```r
A <- read.csv(
  system.file("extdata/test", "CN2021.csv", package = "correspondenceTables"),
  colClasses = "character"
)
AStar <- read.csv(
  system.file("extdata/test", "CN2022.csv", package = "correspondenceTables"),
  colClasses = "character"
)
B <- read.csv(
  system.file("extdata/test", "CPA21.csv", package = "correspondenceTables"),
  colClasses = "character"
)
AB <- read.csv(
  system.file("extdata/test", "CN2021_CPA21.csv", package = "correspondenceTables"),
  colClasses = "character"
)
AAStar <- read.csv(
  system.file("extdata/test", "CN2021_CN2022.csv", package = "correspondenceTables"),
  colClasses = "character"
)
```

Execute the following code to apply the function updateCorrespondenceTable() and generate the updated correspondence table. In this case, the classification CN 2021 (A) has been updated to CN 2022 (A*), and the correspondence to CPA 2.1 (B) is revised accordingly. With Redundancy_trim = TRUE, redundant records are removed so that exactly one record is kept for each unique combination.
```r
UPC <- updateCorrespondenceTable(
  A = A,
  B = B,
  AStar = AStar,
  AB = AB,
  AAStar = AAStar,
  Reference = "B",
  MismatchToleranceB = 0.4,
  MismatchToleranceAStar = 0.4,
  Redundancy_trim = TRUE
)
knitr::kable(
  UPC[[1]][7950:7955, 1:11],
  caption = "Updating CN 2021 : CPA Ver. 2.1 (triggered by CN update): Subsample of the new Correspondence Table",
  align = "c"
)
```

| | CN.2021 | CN.2022 | CPA.2.1 | CodeChange | Review | Redundancy | NoMatchToAStar | NoMatchToB | NoMatchFromAStar | NoMatchFromB | LabelChange |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 7950 | 84148080 | 84148080 | 28.13.28 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7951 | 84149000 | 84149000 | 28.13.32 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7952 | 84219990 | 84149000 | 28.29.82 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7953 | 84151010 | 84151010 | 28.25.12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7954 | 84151090 | 84151090 | 28.25.12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7955 | 84152000 | 84152000 | 28.25.12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The table above represents a subset of the correspondence table generated in this example. Each row links a CN 2022 code to a CPA 2.1 code and reflects changes from the previous version.
In this example:

- The first three rows are flagged with CodeChange = 1, indicating that the original CN 2021 codes are associated with updated CN 2022 codes in a way that differs from the previous mapping. These rows also have LabelChange = 1, meaning that the labels of the corresponding CN codes have changed between versions.
- Rows with Review = 1 (here the second and third rows) indicate potential hierarchical inconsistencies with respect to the selected reference classification, and therefore require manual inspection.
- The remaining rows have CodeChange = 0 and LabelChange = 0, showing that both the code and its label remain unchanged between CN 2021 and CN 2022 for the given correspondence to CPA 2.1.
- All rows have Redundancy = 0, meaning that each CN 2022–CPA 2.1 combination appears only once in the updated correspondence table.
- Similarly, NoMatchToAStar = 0 and NoMatchToB = 0 indicate that each row contains valid codes for both CN 2022 and CPA 2.1.
- Finally, the flags NoMatchFromAStar = 0 and NoMatchFromB = 0 for all rows confirm that every code appearing in the updated correspondence is consistently represented in both the updated classification table and the underlying concordance tables.
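Label changes between two versions of a classification can be spotted by matching codes across the versions and comparing their labels. The base R sketch below does this for two toy tables; the column names Code and Label, as well as the codes and labels, are assumptions made for illustration, and the sketch does not claim to reproduce how updateCorrespondenceTable() derives LabelChange or CodeChange. It also lists the codes dropped or added between versions.

```r
# Hypothetical old (A) and new (AStar) versions of a classification.
toyA <- data.frame(Code = c("1001", "1002", "1003"),
                   Label = c("Apples", "Pears", "Plums"),
                   stringsAsFactors = FALSE)
toyAStar <- data.frame(Code = c("1001", "1002", "1004"),
                       Label = c("Apples", "Pears and quinces", "Cherries"),
                       stringsAsFactors = FALSE)

# Codes present in both versions, with their old and new labels side by side.
both <- merge(toyA, toyAStar, by = "Code", suffixes = c(".A", ".AStar"))
both$LabelChange <- as.integer(both$Label.A != both$Label.AStar)

# Codes that disappear from, or appear in, the new version.
dropped <- setdiff(toyA$Code, toyAStar$Code)
added   <- setdiff(toyAStar$Code, toyA$Code)
print(both)
```

A real comparison might first normalise whitespace or letter case in the labels; whether the package does so is not stated here.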
```r
knitr::kable(
  head(UPC[[2]]),
  caption = "Updating CN 2021 : CPA Ver. 2.1 (triggered by CN update): Names of the classifications involved",
  align = "c",
  col.names = "Classification: Name"
)
```

| Classification: Name |
|---|
| A: CN.2021 |
| B: CPA.2.1 |
| AStar: CN.2022 |
The table above is the second element returned by updateCorrespondenceTable(), which is simply a data frame containing the names of all classifications involved.

Execute the following code to read the required input files into data frames.
```r
A <- read.csv(
  system.file("extdata/test", "NAICS2017.csv", package = "correspondenceTables"),
  colClasses = "character"
)
AStar <- read.csv(
  system.file("extdata/test", "NAICS2022.csv", package = "correspondenceTables"),
  colClasses = "character"
)
B <- read.csv(
  system.file("extdata/test", "NACE.csv", package = "correspondenceTables"),
  colClasses = "character"
)
AB <- read.csv(
  system.file("extdata/test", "NAICS2017_NACE.csv", package = "correspondenceTables"),
  colClasses = "character"
)
AAStar <- read.csv(
  system.file("extdata/test", "NAICS2017_NAICS2022.csv", package = "correspondenceTables"),
  colClasses = "character"
)
```

Execute the following code to apply the function updateCorrespondenceTable() and generate the updated correspondence table. In this case, the classification NAICS 2017 (A) has been updated to NAICS 2022 (A*), and the correspondence to NACE Rev. 2 (B) is revised accordingly. With Redundancy_trim = TRUE, redundant records are removed so that exactly one record is kept for each unique combination.
```r
UPC3 <- updateCorrespondenceTable(
  A = A,
  B = B,
  AStar = AStar,
  AB = AB,
  AAStar = AAStar,
  Reference = "none",
  MismatchToleranceB = 0.5,
  MismatchToleranceAStar = 0.8,
  Redundancy_trim = TRUE
)
knitr::kable(
  head(UPC3[[1]][1208:1218, 1:10]),
  caption = "Updating NAICS : NACE (triggered by NAICS update): Subsample of the new Correspondence Table",
  align = "c"
)
```

| | NAICS.2017 | NAICS.2022 | NACE.Rev..2 | CodeChange | Redundancy | NoMatchToAStar | NoMatchToB | NoMatchFromAStar | NoMatchFromB | LabelChange |
|---|---|---|---|---|---|---|---|---|---|---|
| 1208 | 332313 | 332313 | 25.11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1209 | 332313 | 332313 | 25.29 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1210 | 332313 | 332313 | 25.30 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1211 | 332313 | 332313 | 28.22 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1212 | 332313 | 332313 | 28.91 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1213 | 332313 | 332313 | 30.11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
The table above represents a subset of the correspondence table generated in this example. Each row represents a candidate correspondence between a NAICS 2022 code and a NACE Rev. 2 code, derived from the previous version of the classification (NAICS 2017).
In this example:

- The NAICS code 332313 is unchanged between NAICS 2017 and NAICS 2022, as indicated by CodeChange = 0 for all rows. This shows that the classification update did not introduce any code-level changes for this activity.
- The same NAICS code 332313 is mapped to multiple NACE Rev. 2 codes (25.11, 25.29, 25.30, 28.22, 28.91, 30.11), reflecting a one-to-many correspondence that already existed and remains valid after the update.
- All rows have Redundancy = 0, meaning that each NAICS 2022–NACE Rev. 2 combination appears only once in the updated correspondence table.
- The flags NoMatchToAStar = 0 and NoMatchToB = 0 indicate that every row contains valid and consistent codes for both the updated classification (NAICS 2022) and the target classification (NACE Rev. 2).
- Similarly, NoMatchFromAStar = 0 and NoMatchFromB = 0 confirm that all codes appearing in the updated correspondence are present in the respective classification tables and supported by the underlying concordance tables.
- Finally, LabelChange = 0 for all rows shows that the labels associated with the NAICS codes are identical between the 2017 and 2022 versions.
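Consistency checks in the spirit of the NoMatch flags can be written with set operations: codes of a classification that never occur in a correspondence table, and correspondence rows whose code is not defined in the classification. The tables, the column names and the CodeInB column below are hypothetical, and the package's exact flag definitions may differ from these two checks.

```r
# Hypothetical classification B and correspondence table AStar:B.
toyB <- data.frame(Code = c("b1", "b2", "b3", "b4"), stringsAsFactors = FALSE)
toyCorr <- data.frame(AStar = c("n1", "n1", "n2"), B = c("b1", "b2", "b5"),
                      stringsAsFactors = FALSE)

# Codes of classification B that never appear in the correspondence table.
unusedCodes <- setdiff(toyB$Code, toyCorr$B)

# Per-row check: is the B code of each correspondence row defined in B?
toyCorr$CodeInB <- as.integer(toyCorr$B %in% toyB$Code)
print(unusedCodes)
print(toyCorr)
```

In the output above, every such check passes, which is what the all-zero NoMatch columns express; in the toy data, b3 and b4 are unused and the row pointing to b5 fails the second check.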
```r
knitr::kable(
  head(UPC3[[2]]),
  caption = "Updating NAICS : NACE (triggered by NAICS update): Names of the classifications involved",
  align = "c",
  col.names = "Classification: Name"
)
```

| Classification: Name |
|---|
| A: NAICS.2017 |
| B: NACE.Rev..2 |
| AStar: NAICS.2022 |
The table above corresponds to the second element returned by updateCorrespondenceTable() and is a data frame containing the names of all relevant classifications.