CEMPRE

Overview

Employment, salary, and firm data from IBGE’s Cadastro Central de Empresas (CEMPRE, Central Register of Enterprises). The dataset covers companies and other organizations registered with Brazil’s federal tax authority (Receita Federal) and reports employment levels, wages, and establishment counts across Brazilian municipalities and economic sectors.

CEMPRE is one of the most detailed sources of data on Brazilian firms, covering virtually all formal enterprises and organizations operating in the country.

Data Coverage

The CEMPRE dataset includes the variables, geographic levels, and sectoral breakdowns described below.

Dataset Description

Key Variables

  1. Establishment Count: Number of registered establishments
  2. Employment: Total number of employees and average employees per establishment
  3. Wages: Average salary and total wage bill (payroll)
  4. Economic Classification: CNAE codes for sector identification
  5. Location: State, region, and municipality identifiers
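The averages listed above are ratios of totals. As a rough illustration, the sketch below derives average employees per establishment and a simple average wage per employee from totals. The data frame, its column names (establishments, employees, payroll), and its values are hypothetical and are not the actual output of load_cempre(); IBGE's published average-salary figures may follow a different convention.

```r
# Illustrative sketch only: column names and values are hypothetical,
# not the real output of load_cempre().
cempre_toy <- data.frame(
  municipality   = c("A", "B", "C"),
  establishments = c(120, 45, 300),
  employees      = c(2400, 450, 9000),
  payroll        = c(7200000, 1125000, 31500000)  # total wages paid (hypothetical units)
)

# Average employees per establishment = total employees / establishments
cempre_toy$avg_employees <- cempre_toy$employees / cempre_toy$establishments

# Simple average wage per employee = total payroll / total employees
# (assumption: IBGE's official average may be computed differently)
cempre_toy$avg_wage <- cempre_toy$payroll / cempre_toy$employees

print(cempre_toy)
```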

Geographic Aggregation Levels

The data is available at three aggregation levels:

  • Country level: aggregate statistics for all of Brazil
  • State level: data aggregated by state (27 federative units: 26 states and the Federal District)
  • Municipality level: data disaggregated to the municipality level (5,570+ municipalities)
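Because establishment and employee counts are additive, coarser levels can in principle be built by summing finer ones, while ratios must be recomputed from the new totals. The minimal sketch below assumes a hypothetical municipality-level table with columns state, municipality, establishments, and employees (made-up names and values, not the package's output) and rolls it up to state and country totals with base R's aggregate(). Published tables may suppress some small cells, so sums of municipality rows may not exactly match official state totals.

```r
# Hypothetical municipality-level data; names and values are illustrative only.
muni <- data.frame(
  state          = c("PA", "PA", "AM", "AM", "AM"),
  municipality   = c("m1", "m2", "m3", "m4", "m5"),
  establishments = c(100, 50, 80, 20, 10),
  employees      = c(1000, 400, 900, 150, 60)
)

# Counts are additive: state totals are sums over municipalities.
by_state <- aggregate(cbind(establishments, employees) ~ state,
                      data = muni, FUN = sum)

# Ratios are not additive: recompute them from the aggregated totals
# rather than averaging municipality-level ratios.
by_state$avg_employees <- by_state$employees / by_state$establishments

# Country level: sum the state totals.
country <- colSums(by_state[, c("establishments", "employees")])
```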

Sectoral Detail

Data can be retrieved either disaggregated by sector or in aggregate form:

  • Sectoral disaggregation: detailed breakdown by CNAE 2.0 (main divisions and subdivisions)
  • Aggregate: total across all sectors
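With sector disaggregation, each geographic unit has one row per sector, so the all-sector figure corresponds to the sum over sectors. The sketch below uses hypothetical CNAE section codes and made-up employment values to collapse the sector dimension and compute each sector's share of total employment. The column names cnae_section and employees are assumptions, not the package's documented output.

```r
# Hypothetical sector-level rows for one geographic unit and year;
# section codes and values are illustrative, not real CEMPRE figures.
by_sector <- data.frame(
  cnae_section = c("A", "C", "G", "O"),
  employees    = c(500, 2000, 1500, 1000)
)

# Collapsing the sector dimension gives the all-sector total ...
total_employees <- sum(by_sector$employees)

# ... and each sector's share of total employment.
by_sector$share <- by_sector$employees / total_employees

print(by_sector)
```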


Function Parameters

Options:

  1. dataset: "cempre"

  2. raw_data:

    • TRUE: Returns the data in its original format from IBGE
    • FALSE: Returns cleaned and standardized data
  3. geo_level:

    • "country": National aggregate
    • "state": Aggregated by state
    • "municipality": Disaggregated to municipality level (detailed results)
  4. time_period: Specifies the years for which data will be downloaded (e.g., 2010:2020 for 2010 through 2020)

  5. language:

    • "pt": Portuguese language (variable names and labels)
    • "eng": English language
  6. sectors:

    • TRUE: Data is disaggregated by economic sector (CNAE)
    • FALSE: Data is aggregated across all sectors
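The accepted values above can be checked before a download starts. The function validate_cempre_args() below is a hypothetical helper written for this example, not part of the package; it checks arguments against the options listed above and returns them as a list. It does not check which years CEMPRE actually covers.

```r
# Hypothetical helper (not part of the package): validates arguments against
# the documented options before they are passed to load_cempre().
validate_cempre_args <- function(raw_data, geo_level, time_period,
                                 language, sectors) {
  # raw_data and sectors must each be a single TRUE/FALSE value
  stopifnot(
    is.logical(raw_data), length(raw_data) == 1, !is.na(raw_data),
    is.logical(sectors),  length(sectors) == 1,  !is.na(sectors)
  )
  # match.arg() errors on values outside the documented choices
  geo_level <- match.arg(geo_level, c("country", "state", "municipality"))
  language  <- match.arg(language, c("pt", "eng"))
  # time_period must be whole-number years, e.g. 2010:2020
  if (!is.numeric(time_period) || anyNA(time_period) ||
      any(time_period %% 1 != 0)) {
    stop("time_period must be a vector of integer years, e.g. 2010:2020")
  }
  list(raw_data = raw_data, geo_level = geo_level,
       time_period = as.integer(time_period),
       language = language, sectors = sectors)
}

args <- validate_cempre_args(FALSE, "state", 2008:2010, "pt", TRUE)
```

Because the returned list uses the same argument names as load_cempre(), it could be passed on with do.call(load_cempre, args). Note that match.arg() also accepts unambiguous abbreviations such as "mun" and expands them to the full option name.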

Examples

# load_cempre() is provided by the datazoom.amazonia package
library(datazoom.amazonia)

# download raw data at the country level from 2008 to 2010
data <- load_cempre(
  raw_data = TRUE,
  geo_level = "country",
  time_period = 2008:2010,
  language = "eng"
)

# download cleaned state-level data, split by sector, in Portuguese
data <- load_cempre(
  raw_data = FALSE,
  geo_level = "state",
  time_period = 2008:2010,
  language = "pt",
  sectors = TRUE
)