Using the CEOdata package

Joel Ardiaca

27/03/2026 - Version 1.4.0

1. Introduction

CEOdata provides convenient access to the microdata (individual-level survey responses) produced by the Centre d’Estudis d’Opinió (CEO), the public opinion institute of the Government of Catalonia.

This vignette is fully offline and uses the bundled small example datasets in the data/ folder.

The central entry point is the function CEOdata().

Depending on the arguments provided, CEOdata() can retrieve either:

  1. An accumulated microdata series, identified by a codi_serie (e.g. “BOP_presencial”), or
  2. A single study dataset, identified by its REO code (e.g. “1145”).

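In addition to data retrieval, the package includes:

  - CEOaccumulated_meta(), which lists the available accumulated series;
  - CEOmeta(), which lists individual studies and their metadata;
  - CEOsearch(), which searches for keywords in variable and value labels.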

Together, these functions provide a coherent workflow for discovering, downloading, and exploring CEO survey microdata directly from R.

2. Accumulated microdata series

2.1. What is a series?

An accumulated microdata series is a dataset that combines the individual responses from multiple CEO surveys conducted under a common design and topic.

For example, the series “BOP_presencial” contains the accumulated microdata of the Baròmetres d’Opinió Política conducted face-to-face since 2014. Each row corresponds to an individual respondent, while the dataset aggregates responses across several survey waves (each identified by a different REO code).

In contrast to downloading a single study (via its REO code), working with an accumulated series allows users to:

  - Analyse trends across time
  - Pool observations to increase statistical power
  - Work with a harmonised questionnaire structure across waves

Each series is identified by a codi_serie, which can be inspected using CEOaccumulated_meta().

2.2. List series

The available accumulated microdata series can be inspected using CEOaccumulated_meta():

head(CEOaccumulated_meta())
## # A tibble: 6 × 10
##   codi_serie    titol_serie mode_admin data_inici data_fi    reo   estat univers
##   <chr>         <chr>       <chr>      <date>     <date>     <chr> <chr> <chr>  
## 1 BOP_telefoni… Microdades… Telefònica 2006-03-06 2013-11-14 346,… Seri… Poblac…
## 2 BOP_presenci… Microdades… Presencial 2014-03-02 2025-06-28 746,… Seri… Poblac…
## 3 Context       Microdades… Presencia… 2024-12-09 2021-05-19 760,… Seri… Poblac…
## 4 VdG_telefoni… Microdades… Telefònica 2007-12-12 2021-12-15 406,… Seri… Poblac…
## 5 VdG_presenci… Microdades… Presencial 2009-05-26 2019-12-12 511,… Seri… Poblac…
## 6 VdG_autoadmi… Microdades… Autoadmin… 2022-12-02 2025-12-08 1044… Seri… Poblac…
## # ℹ 2 more variables: microdades_1 <chr>, microdades_2 <chr>

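This function returns a tibble where each row corresponds to an accumulated series. The most relevant columns are codi_serie (the series identifier), titol_serie (its title), mode_admin (the administration mode), data_inici and data_fi (the start and end dates), and reo (the REO codes of the studies pooled in the series).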

To see only the identifiers:

head(unique(CEOaccumulated_meta()$codi_serie))
## [1] "BOP_telefonica"       "BOP_presencial"       "Context"             
## [4] "VdG_telefonica"       "VdG_presencial"       "VdG_autoadministrada"

You can also filter the metadata to inspect a specific series:

head(CEOaccumulated_meta(series = "BOP_presencial"))
## # A tibble: 1 × 10
##   codi_serie    titol_serie mode_admin data_inici data_fi    reo   estat univers
##   <chr>         <chr>       <chr>      <date>     <date>     <chr> <chr> <chr>  
## 1 BOP_presenci… Microdades… Presencial 2014-03-02 2025-06-28 746,… Seri… Poblac…
## # ℹ 2 more variables: microdades_1 <chr>, microdades_2 <chr>
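The reo column is printed truncated above (for example "746,…"), which suggests that it stores the REO codes of the pooled studies as one comma-separated string. Under that assumption, the individual codes could be recovered roughly as in the sketch below. The metadata row is a made-up stand-in, so the series name and codes are purely illustrative.

```r
# Hypothetical stand-in for one row of CEOaccumulated_meta();
# the series name and codes are invented, not real CEO metadata.
meta_row <- data.frame(
  codi_serie = "EXAMPLE_series",
  reo = "1001, 1002,1003",
  stringsAsFactors = FALSE
)

# Assumption: `reo` is a single comma-separated string of REO codes.
# Split on commas and trim any surrounding whitespace.
reo_codes <- trimws(strsplit(meta_row$reo, ",", fixed = TRUE)[[1]])

reo_codes          # character vector with one element per study
length(reo_codes)  # number of studies pooled in the series
```

Each resulting code could then be passed to CEOmeta(reo = ...) or CEOdata(reo = ...) to inspect a single wave.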

2.3. Load series

Once a codi_serie has been identified, the corresponding dataset can be loaded using CEOdata(). In this offline vignette, the available accumulated series example is “BOP_presencial”.

d <- CEOdata()
head(d)
## # A tibble: 6 × 14
##   PONDERA   REO BOP_NUM   ANY   DATA_FIN   DATA_INI   SEXE   EDAT INGRESSOS_1_15
##     <dbl> <dbl> <fct>     <fct> <date>     <date>     <fct> <dbl> <fct>         
## 1       1  1119 Feb. 25 … 2025  2025-03-10 2025-03-10 Masc…    26 Més de 6.000 €
## 2       1  1119 Feb. 25 … 2025  2025-03-05 2025-03-05 Masc…    30 De 3.001 a 4.…
## 3       1  1119 Feb. 25 … 2025  2025-03-12 2025-03-12 Masc…    48 No té cap tip…
## 4       1  1119 Feb. 25 … 2025  2025-02-18 2025-02-18 Feme…    30 De 1001 a 120…
## 5       1  1119 Feb. 25 … 2025  2025-03-03 2025-03-03 Feme…    36 De 2.001 a 2.…
## 6       1  1119 Feb. 25 … 2025  2025-02-17 2025-02-17 Masc…    22 Més de 6.000 €
## # ℹ 5 more variables: LLOC_NAIX <fct>, INTERES_POL <fct>,
## #   SATIS_DEMOCRACIA <fct>, EDAT_GR <fct>, EDAT_CEO <fct>

This is equivalent to explicitly specifying the series:

d <- CEOdata(series = "BOP_presencial")
head(d)
## # A tibble: 6 × 14
##   PONDERA   REO BOP_NUM   ANY   DATA_FIN   DATA_INI   SEXE   EDAT INGRESSOS_1_15
##     <dbl> <dbl> <fct>     <fct> <date>     <date>     <fct> <dbl> <fct>         
## 1       1  1119 Feb. 25 … 2025  2025-03-10 2025-03-10 Masc…    26 Més de 6.000 €
## 2       1  1119 Feb. 25 … 2025  2025-03-05 2025-03-05 Masc…    30 De 3.001 a 4.…
## 3       1  1119 Feb. 25 … 2025  2025-03-12 2025-03-12 Masc…    48 No té cap tip…
## 4       1  1119 Feb. 25 … 2025  2025-02-18 2025-02-18 Feme…    30 De 1001 a 120…
## 5       1  1119 Feb. 25 … 2025  2025-03-03 2025-03-03 Feme…    36 De 2.001 a 2.…
## 6       1  1119 Feb. 25 … 2025  2025-02-17 2025-02-17 Masc…    22 Més de 6.000 €
## # ℹ 5 more variables: LLOC_NAIX <fct>, INTERES_POL <fct>,
## #   SATIS_DEMOCRACIA <fct>, EDAT_GR <fct>, EDAT_CEO <fct>

Attempting to load a different accumulated series in this offline vignette returns an informative error. Other series can be retrieved only when the computer has an internet connection:

try(CEOdata(series = "Longitudinal"))
## Error : Offline vignette example includes only series = 'BOP_presencial'.

The returned object is a tibble where each row represents an individual respondent and columns correspond to survey variables. Accumulated series typically combine multiple survey waves that share a comparable questionnaire structure.
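Because every row is a respondent and the waves share variables, trends can be summarised with ordinary tabulation tools. The sketch below uses a small invented data frame that borrows three column names from the output above (ANY, INTERES_POL, PONDERA). It assumes PONDERA is the survey weight, which the name suggests but which should be checked against the CEO documentation.

```r
# Toy data mimicking a few columns of an accumulated series.
# All values are invented for illustration only.
d_toy <- data.frame(
  ANY = factor(c("2024", "2024", "2024", "2025", "2025", "2025")),
  INTERES_POL = factor(
    c("Molt", "Poc", "Poc", "Molt", "Molt", "Gens"),
    levels = c("Molt", "Bastant", "Poc", "Gens")
  ),
  PONDERA = c(1.0, 0.5, 1.5, 1.0, 1.0, 2.0)
)

# Weighted counts per year and category
# (assumption: PONDERA is the survey weight).
weighted_n <- xtabs(PONDERA ~ ANY + INTERES_POL, data = d_toy)

# Row-wise shares: distribution of political interest within each year.
shares <- prop.table(weighted_n, margin = 1)
round(shares, 2)
```

The same two calls apply unchanged to a real series, replacing d_toy with the tibble returned by CEOdata().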

By default, SPSS labelled variables are converted into standard R factors. To retain the original haven_labelled format:

d_raw <- CEOdata(series = "BOP_presencial", raw = TRUE)
head(d_raw)
## # A tibble: 6 × 14
##   PONDERA   REO BOP_NUM             ANY      DATA_FIN   DATA_INI   SEXE     EDAT
##     <dbl> <dbl> <dbl+lbl>           <dbl+lb> <date>     <date>     <dbl+l> <dbl>
## 1       1  1119 61 [Feb. 25 - 1119] 2025     2025-03-10 2025-03-10 1 [Mas…    26
## 2       1  1119 61 [Feb. 25 - 1119] 2025     2025-03-05 2025-03-05 1 [Mas…    30
## 3       1  1119 61 [Feb. 25 - 1119] 2025     2025-03-12 2025-03-12 1 [Mas…    48
## 4       1  1119 61 [Feb. 25 - 1119] 2025     2025-02-18 2025-02-18 2 [Fem…    30
## 5       1  1119 61 [Feb. 25 - 1119] 2025     2025-03-03 2025-03-03 2 [Fem…    36
## 6       1  1119 61 [Feb. 25 - 1119] 2025     2025-02-17 2025-02-17 1 [Mas…    22
## # ℹ 6 more variables: INGRESSOS_1_15 <dbl+lbl>, LLOC_NAIX <dbl+lbl>,
## #   INTERES_POL <dbl+lbl>, SATIS_DEMOCRACIA <dbl+lbl>, EDAT_GR <dbl+lbl>,
## #   EDAT_CEO <dbl+lbl>

3. Individual studies (REO)

3.1. List studies

All individual surveys from the Generalitat de Catalunya are identified by a REO code (Registre d’Estudis d’Opinió). Each REO corresponds to a specific survey wave conducted at a given time.

The available studies in the offline example can be inspected using CEOmeta():

meta <- CEOmeta()
head(meta)
## # A tibble: 6 × 48
##   REO   `Titol enquesta`                   `Titol estudi` `Metodologia enquesta`
##   <fct> <chr>                              <chr>          <fct>                 
## 1 1150  Baròmetre de la bicicleta. 2025    Baròmetre de … quantitativa          
## 2 1149  Enquesta de valoració del Govern … Enquesta de v… quantitativa          
## 3 1148  Enquesta de satisfacció dels serv… Enquesta de s… quantitativa          
## 4 1147  Elements cohesionadors de la soci… Elements cohe… quantitativa          
## 5 1146  Avaluació de la satisfacció de le… Avaluació de … quantitativa          
## 6 1145  Baròmetre d'Opinió Política. 3a o… Baròmetre d'O… quantitativa          
## # ℹ 44 more variables: `Metode de recollida de dades` <fct>, Objectius <chr>,
## #   `Ambit territorial` <fct>, Cost <dbl>, `Promotors enquesta` <chr>,
## #   `Executors enquesta` <chr>, `Promotors estudi` <chr>,
## #   `Executors estudi` <chr>, `Data de treball de camp` <chr>,
## #   `Dia inici treball de camp` <date>, `Dia final treball de camp` <date>,
## #   Univers <chr>, Mostra <chr>, `Mostra estudis quantitatius` <dbl>,
## #   `Mostra estudis qualitatius` <chr>, `Error mostral` <chr>, …

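This function returns a tibble where each row corresponds to a study. Among the most relevant columns are REO (the study identifier), `Titol enquesta` and `Titol estudi` (the survey and study titles), `Dia inici treball de camp` and `Dia final treball de camp` (the fieldwork start and end dates), and microdata_available, which flags whether the microdata can be retrieved.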

Surveys conducted by the CEO itself have publicly available microdata, but surveys from other institutions of the Catalan government may not. To keep only the surveys whose microdata can be retrieved:

available <- CEOmeta() |> dplyr::filter(microdata_available)
head(available)
## # A tibble: 3 × 48
##   REO   `Titol enquesta`                   `Titol estudi` `Metodologia enquesta`
##   <fct> <chr>                              <chr>          <fct>                 
## 1 1149  Enquesta de valoració del Govern … Enquesta de v… quantitativa          
## 2 1145  Baròmetre d'Opinió Política. 3a o… Baròmetre d'O… quantitativa          
## 3 1143  Enquesta sobre postveritat i teor… Enquesta sobr… quantitativa          
## # ℹ 44 more variables: `Metode de recollida de dades` <fct>, Objectius <chr>,
## #   `Ambit territorial` <fct>, Cost <dbl>, `Promotors enquesta` <chr>,
## #   `Executors enquesta` <chr>, `Promotors estudi` <chr>,
## #   `Executors estudi` <chr>, `Data de treball de camp` <chr>,
## #   `Dia inici treball de camp` <date>, `Dia final treball de camp` <date>,
## #   Univers <chr>, Mostra <chr>, `Mostra estudis quantitatius` <dbl>,
## #   `Mostra estudis qualitatius` <chr>, `Error mostral` <chr>, …

The search argument allows users to look for keywords across several descriptive fields (such as title, summary and objectives). Search words should be in Catalan.

To inspect the metadata of a single study, pass its REO code through the reo argument:

specific_reo <- CEOmeta(reo = "1145")
head(specific_reo)
## # A tibble: 1 × 48
##   REO   `Titol enquesta`                   `Titol estudi` `Metodologia enquesta`
##   <fct> <chr>                              <chr>          <fct>                 
## 1 1145  Baròmetre d'Opinió Política. 3a o… Baròmetre d'O… quantitativa          
## # ℹ 44 more variables: `Metode de recollida de dades` <fct>, Objectius <chr>,
## #   `Ambit territorial` <fct>, Cost <dbl>, `Promotors enquesta` <chr>,
## #   `Executors enquesta` <chr>, `Promotors estudi` <chr>,
## #   `Executors estudi` <chr>, `Data de treball de camp` <chr>,
## #   `Dia inici treball de camp` <date>, `Dia final treball de camp` <date>,
## #   Univers <chr>, Mostra <chr>, `Mostra estudis quantitatius` <dbl>,
## #   `Mostra estudis qualitatius` <chr>, `Error mostral` <chr>, …

3.2. Load studies

Once you have identified the REO code of a study, you can load its microdata using CEOdata(reo = ...).

d1145 <- CEOdata(reo = "1145")
head(d1145)
## # A tibble: 6 × 11
##   PONDERA   REO SEXE    BOP_NUM       EDAT  INGRESSOS_1_15 LLOC_NAIX INTERES_POL
##     <dbl> <dbl> <fct>   <fct>         <fct> <fct>          <fct>     <fct>      
## 1       1  1145 Femení  Oct. 25 - 11… 37    De 1.801 a 2.… A Catalu… Gens       
## 2       1  1145 Masculí Oct. 25 - 11… 19    De 1.801 a 2.… A Catalu… Gens       
## 3       1  1145 Femení  Oct. 25 - 11… 37    De 2.401 a 3.… A Catalu… Poc        
## 4       1  1145 Femení  Oct. 25 - 11… 83    De 1001 a 120… A altres… Bastant    
## 5       1  1145 Masculí Oct. 25 - 11… 70    De 2.001 a 2.… A Catalu… Bastant    
## 6       1  1145 Masculí Oct. 25 - 11… 65    De 4.501 a 5.… A altres… Bastant    
## # ℹ 3 more variables: SATIS_DEMOCRACIA <fct>, EDAT_GR <fct>, EDAT_CEO <fct>

The returned object is a tibble where each row corresponds to an individual respondent and columns correspond to survey variables. If a REO has no microdata available, CEOdata() returns an informative error when asked to retrieve it.
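When looping over several REO codes, one unavailable study would otherwise stop the whole script. A minimal sketch of one way to guard against this with base R's tryCatch() follows. The helper safe_load() and the stand-in fake_loader() are hypothetical names introduced here and are not part of the package.

```r
# Hypothetical helper: wrap any loader so that a failure yields NULL
# plus a message instead of stopping the script.
safe_load <- function(loader, ...) {
  tryCatch(
    loader(...),
    error = function(e) {
      message("Could not load study: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in loader used only to keep this sketch runnable offline;
# in real use one would pass CEOdata instead.
fake_loader <- function(reo) {
  if (reo != "1145") stop("no microdata available for REO ", reo)
  data.frame(REO = 1145, EDAT = c(37, 19))
}

ok <- safe_load(fake_loader, reo = "1145")         # a data frame
not_found <- safe_load(fake_loader, reo = "0000")  # NULL, with a message
is.null(not_found)
```

In practice the call would look like safe_load(CEOdata, reo = "1145"), and NULL results can be dropped before combining the studies.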

As with accumulated series, by default the package converts SPSS-labelled variables into standard R factors. To keep the raw haven_labelled format:

d1145_raw <- CEOdata(reo = "1145", raw = TRUE)
head(d1145_raw)
## # A tibble: 6 × 11
##   PONDERA   REO SEXE       BOP_NUM  EDAT    INGRESSOS_1_15 LLOC_NAIX INTERES_POL
##     <dbl> <dbl> <dbl+lbl>  <dbl+lb> <dbl+l> <dbl+lbl>      <dbl+lbl> <dbl+lbl>  
## 1       1  1145 2 [Femení] 63 [Oct… 37 [37]  8 [De 1.801 … 1 [A Cat… 4 [Gens]   
## 2       1  1145 1 [Mascul… 63 [Oct… 19 [19]  8 [De 1.801 … 1 [A Cat… 4 [Gens]   
## 3       1  1145 2 [Femení] 63 [Oct… 37 [37] 10 [De 2.401 … 1 [A Cat… 3 [Poc]    
## 4       1  1145 2 [Femení] 63 [Oct… 83 [83]  6 [De 1001 a… 2 [A alt… 2 [Bastant]
## 5       1  1145 1 [Mascul… 63 [Oct… 70 [70]  9 [De 2.001 … 1 [A Cat… 2 [Bastant]
## 6       1  1145 1 [Mascul… 63 [Oct… 65 [65] 13 [De 4.501 … 2 [A alt… 2 [Bastant]
## # ℹ 3 more variables: SATIS_DEMOCRACIA <dbl+lbl>, EDAT_GR <dbl+lbl>,
## #   EDAT_CEO <dbl+lbl>

4. Search for keywords in the labels

Once a dataset has been downloaded using CEOdata(), the function CEOsearch() can be used to look for keywords in the variable labels or value labels. This is especially useful when working with large questionnaires and searching for specific topics.

You can search for keywords in the variable labels; for example, look for “democràcia” (democracy) in the last retrieved dataset. Keywords must be typed in Catalan.

head(CEOsearch(d1145, keyword = "democràcia"))
## # A tibble: 1 × 2
##   Variable         Label                                                        
##   <chr>            <chr>                                                        
## 1 SATIS_DEMOCRACIA 26. Està vostè molt, bastant, poc o gens satisfet/a amb el f…

Sometimes the relevant information is in the value labels rather than in the variable labels, so you can also search within response categories.

head(CEOsearch(d1145, keyword = "Catalunya", where = "values"))
## # A tibble: 6 × 2
##   Variable  Value                        
##   <fct>     <fct>                        
## 1 LLOC_NAIX A Catalunya                  
## 2 LLOC_NAIX A altres comunitats autònomes
## 3 LLOC_NAIX Unió Europea                 
## 4 LLOC_NAIX Resta del món                
## 5 LLOC_NAIX Fora d'Espanya               
## 6 LLOC_NAIX No ho sap

5. Working with labelled data (raw vs factors)

CEO microdata are originally distributed as SPSS (.sav) files. These files store categorical variables using value labels (e.g. 1 = Yes, 2 = No) rather than plain R factors.

By default, CEOdata() converts SPSS-labelled variables into standard R factors. This makes the dataset immediately convenient for descriptive statistics, modelling, and plotting in R, as most workflows expect factors rather than labelled vectors. If you prefer the labelled structure, for example to retain the exact numeric codings, you can set the argument raw = TRUE when retrieving any dataset.
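The difference between the two representations can be illustrated with the haven package directly. The sketch below builds a small labelled vector by hand, using the 1 = Yes, 2 = No coding mentioned above, and shows its factor and numeric views. It illustrates the data structures only and makes no claim about which calls CEOdata() uses internally.

```r
library(haven)

# A hand-built labelled vector mimicking an SPSS variable.
# Codes, value labels and the variable label are illustrative.
x <- labelled(
  c(1, 2, 2, 1),
  labels = c("Yes" = 1, "No" = 2),
  label = "Example question"
)

# Factor view: categories become factor levels,
# similar in spirit to the default (raw = FALSE) output.
x_factor <- as_factor(x)
levels(x_factor)

# Numeric view: the underlying codes with the labels stripped,
# which is what the raw format lets you recover.
x_codes <- as.numeric(zap_labels(x))
x_codes
```

Keeping the raw format is mainly useful when the numeric codes carry meaning, for instance on ordered scales that will be treated as numbers.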

6. Notes on reproducibility and data updates

In online use, CEOdata retrieves datasets directly from the official open data platform of the Generalitat de Catalunya. In this vignette, all examples are run offline with fixed local files. Online retrieval has implications for reproducibility, because the files published on the platform can change over time.

As a consequence, repeated calls to CEOdata() at different points in time may return slightly different datasets.

6.1. Ensuring reproducibility

To enhance reproducibility in applied research, it is recommended to record the version of the package used in the analysis:

packageVersion("CEOdata")
## [1] '1.4.0'
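Beyond recording the package version, one option is to keep a local snapshot of the retrieved data together with the retrieval date, so that later analyses do not depend on what the platform serves at that moment. The sketch below shows the idea with base R only. The toy data frame stands in for a dataset returned by CEOdata(), and the field names in the snapshot list are arbitrary choices made for this example.

```r
# Toy stand-in for a dataset returned by CEOdata().
d_toy <- data.frame(REO = c(1145, 1145), EDAT = c(37, 19))

# Store the data together with retrieval metadata.
snapshot <- list(
  data = d_toy,
  retrieved_on = Sys.Date(),
  r_version = R.version.string
)

# tempfile() keeps this sketch free of side effects;
# a real project would use a stable path inside the project folder.
path <- tempfile(fileext = ".rds")
saveRDS(snapshot, path)

# Later sessions read the frozen copy instead of downloading again.
restored <- readRDS(path)
identical(restored$data, d_toy)
```

In real use the list could also hold as.character(packageVersion("CEOdata")), so that the data and the package version travel in the same file.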

CEOdata aims to provide convenient and transparent access to official survey data, but reproducible research practices remain the responsibility of the analyst.