The DataGen_rare_group function generates synthetic data for rare group analysis, simulating structured datasets for testing and validating algorithms. This vignette demonstrates how to use DataGen_rare_group with example inputs.
Ensure the MUGS package is loaded before running the example:
library(MUGS)
Run the DataGen_rare_group function to generate the synthetic dataset:
# Generate data
seed = 1
p = 5
n1 = 100
n2 = 100
n.common = 50
n.group = 30
sigma.eps.1 = 1
sigma.eps.2 = 3
ratio.delta = 0.05
network.k = 5
rho.beta = 0.5
rho.U0 = 0.4
rho.delta = 0.7
sigma.rare = 10
n.rare = 20
group.size = 5
DataGen.out <- DataGen_rare_group(
  seed, p, n1, n2, n.common, n.group,
  sigma.eps.1, sigma.eps.2, ratio.delta, network.k,
  rho.beta, rho.U0, rho.delta, sigma.rare, n.rare, group.size
)
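With sixteen positional arguments, it is easy to pass a value in the wrong slot. One possible way to keep the settings readable is to store them in a named list and pass them with do.call(). The sketch below assumes only the argument order shown in the call above. The names in the list are labels for the reader and are dropped by unname(), so nothing is assumed about the function's formal argument names. The object name DataGen.alt is illustrative, and the call is guarded so the block still runs when MUGS is not installed.

```r
# Settings from the example above, kept in the same order as the call.
params <- list(
  seed = 1, p = 5, n1 = 100, n2 = 100, n.common = 50, n.group = 30,
  sigma.eps.1 = 1, sigma.eps.2 = 3, ratio.delta = 0.05, network.k = 5,
  rho.beta = 0.5, rho.U0 = 0.4, rho.delta = 0.7, sigma.rare = 10,
  n.rare = 20, group.size = 5
)

# Guard: only call the generator when the MUGS package is available.
if (requireNamespace("MUGS", quietly = TRUE)) {
  # unname() makes the call purely positional, matching the original example.
  DataGen.alt <- do.call(MUGS::DataGen_rare_group, unname(params))
}
```

Changing a single entry, for example params$n.rare, and repeating the do.call() line is then enough to generate a new scenario.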
Explore the structure and key components of the generated dataset:
# View structure of the output
str(DataGen.out)
#> List of 12
#> $ delta1 : num [1:100, 1:5] 0 0 0 0 0 0 0 0 0 0 ...
#> $ delta2 : num [1:100, 1:5] 0 0 0 -0.697 0 ...
#> $ u.1 : num [1:100, 1:5] 0.206 1.437 0.28 0.71 -0.543 ...
#> $ u.2 : num [1:100, 1:5] 0.468 1.595 -0.152 -1.827 -0.165 ...
#> $ S.1 : num [1:100, 1:100] 1.749 -0.43 4.116 0.382 -0.407 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:100] "1" "2" "3" "4" ...
#> .. ..$ : chr [1:100] "1" "2" "3" "4" ...
#> $ S.2 : num [1:100, 1:100] 2.539 2.442 9.46 -0.169 -3.279 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:100] "51" "52" "53" "54" ...
#> .. ..$ : chr [1:100] "51" "52" "53" "54" ...
#> $ S.1.0 : num [1:100, 1:100] 2.019 0.0913 2.4329 1.0762 -0.636 ...
#> $ S.2.0 : num [1:100, 1:100] 2.471 0.644 0.321 0.353 -0.615 ...
#> $ X.group.source:'data.frame': 100 obs. of 30 variables:
#> ..$ .data_1 : int [1:100] 1 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_2 : int [1:100] 0 1 0 0 0 0 0 0 0 0 ...
#> ..$ .data_3 : int [1:100] 0 0 1 0 0 0 0 0 0 0 ...
#> ..$ .data_4 : int [1:100] 0 0 0 1 0 0 0 0 0 0 ...
#> ..$ .data_5 : int [1:100] 0 0 0 0 1 0 0 0 0 0 ...
#> ..$ .data_6 : int [1:100] 0 0 0 0 0 1 0 0 0 0 ...
#> ..$ .data_7 : int [1:100] 0 0 0 0 0 0 1 0 0 0 ...
#> ..$ .data_8 : int [1:100] 0 0 0 0 0 0 0 1 0 0 ...
#> ..$ .data_9 : int [1:100] 0 0 0 0 0 0 0 0 1 0 ...
#> ..$ .data_10: int [1:100] 0 0 0 0 0 0 0 0 0 1 ...
#> ..$ .data_11: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_12: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_13: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_14: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_15: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_16: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_17: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_18: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_19: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_20: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_21: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_22: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_23: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_24: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_25: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_26: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_27: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_28: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_29: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_30: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> $ X.group.target:'data.frame': 100 obs. of 30 variables:
#> ..$ .data_1 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_2 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_3 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_4 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_5 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_6 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_7 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_8 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_9 : int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_10: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_11: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_12: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_13: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_14: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_15: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_16: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_17: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_18: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_19: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_20: int [1:100] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_21: int [1:100] 1 0 0 0 0 0 0 0 0 0 ...
#> ..$ .data_22: int [1:100] 0 1 0 0 0 0 0 0 0 0 ...
#> ..$ .data_23: int [1:100] 0 0 1 0 0 0 0 0 0 0 ...
#> ..$ .data_24: int [1:100] 0 0 0 1 0 0 0 0 0 0 ...
#> ..$ .data_25: int [1:100] 0 0 0 0 1 0 0 0 0 0 ...
#> ..$ .data_26: int [1:100] 0 0 0 0 0 1 0 0 0 0 ...
#> ..$ .data_27: int [1:100] 0 0 0 0 0 0 1 0 0 0 ...
#> ..$ .data_28: int [1:100] 0 0 0 0 0 0 0 1 0 0 ...
#> ..$ .data_29: int [1:100] 0 0 0 0 0 0 0 0 1 0 ...
#> ..$ .data_30: int [1:100] 0 0 0 0 0 0 0 0 0 1 ...
#> $ pairs.rel.CV :'data.frame': 305 obs. of 3 variables:
#> ..$ row : chr [1:305] "63" "53" "111" "47" ...
#> ..$ col : chr [1:305] "91" "141" "143" "137" ...
#> ..$ type: chr [1:305] "related" "related" "related" "related" ...
#> $ pairs.rel.EV :'data.frame': 305 obs. of 3 variables:
#> ..$ row : chr [1:305] "122" "3" "23" "43" ...
#> ..$ col : chr [1:305] "123" "121" "141" "133" ...
#> ..$ type: chr [1:305] "related" "related" "related" "related" ...
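The str() output suggests how the component sizes relate to the inputs: u.1 is 100 x 5 (n1 by p), S.1 is 100 x 100 (n1 by n1), and X.group.source has 100 rows and 30 columns (n1 by n.group). These relationships are read off the printout rather than taken from the package documentation, so the following is only a minimal sketch of a sanity check. The helper names has_dim and check_shapes are hypothetical, and the toy list stands in for a real DataGen.out object.

```r
# Hypothetical helper: TRUE when x has exactly the dimensions in d.
has_dim <- function(x, d) {
  length(dim(x)) == length(d) && all(dim(x) == d)
}

# Hypothetical helper: compare component dimensions with the inputs.
check_shapes <- function(out, p, n1, n2, n.group) {
  c(
    u.1 = has_dim(out$u.1, c(n1, p)),
    u.2 = has_dim(out$u.2, c(n2, p)),
    S.1 = has_dim(out$S.1, c(n1, n1)),
    S.2 = has_dim(out$S.2, c(n2, n2)),
    X.group.source = has_dim(out$X.group.source, c(n1, n.group)),
    X.group.target = has_dim(out$X.group.target, c(n2, n.group))
  )
}

# Toy stand-in with the same layout as the printed structure.
toy <- list(
  u.1 = matrix(0, 4, 2), u.2 = matrix(0, 3, 2),
  S.1 = matrix(0, 4, 4), S.2 = matrix(0, 3, 3),
  X.group.source = as.data.frame(matrix(0L, 4, 2)),
  X.group.target = as.data.frame(matrix(0L, 3, 2))
)

res <- check_shapes(toy, p = 2, n1 = 4, n2 = 3, n.group = 2)
print(res)
```

On the real output, the equivalent call would be check_shapes(DataGen.out, p, n1, n2, n.group).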
# Print the first few rows and columns of the S.1 matrix
cat("\nFirst 5 rows and columns of S.1:\n")
#>
#> First 5 rows and columns of S.1:
print(DataGen.out$S.1[1:5, 1:5])
#> 1 2 3 4 5
#> 1 1.7487128 -0.4304881 4.116069 0.3823598 -0.4065799
#> 2 -0.4304881 12.9561681 6.394615 3.3992487 -4.3596035
#> 3 4.1160688 6.3946152 6.740570 1.4372787 -3.8007060
#> 4 0.3823598 3.3992487 1.437279 6.8734021 -5.1412995
#> 5 -0.4065799 -4.3596035 -3.800706 -5.1412995 7.3565161
# Print the first few rows and columns of the S.2 matrix
cat("\nFirst 5 rows and columns of S.2:\n")
#>
#> First 5 rows and columns of S.2:
print(DataGen.out$S.2[1:5, 1:5])
#> 51 52 53 54 55
#> 51 2.539193 2.4424766 9.459866 -0.168739 -3.2789684
#> 52 2.442477 6.0322212 -3.807697 -4.968738 -0.4424126
#> 53 9.459866 -3.8076972 3.373226 -3.002737 -11.4961204
#> 54 -0.168739 -4.9687376 -3.002737 1.957099 -1.7665817
#> 55 -3.278968 -0.4424126 -11.496120 -1.766582 1.2357447
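The two printouts carry different labels: S.1 starts at "1" and S.2 at "51". If the row names are entity identifiers shared between the two matrices (an assumption suggested by the labels, not confirmed by the output), the overlapping entities can be located with intersect(). The sketch below uses small toy matrices in place of S.1 and S.2, and the names S1, S2, and shared are illustrative.

```r
# Toy stand-ins for S.1 and S.2 with partially overlapping labels.
ids1 <- as.character(1:4)
ids2 <- as.character(3:6)
S1 <- matrix(seq_len(16), 4, 4, dimnames = list(ids1, ids1))
S2 <- matrix(seq_len(16), 4, 4, dimnames = list(ids2, ids2))

# Labels present in both matrices.
shared <- intersect(rownames(S1), rownames(S2))

# Restrict each matrix to the shared entities, keeping matrix structure.
S1.shared <- S1[shared, shared, drop = FALSE]
S2.shared <- S2[shared, shared, drop = FALSE]

print(shared)
print(dim(S1.shared))
```

With the generated data, S1 and S2 would be replaced by DataGen.out$S.1 and DataGen.out$S.2.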
Vary p, n1, n2, n.group, and the other inputs to test different scenarios.
The seed parameter ensures reproducibility of results.
This vignette demonstrated how to use the DataGen_rare_group function to simulate structured data for rare group analysis. Adjust input parameters to suit specific use cases or experimental setups. For further details, refer to the package documentation.