Package: dataPreparation 1.1.1

dataPreparation: Automated Data Preparation

Does most of the painful data preparation for a data science project with a minimal amount of code. It takes advantage of 'data.table' efficiency and uses some algorithmic tricks to perform data preparation in a time- and RAM-efficient way.

Authors: Emmanuel-Lin Toulemonde [aut, cre]

dataPreparation_1.1.1.tar.gz
dataPreparation_1.1.1.zip (r-4.5) | dataPreparation_1.1.1.zip (r-4.4) | dataPreparation_1.1.1.zip (r-4.3)
dataPreparation_1.1.1.tgz (r-4.4-any) | dataPreparation_1.1.1.tgz (r-4.3-any)
dataPreparation_1.1.1.tar.gz (r-4.5-noble) | dataPreparation_1.1.1.tar.gz (r-4.4-noble)
dataPreparation_1.1.1.tgz (r-4.4-emscripten) | dataPreparation_1.1.1.tgz (r-4.3-emscripten)
dataPreparation.pdf | dataPreparation.html
dataPreparation/json (API)
NEWS

# Install 'dataPreparation' in R:
install.packages('dataPreparation', repos = c('https://eltoulemonde.r-universe.dev', 'https://cloud.r-project.org'))
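
# Minimal usage sketch (not taken from the package documentation):
Once the package is installed, a quick way to check that it loads is to call one of the exported functions on a small table. The sketch below uses `which_are_constant`, which appears in the export list and is described as "Identify constant columns". The toy table and its column names are hypothetical, and the example assumes the function accepts a data.table as its first argument and returns one entry per constant column.

```r
# Assumes 'dataPreparation' and 'data.table' are already installed.
library(data.table)
library(dataPreparation)

# Hypothetical toy data: column 'b' holds a single repeated value.
toy <- data.table(
  a = 1:5,
  b = rep("x", 5),
  c = c(2.5, 3.1, 4.8, 1.0, 0.3)
)

# Ask the package which columns are constant; 'b' is the only candidate here.
constant_cols <- which_are_constant(toy)
print(constant_cols)
```

The same pattern should apply to the other `which_are_*` helpers in the export list, though their exact arguments are best checked in the help manual.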

Bug tracker: https://github.com/eltoulemonde/datapreparation/issues

Topics: data-preparation, data-preprocessing, data-science, date-conversion, speed, variable-elimination, variable-selection

44 exports 31 stars 2.57 score 21 dependencies 84 scripts 1.2k downloads

Last updated 1 year ago, from: 54d7ff6e98. Checks: OK: 3, NOTE: 4. Indexed: yes.

Target | Result | Date
Doc / Vignettes | OK | Aug 27 2024
R-4.5-win | NOTE | Aug 27 2024
R-4.5-linux | NOTE | Aug 27 2024
R-4.4-win | NOTE | Aug 27 2024
R-4.4-mac | NOTE | Aug 27 2024
R-4.3-win | OK | Aug 27 2024
R-4.3-mac | OK | Aug 27 2024

Exports: aggregate_by_key, as.POSIXct_fast, build_bins, build_date_factor, build_encoding, build_scales, build_target_encoding, compute_probability_ratio, compute_weight_of_evidence, data_preparation_news, date_format_unifier, description, fast_discretization, fast_filter_variables, fast_handle_na, fast_is_equal, fast_round, fast_scale, find_and_transform_dates, find_and_transform_numerics, generate_date_diffs, generate_factor_from_date, generate_from_character, generate_from_factor, get_most_frequent_element, identify_dates, one_hot_encoder, prepare_set, remove_percentile_outlier, remove_rare_categorical, remove_sd_outlier, same_shape, set_as_numeric_matrix, set_col_as_character, set_col_as_date, set_col_as_factor, set_col_as_numeric, shape_set, target_encode, un_factor, which_are_bijection, which_are_constant, which_are_in_double, which_are_included

Dependencies: cli, cpp11, crayon, data.table, generics, glue, hms, lattice, lifecycle, lubridate, magrittr, Matrix, pkgconfig, prettyunits, progress, R6, rlang, stringi, stringr, timechange, vctrs

Readme and manuals

Help Manual

Help page | Topics
Adult for UCI repository | adult
Automatic data_set aggregation by key | aggregate_by_key
Faster date transformation | as.POSIXct_fast
Compute bins | build_bins
Date Factor | build_date_factor
Compute encoding | build_encoding
Compute scales | build_scales
Build target encoding | build_target_encoding
Compute probability ratio | compute_probability_ratio
Compute weight of evidence | compute_weight_of_evidence
Show the NEWS file | data_preparation_news
Unify dates format | date_format_unifier
Describe data set | description
Discretization | fast_discretization
Filtering useless variables | fast_filter_variables
Handle NA values | fast_handle_na
Fast checks of equality | fast_is_equal
Fast round | fast_round
scale | fast_scale
Identify date columns | find_and_transform_dates
Identify numeric columns in a data_set set | find_and_transform_numerics
Date difference | generate_date_diffs
Generate factor from dates | generate_factor_from_date
Recode character | generate_from_character
Recode factor | generate_from_factor
Get most frequent element | get_most_frequent_element
Identify date columns | identify_dates
Adult with some ugly columns added | messy_adult
One hot encoder | one_hot_encoder
Preparation pipeline | prepare_set
Percentile outlier filtering | remove_percentile_outlier
Filter rare categories | remove_rare_categorical
Standard deviation outlier filtering | remove_sd_outlier
Give same shape | same_shape
Numeric matrix preparation for Machine Learning. | set_as_numeric_matrix
Set columns as character | set_col_as_character
Set columns as POSIXct | set_col_as_date
Set columns as factor | set_col_as_factor
Set columns as numeric | set_col_as_numeric
Final preparation before ML algorithm | shape_set
Target encode | target_encode
First 500 rows of 'messy_adult' | tiny_messy_adult
Unfactor factor with too many values | un_factor
Identify bijections | which_are_bijection
Identify constant columns | which_are_constant
Identify double columns | which_are_in_double
Identify columns that are included in others | which_are_included