Title: | X-Engineering or Supporting Functions |
---|---|
Description: | Miscellaneous functions used for x-engineering (feature engineering) or for supporting in other packages maintained by 'Shichen Xie'. |
Authors: | Shichen Xie [aut, cre] |
Maintainer: | Shichen Xie <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.6 |
Built: | 2025-02-12 13:05:08 UTC |
Source: | https://github.com/shichenxie/xefun |
Converts a vector to a list with the specified names.
as.list2(x, name = TRUE, ...)
x | a vector. |
name | the names of the list. Defaults to TRUE, which uses the elements of x as names. |
... | additional arguments passed to the as.list function. |
as.list2(c('a', 'b'))
as.list2(c('a', 'b'), name = FALSE)
as.list2(c('a', 'b'), name = c('c', 'd'))
ceiling2 takes the ceiling of numeric values at a given number of significant digits. floor2 takes the floor of numeric values at a given number of significant digits.
ceiling2(x, digits = 1)
floor2(x, digits = 1)
x | a numeric vector. |
digits | integer indicating the number of significant digits. |
ceiling2 rounds each element of x up to the specified number of significant digits, returning the smallest such number that is not less than the element.
floor2 rounds each element of x down to the specified number of significant digits, returning the largest such number that is not greater than the element.
x = c(12345, 54.321)
ceiling2(x)
ceiling2(x, 2)
ceiling2(x, 3)
floor2(x)
floor2(x, 2)
floor2(x, 3)
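The rounding rule described for ceiling2 and floor2 can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: `sig_unit`, `ceiling_sig` and `floor_sig` are hypothetical helper names, and the sketch assumes finite, non-zero input (zero would need special handling because `log10(0)` is `-Inf`).

```r
# Hypothetical helpers, not part of xefun: round up or down to a given
# number of significant digits using only base R.
sig_unit <- function(x, digits = 1) {
  # Size of the last kept significant digit for each element,
  # e.g. 12345 with digits = 2 gives 1000.
  10^(floor(log10(abs(x))) - digits + 1)
}

ceiling_sig <- function(x, digits = 1) {
  unit <- sig_unit(x, digits)
  ceiling(x / unit) * unit
}

floor_sig <- function(x, digits = 1) {
  unit <- sig_unit(x, digits)
  floor(x / unit) * unit
}

x <- c(12345, 54.321)
ceiling_sig(x)     # 20000, 60
ceiling_sig(x, 2)  # 13000, 55
floor_sig(x)       # 10000, 50
floor_sig(x, 2)    # 12000, 54
```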
Returns the names of the columns in a data frame that hold a constant value.
cols_const(dt)
dt | a data frame. |
dt = data.frame(a = sample(0:9, 6), b = sample(letters, 6), c = rep(1, 6), d = rep('a', 6))
dt
cols_const(dt)
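For readers who want to see the idea without the package, the check can be sketched in base R. `constant_cols` is a hypothetical name, not the xefun implementation, and it assumes that a column counts as constant when it has at most one distinct value (with NA treated as a value of its own).

```r
# Hypothetical helper, not the xefun implementation.
constant_cols <- function(dt) {
  # A column is constant when it holds at most one distinct value.
  is_const <- vapply(dt, function(col) length(unique(col)) <= 1, logical(1))
  names(dt)[is_const]
}

dt <- data.frame(a = 1:6, b = letters[1:6], c = rep(1, 6), d = rep("a", 6))
constant_cols(dt)  # "c" "d"
```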
Returns the names of the columns in a data frame that match the given data types.
cols_type(dt, type)
dt | a data frame. |
type | a character vector of data types. Available values include character, numeric, double, integer, logical, factor and datetime. |
dt = data.frame(a = sample(0:9, 6), b = sample(letters, 6), c = Sys.Date()-1:6, d = Sys.time() - 1:6)
dt
# numeric columns
cols_type(dt, 'numeric')
# or
cols_type(dt, 'n')
# numeric and character columns
cols_type(dt, c('character', 'numeric'))
# or
cols_type(dt, c('c', 'n'))
# date time columns
cols_type(dt, 'datetime')
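Selecting columns by type can also be sketched with base R predicates. The helper `cols_by` below is hypothetical and is not how cols_type is implemented; in particular, treating both Date and POSIXt columns as date-time columns is an assumption of this sketch.

```r
# Hypothetical helper, not the xefun implementation: keep the names of
# the columns for which a predicate function returns TRUE.
cols_by <- function(dt, predicate) {
  names(dt)[vapply(dt, predicate, logical(1))]
}

dt <- data.frame(
  a = 1:3,
  b = c("x", "y", "z"),
  c = as.Date("2024-01-01") + 0:2,
  stringsAsFactors = FALSE
)

cols_by(dt, is.numeric)    # "a"  (is.numeric() is FALSE for Date columns)
cols_by(dt, is.character)  # "b"
# Assumption: Date and POSIXt columns both count as date-time columns.
cols_by(dt, function(col) inherits(col, c("Date", "POSIXt")))  # "c"
```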
Counts the number of consecutive identical values.
conticnt(x, cnt = FALSE, ...)
x | a vector or data frame. |
cnt | logical, whether to count the number of rows in each group of consecutive identical values. |
... | ignored |
An integer vector indicating the number of consecutive identical elements in x.
# example I
x1 = c(0,0,0, 1,1,1)
conticnt(x1)
conticnt(x1, cnt=TRUE)
x2 = c(1, 2,2, 3,3,3)
conticnt(x2)
conticnt(x2, cnt=TRUE)
x3 = c('c','c','c', 'b','b', 'a')
conticnt(x3)
conticnt(x3, cnt=TRUE)
# example II
dt = data.frame(c1=x1, c2=x2, c3=x3)
conticnt(dt, col=c('c1', 'c2'))
conticnt(dt, col=c('c1', 'c2'), cnt = TRUE)
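Runs of consecutive identical values can be explored in base R with `rle()`. The sketch below only illustrates two quantities that can be derived from a run-length encoding; it makes no claim about which of them conticnt returns for a given `cnt` setting, and the names `run_position` and `run_size` are hypothetical.

```r
# Hypothetical helpers built on base R's run-length encoding.
run_position <- function(x) {
  # Position of each element within its run: 1, 2, 3, then restart.
  sequence(rle(x)$lengths)
}

run_size <- function(x) {
  # Length of the run that each element belongs to.
  lengths <- rle(x)$lengths
  rep(lengths, lengths)
}

x2 <- c(1, 2, 2, 3, 3, 3)
run_position(x2)  # 1 1 2 1 2 3
run_size(x2)      # 1 2 2 3 3 3
```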
Returns the date at the beginning of period (bop) or at the end of period (eop).
date_bop(freq, x, workday = FALSE)
date_eop(freq, x, workday = FALSE)
freq | the frequency of the period. It supports weekly, monthly, quarterly and yearly. |
x | a date. |
workday | logical, whether to return the latest workday. |
date_bop returns the first date of the period that contains x, for the given frequency.
date_eop returns the last date of the period that contains x, for the given frequency.
date_bop('weekly', Sys.Date())
date_eop('weekly', Sys.Date())
date_bop('monthly', Sys.Date())
date_eop('monthly', Sys.Date())
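As an illustration of what beginning and end of period mean for the monthly case, here is a minimal base R sketch. `month_bop` and `month_eop` are hypothetical names, the sketch handles a single date only, and it ignores the workday option entirely; it is not the package's implementation.

```r
# Hypothetical helpers for the monthly case only, using base R.
month_bop <- function(x) {
  # Replace the day component with 01.
  as.Date(format(as.Date(x), "%Y-%m-01"))
}

month_eop <- function(x) {
  first <- month_bop(x)
  # Step to the first day of the next month, then go back one day.
  seq(first, by = "month", length.out = 2)[2] - 1
}

month_bop("2024-02-10")  # "2024-02-01"
month_eop("2024-02-10")  # "2024-02-29"
month_eop("2023-12-15")  # "2023-12-31"
```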
Returns the date that precedes a specified date by date_range.
date_from(date_range, to = Sys.Date(), default_from = "1000-01-01")
date_range | date range. Available values include nd, nm, mtd, qtd, ytd, ny and max. |
to | a date, default is the current system date. |
default_from | the default date when date_range is set to max. |
It returns the start date of a date_range with a specified end date.
date_from(3)
date_from('3d')
date_from('3m')
date_from('3q')
date_from('3y')
date_from('mtd')
date_from('qtd')
date_from('ytd')
Returns the latest workday that falls n days before a specified date.
date_lwd(n, to = Sys.Date())
n | number of days. |
to | a date, default is the current system date. |
It returns the latest workday date that is n days before a specified date.
date_lwd(5)
date_lwd(3, "2016-01-01")
date_lwd(3, "20160101")
Converts a date to a numeric value in the specified unit.
date_num(x, unit = "s", origin = "1970-01-01", scientific = FALSE)
x | a date. |
unit | time unit. Available values include milliseconds, seconds, minutes, hours, days and weeks. |
origin | the origin date, defaults to 1970-01-01. |
scientific | logical, whether to encode the number in scientific format, defaults to FALSE. |
# setting unit
date_num(Sys.time(), unit='milliseconds')
date_num(Sys.time(), unit='mil')
date_num(Sys.time(), unit='seconds')
date_num(Sys.time(), unit='s')
date_num(Sys.time(), unit='days')
date_num(Sys.time(), unit='d')
# setting origin
date_num(Sys.time(), unit='d', origin = '1970-01-01')
date_num(Sys.time(), unit='d', origin = '2022-01-01')
# setting scientific format
date_num(Sys.time(), unit='mil', scientific = FALSE)
date_num(Sys.time(), unit='mil', scientific = TRUE)
date_num(Sys.time(), unit='mil', scientific = NULL)
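The conversion itself can be sketched with base R's `difftime()`. The helper `to_num` below is hypothetical and not the package's implementation; it assumes the origin is interpreted in UTC and uses the unit names that `difftime()` accepts rather than the abbreviations shown above.

```r
# Hypothetical helper, not the xefun implementation.
to_num <- function(x, units = "secs", origin = "1970-01-01") {
  # Assumption: the origin is interpreted as midnight UTC.
  start <- as.POSIXct(origin, tz = "UTC")
  as.numeric(difftime(x, start, units = units))
}

x <- as.POSIXct("1970-01-02 12:00:00", tz = "UTC")
to_num(x)                                         # 129600
to_num(x, units = "days")                         # 1.5
to_num(x, units = "days", origin = "1970-01-02")  # 0.5

# Milliseconds, with and without scientific notation.
format(to_num(x) * 1000, scientific = FALSE)  # "129600000"
format(to_num(x) * 1000, scientific = TRUE)   # "1.296e+08"
```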
Returns the (regular or parallel) maxima and minima of the input values. When the numeric input consists only of NA values, it returns NA instead of Inf or -Inf.
max2(..., na.rm = FALSE)
min2(..., na.rm = FALSE)
... | numeric or character arguments. |
na.rm | a logical indicating whether missing values should be removed. |
max2(c(NA), na.rm=TRUE)
max(c(NA), na.rm=TRUE)
min2(c(NA), na.rm=TRUE)
min(c(NA), na.rm=TRUE)
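The difference from base R shows up when nothing is left after removing NA values: `max(c(NA), na.rm = TRUE)` returns `-Inf` with a warning, and `min(c(NA), na.rm = TRUE)` returns `Inf`. A minimal sketch of the NA-returning behaviour follows. `max_or_na` and `min_or_na` are hypothetical names, and the sketch covers only the regular (not the parallel) case; it is not the package's implementation.

```r
# Hypothetical helpers, not the xefun implementation.
max_or_na <- function(..., na.rm = FALSE) {
  x <- c(...)
  if (na.rm) x <- x[!is.na(x)]
  # Nothing left to compare: return NA rather than -Inf.
  if (length(x) == 0) return(NA)
  max(x)
}

min_or_na <- function(..., na.rm = FALSE) {
  x <- c(...)
  if (na.rm) x <- x[!is.na(x)]
  # Nothing left to compare: return NA rather than Inf.
  if (length(x) == 0) return(NA)
  min(x)
}

max_or_na(NA, na.rm = TRUE)        # NA
max_or_na(3, NA, 7, na.rm = TRUE)  # 7
min_or_na(3, NA, 7, na.rm = TRUE)  # 3
```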
Merges a list of data.frames by common columns or row names.
merge2(datlst, by = NULL, all = TRUE, ...)
datlst | a list of data.frames. |
by | A vector of shared column names in x and y to merge on. This defaults to the shared key columns between the two tables. If y has no key columns, this defaults to the key of x. |
all | logical; all = TRUE is shorthand to save setting both all.x = TRUE and all.y = TRUE. |
... | additional arguments passed to the merge function. |
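Merging a whole list of data frames can be sketched in base R by folding `merge()` over the list with `Reduce()`. This is an illustration of the idea rather than the package's implementation; the helper name `merge_all` and the sample data are hypothetical.

```r
# Hypothetical helper, not the xefun implementation: fold base merge()
# over a list of data frames.
merge_all <- function(datlst, by, all = TRUE) {
  Reduce(function(x, y) merge(x, y, by = by, all = all), datlst)
}

d1 <- data.frame(id = 1:3, a = c(10, 20, 30))
d2 <- data.frame(id = 2:4, b = c("x", "y", "z"))
d3 <- data.frame(id = c(1L, 4L), c = c(TRUE, FALSE))

merged <- merge_all(list(d1, d2, d3), by = "id")
merged
# With all = TRUE every id from 1 to 4 is kept, and values that are
# missing from a given input become NA.
```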
reprate estimates the maximum rate of character repetition.
reprate(x, col)
x | a character vector or a data frame. |
col | a character column name. |
A numeric vector giving the maximum rate of character repetition for each element of x.
x = c('a', 'aa', 'ab', 'aab', 'aaab')
reprate(x)
reprate(data.frame(x=x), 'x')
Splits vector x into chunks of equal size n.
split2(x, n)
x | a vector. |
n | a number, the size of each chunk. |
x = 1:9
split2(x, 3)
split2(x, 6)
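Chunking by size can be sketched with base R's `split()`. The helper `chunk` is hypothetical and not the package's implementation; it assumes that n is the chunk size (not the number of chunks) and that the last chunk is shorter when the length of x is not a multiple of n.

```r
# Hypothetical helper, not the xefun implementation.
chunk <- function(x, n) {
  # Elements 1..n get group 1, n+1..2n get group 2, and so on.
  groups <- ceiling(seq_along(x) / n)
  unname(split(x, groups))
}

x <- 1:9
chunk(x, 3)  # three chunks: 1:3, 4:6, 7:9
chunk(x, 6)  # two chunks: 1:6 and 7:9
```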