Data Transformation with data.table :: CHEAT SHEET

CC BY SA Erik Petrovski. Learn more with the data.table homepage or vignette. Updated: 2019-01

Basics

data.table is an extremely fast and memory-efficient package for transforming data in R. It works by converting R's native data frame objects into data.tables with new and enhanced functionality. The basics of working with data.tables are:

dt[i, j, by]

Take data.table dt, subset rows using i, and manipulate columns with j, grouped according to by.

data.tables are also data frames, so functions that work with data frames also work with data.tables.

Create a data.table

data.table(a = c(1, 2), b = c("a", "b")) - create a data.table from scratch. Analogous to data.frame().
setDT(df)* or as.data.table(df) - convert a data frame or a list to a data.table.

Subset rows using i

dt[1:2, ] - subset rows based on row numbers.
dt[a > 5, ] - subset rows based on values in one or more columns.

LOGICAL OPERATORS TO USE IN i
<    <=    is.na()     %in%    |    %like%
>    >=    !is.na()    !       &    %between%

Manipulate columns with j

EXTRACT
dt[, c(2)] - extract columns by number. Prefix column numbers with "-" to drop.
dt[, .(b, c)] - extract columns by name.

SUMMARIZE
dt[, .(x = sum(a))] - create a data.table with new columns based on the summarized values of rows. Summary functions like mean(), median(), min(), max(), etc. can be used to summarize rows.
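The dt[i, j, by] form above can be tried on a small table. The following is a minimal sketch, assuming the data.table package is installed. The toy values and the result names (first_two, large_a, in_range, by_name, total) are made up for illustration; only the column names a, b, c follow the cheat sheet.

```r
library(data.table)

# Hypothetical toy data; only the column names a, b, c come from the cheat sheet.
dt <- data.table(a = c(1, 6, 8, 3),
                 b = c("x", "y", "x", "y"),
                 c = c(10, 20, 30, 40))

# Subset rows using i
first_two <- dt[1:2, ]                                # by row number
large_a   <- dt[a > 5, ]                              # by a condition on a column
in_range  <- dt[a %between% c(2, 7) & b %in% "y", ]   # operators can be combined

# Manipulate columns with j
by_name <- dt[, .(b, c)]                              # extract columns by name
total   <- dt[, .(x = sum(a))]                        # summarize into a one-row data.table
```

Every result here is itself a data.table, so the same bracket syntax can be applied to it again.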

COMPUTE COLUMNS*
dt[, c := 1 + 2] - compute a column based on an expression.
dt[a == 1, c := 1 + 2] - compute a column based on an expression but only for a subset of rows.
dt[, `:=`(c = 1, d = 2)] - compute multiple columns based on separate expressions.

DELETE COLUMN
dt[, c := NULL] - delete a column.

CONVERT COLUMN TYPE
dt[, b := as.integer(b)] - convert the type of a column using as.integer(), as.numeric(), as.character(), as.Date(), etc.

Group according to by

dt[, j, by = .(a)] - group rows by values in specified columns.
dt[, j, keyby = .(a)] - group and simultaneously sort rows by values in specified columns.

GROUPED OPERATIONS
dt[, .(c = sum(b)), by = a] - summarize rows within groups.
dt[, c := sum(b), by = a] - create a new column and compute rows within groups.
dt[, .SD[1], by = a] - extract first row of groups.
dt[, .SD[.N], by = a] - extract last row of groups.

Chaining

dt[...][...] - perform a sequence of data.table operations by chaining multiple "[]".

Functions for data.tables

REORDER
setorder(dt, a, -b) - reorder a data.table according to specified columns. Prefix column names with "-" for descending order.

* SET FUNCTIONS AND :=
data.table's functions prefixed with "set" and the operator ":=" work without "<-" to alter data without making copies in memory. E.g., the more efficient setDT(df) is analogous to df <- as.data.table(df).
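The by-reference behaviour of ":=" and the "set" functions is easiest to see in a short run. This is a minimal sketch, assuming the data.table package is installed. The group labels and the column names total, flag, half and dbl are hypothetical and do not come from the cheat sheet.

```r
library(data.table)

# Hypothetical toy data: a is the grouping column, b holds values.
dt <- data.table(a = c("g1", "g1", "g2", "g2"), b = c(1, 2, 3, 4))

# := modifies dt in place, so no `<-` is needed.
dt[, total := sum(b), by = a]            # group sum repeated on every row of the group
dt[a == "g1", flag := 1L]                # assign only in matching rows; the rest are NA
dt[, `:=`(half = b / 2, dbl = b * 2)]    # several columns in one call
dt[, dbl := NULL]                        # delete a column

# Grouped summaries return new data.tables.
sums <- dt[, .(c = sum(b)), keyby = a]   # one row per group, sorted by a
last <- dt[, .SD[.N], by = a]            # last row of each group

# Chaining: summarize, then filter the summary.
top <- dt[, .(c = sum(b)), by = a][c > 3]

# setorder() also works by reference: a ascending, b descending.
setorder(dt, a, -b)
```

Only the statements that produce a new table (sums, last, top) are assigned with "<-". The others alter dt directly.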

Combine data.tables

JOIN
dt_a[dt_b, on = .(b = y)] - join data.tables on rows with equal values.
dt_a[dt_b, on = .(b = y, c > z)] - join data.tables on rows with equal and unequal values.

ROLLING JOIN
dt_a[dt_b, on = .(id = id, date = date), roll = TRUE] - join data.tables on matching rows in id columns but only keep the most recent preceding match with the left data.table according to date columns. roll = -Inf reverses direction.

Example input dt_a:
a  id  date
1  A   01-01-2010
2  A   01-01-2012
3  A   01-01-2014
1  B   01-01-2010
2  B   01-01-2012

Example input dt_b:
b  id  date
1  A   01-01-2013
1  B   01-01-2013

Result:
a  id  date        b
2  A   01-01-2013  1
2  B   01-01-2013  1

BIND
rbind(dt_a, dt_b) - combine rows of two data.tables.
cbind(dt_a, dt_b) - combine columns of two data.tables.

SET KEYS
setkey(dt, a, b) - set keys to enable fast repeated lookup in specified columns using dt[.(value), ] or for merging without specifying merging columns using dt_a[dt_b].

RENAME COLUMNS
setnames(dt, c("a", "b"), c("x", "y")) - rename columns.

Reshape a data.table

RESHAPE TO WIDE FORMAT
dcast(dt, id ~ y, value.var = c("a", "b")) - reshape a data.table from long to wide format.

Long input:
id  y  a  b
A   x  1  3
A   z  2  4
B   x  1  3
B   z  2  4

Wide result:
id  a_x  a_z  b_x  b_z
A   1    2    3    4
B   1    2    3    4

RESHAPE TO LONG FORMAT
melt(dt, id.vars = c("id"), measure.vars = patterns("^a", "^b"), variable.name = "y", value.name = c("a", "b")) - reshape a data.table from wide to long format.
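The joins and reshapes above can be sketched end to end. This is an illustrative example, assuming the data.table package is installed. The rolling-join and reshape inputs reuse the values from the example tables. The equi-join inputs (x_tab, y_tab) and all result names are hypothetical. Dates are stored with as.IDate(), which is one reasonable choice, not something the cheat sheet prescribes.

```r
library(data.table)

# Equi-join on hypothetical tables: rows of x_tab matched to each row of y_tab.
x_tab  <- data.table(a = 1:3, b = c("c", "a", "b"))
y_tab  <- data.table(x = 3:1, y = c("b", "c", "a"))
joined <- x_tab[y_tab, on = .(b = y)]

# Rolling join: for each row of dt_b, keep the most recent preceding date in dt_a.
dt_a <- data.table(
  a    = c(1, 2, 3, 1, 2),
  id   = c("A", "A", "A", "B", "B"),
  date = as.IDate(c("2010-01-01", "2012-01-01", "2014-01-01",
                    "2010-01-01", "2012-01-01"))
)
dt_b   <- data.table(b = 1, id = c("A", "B"), date = as.IDate("2013-01-01"))
rolled <- dt_a[dt_b, on = .(id = id, date = date), roll = TRUE]

# Bind rows of two tables with the same columns.
stacked <- rbind(x_tab, x_tab)

# Reshape long -> wide -> long again.
long <- data.table(id = c("A", "A", "B", "B"),
                   y  = c("x", "z", "x", "z"),
                   a  = c(1, 2, 1, 2),
                   b  = c(3, 4, 3, 4))
wide <- dcast(long, id ~ y, value.var = c("a", "b"))
long_again <- melt(wide, id.vars = "id",
                   measure.vars = patterns("^a", "^b"),
                   variable.name = "y", value.name = c("a", "b"))
```

In x_tab[y_tab, on = ...] the result has one row per row of y_tab, in y_tab's order. After melt() with patterns(), the y column holds the position of each matched column (1, 2) instead of the original x and z labels.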

Example for melt(), wide input:
id  a_x  a_z  b_x  b_z
A   1    2    3    4
B   1    2    3    4

Long result:
id  y  a  b
A   1  1  3
B   1  1  3
A   2  2  4
B   2  2  4

Arguments to dcast():
dt - a data.table.
id ~ y - formula with a LHS: ID columns containing IDs for multiple entries. And a RHS: columns with values to spread in column headers.
value.var - columns containing values to fill into cells.

Arguments to melt():
dt - a data.table.
id.vars - ID columns with IDs for multiple entries.
measure.vars - columns containing values to fill into cells (often in pattern form).
variable.name, value.name - names of new columns for variables and values derived from old headers.

UNIQUE ROWS
unique(dt, by = c("a", "b")) - extract unique rows based on columns specified in by. Leave out by to use all columns.
uniqueN(dt, by = c("a", "b")) - count the number of unique rows based on columns specified in by.

Read & write files

IMPORT
fread("file.csv") - read data from a flat file such as .csv or .tsv into R.
fread("file.csv", select = c("a", "b")) - read specified columns from a flat file into R.

EXPORT
fwrite(dt, "file.csv") - write data to a flat file from R.

Sequential rows

ROW IDS
dt[, c := 1:.N, by = b] - within groups, compute a column with sequential row IDs.

LAG & LEAD
dt[, c := shift(a, 1), by = b] - within groups, duplicate a column with rows lagged by specified amount.
dt[, c := shift(a, 1, type = "lead"), by = b] - within groups, duplicate a column with rows leading by specified amount.

Apply function to cols.

APPLY A FUNCTION TO MULTIPLE COLUMNS
dt[, lapply(.SD, mean), .SDcols = c("a", "b")] - apply a function, e.g. mean(), as.character(), which.max(), to columns specified in .SDcols with lapply() and the .SD symbol. Also works with groups.
cols <- c("a")
dt[, paste0(cols, "_m") := lapply(.SD, mean), .SDcols = cols] - apply a function to specified columns and assign the result with suffixed variable names to the original data.
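The sequential-row helpers, .SD with .SDcols, and the file functions can be combined in one short run. This is a minimal sketch, assuming the data.table package is installed. The toy values, the column names row_id, prev_a and next_a, and the temporary file created with tempfile() are illustrative choices, not part of the cheat sheet.

```r
library(data.table)

# Hypothetical toy data: a holds values, b is the grouping column.
dt <- data.table(a = c(10, 20, 30, 40), b = c("g1", "g1", "g2", "g2"))

# Sequential row IDs, lag and lead, all computed within groups of b.
dt[, row_id := 1:.N, by = b]
dt[, prev_a := shift(a, 1), by = b]                  # first row of each group is NA
dt[, next_a := shift(a, 1, type = "lead"), by = b]   # last row of each group is NA

# Apply mean() to the columns named in .SDcols, once per group.
means <- dt[, lapply(.SD, mean), by = b, .SDcols = c("a", "row_id")]

# Assign results back under suffixed names.
cols <- c("a")
dt[, paste0(cols, "_m") := lapply(.SD, mean), .SDcols = cols]

# Count distinct values of b.
n_groups <- uniqueN(dt, by = "b")

# Round trip through a temporary CSV file, reading back two columns only.
path <- tempfile(fileext = ".csv")
fwrite(dt, path)
back <- fread(path, select = c("a", "b"))
```

fread() infers column types from the file, so a numeric column written by fwrite() may come back as integer when all its values are whole numbers.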
