『Data Science』R Language Study Notes: Basic Syntax

Published: 2021-03-01 06:16:39 | Category: Big Data | Source: compiled from the web

Summary: Data Types / Data Object / Vector

x <- c(0.5, 0.6)                     ## numeric
x <- c(TRUE, FALSE)                  ## logical
x <- c(T, F)                         ## logical
x <- c("a", "b", "c")                ## character
x <- 9:29                            ## integer
x <- c(1+0i, 2+4i)                   ## complex
x <- vector("numeric", length = 10)  ## create a numeric vector
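A quick way to confirm the type of each vector in the summary is to ask R with `class()`. This is a minimal sketch; the variable names (`x_num`, `x_lgl`, and so on) are made up so that all the vectors can exist at once instead of each one overwriting `x`.

```r
## Hypothetical names: one variable per vector type from the summary
x_num  <- c(0.5, 0.6)                      # double-precision numbers
x_lgl  <- c(TRUE, FALSE)                   # T and F are shorthand for TRUE and FALSE
x_chr  <- c("a", "b", "c")                 # character strings
x_int  <- 9:29                             # the `:` operator produces integers
x_cplx <- c(1+0i, 2+4i)                    # complex numbers
x_vec  <- vector("numeric", length = 10)   # ten zeros, pre-allocated

## Collect the class of every vector in one call
classes <- sapply(list(x_num, x_lgl, x_chr, x_int, x_cplx, x_vec), class)
classes  # numeric, logical, character, integer, complex, numeric
```

`T` and `F` are ordinary variables that can be reassigned, which is why `TRUE` and `FALSE` are usually preferred in scripts.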

Reading Data

Main Arguments (read.table):

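file, the name of a file (or a connection) to read from.
header, a logical value indicating whether the first line contains the column names.
sep, a string indicating how the columns are separated, e.g. "," or "\t".
colClasses, a character vector indicating the class of each column in the dataset.
nrows, the maximum number of rows to read.
comment.char, a character string indicating the comment character; anything after it on a line is ignored.
skip, the number of lines to skip from the beginning.
stringsAsFactors, should character variables be coded as factors? (The default is FALSE since R 4.0.0; earlier versions defaulted to TRUE.)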
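The arguments above can be tried without creating a file, because read.table() also accepts the data as a string through its `text` argument. A minimal sketch; the semicolon-separated table, its comment line, and its column names are invented for illustration.

```r
## Invented data: one comment line, a header, then three rows
raw <- paste("# exported by a hypothetical tool",
             "id;name;score",
             "1;alice;90.5",
             "2;bob;78",
             "3;carol;88",
             sep = "\n")

df <- read.table(text = raw,
                 header = TRUE,             # first line after the skip holds column names
                 sep = ";",                 # fields are separated by semicolons
                 skip = 1,                  # skip the leading "#" line
                 colClasses = c("integer", "character", "numeric"),
                 nrows = 3,                 # read at most 3 data rows
                 comment.char = "",         # no comment scanning needed after the skip
                 stringsAsFactors = FALSE)  # keep `name` as character
str(df)
```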

Usages:

read.table(file, header = FALSE, sep = "", quote = "\"'", dec = ".",
           numerals = c("allow.loss", "warn.loss", "no.loss"),
           row.names, col.names, as.is = !stringsAsFactors,
           na.strings = "NA", colClasses = NA, nrows = -1,
           skip = 0, check.names = TRUE, fill = !blank.lines.skip,
           strip.white = FALSE, blank.lines.skip = TRUE,
           comment.char = "#", allowEscapes = FALSE, flush = FALSE,
           stringsAsFactors = default.stringsAsFactors(),
           fileEncoding = "", encoding = "unknown", text, skipNul = FALSE)

read.csv(file, header = TRUE, sep = ",", quote = "\"", fill = TRUE, comment.char = "", ...)

read.csv2(file, sep = ";", dec = ",", ...)

read.delim(file, sep = "\t", ...)

read.delim2(file, sep = "\t", dec = ",", ...)
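The variants differ mainly in their defaults for `sep` and `dec`. A small sketch with invented data: the same two-column table in the comma/point convention, in the semicolon/comma-decimal convention common in parts of Europe, and as tab-separated text.

```r
## Invented data in three layouts
us  <- "x,y\n1.5,2\n3.25,4"     # sep = ",",  dec = "." -> read.csv()
eu  <- "x;y\n1,5;2\n3,25;4"     # sep = ";",  dec = "," -> read.csv2()
tab <- "x\ty\n1.5\t2\n3.25\t4"  # sep = "\t", dec = "." -> read.delim()

a <- read.csv(text = us)
b <- read.csv2(text = eu)
d <- read.delim(text = tab)

a$x; b$x; d$x   # each is c(1.5, 3.25)
```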

Writing Data

Description: write.table prints its required argument x (after converting it to a data frame if it is not one nor a matrix) to a file or connection.

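Main Functions:

write.table, writes a data frame or matrix as delimited text
writeLines, writes a character vector to a connection, one element per line
dump, writes R code that re-creates one or more named objects (read back with source)
dput, writes an R representation of a single object (read back with dget)
save, writes one or more R objects to a binary file (read back with load)
serialize, converts a single R object to a binary stream on a connection or a raw vector (read back with unserialize)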
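The save/load, serialize/unserialize, and writeLines/readLines pairs can each be round-tripped in a few lines. A minimal sketch that uses temporary files from tempfile() so nothing is left in the working directory; the objects `x` and `y` are invented examples.

```r
## Objects to store (invented examples)
x <- "foo"
y <- data.frame(a = 1, b = "a")

## save()/load(): several named objects in one binary .RData file
rdata <- tempfile(fileext = ".RData")
save(x, y, file = rdata)
env <- new.env()
load(rdata, envir = env)          # restores `x` and `y` inside `env`

## serialize()/unserialize(): one object <-> raw vector (connection = NULL)
bytes <- serialize(y, connection = NULL)
class(bytes)                      # "raw"
restored <- unserialize(bytes)

## writeLines()/readLines(): a character vector <-> lines of a text file
txt <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), txt)
readLines(txt)
```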

Usages:

write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
            eol = "\n", na = "NA", dec = ".", row.names = TRUE,
            col.names = TRUE, qmethod = c("escape", "double"),
            fileEncoding = "")

write.csv(...)
write.csv2(...)
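write.csv() is write.table() with a comma separator and "." as the decimal point. The sketch below writes an invented data frame to a temporary file, shows the raw text, and reads it back; `row.names = FALSE` is used because otherwise the row names are written as an extra, unnamed first column.

```r
## Invented example data
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

out <- tempfile(fileext = ".csv")
write.csv(df, out, row.names = FALSE)   # omit the row-name column
lines <- readLines(out)
lines                                   # header and character values are quoted

back <- read.csv(out, stringsAsFactors = FALSE)
all.equal(back, df)                     # TRUE: the round trip preserves the data
```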

Reading Large Tables

Read the help page for read.table, which contains many hints.
Make a rough calculation of the memory required to store your dataset. If the dataset is larger than the amount of RAM on your computer, you can probably stop right here.
Set comment.char = "" if there are no commented lines in your file.
Use the colClasses argument. Specifying this option instead of using the default can make read.table run MUCH faster, often twice as fast. In order to use this option, you have to know the class of each column in your data frame. If all of the columns are "numeric", for example, then you can just set colClasses = "numeric". A quick and dirty way to figure out the classes of each column is the following:

> initial <- read.table("db.txt", nrows = 100, sep = "\t")
> classes <- sapply(initial, class)
> tabAll <- read.table("db.txt", sep = "\t", colClasses = classes)

Set nrows. This doesn't make R run faster but it helps with memory usage. A mild overestimate is okay. You can use the Unix tool wc to calculate the number of lines in a file.
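Two of the tips above as a runnable sketch. The memory estimate assumes every column is numeric (8 bytes per value) and ignores R's own overhead, so treat it as a lower bound; the 1,500,000 × 120 table is hypothetical. The second part builds a small invented tab-separated file so that the colClasses trick can actually run.

```r
## Rough memory estimate: rows x columns x bytes per value, in GiB
estimate_gb <- function(n_rows, n_cols, bytes_per_value = 8) {
  n_rows * n_cols * bytes_per_value / 2^30
}
estimate_gb(1.5e6, 120)   # about 1.34 for a hypothetical all-numeric table

## Build an invented tab-separated file to try the colClasses trick on
set.seed(1)
db <- tempfile(fileext = ".txt")
big <- data.frame(id = 1:1000,
                  value = runif(1000),
                  label = sample(letters, 1000, replace = TRUE))
write.table(big, db, sep = "\t", row.names = FALSE, quote = FALSE)

## Count lines from R (header included), similar to `wc -l`
n_lines <- length(readLines(db))

## 1. Read a small sample and record each column's class
initial <- read.table(db, header = TRUE, sep = "\t", nrows = 100)
classes <- sapply(initial, class)

## 2. Re-read everything with known classes, no comment scanning, and a row count
tabAll <- read.table(db, header = TRUE, sep = "\t",
                     colClasses = classes, comment.char = "",
                     nrows = n_lines - 1)
```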

Reading Data Formats

`dput` and `dget`

> y <- data.frame(a = 1, b = "a")  ## create a `data.frame` object for `dput`
> dput(y)                           ## print R code that re-creates `y`

structure(list(a = 1, b = structure(1L, .Label = "a", class = "factor")), .Names = c("a", "b"), row.names = c(NA, -1L), class = "data.frame")

(This output comes from an R version older than 4.0.0, where data.frame() converted strings to factors by default; newer versions keep `b` as a character vector, so the printed code differs.)

> dput(y, file = 'y.R')             ## write the same code to a file named 'y.R'
> new.y <- dget('y.R')              ## read the object back from 'y.R'
> new.y                             ## print the restored object

  a b
1 1 a
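To check that dput() and dget() really reproduce the object, compare the copy with identical(). A small sketch that uses a temporary file instead of 'y.R'; the deparse()/parse() pair at the end does the same round trip entirely in memory.

```r
y <- data.frame(a = 1, b = "a")

path <- tempfile(fileext = ".R")
dput(y, file = path)          # write R code that re-creates `y`
new.y <- dget(path)           # parse and evaluate that code
identical(new.y, y)           # TRUE

## In-memory variant: deparse() returns the code as a character vector
code <- deparse(y)
copy <- eval(parse(text = code))
identical(copy, y)            # TRUE
```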

`dump`

Multiple objects can be deparsed using the dump function and read back in using source.

> x <- "foo"                         ## create the first object
> y <- data.frame(a = 1, b = "a")    ## create the second object
> dump(c("x", "y"), file = "data.R") ## write both objects to a file called 'data.R'
> rm(x, y)                           ## remove both objects from the workspace
> source("data.R")                   ## re-create the objects by running 'data.R'
> y                                   ## print the data object 'y' from 'data.R'
  a b
1 1 a
> x                                   ## print the data object 'x' from 'data.R'
[1] "foo"
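Because dump() writes ordinary R assignments, the file can also be sourced into a separate environment, which avoids overwriting objects of the same name in the workspace. A sketch with a temporary file; `local = env` is a standard argument of source() that accepts an environment.

```r
x <- "foo"
y <- data.frame(a = 1, b = "a")

path <- tempfile(fileext = ".R")
dump(c("x", "y"), file = path)
cat(readLines(path), sep = "\n")   # plain R assignment code, one `name <-` per object

env <- new.env()
source(path, local = env)          # evaluate the file inside `env` only
ls(env)                            # "x" "y"
identical(env$y, y)                # TRUE
```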

Connections: Interfaces to the Outside World

Data are read in using connection interfaces. Connections can be made to files (most common) or to other more exotic things.

file, opens a connection to a file
gzfile, opens a connection to a file compressed with gzip
bzfile, opens a connection to a file compressed with bzip2
url, opens a connection to a webpage

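> con <- file('db.txt', 'r')   ## open a read-only connection to 'db.txt'
> readLines(con)              ## read all lines as a character vector
> close(con)                  ## close the connection when done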
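The same open, read, close pattern works for compressed files. A minimal sketch that writes a gzip-compressed temporary file through gzfile() and reads part of it back; the file contents are invented.

```r
path <- tempfile(fileext = ".txt.gz")

con <- gzfile(path, "w")              # connection that gzip-compresses what is written
writeLines(c("line 1", "line 2", "line 3"), con)
close(con)

readBin(path, "raw", n = 2)           # gzip magic bytes 1f 8b: the file really is compressed

con <- gzfile(path, "r")              # gzfile() decompresses transparently on reading
first_two <- readLines(con, n = 2)    # read only the first two lines
close(con)                            # close every connection you open
first_two
```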

Subsetting

`[` always returns an object of the same class as the original; it can be used to select more than one element (there is one exception).
`[[` is used to extract elements of a list or a data frame; it can only be used to extract a single element, and the class of the returned object will not necessarily be a list or data frame.
`$` is used to extract elements of a list or data frame by name; its semantics are similar to those of `[[`.
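The difference between the three operators is easiest to see on a data frame. The notes do not say which exception they mean; one commonly cited case, assumed here, is matrix indexing, where `[` drops a single row to a plain vector unless `drop = FALSE` is given. The data are invented.

```r
## Invented data frame for comparing the operators
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

class(df["a"])     # "data.frame": `[` keeps the original class
class(df[["a"]])   # "integer": `[[` returns the column itself
class(df$a)        # "integer": `$` extracts by name, like `[[`

## Matrix case: `[` drops a 1-row result to a plain vector by default
m <- matrix(1:6, nrow = 2)
m[1, ]                  # 1 3 5, no longer a matrix
m[1, , drop = FALSE]    # a 1 x 3 matrix
```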

Basic

> x <- c("a", "b", "c", "d", "e")
> x[1]
[1] "a"
> x[2]
[1] "b"
> x[1:3]
[1] "a" "b" "c"
> x[x > "a"]
[1] "b" "c" "d" "e"
> u <- x > "a"
> u
[1] FALSE  TRUE  TRUE  TRUE  TRUE
> x[u]
[1] "b" "c" "d" "e"

Lists

> x <- list(foo = 1:4, bar = 0.6)

> x[1]
$foo
[1] 1 2 3 4
> x[[1]]
[1] 1 2 3 4
> x[[2]]
[1] 0.6

> x$bar
[1] 0.6
> x$foo
[1] 1 2 3 4

> x[["bar"]]
[1] 0.6
> x["bar"]
$bar
[1] 0.6

> x <- list(foo = 1:4, bar = 0.6, baz = "hello")

> x[c(1, 3)]
$foo
[1] 1 2 3 4
$baz
[1] "hello"

> name <- "foo"
> x[[name]]
[1] 1 2 3 4
> x$name          ## `$` does not evaluate `name`; it looks for an element literally called "name", and `x` has none
NULL
> x$foo
[1] 1 2 3 4
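Because `[[` evaluates its argument, it is the operator to use whenever the element name is computed, for example in a loop. A sketch; get_field() is a hypothetical helper written for illustration, not a base R function.

```r
x <- list(foo = 1:4, bar = 0.6, baz = "hello")

## `[[` evaluates its argument, so the name can come from a loop variable
for (nm in c("foo", "baz")) print(x[[nm]])

## `$` does not: x$nm looks for an element literally called "nm"
is.null(x$nm)           # TRUE

## Hypothetical helper: return a named element, or a default if it is absent
get_field <- function(lst, field, default = NA) {
  if (field %in% names(lst)) lst[[field]] else default
}
get_field(x, "bar")     # 0.6
get_field(x, "qux")     # NA
```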