
『Data Science』R Language Study Notes: Basic Syntax

Published: 2021-03-01 06:16:39 | Category: Big Data | Source: compiled from the web
Summary: Data Types, Data Object, Vector
x <- c(0.5,0.6) ## numeric
x <- c(TRUE,FALSE) ## logical
x <- c(T,F) ## logical
x <- c("a","b","c") ## character
x <- 9:29 ## integer
x <- c(1+0i,2+4i) ## complex
x <- vector("numeric",length = 10) ## create a numeric vect

Main Arguments:

  • file, the name of the file (or a connection) to read from.
  • header, logical; whether the first line contains the column names.
  • sep, the string that separates columns, e.g. ",".
  • colClasses, a character vector indicating the class of each column in the dataset.
  • nrows, the number of rows to read.
  • comment.char, a character string indicating the comment character.
  • skip, the number of lines to skip from the beginning.
  • stringsAsFactors, should character variables be coded as factors?

Usages:

read.table(file,header = FALSE,sep = "",quote = "\"'",dec = ".",numerals = c("allow.loss","warn.loss","no.loss"),row.names,col.names,as.is = !stringsAsFactors,na.strings = "NA",colClasses = NA,nrows = -1,skip = 0,check.names = TRUE,fill = !blank.lines.skip,strip.white = FALSE,blank.lines.skip = TRUE,comment.char = "#",allowEscapes = FALSE,flush = FALSE,stringsAsFactors = default.stringsAsFactors(),fileEncoding = "",encoding = "unknown",text,skipNul = FALSE)

read.csv(file,header = TRUE,sep = ",",quote = "\"",fill = TRUE,comment.char = "",...)

read.csv2(file,sep = ";",dec = ",",...)

read.delim(file,sep = "\t",...)

read.delim2(file,...)
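
As a minimal sketch of how the main arguments fit together, the following reads a small semicolon-separated table from a temporary file. The file contents and the column names (`id`, `score`, `label`) are made up for illustration only.

```r
# Hypothetical sample data written to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c("id;score;label",
             "1;0.5;a",
             "# a full-line comment, ignored because of comment.char",
             "2;0.6;b",
             "3;NA;c"),
           path)

df <- read.table(path,
                 header = TRUE,        # first line holds the column names
                 sep = ";",            # column separator
                 colClasses = c("integer", "numeric", "character"),
                 comment.char = "#",   # lines starting with # are skipped
                 na.strings = "NA",    # the string NA becomes a missing value
                 stringsAsFactors = FALSE)
str(df)
unlink(path)                           # remove the temporary file
```

Passing `colClasses` explicitly also documents the expected type of each column, so a malformed file fails early instead of silently producing character columns.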

Writing Data

Description: write.table prints its required argument x (after converting it to a data frame if it is not one nor a matrix) to a file or connection.

Main Points:

  • write.table
  • writeLines
  • dump
  • dput
  • save
  • serialize

Usages:

write.table(x,file = "",append = FALSE,quote = TRUE,sep = " ",eol = "\n",na = "NA",row.names = TRUE,col.names = TRUE,qmethod = c("escape","double"),fileEncoding = "")

write.csv(...)
write.csv2(...)
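
A short round trip may make the usage clearer. This is only a sketch: the data frame `df` and the temporary file are invented for the example.

```r
# Hypothetical data frame used only for this illustration
df <- data.frame(id = 1:3, label = c("a", "b", "c"), stringsAsFactors = FALSE)
path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing the row labels as an extra first column
write.csv(df, file = path, row.names = FALSE)

raw  <- readLines(path)                           # the raw text that was written
back <- read.csv(path, stringsAsFactors = FALSE)  # read the file back in

print(raw)
print(all.equal(df, back))                        # compare original and restored data
unlink(path)
```

`write.csv` is a wrapper around `write.table` with `sep = ","` fixed, so the remaining arguments of `write.table` (for example `na` or `quote`) can still be passed through.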

Reading Large Tables

  • Read the help page for read.table, which contains many hints.
  • Make a rough calculation of the memory required to store your dataset. If the dataset is larger than the amount of RAM on your computer, you can probably stop right here.
  • Set comment.char = "" if there are no commented lines in your file.
  • Use the colClasses argument. Specifying this option instead of using the default can make read.table run MUCH faster, often twice as fast. In order to use this option, you have to know the class of each column in your data frame. If all of the columns are "numeric", for example, then you can just set colClasses = "numeric". A quick and dirty way to figure out the classes of each column is the following:
> initial <- read.table("db.txt",nrows = 100,sep = "\t")
> classes <- sapply(initial,class)
> tabAll <- read.table("db.txt",colClasses = classes,sep = "\t")
  • Set nrows. This doesn't make R run faster but it helps with memory usage. A mild overestimate is okay. You can use the Unix tool wc to calculate the number of lines in a file.
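
The hints above can be sketched as follows. The table dimensions (1,500,000 rows by 120 numeric columns) and the tiny tab-separated file are assumptions chosen for illustration, not values from the original notes.

```r
# Rough memory estimate for a hypothetical all-numeric table
n_rows <- 1500000                  # assumed number of rows
n_cols <- 120                      # assumed number of columns
bytes  <- n_rows * n_cols * 8      # each numeric (double) value takes 8 bytes
print(bytes / 2^20)                # approximate size in MiB

# Counting lines from within R (an alternative to the Unix tool wc)
path <- tempfile()
writeLines(c("a\tb", "1\tx", "2\ty"), path)
n_lines <- length(readLines(path))

tab <- read.table(path, header = TRUE, sep = "\t",
                  colClasses = c("integer", "character"),
                  nrows = n_lines,      # mild overestimate: also counts the header line
                  comment.char = "")    # the file has no commented lines
unlink(path)
```

Note that `readLines` itself loads the whole file, so for genuinely large files an external line count is the cheaper option.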

Reading Data Formats

dput and dget

> y <- data.frame(a = 1,b = "a") ## create a `data.frame` object to pass to `dput`
> dput(y)                         ## print the deparsed representation of `y`

structure(list(a = 1,b = structure(1L,.Label = "a",class = "factor")),.Names = c("a","b"),row.names = c(NA,-1L),class = "data.frame")

> dput(y,file = 'y.R')           ## write the deparsed `y` to a file named 'y.R'
> new.y <- dget('y.R')            ## read the object back from 'y.R'
> new.y                           ## print the restored object

  a b
1 1 a

dump

Multiple objects can be deparsed using the dump function and read back in using source.

> x <- "foo"                          ## create the first data object
> y <- data.frame(a = 1,b = "a")     ## create the second data object
> dump(c("x","y"),file = "data.R")  ## write both objects to a file called 'data.R'
> rm(x,y)                            ## remove both objects from the workspace
> source("data.R")                    ## re-create the objects from 'data.R'
> y                                   ## print the restored object 'y'
  a b
1 1 a
> x                                   ## print the restored object 'x'
[1] "foo"

Connections: Interfaces to the Outside World

Data are read in using connection interfaces. Connections can be made to files (most common) or to other more exotic things.

  • file, opens a connection to a file
  • gzfile, opens a connection to a file compressed with gzip
  • bzfile, opens a connection to a file compressed with bzip2
  • url, opens a connection to a webpage
> con <- file('db.txt','r')
> readLines(con)
> close(con)
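
The other connection types are used the same way. As a minimal sketch, assuming a temporary file created just for this example, a gzip-compressed file can be written and then partially read back:

```r
path <- tempfile(fileext = ".gz")

# Write two lines through a gzip-compressing connection
con <- gzfile(path, "w")
writeLines(c("first line", "second line"), con)
close(con)

# Read back only the first line; decompression happens transparently
con <- gzfile(path, "r")
first <- readLines(con, n = 1)
close(con)                 # close connections that were opened explicitly

print(first)
unlink(path)
```

Reading through a connection is useful when only part of a file is needed, since `readLines(con, n = ...)` stops after the requested number of lines.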

Subsetting

  • [ always returns an object of the same class as the original; can be used to select more than one element (there is one exception)
  • [[ is used to extract elements of a list or a data frame; it can only be used to extract a single element and the class of the returned object will not necessarily be a list or data frame.
  • $ is used to extract elements of a list or data frame by name; semantics are similar to that of [[.
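
The difference in returned classes can be checked directly. This is a small sketch; the objects `x` and `df` are made up for the illustration.

```r
x <- list(foo = 1:4, bar = 0.6)

class(x["foo"])     # "list": [ keeps the class of the container
class(x[["foo"]])   # "integer": [[ returns the element itself
class(x$bar)        # "numeric": $ extracts by literal name, like [[

df <- data.frame(a = 1:2, b = c("u", "v"))
class(df["a"])      # "data.frame"
class(df[["a"]])    # "integer"
```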

Basic

> x <- c("a","b","c","d","e")
> x[1]
[1] "a"
> x[2]
[1] "b"
> x[1:3]
[1] "a" "b" "c"
> x[x > "a"]
[1] "b" "c" "d" "e"
> u  <- x>"a"
> u
[1] FALSE  TRUE  TRUE  TRUE  TRUE
> x[u]
[1] "b" "c" "d" "e"

Lists

> x <- list(foo = 1:4,bar = 0.6)

> x[1]
$foo
[1] 1 2 3 4
> x[[1]]
[1] 1 2 3 4
> x[[2]]
[1] 0.6

> x$bar
[1] 0.6
> x$foo
[1] 1 2 3 4

> x[["bar"]]
[1] 0.6
> x["bar"]
$bar
[1] 0.6
> x <- list(foo = 1:4,bar = 0.6,baz = "hello")

> x[c(1,3)]
$foo
[1] 1 2 3 4
$baz
[1] "hello"

> name <- "foo"
> x[[name]]
[1] 1 2 3 4
> x$name          ## `$` looks for an element literally called `name`; the list `x` has none, so the result is NULL
NULL
> x$foo
[1] 1 2 3 4

Matrices
