This edit allows multiple tables per page to be read using the list_matrices method. If there is only one table on a page, a matrix is returned; otherwise, a list of matrices is returned.
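A minimal sketch of the return shape described above, using plain character matrices in place of the Java table objects. The names `shape_page_result` and `as_table_list` are hypothetical and are not part of the package; they only illustrate the "matrix for one table, list for several" rule and one way a caller might normalise the result.

```r
# Hypothetical helper (not package code): mimic the described return shape
# for the tables found on a single page.
shape_page_result <- function(page_tables) {
  if (length(page_tables) == 1L) {
    page_tables[[1L]]  # one table on the page: return the bare matrix
  } else {
    page_tables        # several tables: return the list of matrices
  }
}

# Hypothetical helper: callers that always want a list can wrap a lone matrix.
as_table_list <- function(x) {
  if (is.matrix(x)) list(x) else x
}

one_table  <- list(matrix(c("a", "b", "1", "2"), nrow = 2))
two_tables <- list(matrix("x", 1, 1), matrix("y", 2, 2))

res_one <- shape_page_result(one_table)   # a matrix
res_two <- shape_page_result(two_tables)  # a list of two matrices
```

Because the return type now depends on the page contents, downstream code probably needs a check such as `is.matrix()` before indexing, which is what `as_table_list` is meant to show.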
@leeper I haven't added any tests for this; it's just the bare change. I can add tests, or anything else you'd like. Just let me know.
Returns a list of strings if there are multiple tables per page.
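As a rough illustration of this commit, the sketch below converts either a single matrix or a list of matrices into delimited strings. It reuses the `paste0(apply(...))` expression visible in the diff; the wrapper names `matrix_to_string` and `page_to_strings` are assumptions made for the example, and `is.matrix()` stands in for the `inherits(x, "matrix")` check.

```r
# Hypothetical helper: collapse one character matrix into a single string,
# with `sep` between cells and newlines between rows.
matrix_to_string <- function(m, sep = "\t") {
  paste0(apply(m, 1, paste, collapse = sep), collapse = "\n")
}

# Hypothetical wrapper: one string for a single table,
# a list of strings when the page holds several tables.
page_to_strings <- function(x, sep = "\t") {
  if (is.matrix(x)) {
    matrix_to_string(x, sep = sep)
  } else {
    lapply(x, matrix_to_string, sep = sep)
  }
}

m <- matrix(c("a", "b", "1", "2"), nrow = 2)  # rows are ("a", "1") and ("b", "2")
single   <- page_to_strings(m)
multiple <- page_to_strings(list(m, matrix("x", 1, 1)))
```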
Now produces a list of data frames if there are multiple tables per page.
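A sketch of how the string-to-data-frame step might branch, assuming the input is either one delimited string or a list of such strings. It follows the `try(read.delim(text = ...))` pattern from the diff and falls back to the raw string on failure; `string_to_df` and `page_to_df` are hypothetical names, and `is.character()` stands in for the `inherits(x, "character")` check.

```r
# Hypothetical helper: parse one tab-delimited string, returning the
# original string unchanged if read.delim() fails.
string_to_df <- function(s, stringsAsFactors = FALSE, ...) {
  o <- try(read.delim(text = s, stringsAsFactors = stringsAsFactors, ...),
           silent = TRUE)
  if (inherits(o, "try-error")) s else o
}

# Hypothetical wrapper: a data frame for a single table,
# a list of data frames when the page holds several tables.
page_to_df <- function(x, ...) {
  if (is.character(x)) {
    string_to_df(x, ...)
  } else {
    lapply(x, string_to_df, ...)
  }
}

df  <- page_to_df("h1\th2\na\t1\nb\t2")
dfs <- page_to_df(list("h1\th2\na\t1", "k\nv"))
```

Note that `read.delim()` treats the first row as a header by default, so the first row of each extracted table becomes the column names in this sketch.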
The Java import using 'asis' thinks there are two tables present. This wasn't an issue in the previous version of extract_tables (as it only extracted the first table), but it is now that we extract all tables.
Of course,
Codecov Report
@@            Coverage Diff            @@
##             master      #31   +/-   ##
=========================================
  Coverage          ?   57.82%
=========================================
  Files             ?       12
  Lines             ?      569
  Branches          ?        0
=========================================
  Hits              ?      329
  Misses            ?      240
  Partials          ?        0
Sorry, @SteveLane, for the delay on this. I will try to get to it as soon as I can.
R/output.R
Outdated
for (j in seq_len(ncol(out[[n]]))) {
out[[n]][i, j] <- tab$getCell(i-1L, j-1L)$getText()
outTab <- list()
for(nTabs in seq_len(nxt$size())){
outTab[[nTabs]] <- matrix(NA_character_,
nrow = tab$getRows()$size(),
ncol = tab$getCols()$size())
for (i in seq_len(nrow(outTab[[nTabs]]))) {
Can you add comments indicating what's going on here?
R/output.R
Outdated
if (!is.null(encoding)) {
Encoding(out[[n]]) <- encoding
## Put outTab into out, depending on size
if(nxt$size() == 1L){
R/output.R
Outdated
m <- list_matrices(tables, encoding = encoding, ...)
lapply(m, function(x) {
paste0(apply(x, 1, paste, collapse = sep), collapse = "\n")
if(inherits(x, "matrix")){
R/output.R
Outdated
o <- try(read.delim(text = x, stringsAsFactors = stringsAsFactors, ...))
if (inherits(o, "try-error")) {
return(x)
if(inherits(x, "character")){
R/output.R
Outdated
if (inherits(o, "try-error")) {
return(x)
if(inherits(x, "character")){
o <- try(read.delim(text = x, stringsAsFactors = stringsAsFactors, ...))
R/output.R
Outdated
}
} else {
return(o)
lapply(x, function(y){
R/output.R
Outdated
} else {
return(o)
lapply(x, function(y){
o <- try(read.delim(text = y, stringsAsFactors = stringsAsFactors, ...))
Let me test how these changes go with tabula 1.0.5.
hi @SteveLane |