library(tidyverse)
library(here)
2023: Day 1 - Trebuchet?!
Setup
Chosen language: R
Notes:
- Puzzle input is a “calibration document”
- Has been “amended” by an Elf
- Each line was a calibration value, that now needs to be recovered
- Value can be obtained by combining the first digit with the last digit to form a single two-digit number
Examples:
- 1abc2 -> 12
- pqr3stu8vwx -> 38
- a1b2c3d4e5f -> 15
- treb7uchet -> 77
Adding these together produces 142.
Do the same for the whole puzzle input
Part 1
Transforming the data to an appropriate data structure:
<- read_lines(here('2023', 'day', '1', 'input'))
amended 1:10] amended[
[1] "nqninenmvnpsz874"
[2] "8twofpmpxkvvdnpdnlpkhseven4ncgkb"
[3] "six8shdkdcdgseven8xczqrnnmthreecckfive"
[4] "qlcnz54dd75nine7jfnlfgz"
[5] "7vrdhggdkqbnltlgpkkvsdxn2mfpghkntzrhtjgtxr"
[6] "cdhmktwo6kjqbprvfour8"
[7] "ninekkvkeight9three"
[8] "ms9five71lrfpqxqlbj"
[9] "9five9sevenldshqfgcnq"
[10] "1one4seven"
Solving for a particular case
<- amended[1]) (first_case
[1] "nqninenmvnpsz874"
Extracting the numbers
<-
(matrix_str_numbers str_extract_all(first_case, pattern = '\\d', simplify = TRUE))
[,1] [,2] [,3]
[1,] "8" "7" "4"
Extracting all the rows from columns one and three (the first and last number of each row)
<- matrix_str_numbers[, c(1,3)]
numbers_i_want numbers_i_want
[1] "8" "4"
(This code will generalise ONLY IF all the other rows have 3 numbers too).
The following code returns the first and last number “pasted” together and converts the result from string to numeric.
::str_flatten(numbers_i_want) %>%
stringras.numeric()
[1] 84
Now let’s try to generalise to all input rows:
%>%
amended str_extract_all(pattern = '\\d', simplify = TRUE) %>%
head(5)
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] "8" "7" "4" "" "" "" ""
[2,] "8" "4" "" "" "" "" ""
[3,] "8" "8" "" "" "" "" ""
[4,] "5" "4" "7" "5" "7" "" ""
[5,] "7" "2" "" "" "" "" ""
Oh no. Rows have a different quantity of digits.
Maybe using simplify=TRUE
was a mistake here. Given that each element has a different quantity of digits, a list seems to be a more suitable data structure:
<-
list_digits_preview %>%
amended str_extract_all(pattern = '\\d') %>%
head(5)
list_digits_preview
[[1]]
[1] "8" "7" "4"
[[2]]
[1] "8" "4"
[[3]]
[1] "8" "8"
[[4]]
[1] "5" "4" "7" "5" "7"
[[5]]
[1] "7" "2"
Now, how can I extract the first and last element from each element of the list? My gut feeling is that I’ll need purrr
for this:
pluck
and keep_at
seem to be useful functions for this task
%>%
list_digits_preview # This should extract the first and last digit from each element of the list
map(~keep_at(., c(1, length(.))))
[[1]]
[1] "8" "4"
[[2]]
[1] "8" "4"
[[3]]
[1] "8" "8"
[[4]]
[1] "5" "7"
[[5]]
[1] "7" "2"
It is working!!
Now I’ll apply the action to the whole list:
<-
list_useful_digits %>%
amended str_extract_all(pattern = '\\d') %>%
map(~keep_at(., c(1, length(.))))
1:10] list_useful_digits[
[[1]]
[1] "8" "4"
[[2]]
[1] "8" "4"
[[3]]
[1] "8" "8"
[[4]]
[1] "5" "7"
[[5]]
[1] "7" "2"
[[6]]
[1] "6" "8"
[[7]]
[1] "9"
[[8]]
[1] "9" "1"
[[9]]
[1] "9" "9"
[[10]]
[1] "1" "4"
Hmmm… Why does element 7 have only 1 digit?
7] amended[
[1] "ninekkvkeight9three"
It’s got just one digit as a number, but there are other digits written out in words. Should we count those? The prompt isn’t really clear about this. But since the example only talks about digits as numbers, I guess I’ll just go with those for now. If I’ve got this wrong and my solution gets rejected, well, I can always give it another shot
Now let’s concatenate the digits row-wise, turn them into a number and add them up:
<-
solution %>%
list_useful_digits map(str_flatten) %>%
map(as.integer) %>%
as_vector() %>%
sum()
solution
[1] 39347
👎🏽 Solution was rejected. The website says it’s too low 🤔
The most likely suspect here is the way map(~keep_at(., c(1, length(.))))
is handling the rows with only one digit.
Some alternatives I could try:
If the input row has only one digit, then use it as first AND last digit when building the number (e.g. if the row only contains 7, the number I would obtain from that row should be 77).
Take into account digits that are written as words (e.g. “three”). This is less likely to work and it would be weird for the exercise to expect people to do this and not mentioning it explicitely in the prompt or the examples.
I’ll go with alternative 1.
Handling rows with just one digit
I’ll take a slice from the first 10 rows, as the first case with one digit appears in row 7:
<-
list_digits_preview %>%
amended str_extract_all(pattern = '\\d') %>%
head(10)
list_digits_preview
[[1]]
[1] "8" "7" "4"
[[2]]
[1] "8" "4"
[[3]]
[1] "8" "8"
[[4]]
[1] "5" "4" "7" "5" "7"
[[5]]
[1] "7" "2"
[[6]]
[1] "6" "8"
[[7]]
[1] "9"
[[8]]
[1] "9" "7" "1"
[[9]]
[1] "9" "9"
[[10]]
[1] "1" "4"
I have a feeling that purrr::pluck
may do the trick here:
<- function(x) {
get_first_and_last c(pluck(x, 1), pluck(x, length(x)))
}
%>%
list_digits_preview map(get_first_and_last)
[[1]]
[1] "8" "4"
[[2]]
[1] "8" "4"
[[3]]
[1] "8" "8"
[[4]]
[1] "5" "7"
[[5]]
[1] "7" "2"
[[6]]
[1] "6" "8"
[[7]]
[1] "9" "9"
[[8]]
[1] "9" "1"
[[9]]
[1] "9" "9"
[[10]]
[1] "1" "4"
This function does what I want (and yes, using pluck
is a bit unnecessary since I could just have used regular subsetting, e.g. x[[1]]
and x[[length(x)]]
, but I’m doing this with the purrr cheat sheet in front of me, so pluck
was what came to my mind).
2nd attempt, this time using get_first_and_last
.
<-
solution2 %>%
amended str_extract_all(pattern = '\\d') %>%
map(get_first_and_last) %>%
map(str_flatten) %>%
map(as.integer) %>%
as_vector() %>%
sum()
solution2
[1] 56397
This is the correct solution!! 🥳🥳🥳
Onto the second part!
Part 2
Saw this one coming: now I need to tackle the digits spelled out as words 😬.
First (naive) attempt
IMHO the simplest approach is to write a function that converts digits spelled out in a string into their numeric form. Then, I can incorporate this function into my pipeline using a map
call, right before str_extract_all
.
<- amended[7]
case_with_spelled_digit case_with_spelled_digit
[1] "ninekkvkeight9three"
%>%
case_with_spelled_digit str_replace_all(
c(
"nine" = "9",
"eight" = "8",
"three" = "3"
) )
[1] "9kkvk893"
Working as expected.
Now let’s apply the function to the full data.
I’ll start by creating the replacement vector (although I’m pretty sure it’s already a part of some R package that’s slipping my mind right now).
<- c(
digits "one" = "1",
"two" = "2",
"three" = "3",
"four" = "4",
"five" = "5",
"six" = "6",
"seven" = "7",
"eight" = "8",
"nine" = "9"
)
<-
solution3 %>%
amended str_replace_all(pattern = digits) %>%
str_extract_all(pattern = '\\d') %>%
map(get_first_and_last) %>%
map(str_flatten) %>%
map(as.integer) %>%
as_vector() %>%
sum()
solution3
[1] 55266
The solution wasn’t right.
Second attempt: handling overlapping patterns
Time to take a closer look at the pipeline using a tibble
.
tibble(
original = amended,
new = original %>%
str_replace_all(pattern = digits),
digits = amended %>%
str_replace_all(pattern = digits) %>%
str_extract_all(pattern = '\\d') %>%
map(get_first_and_last) %>%
map(str_flatten) %>%
as_vector()
)
# A tibble: 1,000 × 3
original new digits
<chr> <chr> <chr>
1 nqninenmvnpsz874 nq9nmvnpsz874 94
2 8twofpmpxkvvdnpdnlpkhseven4ncgkb 82fpmpxkvvdnpdnlpkh74ncgkb 84
3 six8shdkdcdgseven8xczqrnnmthreecckfive 68shdkdcdg78xczqrnnm3cck5 65
4 qlcnz54dd75nine7jfnlfgz qlcnz54dd7597jfnlfgz 57
5 7vrdhggdkqbnltlgpkkvsdxn2mfpghkntzrhtjgtxr 7vrdhggdkqbnltlgpkkvsdxn2m… 72
6 cdhmktwo6kjqbprvfour8 cdhmk26kjqbprv48 28
7 ninekkvkeight9three 9kkvk893 93
8 ms9five71lrfpqxqlbj ms9571lrfpqxqlbj 91
9 9five9sevenldshqfgcnq 9597ldshqfgcnq 97
10 1one4seven 1147 17
# ℹ 990 more rows
Hypothesis: The issue might arise in instances like "zoneight234"
, where the spellings of two numbers are overlapping.
str_replace_all("zoneight234", pattern = digits)
[1] "z1ight234"
I’ll try to handle these cases to make sure that every spelled-out digit in the text is returned, even if they overlap.
Maybe I can get closer to that by using the stringi
package?
::stri_replace_all_regex("zoneight234", pattern = names(digits), replacement = digits, vectorize_all = FALSE) stringi
[1] "z1ight234"
This is not working.
Something else I could try:
- Using the
_first
and_last
functions from stringi to identify digits, whether they are numerically written or spelled out.
- Then, replacing the spelled-out digits that appear at either the beginning or end of the string.
<- "zone2344oneight"
problematic_string
<- "[0123456789]|one|two|three|four|five|six|seven|eight|nine"
all_digits_regex
<- c(
all_digits_vector names(digits),
0:9
)
::stri_extract_last_regex(
stringistr = problematic_string,
pattern = all_digits_vector
)
[1] "one" NA NA NA NA NA NA "eight" NA
[10] NA NA "2" "3" "4" NA NA NA NA
[19] NA
Okay, this method DOES return all the digits that appear in the string, but I end up losing details about where the digits are positioned, which is needed for identifying the first and last digits in each row.
More ideas:
- Using
str_localte_all
, astringr
function that allows matching each digit as an separate pattern (thus avoiding the overlap problem) and gives information about the POSITION of the pattern. Then use that info to find the first and the last match in each row.
Example:
::str_locate_all(
stringrstring = problematic_string,
pattern = all_digits_vector
1:8] )[
[[1]]
start end
[1,] 2 4
[2,] 9 11
[[2]]
start end
[[3]]
start end
[[4]]
start end
[[5]]
start end
[[6]]
start end
[[7]]
start end
[[8]]
start end
[1,] 11 15
- CRAZY IDEA!!! What if I reverse the string, and then look for first match of the reversed names of the digits??
After all, the only matches that matter for solving this puzzle are the first one and the last one. For the first match, overlapping is not a real problem because it causes the second digit not to match (example: “eight” in “oneight”, here “one” is always matched or detected).
Reversing the strings and their patterns could mirror this dynamic when matching the last digit.
<- "enin|thgie|neves|xis|evif|ruof|eerht|owt|eno|[0123456789]"
regex_reversed
<- c(
vector_reversed "eno" = "1",
"owt" = "2",
"eerht" = "3",
"ruof" = "4",
"evif" = "5",
"xis" = "6",
"neves" = "7",
"thgie" = "8",
"enin" = "9",
"1" = "1",
"2" = "2",
"3" = "3",
"4" = "4",
"5" = "5",
"6" = "6",
"7" = "7",
"8" = "8",
"9" = "9",
"0" = "0"
)
In this example, I should be able to match “eight” as the last string, despite it overlapping with “one”.
problematic_string
[1] "zone2344oneight"
My strategy is to invert it and then look for “thgie” 😂💀
<-
reversed_problematic_string ::stri_reverse(
stringi
problematic_string
) reversed_problematic_string
[1] "thgieno4432enoz"
<- str_extract(
found_reversed_string
reversed_problematic_string,"enin|thgie|neves|xis|evif|ruof|eerht|owt|eno|[0123456789]"
)
found_reversed_string
[1] "thgie"
After finding it, I can use the following code to get back the original number
vector_reversed[found_reversed_string]
thgie
"8"
It’s working. Now, let’s wrap this pipeline in a function to apply it to whole dataset:
<- "enin|thgie|neves|xis|evif|ruof|eerht|owt|eno|[0123456789]"
regex_reversed
<- "[0123456789]|one|two|three|four|five|six|seven|eight|nine"
all_digits_regex
<- c(
digits_reversed "eno" = "1",
"owt" = "2",
"eerht" = "3",
"ruof" = "4",
"evif" = "5",
"xis" = "6",
"neves" = "7",
"thgie" = "8",
"enin" = "9",
"1" = "1",
"2" = "2",
"3" = "3",
"4" = "4",
"5" = "5",
"6" = "6",
"7" = "7",
"8" = "8",
"9" = "9",
"0" = "0"
)
<- c(
digits "one" = "1",
"two" = "2",
"three" = "3",
"four" = "4",
"five" = "5",
"six" = "6",
"seven" = "7",
"eight" = "8",
"nine" = "9",
"1" = "1",
"2" = "2",
"3" = "3",
"4" = "4",
"5" = "5",
"6" = "6",
"7" = "7",
"8" = "8",
"9" = "9",
"0" = "0"
)
<- function(x) {
get_last_number <-
found_reversed_string ::stri_reverse(x) %>%
stringistr_extract(regex_reversed)
%>%
digits_reversed[found_reversed_string] set_names(NULL)
}
<- function(x) {
get_first_number <- str_extract(
found_string
x,
all_digits_regex
)
%>%
digits[found_string] set_names(NULL)
}get_first_number(problematic_string)
[1] "1"
get_last_number(problematic_string)
[1] "8"
It’s working as intended, now let’s generalise it:
tibble(
amended = amended,
first_digit = get_first_number(amended),
last_digit = get_last_number(amended),
combined = as.integer(str_c(first_digit, last_digit))
%>%
) pull(combined) %>%
sum()
[1] 55701
IT WORKED!! THIS WAS THE RIGHT ANSWER!!! 🥳🥳🥹🥹