Proposal: Add Floki.Doc #457

wojtekmach · 2023-05-19T09:25:10Z

Hi!

I maintain a tiny Floki wrapper called EasyHTML, which wraps nodes in a struct so that we can implement protocols and behaviours (such as Access, String.Chars and Inspect) on them. Here's an example:

Mix.install([:easyhtml])

html = """
<!doctype html>
<html>
<body>
  <p class="headline">Hello, World!</p>
</body>
</html>
"""

doc = EasyHTML.parse!(html)

doc
#=> #EasyHTML[<html><body><p class="headline">Hello, World!</p></body></html>]

doc["p.headline"]
#=> #EasyHTML[<p class="headline">Hello, World!</p>]

doc["#bad"]
#=> nil

to_string(doc)
#=> "Hello, World!"

I'd like to add a Floki.Doc struct and a Floki.Doc.parse!/1 function.
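As a rough illustration of the proposal, here is a minimal sketch of such a struct. It assumes the struct simply wraps the node list that Floki.parse_document/1 returns today (`{tag, attrs, children}` tuples plus text binaries). To stay runnable without Floki, the hypothetical `DocSketch` module only supports bare tag-name selectors and does its own text extraction; a real `Floki.Doc` would presumably delegate to `Floki.find/2` and `Floki.text/1` instead.

```elixir
defmodule DocSketch do
  # Hypothetical wrapper around a Floki-style node list:
  # {tag, attrs, children} tuples for elements, binaries for text.
  defstruct nodes: []

  @behaviour Access

  def new(nodes) when is_list(nodes), do: %__MODULE__{nodes: nodes}

  # doc["p"] expands to Access.get(doc, "p"), which calls fetch/2 here.
  # Only bare tag names are supported (a stand-in for Floki.find/2).
  @impl Access
  def fetch(%__MODULE__{nodes: nodes}, tag) when is_binary(tag) do
    case find(nodes, tag) do
      [] -> :error
      found -> {:ok, new(found)}
    end
  end

  # This sketch is read-only, so updates are rejected.
  @impl Access
  def get_and_update(_doc, _key, _fun), do: raise(ArgumentError, "DocSketch is read-only")

  @impl Access
  def pop(_doc, _key), do: raise(ArgumentError, "DocSketch is read-only")

  # Depth-first search returning matching elements in document order.
  defp find(nodes, tag) do
    Enum.flat_map(nodes, fn
      {^tag, _attrs, children} = node -> [node | find(children, tag)]
      {_other, _attrs, children} when is_list(children) -> find(children, tag)
      _text_or_other -> []
    end)
  end
end

defimpl String.Chars, for: DocSketch do
  # Concatenates all text nodes (a stand-in for Floki.text/1).
  def to_string(%DocSketch{nodes: nodes}), do: text(nodes)

  defp text(nodes) do
    Enum.map_join(nodes, fn
      {_tag, _attrs, children} when is_list(children) -> text(children)
      text when is_binary(text) -> text
      _comment_or_doctype -> ""
    end)
  end
end

doc = DocSketch.new([{"html", [], [{"body", [], [{"p", [{"class", "headline"}], ["Hello, World!"]}]}]}])
doc["p"].nodes   #=> [{"p", [{"class", "headline"}], ["Hello, World!"]}]
doc["table"]     #=> nil
to_string(doc)   #=> "Hello, World!"
```

Returning `:error` from `fetch/2` is what makes `Access.get/2` produce `nil` for a selector with no matches, which is the same behaviour as the `doc["#bad"]` example above.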

Feedback appreciated!


wojtekmach · 2023-05-19T13:15:18Z

@philss I remember we talked a little bit about it but I don't remember much. :) I think the main concern was that we obviously cannot return this from Floki.parse* functions, as it would be a major breaking change. I think we could solve this with a separate module.

If we go with the struct, I'm curious whether the Floki.attr and Floki.attribute functions would work on it, or whether we should have equivalents in the struct module.

Btw, is the distinction between document and fragment such that the former always contains exactly one root element? If so, the struct could have an attributes field, which would make accessing these super convenient. But then again, I'd guess working with fragments is more common. So maybe we need two different structs after all?
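One way to picture the two-struct option is sketched below. It assumes, as the question suggests, that a parsed document has exactly one root element, so the hypothetical `DocumentSketch` can copy the root's attributes into its own field, while the hypothetical `FragmentSketch` just keeps a list of nodes, because a fragment can have several roots or bare text.

```elixir
defmodule DocumentSketch do
  # Hypothetical document struct: exactly one root element (e.g. <html>),
  # so the root's attributes can be exposed directly on the struct.
  defstruct [:root, attributes: []]

  def new({tag, attrs, _children} = root) when is_binary(tag) do
    %__MODULE__{root: root, attributes: attrs}
  end
end

defmodule FragmentSketch do
  # Hypothetical fragment struct: any list of nodes, possibly several
  # sibling elements or bare text, so there is no single attribute set.
  defstruct nodes: []

  def new(nodes) when is_list(nodes), do: %__MODULE__{nodes: nodes}
end

doc = DocumentSketch.new({"html", [{"lang", "en"}], [{"body", [], []}]})
doc.attributes
#=> [{"lang", "en"}]

fragment = FragmentSketch.new([{"p", [], ["a"]}, "text", {"p", [], ["b"]}])
length(fragment.nodes)
#=> 3
```

The trade-off is the one raised in the question: if fragments are the common case, most callers would be holding the struct that has no `attributes` shortcut.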

Hey maybe I do remember parts of our earlier conversations. :)

philss · 2023-05-19T15:29:32Z

> I'd like to add a Floki.Doc struct and a Floki.Doc.parse!/1 function.
>
> I think the main concern was we obviously cannot return this from Floki.parse* functions as it would be a major breaking change.

@wojtekmach yeah, I think it's aligned with what we discussed. We wanted to avoid this breaking change, but I think in the future this "Doc.parse" could be the main API. I'm not sure if we discussed what the struct would be, but I imagine it would be the tree representation, like the one we have in Floki.HTMLTree. Is this what you are thinking?

> If we go with the struct, I'm curious whether Floki.attr and Floki.attribute functions would work on it or we should have equivalents on the struct module.

We would probably want to add support for the new struct on these functions.
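A minimal sketch of that approach: the existing list-based function gains one extra clause that unwraps the struct, so both the current raw tree and the new struct are accepted. `AttrSketch` and `AttrSketch.Doc` are hypothetical names, and the list clause only approximates `Floki.attribute/2` by reading the attribute from the given top-level elements. Because it iterates the attributes with a comprehension, it also works if attributes become maps, as proposed in #463.

```elixir
defmodule AttrSketch.Doc do
  # Hypothetical stand-in for the proposed Floki.Doc struct.
  defstruct nodes: []
end

defmodule AttrSketch do
  alias AttrSketch.Doc

  # Struct clause: unwrap and reuse the list-based logic below.
  def attribute(%Doc{nodes: nodes}, name), do: attribute(nodes, name)

  # Raw tree clause: collect `name` from the given top-level elements.
  # Iterating `attrs` yields {key, value} pairs for both lists of
  # {key, value} tuples and maps, so either representation works.
  def attribute(nodes, name) when is_list(nodes) do
    for {tag, attrs, _children} <- nodes,
        is_binary(tag),
        {^name, value} <- attrs,
        do: value
  end

  # A single element is treated as a one-element list.
  def attribute(node, name) when is_tuple(node), do: attribute([node], name)
end

nodes = [{"a", [{"href", "/one"}], ["one"]}, "text", {"a", %{"href" => "/two"}, ["two"]}]

AttrSketch.attribute(nodes, "href")
#=> ["/one", "/two"]

AttrSketch.attribute(struct(AttrSketch.Doc, nodes: nodes), "href")
#=> ["/one", "/two"]
```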

> Btw, is the distinction between document and fragment such that the former always contains exactly one root element?

Structurally speaking, yes. But semantically, a document is something whose root element is "<html>", and the specs say that we need a <!doctype html> as well (we are just ignoring this part today). Fragments don't have this restriction, but I'm not sure if we should have another struct for them.

Something that can help us if we go for two structs is the specs (they are too complex, so we shouldn't worry that much):

> Hey maybe I do remember parts of our earlier conversations. :)

:D

wojtekmach · 2023-05-19T16:43:31Z

Sorry, I wasn’t aware of the HTMLTree struct. I didn’t really look into the internals at all. 😅

viniciusmuller · 2023-05-19T23:04:27Z

In case this gets implemented, I would suggest naming it Floki.Document instead of Floki.Doc, since when I read this issue I thought it was something documentation-related.

wojtekmach · 2023-05-31T18:36:07Z

If, per #463, we have maps as attributes and we add an ~HTML sigil (as a macro), we'd get these map match semantics for free:

html = ~HTML"""
<p class="p1">foo</p>
<p class="p2">bar</p>
"""

# these two are equivalent
assert ~HTML[<p class="p2">bar</p>] = html[".p2"]
assert ~HTML[<p>bar</p>] = html[".p2"]

assert html[".p2"] == ~HTML[<p class="p2">bar</p>]

which is potentially very interesting for testing.
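The match comes "for free" because Elixir map patterns are partial: `%{}` matches any map, and `%{"class" => "p2"}` matches any map containing at least that key and value, whereas a list pattern has to match the whole list. The snippet below assumes the hypothetical `~HTML` sigil would expand, in pattern position, into a plain `{tag, attrs, children}` tuple with map attributes; the sigil itself is not implemented here.

```elixir
# Attributes as maps, as proposed in #463.
map_node = {"p", %{"class" => "p2"}, ["bar"]}

# Roughly what ~HTML[<p class="p2">bar</p>] and ~HTML[<p>bar</p>]
# could expand to as patterns: both match, because %{} matches any map.
{"p", %{"class" => "p2"}, ["bar"]} = map_node
{"p", %{}, ["bar"]} = map_node

# Attributes as a list of {name, value} tuples (Floki's default
# representation): [] only matches an element with no attributes at all.
list_node = {"p", [{"class", "p2"}], ["bar"]}

match?({"p", %{}, ["bar"]}, map_node)
#=> true
match?({"p", [], ["bar"]}, list_node)
#=> false
```

For tests this cuts both ways: a pattern like `~HTML[<p>bar</p>]` would also accept an element with extra attributes, whereas the `==` assertion in the example above requires the attributes to be identical.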

mischov · 2023-05-31T18:46:21Z

@wojtekmach This is pretty similar to how Meeseeks already works. https://github.com/mischov/meeseeks/blob/8ac9b48b6f8b1daae18f9b0773882cf83c094777/lib/meeseeks/document.ex#L26-L50

wojtekmach · 2023-05-31T19:33:15Z

Similar how?

FWIW, the EasyHTML wrapper mentioned at the beginning uses the "floki ast", i.e. the one returned from Floki.parse* functions. The querying-optimised one in Meeseeks is very interesting. I guess the point is that if we use a struct, we can treat the AST as an implementation detail and pick either!
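To make the "pick either" point concrete, here is a sketch of a second, querying-oriented representation: the nested tree is flattened into a map from integer ids to nodes, with parent and child ids stored on each node, loosely in the spirit of the Meeseeks document linked above. `FlatTree` and its field names are hypothetical and do not mirror Meeseeks' actual structs. A public struct that hides its contents could store either this or the nested Floki tree without callers noticing.

```elixir
defmodule FlatTree do
  # Hypothetical querying-oriented representation: every node gets an
  # integer id (assigned in document order) and is stored in one map,
  # with parent and child ids instead of nested children.

  def from_nodes(nodes) when is_list(nodes) do
    {root_ids, {index, _next_id}} = flatten(nodes, nil, {%{}, 0})
    %{roots: root_ids, nodes: index}
  end

  # Returns {ids_of_these_nodes, {index, next_free_id}}.
  defp flatten(nodes, parent, acc) do
    Enum.map_reduce(nodes, acc, fn node, {index, id} ->
      case node do
        {tag, attrs, children} when is_binary(tag) ->
          # Claim `id` for this element, then number its children after it.
          {child_ids, {index, next_id}} = flatten(children, id, {index, id + 1})

          element = %{type: :element, tag: tag, attrs: attrs, parent: parent, children: child_ids}
          {id, {Map.put(index, id, element), next_id}}

        text when is_binary(text) ->
          {id, {Map.put(index, id, %{type: :text, content: text, parent: parent}), id + 1}}

        other ->
          # Comments, doctypes, etc. are kept opaque in this sketch.
          {id, {Map.put(index, id, %{type: :other, data: other, parent: parent}), id + 1}}
      end
    end)
  end
end

flat = FlatTree.from_nodes([{"html", [], [{"body", [], [{"p", [{"class", "p1"}], ["foo"]}]}]}])
flat.roots
#=> [0]
flat.nodes[3]
#=> %{content: "foo", parent: 2, type: :text}
```

Finding a node's parent or children is then a map lookup rather than a tree walk, at the cost of building the index up front.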

mischov · 2023-05-31T19:36:31Z

Similar in that it already implements the output of both parsing and selection in terms of structs (and provides a nice toolkit for working with those structs), meaning the building blocks are in place for something like EasyHTML.

wojtekmach · 2023-05-31T19:38:35Z

Ah, makes sense!

mischov · 2023-05-31T19:42:17Z

It also goes beyond a single Node struct and has a top level Document struct, as well as Comment, Data, Doctype, Element, ProcessingInstruction, and Text structs, which is something else to consider.
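A short sketch of what typed node structs buy: code can dispatch on the struct type instead of on tuple shapes such as `{tag, attrs, children}` versus `{:comment, text}`. Only three of the listed node types are modelled, and `NodeSketch` with its field names is hypothetical, not Meeseeks' or Floki's API.

```elixir
defmodule NodeSketch.Element do
  defstruct [:tag, attrs: %{}, children: []]
end

defmodule NodeSketch.Text do
  defstruct [:content]
end

defmodule NodeSketch.Comment do
  defstruct [:content]
end

defmodule NodeSketch do
  alias NodeSketch.{Comment, Element, Text}

  # Text content of a node: each node type gets its own clause, so an
  # unsupported node type fails loudly with a FunctionClauseError.
  def text(%Element{children: children}), do: Enum.map_join(children, &text/1)
  def text(%Text{content: content}), do: content
  def text(%Comment{}), do: ""
end

alias NodeSketch.{Comment, Element, Text}

# struct/2 keeps this runnable as a single .exs file.
paragraph =
  struct(Element,
    tag: "p",
    children: [
      struct(Text, content: "Hello, "),
      struct(Comment, content: "ignored"),
      struct(Element, tag: "b", children: [struct(Text, content: "World")])
    ]
  )

NodeSketch.text(paragraph)
#=> "Hello, World"
```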

wojtekmach added the Feature label May 19, 2023

philss mentioned this issue May 31, 2023: Allow option to parse attributes as maps #463 (Closed)

philss mentioned this issue Feb 9, 2024: Is it possible to get line/column number of a tag? #532 (Open)
