AnswerDotAI
diff --git a/.nojekyll b/.nojekyll
diff --git a/CHANGELOG-commonmark.md b/CHANGELOG-commonmark.md
Lines changed: 408 additions & 0 deletions
diff --git a/CHANGELOG.html b/CHANGELOG.html
Lines changed: 1253 additions & 0 deletions
diff --git a/download.html b/download.html
Lines changed: 824 additions & 0 deletions
diff --git a/download.html.md b/download.html.md
Lines changed: 249 additions & 0 deletions
@@ -0,0 +1,249 @@
+# Download helpers
+
+
+<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->
+
+``` python
+from IPython.display import Markdown,HTML
+from fastcore.test import *
+```
+
+------------------------------------------------------------------------
+
+<a
+href="https://github.com/AnswerDotAI/toolslm/blob/main/toolslm/download.py#L14"
+target="_blank" style="float:right; font-size:smaller">source</a>
+
+### clean_md
+
+``` python
+
+def clean_md(
+    text, rm_comments:bool=True, rm_details:bool=True
+):
+
+```
+
+*Remove comments and `<details>` sections from `text`*
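As a rough illustration of what this docstring describes, here is a minimal sketch that strips HTML comments and `<details>` blocks with regular expressions. The helper name `strip_comments_and_details` and the exact patterns are assumptions made for this example; they are not taken from `toolslm/download.py`.

```python
import re

# Hypothetical helper for illustration only; not the toolslm implementation.
def strip_comments_and_details(text, rm_comments=True, rm_details=True):
    if rm_comments:
        # Non-greedy so each comment is removed on its own; DOTALL lets a comment span lines.
        text = re.sub(r'<!--.*?-->', '', text, flags=re.DOTALL)
    if rm_details:
        # Also matches opening tags that carry attributes, e.g. <details open>.
        text = re.sub(r'<details\b[^>]*>.*?</details>', '', text, flags=re.DOTALL)
    return text

sample = "Intro\n<!-- hidden note -->\n<details>\nlong log\n</details>\nOutro"
cleaned = strip_comments_and_details(sample)
kept_details = strip_comments_and_details(sample, rm_details=False)
```

Passing `rm_details=False` in this sketch keeps the `<details>` block while still dropping the comment, mirroring the two independent flags in the signature above.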
+
+------------------------------------------------------------------------
+
+<a
+href="https://github.com/AnswerDotAI/toolslm/blob/main/toolslm/download.py#L22"
+target="_blank" style="float:right; font-size:smaller">source</a>
+
+### read_md
+
+``` python
+
+def read_md(
+    url, rm_comments:bool=True, rm_details:bool=True, params:QueryParamTypes | None=None,
+    headers:HeaderTypes | None=None, cookies:CookieTypes | None=None, auth:AuthTypes | None=None,
+    proxy:ProxyTypes | None=None, follow_redirects:bool=False, verify:ssl.SSLContext | str | bool=True,
+    timeout:TimeoutTypes=Timeout(timeout=5.0), trust_env:bool=True
+):
+
+```
+
+*Read text from `url` and clean with `clean_md`*
+
+``` python
+mdurl = 'https://claudette.answer.ai/index.html.md'
+md = read_md(mdurl)
+# Markdown(md)
+```
+
+------------------------------------------------------------------------
+
+<a
+href="https://github.com/AnswerDotAI/toolslm/blob/main/toolslm/download.py#L27"
+target="_blank" style="float:right; font-size:smaller">source</a>
+
+### html2md
+
+``` python
+
+def html2md(
+    s:str, ignore_links:bool=True
+):
+
+```
+
+*Convert `s` from HTML to markdown*
+
+------------------------------------------------------------------------
+
+<a
+href="https://github.com/AnswerDotAI/toolslm/blob/main/toolslm/download.py#L37"
+target="_blank" style="float:right; font-size:smaller">source</a>
+
+### read_html
+
+``` python
+
+def read_html(
+    url, # URL to read
+    sel:NoneType=None, # Read only outerHTML of CSS selector `sel`
+    rm_comments:bool=True, # Removes HTML comments
+    rm_details:bool=True, # Removes `<details>` tags
+    multi:bool=False, # Get all matches to `sel` or first one
+    wrap_tag:NoneType=None, # If multi, each selection wrapped with <wrap_tag>content</wrap_tag>
+    ignore_links:bool=True
+):
+
+```
+
+*Get `url`, optionally selecting CSS selector `sel`, and convert to
+clean markdown*
+
+``` python
+# test single class selector
+listings = read_html('https://www.answer.ai/', sel='.listing-description')
+assert len(listings) < 500
+
+# Test multi class selector
+listings = read_html('https://www.answer.ai/', sel='.listing-description', multi=True)
+assert len(listings) > 1000 # multi=True returns every match, so the output is much longer
+
+# Test multi_wrap_tag
+listings = read_html('https://www.answer.ai/', sel='.listing-description', multi=True, wrap_tag='document')
+assert '<document>' in listings and '</document>' in listings
+```
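The `wrap_tag` parameter is described as wrapping each selection in `<wrap_tag>content</wrap_tag>` when `multi` is set. A minimal sketch of that wrapping step follows, assuming the selections have already been converted to markdown strings. The helper `wrap_selections` and the newline separator are assumptions for illustration, not the library's code.

```python
# Hypothetical helper for illustration only; the join separator is an assumption.
def wrap_selections(selections, wrap_tag=None):
    if wrap_tag is None:
        # No wrapping requested: just concatenate the selections.
        return "\n".join(selections)
    # Surround every selection with its own opening and closing tag.
    return "\n".join(f"<{wrap_tag}>{s}</{wrap_tag}>" for s in selections)

parts = ["First listing", "Second listing"]
wrapped = wrap_selections(parts, wrap_tag="document")
plain = wrap_selections(parts)
```

Wrapping each match separately keeps the boundaries between selections visible to whatever consumes the text.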
+
+``` python
+read_html('https://www.answer.ai/', sel='.listing-description', ignore_links=False)
+```
+
+    '[ How I created a book chapter from video transcripts with SolveIt ](./posts/2025-10-13-video-to-doc.html)\n\n'
+
+``` python
+# test tag css selectors
+assert len(read_html('https://www.answer.ai/', sel='div.listing-description', multi=True)) > 1000
+assert len(read_html('https://www.answer.ai/', sel='div', multi=True)) > 1000
+```
+
+``` python
+htmlurl = 'https://hypermedia.systems/hypermedia-a-reintroduction/'
+hmd = read_html(htmlurl)
+assert len(hmd) > 100
+# Markdown(hmd)
+```
+
+------------------------------------------------------------------------
+
+<a
+href="https://github.com/AnswerDotAI/toolslm/blob/main/toolslm/download.py#L59"
+target="_blank" style="float:right; font-size:smaller">source</a>
+
+### get_llmstxt
+
+``` python
+
+def get_llmstxt(
+    url, optional:bool=False, n_workers:NoneType=None
+):
+
+```
+
+*Get llms.txt file from `url` and expand it with `llms_txt.create_ctx()`*
+
+``` python
+# print(get_llmstxt('https://llmstxt.org/llms.txt'))
+```
+
+------------------------------------------------------------------------
+
+<a
+href="https://github.com/AnswerDotAI/toolslm/blob/main/toolslm/download.py#L68"
+target="_blank" style="float:right; font-size:smaller">source</a>
+
+### split_url
+
+``` python
+
+def split_url(
+    url
+):
+
+```
+
+*Split `url` into base, path, and file name, normalising the path to ‘/’ if
+both path and file name are empty*
+
+``` python
+urls = ('https://claudette.answer.ai/path/', 'https://claudette.answer.ai/', 'https://llmstxt.org', 'https://llmstxt.org/')
+
+[split_url(o) for o in urls]
+```
+
+    [('https://claudette.answer.ai', '', '/path'),
+     ('https://claudette.answer.ai', '/', ''),
+     ('https://llmstxt.org', '/', ''),
+     ('https://llmstxt.org', '/', '')]
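One way to obtain tuples like the ones listed above is to strip trailing slashes, parse the URL with `urllib.parse`, and split the path at its last `/`. The sketch below is only an assumption that happens to reproduce the shown output; the function name `split_url_sketch` is hypothetical and the real `split_url` may differ.

```python
from urllib.parse import urlparse

# Hypothetical re-creation for illustration; written to match the output shown above.
def split_url_sketch(url):
    parsed = urlparse(url.strip('/'))
    base = f"{parsed.scheme}://{parsed.netloc}"
    # Split at the last '/', keeping the separator attached to the file name.
    path, sep, fname = parsed.path.rpartition('/')
    fname = sep + fname
    # With nothing after the host, report the path as '/'.
    if not path and not fname:
        path = '/'
    return base, path, fname

urls = ('https://claudette.answer.ai/path/', 'https://claudette.answer.ai/',
        'https://llmstxt.org', 'https://llmstxt.org/')
results = [split_url_sketch(o) for o in urls]
```

Note that in these tuples the second element is the path and the third is the file name, so `'/path'` in the first row is treated as the file name.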
+
+------------------------------------------------------------------------
+
+<a
+href="https://github.com/AnswerDotAI/toolslm/blob/main/toolslm/download.py#L84"
+target="_blank" style="float:right; font-size:smaller">source</a>
+
+### find_docs
+
+``` python
+
+def find_docs(
+    url
+):
+
+```
+
+*If available, return LLM-friendly llms.txt context or markdown file
+location from `url`*
+
+``` python
+fl_url = 'https://answerdotai.github.io/fastlite'
+```
+
+``` python
+find_docs(fl_url)
+```
+
+    'https://answerdotai.github.io/fastlite/llms.txt'
+
+``` python
+for o in urls: print(find_docs(o))
+```
+
+    https://claudette.answer.ai/llms.txt
+    https://claudette.answer.ai/llms.txt
+    https://llmstxt.org/llms.txt
+    https://llmstxt.org/llms.txt
+
+``` python
+suffixes = ["/", "/tmp", "/tmp/tmp/"]
+for suff in suffixes:
+    for o in urls:  test_eq(find_docs(o), find_docs(o+suff))
+
+test_eq(find_docs("https://github.com"), "https://github.com/llms.txt")
+test_eq(find_docs("https://github.com/AnswerDotAI"), "https://github.com/llms.txt")
+test_eq(find_docs("https://github.com/AnswerDotAI/"), "https://github.com/llms.txt")
+```
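The tests above suggest that the lookup falls back to parent paths until it reaches the site root. A minimal sketch of that idea follows, limited to `llms.txt` candidates and with the network check injected as a callable so it runs offline. The names `candidate_llms_urls`, `find_llms_txt`, and `exists` are hypothetical, and the real `find_docs` may also consider markdown files and other locations.

```python
from urllib.parse import urlparse

# Hypothetical helpers for illustration only; not the toolslm implementation.
def candidate_llms_urls(url):
    "Yield an llms.txt URL for `url` and for each parent path, deepest first."
    parsed = urlparse(url.strip('/'))
    base = f"{parsed.scheme}://{parsed.netloc}"
    segments = [s for s in parsed.path.split('/') if s]
    for depth in range(len(segments), -1, -1):
        prefix = '/'.join(segments[:depth])
        yield f"{base}/{prefix}/llms.txt" if prefix else f"{base}/llms.txt"

def find_llms_txt(url, exists):
    "Return the first candidate for which `exists(candidate)` is true, else None."
    return next((c for c in candidate_llms_urls(url) if exists(c)), None)

# Stand-in for an HTTP check; a real check would request the URL and inspect the status.
published = {"https://github.com/llms.txt"}
candidates = list(candidate_llms_urls("https://github.com/AnswerDotAI/"))
found = find_llms_txt("https://github.com/AnswerDotAI/", published.__contains__)
```

Injecting `exists` keeps the path-walking logic separate from the HTTP layer, which makes the fallback order easy to check without network access.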
+
+------------------------------------------------------------------------
+
+<a
+href="https://github.com/AnswerDotAI/toolslm/blob/main/toolslm/download.py#L104"
+target="_blank" style="float:right; font-size:smaller">source</a>
+
+### read_docs
+
+``` python
+
+def read_docs(
+    url, optional:bool=False, n_workers:NoneType=None, rm_comments:bool=True, rm_details:bool=True
+):
+
+```
+
+*If available, return LLM-friendly llms.txt context or markdown file
+response for `url`*