Browser-like Reader Mode with text-only output

Background: Reader Mode, as seen in Safari and other browsers, extracts the main content of article-based web pages using sophisticated heuristics and displays it in a very readable font.

All navigation, headers, footers, and other fluff are removed. The mode only works with "articles", i.e., pages that have a "main content" such as a news article or a scientific paper.

The question: Is there an open source implementation of this for terminals (i.e., text-only)? Or, alternatively, is there another way to accomplish the same thing?

Example: This article from The New York Times should output like so:

$ utility --reader-mode https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

SEND US YOUR IDEAS FOR WHAT TO DO DURING THE POLAR VORTEX. WE
WANT TO HEAR FROM YOU.

It’s so cold in much of the Midwest today that you could get
frostbite within five minutes once you step outside. If you’re
living through it indoors, give us your tips.

A commuter during an extremely light morning rush hour in Chicago
on Wednesday. Businesses and schools have closed as the city
copes with record low temperatures.

Across the Midwest, where wind chills were minus 51 in
Minneapolis and minus 45 in Chicago, the risks of going outside
on Wednesday were dire. So, many people simply didn’t bother,
while others took a chance to briefly experience the coldest
weather in a generation.

Whether you’re an adventurer or a hibernator, tell us your
recommendations for staying warm and busy. What are you cooking
or binge-watching? What board games are you playing? If you’re
venturing outside, what are you doing to stay safe? (Experts warn
that even a short time in the extreme cold can be very
dangerous.) How many layers of clothing are you wearing, and
which special hats and gloves are necessary? Send us your photos
and your stories.

asked 20 hours ago by forthrin; edited 19 mins ago

determining the "main content" seems to me to be a tricky problem to solve
– Jeff Schaller, 15 hours ago

Yes. This is "solved" in best effort with various implementations of "Reader Mode" using heuristics. So it would have to be a text-only port of that, or something similar. google.com/search?q=reader+mode+source+code
– forthrin, 15 hours ago

terminal browser


2 Answers

The comment about "navigation content" is addressed by the -nolist option, e.g.,

lynx -nolist -dump www.google.com > file.txt

which shows no links, etc:

$ lynx -nolist -dump www.google.com > file.txt
$ cat file.txt

   Search Images Maps Play YouTube News Gmail Drive More »
   Web History | Settings | Sign in

   Google

     _______________________________________________________
     Google Search  I'm Feeling Lucky                          Advanced search
                                                               Language tools

   Advertising Programs       Business  Solutions       +Google     About
   Google

                         © 2019 - Privacy - Terms

w3m gives something similar, without the option:

$ w3m -dump https://www.google.com
Search Images Maps Play YouTube News Gmail Drive More >>
Web History | Settings | Sign in

                                    Google

           [                                                         ] Advanced
                                                                       searchLanguage
                       [Google Search][I'm Feeling Lucky]              tools

           Advertising ProgramsBusiness Solutions+GoogleAbout Google

                          (C) 2019 - Privacy - Terms

links2 output looks much like w3m's (noting the missing space before About):

$ links2 -dump www.google.com
   Search Images Maps Play YouTube News Gmail Drive More >>========(97,1) 31% ==
   Web History | Settings | Sign in
                                     Google

    __________________________________________________________    Advanced
              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage
                                                                  tools

           Advertising ProgramsBusiness Solutions+GoogleAbout Google

                           (c) 2019 - Privacy - Terms

$ links2 -dump www.google.com >file.txt
$ cat file.txt
   Search Images Maps Play YouTube News Gmail Drive More >>
   Web History | Settings | Sign in
                                     Google

    __________________________________________________________    Advanced
              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage
                                                                  tools

           Advertising ProgramsBusiness Solutions+GoogleAbout Google

                           (c) 2019 - Privacy - Terms

Oddly enough, links2 also prints a progress indicator when the dump goes directly to the terminal, which is not a good feature. elinks apparently only dumps the format with "navigation content" (YMMV).

From further comments, it turns out that OP is interested in something which could render the contents of a given division on the page. Comparing the sizes of the source and dump for that page gives some clues:



      Size    Buffer name          Contents
      ------- -------------------- ----------------------------------------------------------------------------------------
   0# 267624  [!lynx -source ht-1] !lynx -source https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html
   1  5475    [!lynx -dump -nolis] !lynx -dump -nolist https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

This shows that the dump is about 2% of the size of the source (5475 versus 267624). Most of the page is non-informational, and the text browsers do show the informational part. But the requested division is in a two-line chunk that looks like this (only the beginning is shown; the first line actually has 62265 characters):

<div id="app"><div class="css-v89234 e3w10z60"><div><div><div class="css-13lpfd6 e1nre7570"><header class="css-1bymuyk e1>

<script>window.__preloadedData = {"initialState":{"Article:QXJ0aWNsZTpueXQ6Ly9hcnRpY2xlLzBhODc0MTcxLWM0MjEtNWRjOS1hN2IzLW>

The first line holds the article text (plus a lot of markup). Offhand, the second line is probably the script that the GUI browsers detect to show the article. None of the above-mentioned text browsers has a feature for showing just a given <div>...</div>, or for interpreting a script in that manner. These articles mention the absence of a standard URI for reader mode in several GUI browsers:

Web Reading Mode: The non-standard rendering mode

Web Reading Mode: A bad reading experience
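
Since none of the text browsers above can show just a given <div>...</div>, one possible workaround is to cut that element out of the page source before any text rendering. What follows is a minimal sketch using only Python's standard library; it is not taken from the discussion above. The class ElementText, the function text_of_element, and the sample markup are made up for illustration; only the id "app" comes from the excerpt of the page source. It assumes reasonably well-nested markup (an unclosed <p> or <li> would throw off the depth counting), and whether id="app" isolates just the article on the real page has not been checked.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not affect the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose content is never shown to the reader.
HIDDEN_TAGS = {"script", "style"}
# Elements after which a line break is emitted (an illustrative, partial list).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li",
              "header", "section", "article", "blockquote", "figcaption"}


class ElementText(HTMLParser):
    """Collect the text inside the first element carrying a given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # nesting depth inside the target; 0 means outside
        self.hidden = 0     # nesting depth inside script/style within the target
        self.done = False   # set once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif not self.done and dict(attrs).get("id") == self.target_id:
            self.depth = 1
        if self.depth and tag in HIDDEN_TAGS:
            self.hidden += 1

    def handle_endtag(self, tag):
        if not self.depth or tag in VOID_TAGS:
            return
        if tag in HIDDEN_TAGS and self.hidden:
            self.hidden -= 1
        if tag in BLOCK_TAGS:
            self.chunks.append("\n")
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth and not self.hidden:
            self.chunks.append(data)


def text_of_element(html, target_id):
    """Return the visible text of the element with id=target_id, one block per line."""
    parser = ElementText(target_id)
    parser.feed(html)
    parser.close()
    lines = (" ".join(line.split()) for line in "".join(parser.chunks).splitlines())
    return "\n".join(line for line in lines if line)


# Invented sample markup, loosely shaped like the excerpt above.
sample = """<html><body>
<nav><a href="/">Home</a> <a href="/world">World</a></nav>
<div id="app"><header><h1>Polar vortex tips</h1></header>
<div><p>It is <b>very</b> cold.</p><p>Stay indoors.</p></div></div>
<script>window.__preloadedData = {};</script>
<footer>Privacy - Terms</footer>
</body></html>"""

print(text_of_element(sample, "app"))
```

On the sample, only the heading and the two paragraphs inside the id="app" division are printed; the navigation links, the script, and the footer are left out. In practice the input could be the saved output of lynx -source for the page in question.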

answered 10 hours ago by Thomas Dickey; edited 8 hours ago

Thanks for sharing! I've updated the question a bit to point out that the target for such use is article pages, where the objective is to extract the article itself. Have a look at the posting again and see if you can help.
– forthrin, 9 hours ago

-1

Does this satisfy your requirement? (From https://stackoverflow.com/questions/12422289/bash-command-to-convert-html-page-to-a-text-file )

lynx --dump www.google.com > file.txt

answered 18 hours ago by VBB (new contributor)

Nope. This dumps a ton of navigation links, e.g., lynx -dump https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html. A solution should strip away all navigation and fluff, and leave ONLY the MAIN CONTENT, like Reader Mode in a browser does.
– forthrin, 16 hours ago
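
To make the requirement in this comment concrete, here is a very crude stand-in for such a filter, written with Python's standard library only. It is not how any browser's Reader Mode works and it was not proposed in this thread: it simply skips everything inside elements that usually hold navigation, keeps <p> paragraphs above a minimum length, and wraps them for the terminal. The names ParagraphCollector and reader_text, the list of skipped tags, the 40-character threshold, the 66-column width, and the sample markup are all assumptions made for illustration.

```python
from html.parser import HTMLParser
import textwrap

# Assumption: these elements hold navigation or other non-article material.
SKIP_TAGS = {"nav", "header", "footer", "aside", "form", "script", "style"}


class ParagraphCollector(HTMLParser):
    """Collect the text of <p> elements that sit outside SKIP_TAGS subtrees."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0        # > 0 while inside a skipped subtree
        self.in_paragraph = False
        self.current = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "p" and not self.skip_depth:
            self.finish_paragraph()   # an unclosed <p> ends where the next begins
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth:
            self.skip_depth -= 1
        elif tag == "p":
            self.finish_paragraph()

    def handle_data(self, data):
        if self.in_paragraph and not self.skip_depth:
            self.current.append(data)

    def finish_paragraph(self):
        text = " ".join("".join(self.current).split())
        if text:
            self.paragraphs.append(text)
        self.current = []
        self.in_paragraph = False


def reader_text(html, min_chars=40, width=66):
    """Return paragraphs of at least min_chars characters, wrapped to width."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser.finish_paragraph()      # pick up a trailing unclosed paragraph
    kept = [p for p in parser.paragraphs if len(p) >= min_chars]
    return "\n\n".join(textwrap.fill(p, width=width) for p in kept)


# Invented sample page; the two article sentences are quoted from the question.
sample = """<body>
<nav><p>Sections: World, U.S., Politics, Business, Opinion, Tech, Science</p></nav>
<article>
<p>It's so cold in much of the Midwest today that you could get frostbite
within five minutes once you step outside.</p>
<p>Share</p>
<p>Whether you're an adventurer or a hibernator, tell us your
recommendations for staying warm and busy.</p>
</article>
<footer><p>Privacy - Terms - Site Map - Help - Subscriptions - Contact Us</p></footer>
</body>"""

print(reader_text(sample))
```

On the sample this keeps the two article paragraphs and drops the navigation, the short "Share" paragraph, and the footer. Real pages are far messier, which is why actual Reader Mode implementations rely on more elaborate heuristics.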

add a comment |

Your Answer

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "106"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f497911%2fbrowser-like-reader-mode-with-text-only-output%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

The comment about "navigation content" is addressed by the -nolist option, e.g.,

lynx -nolist -dump www.google.com > file.txt

which shows no links, etc:

$ lynx -nolist -dump www.google.com > file.txt

$ cat file.txt 



   Search Images Maps Play YouTube News Gmail Drive More »

   Web History | Settings | Sign in



   Google



     _______________________________________________________

     Google Search  I'm Feeling Lucky                          Advanced search

                                                               Language tools



   Advertising Programs       Business  Solutions       +Google     About

   Google



                         © 2019 - Privacy - Terms

w3m gives something similar, without the option:

$ w3m -dump https://www.google.com

Search Images Maps Play YouTube News Gmail Drive More >>

Web History | Settings | Sign in



                                    Google



           [                                                         ] Advanced

                                                                       searchLanguage

                       [Google Search][I'm Feeling Lucky]              tools



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                          (C) 2019 - Privacy - Terms

links2 output looks much like w3m's (noting the missing space before About):

$ links2 -dump www.google.com                                          

   Search Images Maps Play YouTube News Gmail Drive More >>========(97,1) 31% ==

   Web History | Settings | Sign in                                             

                                     Google



    __________________________________________________________    Advanced       

              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage 

                                                                  tools          



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                           (c) 2019 - Privacy - Terms



$ links2 -dump www.google.com >file.txt 

$ cat file.txt 

   Search Images Maps Play YouTube News Gmail Drive More >>

   Web History | Settings | Sign in

                                     Google



    __________________________________________________________    Advanced       

              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage 

                                                                  tools          



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                           (c) 2019 - Privacy - Terms

(oddly enough, it also prints progress if the dump goes directly to the terminal—not a good feature)
and elinks apparently only dumps the format with "navigation content" (ymmv).



      Size    Buffer name          Contents

      ------- -------------------- ----------------------------------------------------------------------------------------

   0# 267624  [!lynx -source ht-1] !lynx -source https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

   1  5475    [!lynx -dump -nolis] !lynx -dump -nolist https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

<div id="app"><div class="css-v89234 e3w10z60"><div><div><div class="css-13lpfd6 e1nre7570"><header class="css-1bymuyk e1>

<script>window.__preloadedData = {"initialState":{"Article:QXJ0aWNsZTpueXQ6Ly9hcnRpY2xlLzBhODc0MTcxLWM0MjEtNWRjOS1hN2IzLW>

Web Reading Mode: The non-standard rendering mode

Web Reading Mode: A bad reading experience

edited 8 hours ago

answered 10 hours ago

Thomas Dickey

52.7k596170

Thanks for sharing! I've updated the question a bit to point out that the target for such use is article pages, where the objective is to extract the article itself. Have a look at the posting again and see if you can help.

– forthrin
9 hours ago

add a comment |

The comment about "navigation content" is addressed by the -nolist option, e.g.,

lynx -nolist -dump www.google.com > file.txt

which shows no links, etc:

$ lynx -nolist -dump www.google.com > file.txt

$ cat file.txt 



   Search Images Maps Play YouTube News Gmail Drive More »

   Web History | Settings | Sign in



   Google



     _______________________________________________________

     Google Search  I'm Feeling Lucky                          Advanced search

                                                               Language tools



   Advertising Programs       Business  Solutions       +Google     About

   Google



                         © 2019 - Privacy - Terms

w3m gives something similar, without the option:

$ w3m -dump https://www.google.com

Search Images Maps Play YouTube News Gmail Drive More >>

Web History | Settings | Sign in



                                    Google



           [                                                         ] Advanced

                                                                       searchLanguage

                       [Google Search][I'm Feeling Lucky]              tools



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                          (C) 2019 - Privacy - Terms

links2 output looks much like w3m's (noting the missing space before About):

$ links2 -dump www.google.com                                          

   Search Images Maps Play YouTube News Gmail Drive More >>========(97,1) 31% ==

   Web History | Settings | Sign in                                             

                                     Google



    __________________________________________________________    Advanced       

              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage 

                                                                  tools          



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                           (c) 2019 - Privacy - Terms



$ links2 -dump www.google.com >file.txt 

$ cat file.txt 

   Search Images Maps Play YouTube News Gmail Drive More >>

   Web History | Settings | Sign in

                                     Google



    __________________________________________________________    Advanced       

              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage 

                                                                  tools          



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                           (c) 2019 - Privacy - Terms

(oddly enough, it also prints progress if the dump goes directly to the terminal—not a good feature)
and elinks apparently only dumps the format with "navigation content" (ymmv).



      Size    Buffer name          Contents

      ------- -------------------- ----------------------------------------------------------------------------------------

   0# 267624  [!lynx -source ht-1] !lynx -source https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

   1  5475    [!lynx -dump -nolis] !lynx -dump -nolist https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

<div id="app"><div class="css-v89234 e3w10z60"><div><div><div class="css-13lpfd6 e1nre7570"><header class="css-1bymuyk e1>

<script>window.__preloadedData = {"initialState":{"Article:QXJ0aWNsZTpueXQ6Ly9hcnRpY2xlLzBhODc0MTcxLWM0MjEtNWRjOS1hN2IzLW>

Web Reading Mode: The non-standard rendering mode

Web Reading Mode: A bad reading experience

edited 8 hours ago

answered 10 hours ago

Thomas Dickey

52.7k596170

Thanks for sharing! I've updated the question a bit to point out that the target for such use is article pages, where the objective is to extract the article itself. Have a look at the posting again and see if you can help.

– forthrin
9 hours ago

add a comment |

The comment about "navigation content" is addressed by the -nolist option, e.g.,

lynx -nolist -dump www.google.com > file.txt

which shows no links, etc:

$ lynx -nolist -dump www.google.com > file.txt

$ cat file.txt 



   Search Images Maps Play YouTube News Gmail Drive More »

   Web History | Settings | Sign in



   Google



     _______________________________________________________

     Google Search  I'm Feeling Lucky                          Advanced search

                                                               Language tools



   Advertising Programs       Business  Solutions       +Google     About

   Google



                         © 2019 - Privacy - Terms

w3m gives something similar, without the option:

$ w3m -dump https://www.google.com

Search Images Maps Play YouTube News Gmail Drive More >>

Web History | Settings | Sign in



                                    Google



           [                                                         ] Advanced

                                                                       searchLanguage

                       [Google Search][I'm Feeling Lucky]              tools



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                          (C) 2019 - Privacy - Terms

links2 output looks much like w3m's (noting the missing space before About):

$ links2 -dump www.google.com                                          

   Search Images Maps Play YouTube News Gmail Drive More >>========(97,1) 31% ==

   Web History | Settings | Sign in                                             

                                     Google



    __________________________________________________________    Advanced       

              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage 

                                                                  tools          



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                           (c) 2019 - Privacy - Terms



$ links2 -dump www.google.com >file.txt 

$ cat file.txt 

   Search Images Maps Play YouTube News Gmail Drive More >>

   Web History | Settings | Sign in

                                     Google



    __________________________________________________________    Advanced       

              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage 

                                                                  tools          



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                           (c) 2019 - Privacy - Terms

(oddly enough, it also prints progress if the dump goes directly to the terminal—not a good feature)
and elinks apparently only dumps the format with "navigation content" (ymmv).



      Size    Buffer name          Contents

      ------- -------------------- ----------------------------------------------------------------------------------------

   0# 267624  [!lynx -source ht-1] !lynx -source https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

   1  5475    [!lynx -dump -nolis] !lynx -dump -nolist https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

<div id="app"><div class="css-v89234 e3w10z60"><div><div><div class="css-13lpfd6 e1nre7570"><header class="css-1bymuyk e1>

<script>window.__preloadedData = {"initialState":{"Article:QXJ0aWNsZTpueXQ6Ly9hcnRpY2xlLzBhODc0MTcxLWM0MjEtNWRjOS1hN2IzLW>

Web Reading Mode: The non-standard rendering mode

Web Reading Mode: A bad reading experience

edited 8 hours ago

answered 10 hours ago

Thomas Dickey

52.7k596170

The comment about "navigation content" is addressed by the -nolist option, e.g.,

lynx -nolist -dump www.google.com > file.txt

which shows no links, etc:

$ lynx -nolist -dump www.google.com > file.txt

$ cat file.txt 



   Search Images Maps Play YouTube News Gmail Drive More »

   Web History | Settings | Sign in



   Google



     _______________________________________________________

     Google Search  I'm Feeling Lucky                          Advanced search

                                                               Language tools



   Advertising Programs       Business  Solutions       +Google     About

   Google



                         © 2019 - Privacy - Terms

w3m gives something similar, without the option:

$ w3m -dump https://www.google.com

Search Images Maps Play YouTube News Gmail Drive More >>

Web History | Settings | Sign in



                                    Google



           [                                                         ] Advanced

                                                                       searchLanguage

                       [Google Search][I'm Feeling Lucky]              tools



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                          (C) 2019 - Privacy - Terms

links2 output looks much like w3m's (noting the missing space before About):

$ links2 -dump www.google.com                                          

   Search Images Maps Play YouTube News Gmail Drive More >>========(97,1) 31% ==

   Web History | Settings | Sign in                                             

                                     Google



    __________________________________________________________    Advanced       

              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage 

                                                                  tools          



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                           (c) 2019 - Privacy - Terms



$ links2 -dump www.google.com >file.txt 

$ cat file.txt 

   Search Images Maps Play YouTube News Gmail Drive More >>

   Web History | Settings | Sign in

                                     Google



    __________________________________________________________    Advanced       

              [ Google Search ] [ I'm Feeling Lucky ]             searchLanguage 

                                                                  tools          



           Advertising ProgramsBusiness Solutions+GoogleAbout Google



                           (c) 2019 - Privacy - Terms

(oddly enough, it also prints progress if the dump goes directly to the terminal—not a good feature)
and elinks apparently only dumps the format with "navigation content" (ymmv).



      Size    Buffer name          Contents

      ------- -------------------- ----------------------------------------------------------------------------------------

   0# 267624  [!lynx -source ht-1] !lynx -source https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

   1  5475    [!lynx -dump -nolis] !lynx -dump -nolist https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html

<div id="app"><div class="css-v89234 e3w10z60"><div><div><div class="css-13lpfd6 e1nre7570"><header class="css-1bymuyk e1>

<script>window.__preloadedData = {"initialState":{"Article:QXJ0aWNsZTpueXQ6Ly9hcnRpY2xlLzBhODc0MTcxLWM0MjEtNWRjOS1hN2IzLW>

Web Reading Mode: The non-standard rendering mode

Web Reading Mode: A bad reading experience

edited 8 hours ago

answered 10 hours ago

Thomas Dickey

52.7k596170

edited 8 hours ago

answered 10 hours ago

Thomas Dickey

52.7k596170

answered 10 hours ago

Thomas Dickey

52.7k596170

answered 10 hours ago

Thomas Dickey

52.7k596170

Thanks for sharing! I've updated the question a bit to point out that the target for such use is article pages, where the objective is to extract the article itself. Have a look at the posting again and see if you can help.

– forthrin
9 hours ago


-1

Does this satisfy your requirement? (From https://stackoverflow.com/questions/12422289/bash-command-to-convert-html-page-to-a-text-file )

lynx --dump www.google.com > file.txt

answered 18 hours ago

VBB

992

New contributor

1

Nope. This dumps a ton of navigation links, eg. lynx -dump https://www.nytimes.com/2019/01/30/reader-center/polar-vortex-tips.html A solution should strip away all navigation and fluff, and leave ONLY the MAIN CONTENT, like Reader Mode in a browser does.

– forthrin
16 hours ago
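
To make the requirement in the comment above concrete, here is a minimal sketch of one crude way to "strip away navigation and fluff". It is not how Safari or any other browser implements Reader Mode. It assumes the article body lives in ordinary <p> elements, that anything inside nav, header, footer, aside or form elements is fluff, and that very short paragraphs are buttons or labels. The names ParagraphCollector and reader_text, the min_words threshold and the 66-column width are all hypothetical choices for illustration.

```python
from html.parser import HTMLParser
import textwrap

# Assumption: content nested inside these elements is never article text.
SKIP = {"script", "style", "nav", "header", "footer", "aside", "form"}


class ParagraphCollector(HTMLParser):
    """Collect the text of <p> elements that are not nested in a SKIP element."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0   # how many SKIP elements we are currently inside
        self.in_p = False     # True while inside a <p> that we want to keep
        self.buf = []         # text fragments of the current paragraph
        self.paragraphs = []  # finished paragraphs, whitespace-normalised

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            self._flush()     # an unclosed <p> ends where the next one starts
            self.in_p = True

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "p":
            self._flush()
            self.in_p = False

    def handle_data(self, data):
        if self.in_p and self.skip_depth == 0:
            self.buf.append(data)

    def _flush(self):
        text = " ".join("".join(self.buf).split())
        if text:
            self.paragraphs.append(text)
        self.buf = []


def reader_text(html, min_words=8, width=66):
    """Return the kept paragraphs as wrapped plain text, blank-line separated."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    parser._flush()           # pick up a trailing paragraph that was never closed
    keep = [p for p in parser.paragraphs if len(p.split()) >= min_words]
    return "\n\n".join(textwrap.fill(p, width) for p in keep)
```

Feeding it page source (for example the output of lynx -source, read from standard input and passed to reader_text) gives wrapped paragraphs only. Real pages will defeat such a simple filter often enough, which is why determining the "main content" is the tricky part of the question.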

