One way to do this would be to get all the elements in the body and iterate over them to collect their text content. With jQuery it would look something like this:
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<aside>
<h1>JS Documentation</h1>
<ul>
<li>Introduction</li>
<li>What you should already know</li>
</ul>
</aside>
<main>
<h2>Introduction</h2>
<p>JavaScript is a cross platform...</p>
</main>
Note: the :not(script) selector will leave out any <script> tags (if present) in the <body> of the document.
Tip: If you need to get rid of line breaks, you can use something like this:
text().trim().replace(/\r?\n|\r/g, '')
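To make the tip above concrete, here is a minimal sketch in plain JavaScript. The helper name `stripLineBreaks` is hypothetical (it is not part of jQuery or of the original answer), and the pattern is written with its backslash escapes, `/\r?\n|\r/g`, so that it matches line-break characters rather than the letters "r" and "n".

```javascript
// Hypothetical helper, for illustration only: trim a string and remove its line breaks.
function stripLineBreaks(text) {
  // \r?\n matches Unix (\n) and Windows (\r\n) line endings;
  // the trailing |\r catches a lone carriage return.
  return text.trim().replace(/\r?\n|\r/g, '');
}

console.log(stripLineBreaks("  JS\r\nDocumentation\n ")); // prints "JSDocumentation"
```

Note that the line breaks are removed outright, so words on adjacent lines are joined. If you would rather keep them apart, pass `' '` as the replacement instead of `''`.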
Solution 2 :
Reconsider your selector; I think its scope is grabbing too many elements. Look at the following.
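$(function() {
  var words = [];
  // Walk only the direct children of <body>, skipping <script> elements.
  $("body").children().not("script").each(function(i, el) {
    // Store each element's trimmed text as its own array entry.
    words.push($(el).text().trim());
  });
  console.log(words);
});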
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1>Introduction</h1>
<p>This is a paragraph. </p>
<div class="footer">02.12.2020</div>
This will iterate over all the child elements of the <body> tag. It reads the text of each element and stores it as its own entry in the array. Your result would look like this:
[
"Introduction",
"This is a paragraph.",
"02.12.2020"
]
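For comparison, the same loop can be sketched without jQuery. This is only an illustration under stated assumptions: `collectText` is a hypothetical helper name, and it relies on nothing more than the standard `tagName` and `textContent` properties of DOM elements. The `sample` array holds stand-in objects with those two properties so that the sketch also runs outside a browser.

```javascript
// Hypothetical helper (not from the original answer): collect the trimmed
// text of each element in an array-like list, skipping <script> elements.
function collectText(elements) {
  var words = [];
  Array.prototype.forEach.call(elements, function (el) {
    // tagName is upper-case for HTML elements; normalise it to be safe.
    if (String(el.tagName).toUpperCase() === "SCRIPT") return;
    words.push((el.textContent || "").trim());
  });
  return words;
}

// In a browser you would call: collectText(document.body.children)
// Stand-in objects mirroring the HTML above:
var sample = [
  { tagName: "SCRIPT", textContent: "" },
  { tagName: "H1", textContent: "Introduction" },
  { tagName: "P", textContent: "This is a paragraph. " },
  { tagName: "DIV", textContent: "02.12.2020" }
];
console.log(collectText(sample));
```

With the stand-in data this logs the same three strings as the result shown above.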
Problem :
I have been trying to extract the full text content from an HTML document for computation. I was able to find a solution for that in jQuery, but it’s quite partial…
The output is as expected for the following code: