Solution 1 :

One way to do this is to select every element in the body and iterate over them, collecting each element's text content. With jQuery it would look something like this:

$(document).ready(function() {
  
  let content = []
  
  $('body *:not(script)').each((i, el) => {
    content.push($(el).text())
  })
  
  console.log(content)
})
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>

<aside>
  <h1>JS Documentation</h1>
  <ul>
    <li>Introduction</li>
    <li>What you should already know</li>
  </ul>
</aside>
<main>
  <h2>Introduction</h2>
  <p>JavaScript is a cross platform...</p>
</main>

Note: the :not(script) selector will leave out any <script> tags (if present) in the <body> of the document.

Tip: If you need to get rid of line breaks and surrounding whitespace, you can use something like this:

$(el).text().trim().replace(/\r?\n|\r/g, '')
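
The tip above can be tried without a browser. The following is only a sketch: the function name cleanText is hypothetical, and unlike the one-liner in the tip it replaces each line break with a space (so words on adjacent lines are not glued together) and then collapses runs of whitespace.

```javascript
// Hypothetical helper, not part of the original answer.
// Replaces line breaks with spaces, collapses repeated whitespace, trims the ends.
function cleanText(raw) {
  return raw
    .replace(/\r?\n|\r/g, ' ') // line breaks (Windows, Unix, old Mac) become spaces
    .replace(/\s+/g, ' ')      // collapse runs of whitespace into one space
    .trim();                   // drop leading and trailing whitespace
}

const cleaned = cleanText('  JavaScript is\r\n a cross   platform...\n');
console.log(cleaned); // JavaScript is a cross platform...
```

Inside the jQuery loop it would be called as cleanText($(el).text()).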

Solution 2 :

Consider your selector; I think its scope is grabbing too many elements. Look at the following.

$(function() {
  var words = [];
  $("body").children().not("script").each(function(i, el) {
    words.push($(el).text().trim());
  });
  console.log(words);
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<h1>Introduction</h1>
<p>This is a paragraph. </p>
<div class="footer">02.12.2020</div>

This will iterate over all the child elements of the body tag. It reads the text of each element and stores it as its own entry in the array. Your result would look like:

[
  "Introduction",
  "This is a paragraph.",
  "02.12.2020"
]
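
The same idea can also be sketched with plain DOM properties instead of jQuery. This is an illustration, not part of the original answer: the function name collectChildText is hypothetical, it assumes it is handed an element such as document.body, and the fakeBody object exists only so the snippet can run outside a browser.

```javascript
// Hypothetical helper: collect the trimmed text of each direct child of
// `root`, skipping <script> elements.
function collectChildText(root) {
  return Array.from(root.children)
    .filter((el) => el.tagName.toLowerCase() !== 'script')
    .map((el) => el.textContent.trim());
}

// Stand-in for document.body so the sketch runs without a browser.
const fakeBody = {
  children: [
    { tagName: 'SCRIPT', textContent: '/* library code */' },
    { tagName: 'H1', textContent: 'Introduction' },
    { tagName: 'P', textContent: 'This is a paragraph. ' },
    { tagName: 'DIV', textContent: '02.12.2020' },
  ],
};

const childTexts = collectChildText(fakeBody);
console.log(childTexts); // [ 'Introduction', 'This is a paragraph.', '02.12.2020' ]
```

In a page, the call would be collectChildText(document.body), made once the DOM has loaded.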

Problem :

I have been trying to extract the full text content from an HTML document for computation. I was able to find a solution for that in jQuery, but it's only partial…
The output is as expected for the following code:

$(document).ready(function(){ 
    console.log($("*").text())
})

This is the output I was talking about.
I want to store the content that is printed to the console in a variable. When I tried doing something like

var words = []
$(document).ready(function(){ 
    words.push($("*").text())
})
console.log(words)

it returns undefined.
I learned that this is because the callback runs asynchronously. How do I approach this issue? Thanks in advance.
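
The ordering problem described above can be reproduced without a browser. In this sketch, onReady and fireReady are hypothetical stand-ins for $(document).ready and for the browser signalling that the DOM is ready; they are not jQuery APIs, and the pushed string is made-up sample text.

```javascript
// Stand-ins for the browser (assumptions for illustration only).
const readyCallbacks = [];
function onReady(callback) { readyCallbacks.push(callback); } // plays the role of $(document).ready
function fireReady() { readyCallbacks.forEach((cb) => cb()); } // plays the role of the DOM becoming ready

const collected = []; // plays the role of `words` in the question
const log = [];

onReady(function () {
  collected.push('sample page text');
  // Code that needs the text belongs here, or in a function called from here.
  log.push('inside callback: ' + collected.length);
});

// This line runs before the callback, so the array is still empty.
log.push('right after registering: ' + collected.length);

fireReady();

// Once the callback has run, the same variable holds the text.
log.push('after ready: ' + collected.length);

console.log(log);
// [ 'right after registering: 0', 'inside callback: 1', 'after ready: 1' ]
```

The variable itself is reachable outside the callback; it is only empty until the callback has run, which is why the log call in the question fires too early.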

Comments

Comment posted by freedomn-m

document.ready

Comment posted by freedomn-m

var words = [];console.log(words)

Comment posted by Juan Marco

Do you need the entire structure of the HTML document? Or you’re looking for all the text inside the webpage?

Comment posted by K.R. Vijayalakshmy

@JuanMarco I’m looking for all the text inside the webpage.

Comment posted by K.R. Vijayalakshmy

@freedomn-m I did try including the log function inside the callback and it works fine. Will I be able to use the variable in an async function?

Comment posted by K.R. Vijayalakshmy

Thank you for the solution!!! Since both answers are similar, I’ll go with either one. Thanks again!!

Comment posted by K.R. Vijayalakshmy

But still, you’re printing them inside the callback. Will I be able to access it beyond the scope by parametrizing it?

Comment posted by K.R. Vijayalakshmy

Yes, I am able to, just wanted to clarify. Thank you.

Comment posted by (unknown)

Glad that that worked for you! Don’t forget to choose the answer that best works for you

Comment posted by K.R. Vijayalakshmy

Thank you for the solution. It’s simple and easily understandable. But can you tell what

Comment posted by K.R. Vijayalakshmy

Will I be able to access the words list beyond the callback by parametrizing it?

Comment posted by (unknown)

@K.R.Vijayalakshmy please read: stackoverflow.com/questions/8396407/…

Comment posted by Twisty

@K.R.Vijayalakshmy as the variable

Comment posted by K.R. Vijayalakshmy

Yes, I am able to do so. Thank you. I just wanted to clarify.
