The main issue with your perl line is your quoting: you used single quotes, which means you don’t have to escape the Perl variables, but it also means ${tag} is interpreted by Perl (where it’s an empty variable), not by the shell. You can access shell variables more easily from Perl by passing them either as arguments or as environment variables. You also didn’t use the -i switch for in-place editing, so the changes were only printed to STDOUT.
With ojo installed, you can do this with a proper HTML parser and thus not be susceptible to edge cases:
env tag="$tag" perl -0777 -pi -CS -Mojo -e '$_ = x($_);
$_->find($ENV{tag})->each(sub {
  $_->content($_->content =~ s/\A *//r =~ s/ *\z//r);
  my ($p, $n) = ($_->previous_node, $_->next_node);
  $p->content($p->content =~ s/ *\z//r) if defined $p and ($p->type eq "text" or $p->type eq "raw");
  $n->content($n->content =~ s/\A *//r) if defined $n and ($n->type eq "text" or $n->type eq "raw");
})' tmp_html
The -0777 switch ensures the file will be operated on in one step rather than by line, -pi wraps the code in a loop which will assign the input to $_ and then update that file in-place with the resulting value of $_, and -CS ensures it will be decoded from UTF-8 to parse and encoded back after.
The x function from ojo creates a Mojo::DOM object, which can then find every instance of the requested tag and operate on it (which includes its contents and closing tag).
The substitution operations s/\A *//r and s/ *\z//r remove all space characters from the beginning or end of the string respectively, and return the modified string (/r prevents it from operating in place, so you can use this with Mojo::DOM’s content method). To instead remove any whitespace characters (including newlines), use s/\A\s*//r and s/\s*\z//r.
Solution 2 :
After some communication with OP I hope that I properly understood the problem.
The HTML tags are stored in a separate file, one per line (tag_file.txt), and the HTML webpage code is in another file (file.html).
The code should strip spaces in the HTML webpage code (file.html) around the tags [opening, closing] specified in the tag file (tag_file.txt).
NOTE: processing is done with a Perl script, without the shell’s assistance (shortens processing time)
use strict;
use warnings;
use feature 'say';
my $tag_file = 'tag_file.txt';
my $html_file = 'file.html';
open my $fh_tag, '<', $tag_file # open tag file
or die "Couldn't open $tag_file: $!";
my @tags = <$fh_tag>; # read tags into array
chomp @tags; # remove eol from tag lines
close $fh_tag; # close tag file
open my $fh_html, '<', $html_file # open html file
or die "Couldn't open $html_file: $!";
my $html = do { local $/; <$fh_html> }; # read whole file into variable
close $fh_html; # close html file
# now make substitution for each read tag
for my $tag (@tags) { $html =~ s!\s*(</?$tag\s*.*?>)\s*!$1!g; }
say $html;
Content of tag_file.txt
html
head
body
section
Problem :
I’m looking to write a shell script to minify less html files, but I’m having a problem.
I would like to delete the spaces on each side of specific html tags, these tags being read from a file. With “perl” I can’t do it, nothing happens; with sed, in 2 commands, I almost get what I want. In the example below, the space between some tags is removed, but not all: there is a problem at the level of the “section” tags, and “h2” too, even though the pattern matches …
for tag in $tag_file ; do
    # perl -e '$comHtml=<>; $comHtml=~s/ *(<${tag} *.* *>) */\1/g; print $comHtml' < tmp_html
    sed -i -r -e "s: *(<${tag} *.* *>) *:\1:gI" ./tmp_html
    sed -i -r -e "s: *(</${tag} *.* *>) *:\1:gI" ./tmp_html
done
Here, $tag_file contains the specific tags read from a file, for example $tag_file = html \n head \n section \n …
Your question is not very clearly conveyed for understanding. I try to understand what is value of
Comment posted by Polar Bear
Your question would make sense if you use embedded system with scarce resources. Otherwise nowadays web server can be configured to
Comment posted by Antholife
I specified that the $tag_file variable contains the names of the specific tags, obtained with tag_file=$(cat $VAR), $VAR being a path to a file where the tags are written like this: html section … , so tag will first take the value “html”, etc. So I want to remove the spaces around the tags
Comment posted by Antholife
I’m just trying to do something simple in my spare time
Comment posted by Polar Bear
@Antolife — please edit your code and add a piece demonstrating how you obtain content
Comment posted by Antholife
I’m new to perl, so I find it difficult to understand everything. As for the output to STDOUT, I know; I wanted to test in the terminal. Could you tell me if this is possible with sed, or with a simpler perl command, if there is one? Apart from the perl, your explanation with env is clear, I think I understand this point. Thank you
Comment posted by Grinnz
If you remove
Comment posted by Antholife
its ok ; and with sed ?
Comment posted by Grinnz
I don’t know sed, but I’ve added some more explanation.
I would not try it, because operating on HTML with straight regexes is extremely complicated to do correctly