EDIT:
Never mind, I was wrong: the session should be handling all cookies. It seems the website rejects requests based on their User-Agent header (by default, requests sends `python-requests/<version>`), so just override it:
```python
def login_tokyo(s):
    header = {'User-Agent': ''}
    s.headers.update(header)
    r = s.get('https://apcis.tmou.org/public/')
    str_number = re.findall("<span[^>]+(.*?)</span>", r.text)[0]
    ...
```
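To check what the server actually receives, you can build the request without sending it. The sketch below uses `Session.prepare_request` to show the merged headers: the library default versus the empty override used above. The `make_session` helper is hypothetical, not part of the original answer.

```python
import requests


def make_session(user_agent: str = "") -> requests.Session:
    """Hypothetical helper: a Session whose requests all carry user_agent."""
    s = requests.Session()
    # Session-level headers are merged into every request sent through s.
    s.headers.update({"User-Agent": user_agent})
    return s


# What requests sends when you do not override the header.
default_ua = requests.utils.default_user_agent()
print(default_ua)  # e.g. python-requests/2.x.y

s = make_session()
# prepare_request merges session headers and cookies into the request but sends nothing.
prepared = s.prepare_request(requests.Request("GET", "https://apcis.tmou.org/public/"))
print(repr(prepared.headers["User-Agent"]))  # ''
```

A browser-like string, such as the one in the original answer below, can be passed the same way; which values the site accepts is something to test rather than assume.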
Original answer:
You are not handling the PHPSESSID cookie; many websites use it to track logins server-side. Try doing this:
```python
def login_tokyo(s):
    # The request header is 'Cookie' (singular); a 'Cookies' header is not treated as a cookie
    header = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.',
              'Cookie': 'PHPSESSID=xxxxxxxxxxxxxxxxxxxxxxxxxxxx'}
    s.headers.update(header)
    r = s.get('https://apcis.tmou.org/public/')
    str_number = re.findall("<span[^>]+(.*?)</span>", r.text)[0]
    ...
```
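Rather than writing the raw `Cookie` header by hand, the value can go into the session's cookie jar, which also keeps it scoped to one host. A minimal sketch, with the placeholder session ID from the answer standing in for a real one:

```python
import requests

# Placeholder value, as in the answer above; substitute a real PHPSESSID.
SESSION_ID = "xxxxxxxxxxxxxxxxxxxxxxxxxxxx"

s = requests.Session()
# Store the cookie in the jar, limited to the target host, instead of a raw header.
s.cookies.set("PHPSESSID", SESSION_ID, domain="apcis.tmou.org")

# Build (but do not send) a request to see the Cookie header the jar produces.
prepared = s.prepare_request(requests.Request("GET", "https://apcis.tmou.org/public/"))
print(prepared.headers["Cookie"])  # PHPSESSID=xxxxxxxxxxxxxxxxxxxxxxxxxxxx

# A request to another host gets no Cookie header from this jar.
other = s.prepare_request(requests.Request("GET", "https://example.com/"))
print("Cookie" in other.headers)  # False
```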
You can also use cookielib (`http.cookiejar` in Python 3; more info in this question) if you do not want to handle cookies manually, though many servers do not check whether the session ID you send is already in their database and will accept any random session ID.
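On the point from the edit: `requests.Session` already keeps cookies in a `RequestsCookieJar`, a subclass of `http.cookiejar.CookieJar`, so a `PHPSESSID` set by the server is sent back automatically. The sketch below checks that against a throwaway local server; the `/login` and `/echo` paths and the cookie value are made up for the demo.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class CookieHandler(BaseHTTPRequestHandler):
    """Stand-in for a PHP site: /login sets a session cookie, /echo reports it back."""

    def do_GET(self):
        self.send_response(200)
        if self.path == "/login":
            # Made-up session value standing in for a real PHPSESSID.
            self.send_header("Set-Cookie", "PHPSESSID=demo123; Path=/")
            body = b"logged in"
        else:
            # Echo whatever Cookie header the client sent (empty if none).
            body = (self.headers.get("Cookie") or "").encode()
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


server = ThreadingHTTPServer(("127.0.0.1", 0), CookieHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"

with requests.Session() as s:
    s.trust_env = False  # ignore proxy environment variables for this local test
    s.get(f"{base}/login")                # the server sets PHPSESSID
    echoed = s.get(f"{base}/echo").text   # the session sends it back unprompted
    jar_ok = isinstance(s.cookies, CookieJar)

print(echoed)   # PHPSESSID=demo123
print(jar_ok)   # True
server.shutdown()
server.server_close()
```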
I have been trying to script a site whose first page asks a math question before letting you through to the main page. That part is solved. However, when I try to request anything else, I get the following:
```html
<script>
window.location.reload();
</script>
```
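When scripting this, it helps to recognise that reload-only page explicitly instead of trying to parse it. A small sketch (the `is_reload_stub` name is hypothetical) that matches the exact stub shown above, tolerating whitespace:

```python
import re

# The whole body is just a script that reloads the page (whitespace-tolerant).
_RELOAD_STUB = re.compile(r"<script>\s*window\.location\.reload\(\);?\s*</script>")


def is_reload_stub(html: str) -> bool:
    """Return True if the response body is only the reload script and nothing else."""
    return _RELOAD_STUB.fullmatch(html.strip()) is not None


print(is_reload_stub("<script>\nwindow.location.reload();\n</script>\n"))  # True
print(is_reload_stub('<p class="x">Medium</p>'))  # False
```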
I learned about this approach a while ago, but this is the first time I am actually trying it:
```python
import re
import requests


def login_tokyo(s):
    r = s.get('https://apcis.tmou.org/public/')
    str_number = re.findall("<span[^>]+(.*?)</span>", r.text)[0]
    numbers = re.findall('[0-9]+', str_number)
    captcha = int(numbers[0]) + int(numbers[1])
    payload = {'captcha': captcha}
    r = s.post('https://apcis.tmou.org/public/?action=login', data=payload)
    check_text = re.findall('<b>(.*?)</b>', r.text)[0]
    print(check_text)
    payload1 = {'Param': 0, 'Value': 5797164, 'imo': '', 'callsign': '', 'name': '', 'compimo': 5797164,
                'compname': '', 'From': '01.06.2020', 'Till': '31.08.2020', 'authority': 0, 'flag': 0, 'class': 0,
                'ro': 0, 'type': 0, 'result': 0, 'insptype': -1, 'sort1': 0, 'sort2': 'DESC', 'sort3': 0,
                'sort4': 'DESC'
                }
    r = s.post('https://apcis.tmou.org/public/?action=getcompanies', data=payload1)
    perf_tm = re.findall("<p class=[^>]+(.*?)</p>", r.text)
    print(r.text)
    print(perf_tm)


if __name__ == '__main__':
    with requests.Session() as s:
        login_tokyo(s)
```
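The captcha step above boils down to "find the first `<span>`, pull out the integers, add the first two". Isolated as a pure function it can be tested offline. The sample markup below is hypothetical, since the real page's HTML is not shown, and the span pattern is slightly looser than the question's (it also accepts a `<span>` with no attributes).

```python
import re


def solve_captcha(html: str) -> int:
    """Sum the first two integers inside the first <span> of the page.

    Assumes, like the question's code, that the challenge is always an addition.
    """
    spans = re.findall(r"<span[^>]*>(.*?)</span>", html, re.S)
    if not spans:
        raise ValueError("no <span> found; the page layout may have changed")
    numbers = re.findall(r"[0-9]+", spans[0])
    if len(numbers) < 2:
        raise ValueError(f"expected two numbers in {spans[0]!r}")
    return int(numbers[0]) + int(numbers[1])


# Hypothetical markup standing in for the real login page.
sample = '<form><span class="q">How much is 17 + 25?</span><input name="captcha"></form>'
print(solve_captcha(sample))  # 42
```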
The `print(check_text)` tells me that I am on the main page, but then… nothing. From this specific request, I expected `print(perf_tm)` to give me `Medium`.
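One more thing worth checking once the page does load: in `<p class=[^>]+(.*?)</p>`, the greedy `[^>]+` stops just before the tag's `>`, so that `>` ends up inside the captured group. The comparison below runs both patterns on a hypothetical snippet (the class name is made up):

```python
import re

# Hypothetical fragment; the real getcompanies response may differ.
html = '<div><p class="perf">Medium</p></div>'

original = re.findall("<p class=[^>]+(.*?)</p>", html)
tightened = re.findall(r"<p class=[^>]*>(.*?)</p>", html)

print(original)   # ['>Medium']
print(tightened)  # ['Medium']
```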
Any help is appreciated!
I love the math-question captcha on this website; it is just about as effective as a spell.
By the way, I would highly recommend using Postman or Insomnia to test requests manually and to check which headers are required and which are not.
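The same trial and error can be scripted: remove one header at a time and see which removals break the request. A sketch, assuming you supply a `request_ok` check (hypothetical here; for example, send the request and return `False` when the reload page comes back):

```python
from typing import Callable, Dict, List


def find_required_headers(headers: Dict[str, str],
                          request_ok: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Remove one header at a time and return those whose removal breaks the request.

    Headers that only matter in combination are not detected by this approach.
    """
    if not request_ok(headers):
        raise ValueError("the request fails even with every header present")
    required = []
    for name in headers:
        trimmed = {k: v for k, v in headers.items() if k != name}
        if not request_ok(trimmed):
            required.append(name)
    return required


# Offline stand-in for a server that insists on a User-Agent header.
def fake_server_ok(h: Dict[str, str]) -> bool:
    return "User-Agent" in h


browser_headers = {"User-Agent": "Mozilla/5.0", "Accept": "text/html", "Accept-Language": "en"}
print(find_required_headers(browser_headers, fake_server_ok))  # ['User-Agent']
```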
Also, if you are lazy like me, just copy the request from the Network tab of your browser's developer tools as a cURL command and paste it into a cURL-to-Python converter.
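If you would rather not rely on an online converter, the parts of a copied cURL command that matter here (URL, headers, body) can be pulled out with `shlex`. This is a hypothetical minimal parser, not the converter the answer refers to: it handles only `-H`, `-d`/`--data`/`--data-raw` and `-X`, and assumes every other flag takes no value.

```python
import shlex


def curl_to_requests_args(curl_cmd: str) -> dict:
    """Hypothetical minimal parser for a browser 'Copy as cURL' command."""
    tokens = shlex.split(curl_cmd)
    args = {"method": None, "url": None, "headers": {}, "data": None}
    i = 1  # tokens[0] is 'curl'
    while i < len(tokens):
        tok = tokens[i]
        if tok in ("-H", "--header"):
            name, _, value = tokens[i + 1].partition(":")
            args["headers"][name.strip()] = value.strip()
            i += 2
        elif tok in ("-d", "--data", "--data-raw"):
            args["data"] = tokens[i + 1]
            i += 2
        elif tok in ("-X", "--request"):
            args["method"] = tokens[i + 1]
            i += 2
        elif tok.startswith("-"):
            i += 1  # unknown flag such as --compressed; assumed to take no value
        else:
            args["url"] = tok
            i += 1
    if args["method"] is None:
        # curl defaults to POST when a body is given, GET otherwise.
        args["method"] = "POST" if args["data"] is not None else "GET"
    return args


cmd = ("curl 'https://apcis.tmou.org/public/?action=login' "
       "-H 'User-Agent: Mozilla/5.0' "
       "-H 'Content-Type: application/x-www-form-urlencoded' "
       "--data-raw 'captcha=42' --compressed")
args = curl_to_requests_args(cmd)
print(args["method"], args["url"])
print(args["headers"])
```

The resulting dict maps directly onto `session.request(**args)`. Commands copied in Windows `cmd` format or using bash `$'...'` quoting are outside what this sketch handles.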
This worked like a charm. Thank you for identifying the issue. I will certainly learn more about it to understand it better.