In my first Rust lesson, I wrote a program that counts the A, T, C and G bases in a FASTA file. Afterwards, I thought of a very common task: read a file and count its lines, similar to wc -l.
Here is the first version of the code, which I named myRead.rs:
```rust
use std::env;
use std::fs::File;
use std::io::{BufRead, BufReader};

fn main() -> std::io::Result<()> {
    // The file to read is the first command-line argument.
    let args: Vec<String> = env::args().collect();
    let filename = &args[1];
    let f = File::open(filename)?;
    let reader = BufReader::new(f);
    // Count lines; each item yielded by lines() is a newly allocated String.
    let mut line_num = 0;
    for _line in reader.lines() {
        line_num += 1;
    }
    println!("{}", line_num);
    Ok(())
}
```
Then I compiled it with rustc:
```shell
rustc myRead.rs
```
I tested it on a cDNA FASTA file (about 300 MB), and it took about 5.6 seconds:
```shell
time ./myRead Homo_sapiens.GRCh38.cdna.all.fa
# 5.50s user 0.13s system 99% cpu 5.647 total
```
For comparison, I also wrote a 5-line Python script (read_file.py):
```python
import sys

count = 0
for line in open(sys.argv[1]):
    count += 1
print(f"line number {count}")
```
The Python version finishes in under a second:
```shell
time python ./read_file.py Homo_sapiens.GRCh38.cdna.all.fa
# 0.75s user 0.16s system 99% cpu 0.904 total
```
I was stunned by this result: the Rust program ran roughly 6 times slower than the Python one. After a lot of searching, I finally found a reliable answer in the thread "BufReader 100x slower than Python — am I doing something wrong?"
In short, there were two reasons:
1. rustc must be told to optimize, with -C opt-level=2 or the equivalent -O flag; with cargo, build with --release.
2. .lines() allocates a new String for every line, so the cost is not only UTF-8 validation.
Following the first suggestion, I recompiled the code with rustc -O, and it now runs faster than Python:
```shell
rustc -O ./myRead.rs
time ./myRead Homo_sapiens.GRCh38.cdna.all.fa
# 0.51s user 0.12s system 72% cpu 0.874 total
```
Following the second suggestion, I rewrote the code as follows:
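```rust
use std::env;
use std::fs::File;
use std::io::{self, Read};

fn main() -> io::Result<()> {
    let args: Vec<String> = env::args().collect();
    let filename = &args[1];
    let mut file = File::open(filename)?;
    let mut lines = 0;
    // Read the file in 128 KiB chunks into one reusable buffer,
    // so no String is allocated per line.
    let mut buf = [0u8; 4096 * 32];
    loop {
        // Propagate read errors instead of silently stopping.
        let num_bytes = file.read(&mut buf)?;
        if num_bytes == 0 {
            break; // end of file
        }
        // Count newline bytes in the filled part of the buffer.
        lines += buf[..num_bytes].iter().filter(|&&byte| byte == b'\n').count();
    }
    println!("line number {}", lines);
    Ok(())
}
```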
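A variant of the same idea lets BufReader own the chunk buffer. The sketch below uses BufRead::fill_buf and consume to count newline bytes directly in the reader's internal buffer, which avoids managing a separate array. It assumes a 128 KiB capacity to match the chunk size above. The function name count_newlines and the fallback sample in main are hypothetical, and this approach was not taken from the original thread.

```rust
use std::env;
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Counts b'\n' bytes directly in the reader's internal buffer.
fn count_newlines<R: BufRead>(mut reader: R) -> io::Result<usize> {
    let mut count = 0;
    loop {
        let chunk = reader.fill_buf()?; // borrow whatever is currently buffered
        if chunk.is_empty() {
            break; // end of input
        }
        count += chunk.iter().filter(|&&b| b == b'\n').count();
        let len = chunk.len();
        reader.consume(len); // mark the whole chunk as processed
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    // Assumed capacity of 128 KiB, matching the 4096 * 32 buffer above.
    let count = match env::args().nth(1) {
        Some(path) => count_newlines(BufReader::with_capacity(128 * 1024, File::open(path)?))?,
        None => count_newlines(&b">seq1\nATCG\n>seq2\nGGCA\n"[..])?,
    };
    println!("line number {}", count);
    Ok(())
}
```

Like wc -l and the chunked version above, this counts newline characters, so a final line without a trailing newline is not counted. The first .lines() version would count that line.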
After compiling with rustc -O, it ran about twice as fast again.
Honestly, as a Rust beginner, the main thing I'll remember is to compile Rust programs with optimization flags. The deeper code-level optimizations are something I can't fully grasp yet.