Command line interface
How to use adaparse CLI
The adaparse command-line tool takes URL strings (ASCII/UTF-8) and validates, normalizes, and queries them efficiently.
Options
-d, --diagram: Print a diagram of the result
-u, --url: URL parameter (required)
-h, --help: Print usage
-g, --get: Get a specific part of the URL (e.g., 'origin', 'host', etc., as shown in the examples below)
-b, --benchmark: Run benchmark for piped file functions
-p, --path: Process all the URLs in a given file
-o, --output: Output the results of the parsing to a file
Usage/Examples
Well-formatted URL
adaparse "http://www.google.com"
Output:
http://www.google.com
Diagram
adaparse -d http://www.google.com/bal\?a\=\=11\#fddfds
Output:
http://www.google.com/bal?a==11#fddfds [38 bytes]
     | |             |   |     |
     | |             |   |     `------ hash_start
     | |             |   `------------ search_start 25
     | |             `---------------- pathname_start 21
     | |             `---------------- host_end 21
     | `------------------------------ host_start 7
     | `------------------------------ username_end 7
     `-------------------------------- protocol_end 5
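The offsets in the diagram are byte positions into the normalized URL string. As a rough illustration of how they relate to the text, the following Python sketch recomputes them with plain substring searches. It is not how adaparse works internally: it assumes a URL of the shape scheme://host/path?query#fragment with no credentials and no port, and the function name component_offsets is hypothetical.

```python
def component_offsets(url: str) -> dict:
    """Recompute the diagram offsets for a simple URL.

    Assumption: the URL has no credentials and no port, so username_end
    coincides with host_start and host_end with pathname_start.
    """
    data = url.encode("utf-8")                       # offsets are byte positions
    protocol_end = data.index(b":") + 1              # just past the scheme's ':'
    host_start = protocol_end + 2                    # skip the '//'
    pathname_start = data.index(b"/", host_start)    # first '/' after the host
    search_start = data.index(b"?", pathname_start)  # position of '?'
    hash_start = data.index(b"#", search_start)      # position of '#'
    return {
        "length": len(data),
        "protocol_end": protocol_end,
        "username_end": host_start,
        "host_start": host_start,
        "host_end": pathname_start,
        "pathname_start": pathname_start,
        "search_start": search_start,
        "hash_start": hash_start,
    }


offsets = component_offsets("http://www.google.com/bal?a==11#fddfds")
for name, value in offsets.items():
    print(f"{name:15} {value}")
```

For the example URL, the sketch reproduces the numbers printed in the diagram (38 bytes, protocol_end 5, host_start 7, host_end 21, pathname_start 21, search_start 25). The hash_start value it prints comes from this sketch's own search for '#', not from the tool output.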
Pipe Operator
Ada can process URLs from piped input, making it easy to integrate with other command-line tools that produce ASCII or UTF-8 output. Here is an example of how to pipe the output of another command into Ada.
Given a list of URLs, one per line, we can query the normalized URL string (href) and detect any malformed URLs:
cat dragonball_url.txt | adaparse --get href
Output:
http://www.goku.com
http://www.vegeta.com
http://www.gohan.com
The -g option is applied to every URL in the input, so you can query the hash, the host, the protocol, the port, the origin, the search, the password, the username, the pathname, or the hostname of each one:
cat dragonball_url.txt | adaparse -g host
Output:
www.goku.com
www.vegeta.com
www.gohan.com
If you omit -g, adaparse only lists the invalid URLs. This is useful when you want to quickly validate a list of URLs.
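The same piping can be driven from a script. The Python sketch below feeds a list of URLs to adaparse on standard input, one per line, and collects the lines it prints. It assumes an adaparse executable is available on the PATH. The helper names to_stdin_payload and query_part are hypothetical, and only the --get flag and the part names listed in this document are used.

```python
import subprocess

# Part names taken from this document (href plus the components listed above).
VALID_PARTS = {
    "href", "hash", "host", "protocol", "port", "origin",
    "search", "password", "username", "pathname", "hostname",
}


def to_stdin_payload(urls):
    """Join URLs one per line, skipping blank entries."""
    return "".join(u.strip() + "\n" for u in urls if u.strip())


def query_part(urls, part, executable="adaparse"):
    """Pipe the URLs into `adaparse --get <part>` and return its output lines.

    Assumption: `executable` resolves to an installed adaparse binary.
    """
    if part not in VALID_PARTS:
        raise ValueError(f"unknown URL part: {part!r}")
    done = subprocess.run(
        [executable, "--get", part],
        input=to_stdin_payload(urls),
        capture_output=True,
        text=True,
        check=False,
    )
    return done.stdout.splitlines()
```

A call such as query_part(["http://www.goku.com", "http://www.vegeta.com"], "host") corresponds to the piped -g host example above.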
Benchmark Runner
The benchmark flag can be used to output the time it takes to process piped input:
cat wikipedia_100k.txt | adaparse -b
Output:
Invalid URL: 1968:_Die_Kinder_der_Diktatur
Invalid URL: 58957:_The_Bluegrass_Guitar_Collection
Invalid URL: 650luc:_Gangsta_Grillz
Invalid URL: Q4%3A57
Invalid URL: Q10%3A47
Invalid URL: Q5%3A45
Invalid URL: Q40%3A28
Invalid URL: 1:1_scale
Invalid URL: 1893:_A_World's_Fair_Mystery
Invalid URL: 12:51_(Krissy_%26_Ericka_song)
Invalid URL: 111:_A_Nelson_Number
Invalid URL: 7:00AM-8%3A00AM_(24_season_5)
Invalid URL: Q53%3A31
read 5209265 bytes in 32819917 ns using 100000 lines, used 160 loads
0.1587226744053009 GB/s
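For the run shown above, the throughput figure matches the number of bytes divided by the elapsed nanoseconds (one byte per nanosecond is one GB/s in decimal units). The Python sketch below parses a summary line in the format shown and recomputes the figure. The regular expression and the function name parse_summary are hypothetical and assume the summary line keeps this wording.

```python
import re

# Matches the summary line format shown in the benchmark output above.
SUMMARY = re.compile(r"read (\d+) bytes in (\d+) ns using (\d+) lines")


def parse_summary(line):
    """Extract bytes, nanoseconds, and line count, then derive GB/s."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    num_bytes, elapsed_ns, num_lines = (int(g) for g in match.groups())
    return {
        "bytes": num_bytes,
        "ns": elapsed_ns,
        "lines": num_lines,
        "gb_per_s": num_bytes / elapsed_ns,  # bytes per ns equals GB/s
    }


summary = parse_summary(
    "read 5209265 bytes in 32819917 ns using 100000 lines, used 160 loads"
)
print(f"{summary['gb_per_s']:.4f} GB/s")  # prints 0.1587 GB/s
```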
Saving result to file system
There is an option to output to a file on disk:
cat wikipedia_100k.txt | adaparse -o wiki_output.txt
You can also read from a file on disk without going through cat:
adaparse -p wikipedia_top_100.txt
Advanced Usage
You can also combine flags. For example, to extract only the host from the URLs stored in wikipedia_top100.txt, write the hosts to test_write.txt, and report the benchmark timing:
adaparse -p wikipedia_top100.txt -o test_write.txt -g host -b
Output:
read 5209265 bytes in 26737131 ns using 100000 lines, total_bytes is 5209265 used 160 loads
0.19483260937757307 GB/s
Content of test_write.txt:
(---snip---)
en.wikipedia.org
en.wikipedia.org
en.wikipedia.org
en.wikipedia.org
en.wikipedia.org
en.wikipedia.org
en.wikipedia.org
en.wikipedia.org
en.wikipedia.org
en.wikipedia.org
(---snip---)
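When flag combinations are assembled by a script, it can help to build the argument list in one place. The Python sketch below does this for the flags used in this section. The function name build_adaparse_args and its parameters are hypothetical, and the sketch only constructs the command line; it does not run adaparse.

```python
import shlex


def build_adaparse_args(path=None, output=None, part=None, benchmark=False):
    """Assemble an adaparse argument list from the flags documented above."""
    args = ["adaparse"]
    if path is not None:
        args += ["-p", path]      # read URLs from a file
    if output is not None:
        args += ["-o", output]    # write results to a file
    if part is not None:
        args += ["-g", part]      # URL component to extract
    if benchmark:
        args.append("-b")         # report timing
    return args


command = build_adaparse_args(
    path="wikipedia_top100.txt",
    output="test_write.txt",
    part="host",
    benchmark=True,
)
print(shlex.join(command))
# prints: adaparse -p wikipedia_top100.txt -o test_write.txt -g host -b
```

The resulting list can be passed directly to subprocess.run, which avoids shell quoting issues with file names that contain spaces.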