Sort files into folders, depending on filetype
I have recovered about 2.8TB (yes, terabytes) of data, which will be scanned for duplicates. The machine these files reside on is quite old and only has 2GB of memory (which works fine for LVM, however), so doing the duplicate scan on it is asking for pain.
My question is this: how can I get Debian to move files into a folder named after their filetype, renaming automatically where needed, without my having to specify a list of filetypes?
I have around 800GB of space free on it, so I can do some testing before letting this run loose on my data.
shell-script files file-types
What do you mean by "filetype"? Do you mean "extension" (e.g. .txt)? Or do you mean the results of the file command? Or...?
– Stephen Harris
Jun 23 '16 at 22:20
Yes, as in *.jpg, *.mp3 and so on.
– MrMe01
Jun 23 '16 at 22:29
If you're sure that your duplicates are exactly the same file then you should compute checksums (e.g. sha1sum) and compare those; there is no need to sort the files into separate directories.
– grochmal
Jun 24 '16 at 0:20
That's a good idea, except the machine is slow, with low resources. I have two machines on this task. The slow Debian machine is just the LVM store; files are coming into it via an SMB share, pulled off a USB drive plugged into a fast(er) Windows 10 laptop. I'd do more in Windows, but getting some programs to see the share as an attached disk for scanning/indexing is a problem, even with it showing as a shortcut or a mapped drive. Getdataback is writing to it via a shortcut in the root of c:
– MrMe01
Jun 24 '16 at 0:26
Another trick you might consider is to look for a python script called "hardlink.py" or something like that. It will search directories for identical files and make them into hard links of each other, saving space.
– Edward Falk
Jun 24 '16 at 2:01
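The checksum comparison suggested in the comments above could look roughly like the sketch below. It assumes the GNU versions of find, xargs, sha1sum and uniq that Debian ships; the function name find_dupes is made up for illustration. Each file is hashed once and files sharing a SHA-1 are printed in groups separated by blank lines. The files are streamed and sort spills to temporary files, so memory use should stay modest, but every byte still has to be read from disk.

```shell
#!/bin/sh
# Sketch only: find_dupes is a hypothetical helper name, not from the thread.
# Prints groups of files under a directory that share a SHA-1 checksum.
# Assumes GNU find/xargs/sha1sum/uniq (as on Debian).
# Caveat: sha1sum prefixes a line with "\" when the file name contains a
# backslash or newline, so such files will not be grouped correctly.
find_dupes() {
    find "$1" -type f -print0 |
        xargs -0 -r sha1sum |
        sort |
        uniq -w40 --all-repeated=separate   # compare only the 40 hex digits
}
```

Calling it as find_dupes /path/to/recovered (a placeholder path) only lists the candidates; it does not delete or link anything.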
edited Jun 23 '16 at 22:59
Gilles
asked Jun 23 '16 at 22:13
MrMe01
2 Answers
With a directory that looks like
$ ls
another.doc file.txt file1.mp3 myfile.txt
We can build a list of file extensions with this command:
$ exts=$(ls | sed 's/^.*\.//' | sort -u)
We can then loop through these extensions moving files into subdirectories:
$ for ext in $exts
> do
> echo "Processing $ext"
> mkdir "$ext"
> mv -v -- *."$ext" "$ext"/
> done
When this is run we get the following output:
Processing doc
'another.doc' -> 'doc/another.doc'
Processing mp3
'file1.mp3' -> 'mp3/file1.mp3'
Processing txt
'file.txt' -> 'txt/file.txt'
'myfile.txt' -> 'txt/myfile.txt'
The result:
$ ls
doc/ mp3/ txt/
$ ls *
doc:
another.doc
mp3:
file1.mp3
txt:
file.txt myfile.txt
Am I correct in assuming "$ exts=$(ls | sed 's/^.*\.//' | sort -u)" will go through deeply nested folders? I assume both cp and mv will work, for testing and for the actual move once I'm satisfied this works? I'm also assuming this is .sh (shell) scripting? (Please excuse the incorrect use of tags.)
– MrMe01
Jun 23 '16 at 22:47
This only works for a single directory. You need to be a lot more careful if you have subdirectories because you may have dir1/file.txt and dir2/file.txt and you don't want them overwriting each other. You can save this script as splitdir.sh in your home directory and then manually call it for each directory (if you have a small number) by cd'ing into it and running this script. If you have a lot of directories you can create a second script that goes through each directory and then runs this script once per directory.
– Stephen Harris
Jun 23 '16 at 22:54
This is needlessly complicated and breaks on file names containing whitespace and other special characters. At least double quote your variable expansions!
– Gilles
Jun 23 '16 at 23:00
How can I get it to rename files that have the same name? I don't want to babysit this. Getdataback reports there are 955684 folders and 4400795 files in this recovery.
– MrMe01
Jun 23 '16 at 23:00
So you have different files, in different folders, with the same names? And you're ok with having some of those files wind up with new names? This just got a lot more complicated, and you probably need to write a proper Python or bash script to do it. Given that your requirements are probably somewhat complicated at this point, I don't think you're going to get anybody here to write it for you. If you want to write your own, and come back if you get stuck, you might get some help.
– Edward Falk
Jun 24 '16 at 2:06
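For the recursive case with automatic renaming discussed in these comments, a minimal sketch might look like the following. Nobody in the thread posted this; the names sort_by_ext, src and dest and the noext folder for files without an extension are assumptions made for illustration. It walks the whole tree with find, so file names with spaces are handled, files go into dest/<extension>/ with the extension lower-cased, and a clashing name gets _1, _2 and so on appended before the extension. It assumes dest lies outside src so that already-moved files are not picked up again.

```shell
#!/bin/sh
# Sketch only: sort_by_ext, src, dest and the "noext" folder are hypothetical.
# Moves every regular file under src into dest/<extension>/, renaming on clash.
# Assumes dest is NOT inside src.
sort_by_ext() (
    src=$1
    dest=$2
    find "$src" -type f -exec sh -c '
        dest=$1
        shift
        for file do
            name=${file##*/}
            case $name in
                ?*.?*) base=${name%.*}; suffix=.${name##*.}; ext=${name##*.} ;;
                *)     base=$name; suffix=; ext=noext ;;  # no usable extension
            esac
            # Fold JPG and jpg into the same folder.
            case $ext in
                *[[:upper:]]*) ext=$(printf "%s" "$ext" | tr "[:upper:]" "[:lower:]") ;;
            esac
            [ -d "$dest/$ext" ] || mkdir -p "$dest/$ext" || exit 1
            target=$dest/$ext/$base$suffix
            n=1
            # Pick the first free name: file.txt, file_1.txt, file_2.txt, ...
            while [ -e "$target" ] || [ -L "$target" ]; do
                target=$dest/$ext/${base}_$n$suffix
                n=$((n + 1))
            done
            mv -- "$file" "$target" || exit 1
        done
    ' sh "$dest" {} +
)
```

A trial run on a copy of a small subtree, for example sort_by_ext ./sample ./sorted (both placeholder paths), is probably wise before pointing it at the full recovery; swapping mv for cp in the sketch gives a non-destructive test at the cost of disk space.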
I wrapped Stephen's code in a script and slightly improved the pipe.
#!/bin/bash
set -e
set -u
set -o pipefail
start=$SECONDS
# Extensions of the regular files (not folders) in the current directory
exts=$(ls -dp *.* | grep -v / | sed 's/^.*\.//' | sort -u)
ignore=""
while getopts ':f:i:h' flag; do
    case "$flag" in
        h)
            echo "This script sorts files from the current dir into folders of the same file type. Specific file types can be specified using -f."
            echo "flags:"
            echo '-f (string file types to sort e.g. -f "pdf csv mp3")'
            echo '-i (string file types to ignore e.g. -i "pdf")'
            exit 0
            ;;
        f)
            exts=$OPTARG;;
        i)
            ignore=$OPTARG;;
        :)
            echo "Missing option argument for -$OPTARG" >&2
            exit 1;;
        \?)
            echo "Invalid option: -$OPTARG" >&2
            exit 1
            ;;
    esac
done
for ext in $exts
do
    # Skip any extension listed with -i
    if [[ " ${ignore} " == *" ${ext} "* ]]; then
        echo "Skipping ${ext}"
        continue
    fi
    echo "Processing $ext"
    mkdir -p "$ext"
    # -n: never overwrite a file that already exists in the target folder
    mv -vn -- *."$ext" "$ext"/
done
duration=$(( SECONDS - start ))
echo "--- Completed in $duration seconds ---"
answered Jun 23 '16 at 22:36
Stephen Harris (first answer)
answered May 28 '18 at 7:10
Wytamma Wirth (second answer)